Background
Google Abseil implements a high-performance hash table: the Swiss table. For just how fast it is, see the concrete benchmarks; in short, apart from iteration, which is unremarkable, everything else performs very well. Its design highlights are SIMD-accelerated lookup, cache-friendly linear probing, and a memory layout more compact than mainstream standard-library implementations; for these, see the official design notes. Enough preamble, let's read the code.
NOTES:
- The code follows the Abseil LTS branch, July 2024 release.
- The Swiss table implementation sits behind absl::flat_hash_map.
Index
Since my code-reading habit leans depth-first (see the table of contents), the raw order is certainly unfriendly to read as an article.
So here is an index collecting the spots the table of contents doesn't surface well:
- Memory layout: CommonFields, default construction, resize_impl, the slot type, and slot initialization.
- SIMD operations: Group, ctrl bytes, and prepare insert (non-SOO).
- Probing: probe.
- Lookup: find, erase, prepare insert (SOO/non-SOO), etc.; most operations build on lookup.
- SOO optimization: HeapOrSoo and prepare insert SOO.
- Trivially-relocatable optimization: transfer and resize_impl.
- Prefetch optimization: prefetch and find.
- Context-specific optimizations: many, e.g. prepare insert after SOO and prepare insert non-SOO.
Memory layout
Type relationships
To work out the memory layout, we first need to understand how the internal classes of absl::flat_hash_map relate to each other.
template <class K, class V, class Hash = DefaultHashContainerHash<K>,
class Eq = DefaultHashContainerEq<K>,
class Allocator = std::allocator<std::pair<const K, V>>>
class ABSL_ATTRIBUTE_OWNER flat_hash_map
: public absl::container_internal::raw_hash_map<
absl::container_internal::FlatHashMapPolicy<K, V>, Hash, Eq,
Allocator> {
using Base = typename flat_hash_map::raw_hash_map;
flat_hash_map() {}
using Base::Base;
using Base::begin;
using Base::end;
// ...
};
template <class Policy, class Hash, class Eq, class Alloc>
class raw_hash_map : public raw_hash_set<Policy, Hash, Eq, Alloc> {
// ...
};
template <class Policy, class Hash, class Eq, class Alloc>
class raw_hash_set {
// Bundle together CommonFields plus other objects which might be empty.
// CompressedTuple will ensure that sizeof is not affected by any of the empty
// fields that occur after CommonFields.
absl::container_internal::CompressedTuple<CommonFields, hasher, key_equal,
allocator_type>
settings_{CommonFields::CreateDefault<SooEnabled()>(), hasher{},
key_equal{}, allocator_type{}};
};
The hash table's only data member is settings_, inherited from the raw_hash_set base class. As the comment explains, this sole member is a CompressedTuple so that the empty base optimization (EBO) can kick in, since the container's hasher, key_equal and allocator_type may all be stateless types. The type we actually need to care about is CommonFields; that is where the members in the conventional sense live. Note, however, that the CommonFields object is constructed via a factory function parameterized by a SooEnabled flag before being handed to settings_, which already hints at the small object optimization (SOO).
CompressedTuple
template <typename... Ts>
class ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC CompressedTuple
: private internal_compressed_tuple::CompressedTupleImpl<
CompressedTuple<Ts...>, absl::index_sequence_for<Ts...>,
internal_compressed_tuple::ShouldAnyUseBase<Ts...>()> {
private:
template <int I>
using ElemT = internal_compressed_tuple::ElemT<CompressedTuple, I>;
template <int I>
using StorageT = internal_compressed_tuple::Storage<ElemT<I>, I>;
explicit constexpr CompressedTuple(const Ts&... base)
: CompressedTuple::CompressedTupleImpl(absl::in_place, base...) {}
template <int I>
constexpr ElemT<I>& get() & {
return StorageT<I>::get();
}
// ...
};
template <typename D, typename I, bool ShouldAnyUseBase>
struct ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC CompressedTupleImpl;
template <typename... Ts, size_t... I, bool ShouldAnyUseBase>
struct ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC CompressedTupleImpl<
CompressedTuple<Ts...>, absl::index_sequence<I...>, ShouldAnyUseBase>
: uses_inheritance,
Storage<Ts, std::integral_constant<size_t, I>::value>... {
constexpr CompressedTupleImpl() = default;
template <typename... Vs>
explicit constexpr CompressedTupleImpl(absl::in_place_t, Vs&&... args)
: Storage<Ts, I>(absl::in_place, std::forward<Vs>(args))... {}
friend CompressedTuple<Ts...>;
};
template <typename... Ts, size_t... I>
struct ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC CompressedTupleImpl<
CompressedTuple<Ts...>, absl::index_sequence<I...>, false>
: Storage<Ts, std::integral_constant<size_t, I>::value, false>... {
constexpr CompressedTupleImpl() = default;
template <typename... Vs>
explicit constexpr CompressedTupleImpl(absl::in_place_t, Vs&&... args)
: Storage<Ts, I, false>(absl::in_place, std::forward<Vs>(args))... {}
friend CompressedTuple<Ts...>;
};
// The storage class provides two specializations:
// - For empty classes, it stores T as a base class.
// - For everything else, it stores T as a member.
template <typename T, size_t I, bool UseBase = ShouldUseBase<T>()>
struct Storage {
T value;
constexpr Storage() = default;
template <typename V>
explicit constexpr Storage(absl::in_place_t, V&& v)
: value(std::forward<V>(v)) {}
constexpr const T& get() const& { return value; }
constexpr T& get() & { return value; }
constexpr const T&& get() const&& { return std::move(*this).value; }
constexpr T&& get() && { return std::move(*this).value; }
};
template <typename T, size_t I>
struct ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC Storage<T, I, true> : T {
constexpr Storage() = default;
template <typename V>
explicit constexpr Storage(absl::in_place_t, V&& v) : T(std::forward<V>(v)) {}
constexpr const T& get() const& { return *this; }
constexpr T& get() & { return *this; }
constexpr const T&& get() const&& { return std::move(*this); }
constexpr T&& get() && { return std::move(*this); }
};
The motivation for this class was already covered above, so there is little need for further analysis. In short, CompressedTuple handles the empty-class size problem through the inheritance chain CompressedTupleImpl -> Storage -> T (type deduction → storage strategy → actual type), and member access is resolved at compile time via index_sequence, so it is used much like std::tuple (get<I>()).
CommonFields
// CommonFields hold the fields in raw_hash_set that do not depend
// on template parameters. This allows us to conveniently pass all
// of this state to helper functions as a single argument.
class CommonFields : public CommonFieldsGenerationInfo {
public:
CommonFields() : capacity_(0), size_(0), heap_or_soo_(EmptyGroup()) {}
explicit CommonFields(soo_tag_t) : capacity_(SooCapacity()), size_(0) {}
explicit CommonFields(full_soo_tag_t)
: capacity_(SooCapacity()), size_(size_t{1} << HasInfozShift()) {}
// ...
// The number of slots in the backing array. This is always 2^N-1 for an
// integer N. NOTE: we tried experimenting with compressing the capacity and
// storing it together with size_: (a) using 6 bits to store the corresponding
// power (N in 2^N-1), and (b) storing 2^N as the most significant bit of
// size_ and storing size in the low bits. Both of these experiments were
// regressions, presumably because we need capacity to do find operations.
size_t capacity_;
// The size and also has one bit that stores whether we have infoz.
size_t size_;
// Either the control/slots pointers or the SOO slot.
HeapOrSoo heap_or_soo_;
};
From this we can read off the layout CommonFields = [capacity, size, ?]. The last member, heap_or_soo_, is opaque because it is wrapped in yet another layer of engineering red tape (disclaimer: for a foundational library, that is justified). Spoiler: ? = [control, slot_array], as the comment also mentions.
Speaking of comments, Google recorded an interesting failed optimization here. They tried compressing capacity_ and size_ into a single size_t; after all, for a capacity of the form 2^N - 1, six bits suffice to store N. But in practice it was a regression, presumably because capacity is needed for every find operation.
NOTES:
- CommonFieldsGenerationInfo is used only for debugging; ignored here.
- infoz, also mentioned here, serves debugging/statistics purposes; ignored as well.
- The layout shown here is not the full story; see the later InitializeSlots flow.
- For the relationship between control and slot_array, see the figure above (source: the Abseil documentation).
HeapOrSoo
// Manages the backing array pointers or the SOO slot. When raw_hash_set::is_soo
// is true, the SOO slot is stored in `soo_data`. Otherwise, we use `heap`.
union HeapOrSoo {
HeapOrSoo() = default;
explicit HeapOrSoo(ctrl_t* c) : heap(c) {}
ctrl_t*& control() {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(heap.control);
}
ctrl_t* control() const {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(heap.control);
}
MaybeInitializedPtr& slot_array() {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(heap.slot_array);
}
MaybeInitializedPtr slot_array() const {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(heap.slot_array);
}
void* get_soo_data() {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(soo_data);
}
const void* get_soo_data() const {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(soo_data);
}
HeapPtrs heap;
unsigned char soo_data[sizeof(HeapPtrs)];
};
struct HeapPtrs {
HeapPtrs() = default;
explicit HeapPtrs(ctrl_t* c) : control(c) {}
// The control bytes (and, also, a pointer near to the base of the backing
// array).
//
// This contains `capacity + 1 + NumClonedBytes()` entries, even
// when the table is empty (hence EmptyGroup).
//
// Note that growth_info is stored immediately before this pointer.
// May be uninitialized for SOO tables.
ctrl_t* control;
// The beginning of the slots, located at `SlotOffset()` bytes after
// `control`. May be uninitialized for empty tables.
// Note: we can't use `slots` because Qt defines "slots" as a macro.
MaybeInitializedPtr slot_array;
};
Note that HeapOrSoo is a union: SOO is attempted within the footprint of two pointers (control + slot_array). We'll look at how it is used later.
Initialization
default
From the constructors shown earlier, Base::Base does essentially nothing; only CommonFields needs to be constructed.
class CommonFields /* ... */ {
// raw_hash_set::SooEnabled() is passed in here
template <bool kSooEnabled>
static CommonFields CreateDefault() {
return kSooEnabled ? CommonFields{soo_tag_t{}} : CommonFields{};
}
// ...
};
class raw_hash_set /* ... */ {
constexpr static bool SooEnabled() {
// the first condition is true by default
return PolicyTraits::soo_enabled() &&
// depends only on the types (nothing at runtime)
sizeof(slot_type) <= sizeof(HeapOrSoo) &&
alignof(slot_type) <= alignof(HeapOrSoo);
}
// ...
};
class CommonFields /* ... */ {
explicit CommonFields(soo_tag_t) : capacity_(SooCapacity()), size_(0) {}
// ...
};
// We only allow a maximum of 1 SOO element, which makes the implementation
// much simpler. Complications with multiple SOO elements include:
// - Satisfying the guarantee that erasing one element doesn't invalidate
// iterators to other elements means we would probably need actual SOO
// control bytes.
// - In order to prevent user code from depending on iteration order for small
// tables, we would need to randomize the iteration order somehow.
constexpr size_t SooCapacity() { return 1; }
In short, the default construction uses SOO, and since SOO, to keep complexity down, only ever has capacity 1, we get capacity_ = 1, size_ = 0.
initializer list
absl::flat_hash_map<int, int> example { {998, 244} };
So what happens if we construct through an initializer list like this?
// Instead of accepting std::initializer_list<value_type> as the first
// argument like std::unordered_set<value_type> does, we have two overloads
// that accept std::initializer_list<T> and std::initializer_list<init_type>.
// This is advantageous for performance.
//
// // Turns {"abc", "def"} into std::initializer_list<std::string>, then
// // copies the strings into the set.
// std::unordered_set<std::string> s = {"abc", "def"};
//
// // Turns {"abc", "def"} into std::initializer_list<const char*>, then
// // copies the strings into the set.
// absl::flat_hash_set<std::string> s = {"abc", "def"};
//
// The same trick is used in insert().
//
// The enabler is necessary to prevent this constructor from triggering where
// the copy constructor is meant to be called.
//
// absl::flat_hash_set<int> a, b{a};
//
// RequiresNotInit<T> is a workaround for gcc prior to 7.1.
// not taken by default
template <class T, RequiresNotInit<T> = 0, RequiresInsertable<T> = 0>
raw_hash_set(std::initializer_list<T> init, size_t bucket_count = 0,
const hasher& hash = hasher(), const key_equal& eq = key_equal(),
const allocator_type& alloc = allocator_type())
: raw_hash_set(init.begin(), init.end(), bucket_count, hash, eq, alloc) {}
// taken; init_type = std::pair</*non-const*/ key_type, mapped_type>
raw_hash_set(std::initializer_list<init_type> init, size_t bucket_count = 0,
const hasher& hash = hasher(), const key_equal& eq = key_equal(),
const allocator_type& alloc = allocator_type())
: raw_hash_set(init.begin(), init.end(), bucket_count, hash, eq, alloc) {}
// forwards to the iterator-pair form
template <class InputIter>
raw_hash_set(InputIter first, InputIter last, size_t bucket_count = 0,
const hasher& hash = hasher(), const key_equal& eq = key_equal(),
const allocator_type& alloc = allocator_type())
: raw_hash_set(SelectBucketCountForIterRange(first, last, bucket_count),
hash, eq, alloc) {
insert(first, last);
}
template <class InputIter>
size_t SelectBucketCountForIterRange(InputIter first, InputIter last,
size_t bucket_count) {
if (bucket_count != 0) {
return bucket_count;
}
using InputIterCategory =
typename std::iterator_traits<InputIter>::iterator_category;
if (std::is_base_of<std::random_access_iterator_tag,
InputIterCategory>::value) {
return GrowthToLowerboundCapacity(
static_cast<size_t>(std::distance(first, last)));
}
return 0;
}
// Given `growth`, "unapplies" the load factor to find how large the capacity
// should be to stay within the load factor.
//
// This might not be a valid capacity and `NormalizeCapacity()` should be
// called on this.
inline size_t GrowthToLowerboundCapacity(size_t growth) {
// `growth*8/7`
// kWidth is the number of slots per group
// with SSE2, kWidth == 16, so this branch is not taken
if (Group::kWidth == 8 && growth == 7) {
// x+(x-1)/7 does not work when x==7.
return 8;
}
// allow slightly more than growth: the load factor (7/8) applied in reverse
return growth + static_cast<size_t>((static_cast<int64_t>(growth) - 1) / 7);
}
ABSL_ATTRIBUTE_NOINLINE explicit raw_hash_set(
size_t bucket_count, const hasher& hash = hasher(),
const key_equal& eq = key_equal(),
const allocator_type& alloc = allocator_type())
: settings_(CommonFields::CreateDefault<SooEnabled()>(), hash, eq,
alloc) {
if (bucket_count > (SooEnabled() ? SooCapacity() : 0)) {
resize(NormalizeCapacity(bucket_count));
}
}
In the SOO case, our example yields bucket_count == 1, so construction needs no resize(), only the insert().
NOTE: the std::initializer_list<init_type> overload mentioned in the comments is the deferred construction described in the design notes. The concrete init_type is defined here and is roughly std::pair<Key, Value>; note that it lacks the const Key the standard library requires.
Insertion
insert
template <class InputIt>
void insert(InputIt first, InputIt last) {
for (; first != last; ++first) emplace(*first);
}
// This overload kicks in if we can deduce the key from args. This enables us
// to avoid constructing value_type if an entry with the same key already
// exists.
//
// For example:
//
// flat_hash_map<std::string, std::string> m = { {"abc", "def"} };
// // Creates no std::string copies and makes no heap allocations.
// m.emplace("abc", "xyz");
template <class... Args, typename std::enable_if<
IsDecomposable<Args...>::value, int>::type = 0>
std::pair<iterator, bool> emplace(Args&&... args)
ABSL_ATTRIBUTE_LIFETIME_BOUND {
// EmplaceDecomposable is just a functor type; constructing it has no side effects
return PolicyTraits::apply(EmplaceDecomposable{*this},
std::forward<Args>(args)...);
}
// This overload kicks in if we cannot deduce the key from args. It constructs
// value_type unconditionally and then either moves it into the table or
// destroys.
template <class... Args, typename std::enable_if<
!IsDecomposable<Args...>::value, int>::type = 0>
std::pair<iterator, bool> emplace(Args&&... args)
ABSL_ATTRIBUTE_LIFETIME_BOUND {
alignas(slot_type) unsigned char raw[sizeof(slot_type)];
slot_type* slot = to_slot(&raw);
construct(slot, std::forward<Args>(args)...);
const auto& elem = PolicyTraits::element(slot);
return PolicyTraits::apply(InsertSlot<true>{*this, std::move(*slot)}, elem);
}
This relies on PolicyTraits::apply and EmplaceDecomposable.
Policy
// Policy: a policy defines how to perform different operations on
// the slots of the hashtable (see hash_policy_traits.h for the full interface
// of policy).
template <class Policy, class Hash, class Eq, class Alloc>
class raw_hash_set {
using PolicyTraits = hash_policy_traits<Policy>;
// ...
};
template <class Policy, class Hash, class Eq, class Alloc>
class raw_hash_map : public raw_hash_set<Policy, Hash, Eq, Alloc> {
// ...
};
class ABSL_ATTRIBUTE_OWNER flat_hash_map
: public absl::container_internal::raw_hash_map<
// the concrete Policy goes here
absl::container_internal::FlatHashMapPolicy<K, V>, Hash, Eq,
Allocator> { /*...*/ };
Because Abseil ships several map/set implementations (for example the node variants, introduced to satisfy the standard library's pointer-stability requirements), a Policy is used to factor out the mechanisms that differ. It is also what allows map to inherit from set.
// Defines how slots are initialized/destroyed/moved.
template <class Policy, class = void>
struct hash_policy_traits : common_policy_traits<Policy> {
// Provides generalized access to the key for elements, both for elements in
// the table and for elements that have not yet been inserted (or even
// constructed). We would like an API that allows us to say: `key(args...)`
// but we cannot do that for all cases, so we use this more general API that
// can be used for many things, including the following:
//
// - Given an element in a table, get its key.
// - Given an element initializer, get its key.
// - Given `emplace()` arguments, get the element key.
//
// Implementations of this must adhere to a very strict technical
// specification around aliasing and consuming arguments:
//
// Let `value_type` be the result type of `element()` without ref- and
// cv-qualifiers. The first argument is a functor, the rest are constructor
// arguments for `value_type`. Returns `std::forward<F>(f)(k, xs...)`, where
// `k` is the element key, and `xs...` are the new constructor arguments for
// `value_type`. It's allowed for `k` to alias `xs...`, and for both to alias
// `ts...`. The key won't be touched once `xs...` are used to construct an
// element; `ts...` won't be touched at all, which allows `apply()` to consume
// any rvalues among them.
//
// If `value_type` is constructible from `Ts&&...`, `Policy::apply()` must not
// trigger a hard compile error unless it originates from `f`. In other words,
// `Policy::apply()` must be SFINAE-friendly. If `value_type` is not
// constructible from `Ts&&...`, either SFINAE or a hard compile error is OK.
//
// If `Ts...` is `[cv] value_type[&]` or `[cv] init_type[&]`,
// `Policy::apply()` must work. A compile error is not allowed, SFINAE or not.
template <class F, class... Ts, class P = Policy>
static auto apply(F&& f, Ts&&... ts)
-> decltype(P::apply(std::forward<F>(f), std::forward<Ts>(ts)...)) {
return P::apply(std::forward<F>(f), std::forward<Ts>(ts)...);
}
// ...
};
template <class K, class V>
struct FlatHashMapPolicy {
template <class F, class... Args>
static decltype(absl::container_internal::DecomposePair(
std::declval<F>(), std::declval<Args>()...))
apply(F&& f, Args&&... args) {
return absl::container_internal::DecomposePair(std::forward<F>(f),
std::forward<Args>(args)...);
}
// ...
};
// A helper function for implementing apply() in map policies.
template <class F, class... Args>
auto DecomposePair(F&& f, Args&&... args)
-> decltype(memory_internal::DecomposePairImpl(
std::forward<F>(f), PairArgs(std::forward<Args>(args)...))) {
return memory_internal::DecomposePairImpl(
std::forward<F>(f), PairArgs(std::forward<Args>(args)...));
}
// Given arguments of an std::pair's constructor, PairArgs() returns a pair of
// tuples with references to the passed arguments. The tuples contain
// constructor arguments for the first and the second elements of the pair.
//
// The following two snippets are equivalent.
//
// 1. std::pair<F, S> p(args...);
//
// 2. auto a = PairArgs(args...);
// std::pair<F, S> p(std::piecewise_construct,
// std::move(a.first), std::move(a.second));
inline std::pair<std::tuple<>, std::tuple<>> PairArgs() { return {}; }
template <class F, class S>
std::pair<std::tuple<F&&>, std::tuple<S&&>> PairArgs(F&& f, S&& s) {
return {std::piecewise_construct, std::forward_as_tuple(std::forward<F>(f)),
std::forward_as_tuple(std::forward<S>(s))};
}
template <class F, class S>
std::pair<std::tuple<const F&>, std::tuple<const S&>> PairArgs(
const std::pair<F, S>& p) {
return PairArgs(p.first, p.second);
}
template <class F, class S>
std::pair<std::tuple<F&&>, std::tuple<S&&>> PairArgs(std::pair<F, S>&& p) {
return PairArgs(std::forward<F>(p.first), std::forward<S>(p.second));
}
template <class F, class S>
auto PairArgs(std::piecewise_construct_t, F&& f, S&& s)
-> decltype(std::make_pair(memory_internal::TupleRef(std::forward<F>(f)),
memory_internal::TupleRef(std::forward<S>(s)))) {
return std::make_pair(memory_internal::TupleRef(std::forward<F>(f)),
memory_internal::TupleRef(std::forward<S>(s)));
}
template <class F, class K, class V>
decltype(std::declval<F>()(std::declval<const K&>(), std::piecewise_construct,
std::declval<std::tuple<K>>(), std::declval<V>()))
DecomposePairImpl(F&& f, std::pair<std::tuple<K>, V> p) {
const auto& key = std::get<0>(p.first);
return std::forward<F>(f)(key, std::piecewise_construct, std::move(p.first),
std::move(p.second));
}
The policy machinery reads as verbose, but ignoring the forwarding, apply(f, args...) is simply f(args...). Being a map policy, it additionally has to split the key from the remaining arguments in args... (recall the definition of init_type; the split must be done by hand), hence the pile of pair/tuple busywork. The upshot: the map specialization of apply aims for the effect of f(key, args...).
NOTE: I didn't study this part closely; halfway through I switched to a debugger and reverse-engineered the results.
Decomposable
struct EmplaceDecomposable {
template <class K, class... Args>
std::pair<iterator, bool> operator()(const K& key, Args&&... args) const {
auto res = s.find_or_prepare_insert(key);
if (res.second) {
s.emplace_at(res.first, std::forward<Args>(args)...);
}
return res;
}
raw_hash_set& s;
};
In the current context, the concrete operation passed to apply is EmplaceDecomposable: literally, find the key, then emplace_at.
slot
// Constructs the value in the space pointed by the iterator. This only works
// after an unsuccessful find_or_prepare_insert() and before any other
// modifications happen in the raw_hash_set.
//
// PRECONDITION: iter was returned from find_or_prepare_insert(k), where k is
// the key decomposed from `forward<Args>(args)...`, and the bool returned by
// find_or_prepare_insert(k) was true.
// POSTCONDITION: *m.iterator_at(i) == value_type(forward<Args>(args)...).
template <class... Args>
void emplace_at(iterator iter, Args&&... args) {
construct(iter.slot(), std::forward<Args>(args)...);
assert(PolicyTraits::apply(FindElement{*this}, *iter) == iter &&
"constructed value does not match the lookup key");
}
The lookup itself is fairly involved, so let's skip it for now. Suppose we have found a position (obtained an iterator); what follows is just construct(). That part is simple, but note the subtleties of the iterator and its slot() member function.
// a simplified view of the iterator
class iterator : private /* ... */ {
ctrl_t* control() const { return ctrl_; }
slot_type* slot() const { return slot_; }
// We use EmptyGroup() for default-constructed iterators so that they can
// be distinguished from end iterators, which have nullptr ctrl_.
ctrl_t* ctrl_ = EmptyGroup();
// To avoid uninitialized member warnings, put slot_ in an anonymous union.
// The member is not initialized on singleton and end iterators.
union {
slot_type* slot_;
};
};
The iterator uses a union trick to provide a legitimately uninitialized slot_ member; note its type alias slot_type*.
using slot_type = map_slot_type<K, V>;
// The internal storage type for key-value containers like flat_hash_map.
//
// It is convenient for the value_type of a flat_hash_map<K, V> to be
// pair<const K, V>; the "const K" prevents accidental modification of the key
// when dealing with the reference returned from find() and similar methods.
// However, this creates other problems; we want to be able to emplace(K, V)
// efficiently with move operations, and similarly be able to move a
// pair<K, V> in insert().
//
// The solution is this union, which aliases the const and non-const versions
// of the pair. This also allows flat_hash_map<const K, V> to work, even though
// that has the same efficiency issues with move in emplace() and insert() -
// but people do it anyway.
//
// If kMutableKeys is false, only the value member can be accessed.
//
// If kMutableKeys is true, key can be accessed through all slots while value
// and mutable_value must be accessed only via INITIALIZED slots. Slots are
// created and destroyed via mutable_value so that the key can be moved later.
//
// Accessing one of the union fields while the other is active is safe as
// long as they are layout-compatible, which is guaranteed by the definition of
// kMutableKeys. For C++11, the relevant section of the standard is
// https://timsong-cpp.github.io/cppwp/n3337/class.mem#19 (9.2.19)
template <class K, class V>
union map_slot_type {
map_slot_type() {}
~map_slot_type() = delete;
using value_type = std::pair<const K, V>;
using mutable_value_type =
std::pair<absl::remove_const_t<K>, absl::remove_const_t<V>>;
value_type value;
mutable_value_type mutable_value;
absl::remove_const_t<K> key;
};
slot_type is the optimization mentioned in the design notes: a mutable Key internally, presented as const Key externally. The comment asserts that accessing a union field other than the active one is safe here because the members are layout-compatible, so strict aliasing is not violated. I've long since forgotten those rules, so let's take its word for it.
construct
template <typename... Args>
inline void construct(slot_type* slot, Args&&... args) {
PolicyTraits::construct(&alloc_ref(), slot, std::forward<Args>(args)...);
}
// PRECONDITION: `slot` is UNINITIALIZED
// POSTCONDITION: `slot` is INITIALIZED
template <class Alloc, class... Args>
static void construct(Alloc* alloc, slot_type* slot, Args&&... args) {
Policy::construct(alloc, slot, std::forward<Args>(args)...);
}
template <class Allocator, class... Args>
static void construct(Allocator* alloc, slot_type* slot, Args&&... args) {
slot_policy::construct(alloc, slot, std::forward<Args>(args)...);
}
// If pair<const K, V> and pair<K, V> are layout-compatible, we can accept one
// or the other via slot_type. We are also free to access the key via
// slot_type::key in this case.
using kMutableKeys = memory_internal::IsLayoutCompatible<K, V>;
template <class Allocator, class... Args>
static void construct(Allocator* alloc, slot_type* slot, Args&&... args) {
emplace(slot);
if (kMutableKeys::value) {
// absl here is just a using declaration of std::... (the name survives for historical reasons)
absl::allocator_traits<Allocator>::construct(*alloc, &slot->mutable_value,
std::forward<Args>(args)...);
} else {
absl::allocator_traits<Allocator>::construct(*alloc, &slot->value,
std::forward<Args>(args)...);
}
}
// to cope with strict aliasing
static void emplace(slot_type* slot) {
// The construction of union doesn't do anything at runtime but it allows us
// to access its members without violating aliasing rules.
new (slot) slot_type;
}
In short, once the insert() flow has located iterator.slot(), the subsequent construct() is just a round of placement new.
prepare insert
// Attempts to find `key` in the table; if it isn't found, returns an iterator
// where the value can be inserted into, with the control byte already set to
// `key`'s H2. Returns a bool indicating whether an insertion can take place.
template <class K>
std::pair<iterator, bool> find_or_prepare_insert(const K& key) {
AssertOnFind(key);
// taken here
// is_soo() requires the current capacity() to be within the SOO range
if (is_soo()) return find_or_prepare_insert_soo(key);
// explained later
return find_or_prepare_insert_non_soo(key);
}
Back to the lookup we deferred: before the actual insertion we must look up the key to produce the corresponding iterator.
Two cases are distinguished: SOO and non-SOO. Note that this lookup is tailored for insertion; the regular lookup interface is find(). The next two sections cover this special lookup; to avoid ambiguity, I'll call it the prepare-insert operation.
Prepare insert (SOO)
template <class K>
std::pair<iterator, bool> find_or_prepare_insert_soo(const K& key) {
// first call on an empty container
if (empty()) {
const HashtablezInfoHandle infoz = try_sample_soo();
// always false
if (infoz.IsSampled()) {
resize_with_soo_infoz(infoz);
} else {
// marks the SOO slot as filled, size() = 1; empty() no longer holds afterwards
// note that the size_ member and size() differ; only the latter is the public interface
common().set_full_soo();
// 返回这里
return {soo_iterator(), true};
}
// second call, after one insertion: size() == 1
} else if (PolicyTraits::apply(EqualElement<K>{key, eq_ref()},
PolicyTraits::element(soo_slot()))) {
// a matching (already existing) key
return {soo_iterator(), false};
// no SOO match
} else {
// this is the prepare-insert phase: with no room left we must grow (no longer SOO afterwards)
// NextCapacity(n) grows as n = n * 2 + 1, giving 3
resize(NextCapacity(SooCapacity()));
}
// a context-specific placement operation
// the resize() above guarantees index == 1 is already occupied
// so only indices 0 and 2 are candidates
const size_t index =
PrepareInsertAfterSoo(hash_ref()(key), sizeof(slot_type), common());
return {iterator_at(index), true};
}
iterator soo_iterator() {
// generation_ptr is unused, i.e. nullptr
return {SooControl(), soo_slot(), common().generation_ptr()};
}
CommonFields& common() { return settings_.template get<0>(); }
I still can't figure out why Google is so attached to this pitifully small SOO capacity; it seems pointless to me.
The first call is the simple SOO case: the table is necessarily empty(), so we construct directly. soo_iterator just pulls the pointers out of common and returns them; from there we know the slot's address and can insert into it.
Likewise, if a second insertion matches the SOO slot, construction is also direct: since an SOO map's capacity never exceeds 1, pair.second only needs to signal that the key already exists. If that insertion does not match, the table must grow.
Growing the table
// Resizes table to the new capacity and move all elements to the new
// positions accordingly.
//
// Note that for better performance instead of
// find_first_non_full(common(), hash),
// HashSetResizeHelper::FindFirstNonFullAfterResize(
// common(), old_capacity, hash)
// can be called right after `resize`.
void resize(size_t new_capacity) {
raw_hash_set::resize_impl(common(), new_capacity, HashtablezInfoHandle{});
}
resize_impl
// Resizes set to the new capacity.
// It is a static function in order to use its pointer in GetPolicyFunctions.
ABSL_ATTRIBUTE_NOINLINE static void resize_impl(
CommonFields& common, size_t new_capacity,
HashtablezInfoHandle forced_infoz) {
raw_hash_set* set = reinterpret_cast<raw_hash_set*>(&common);
assert(IsValidCapacity(new_capacity));
assert(!set->fits_in_soo(new_capacity));
// SOO is enabled and capacity() is within the SOO range
const bool was_soo = set->is_soo();
// the SOO slot has held an element
const bool had_soo_slot = was_soo && !set->empty();
// compute the 7-bit H2 value; see the official design notes
// hash_of(slot)
// => PolicyTraits::apply(HashElement{hash_ref()}, PolicyTraits::element(slot))
// => HashElement{hash_ref()}(key, static_cast<std::pair<const K, V>&>(slot->value))
// => hash_ref()(key)
//
// H2(size_t hash): uint8_t
// => hash & 0x7F
// if nothing was stored before, the empty kEmpty (0b10000000) is returned
const ctrl_t soo_slot_h2 =
had_soo_slot ? static_cast<ctrl_t>(H2(set->hash_of(set->soo_slot())))
: ctrl_t::kEmpty;
HashSetResizeHelper resize_helper(common, was_soo, had_soo_slot,
forced_infoz);
// Initialize HashSetResizeHelper::old_heap_or_soo_. We can't do this in
// HashSetResizeHelper constructor because it can't transfer slots when
// transfer_uses_memcpy is false.
if (PolicyTraits::transfer_uses_memcpy() || !had_soo_slot) {
// initialize old_heap_or_soo with a verbatim copy, so common can grow and then migrate
resize_helper.old_heap_or_soo() = common.heap_or_soo();
} else {
// to_slot() converts uchar[] to map_slot_type*
// move the slot; a destroy step is required
set->transfer(set->to_slot(resize_helper.old_soo_data()),
set->soo_slot());
}
// now prepare to grow
common.set_capacity(new_capacity);
// Note that `InitializeSlots` does different number initialization steps
// depending on the values of `transfer_uses_memcpy` and capacities.
// Refer to the comment in `InitializeSlots` for more details.
const bool grow_single_group =
// performs the allocation and construction
// no destruction/deallocation happens here (see the end of this function), since common may still be in SOO form
resize_helper.InitializeSlots<CharAlloc, sizeof(slot_type),
PolicyTraits::transfer_uses_memcpy(),
SooEnabled(), alignof(slot_type)>(
common, CharAlloc(set->alloc_ref()), soo_slot_h2, sizeof(key_type),
sizeof(value_type));
// In the SooEnabled() case, capacity is never 0 so we don't check.
if (!SooEnabled() && resize_helper.old_capacity() == 0) {
// InitializeSlots did all the work including infoz().RecordRehash().
return;
}
assert(resize_helper.old_capacity() > 0);
// Nothing more to do in this case.
if (was_soo && !had_soo_slot) return;
slot_type* new_slots = set->slot_array();
if (grow_single_group) {
if (PolicyTraits::transfer_uses_memcpy()) {
// InitializeSlots did all the work.
return;
}
if (was_soo) {
// unlike the earlier transfer, this one goes from old_soo_data to new_slots
// only the SOO slot needs handling
set->transfer(new_slots + resize_helper.SooSlotIndex(),
to_slot(resize_helper.old_soo_data()));
return;
} else {
// We want GrowSizeIntoSingleGroup to be called here in order to make
// InitializeSlots not depend on PolicyTraits.
resize_helper.GrowSizeIntoSingleGroup<PolicyTraits>(common,
set->alloc_ref());
}
// !grow_single_group
} else {
// InitializeSlots prepares control bytes to correspond to empty table.
const auto insert_slot = [&](slot_type* slot) {
size_t hash = PolicyTraits::apply(HashElement{set->hash_ref()},
PolicyTraits::element(slot));
// this function and the target variable are covered later
// as the name suggests, it probes and returns position information
auto target = find_first_non_full(common, hash);
// as before: once the slot position (offset) is found, update the corresponding ctrl
SetCtrl(common, target.offset, H2(hash), sizeof(slot_type));
// and migrate the corresponding slot element
set->transfer(new_slots + target.offset, slot);
return target.probe_length;
};
if (was_soo) {
insert_slot(to_slot(resize_helper.old_soo_data()));
return;
} else {
auto* old_slots = static_cast<slot_type*>(resize_helper.old_slots());
size_t total_probe_length = 0;
for (size_t i = 0; i != resize_helper.old_capacity(); ++i) {
if (IsFull(resize_helper.old_ctrl()[i])) {
total_probe_length += insert_slot(old_slots + i);
}
}
common.infoz().RecordRehash(total_probe_length);
}
}
// the final deallocation work
resize_helper.DeallocateOld<alignof(slot_type)>(CharAlloc(set->alloc_ref()),
sizeof(slot_type));
}
The function above is dense with details; the inline comments give an overview, and the subsections below add more.
transfer
// whether the transfer may use memcpy depends on whether Policy::transfer() returns std::true_type
// which traces through:
// FlatHashMapPolicy::slot_policy::transfer()
// => container_internal::map_slot_policy<K, V>::transfer()
// => absl::is_trivially_relocatable<std::pair<const K, V>>
// in absl's implementation this falls back to std::is_trivially_copyable
if (PolicyTraits::transfer_uses_memcpy() || !had_soo_slot) {
// initialization of old_heap_or_soo
resize_helper.old_heap_or_soo() = common.heap_or_soo();
} else {
// https://github.com/abseil/abseil-cpp/issues/755#issuecomment-670136413
// transfer 使用 std::launder 的解释
// to_slot 从 uchar[] 转为 map_slot_type*
set->transfer(set->to_slot(resize_helper.old_soo_data()),
set->soo_slot());
}
在前面的 resize_impl() 代码中,转移操作区分了两个版本,条件 transfer_uses_memcpy() 实际取决于 absl::is_trivially_relocatable<std::pair<const K, V>>,也就是移动和析构能否使用 memcpy() 替换。
| Trivial relocation | Relocation with move constructor |
|-----------------------------------------------|------------------------------------------------|
| // allocate destination memory | // allocate destination memory |
| dest = ::operator new(sizeof(Type)); | dest = ::operator new(sizeof(Type)); |
| | |
| // copy bytes | // move construct |
| memcpy(dest, source, sizeof(Type)); | ::new(dest) Type(std::move(*source)); |
| | |
| // deallocate source | // destruct source |
| ::operator delete(source); | source->~Type(); |
| | |
| | // deallocate source |
| | ::operator delete(source); |
这是一种优化手段,具体可以看下提案 P1144,更加具体的动机示例可以看这份演讲稿或者上表总结。目前使用这个条件来复制 common.heap_or_soo(),也就是 backing array。
inline void transfer(slot_type* to, slot_type* from) {
// 做了一些统计处理,不考虑的话就是直接调用 lambda
common().RunWithReentrancyGuard(
[&] { PolicyTraits::transfer(&alloc_ref(), to, from); });
}
template <class Allocator>
static auto transfer(Allocator* alloc, slot_type* new_slot,
slot_type* old_slot) {
return slot_policy::transfer(alloc, new_slot, old_slot);
}
template <class Allocator>
static auto transfer(Allocator* alloc, slot_type* new_slot,
slot_type* old_slot) {
// 前面提过了,transfer 取决于 relocatable
auto is_relocatable =
typename absl::is_trivially_relocatable<value_type>::type();
// 为了处理 union 激活,可理解为空实现
emplace(new_slot);
#if defined(__cpp_lib_launder) && __cpp_lib_launder >= 201606
if (is_relocatable) {
// value 使用了类型双关,为了避免问题直接拿 std::launder() 洗掉优化
std::memcpy(static_cast<void*>(std::launder(&new_slot->value)),
static_cast<const void*>(&old_slot->value),
sizeof(value_type));
return is_relocatable;
}
#endif
// 如果可以使用 mutable key
if (kMutableKeys::value) {
absl::allocator_traits<Allocator>::construct(
*alloc, &new_slot->mutable_value, std::move(old_slot->mutable_value));
} else {
absl::allocator_traits<Allocator>::construct(*alloc, &new_slot->value,
std::move(old_slot->value));
}
destroy(alloc, old_slot);
return is_relocatable;
}
如果具备 trivially relocatable 条件,只需一次 std::memcpy()。如果不具备,那就需要显式地迁移 slot,也就是需要分配器策略去处理 construct() 和 destroy()。
InitializeSlots
// Allocates a backing array for the hashtable.
// Reads `capacity` and updates all other fields based on the result of
// the allocation.
//
// It also may do the following actions:
// 1. initialize control bytes
// 2. initialize slots
// 3. deallocate old slots.
//
// We are bundling a lot of functionality
// in one ABSL_ATTRIBUTE_NOINLINE function in order to minimize binary code
// duplication in raw_hash_set<>::resize.
//
// `c.capacity()` must be nonzero.
// POSTCONDITIONS:
// 1. CommonFields is initialized.
//
// if IsGrowingIntoSingleGroupApplicable && TransferUsesMemcpy
// Both control bytes and slots are fully initialized.
// old_slots are deallocated.
// infoz.RecordRehash is called.
//
// if IsGrowingIntoSingleGroupApplicable && !TransferUsesMemcpy
// Control bytes are fully initialized.
// infoz.RecordRehash is called.
// GrowSizeIntoSingleGroup must be called to finish slots initialization.
//
// if !IsGrowingIntoSingleGroupApplicable
// Control bytes are initialized to empty table via ResetCtrl.
// raw_hash_set<>::resize must insert elements regularly.
// infoz.RecordRehash is called if old_capacity == 0.
//
// Returns IsGrowingIntoSingleGroupApplicable result to avoid recomputation.
// 这里注释已经说明了大概的思路,就是分配和构造新的 ctrl + slots,然后销毁旧副本
// 也提到一些琐碎的实践,比如标记 noinline 减小二进制体积
// 其他的复杂度就是区分不同的上下文进行微操优化
template <typename Alloc, size_t SizeOfSlot, bool TransferUsesMemcpy,
bool SooEnabled, size_t AlignOfSlot>
ABSL_ATTRIBUTE_NOINLINE bool InitializeSlots(CommonFields& c, Alloc alloc,
ctrl_t soo_slot_h2,
size_t key_size,
size_t value_size) {
assert(c.capacity());
// 空实现
HashtablezInfoHandle infoz =
ShouldSampleHashtablezInfo<Alloc>()
? SampleHashtablezInfo<SooEnabled>(SizeOfSlot, key_size, value_size,
old_capacity_, was_soo_,
forced_infoz_, c)
: HashtablezInfoHandle{};
const bool has_infoz = infoz.IsSampled();
RawHashSetLayout layout(c.capacity(), AlignOfSlot, has_infoz);
// 接近于 alloc::allocate(),只是为了处理对齐
char* mem = static_cast<char*>(Allocate<BackingArrayAlignment(AlignOfSlot)>(
&alloc, layout.alloc_size(SizeOfSlot)));
const GenerationType old_generation = c.generation();
c.set_generation_ptr(
reinterpret_cast<GenerationType*>(mem + layout.generation_offset()));
c.set_generation(NextGeneration(old_generation));
// 可以看出 control 和 slot 共享同一块分配的 mem 内存地址
c.set_control(reinterpret_cast<ctrl_t*>(mem + layout.control_offset()));
// 从下面 RawHashSetLayout 可以推导出
// mem = [GrowInfo, ctrl, [padding,] slots]
// 其中:
// * ctrl 所占字节数比 capacity 数值要大一点(capacity + Group::kWidth)
// 因为这是要包含 cloned bytes
// * GrowInfo 需要占用一个 size_t 大小,尚未初始化
c.set_slots(mem + layout.slot_offset());
// 处理 GrowInfo,记录数值 capacity()*7/8 - size()
// 这里 7/8 是 Swiss Table 的最大负载因子
// NOTE: 还是有点小心思,最高位是特殊删除标记
ResetGrowthLeft(c);
const bool grow_single_group =
// 两个条件:
// 1. capacity 不超过一个 group(layout.capacity <= Group::kWidth;)
// 2. old capacity 小于 capacity,注释写道:为了更快拷贝 8 字节
// 推测条件 2 的意思是 ctrl 处理 cloned bytes 不需要取模了
// 以及扩容是至少倍增的,它需要一半的空间去 shuffle 周转
IsGrowingIntoSingleGroupApplicable(old_capacity_, layout.capacity());
// 这里面微操优化很多
//
// 如果之前是 SOO,就走第一条分支
// 再次插入走第二条分支,
// 超出 single group 单群小表限制后再到第三条分支
if (SooEnabled && was_soo_ && grow_single_group) {
// 见下面辅助实现的注释
InitControlBytesAfterSoo(c.control(), soo_slot_h2, layout.capacity());
// 这个条件见 resize_impl 的注释
if (TransferUsesMemcpy && had_soo_slot_) {
// 见下面辅助实现的注释
TransferSlotAfterSoo(c, SizeOfSlot);
}
// SooEnabled implies that old_capacity_ != 0.
// 运行时非 SOO 的情况(!was_soo_)
//
// 上面那条官方注释看着不太对劲,又有点道理
// 虽然 SooEnabled 只与编译期 slot_type 类型别名有关
// 但是确实能至少空出大小为 1 的 old_capacity_(HeapOrSoo)
} else if ((SooEnabled || old_capacity_ != 0) && grow_single_group) {
if (TransferUsesMemcpy) {
// 处理 transferable objects
// 这一行相当于同时做了:
// * GrowIntoSingleGroupShuffleControlBytes(c.control(), c.capacity());
// * GrowIntoSingleGroupShuffleTransferableSlots(c.slot_array(), slot_size);
// 其实我没太看懂,见 else 的注释
// 也可以见 prepare insert non-SOO 中的 case 1,使得查找甚至不需要判断是否为空
GrowSizeIntoSingleGroupTransferable(c, SizeOfSlot);
DeallocateOld<AlignOfSlot>(alloc, SizeOfSlot);
} else {
// 看了一眼注释很复杂,是近期做的改动,把 control bytes 打散了
// 不知道是出于什么目的,commit 没写
// https://github.com/abseil/abseil-cpp/commit/0ef87fa0c1d9cff4c231a2a3af854f0b5edfa92b
// 字面理解是「SIMD 的存在使得一个 group 内的元素顺序并不重要」
// 确实有道理,但是动机是为了啥?
//
// UPDATE: 噢找到了,.h/L1680 说道:为了不让你猜到数据是啥,提高安全性
GrowIntoSingleGroupShuffleControlBytes(c.control(), layout.capacity());
}
// 其它条件,比如 !grow_single_group
// 初始化 ctrl 为 kEmpty 和 kSentinel
} else {
ResetCtrl(c, SizeOfSlot);
}
c.set_has_infoz(has_infoz);
if (has_infoz) {
infoz.RecordStorageChanged(c.size(), layout.capacity());
if ((SooEnabled && was_soo_) || grow_single_group || old_capacity_ == 0) {
infoz.RecordRehash(0);
}
c.set_infoz(infoz);
}
return grow_single_group;
}
///////////////////////////////////////////////////////////////// 下面的都是辅助实现
// Helper class for computing offsets and allocation size of hash set fields.
// 计算 backing array 各个成员的起始偏移
class RawHashSetLayout {
public:
explicit RawHashSetLayout(size_t capacity, size_t slot_align, bool has_infoz)
: capacity_(capacity),
// ControlOffset() 返回 sizeof(HashtablezInfoHandle)? + sizeof(GrowthInfo)
// 这里只考虑后者 sizeof(GrowthInfo)
control_offset_(ControlOffset(has_infoz)),
// NumControlBytes(capacity) 返回 capacity + 1 + NumClonedBytes()
// NumClonedBytes() 等于 Group::kWidth - 1
generation_offset_(control_offset_ + NumControlBytes(capacity)),
slot_offset_(
// NumGenerationBytes() 返回 0
// 可能在前面需要留下 padding
(generation_offset_ + NumGenerationBytes() + slot_align - 1) &
(~slot_align + 1)) {
assert(IsValidCapacity(capacity));
}
// Returns the capacity of a table.
size_t capacity() const { return capacity_; }
// Returns precomputed offset from the start of the backing allocation of
// control.
size_t control_offset() const { return control_offset_; }
// Given the capacity of a table, computes the offset (from the start of the
// backing allocation) of the generation counter (if it exists).
size_t generation_offset() const { return generation_offset_; }
// Given the capacity of a table, computes the offset (from the start of the
// backing allocation) at which the slots begin.
size_t slot_offset() const { return slot_offset_; }
// Given the capacity of a table, computes the total size of the backing
// array.
size_t alloc_size(size_t slot_size) const {
return slot_offset_ + capacity_ * slot_size;
}
private:
size_t capacity_;
size_t control_offset_;
size_t generation_offset_;
size_t slot_offset_;
};
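RawHashSetLayout 里 slot_offset_ 的对齐写法值得单独验证一下。注意 (~slot_align + 1) 就是 -slot_align,在 align 为 2 的幂时等价于 ~(align - 1),于是整个表达式就是常见的向上对齐:

```cpp
#include <cstddef>

// 示意:slot_offset_ 使用的向上取整对齐技巧
inline size_t AlignUp(size_t n, size_t align) {
  // (~align + 1) == -align == ~(align - 1),要求 align 为 2 的幂
  return (n + align - 1) & (~align + 1);
}
```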
// Returns the number of "cloned control bytes".
//
// This is the number of control bytes that are present both at the beginning
// of the control byte array and at the end, such that we can create a
// `Group::kWidth`-width probe window starting from any control byte.
constexpr size_t NumClonedBytes() { return Group::kWidth - 1; }
// Allocates at least n bytes aligned to the specified alignment.
// Alignment must be a power of 2. It must be positive.
//
// Note that many allocators don't honor alignment requirements above certain
// threshold (usually either alignof(std::max_align_t) or alignof(void*)).
// Allocate() doesn't apply alignment corrections. If the underlying allocator
// returns insufficiently alignment pointer, that's what you are going to get.
template <size_t Alignment, class Alloc>
void* Allocate(Alloc* alloc, size_t n) {
static_assert(Alignment > 0, "");
assert(n && "n must be positive");
using M = AlignedType<Alignment>;
using A = typename absl::allocator_traits<Alloc>::template rebind_alloc<M>;
using AT = typename absl::allocator_traits<Alloc>::template rebind_traits<M>;
// On macOS, "mem_alloc" is a #define with one argument defined in
// rpc/types.h, so we can't name the variable "mem_alloc" and initialize it
// with the "foo(bar)" syntax.
A my_mem_alloc(*alloc);
void* p = AT::allocate(my_mem_alloc, (n + sizeof(M) - 1) / sizeof(M));
assert(reinterpret_cast<uintptr_t>(p) % Alignment == 0 &&
"allocator does not respect alignment");
return p;
}
inline void ResetGrowthLeft(CommonFields& common) {
// 初始化 grow_info 唯一的成员,其数值为传入的参数
common.growth_info().InitGrowthLeftNoDeleted(
CapacityToGrowth(common.capacity()) - common.size());
}
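GrowthInfo 的「最高位删除标记」可以按注释语义还原成一个 sketch(非 absl 原实现,MarkHasDeleted 等名字是假设的):把余量和标记打包进一个 size_t,判断「无删除且有余量」只需一次有符号比较。

```cpp
#include <cstddef>
#include <type_traits>

// 示意:一个 size_t 同时编码剩余可插入数与最高位的「存在 kDeleted」标记
class GrowthInfoSketch {
  static constexpr size_t kDeletedBit = ~(~size_t{} >> 1);  // 最高位
  size_t rep_ = 0;

 public:
  void InitGrowthLeftNoDeleted(size_t growth_left) { rep_ = growth_left; }
  void OverwriteEmptyAsFull() { --rep_; }         // 占用一个 empty,余量减一
  void MarkHasDeleted() { rep_ |= kDeletedBit; }  // 假设的接口:erase 时打标记
  // 无删除标记且余量大于 0:当作有符号数判断 > 0,一次比较搞定
  bool HasNoDeletedAndGrowthLeft() const {
    return static_cast<std::make_signed_t<size_t>>(rep_) > 0;
  }
  size_t growth_left() const { return rep_ & ~kDeletedBit; }
};
```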
// General notes on capacity/growth methods below:
// - We use 7/8th as maximum load factor. For 16-wide groups, that gives an
// average of two empty slots per group.
// - For (capacity+1) >= Group::kWidth, growth is 7/8*capacity.
// - For (capacity+1) < Group::kWidth, growth == capacity. In this case, we
// never need to probe (the whole table fits in one group) so we don't need a
// load factor less than 1.
// Given `capacity`, applies the load factor; i.e., it returns the maximum
// number of values we should put into the table before a resizing rehash.
// 返回 7/8 的 capacity。这里 7/8 是设计好的负载因子
inline size_t CapacityToGrowth(size_t capacity) {
assert(IsValidCapacity(capacity));
// `capacity*7/8`
if (Group::kWidth == 8 && capacity == 7) {
// x-x/8 does not work when x==7.
return 6;
}
return capacity - capacity / 8;
}
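用几个具体数值验证这个负载因子公式(假设 Group::kWidth == 16,不触发 capacity == 7 的特例;capacity 恒为 2^m - 1):

```cpp
#include <cstddef>

// 示意:capacity - capacity/8 等价于 ceil(capacity * 7/8)
inline size_t CapacityToGrowthSketch(size_t capacity) {
  return capacity - capacity / 8;
}
```

例如 capacity = 15 时可容纳 14 个元素(7/8 * 15 = 13.125,向上取整)。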
///////////////////////////////////////////////////////////////// 不同的上下文
// If the table was SOO, initializes new control bytes. `h2` is the control
// byte corresponding to the full slot. Must be called only if
// IsGrowingIntoSingleGroupApplicable returned true.
// Requires: `had_soo_slot_ || h2 == ctrl_t::kEmpty`.
void HashSetResizeHelper::InitControlBytesAfterSoo(ctrl_t* new_ctrl, ctrl_t h2,
size_t new_capacity) {
assert(is_single_group(new_capacity));
// ctrl 全部初始化为 kEmpty
std::memset(new_ctrl, static_cast<int8_t>(ctrl_t::kEmpty),
NumControlBytes(new_capacity));
assert(HashSetResizeHelper::SooSlotIndex() == 1);
// This allows us to avoid branching on had_soo_slot_.
assert(had_soo_slot_ || h2 == ctrl_t::kEmpty);
// 固定 index=1 设为原来的 H2
// 后面那个是 cloned byte
new_ctrl[1] = new_ctrl[new_capacity + 2] = h2;
// 哨兵字节
new_ctrl[new_capacity] = ctrl_t::kSentinel;
}
void HashSetResizeHelper::TransferSlotAfterSoo(CommonFields& c,
size_t slot_size) {
assert(was_soo_);
assert(had_soo_slot_);
assert(is_single_group(c.capacity()));
// 对应下标的 slot 初始化,注意 SooSlotIndex() 固定为 1
std::memcpy(SlotAddress(c.slot_array(), SooSlotIndex(), slot_size),
old_soo_data(), slot_size);
// 配合 ASAN 算法,对 non-full slot 投毒,略
PoisonSingleGroupEmptySlots(c, slot_size);
}
初始化会通过 allocator 分配新的 backing array,具体内存布局为 [GrowInfo, ctrl, [padding,] slots],这四个「成员」共用一次 allocate 就能完成分配。其中:
- ctrl 所占字节数比 capacity 数值要大一点(capacity + Group::kWidth),因为要包含 cloned bytes。通常 SSE2 版本的 kWidth 设为 16。
- GrowInfo 需要占用一个 size_t 大小,简单来说就是提前算好下次扩容前还能容纳几个元素,这个值已经考虑了负载因子。注意最高位是特殊的删除标记(topmost bit 标记是否存在 deleted slots)。
- slots 所占字节数为 capacity * sizeof(slot_type),但是考虑对齐的话可能还会在前面多出 padding 区域。
分配后会进行「构造」:ctrl[capacity] 设为 kSentinel,ctrl[1] 和对应的 cloned byte 都设为迁移元素的 H2 哈希值,其他区域均为 kEmpty。注意这里下标 1 是固定的,因此原有的(唯一)元素会迁移到 slot 1。
NOTE: H2 是完整哈希值的低 7 位,面向 ctrl。剩余的高 57 位则是 H1,后面会看到。
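H1/H2 的拆分本身很简单,可以写成一个不含加盐的 sketch:H2 取低 7 位写进 ctrl 字节,H1 取高 57 位用于选择探测起点,两者拼回去就是原哈希值。

```cpp
#include <cstdint>

// H1/H2 拆分示意(不含 per-table salt)
inline uint64_t H1NoSalt(uint64_t hash) { return hash >> 7; }
inline uint8_t  H2Bits(uint64_t hash)   { return hash & 0x7F; }
```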
prepare insert after SOO
size_t PrepareInsertAfterSoo(size_t hash, size_t slot_size,
CommonFields& common) {
assert(common.capacity() == NextCapacity(SooCapacity()));
// After resize from capacity 1 to 3, we always have exactly the slot with
// index 1 occupied, so we need to insert either at index 0 or index 2.
assert(HashSetResizeHelper::SooSlotIndex() == 1);
// 增加 size() 一个单位
PrepareInsertCommon(common);
// H1(hash, ctrl)
// => (hash >> 7) ^ (reinterpret_cast<uintptr_t>(ctrl) >> 12)
// 一些小优化,H1 还会对 ctrl 加盐(使得扩容时改动哈希值的分布),2^12 刚好是内存页大小
// 注意这里是 H1() & 2,所以下标要么是 0,要么是 2
const size_t offset = H1(hash, common.control()) & 2;
// 减少 growth_info 一个单位
common.growth_info().OverwriteEmptyAsFull();
// 等价于 ctrl[offset] = ctrl[cloned] = H2(hash)
SetCtrlInSingleGroupTable(common, offset, H2(hash), slot_size);
common.infoz().RecordInsert(hash, /*distance_from_desired=*/0);
// 返回下标,给迭代器用
return offset;
}
///////////////////////////////////////////////////////////////// 一些辅助函数
// 整个函数等价于 size() += 1
inline void PrepareInsertCommon(CommonFields& common) {
common.increment_size();
common.maybe_increment_generation_on_insert();
}
// Extracts the H1 portion of a hash: 57 bits mixed with a per-table salt.
inline size_t H1(size_t hash, const ctrl_t* ctrl) {
return (hash >> 7) ^ PerTableSalt(ctrl);
}
// Returns a per-table, hash salt, which changes on resize. This gets mixed into
// H1 to randomize iteration order per-table.
//
// The seed consists of the ctrl_ pointer, which adds enough entropy to ensure
// non-determinism of iteration order in most cases.
inline size_t PerTableSalt(const ctrl_t* ctrl) {
// The low bits of the pointer have little or no entropy because of
// alignment. We shift the pointer to try to use higher entropy bits. A
// good number seems to be 12 bits, because that aligns with page size.
return reinterpret_cast<uintptr_t>(ctrl) >> 12;
}
// Like SetCtrl, but in a single group table, we can save some operations when
// setting the cloned control byte.
inline void SetCtrlInSingleGroupTable(const CommonFields& c, size_t i, ctrl_t h,
size_t slot_size) {
assert(is_single_group(c.capacity()));
DoSanitizeOnSetCtrl(c, i, h, slot_size);
ctrl_t* ctrl = c.control();
ctrl[i] = h;
ctrl[i + c.capacity() + 1] = h;
}
扩容执行后,还需要为迭代器定位到一个具体的下标。这里的一点小心思是针对特定上下文的优化:因为明确扩容后的下标 1 是已被占位的,所以只需通过 H1 从 0 和 2 当中定位即可。
另一点小心思是 H1(hash = H(key)) = (hash >> 7) ^ salt(ctrl)。也就是说 H1 还会对 ctrl 加盐,使得扩容时(因 ctrl 指针变动而)重平衡哈希值的分布。盐值取 uintptr_t(ctrl) >> 12,而 \(2^{12}\) 正是一个内存页的大小,尽量避免分配器低位对齐造成熵不足的影响。
预插入(non-SOO)
template <class K>
std::pair<iterator, bool> find_or_prepare_insert_non_soo(const K& key) {
assert(!is_soo());
// 因为下面要进行计算哈希,这里要异步预取堆上 ctrl 到缓存行
prefetch_heap_block();
auto hash = hash_ref()(key);
// 生成一个 probe seq,用于后续线性探测
auto seq = probe(common(), hash);
const ctrl_t* ctrl = control();
while (true) {
// 使用 SIMD 加速载入部分 ctrl 到 Group
Group g{ctrl + seq.offset()};
// 如果批量命中,再往块中查找
for (uint32_t i : g.Match(H2(hash))) {
if (ABSL_PREDICT_TRUE(PolicyTraits::apply(
EqualElement<K>{key, eq_ref()},
PolicyTraits::element(slot_array() + seq.offset(i)))))
// 找到就可以收工
return {iterator_at(seq.offset(i)), false};
}
// 如果这里有空位
auto mask_empty = g.MaskEmpty();
if (ABSL_PREDICT_TRUE(mask_empty)) {
size_t target = seq.offset(
// 等价于 uint32_t 类型的 mask_empty.LowestBitSet(),其他参数为调试用途
GetInsertionOffset(mask_empty, capacity(), hash, control()));
return {iterator_at(PrepareInsertNonSoo(common(), hash,
// FindInfo 等价于 {offset, probe_length}
FindInfo{target, seq.index()},
// 提供一堆 policy 函数接口
GetPolicyFunctions())),
true};
}
seq.next();
// 肯定有容身之处!
assert(seq.index() <= capacity() && "full table!");
}
}
prefetch
// Prefetch the heap-allocated memory region to resolve potential TLB and
// cache misses. This is intended to overlap with execution of calculating the
// hash for a key.
// 异步预取非 SOO 的 ctrl bytes
void prefetch_heap_block() const {
assert(!is_soo());
#if ABSL_HAVE_BUILTIN(__builtin_prefetch) || defined(__GNUC__)
__builtin_prefetch(control(), 0, 1);
#endif
}
似乎有点用的小心思。因为接下来要计算哈希,不如顺手发出把 ctrl 异步载入缓存行的指令,让预取与哈希计算重叠。这里的 ctrl 数组可能很长,一条预取指令只覆盖开头的一个缓存行,未必命中最终探测到的位置,不过也算尽力了。
probe
// Begins a probing operation on `common.control`, using `hash`.
inline probe_seq<Group::kWidth> probe(const ctrl_t* ctrl, const size_t capacity,
size_t hash) {
return probe_seq<Group::kWidth>(H1(hash, ctrl), capacity);
}
inline probe_seq<Group::kWidth> probe(const CommonFields& common, size_t hash) {
return probe(common.control(), common.capacity(), hash);
}
// The state for a probe sequence.
//
// Currently, the sequence is a triangular progression of the form
//
// p(i) := Width * (i^2 + i)/2 + hash (mod mask + 1)
//
// The use of `Width` ensures that each probe step does not overlap groups;
// the sequence effectively outputs the addresses of *groups* (although not
// necessarily aligned to any boundary). The `Group` machinery allows us
// to check an entire group with minimal branching.
//
// Wrapping around at `mask + 1` is important, but not for the obvious reason.
// As described above, the first few entries of the control byte array
// are mirrored at the end of the array, which `Group` will find and use
// for selecting candidates. However, when those candidates' slots are
// actually inspected, there are no corresponding slots for the cloned bytes,
// so we need to make sure we've treated those offsets as "wrapping around".
//
// It turns out that this probe sequence visits every group exactly once if the
// number of groups is a power of two, since (i^2+i)/2 is a bijection in
// Z/(2^m). See https://en.wikipedia.org/wiki/Quadratic_probing
// 有两个要点:
// * 环绕到最后前因为有 cloned bytes,所以 SIMD 依然可以找对人
// * 探测过程既不会重复,也不会缺漏
template <size_t Width>
class probe_seq {
public:
// Creates a new probe sequence using `hash` as the initial value of the
// sequence and `mask` (usually the capacity of the table) as the mask to
// apply to each value in the progression.
probe_seq(size_t hash, size_t mask) {
assert(((mask + 1) & mask) == 0 && "not a mask");
mask_ = mask;
// 初始为 hash
offset_ = hash & mask_;
}
// The offset within the table, i.e., the value `p(i)` above.
size_t offset() const { return offset_; }
size_t offset(size_t i) const { return (offset_ + i) & mask_; }
void next() {
// index 序列为 0+{w, 2w, 3w, 4w, 5w, ...}
// offset 序列为 H+{w, 3w, 6w, 10w, 15w, ...}
// 总之就是 hash 加上 index 的前缀和(别忘了取模)
// 也就是注释说的 triangular progression
index_ += Width;
offset_ += index_;
offset_ &= mask_;
}
// 0-based probe index, a multiple of `Width`.
size_t index() const { return index_; }
private:
size_t mask_;
size_t offset_;
size_t index_ = 0;
};
probe() 是延迟计算的,只是返回一个表达式对象。
Group
#ifdef ABSL_INTERNAL_HAVE_SSE2
using Group = GroupSse2Impl;
using GroupFullEmptyOrDeleted = GroupSse2Impl;
#elif /* ... */
#endif
// Group 实现类
struct GroupSse2Impl {
static constexpr size_t kWidth = 16; // the number of slots per group
explicit GroupSse2Impl(const ctrl_t* pos) {
// 载入 16 个 ctrl byte
ctrl = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pos));
}
// Returns a bitmask representing the positions of slots that match hash.
BitMask<uint16_t, kWidth> Match(h2_t hash) const {
auto match = _mm_set1_epi8(static_cast<char>(hash));
return BitMask<uint16_t, kWidth>(
// _mm_cmpeq_epi8 字节相等时 match 设为 0xff,不相等设为 0x00
// _mm_movemask_epi8 压缩向量的每个最高位布尔值为标量的低 16 位
static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(match, ctrl))));
}
// Returns a bitmask representing the positions of empty slots.
NonIterableBitMask<uint16_t, kWidth> MaskEmpty() const {
// SSSE3 要指定 -march,默认编译器不提供支持
#ifdef ABSL_INTERNAL_HAVE_SSSE3
// This only works because ctrl_t::kEmpty is -128.
// 这是精心设计过的数值
// 假设 r = _mm_sign_epi8(a, b),那么 r[i] = b[i] < 0 ? -a[i] : a[i];
// 虽然 i8 是不存在 128 的数值,但是一个数值 x 的负数可以解释为 ~x + 1
// 那么 -128 (0b 1000 0000) 的环绕仍是 -128
// 所以只有 kEmpty 最高位保持 1,然后通过 _mm_movemask_epi8 筛选出来
return NonIterableBitMask<uint16_t, kWidth>(
static_cast<uint16_t>(_mm_movemask_epi8(_mm_sign_epi8(ctrl, ctrl))));
#else
// 这种写法比较直白了,但是要三条向量指令
auto match = _mm_set1_epi8(static_cast<char>(ctrl_t::kEmpty));
return NonIterableBitMask<uint16_t, kWidth>(
static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(match, ctrl))));
#endif
}
// ...
__m128i ctrl;
};
// An abstract bitmask, such as that emitted by a SIMD instruction.
//
// Specifically, this type implements a simple bitset whose representation is
// controlled by `SignificantBits` and `Shift`. `SignificantBits` is the number
// of abstract bits in the bitset, while `Shift` is the log-base-two of the
// width of an abstract bit in the representation.
// This mask provides operations for any number of real bits set in an abstract
// bit. To add iteration on top of that, implementation must guarantee no more
// than the most significant real bit is set in a set abstract bit.
template <class T, int SignificantBits, int Shift = 0>
class NonIterableBitMask {
public:
explicit NonIterableBitMask(T mask) : mask_(mask) {}
explicit operator bool() const { return this->mask_ != 0; }
// ...
T mask_;
};
Group 是以 SIMD 宽度来设计的。目前会用到的 SIMD 操作有 Match() 和 MaskEmpty(),都写好注释了。这也解释了 ctrl_t::kEmpty 必须为 -128 的原因。
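为便于理解,可以写出这两个操作的标量等价实现(非 absl 代码):Match 逐字节比较 h2 并压缩为 16 位掩码;MaskEmpty 则模拟 _mm_sign_epi8(ctrl, ctrl) 后取符号位,利用「只有 -128 经 x < 0 ? -x : x 变换后符号位仍为 1」的性质。

```cpp
#include <cstdint>

// 标量版 Match:等价于 _mm_cmpeq_epi8 + _mm_movemask_epi8
inline uint16_t MatchScalar(const int8_t* ctrl, int8_t h2) {
  uint16_t mask = 0;
  for (int i = 0; i < 16; ++i)
    if (ctrl[i] == h2) mask |= static_cast<uint16_t>(1u << i);
  return mask;
}

// 标量版 MaskEmpty:只有 kEmpty(-128) 取负后符号位仍为 1
inline uint16_t MaskEmptyScalar(const int8_t* ctrl) {
  uint16_t mask = 0;
  for (int i = 0; i < 16; ++i) {
    uint8_t v = static_cast<uint8_t>(ctrl[i] < 0 ? -ctrl[i] : ctrl[i]);
    if (v & 0x80) mask |= static_cast<uint16_t>(1u << i);
  }
  return mask;
}
```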
ctrl_t
// The values here are selected for maximum performance. See the static asserts
// below for details.
// A `ctrl_t` is a single control byte, which can have one of four
// states: empty, deleted, full (which has an associated seven-bit h2_t value)
// and the sentinel. They have the following bit patterns:
//
// empty: 1 0 0 0 0 0 0 0
// deleted: 1 1 1 1 1 1 1 0
// full: 0 h h h h h h h // h represents the hash bits.
// sentinel: 1 1 1 1 1 1 1 1
//
// These values are specifically tuned for SSE-flavored SIMD.
// The static_asserts below detail the source of these choices.
//
// We use an enum class so that when strict aliasing is enabled, the compiler
// knows ctrl_t doesn't alias other types.
enum class ctrl_t : int8_t {
kEmpty = -128, // 0b10000000
kDeleted = -2, // 0b11111110
kSentinel = -1, // 0b11111111
};
static_assert(
(static_cast<int8_t>(ctrl_t::kEmpty) &
static_cast<int8_t>(ctrl_t::kDeleted) &
static_cast<int8_t>(ctrl_t::kSentinel) & 0x80) != 0,
// 非哈希值(H2)都会固定符号位为 1。同时这也是 H2 取 7 bit 的原因,因为要丢弃符号位
// 注:谷歌之前说过(忘记来源了 orz),7 bit 只是一种实现既定的设计,后面可能会改
"Special markers need to have the MSB to make checking for them efficient");
static_assert(
ctrl_t::kEmpty < ctrl_t::kSentinel && ctrl_t::kDeleted < ctrl_t::kSentinel,
// 保证 kEmpty 和 kDeleted 均小于 kSentinel
// 有些场景不需要严格区分空还是已删除,只需对比一次 kSentinel 数值就能完成判断
// 这也是 IsEmptyOrDeleted() 的实现
"ctrl_t::kEmpty and ctrl_t::kDeleted must be smaller than "
"ctrl_t::kSentinel to make the SIMD test of IsEmptyOrDeleted() efficient");
static_assert(
ctrl_t::kSentinel == static_cast<ctrl_t>(-1),
// 意图:pcmpeqd xmm, xmm 用同一寄存器自比较恒得 all-one,即 -1 向量
// 因此 kSentinel 的向量常量可直接在寄存器里生成,无需从内存载入
"ctrl_t::kSentinel must be -1 to elide loading it from memory into SIMD "
"registers (pcmpeqd xmm, xmm)");
static_assert(ctrl_t::kEmpty == static_cast<ctrl_t>(-128),
// 前面 Group 解释过这个设计
"ctrl_t::kEmpty must be -128 to make the SIMD check for its "
"existence efficient (psignb xmm, xmm)");
static_assert(
(~static_cast<int8_t>(ctrl_t::kEmpty) &
~static_cast<int8_t>(ctrl_t::kDeleted) &
// 使用了最低位 0 的共享
static_cast<int8_t>(ctrl_t::kSentinel) & 0x7F) != 0,
// SSE 版本用 _mm_cmpgt_epi8(_mm_set1_epi8(kSentinel), ctrl) 对比生成 mask,看似用不上这个 shared bit
// 它实际服务于 portable(非 SIMD)实现的标量位运算:
// 标量版 MaskEmptyOrDeleted() 为 (ctrl & ~(ctrl << 7)) & msbs
// 依赖 kEmpty/kDeleted 的最低位为 0 而 kSentinel 的最低位为 1
"ctrl_t::kEmpty and ctrl_t::kDeleted must share an unset bit that is not "
"shared by ctrl_t::kSentinel to make the scalar test for "
"MaskEmptyOrDeleted() efficient");
static_assert(ctrl_t::kDeleted == static_cast<ctrl_t>(-2),
// -2 是用算法凑出来的,具体见 ConvertSpecialToEmptyAndFullToDeleted()
// 这个函数可以达成效果:
// * kDeleted -> kEmpty
// * kEmpty -> kEmpty
// * _ -> kDeleted
//
// 实现方式:
// _mm_or_si128(_mm_shuffle_epi8(_mm_set1_epi8(126), ctrl), _mm_set1_epi8(-128))
// 其中 126 = 0b 0111 1110
// 又因为 shuffle 操作区分符号位,所以:
// 对于非哈希值,0 | -128 == kEmpty
// 对于哈希值,126 | -128 == kDeleted
"ctrl_t::kDeleted must be -2 to make the implementation of "
"ConvertSpecialToEmptyAndFullToDeleted efficient");
其实 static_assert() 提示了 ctrl_t 不同数值的设计,官方解释并不算详细,所以加了点注释。
// Helpers for checking the state of a control byte.
inline bool IsEmpty(ctrl_t c) { return c == ctrl_t::kEmpty; }
inline bool IsFull(ctrl_t c) {
// Cast `c` to the underlying type instead of casting `0` to `ctrl_t` as `0`
// is not a value in the enum. Both ways are equivalent, but this way makes
// linters happier.
return static_cast<std::underlying_type_t<ctrl_t>>(c) >= 0;
}
inline bool IsDeleted(ctrl_t c) { return c == ctrl_t::kDeleted; }
inline bool IsEmptyOrDeleted(ctrl_t c) { return c < ctrl_t::kSentinel; }
这里顺手看下 IsFull() 和 IsEmptyOrDeleted() 的实现:
- full:符号位必须为 0,因此只需判断 >= 0。
- empty|deleted:这两个值均小于 kSentinel,因此一次比较即可。
至于为什么 ctrl_t::kDeleted 一定是 -2,我在上面注释也解释了。
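kDeleted == -2 的妙处可以按上面注释的语义写一个标量 sketch(SIMD 版用 _mm_shuffle_epi8 + _mm_or_si128 一次完成;哨兵字节会被一并转成 kEmpty,实际实现中需事后单独恢复):

```cpp
#include <cstdint>

// 标量版示意:ConvertSpecialToEmptyAndFullToDeleted 的语义
// 特殊值(符号位为 1)=> kEmpty;哈希值 => 126 | -128 == -2 == kDeleted
inline int8_t ConvertScalar(int8_t c) {
  return c < 0 ? static_cast<int8_t>(-128)
               : static_cast<int8_t>(126 | -128);
}
```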
prepare insert non-SOO
// Given the hash of a value not currently in the table and the first empty
// slot in the probe sequence, finds a viable slot index to insert it at.
//
// In case there's no space left, the table can be resized or rehashed
// (for tables with deleted slots, see FindInsertPositionWithGrowthOrRehash).
//
// In the case of absence of deleted slots and positive growth_left, the element
// can be inserted in the provided `target` position.
//
// When the table has deleted slots (according to GrowthInfo), the target
// position will be searched one more time using `find_first_non_full`.
//
// REQUIRES: Table is not SOO.
// REQUIRES: At least one non-full slot available.
// REQUIRES: `target` is a valid empty position to insert.
size_t PrepareInsertNonSoo(CommonFields& common, size_t hash, FindInfo target,
const PolicyFunctions& policy) {
// When there are no deleted slots in the table
// and growth_left is positive, we can insert at the first
// empty slot in the probe sequence (target).
// 未曾删除且有增长空间的场景是热路径(case 0),可以直接插入首个 empty slot
const bool use_target_hint =
// Optimization is disabled when generations are enabled.
// We have to rehash even sparse tables randomly in such mode.
!SwisstableGenerationsEnabled() &&
// growth_info 没有高位删除标记,并且数值大于 0
common.growth_info().HasNoDeletedAndGrowthLeft();
// 删除过元素,或者没有余位,这里使用 unlikely 标记
// 这些冷路径都需要重新确认 target
if (ABSL_PREDICT_FALSE(!use_target_hint)) {
// Notes about optimized mode when generations are disabled:
// We do not enter this branch if table has no deleted slots
// and growth_left is positive.
// We enter this branch in the following cases listed in decreasing
// frequency:
// 1. Table without deleted slots (>95% cases) that needs to be resized.
// 2. Table with deleted slots that has space for the inserting element.
// 3. Table with deleted slots that needs to be rehashed or resized.
// 看样子这些冷路径的 if-else 分支是按照场景频率来设计的
// 虽然我不知道概率数据是怎么测出来的,能不能靠这些负载反推出谷歌的商业机密?
//
// case 1: 没删过,没余位(likely)
if (ABSL_PREDICT_TRUE(common.growth_info().HasNoGrowthLeftAndNoDeleted())) {
const size_t old_capacity = common.capacity();
// 见之前的扩容操作
policy.resize(common, NextCapacity(old_capacity), HashtablezInfoHandle{});
// 一个特定上下文的查找操作,可以对比 case 2 使用通用的 find_first_non_full()
target = HashSetResizeHelper::FindFirstNonFullAfterResize(
common, old_capacity, hash);
} else {
// Note: the table may have no deleted slots here when generations
// are enabled.
const bool rehash_for_bug_detection =
common.should_rehash_for_bug_detection_on_insert();
// 不管,默认 false
if (rehash_for_bug_detection) {
// Move to a different heap allocation in order to detect bugs.
const size_t cap = common.capacity();
policy.resize(common,
common.growth_left() > 0 ? cap : NextCapacity(cap),
HashtablezInfoHandle{});
}
// case 2: 删过,有余位
if (ABSL_PREDICT_TRUE(common.growth_left() > 0)) {
target = find_first_non_full(common, hash);
// case 3: 删过,没余位
} else {
target = FindInsertPositionWithGrowthOrRehash(common, hash, policy);
}
}
}
// case 0 热路径
//
// 等价于 size() += 1
PrepareInsertCommon(common);
// 等价于 grow_info -= IsEmpty(common.control()[target.offset])
common.growth_info().OverwriteControlAsFull(common.control()[target.offset]);
SetCtrl(common, target.offset, H2(hash), policy.slot_size);
common.infoz().RecordInsert(hash, target.probe_length);
return target.offset;
}
TODO: 《界轨》通关中,小部分遗留代码先放着。
挖坑不填坑,没人比我更卑鄙
UPDATE: 我回来了。垃圾游戏浪费时间,继续补完文章。
这里的插入 target 细分了很多的 case,主要就看有没有删过和多余的空间。
case 0
size_t PrepareInsertNonSoo(CommonFields& common, size_t hash, FindInfo target,
const PolicyFunctions& policy) {
// ...
// 等价于 size() += 1
PrepareInsertCommon(common);
// 等价于 grow_info -= IsEmpty(common.control()[target.offset])
common.growth_info().OverwriteControlAsFull(common.control()[target.offset]);
// 前面的扩容操作见过该函数,就是设置 ctrl[target.offset] = h2
// 除此以外,还可能处理 cloned bytes,具体见下面
SetCtrl(common, target.offset, H2(hash), policy.slot_size);
common.infoz().RecordInsert(hash, target.probe_length);
return target.offset;
}
// Overload for setting to an occupied `h2_t` rather than a special `ctrl_t`.
inline void SetCtrl(const CommonFields& c, size_t i, h2_t h, size_t slot_size) {
SetCtrl(c, i, static_cast<ctrl_t>(h), slot_size);
}
// Sets `ctrl[i]` to `h`.
//
// Unlike setting it directly, this function will perform bounds checks and
// mirror the value to the cloned tail if necessary.
inline void SetCtrl(const CommonFields& c, size_t i, ctrl_t h,
size_t slot_size) {
DoSanitizeOnSetCtrl(c, i, h, slot_size);
ctrl_t* ctrl = c.control();
ctrl[i] = h;
// 这个写法很绕,用于处理 cloned bytes 同时显式避免分支判断
// 对于 i < NumClonedBytes 的情况,赋值到对应 cloned byte
// 对于 i >= NumClonedBytes 的情况,实际不会有任何影响
ctrl[((i - NumClonedBytes()) & c.capacity()) +
(NumClonedBytes() & c.capacity())] = h;
}
在 case 0 的上下文,刚才的 prepare insert non-SOO 流程可以简化到以上形式,只处理 ctrl 信息。
NOTE: 可以对比 SetCtrl() 和 SetCtrlInSingleGroupTable() 实现,后者作为特定上下文的优化,进一步免去了 & 运算。
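SetCtrl() 里那个绕人的免分支镜像下标可以单独抽出来验证(假设 kWidth = 16,即 NumClonedBytes() == 15;capacity 恒为 2^m - 1,可直接当位掩码用,并有意利用无符号回绕):

```cpp
#include <cstddef>

// 示意:SetCtrl 中 cloned byte 的免分支镜像下标
constexpr size_t kClonedBytes = 15;

inline size_t MirrorIndex(size_t i, size_t capacity) {
  // i < 15 时回绕映射到尾部 cloned 区(capacity + 1 + i)
  // i >= 15 时映射回 i 自身,成为无害的重复写
  return ((i - kClonedBytes) & capacity) + (kClonedBytes & capacity);
}
```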
case 1
后续的 case 均会重新定位 target,并且定位后再次垂落到 case 0 相同流程。
size_t PrepareInsertNonSoo(CommonFields& common, size_t hash, FindInfo target,
const PolicyFunctions& policy) {
// ...
if (ABSL_PREDICT_FALSE(!use_target_hint)) {
// case 1: 没删过,没余位(likely)
if (ABSL_PREDICT_TRUE(common.growth_info().HasNoGrowthLeftAndNoDeleted())) {
const size_t old_capacity = common.capacity();
// 见之前的扩容操作
policy.resize(common, NextCapacity(old_capacity), HashtablezInfoHandle{});
// 一个特定上下文的查找操作,可以对比 case 2 使用通用的 find_first_non_full()
target = HashSetResizeHelper::FindFirstNonFullAfterResize(
common, old_capacity, hash);
}
// ...
}
// case 0...
}
// Optimized for small groups version of `find_first_non_full`.
// Beneficial only right after calling `raw_hash_set::resize`.
// It is safe to call in case capacity is big or was not changed, but there
// will be no performance benefit.
// It has implicit assumption that `resize` will call
// `GrowSizeIntoSingleGroup*` in case `IsGrowingIntoSingleGroupApplicable`.
// Falls back to `find_first_non_full` in case of big groups.
static FindInfo FindFirstNonFullAfterResize(const CommonFields& c,
size_t old_capacity,
size_t hash) {
// 不符合单群概念,这个函数在前面扩容阶段也出现过
if (!IsGrowingIntoSingleGroupApplicable(old_capacity, c.capacity())) {
// 垂落到 case 2 的核心函数
return find_first_non_full(c, hash);
}
    // Find a location for the new element non-deterministically.
    // Note that any position is correct.
    // It will be located at `half_old_capacity` or one of the other
    // empty slots with approximately 50% probability each.
    // The non-determinism here presumably hardens against element positions
    // being guessed. "Any position is correct"?
size_t offset = probe(c, hash).offset();
// Note that we intentionally use unsigned int underflow.
    // if offset is small enough, the subtraction wraps around and the check passes;
    // otherwise offset lies in the right half, which is guaranteed empty
if (offset - (old_capacity + 1) >= old_capacity) {
// Offset fall on kSentinel or into the mostly occupied first half.
      // this position used to hold the sentinel; after resizing there is no need for two sentinels
// Examples:
// old_ctrl = 0S0EEEEEEE...
// new_ctrl = S0EEEEEEEE...
//
// old_ctrl = 01S01EEEEE...
// new_ctrl = 1S01EEEEEE...
//
// old_ctrl = 0123456S0123456EE...
// new_ctrl = 456S0123?????????...
      // note the position of S above
offset = old_capacity / 2;
}
    // Why is this offset guaranteed to be kEmpty?
    // See the previously mentioned commit 0ef87fa0c1d9cff4c231a2a3af854f0b5edfa92b:
    // in the resize-phase function GrowIntoSingleGroupShuffleControlBytes,
    // ctrl[old_capacity / 2], which the shuffle would otherwise leave holding
    // the sentinel, is explicitly set to kEmpty
assert(IsEmpty(c.control()[offset]));
return FindInfo{offset, 0};
}
Case 1 is harder to follow than case 2, so consider reading case 2 first; it is also case 1's fallback. Case 1 special-cases single-group tables, which Google considers both more efficient and safer for that shape. All of this rests on the earlier shuffle operation (omitted here for space; see the commit history), and in that context the lookup does not even need to test slots for emptiness.
NOTES:
- Why not look for kDeleted slots? Because no such slots exist in case 1.
- Why only single-group tables? Google's comment notes it is safe elsewhere, but brings no performance benefit.
case 2
// Probes an array of control bits using a probe sequence derived from `hash`,
// and returns the offset corresponding to the first deleted or empty slot.
//
// Behavior when the entire table is full is undefined.
//
// NOTE: this function must work with tables having both empty and deleted
// slots in the same group. Such tables appear during `erase()`.
template <typename = void>
inline FindInfo find_first_non_full(const CommonFields& common, size_t hash) {
auto seq = probe(common, hash);
const ctrl_t* ctrl = common.control();
  // first probe: no SIMD involved
if (IsEmptyOrDeleted(ctrl[seq.offset()]) &&
      // only present in debug builds, where it perturbs the insertion position; absent otherwise
!ShouldInsertBackwards(common.capacity(), hash, ctrl)) {
return {seq.offset(), /*probe_length=*/0};
}
  // subsequent probes: query repeatedly using SIMD operations
while (true) {
GroupFullEmptyOrDeleted g{ctrl + seq.offset()};
    // until a kEmpty or kDeleted slot is found
auto mask = g.MaskEmptyOrDeleted();
if (mask) {
return {
seq.offset(GetInsertionOffset(mask, common.capacity(), hash, ctrl)),
seq.index()};
}
seq.next();
assert(seq.index() <= common.capacity() && "full table!");
}
}
Unlike case 1, case 2 needs no resize, since a spare slot is guaranteed to exist; the insertion point may be either kEmpty or kDeleted. The process is plain slot-finding, using the SIMD lookup operations explained earlier.
case 3
// Called whenever the table needs to vacate empty slots either by removing
// tombstones via rehash or growth.
ABSL_ATTRIBUTE_NOINLINE
FindInfo FindInsertPositionWithGrowthOrRehash(CommonFields& common, size_t hash,
const PolicyFunctions& policy) {
const size_t cap = common.capacity();
if (cap > Group::kWidth &&
// Do these calculations in 64-bit to avoid overflow.
      // close to the 7/8 max load factor; the choice of constants is explained
      // below (genuine production tuning notes)
common.size() * uint64_t{32} <= cap * uint64_t{25}) {
// Squash DELETED without growing if there is enough capacity.
//
// Rehash in place if the current size is <= 25/32 of capacity.
// Rationale for such a high factor: 1) DropDeletesWithoutResize() is
// faster than resize, and 2) it takes quite a bit of work to add
// tombstones. In the worst case, seems to take approximately 4
// insert/erase pairs to create a single tombstone and so if we are
// rehashing because of tombstones, we can afford to rehash-in-place as
// long as we are reclaiming at least 1/8 the capacity without doing more
// than 2X the work. (Where "work" is defined to be size() for rehashing
// or rehashing in place, and 1 for an insert or erase.) But rehashing in
// place is faster per operation than inserting or even doubling the size
// of the table, so we actually afford to reclaim even less space from a
// resize-in-place. The decision is to rehash in place if we can reclaim
// at about 1/8th of the usable capacity (specifically 3/28 of the
// capacity) which means that the total cost of rehashing will be a small
// fraction of the total work.
//
// Here is output of an experiment using the BM_CacheInSteadyState
// benchmark running the old case (where we rehash-in-place only if we can
// reclaim at least 7/16*capacity) vs. this code (which rehashes in place
// if we can recover 3/32*capacity).
//
    // Note that although the worst-case number of rehashes jumped from
    // 15 to 190, the number of operations per second is almost the same.
//
// Abridged output of running BM_CacheInSteadyState benchmark from
// raw_hash_set_benchmark. N is the number of insert/erase operations.
//
    //      | OLD (recover >= 7/16)   | NEW (recover >= 3/32)
// size | N/s LoadFactor NRehashes | N/s LoadFactor NRehashes
// 448 | 145284 0.44 18 | 140118 0.44 19
// 493 | 152546 0.24 11 | 151417 0.48 28
// 538 | 151439 0.26 11 | 151152 0.53 38
// 583 | 151765 0.28 11 | 150572 0.57 50
// 628 | 150241 0.31 11 | 150853 0.61 66
// 672 | 149602 0.33 12 | 150110 0.66 90
// 717 | 149998 0.35 12 | 149531 0.70 129
// 762 | 149836 0.37 13 | 148559 0.74 190
// 807 | 149736 0.39 14 | 151107 0.39 14
// 852 | 150204 0.42 15 | 151019 0.42 15
DropDeletesWithoutResize(common, policy);
} else {
// Otherwise grow the container.
policy.resize(common, NextCapacity(cap), HashtablezInfoHandle{});
}
// This function is typically called with tables containing deleted slots.
// The table will be big and `FindFirstNonFullAfterResize` will always
// fallback to `find_first_non_full`. So using `find_first_non_full` directly.
  // back to case 2
return find_first_non_full(common, hash);
}
void DropDeletesWithoutResize(CommonFields& common,
const PolicyFunctions& policy) {
void* set = &common;
void* slot_array = common.slot_array();
const size_t capacity = common.capacity();
assert(IsValidCapacity(capacity));
assert(!is_small(capacity));
  // the comment below says it all; nothing for me to add
// Algorithm:
// - mark all DELETED slots as EMPTY
// - mark all FULL slots as DELETED
// - for each slot marked as DELETED
// hash = Hash(element)
// target = find_first_non_full(hash)
// if target is in the same group
// mark slot as FULL
// else if target is EMPTY
// transfer element to target
// mark slot as EMPTY
// mark target as FULL
// else if target is DELETED
// swap current element with target element
// mark target as FULL
// repeat procedure for current slot with moved from element (target)
ctrl_t* ctrl = common.control();
ConvertDeletedToEmptyAndFullToDeleted(ctrl, capacity);
const void* hash_fn = policy.hash_fn(common);
auto hasher = policy.hash_slot;
auto transfer = policy.transfer;
const size_t slot_size = policy.slot_size;
size_t total_probe_length = 0;
void* slot_ptr = SlotAddress(slot_array, 0, slot_size);
// The index of an empty slot that can be used as temporary memory for
// the swap operation.
constexpr size_t kUnknownId = ~size_t{};
size_t tmp_space_id = kUnknownId;
for (size_t i = 0; i != capacity;
++i, slot_ptr = NextSlot(slot_ptr, slot_size)) {
assert(slot_ptr == SlotAddress(slot_array, i, slot_size));
if (IsEmpty(ctrl[i])) {
tmp_space_id = i;
continue;
}
if (!IsDeleted(ctrl[i])) continue;
const size_t hash = (*hasher)(hash_fn, slot_ptr);
const FindInfo target = find_first_non_full(common, hash);
const size_t new_i = target.offset;
total_probe_length += target.probe_length;
// Verify if the old and new i fall within the same group wrt the hash.
// If they do, we don't need to move the object as it falls already in the
// best probe we can.
const size_t probe_offset = probe(common, hash).offset();
const auto probe_index = [probe_offset, capacity](size_t pos) {
return ((pos - probe_offset) & capacity) / Group::kWidth;
};
// Element doesn't move.
if (ABSL_PREDICT_TRUE(probe_index(new_i) == probe_index(i))) {
SetCtrl(common, i, H2(hash), slot_size);
continue;
}
void* new_slot_ptr = SlotAddress(slot_array, new_i, slot_size);
if (IsEmpty(ctrl[new_i])) {
// Transfer element to the empty spot.
// SetCtrl poisons/unpoisons the slots so we have to call it at the
// right time.
SetCtrl(common, new_i, H2(hash), slot_size);
(*transfer)(set, new_slot_ptr, slot_ptr);
SetCtrl(common, i, ctrl_t::kEmpty, slot_size);
// Initialize or change empty space id.
tmp_space_id = i;
} else {
assert(IsDeleted(ctrl[new_i]));
SetCtrl(common, new_i, H2(hash), slot_size);
// Until we are done rehashing, DELETED marks previously FULL slots.
if (tmp_space_id == kUnknownId) {
tmp_space_id = FindEmptySlot(i + 1, capacity, ctrl);
}
void* tmp_space = SlotAddress(slot_array, tmp_space_id, slot_size);
SanitizerUnpoisonMemoryRegion(tmp_space, slot_size);
// Swap i and new_i elements.
(*transfer)(set, tmp_space, new_slot_ptr);
(*transfer)(set, new_slot_ptr, slot_ptr);
(*transfer)(set, slot_ptr, tmp_space);
SanitizerPoisonMemoryRegion(tmp_space, slot_size);
// repeat the processing of the ith slot
--i;
slot_ptr = PrevSlot(slot_ptr, slot_size);
}
}
ResetGrowthLeft(common);
common.infoz().RecordRehash(total_probe_length);
}
Case 3 covers the context where deletions have happened but no growth is left; the viable move here is to squash (drop) the tombstones so that resizing is avoided where possible. The exact thresholds are a difficult trade-off, ultimately settled by benchmarks; see the official comments. The squashing algorithm itself is already well summarized at the top of DropDeletesWithoutResize().
Erase
TL;DR: find + alloc::destroy + metadata removal (updating the size, marking kDeleted, and so on).
Find
template <class K = key_type>
iterator find(const key_arg<K>& key) ABSL_ATTRIBUTE_LIFETIME_BOUND {
AssertOnFind(key);
if (is_soo()) return find_soo(key);
prefetch_heap_block();
return find_non_soo(key, hash_ref()(key));
}
The externally facing find operation is very direct; having seen the earlier flows, you can mostly guess what it does.
find SOO
template <class K = key_type>
iterator find_soo(const key_arg<K>& key) {
assert(is_soo());
return empty() || !PolicyTraits::apply(EqualElement<K>{key, eq_ref()},
PolicyTraits::element(soo_slot()))
? end()
: soo_iterator();
}
The SOO slot lives at a fixed location, so it suffices to apply the EqualElement check to soo_slot() and return the SOO iterator.
find non-SOO
template <class K = key_type>
iterator find_non_soo(const key_arg<K>& key, size_t hash) {
assert(!is_soo());
auto seq = probe(common(), hash);
const ctrl_t* ctrl = control();
while (true) {
Group g{ctrl + seq.offset()};
for (uint32_t i : g.Match(H2(hash))) {
if (ABSL_PREDICT_TRUE(PolicyTraits::apply(
EqualElement<K>{key, eq_ref()},
PolicyTraits::element(slot_array() + seq.offset(i)))))
return iterator_at(seq.offset(i));
}
if (ABSL_PREDICT_TRUE(g.MaskEmpty())) return end();
seq.next();
assert(seq.index() <= capacity() && "full table!");
}
}
The non-SOO path mirrors find_or_prepare_insert_non_soo() shown earlier: SIMD batch matching plus probing.
END
I don't feel like debugging any further, so this is where it ends. It seems I'm the only grinder on the entire internet digging through source code nobody reads, while the group-chat crowd holds forth on architecture without, as far as I can tell, ever having read an implementation. I quit. I hope some later grinder picks up the few remaining details.
From these flows you can also see the philosophy behind Swiss table's performance: start with one good idea, then win everything else by grinding out the details. Which is exactly why, details aside, there isn't much for this article to summarize. RTFSC!