Background
Google Abseil implements a high-performance hash table: the Swiss table. For just how fast it is, see the concrete benchmarks; in short, apart from iteration, which is unremarkable, everything else performs very well. Its design highlights are SIMD-accelerated lookup, cache-friendly linear probing, and a memory layout more compact than mainstream standard-library implementations; for these, see the official design notes. Enough preamble, let's read the code.
NOTES:
- The code follows the Abseil LTS branch, July 2024 release.
- The Swiss table implementation sits behind absl::flat_hash_map.
Index
Since my code-reading habit leans depth-first (see the table of contents), the raw order is certainly unfriendly to read as an article.
So here is an index collecting the spots the table of contents doesn't surface well:
- Memory layout: CommonFields, default construction, resize_impl, the slot type, and slot initialization.
- SIMD operations: Group, ctrl bytes, and prepare insert (non-SOO).
- Probing: probe.
- Lookup: find, erase, prepare insert (SOO/non-SOO), etc.; most operations build on lookup.
- SOO optimization: HeapOrSoo and prepare insert SOO.
- Trivially-relocatable optimization: transfer and resize_impl.
- Prefetch optimization: prefetch and find.
- Context-specific optimizations: many, e.g. prepare insert after SOO and prepare insert non-SOO.
Memory layout
Type relationships
To work out the memory layout, we first need to understand how the internal classes of absl::flat_hash_map relate to each other.
template <class K, class V, class Hash = DefaultHashContainerHash<K>,
class Eq = DefaultHashContainerEq<K>,
class Allocator = std::allocator<std::pair<const K, V>>>
class ABSL_ATTRIBUTE_OWNER flat_hash_map
: public absl::container_internal::raw_hash_map<
absl::container_internal::FlatHashMapPolicy<K, V>, Hash, Eq,
Allocator> {
using Base = typename flat_hash_map::raw_hash_map;
flat_hash_map() {}
using Base::Base;
using Base::begin;
using Base::end;
// ...
};
template <class Policy, class Hash, class Eq, class Alloc>
class raw_hash_map : public raw_hash_set<Policy, Hash, Eq, Alloc> {
// ...
};
template <class Policy, class Hash, class Eq, class Alloc>
class raw_hash_set {
// Bundle together CommonFields plus other objects which might be empty.
// CompressedTuple will ensure that sizeof is not affected by any of the empty
// fields that occur after CommonFields.
absl::container_internal::CompressedTuple<CommonFields, hasher, key_equal,
allocator_type>
settings_{CommonFields::CreateDefault<SooEnabled()>(), hasher{},
key_equal{}, allocator_type{}};
};
The hash table's only data member is settings_, inherited from the raw_hash_set base class. As the comment explains, this sole member is a CompressedTuple so that the empty base optimization (EBO) can kick in, since the container's hasher, key_equal and allocator_type may all be stateless types. The type we actually need to care about is CommonFields; that is where the members in the conventional sense live. Note, however, that the CommonFields object is constructed via a factory function parameterized by a SooEnabled flag before being handed to settings_, which already hints at the small object optimization (SOO).
CompressedTuple
template <typename... Ts>
class ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC CompressedTuple
: private internal_compressed_tuple::CompressedTupleImpl<
CompressedTuple<Ts...>, absl::index_sequence_for<Ts...>,
internal_compressed_tuple::ShouldAnyUseBase<Ts...>()> {
private:
template <int I>
using ElemT = internal_compressed_tuple::ElemT<CompressedTuple, I>;
template <int I>
using StorageT = internal_compressed_tuple::Storage<ElemT<I>, I>;
explicit constexpr CompressedTuple(const Ts&... base)
: CompressedTuple::CompressedTupleImpl(absl::in_place, base...) {}
template <int I>
constexpr ElemT<I>& get() & {
return StorageT<I>::get();
}
// ...
};
template <typename D, typename I, bool ShouldAnyUseBase>
struct ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC CompressedTupleImpl;
template <typename... Ts, size_t... I, bool ShouldAnyUseBase>
struct ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC CompressedTupleImpl<
CompressedTuple<Ts...>, absl::index_sequence<I...>, ShouldAnyUseBase>
: uses_inheritance,
Storage<Ts, std::integral_constant<size_t, I>::value>... {
constexpr CompressedTupleImpl() = default;
template <typename... Vs>
explicit constexpr CompressedTupleImpl(absl::in_place_t, Vs&&... args)
: Storage<Ts, I>(absl::in_place, std::forward<Vs>(args))... {}
friend CompressedTuple<Ts...>;
};
template <typename... Ts, size_t... I>
struct ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC CompressedTupleImpl<
CompressedTuple<Ts...>, absl::index_sequence<I...>, false>
: Storage<Ts, std::integral_constant<size_t, I>::value, false>... {
constexpr CompressedTupleImpl() = default;
template <typename... Vs>
explicit constexpr CompressedTupleImpl(absl::in_place_t, Vs&&... args)
: Storage<Ts, I, false>(absl::in_place, std::forward<Vs>(args))... {}
friend CompressedTuple<Ts...>;
};
// The storage class provides two specializations:
// - For empty classes, it stores T as a base class.
// - For everything else, it stores T as a member.
template <typename T, size_t I, bool UseBase = ShouldUseBase<T>()>
struct Storage {
T value;
constexpr Storage() = default;
template <typename V>
explicit constexpr Storage(absl::in_place_t, V&& v)
: value(std::forward<V>(v)) {}
constexpr const T& get() const& { return value; }
constexpr T& get() & { return value; }
constexpr const T&& get() const&& { return std::move(*this).value; }
constexpr T&& get() && { return std::move(*this).value; }
};
template <typename T, size_t I>
struct ABSL_INTERNAL_COMPRESSED_TUPLE_DECLSPEC Storage<T, I, true> : T {
constexpr Storage() = default;
template <typename V>
explicit constexpr Storage(absl::in_place_t, V&& v) : T(std::forward<V>(v)) {}
constexpr const T& get() const& { return *this; }
constexpr T& get() & { return *this; }
constexpr const T&& get() const&& { return std::move(*this); }
constexpr T&& get() && { return std::move(*this); }
};
The motivation for this class was already covered above, so there is little need for further analysis. In short, CompressedTuple handles the empty-class size problem through the inheritance chain CompressedTupleImpl -> Storage -> T (type deduction → storage strategy → actual type), and member access is resolved at compile time via index_sequence, so it is used much like std::tuple (get<I>()).
CommonFields
// CommonFields hold the fields in raw_hash_set that do not depend
// on template parameters. This allows us to conveniently pass all
// of this state to helper functions as a single argument.
class CommonFields : public CommonFieldsGenerationInfo {
public:
CommonFields() : capacity_(0), size_(0), heap_or_soo_(EmptyGroup()) {}
explicit CommonFields(soo_tag_t) : capacity_(SooCapacity()), size_(0) {}
explicit CommonFields(full_soo_tag_t)
: capacity_(SooCapacity()), size_(size_t{1} << HasInfozShift()) {}
// ...
// The number of slots in the backing array. This is always 2^N-1 for an
// integer N. NOTE: we tried experimenting with compressing the capacity and
// storing it together with size_: (a) using 6 bits to store the corresponding
// power (N in 2^N-1), and (b) storing 2^N as the most significant bit of
// size_ and storing size in the low bits. Both of these experiments were
// regressions, presumably because we need capacity to do find operations.
size_t capacity_;
// The size and also has one bit that stores whether we have infoz.
size_t size_;
// Either the control/slots pointers or the SOO slot.
HeapOrSoo heap_or_soo_;
};
From this we can read off the layout CommonFields = [capacity, size, ?]. The last member, heap_or_soo_, is opaque because it is wrapped in yet another layer of engineering red tape (disclaimer: for a foundational library, that is justified). Spoiler: ? = [control, slot_array], as the comment also mentions.
Speaking of comments, Google recorded an interesting failed optimization here. They tried compressing capacity_ and size_ into a single size_t; after all, for a capacity of the form 2^N - 1, six bits suffice to store N. But in practice it was a regression, presumably because capacity is needed for every find operation.
NOTES:
- CommonFieldsGenerationInfo is used only for debugging; ignored here.
- infoz, also mentioned here, serves debugging/statistics purposes; ignored as well.
- The layout shown here is not the full story; see the later InitializeSlots flow.
- For the relationship between control and slot_array, see the figure above (source: the Abseil documentation).
HeapOrSoo
// Manages the backing array pointers or the SOO slot. When raw_hash_set::is_soo
// is true, the SOO slot is stored in `soo_data`. Otherwise, we use `heap`.
union HeapOrSoo {
HeapOrSoo() = default;
explicit HeapOrSoo(ctrl_t* c) : heap(c) {}
ctrl_t*& control() {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(heap.control);
}
ctrl_t* control() const {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(heap.control);
}
MaybeInitializedPtr& slot_array() {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(heap.slot_array);
}
MaybeInitializedPtr slot_array() const {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(heap.slot_array);
}
void* get_soo_data() {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(soo_data);
}
const void* get_soo_data() const {
ABSL_SWISSTABLE_IGNORE_UNINITIALIZED_RETURN(soo_data);
}
HeapPtrs heap;
unsigned char soo_data[sizeof(HeapPtrs)];
};
struct HeapPtrs {
HeapPtrs() = default;
explicit HeapPtrs(ctrl_t* c) : control(c) {}
// The control bytes (and, also, a pointer near to the base of the backing
// array).
//
// This contains `capacity + 1 + NumClonedBytes()` entries, even
// when the table is empty (hence EmptyGroup).
//
// Note that growth_info is stored immediately before this pointer.
// May be uninitialized for SOO tables.
ctrl_t* control;
// The beginning of the slots, located at `SlotOffset()` bytes after
// `control`. May be uninitialized for empty tables.
// Note: we can't use `slots` because Qt defines "slots" as a macro.
MaybeInitializedPtr slot_array;
};
Note that HeapOrSoo is a union: SOO is attempted within the footprint of two pointers (control + slot_array). We'll look at how it is used later.
Initialization
default
From the constructors shown earlier, Base::Base does essentially nothing; only CommonFields needs to be constructed.
class CommonFields /* ... */ {
// raw_hash_set::SooEnabled() is passed in here
template <bool kSooEnabled>
static CommonFields CreateDefault() {
return kSooEnabled ? CommonFields{soo_tag_t{}} : CommonFields{};
}
// ...
};
class raw_hash_set /* ... */ {
constexpr static bool SooEnabled() {
// the first condition is true by default
return PolicyTraits::soo_enabled() &&
// depends only on the types (nothing at runtime)
sizeof(slot_type) <= sizeof(HeapOrSoo) &&
alignof(slot_type) <= alignof(HeapOrSoo);
}
// ...
};
class CommonFields /* ... */ {
explicit CommonFields(soo_tag_t) : capacity_(SooCapacity()), size_(0) {}
// ...
};
// We only allow a maximum of 1 SOO element, which makes the implementation
// much simpler. Complications with multiple SOO elements include:
// - Satisfying the guarantee that erasing one element doesn't invalidate
// iterators to other elements means we would probably need actual SOO
// control bytes.
// - In order to prevent user code from depending on iteration order for small
// tables, we would need to randomize the iteration order somehow.
constexpr size_t SooCapacity() { return 1; }
In short, the default construction uses SOO, and since SOO, to keep complexity down, only ever has capacity 1, we get capacity_ = 1, size_ = 0.
initializer list
absl::flat_hash_map<int, int> example { {998, 244} };
So what happens if we construct through an initializer list like this?
// Instead of accepting std::initializer_list<value_type> as the first
// argument like std::unordered_set<value_type> does, we have two overloads
// that accept std::initializer_list<T> and std::initializer_list<init_type>.
// This is advantageous for performance.
//
// // Turns {"abc", "def"} into std::initializer_list<std::string>, then
// // copies the strings into the set.
// std::unordered_set<std::string> s = {"abc", "def"};
//
// // Turns {"abc", "def"} into std::initializer_list<const char*>, then
// // copies the strings into the set.
// absl::flat_hash_set<std::string> s = {"abc", "def"};
//
// The same trick is used in insert().
//
// The enabler is necessary to prevent this constructor from triggering where
// the copy constructor is meant to be called.
//
// absl::flat_hash_set<int> a, b{a};
//
// RequiresNotInit<T> is a workaround for gcc prior to 7.1.
// not taken by default
template <class T, RequiresNotInit<T> = 0, RequiresInsertable<T> = 0>
raw_hash_set(std::initializer_list<T> init, size_t bucket_count = 0,
const hasher& hash = hasher(), const key_equal& eq = key_equal(),
const allocator_type& alloc = allocator_type())
: raw_hash_set(init.begin(), init.end(), bucket_count, hash, eq, alloc) {}
// taken; init_type = std::pair</*non-const*/ key_type, mapped_type>
raw_hash_set(std::initializer_list<init_type> init, size_t bucket_count = 0,
const hasher& hash = hasher(), const key_equal& eq = key_equal(),
const allocator_type& alloc = allocator_type())
: raw_hash_set(init.begin(), init.end(), bucket_count, hash, eq, alloc) {}
// forwards to the iterator-pair form
template <class InputIter>
raw_hash_set(InputIter first, InputIter last, size_t bucket_count = 0,
const hasher& hash = hasher(), const key_equal& eq = key_equal(),
const allocator_type& alloc = allocator_type())
: raw_hash_set(SelectBucketCountForIterRange(first, last, bucket_count),
hash, eq, alloc) {
insert(first, last);
}
template <class InputIter>
size_t SelectBucketCountForIterRange(InputIter first, InputIter last,
size_t bucket_count) {
if (bucket_count != 0) {
return bucket_count;
}
using InputIterCategory =
typename std::iterator_traits<InputIter>::iterator_category;
if (std::is_base_of<std::random_access_iterator_tag,
InputIterCategory>::value) {
return GrowthToLowerboundCapacity(
static_cast<size_t>(std::distance(first, last)));
}
return 0;
}
// Given `growth`, "unapplies" the load factor to find how large the capacity
// should be to stay within the load factor.
//
// This might not be a valid capacity and `NormalizeCapacity()` should be
// called on this.
inline size_t GrowthToLowerboundCapacity(size_t growth) {
// `growth*8/7`
// kWidth is the number of slots per group
// with SSE2, kWidth == 16, so this branch is not taken
if (Group::kWidth == 8 && growth == 7) {
// x+(x-1)/7 does not work when x==7.
return 8;
}
// allow slightly more than growth: the load factor (7/8) applied in reverse
return growth + static_cast<size_t>((static_cast<int64_t>(growth) - 1) / 7);
}
ABSL_ATTRIBUTE_NOINLINE explicit raw_hash_set(
size_t bucket_count, const hasher& hash = hasher(),
const key_equal& eq = key_equal(),
const allocator_type& alloc = allocator_type())
: settings_(CommonFields::CreateDefault<SooEnabled()>(), hash, eq,
alloc) {
if (bucket_count > (SooEnabled() ? SooCapacity() : 0)) {
resize(NormalizeCapacity(bucket_count));
}
}
In the SOO case, our example yields bucket_count == 1, so construction needs no resize(), only the insert().
NOTE: the std::initializer_list<init_type> overload mentioned in the comments is the deferred construction described in the design notes. The concrete init_type is defined here and is roughly std::pair<Key, Value>; note that it lacks the const Key the standard library requires.
Insertion
insert
template <class InputIt>
void insert(InputIt first, InputIt last) {
for (; first != last; ++first) emplace(*first);
}
// This overload kicks in if we can deduce the key from args. This enables us
// to avoid constructing value_type if an entry with the same key already
// exists.
//
// For example:
//
// flat_hash_map<std::string, std::string> m = { {"abc", "def"} };
// // Creates no std::string copies and makes no heap allocations.
// m.emplace("abc", "xyz");
template <class... Args, typename std::enable_if<
IsDecomposable<Args...>::value, int>::type = 0>
std::pair<iterator, bool> emplace(Args&&... args)
ABSL_ATTRIBUTE_LIFETIME_BOUND {
// EmplaceDecomposable is just a functor type; constructing it has no side effects
return PolicyTraits::apply(EmplaceDecomposable{*this},
std::forward<Args>(args)...);
}
// This overload kicks in if we cannot deduce the key from args. It constructs
// value_type unconditionally and then either moves it into the table or
// destroys.
template <class... Args, typename std::enable_if<
!IsDecomposable<Args...>::value, int>::type = 0>
std::pair<iterator, bool> emplace(Args&&... args)
ABSL_ATTRIBUTE_LIFETIME_BOUND {
alignas(slot_type) unsigned char raw[sizeof(slot_type)];
slot_type* slot = to_slot(&raw);
construct(slot, std::forward<Args>(args)...);
const auto& elem = PolicyTraits::element(slot);
return PolicyTraits::apply(InsertSlot<true>{*this, std::move(*slot)}, elem);
}
This relies on PolicyTraits::apply and EmplaceDecomposable.
Policy
// Policy: a policy defines how to perform different operations on
// the slots of the hashtable (see hash_policy_traits.h for the full interface
// of policy).
template <class Policy, class Hash, class Eq, class Alloc>
class raw_hash_set {
using PolicyTraits = hash_policy_traits<Policy>;
// ...
};
template <class Policy, class Hash, class Eq, class Alloc>
class raw_hash_map : public raw_hash_set<Policy, Hash, Eq, Alloc> {
// ...
};
class ABSL_ATTRIBUTE_OWNER flat_hash_map
: public absl::container_internal::raw_hash_map<
// the concrete Policy goes here
absl::container_internal::FlatHashMapPolicy<K, V>, Hash, Eq,
Allocator> { /*...*/ };
Because Abseil ships several map/set implementations (for example the node variants, introduced to satisfy the standard library's pointer-stability requirements), a Policy is used to factor out the mechanisms that differ. It is also what allows map to inherit from set.
// Defines how slots are initialized/destroyed/moved.
template <class Policy, class = void>
struct hash_policy_traits : common_policy_traits<Policy> {
// Provides generalized access to the key for elements, both for elements in
// the table and for elements that have not yet been inserted (or even
// constructed). We would like an API that allows us to say: `key(args...)`
// but we cannot do that for all cases, so we use this more general API that
// can be used for many things, including the following:
//
// - Given an element in a table, get its key.
// - Given an element initializer, get its key.
// - Given `emplace()` arguments, get the element key.
//
// Implementations of this must adhere to a very strict technical
// specification around aliasing and consuming arguments:
//
// Let `value_type` be the result type of `element()` without ref- and
// cv-qualifiers. The first argument is a functor, the rest are constructor
// arguments for `value_type`. Returns `std::forward<F>(f)(k, xs...)`, where
// `k` is the element key, and `xs...` are the new constructor arguments for
// `value_type`. It's allowed for `k` to alias `xs...`, and for both to alias
// `ts...`. The key won't be touched once `xs...` are used to construct an
// element; `ts...` won't be touched at all, which allows `apply()` to consume
// any rvalues among them.
//
// If `value_type` is constructible from `Ts&&...`, `Policy::apply()` must not
// trigger a hard compile error unless it originates from `f`. In other words,
// `Policy::apply()` must be SFINAE-friendly. If `value_type` is not
// constructible from `Ts&&...`, either SFINAE or a hard compile error is OK.
//
// If `Ts...` is `[cv] value_type[&]` or `[cv] init_type[&]`,
// `Policy::apply()` must work. A compile error is not allowed, SFINAE or not.
template <class F, class... Ts, class P = Policy>
static auto apply(F&& f, Ts&&... ts)
-> decltype(P::apply(std::forward<F>(f), std::forward<Ts>(ts)...)) {
return P::apply(std::forward<F>(f), std::forward<Ts>(ts)...);
}
// ...
};
template <class K, class V>
struct FlatHashMapPolicy {
template <class F, class... Args>
static decltype(absl::container_internal::DecomposePair(
std::declval<F>(), std::declval<Args>()...))
apply(F&& f, Args&&... args) {
return absl::container_internal::DecomposePair(std::forward<F>(f),
std::forward<Args>(args)...);
}
// ...
};
// A helper function for implementing apply() in map policies.
template <class F, class... Args>
auto DecomposePair(F&& f, Args&&... args)
-> decltype(memory_internal::DecomposePairImpl(
std::forward<F>(f), PairArgs(std::forward<Args>(args)...))) {
return memory_internal::DecomposePairImpl(
std::forward<F>(f), PairArgs(std::forward<Args>(args)...));
}
// Given arguments of an std::pair's constructor, PairArgs() returns a pair of
// tuples with references to the passed arguments. The tuples contain
// constructor arguments for the first and the second elements of the pair.
//
// The following two snippets are equivalent.
//
// 1. std::pair<F, S> p(args...);
//
// 2. auto a = PairArgs(args...);
// std::pair<F, S> p(std::piecewise_construct,
// std::move(a.first), std::move(a.second));
inline std::pair<std::tuple<>, std::tuple<>> PairArgs() { return {}; }
template <class F, class S>
std::pair<std::tuple<F&&>, std::tuple<S&&>> PairArgs(F&& f, S&& s) {
return {std::piecewise_construct, std::forward_as_tuple(std::forward<F>(f)),
std::forward_as_tuple(std::forward<S>(s))};
}
template <class F, class S>
std::pair<std::tuple<const F&>, std::tuple<const S&>> PairArgs(
const std::pair<F, S>& p) {
return PairArgs(p.first, p.second);
}
template <class F, class S>
std::pair<std::tuple<F&&>, std::tuple<S&&>> PairArgs(std::pair<F, S>&& p) {
return PairArgs(std::forward<F>(p.first), std::forward<S>(p.second));
}
template <class F, class S>
auto PairArgs(std::piecewise_construct_t, F&& f, S&& s)
-> decltype(std::make_pair(memory_internal::TupleRef(std::forward<F>(f)),
memory_internal::TupleRef(std::forward<S>(s)))) {
return std::make_pair(memory_internal::TupleRef(std::forward<F>(f)),
memory_internal::TupleRef(std::forward<S>(s)));
}
template <class F, class K, class V>
decltype(std::declval<F>()(std::declval<const K&>(), std::piecewise_construct,
std::declval<std::tuple<K>>(), std::declval<V>()))
DecomposePairImpl(F&& f, std::pair<std::tuple<K>, V> p) {
const auto& key = std::get<0>(p.first);
return std::forward<F>(f)(key, std::piecewise_construct, std::move(p.first),
std::move(p.second));
}
The policy machinery reads as verbose, but ignoring the forwarding, apply(f, args...) is simply f(args...). Being a map policy, it additionally has to split the key from the remaining arguments in args... (recall the definition of init_type; the split must be done by hand), hence the pile of pair/tuple busywork. The upshot: the map specialization of apply aims for the effect of f(key, args...).
NOTE: I didn't study this part closely; halfway through I switched to a debugger and reverse-engineered the results.
Decomposable
struct EmplaceDecomposable {
template <class K, class... Args>
std::pair<iterator, bool> operator()(const K& key, Args&&... args) const {
auto res = s.find_or_prepare_insert(key);
if (res.second) {
s.emplace_at(res.first, std::forward<Args>(args)...);
}
return res;
}
raw_hash_set& s;
};
In the current context, the concrete operation passed to apply is EmplaceDecomposable: literally, find the key, then emplace_at.
slot
// Constructs the value in the space pointed by the iterator. This only works
// after an unsuccessful find_or_prepare_insert() and before any other
// modifications happen in the raw_hash_set.
//
// PRECONDITION: iter was returned from find_or_prepare_insert(k), where k is
// the key decomposed from `forward<Args>(args)...`, and the bool returned by
// find_or_prepare_insert(k) was true.
// POSTCONDITION: *m.iterator_at(i) == value_type(forward<Args>(args)...).
template <class... Args>
void emplace_at(iterator iter, Args&&... args) {
construct(iter.slot(), std::forward<Args>(args)...);
assert(PolicyTraits::apply(FindElement{*this}, *iter) == iter &&
"constructed value does not match the lookup key");
}
The lookup itself is fairly involved, so let's skip it for now. Suppose we have found a position (obtained an iterator); what follows is just construct(). That part is simple, but note the subtleties of the iterator and its slot() member function.
// a simplified view of the iterator
class iterator : private /* ... */ {
ctrl_t* control() const { return ctrl_; }
slot_type* slot() const { return slot_; }
// We use EmptyGroup() for default-constructed iterators so that they can
// be distinguished from end iterators, which have nullptr ctrl_.
ctrl_t* ctrl_ = EmptyGroup();
// To avoid uninitialized member warnings, put slot_ in an anonymous union.
// The member is not initialized on singleton and end iterators.
union {
slot_type* slot_;
};
};
The iterator uses a union trick to provide a legitimately uninitialized slot_ member; note its type alias slot_type*.
using slot_type = map_slot_type<K, V>;
// The internal storage type for key-value containers like flat_hash_map.
//
// It is convenient for the value_type of a flat_hash_map<K, V> to be
// pair<const K, V>; the "const K" prevents accidental modification of the key
// when dealing with the reference returned from find() and similar methods.
// However, this creates other problems; we want to be able to emplace(K, V)
// efficiently with move operations, and similarly be able to move a
// pair<K, V> in insert().
//
// The solution is this union, which aliases the const and non-const versions
// of the pair. This also allows flat_hash_map<const K, V> to work, even though
// that has the same efficiency issues with move in emplace() and insert() -
// but people do it anyway.
//
// If kMutableKeys is false, only the value member can be accessed.
//
// If kMutableKeys is true, key can be accessed through all slots while value
// and mutable_value must be accessed only via INITIALIZED slots. Slots are
// created and destroyed via mutable_value so that the key can be moved later.
//
// Accessing one of the union fields while the other is active is safe as
// long as they are layout-compatible, which is guaranteed by the definition of
// kMutableKeys. For C++11, the relevant section of the standard is
// https://timsong-cpp.github.io/cppwp/n3337/class.mem#19 (9.2.19)
template <class K, class V>
union map_slot_type {
map_slot_type() {}
~map_slot_type() = delete;
using value_type = std::pair<const K, V>;
using mutable_value_type =
std::pair<absl::remove_const_t<K>, absl::remove_const_t<V>>;
value_type value;
mutable_value_type mutable_value;
absl::remove_const_t<K> key;
};
slot_type is the optimization mentioned in the design notes: a mutable Key internally, presented as const Key externally. The comment asserts that accessing a union field other than the active one is safe here because the members are layout-compatible, so strict aliasing is not violated. I've long since forgotten those rules, so let's take its word for it.
construct
template <typename... Args>
inline void construct(slot_type* slot, Args&&... args) {
PolicyTraits::construct(&alloc_ref(), slot, std::forward<Args>(args)...);
}
// PRECONDITION: `slot` is UNINITIALIZED
// POSTCONDITION: `slot` is INITIALIZED
template <class Alloc, class... Args>
static void construct(Alloc* alloc, slot_type* slot, Args&&... args) {
Policy::construct(alloc, slot, std::forward<Args>(args)...);
}
template <class Allocator, class... Args>
static void construct(Allocator* alloc, slot_type* slot, Args&&... args) {
slot_policy::construct(alloc, slot, std::forward<Args>(args)...);
}
// If pair<const K, V> and pair<K, V> are layout-compatible, we can accept one
// or the other via slot_type. We are also free to access the key via
// slot_type::key in this case.
using kMutableKeys = memory_internal::IsLayoutCompatible<K, V>;
template <class Allocator, class... Args>
static void construct(Allocator* alloc, slot_type* slot, Args&&... args) {
emplace(slot);
if (kMutableKeys::value) {
// absl here is just a using declaration of std::... (the name survives for historical reasons)
absl::allocator_traits<Allocator>::construct(*alloc, &slot->mutable_value,
std::forward<Args>(args)...);
} else {
absl::allocator_traits<Allocator>::construct(*alloc, &slot->value,
std::forward<Args>(args)...);
}
}
// to cope with strict aliasing
static void emplace(slot_type* slot) {
// The construction of union doesn't do anything at runtime but it allows us
// to access its members without violating aliasing rules.
new (slot) slot_type;
}
In short, once the insert() flow has located iterator.slot(), the subsequent construct() is just a round of placement new.
prepare insert
// Attempts to find `key` in the table; if it isn't found, returns an iterator
// where the value can be inserted into, with the control byte already set to
// `key`'s H2. Returns a bool indicating whether an insertion can take place.
template <class K>
std::pair<iterator, bool> find_or_prepare_insert(const K& key) {
AssertOnFind(key);
// taken here
// is_soo() requires the current capacity() to be within the SOO range
if (is_soo()) return find_or_prepare_insert_soo(key);
// explained later
return find_or_prepare_insert_non_soo(key);
}
Back to the lookup we deferred: before the actual insertion we must look up the key to produce the corresponding iterator.
Two cases are distinguished: SOO and non-SOO. Note that this lookup is tailored for insertion; the regular lookup interface is find(). The next two sections cover this special lookup; to avoid ambiguity, I'll call it the prepare-insert operation.
Prepare insert (SOO)
template <class K>
std::pair<iterator, bool> find_or_prepare_insert_soo(const K& key) {
// first call on an empty container
if (empty()) {
const HashtablezInfoHandle infoz = try_sample_soo();
// always false
if (infoz.IsSampled()) {
resize_with_soo_infoz(infoz);
} else {
// marks the SOO slot as filled, size() = 1; empty() no longer holds afterwards
// note that the size_ member and size() differ; only the latter is the public interface
common().set_full_soo();
// 返回这里
return {soo_iterator(), true};
}
// second call, after one insertion: size() == 1
} else if (PolicyTraits::apply(EqualElement<K>{key, eq_ref()},
PolicyTraits::element(soo_slot()))) {
// a matching (already existing) key
return {soo_iterator(), false};
// no SOO match
} else {
// this is the prepare-insert phase: with no room left we must grow (no longer SOO afterwards)
// NextCapacity(n) grows as n = n * 2 + 1, giving 3
resize(NextCapacity(SooCapacity()));
}
// a context-specific placement operation
// the resize() above guarantees index == 1 is already occupied
// so only indices 0 and 2 are candidates
const size_t index =
PrepareInsertAfterSoo(hash_ref()(key), sizeof(slot_type), common());
return {iterator_at(index), true};
}
iterator soo_iterator() {
// generation_ptr is unused, i.e. nullptr
return {SooControl(), soo_slot(), common().generation_ptr()};
}
CommonFields& common() { return settings_.template get<0>(); }
I still can't figure out why Google is so attached to this pitifully small SOO capacity; it seems pointless to me.
The first call is the simple SOO case: the table is necessarily empty(), so we construct directly. soo_iterator just pulls the pointers out of common and returns them; from there we know the slot's address and can insert into it.
Likewise, if a second insertion matches the SOO slot, construction is also direct: since an SOO map's capacity never exceeds 1, pair.second only needs to signal that the key already exists. If that insertion does not match, the table must grow.
Growing the table
// Resizes table to the new capacity and move all elements to the new
// positions accordingly.
//
// Note that for better performance instead of
// find_first_non_full(common(), hash),
// HashSetResizeHelper::FindFirstNonFullAfterResize(
// common(), old_capacity, hash)
// can be called right after `resize`.
void resize(size_t new_capacity) {
raw_hash_set::resize_impl(common(), new_capacity, HashtablezInfoHandle{});
}
resize_impl
// Resizes set to the new capacity.
// It is a static function in order to use its pointer in GetPolicyFunctions.
ABSL_ATTRIBUTE_NOINLINE static void resize_impl(
CommonFields& common, size_t new_capacity,
HashtablezInfoHandle forced_infoz) {
raw_hash_set* set = reinterpret_cast<raw_hash_set*>(&common);
assert(IsValidCapacity(new_capacity));
assert(!set->fits_in_soo(new_capacity));
// SOO is enabled and capacity() is within the SOO range
const bool was_soo = set->is_soo();
// the SOO slot has held an element
const bool had_soo_slot = was_soo && !set->empty();
// compute the 7-bit H2 value; see the official design notes
// hash_of(slot)
// => PolicyTraits::apply(HashElement{hash_ref()}, PolicyTraits::element(slot))
// => HashElement{hash_ref()}(key, static_cast<std::pair<const K, V>&>(slot->value))
// => hash_ref()(key)
//
// H2(size_t hash): uint8_t
// => hash & 0x7F
// if nothing was stored before, the empty kEmpty (0b10000000) is returned
const ctrl_t soo_slot_h2 =
had_soo_slot ? static_cast<ctrl_t>(H2(set->hash_of(set->soo_slot())))
: ctrl_t::kEmpty;
HashSetResizeHelper resize_helper(common, was_soo, had_soo_slot,
forced_infoz);
// Initialize HashSetResizeHelper::old_heap_or_soo_. We can't do this in
// HashSetResizeHelper constructor because it can't transfer slots when
// transfer_uses_memcpy is false.
if (PolicyTraits::transfer_uses_memcpy() || !had_soo_slot) {
// initialize old_heap_or_soo with a verbatim copy, so common can grow and then migrate
resize_helper.old_heap_or_soo() = common.heap_or_soo();
} else {
// to_slot() converts uchar[] to map_slot_type*
// move the slot; a destroy step is required
set->transfer(set->to_slot(resize_helper.old_soo_data()),
set->soo_slot());
}
// now prepare to grow
common.set_capacity(new_capacity);
// Note that `InitializeSlots` does different number initialization steps
// depending on the values of `transfer_uses_memcpy` and capacities.
// Refer to the comment in `InitializeSlots` for more details.
const bool grow_single_group =
// performs the allocation and construction
// no destruction/deallocation happens here (see the end of this function), since common may still be in SOO form
resize_helper.InitializeSlots<CharAlloc, sizeof(slot_type),
PolicyTraits::transfer_uses_memcpy(),
SooEnabled(), alignof(slot_type)>(
common, CharAlloc(set->alloc_ref()), soo_slot_h2, sizeof(key_type),
sizeof(value_type));
// In the SooEnabled() case, capacity is never 0 so we don't check.
if (!SooEnabled() && resize_helper.old_capacity() == 0) {
// InitializeSlots did all the work including infoz().RecordRehash().
return;
}
assert(resize_helper.old_capacity() > 0);
// Nothing more to do in this case.
if (was_soo && !had_soo_slot) return;
slot_type* new_slots = set->slot_array();
if (grow_single_group) {
if (PolicyTraits::transfer_uses_memcpy()) {
// InitializeSlots did all the work.
return;
}
if (was_soo) {
// unlike the earlier transfer, this one goes from old_soo_data to new_slots
// only the SOO slot needs handling
set->transfer(new_slots + resize_helper.SooSlotIndex(),
to_slot(resize_helper.old_soo_data()));
return;
} else {
// We want GrowSizeIntoSingleGroup to be called here in order to make
// InitializeSlots not depend on PolicyTraits.
resize_helper.GrowSizeIntoSingleGroup<PolicyTraits>(common,
set->alloc_ref());
}
// !grow_single_group
} else {
// InitializeSlots prepares control bytes to correspond to empty table.
const auto insert_slot = [&](slot_type* slot) {
size_t hash = PolicyTraits::apply(HashElement{set->hash_ref()},
PolicyTraits::element(slot));
// this function and the target variable are covered later
// as the name suggests, it probes and returns position information
auto target = find_first_non_full(common, hash);
// as before: once the slot position (offset) is found, update the corresponding ctrl
SetCtrl(common, target.offset, H2(hash), sizeof(slot_type));
// and migrate the corresponding slot element
set->transfer(new_slots + target.offset, slot);
return target.probe_length;
};
if (was_soo) {
insert_slot(to_slot(resize_helper.old_soo_data()));
return;
} else {
auto* old_slots = static_cast<slot_type*>(resize_helper.old_slots());
size_t total_probe_length = 0;
for (size_t i = 0; i != resize_helper.old_capacity(); ++i) {
if (IsFull(resize_helper.old_ctrl()[i])) {
total_probe_length += insert_slot(old_slots + i);
}
}
common.infoz().RecordRehash(total_probe_length);
}
}
// the final deallocation work
resize_helper.DeallocateOld<alignof(slot_type)>(CharAlloc(set->alloc_ref()),
sizeof(slot_type));
}
The function above is dense with details; the inline comments give an overview, and the subsections below add more.
transfer
// whether the transfer may use memcpy depends on whether Policy::transfer() returns std::true_type
// which traces through:
// FlatHashMapPolicy::slot_policy::transfer()
// => container_internal::map_slot_policy<K, V>::transfer()
// => absl::is_trivially_relocatable<std::pair<const K, V>>
// in absl's implementation this falls back to std::is_trivially_copyable
if (PolicyTraits::transfer_uses_memcpy() || !had_soo_slot) {
// initialization of old_heap_or_soo
resize_helper.old_heap_or_soo() = common.heap_or_soo();
} else {
// https://github.com/abseil/abseil-cpp/issues/755#issuecomment-670136413
// transfer 使用 std::launder 的解释
// to_slot 从 uchar[] 转为 map_slot_type*
set->transfer(set->to_slot(resize_helper.old_soo_data()),
set->soo_slot());
}
在前面的 resize_impl() 代码中,转移操作区分了两个版本,条件 transfer_uses_memcpy() 实际取决于 absl::is_trivially_relocatable<std::pair<const K, V>>,也就是移动和析构能否使用 memcpy() 替换。
| Trivial relocation | Relocation with move constructor |
|-----------------------------------------------|------------------------------------------------|
| // allocate destination memory | // allocate destination memory |
| dest = ::operator new(sizeof(Type)); | dest = ::operator new(sizeof(Type)); |
| | |
| // copy bytes | // move construct |
| memcpy(dest, source, sizeof(Type)); | ::new(dest) Type(std::move(*source)); |
| | |
| // deallocate source | // destruct source |
| ::operator delete(source); | source->~Type(); |
| | |
| | // deallocate source |
| | ::operator delete(source); |
这是一种优化手段,具体可以看下提案 P1144,更加具体的动机示例可以看这份演讲稿或者上表总结。目前使用这个条件来复制 common.heap_or_soo(),也就是 backing array。
inline void transfer(slot_type* to, slot_type* from) {
// 做了一些统计处理,不考虑的话就是直接调用 lambda
common().RunWithReentrancyGuard(
[&] { PolicyTraits::transfer(&alloc_ref(), to, from); });
}
template <class Allocator>
static auto transfer(Allocator* alloc, slot_type* new_slot,
slot_type* old_slot) {
return slot_policy::transfer(alloc, new_slot, old_slot);
}
template <class Allocator>
static auto transfer(Allocator* alloc, slot_type* new_slot,
slot_type* old_slot) {
// 前面提过了,transfer 取决于 relocatable
auto is_relocatable =
typename absl::is_trivially_relocatable<value_type>::type();
// 为了处理 union 激活,可理解为空实现
emplace(new_slot);
#if defined(__cpp_lib_launder) && __cpp_lib_launder >= 201606
if (is_relocatable) {
// value 使用了类型双关,为了避免问题直接拿 std::launder() 洗掉优化
std::memcpy(static_cast<void*>(std::launder(&new_slot->value)),
static_cast<const void*>(&old_slot->value),
sizeof(value_type));
return is_relocatable;
}
#endif
// 如果可以使用 mutable key
if (kMutableKeys::value) {
absl::allocator_traits<Allocator>::construct(
*alloc, &new_slot->mutable_value, std::move(old_slot->mutable_value));
} else {
absl::allocator_traits<Allocator>::construct(*alloc, &new_slot->value,
std::move(old_slot->value));
}
destroy(alloc, old_slot);
return is_relocatable;
}
如果具备 trivially relocatable 条件,只需一次 std::memcpy()。如果不具备,那就需要显式地迁移 slot,也就是需要分配器策略去处理 construct() 和 destroy()。
InitializeSlots
// Allocates a backing array for the hashtable.
// Reads `capacity` and updates all other fields based on the result of
// the allocation.
//
// It also may do the following actions:
// 1. initialize control bytes
// 2. initialize slots
// 3. deallocate old slots.
//
// We are bundling a lot of functionality
// in one ABSL_ATTRIBUTE_NOINLINE function in order to minimize binary code
// duplication in raw_hash_set<>::resize.
//
// `c.capacity()` must be nonzero.
// POSTCONDITIONS:
// 1. CommonFields is initialized.
//
// if IsGrowingIntoSingleGroupApplicable && TransferUsesMemcpy
// Both control bytes and slots are fully initialized.
// old_slots are deallocated.
// infoz.RecordRehash is called.
//
// if IsGrowingIntoSingleGroupApplicable && !TransferUsesMemcpy
// Control bytes are fully initialized.
// infoz.RecordRehash is called.
// GrowSizeIntoSingleGroup must be called to finish slots initialization.
//
// if !IsGrowingIntoSingleGroupApplicable
// Control bytes are initialized to empty table via ResetCtrl.
// raw_hash_set<>::resize must insert elements regularly.
// infoz.RecordRehash is called if old_capacity == 0.
//
// Returns IsGrowingIntoSingleGroupApplicable result to avoid recomputation.
// 这里注释已经说明了大概的思路,就是分配和构造新的 ctrl + slots,然后销毁旧副本
// 也提到一些琐碎的实践,比如标记 noinline 减小二进制体积
// 其他的复杂度就是区分不同的上下文进行微操优化
template <typename Alloc, size_t SizeOfSlot, bool TransferUsesMemcpy,
bool SooEnabled, size_t AlignOfSlot>
ABSL_ATTRIBUTE_NOINLINE bool InitializeSlots(CommonFields& c, Alloc alloc,
ctrl_t soo_slot_h2,
size_t key_size,
size_t value_size) {
assert(c.capacity());
// 空实现
HashtablezInfoHandle infoz =
ShouldSampleHashtablezInfo<Alloc>()
? SampleHashtablezInfo<SooEnabled>(SizeOfSlot, key_size, value_size,
old_capacity_, was_soo_,
forced_infoz_, c)
: HashtablezInfoHandle{};
const bool has_infoz = infoz.IsSampled();
RawHashSetLayout layout(c.capacity(), AlignOfSlot, has_infoz);
// 接近于 alloc::allocate(),只是为了处理对齐
char* mem = static_cast<char*>(Allocate<BackingArrayAlignment(AlignOfSlot)>(
&alloc, layout.alloc_size(SizeOfSlot)));
const GenerationType old_generation = c.generation();
c.set_generation_ptr(
reinterpret_cast<GenerationType*>(mem + layout.generation_offset()));
c.set_generation(NextGeneration(old_generation));
// 可以看出 control 和 slot 共享同一块分配的 mem 内存地址
c.set_control(reinterpret_cast<ctrl_t*>(mem + layout.control_offset()));
// 从下面 RawHashSetLayout 可以推导出
// mem = [GrowInfo, ctrl, [padding,] slots]
// 其中:
// * ctrl 所占字节数比 capacity 数值要大一点(capacity + Group::kWidth)
// 因为这是要包含 cloned bytes
// * GrowInfo 需要占用一个 size_t 大小,尚未初始化
c.set_slots(mem + layout.slot_offset());
// 处理 GrowInfo,记录数值 capacity()*7/8 - size()
// 这里 7/8 是 Swiss Table 的最大负载因子
// NOTE: 还是有点小心思,最高位是特殊删除标记
ResetGrowthLeft(c);
const bool grow_single_group =
// 两个条件:
// 1. capacity 不超过一个 group(layout.capacity <= Group::kWidth;)
// 2. old capacity 小于 capacity,注释写道:为了更快拷贝 8 字节
// 推测条件 2 的意思是 ctrl 处理 cloned bytes 不需要取模了
// 以及扩容是至少倍增的,它需要一半的空间去 shuffle 周转
IsGrowingIntoSingleGroupApplicable(old_capacity_, layout.capacity());
// 这里面微操优化很多
//
// 如果之前是 SOO,就走第一条分支
// 再次插入走第二条分支,
// 超出 single group 单群小表限制后再到第三条分支
if (SooEnabled && was_soo_ && grow_single_group) {
// 见下面辅助实现的注释
InitControlBytesAfterSoo(c.control(), soo_slot_h2, layout.capacity());
// 这个条件见 resize_impl 的注释
if (TransferUsesMemcpy && had_soo_slot_) {
// 见下面辅助实现的注释
TransferSlotAfterSoo(c, SizeOfSlot);
}
// SooEnabled implies that old_capacity_ != 0.
// 运行时非 SOO 的情况(!was_soo_)
//
// 上面那条官方注释看着不太对劲,又有点道理
// 虽然 SooEnabled 只与编译期 slot_type 类型别名有关
// 但是确实能至少空出大小为 1 的 old_capacity_(HeapOrSoo)
} else if ((SooEnabled || old_capacity_ != 0) && grow_single_group) {
if (TransferUsesMemcpy) {
// 处理 transferable objects
// 这一行相当于同时做了:
// * GrowIntoSingleGroupShuffleControlBytes(c.control(), c.capacity());
// * GrowIntoSingleGroupShuffleTransferableSlots(c.slot_array(), slot_size);
// 其实我没太看懂,见 else 的注释
// 也可以见 prepare insert non-SOO 中的 case 1,使得查找甚至不需要判断是否为空
GrowSizeIntoSingleGroupTransferable(c, SizeOfSlot);
DeallocateOld<AlignOfSlot>(alloc, SizeOfSlot);
} else {
// 看了一眼注释很复杂,是近期做的改动,把 control bytes 打散了
// 不知道是出于什么目的,commit 没写
// https://github.com/abseil/abseil-cpp/commit/0ef87fa0c1d9cff4c231a2a3af854f0b5edfa92b
// 字面理解是「SIMD 的存在使得一个 group 内的元素顺序并不重要」
// 确实有道理,但是动机是为了啥?
//
// UPDATE: 噢找到了,.h/L1680 说道:为了不让你猜到数据是啥,提高安全性
GrowIntoSingleGroupShuffleControlBytes(c.control(), layout.capacity());
}
// 其它条件,比如 !grow_single_group
// 初始化 ctrl 为 kEmpty 和 kSentinel
} else {
ResetCtrl(c, SizeOfSlot);
}
c.set_has_infoz(has_infoz);
if (has_infoz) {
infoz.RecordStorageChanged(c.size(), layout.capacity());
if ((SooEnabled && was_soo_) || grow_single_group || old_capacity_ == 0) {
infoz.RecordRehash(0);
}
c.set_infoz(infoz);
}
return grow_single_group;
}
///////////////////////////////////////////////////////////////// 下面的都是辅助实现
// Helper class for computing offsets and allocation size of hash set fields.
// 计算 backing array 各个成员的起始偏移
class RawHashSetLayout {
public:
explicit RawHashSetLayout(size_t capacity, size_t slot_align, bool has_infoz)
: capacity_(capacity),
// ControlOffset() 返回 sizeof(HashtablezInfoHandle)? + sizeof(GrowthInfo)
// 这里只考虑后者 sizeof(GrowthInfo)
control_offset_(ControlOffset(has_infoz)),
// NumControlBytes(capacity) 返回 capacity + 1 + NumClonedBytes()
// NumClonedBytes() 等于 Group::kWidth - 1
generation_offset_(control_offset_ + NumControlBytes(capacity)),
slot_offset_(
// NumGenerationBytes() 返回 0
// 可能在前面需要留下 padding
(generation_offset_ + NumGenerationBytes() + slot_align - 1) &
(~slot_align + 1)) {
assert(IsValidCapacity(capacity));
}
// Returns the capacity of a table.
size_t capacity() const { return capacity_; }
// Returns precomputed offset from the start of the backing allocation of
// control.
size_t control_offset() const { return control_offset_; }
// Given the capacity of a table, computes the offset (from the start of the
// backing allocation) of the generation counter (if it exists).
size_t generation_offset() const { return generation_offset_; }
// Given the capacity of a table, computes the offset (from the start of the
// backing allocation) at which the slots begin.
size_t slot_offset() const { return slot_offset_; }
// Given the capacity of a table, computes the total size of the backing
// array.
size_t alloc_size(size_t slot_size) const {
return slot_offset_ + capacity_ * slot_size;
}
private:
size_t capacity_;
size_t control_offset_;
size_t generation_offset_;
size_t slot_offset_;
};
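RawHashSetLayout 里 slot_offset_ 的对齐写法值得单独验证一下。注意 (~slot_align + 1) 就是 -slot_align,在 align 为 2 的幂时等价于 ~(align - 1),于是整个表达式就是常见的向上对齐:

```cpp
#include <cstddef>

// 示意:slot_offset_ 使用的向上取整对齐技巧
inline size_t AlignUp(size_t n, size_t align) {
  // (~align + 1) == -align == ~(align - 1),要求 align 为 2 的幂
  return (n + align - 1) & (~align + 1);
}
```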
// Returns the number of "cloned control bytes".
//
// This is the number of control bytes that are present both at the beginning
// of the control byte array and at the end, such that we can create a
// `Group::kWidth`-width probe window starting from any control byte.
constexpr size_t NumClonedBytes() { return Group::kWidth - 1; }
// Allocates at least n bytes aligned to the specified alignment.
// Alignment must be a power of 2. It must be positive.
//
// Note that many allocators don't honor alignment requirements above certain
// threshold (usually either alignof(std::max_align_t) or alignof(void*)).
// Allocate() doesn't apply alignment corrections. If the underlying allocator
// returns insufficiently alignment pointer, that's what you are going to get.
template <size_t Alignment, class Alloc>
void* Allocate(Alloc* alloc, size_t n) {
static_assert(Alignment > 0, "");
assert(n && "n must be positive");
using M = AlignedType<Alignment>;
using A = typename absl::allocator_traits<Alloc>::template rebind_alloc<M>;
using AT = typename absl::allocator_traits<Alloc>::template rebind_traits<M>;
// On macOS, "mem_alloc" is a #define with one argument defined in
// rpc/types.h, so we can't name the variable "mem_alloc" and initialize it
// with the "foo(bar)" syntax.
A my_mem_alloc(*alloc);
void* p = AT::allocate(my_mem_alloc, (n + sizeof(M) - 1) / sizeof(M));
assert(reinterpret_cast<uintptr_t>(p) % Alignment == 0 &&
"allocator does not respect alignment");
return p;
}
inline void ResetGrowthLeft(CommonFields& common) {
// 初始化 grow_info 唯一的成员,其数值为传入的参数
common.growth_info().InitGrowthLeftNoDeleted(
CapacityToGrowth(common.capacity()) - common.size());
}
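GrowthInfo 的「最高位删除标记」可以按注释语义还原成一个 sketch(非 absl 原实现,MarkHasDeleted 等名字是假设的):把余量和标记打包进一个 size_t,判断「无删除且有余量」只需一次有符号比较。

```cpp
#include <cstddef>
#include <type_traits>

// 示意:一个 size_t 同时编码剩余可插入数与最高位的「存在 kDeleted」标记
class GrowthInfoSketch {
  static constexpr size_t kDeletedBit = ~(~size_t{} >> 1);  // 最高位
  size_t rep_ = 0;

 public:
  void InitGrowthLeftNoDeleted(size_t growth_left) { rep_ = growth_left; }
  void OverwriteEmptyAsFull() { --rep_; }         // 占用一个 empty,余量减一
  void MarkHasDeleted() { rep_ |= kDeletedBit; }  // 假设的接口:erase 时打标记
  // 无删除标记且余量大于 0:当作有符号数判断 > 0,一次比较搞定
  bool HasNoDeletedAndGrowthLeft() const {
    return static_cast<std::make_signed_t<size_t>>(rep_) > 0;
  }
  size_t growth_left() const { return rep_ & ~kDeletedBit; }
};
```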
// General notes on capacity/growth methods below:
// - We use 7/8th as maximum load factor. For 16-wide groups, that gives an
// average of two empty slots per group.
// - For (capacity+1) >= Group::kWidth, growth is 7/8*capacity.
// - For (capacity+1) < Group::kWidth, growth == capacity. In this case, we
// never need to probe (the whole table fits in one group) so we don't need a
// load factor less than 1.
// Given `capacity`, applies the load factor; i.e., it returns the maximum
// number of values we should put into the table before a resizing rehash.
// 返回 7/8 的 capacity。这里 7/8 是设计好的负载因子
inline size_t CapacityToGrowth(size_t capacity) {
assert(IsValidCapacity(capacity));
// `capacity*7/8`
if (Group::kWidth == 8 && capacity == 7) {
// x-x/8 does not work when x==7.
return 6;
}
return capacity - capacity / 8;
}
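用几个具体数值验证这个负载因子公式(假设 Group::kWidth == 16,不触发 capacity == 7 的特例;capacity 恒为 2^m - 1):

```cpp
#include <cstddef>

// 示意:capacity - capacity/8 等价于 ceil(capacity * 7/8)
inline size_t CapacityToGrowthSketch(size_t capacity) {
  return capacity - capacity / 8;
}
```

例如 capacity = 15 时可容纳 14 个元素(7/8 * 15 = 13.125,向上取整)。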
///////////////////////////////////////////////////////////////// 不同的上下文
// If the table was SOO, initializes new control bytes. `h2` is the control
// byte corresponding to the full slot. Must be called only if
// IsGrowingIntoSingleGroupApplicable returned true.
// Requires: `had_soo_slot_ || h2 == ctrl_t::kEmpty`.
void HashSetResizeHelper::InitControlBytesAfterSoo(ctrl_t* new_ctrl, ctrl_t h2,
size_t new_capacity) {
assert(is_single_group(new_capacity));
// ctrl 全部初始化为 kEmpty
std::memset(new_ctrl, static_cast<int8_t>(ctrl_t::kEmpty),
NumControlBytes(new_capacity));
assert(HashSetResizeHelper::SooSlotIndex() == 1);
// This allows us to avoid branching on had_soo_slot_.
assert(had_soo_slot_ || h2 == ctrl_t::kEmpty);
// 固定 index=1 设为原来的 H2
// 后面那个是 cloned byte
new_ctrl[1] = new_ctrl[new_capacity + 2] = h2;
// 哨兵字节
new_ctrl[new_capacity] = ctrl_t::kSentinel;
}
void HashSetResizeHelper::TransferSlotAfterSoo(CommonFields& c,
size_t slot_size) {
assert(was_soo_);
assert(had_soo_slot_);
assert(is_single_group(c.capacity()));
// 对应下标的 slot 初始化,注意 SooSlotIndex() 固定为 1
std::memcpy(SlotAddress(c.slot_array(), SooSlotIndex(), slot_size),
old_soo_data(), slot_size);
// 配合 ASAN 算法,对 non-full slot 投毒,略
PoisonSingleGroupEmptySlots(c, slot_size);
}
初始化会通过 allocator 分配新的 backing array,具体内存布局为 [GrowInfo, ctrl, [padding,] slots],这四个「成员」共用一次 allocate 就能完成分配。其中:
- ctrl 所占字节数比 capacity 数值要大一点(capacity + Group::kWidth),因为要包含 cloned bytes。通常 SSE2 版本的 kWidth 设为 16。
- GrowInfo 需要占用一个 size_t 大小,简单来说就是提前算好下次扩容前还能容纳几个元素,这个值已经考虑了负载因子。注意最高位是特殊的删除标记(topmost bit 标记是否存在 deleted slots)。
- slots 所占字节数为 capacity * sizeof(slot_type),但是考虑对齐的话可能还会在前面多出 padding 区域。
分配后会进行「构造」:ctrl[capacity] 设为 kSentinel,ctrl[1] 和对应的 cloned byte 都设为迁移元素的 H2 哈希值,其他区域均为 kEmpty。注意这里下标 1 是固定的,因此原有的(唯一)元素会迁移到 slot 1。
NOTE: H2 是完整哈希值的低 7 位,面向 ctrl。剩余的高 57 位则是 H1,后面会看到。
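H1/H2 的拆分本身很简单,可以写成一个不含加盐的 sketch:H2 取低 7 位写进 ctrl 字节,H1 取高 57 位用于选择探测起点,两者拼回去就是原哈希值。

```cpp
#include <cstdint>

// H1/H2 拆分示意(不含 per-table salt)
inline uint64_t H1NoSalt(uint64_t hash) { return hash >> 7; }
inline uint8_t  H2Bits(uint64_t hash)   { return hash & 0x7F; }
```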
prepare insert after SOO
size_t PrepareInsertAfterSoo(size_t hash, size_t slot_size,
CommonFields& common) {
assert(common.capacity() == NextCapacity(SooCapacity()));
// After resize from capacity 1 to 3, we always have exactly the slot with
// index 1 occupied, so we need to insert either at index 0 or index 2.
assert(HashSetResizeHelper::SooSlotIndex() == 1);
// 增加 size() 一个单位
PrepareInsertCommon(common);
// H1(hash, ctrl)
// => (hash >> 7) ^ (reinterpret_cast<uintptr_t>(ctrl) >> 12)
// 一些小优化,H1 还会对 ctrl 加盐(使得扩容时改动哈希值的分布),2^12 刚好是内存页大小
// 注意这里是 H1() & 2,所以下标要么是 0,要么是 2
const size_t offset = H1(hash, common.control()) & 2;
// 减少 growth_info 一个单位
common.growth_info().OverwriteEmptyAsFull();
// 等价于 ctrl[offset] = ctrl[cloned] = H2(hash)
SetCtrlInSingleGroupTable(common, offset, H2(hash), slot_size);
common.infoz().RecordInsert(hash, /*distance_from_desired=*/0);
// 返回下标,给迭代器用
return offset;
}
///////////////////////////////////////////////////////////////// 一些辅助函数
// 整个函数等价于 size() += 1
inline void PrepareInsertCommon(CommonFields& common) {
common.increment_size();
common.maybe_increment_generation_on_insert();
}
// Extracts the H1 portion of a hash: 57 bits mixed with a per-table salt.
inline size_t H1(size_t hash, const ctrl_t* ctrl) {
return (hash >> 7) ^ PerTableSalt(ctrl);
}
// Returns a per-table, hash salt, which changes on resize. This gets mixed into
// H1 to randomize iteration order per-table.
//
// The seed consists of the ctrl_ pointer, which adds enough entropy to ensure
// non-determinism of iteration order in most cases.
inline size_t PerTableSalt(const ctrl_t* ctrl) {
// The low bits of the pointer have little or no entropy because of
// alignment. We shift the pointer to try to use higher entropy bits. A
// good number seems to be 12 bits, because that aligns with page size.
return reinterpret_cast<uintptr_t>(ctrl) >> 12;
}
// Like SetCtrl, but in a single group table, we can save some operations when
// setting the cloned control byte.
inline void SetCtrlInSingleGroupTable(const CommonFields& c, size_t i, ctrl_t h,
size_t slot_size) {
assert(is_single_group(c.capacity()));
DoSanitizeOnSetCtrl(c, i, h, slot_size);
ctrl_t* ctrl = c.control();
ctrl[i] = h;
ctrl[i + c.capacity() + 1] = h;
}
扩容执行后,还需要为迭代器定位到一个具体的下标。这里的一点小心思是针对特定上下文的优化:因为明确扩容后的下标 1 是已被占位的,所以只需通过 H1 从 0 和 2 当中定位即可。
另一点小心思是 H1(hash = H(key)) = (hash >> 7) ^ salt(ctrl)。也就是说 H1 还会对 ctrl 加盐,使得扩容时(因 ctrl 指针变动而)重平衡哈希值的分布。盐值取 uintptr_t(ctrl) >> 12,而 \(2^{12}\) 正是一个内存页的大小,尽量避免分配器低位对齐造成熵不足的影响。
预插入(non-SOO)
template <class K>
std::pair<iterator, bool> find_or_prepare_insert_non_soo(const K& key) {
assert(!is_soo());
// 因为下面要进行计算哈希,这里要异步预取堆上 ctrl 到缓存行
prefetch_heap_block();
auto hash = hash_ref()(key);
// 生成一个 probe seq,用于后续线性探测
auto seq = probe(common(), hash);
const ctrl_t* ctrl = control();
while (true) {
// 使用 SIMD 加速载入部分 ctrl 到 Group
Group g{ctrl + seq.offset()};
// 如果批量命中,再往块中查找
for (uint32_t i : g.Match(H2(hash))) {
if (ABSL_PREDICT_TRUE(PolicyTraits::apply(
EqualElement<K>{key, eq_ref()},
PolicyTraits::element(slot_array() + seq.offset(i)))))
// 找到就可以收工
return {iterator_at(seq.offset(i)), false};
}
// 如果这里有空位
auto mask_empty = g.MaskEmpty();
if (ABSL_PREDICT_TRUE(mask_empty)) {
size_t target = seq.offset(
// 等价于 uint32_t 类型的 mask_empty.LowestBitSet(),其他参数为调试用途
GetInsertionOffset(mask_empty, capacity(), hash, control()));
return {iterator_at(PrepareInsertNonSoo(common(), hash,
// FindInfo 等价于 {offset, probe_length}
FindInfo{target, seq.index()},
// 提供一堆 policy 函数接口
GetPolicyFunctions())),
true};
}
seq.next();
// 肯定有容身之处!
assert(seq.index() <= capacity() && "full table!");
}
}
prefetch
// Prefetch the heap-allocated memory region to resolve potential TLB and
// cache misses. This is intended to overlap with execution of calculating the
// hash for a key.
// 异步预取非 SOO 的 ctrl bytes
void prefetch_heap_block() const {
assert(!is_soo());
#if ABSL_HAVE_BUILTIN(__builtin_prefetch) || defined(__GNUC__)
__builtin_prefetch(control(), 0, 1);
#endif
}
似乎有点用的小心思。因为接下来要计算哈希,不如顺手发出把 ctrl 异步载入缓存行的指令,让预取与哈希计算重叠。这里的 ctrl 数组可能很长,一条预取指令只覆盖开头的一个缓存行,未必命中最终探测到的位置,不过也算尽力了。
probe
// Begins a probing operation on `common.control`, using `hash`.
inline probe_seq<Group::kWidth> probe(const ctrl_t* ctrl, const size_t capacity,
size_t hash) {
return probe_seq<Group::kWidth>(H1(hash, ctrl), capacity);
}
inline probe_seq<Group::kWidth> probe(const CommonFields& common, size_t hash) {
return probe(common.control(), common.capacity(), hash);
}
// The state for a probe sequence.
//
// Currently, the sequence is a triangular progression of the form
//
// p(i) := Width * (i^2 + i)/2 + hash (mod mask + 1)
//
// The use of `Width` ensures that each probe step does not overlap groups;
// the sequence effectively outputs the addresses of *groups* (although not
// necessarily aligned to any boundary). The `Group` machinery allows us
// to check an entire group with minimal branching.
//
// Wrapping around at `mask + 1` is important, but not for the obvious reason.
// As described above, the first few entries of the control byte array
// are mirrored at the end of the array, which `Group` will find and use
// for selecting candidates. However, when those candidates' slots are
// actually inspected, there are no corresponding slots for the cloned bytes,
// so we need to make sure we've treated those offsets as "wrapping around".
//
// It turns out that this probe sequence visits every group exactly once if the
// number of groups is a power of two, since (i^2+i)/2 is a bijection in
// Z/(2^m). See https://en.wikipedia.org/wiki/Quadratic_probing
// 有两个要点:
// * 环绕到最后前因为有 cloned bytes,所以 SIMD 依然可以找对人
// * 探测过程既不会重复,也不会缺漏
template <size_t Width>
class probe_seq {
public:
// Creates a new probe sequence using `hash` as the initial value of the
// sequence and `mask` (usually the capacity of the table) as the mask to
// apply to each value in the progression.
probe_seq(size_t hash, size_t mask) {
assert(((mask + 1) & mask) == 0 && "not a mask");
mask_ = mask;
// 初始为 hash
offset_ = hash & mask_;
}
// The offset within the table, i.e., the value `p(i)` above.
size_t offset() const { return offset_; }
size_t offset(size_t i) const { return (offset_ + i) & mask_; }
void next() {
// index 序列为 0+{w, 2w, 3w, 4w, 5w, ...}
// offset 序列为 H+{w, 3w, 6w, 10w, 15w, ...}
// 总之就是 hash 加上 index 的前缀和(别忘了取模)
// 也就是注释说的 triangular progression
index_ += Width;
offset_ += index_;
offset_ &= mask_;
}
// 0-based probe index, a multiple of `Width`.
size_t index() const { return index_; }
private:
size_t mask_;
size_t offset_;
size_t index_ = 0;
};
probe() 是延迟计算的,只是返回一个表达式对象。
Group
#ifdef ABSL_INTERNAL_HAVE_SSE2
using Group = GroupSse2Impl;
using GroupFullEmptyOrDeleted = GroupSse2Impl;
#elif /* ... */
#endif
// Group 实现类
struct GroupSse2Impl {
static constexpr size_t kWidth = 16; // the number of slots per group
explicit GroupSse2Impl(const ctrl_t* pos) {
// 载入 16 个 ctrl byte
ctrl = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pos));
}
// Returns a bitmask representing the positions of slots that match hash.
BitMask<uint16_t, kWidth> Match(h2_t hash) const {
auto match = _mm_set1_epi8(static_cast<char>(hash));
return BitMask<uint16_t, kWidth>(
// _mm_cmpeq_epi8 字节相等时 match 设为 0xff,不相等设为 0x00
// _mm_movemask_epi8 压缩向量的每个最高位布尔值为标量的低 16 位
static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(match, ctrl))));
}
// Returns a bitmask representing the positions of empty slots.
NonIterableBitMask<uint16_t, kWidth> MaskEmpty() const {
// SSSE3 要指定 -march,默认编译器不提供支持
#ifdef ABSL_INTERNAL_HAVE_SSSE3
// This only works because ctrl_t::kEmpty is -128.
// 这是精心设计过的数值
// 假设 r = _mm_sign_epi8(a, b),那么 r[i] = b[i] < 0 ? -a[i] : a[i];
// 虽然 i8 是不存在 128 的数值,但是一个数值 x 的负数可以解释为 ~x + 1
// 那么 -128 (0b 1000 0000) 的环绕仍是 -128
// 所以只有 kEmpty 最高位保持 1,然后通过 _mm_movemask_epi8 筛选出来
return NonIterableBitMask<uint16_t, kWidth>(
static_cast<uint16_t>(_mm_movemask_epi8(_mm_sign_epi8(ctrl, ctrl))));
#else
// 这种写法比较直白了,但是要三条向量指令
auto match = _mm_set1_epi8(static_cast<char>(ctrl_t::kEmpty));
return NonIterableBitMask<uint16_t, kWidth>(
static_cast<uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(match, ctrl))));
#endif
}
// ...
__m128i ctrl;
};
// An abstract bitmask, such as that emitted by a SIMD instruction.
//
// Specifically, this type implements a simple bitset whose representation is
// controlled by `SignificantBits` and `Shift`. `SignificantBits` is the number
// of abstract bits in the bitset, while `Shift` is the log-base-two of the
// width of an abstract bit in the representation.
// This mask provides operations for any number of real bits set in an abstract
// bit. To add iteration on top of that, implementation must guarantee no more
// than the most significant real bit is set in a set abstract bit.
template <class T, int SignificantBits, int Shift = 0>
class NonIterableBitMask {
public:
explicit NonIterableBitMask(T mask) : mask_(mask) {}
explicit operator bool() const { return this->mask_ != 0; }
// ...
T mask_;
};
Group 是以 SIMD 宽度来设计的。目前会用到的 SIMD 操作有 Match() 和 MaskEmpty(),都写好注释了。这也解释了 ctrl_t::kEmpty 必须为 -128 的原因。
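为便于理解,可以写出这两个操作的标量等价实现(非 absl 代码):Match 逐字节比较 h2 并压缩为 16 位掩码;MaskEmpty 则模拟 _mm_sign_epi8(ctrl, ctrl) 后取符号位,利用「只有 -128 经 x < 0 ? -x : x 变换后符号位仍为 1」的性质。

```cpp
#include <cstdint>

// 标量版 Match:等价于 _mm_cmpeq_epi8 + _mm_movemask_epi8
inline uint16_t MatchScalar(const int8_t* ctrl, int8_t h2) {
  uint16_t mask = 0;
  for (int i = 0; i < 16; ++i)
    if (ctrl[i] == h2) mask |= static_cast<uint16_t>(1u << i);
  return mask;
}

// 标量版 MaskEmpty:只有 kEmpty(-128) 取负后符号位仍为 1
inline uint16_t MaskEmptyScalar(const int8_t* ctrl) {
  uint16_t mask = 0;
  for (int i = 0; i < 16; ++i) {
    uint8_t v = static_cast<uint8_t>(ctrl[i] < 0 ? -ctrl[i] : ctrl[i]);
    if (v & 0x80) mask |= static_cast<uint16_t>(1u << i);
  }
  return mask;
}
```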
ctrl_t
// The values here are selected for maximum performance. See the static asserts
// below for details.
// A `ctrl_t` is a single control byte, which can have one of four
// states: empty, deleted, full (which has an associated seven-bit h2_t value)
// and the sentinel. They have the following bit patterns:
//
// empty: 1 0 0 0 0 0 0 0
// deleted: 1 1 1 1 1 1 1 0
// full: 0 h h h h h h h // h represents the hash bits.
// sentinel: 1 1 1 1 1 1 1 1
//
// These values are specifically tuned for SSE-flavored SIMD.
// The static_asserts below detail the source of these choices.
//
// We use an enum class so that when strict aliasing is enabled, the compiler
// knows ctrl_t doesn't alias other types.
enum class ctrl_t : int8_t {
kEmpty = -128, // 0b10000000
kDeleted = -2, // 0b11111110
kSentinel = -1, // 0b11111111
};
static_assert(
(static_cast<int8_t>(ctrl_t::kEmpty) &
static_cast<int8_t>(ctrl_t::kDeleted) &
static_cast<int8_t>(ctrl_t::kSentinel) & 0x80) != 0,
// 非哈希值(H2)都会固定符号位为 1。同时这也是 H2 取 7 bit 的原因,因为要丢弃符号位
// 注:谷歌之前说过(忘记来源了 orz),7 bit 只是一种实现既定的设计,后面可能会改
"Special markers need to have the MSB to make checking for them efficient");
static_assert(
ctrl_t::kEmpty < ctrl_t::kSentinel && ctrl_t::kDeleted < ctrl_t::kSentinel,
// 保证 kEmpty 和 kDeleted 均小于 kSentinel
// 有些场景不需要严格区分空还是已删除,只需对比一次 kSentinel 数值就能完成判断
// 这也是 IsEmptyOrDeleted() 的实现
"ctrl_t::kEmpty and ctrl_t::kDeleted must be smaller than "
"ctrl_t::kSentinel to make the SIMD test of IsEmptyOrDeleted() efficient");
static_assert(
ctrl_t::kSentinel == static_cast<ctrl_t>(-1),
// 意图:pcmpeqd xmm, xmm 用同一寄存器自比较恒得 all-one,即 -1 向量
// 因此 kSentinel 的向量常量可直接在寄存器里生成,无需从内存载入
"ctrl_t::kSentinel must be -1 to elide loading it from memory into SIMD "
"registers (pcmpeqd xmm, xmm)");
static_assert(ctrl_t::kEmpty == static_cast<ctrl_t>(-128),
// 前面 Group 解释过这个设计
"ctrl_t::kEmpty must be -128 to make the SIMD check for its "
"existence efficient (psignb xmm, xmm)");
static_assert(
(~static_cast<int8_t>(ctrl_t::kEmpty) &
~static_cast<int8_t>(ctrl_t::kDeleted) &
// 使用了最低位 0 的共享
static_cast<int8_t>(ctrl_t::kSentinel) & 0x7F) != 0,
// SSE 版本用 _mm_cmpgt_epi8(_mm_set1_epi8(kSentinel), ctrl) 对比生成 mask,看似用不上这个 shared bit
// 它实际服务于 portable(非 SIMD)实现的标量位运算:
// 标量版 MaskEmptyOrDeleted() 为 (ctrl & ~(ctrl << 7)) & msbs
// 依赖 kEmpty/kDeleted 的最低位为 0 而 kSentinel 的最低位为 1
"ctrl_t::kEmpty and ctrl_t::kDeleted must share an unset bit that is not "
"shared by ctrl_t::kSentinel to make the scalar test for "
"MaskEmptyOrDeleted() efficient");
static_assert(ctrl_t::kDeleted == static_cast<ctrl_t>(-2),
// -2 是用算法凑出来的,具体见 ConvertSpecialToEmptyAndFullToDeleted()
// 这个函数可以达成效果:
// * kDeleted -> kEmpty
// * kEmpty -> kEmpty
// * _ -> kDeleted
//
// 实现方式:
// _mm_or_si128(_mm_shuffle_epi8(_mm_set1_epi8(126), ctrl), _mm_set1_epi8(-128))
// 其中 126 = 0b 0111 1110
// 又因为 shuffle 操作区分符号位,所以:
// 对于非哈希值,0 | -128 == kEmpty
// 对于哈希值,126 | -128 == kDeleted
"ctrl_t::kDeleted must be -2 to make the implementation of "
"ConvertSpecialToEmptyAndFullToDeleted efficient");
其实 static_assert() 提示了 ctrl_t 不同数值的设计,官方解释并不算详细,所以加了点注释。
// Helpers for checking the state of a control byte.
inline bool IsEmpty(ctrl_t c) { return c == ctrl_t::kEmpty; }
inline bool IsFull(ctrl_t c) {
// Cast `c` to the underlying type instead of casting `0` to `ctrl_t` as `0`
// is not a value in the enum. Both ways are equivalent, but this way makes
// linters happier.
return static_cast<std::underlying_type_t<ctrl_t>>(c) >= 0;
}
inline bool IsDeleted(ctrl_t c) { return c == ctrl_t::kDeleted; }
inline bool IsEmptyOrDeleted(ctrl_t c) { return c < ctrl_t::kSentinel; }
这里顺手看下 IsFull() 和 IsEmptyOrDeleted() 的实现:
- full:符号位必须为 0,因此只需判断 >= 0。
- empty|deleted:这两个值均小于 kSentinel,因此一次比较即可。
至于为什么 ctrl_t::kDeleted 一定是 -2,我在上面注释也解释了。
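kDeleted == -2 的妙处可以按上面注释的语义写一个标量 sketch(SIMD 版用 _mm_shuffle_epi8 + _mm_or_si128 一次完成;哨兵字节会被一并转成 kEmpty,实际实现中需事后单独恢复):

```cpp
#include <cstdint>

// 标量版示意:ConvertSpecialToEmptyAndFullToDeleted 的语义
// 特殊值(符号位为 1)=> kEmpty;哈希值 => 126 | -128 == -2 == kDeleted
inline int8_t ConvertScalar(int8_t c) {
  return c < 0 ? static_cast<int8_t>(-128)
               : static_cast<int8_t>(126 | -128);
}
```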
prepare insert non-SOO
// Given the hash of a value not currently in the table and the first empty
// slot in the probe sequence, finds a viable slot index to insert it at.
//
// In case there's no space left, the table can be resized or rehashed
// (for tables with deleted slots, see FindInsertPositionWithGrowthOrRehash).
//
// In the case of absence of deleted slots and positive growth_left, the element
// can be inserted in the provided `target` position.
//
// When the table has deleted slots (according to GrowthInfo), the target
// position will be searched one more time using `find_first_non_full`.
//
// REQUIRES: Table is not SOO.
// REQUIRES: At least one non-full slot available.
// REQUIRES: `target` is a valid empty position to insert.
size_t PrepareInsertNonSoo(CommonFields& common, size_t hash, FindInfo target,
const PolicyFunctions& policy) {
// When there are no deleted slots in the table
// and growth_left is positive, we can insert at the first
// empty slot in the probe sequence (target).
// 未曾删除且有增长空间的场景是热路径(case 0),可以直接插入首个 empty slot
const bool use_target_hint =
// Optimization is disabled when generations are enabled.
// We have to rehash even sparse tables randomly in such mode.
!SwisstableGenerationsEnabled() &&
// growth_info 没有高位删除标记,并且数值大于 0
common.growth_info().HasNoDeletedAndGrowthLeft();
// 删除过元素,或者没有余位,这里使用 unlikely 标记
// 这些冷路径都需要重新确认 target
if (ABSL_PREDICT_FALSE(!use_target_hint)) {
// Notes about optimized mode when generations are disabled:
// We do not enter this branch if table has no deleted slots
// and growth_left is positive.
// We enter this branch in the following cases listed in decreasing
// frequency:
// 1. Table without deleted slots (>95% cases) that needs to be resized.
// 2. Table with deleted slots that has space for the inserting element.
// 3. Table with deleted slots that needs to be rehashed or resized.
// 看样子这些冷路径的 if-else 分支是按照场景频率来设计的
// 虽然我不知道概率数据是怎么测出来的,能不能靠这些负载反推出谷歌的商业机密?
//
// case 1: 没删过,没余位(likely)
if (ABSL_PREDICT_TRUE(common.growth_info().HasNoGrowthLeftAndNoDeleted())) {
const size_t old_capacity = common.capacity();
// 见之前的扩容操作
policy.resize(common, NextCapacity(old_capacity), HashtablezInfoHandle{});
// 一个特定上下文的查找操作,可以对比 case 2 使用通用的 find_first_non_full()
target = HashSetResizeHelper::FindFirstNonFullAfterResize(
common, old_capacity, hash);
} else {
// Note: the table may have no deleted slots here when generations
// are enabled.
const bool rehash_for_bug_detection =
common.should_rehash_for_bug_detection_on_insert();
// 不管,默认 false
if (rehash_for_bug_detection) {
// Move to a different heap allocation in order to detect bugs.
const size_t cap = common.capacity();
policy.resize(common,
common.growth_left() > 0 ? cap : NextCapacity(cap),
HashtablezInfoHandle{});
}
// case 2: 删过,有余位
if (ABSL_PREDICT_TRUE(common.growth_left() > 0)) {
target = find_first_non_full(common, hash);
// case 3: 删过,没余位
} else {
target = FindInsertPositionWithGrowthOrRehash(common, hash, policy);
}
}
}
// case 0 热路径
//
// 等价于 size() += 1
PrepareInsertCommon(common);
// 等价于 grow_info -= IsEmpty(common.control()[target.offset])
common.growth_info().OverwriteControlAsFull(common.control()[target.offset]);
SetCtrl(common, target.offset, H2(hash), policy.slot_size);
common.infoz().RecordInsert(hash, target.probe_length);
return target.offset;
}
TODO: 《界轨》通关中,小部分遗留代码先放着。
挖坑不填坑,没人比我更卑鄙
UPDATE: 我回来了。垃圾游戏浪费时间,继续补完文章。
这里的插入 target 细分了很多的 case,主要就看有没有删过和多余的空间。
case 0
size_t PrepareInsertNonSoo(CommonFields& common, size_t hash, FindInfo target,
const PolicyFunctions& policy) {
// ...
// 等价于 size() += 1
PrepareInsertCommon(common);
// 等价于 grow_info -= IsEmpty(common.control()[target.offset])
common.growth_info().OverwriteControlAsFull(common.control()[target.offset]);
// 前面的扩容操作见过该函数,就是设置 ctrl[target.offset] = h2
// 除此以外,还可能处理 cloned bytes,具体见下面
SetCtrl(common, target.offset, H2(hash), policy.slot_size);
common.infoz().RecordInsert(hash, target.probe_length);
return target.offset;
}
// Overload for setting to an occupied `h2_t` rather than a special `ctrl_t`.
inline void SetCtrl(const CommonFields& c, size_t i, h2_t h, size_t slot_size) {
SetCtrl(c, i, static_cast<ctrl_t>(h), slot_size);
}
// Sets `ctrl[i]` to `h`.
//
// Unlike setting it directly, this function will perform bounds checks and
// mirror the value to the cloned tail if necessary.
inline void SetCtrl(const CommonFields& c, size_t i, ctrl_t h,
size_t slot_size) {
DoSanitizeOnSetCtrl(c, i, h, slot_size);
ctrl_t* ctrl = c.control();
ctrl[i] = h;
// 这个写法很绕,用于处理 cloned bytes 同时显式避免分支判断
// 对于 i < NumClonedBytes 的情况,赋值到对应 cloned byte
// 对于 i >= NumClonedBytes 的情况,实际不会有任何影响
ctrl[((i - NumClonedBytes()) & c.capacity()) +
(NumClonedBytes() & c.capacity())] = h;
}
在 case 0 的上下文,刚才的 prepare insert non-SOO 流程可以简化到以上形式,只处理 ctrl 信息。
NOTE: 可以对比 SetCtrl() 和 SetCtrlInSingleGroupTable() 实现,后者作为特定上下文的优化,进一步免去了 & 运算。
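SetCtrl() 里那个绕人的免分支镜像下标可以单独抽出来验证(假设 kWidth = 16,即 NumClonedBytes() == 15;capacity 恒为 2^m - 1,可直接当位掩码用,并有意利用无符号回绕):

```cpp
#include <cstddef>

// 示意:SetCtrl 中 cloned byte 的免分支镜像下标
constexpr size_t kClonedBytes = 15;

inline size_t MirrorIndex(size_t i, size_t capacity) {
  // i < 15 时回绕映射到尾部 cloned 区(capacity + 1 + i)
  // i >= 15 时映射回 i 自身,成为无害的重复写
  return ((i - kClonedBytes) & capacity) + (kClonedBytes & capacity);
}
```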
case 1
后续的 case 均会重新定位 target,并且定位后再次垂落到 case 0 相同流程。
size_t PrepareInsertNonSoo(CommonFields& common, size_t hash, FindInfo target,
const PolicyFunctions& policy) {
// ...
if (ABSL_PREDICT_FALSE(!use_target_hint)) {
// case 1: 没删过,没余位(likely)
if (ABSL_PREDICT_TRUE(common.growth_info().HasNoGrowthLeftAndNoDeleted())) {
const size_t old_capacity = common.capacity();
// 见之前的扩容操作
policy.resize(common, NextCapacity(old_capacity), HashtablezInfoHandle{});
// 一个特定上下文的查找操作,可以对比 case 2 使用通用的 find_first_non_full()
target = HashSetResizeHelper::FindFirstNonFullAfterResize(
common, old_capacity, hash);
}
// ...
}
// case 0...
}
// Optimized for small groups version of `find_first_non_full`.
// Beneficial only right after calling `raw_hash_set::resize`.
// It is safe to call in case capacity is big or was not changed, but there
// will be no performance benefit.
// It has implicit assumption that `resize` will call
// `GrowSizeIntoSingleGroup*` in case `IsGrowingIntoSingleGroupApplicable`.
// Falls back to `find_first_non_full` in case of big groups.
static FindInfo FindFirstNonFullAfterResize(const CommonFields& c,
size_t old_capacity,
size_t hash) {
// 不符合单群概念,这个函数在前面扩容阶段也出现过
if (!IsGrowingIntoSingleGroupApplicable(old_capacity, c.capacity())) {
// 垂落到 case 2 的核心函数
return find_first_non_full(c, hash);
}
    // Find a location for the new element non-deterministically.
    // Note that any position is correct.
    // It will be located at `half_old_capacity` or one of the other
    // empty slots with approximately 50% probability each.
    // The non-determinism here presumably hardens against element positions
    // being guessed. "Any position is correct"?
size_t offset = probe(c, hash).offset();
// Note that we intentionally use unsigned int underflow.
    // if offset is small enough, the subtraction wraps around and the check passes;
    // otherwise offset lies in the right half, which is guaranteed empty
if (offset - (old_capacity + 1) >= old_capacity) {
// Offset fall on kSentinel or into the mostly occupied first half.
      // this position used to hold the sentinel; after resizing there is no need for two sentinels
// Examples:
// old_ctrl = 0S0EEEEEEE...
// new_ctrl = S0EEEEEEEE...
//
// old_ctrl = 01S01EEEEE...
// new_ctrl = 1S01EEEEEE...
//
// old_ctrl = 0123456S0123456EE...
// new_ctrl = 456S0123?????????...
      // note the position of S above
offset = old_capacity / 2;
}
    // Why is this offset guaranteed to be kEmpty?
    // See the previously mentioned commit 0ef87fa0c1d9cff4c231a2a3af854f0b5edfa92b:
    // in the resize-phase function GrowIntoSingleGroupShuffleControlBytes,
    // ctrl[old_capacity / 2], which the shuffle would otherwise leave holding
    // the sentinel, is explicitly set to kEmpty
assert(IsEmpty(c.control()[offset]));
return FindInfo{offset, 0};
}
Case 1 is harder to follow than case 2, so consider reading case 2 first; it is also case 1's fallback. Case 1 special-cases single-group tables, which Google considers both more efficient and safer for that shape. All of this rests on the earlier shuffle operation (omitted here for space; see the commit history), and in that context the lookup does not even need to test slots for emptiness.
NOTES:
- Why not look for kDeleted slots? Because no such slots exist in case 1.
- Why only single-group tables? Google's comment notes it is safe elsewhere, but brings no performance benefit.
case 2
// Probes an array of control bits using a probe sequence derived from `hash`,
// and returns the offset corresponding to the first deleted or empty slot.
//
// Behavior when the entire table is full is undefined.
//
// NOTE: this function must work with tables having both empty and deleted
// slots in the same group. Such tables appear during `erase()`.
template <typename = void>
inline FindInfo find_first_non_full(const CommonFields& common, size_t hash) {
auto seq = probe(common, hash);
const ctrl_t* ctrl = common.control();
  // first probe: no SIMD involved
if (IsEmptyOrDeleted(ctrl[seq.offset()]) &&
      // only present in debug builds, where it perturbs the insertion position; absent otherwise
!ShouldInsertBackwards(common.capacity(), hash, ctrl)) {
return {seq.offset(), /*probe_length=*/0};
}
  // subsequent probes: query repeatedly using SIMD operations
while (true) {
GroupFullEmptyOrDeleted g{ctrl + seq.offset()};
    // until a kEmpty or kDeleted slot is found
auto mask = g.MaskEmptyOrDeleted();
if (mask) {
return {
seq.offset(GetInsertionOffset(mask, common.capacity(), hash, ctrl)),
seq.index()};
}
seq.next();
assert(seq.index() <= common.capacity() && "full table!");
}
}
Unlike case 1, case 2 needs no resize, since a spare slot is guaranteed to exist; the insertion point may be either kEmpty or kDeleted. The process is plain slot-finding, using the SIMD lookup operations explained earlier.
case 3
// Called whenever the table needs to vacate empty slots either by removing
// tombstones via rehash or growth.
ABSL_ATTRIBUTE_NOINLINE
FindInfo FindInsertPositionWithGrowthOrRehash(CommonFields& common, size_t hash,
const PolicyFunctions& policy) {
const size_t cap = common.capacity();
if (cap > Group::kWidth &&
// Do these calculations in 64-bit to avoid overflow.
      // close to the 7/8 max load factor; the choice of constants is explained
      // below (genuine production tuning notes)
common.size() * uint64_t{32} <= cap * uint64_t{25}) {
// Squash DELETED without growing if there is enough capacity.
//
// Rehash in place if the current size is <= 25/32 of capacity.
// Rationale for such a high factor: 1) DropDeletesWithoutResize() is
// faster than resize, and 2) it takes quite a bit of work to add
// tombstones. In the worst case, seems to take approximately 4
// insert/erase pairs to create a single tombstone and so if we are
// rehashing because of tombstones, we can afford to rehash-in-place as
// long as we are reclaiming at least 1/8 the capacity without doing more
// than 2X the work. (Where "work" is defined to be size() for rehashing
// or rehashing in place, and 1 for an insert or erase.) But rehashing in
// place is faster per operation than inserting or even doubling the size
// of the table, so we actually afford to reclaim even less space from a
// resize-in-place. The decision is to rehash in place if we can reclaim
// at about 1/8th of the usable capacity (specifically 3/28 of the
// capacity) which means that the total cost of rehashing will be a small
// fraction of the total work.
//
// Here is output of an experiment using the BM_CacheInSteadyState
// benchmark running the old case (where we rehash-in-place only if we can
// reclaim at least 7/16*capacity) vs. this code (which rehashes in place
// if we can recover 3/32*capacity).
//
    // Note that although the worst-case number of rehashes jumped from
    // 15 to 190, the number of operations per second is almost the same.
//
// Abridged output of running BM_CacheInSteadyState benchmark from
// raw_hash_set_benchmark. N is the number of insert/erase operations.
//
    //      | OLD (recover >= 7/16)   | NEW (recover >= 3/32)
// size | N/s LoadFactor NRehashes | N/s LoadFactor NRehashes
// 448 | 145284 0.44 18 | 140118 0.44 19
// 493 | 152546 0.24 11 | 151417 0.48 28
// 538 | 151439 0.26 11 | 151152 0.53 38
// 583 | 151765 0.28 11 | 150572 0.57 50
// 628 | 150241 0.31 11 | 150853 0.61 66
// 672 | 149602 0.33 12 | 150110 0.66 90
// 717 | 149998 0.35 12 | 149531 0.70 129
// 762 | 149836 0.37 13 | 148559 0.74 190
// 807 | 149736 0.39 14 | 151107 0.39 14
// 852 | 150204 0.42 15 | 151019 0.42 15
DropDeletesWithoutResize(common, policy);
} else {
// Otherwise grow the container.
policy.resize(common, NextCapacity(cap), HashtablezInfoHandle{});
}
// This function is typically called with tables containing deleted slots.
// The table will be big and `FindFirstNonFullAfterResize` will always
// fallback to `find_first_non_full`. So using `find_first_non_full` directly.
  // back to case 2
return find_first_non_full(common, hash);
}
void DropDeletesWithoutResize(CommonFields& common,
const PolicyFunctions& policy) {
void* set = &common;
void* slot_array = common.slot_array();
const size_t capacity = common.capacity();
assert(IsValidCapacity(capacity));
assert(!is_small(capacity));
  // the comment below says it all; nothing for me to add
// Algorithm:
// - mark all DELETED slots as EMPTY
// - mark all FULL slots as DELETED
// - for each slot marked as DELETED
// hash = Hash(element)
// target = find_first_non_full(hash)
// if target is in the same group
// mark slot as FULL
// else if target is EMPTY
// transfer element to target
// mark slot as EMPTY
// mark target as FULL
// else if target is DELETED
// swap current element with target element
// mark target as FULL
// repeat procedure for current slot with moved from element (target)
ctrl_t* ctrl = common.control();
ConvertDeletedToEmptyAndFullToDeleted(ctrl, capacity);
const void* hash_fn = policy.hash_fn(common);
auto hasher = policy.hash_slot;
auto transfer = policy.transfer;
const size_t slot_size = policy.slot_size;
size_t total_probe_length = 0;
void* slot_ptr = SlotAddress(slot_array, 0, slot_size);
// The index of an empty slot that can be used as temporary memory for
// the swap operation.
constexpr size_t kUnknownId = ~size_t{};
size_t tmp_space_id = kUnknownId;
for (size_t i = 0; i != capacity;
++i, slot_ptr = NextSlot(slot_ptr, slot_size)) {
assert(slot_ptr == SlotAddress(slot_array, i, slot_size));
if (IsEmpty(ctrl[i])) {
tmp_space_id = i;
continue;
}
if (!IsDeleted(ctrl[i])) continue;
const size_t hash = (*hasher)(hash_fn, slot_ptr);
const FindInfo target = find_first_non_full(common, hash);
const size_t new_i = target.offset;
total_probe_length += target.probe_length;
// Verify if the old and new i fall within the same group wrt the hash.
// If they do, we don't need to move the object as it falls already in the
// best probe we can.
const size_t probe_offset = probe(common, hash).offset();
const auto probe_index = [probe_offset, capacity](size_t pos) {
return ((pos - probe_offset) & capacity) / Group::kWidth;
};
// Element doesn't move.
if (ABSL_PREDICT_TRUE(probe_index(new_i) == probe_index(i))) {
SetCtrl(common, i, H2(hash), slot_size);
continue;
}
void* new_slot_ptr = SlotAddress(slot_array, new_i, slot_size);
if (IsEmpty(ctrl[new_i])) {
// Transfer element to the empty spot.
// SetCtrl poisons/unpoisons the slots so we have to call it at the
// right time.
SetCtrl(common, new_i, H2(hash), slot_size);
(*transfer)(set, new_slot_ptr, slot_ptr);
SetCtrl(common, i, ctrl_t::kEmpty, slot_size);
// Initialize or change empty space id.
tmp_space_id = i;
} else {
assert(IsDeleted(ctrl[new_i]));
SetCtrl(common, new_i, H2(hash), slot_size);
// Until we are done rehashing, DELETED marks previously FULL slots.
if (tmp_space_id == kUnknownId) {
tmp_space_id = FindEmptySlot(i + 1, capacity, ctrl);
}
void* tmp_space = SlotAddress(slot_array, tmp_space_id, slot_size);
SanitizerUnpoisonMemoryRegion(tmp_space, slot_size);
// Swap i and new_i elements.
(*transfer)(set, tmp_space, new_slot_ptr);
(*transfer)(set, new_slot_ptr, slot_ptr);
(*transfer)(set, slot_ptr, tmp_space);
SanitizerPoisonMemoryRegion(tmp_space, slot_size);
// repeat the processing of the ith slot
--i;
slot_ptr = PrevSlot(slot_ptr, slot_size);
}
}
ResetGrowthLeft(common);
common.infoz().RecordRehash(total_probe_length);
}
Case 3 covers the context where deletions have happened but no growth is left; the viable move here is to squash (drop) the tombstones so that resizing is avoided where possible. The exact thresholds are a difficult trade-off, ultimately settled by benchmarks; see the official comments. The squashing algorithm itself is already well summarized at the top of DropDeletesWithoutResize().
Erase
TL;DR: find + alloc::destroy + metadata removal (updating the size, marking kDeleted, and so on).
Find
template <class K = key_type>
iterator find(const key_arg<K>& key) ABSL_ATTRIBUTE_LIFETIME_BOUND {
AssertOnFind(key);
if (is_soo()) return find_soo(key);
prefetch_heap_block();
return find_non_soo(key, hash_ref()(key));
}
The externally facing find operation is very direct; having seen the earlier flows, you can mostly guess what it does.
find SOO
template <class K = key_type>
iterator find_soo(const key_arg<K>& key) {
assert(is_soo());
return empty() || !PolicyTraits::apply(EqualElement<K>{key, eq_ref()},
PolicyTraits::element(soo_slot()))
? end()
: soo_iterator();
}
The SOO slot lives at a fixed location, so it suffices to apply the EqualElement check to soo_slot() and return the SOO iterator.
find non-SOO
template <class K = key_type>
iterator find_non_soo(const key_arg<K>& key, size_t hash) {
assert(!is_soo());
auto seq = probe(common(), hash);
const ctrl_t* ctrl = control();
while (true) {
Group g{ctrl + seq.offset()};
for (uint32_t i : g.Match(H2(hash))) {
if (ABSL_PREDICT_TRUE(PolicyTraits::apply(
EqualElement<K>{key, eq_ref()},
PolicyTraits::element(slot_array() + seq.offset(i)))))
return iterator_at(seq.offset(i));
}
if (ABSL_PREDICT_TRUE(g.MaskEmpty())) return end();
seq.next();
assert(seq.index() <= capacity() && "full table!");
}
}
The non-SOO path mirrors find_or_prepare_insert_non_soo() shown earlier: SIMD batch matching plus probing.
END
I don't feel like debugging any further, so this is where it ends. It seems I'm the only grinder on the entire internet digging through source code nobody reads, while the group-chat crowd holds forth on architecture without, as far as I can tell, ever having read an implementation. I quit. I hope some later grinder picks up the few remaining details.
From these flows you can also see the philosophy behind Swiss table's performance: start with one good idea, then win everything else by grinding out the details. Which is exactly why, details aside, there isn't much for this article to summarize. RTFSC!