It’s usually not the right call to use a hash table like you’re describing. You’re not saving any work (all keys have to be migrated sooner or later) and by doing it like this you incur a non-trivial amount of overhead on most operations.
Hash tables generally work on the assumption that operations are O(1) in the amortized sense: individual inserts might be expensive because of rehashing, but in aggregate they are very cheap. In practice, this is almost always a fine assumption, and it is how the vast majority of hash tables in all programming languages work.
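For concreteness, a minimal sketch of that standard approach (separate chaining plus a doubling resize once a load factor is crossed; all names here are made up for illustration, not taken from any particular library):

```python
# Minimal separate-chaining hash table with a doubling resize.
# Purely illustrative; names are not from any particular library.

class ChainedHashTable:
    def __init__(self, capacity=8, max_load=0.75):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.max_load = max_load

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # update in place
                return
        bucket.append((key, value))
        self.size += 1
        # The occasional O(n) rehash is paid for by the many cheap inserts
        # before it, which is what "O(1) amortized" refers to.
        if self.size / len(self.buckets) > self.max_load:
            self._rehash(2 * len(self.buckets))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def _rehash(self, new_capacity):
        old_items = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(new_capacity)]
        for k, v in old_items:
            self._bucket(k).append((k, v))
```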
The kinds of hash tables you are talking about have their uses in places where you have soft real-time constraints, but that is honestly a pretty niche use case.
iirc memcached either uses or used to use https://en.wikipedia.org/wiki/Linear_hashing to avoid rehashing induced pauses. it definitely makes sense for that use case but you're correct that it's not the right call for a general purpose hash table implementation.
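For anyone curious, a rough sketch of the linear hashing idea from that article (not memcached's actual code, and all names here are illustrative): the table grows one bucket at a time, and each growth step redistributes only the single bucket being split, so no insert ever pays for a full rehash.

```python
# Rough sketch of linear hashing per the linked article; not memcached's
# actual implementation. The table grows one bucket at a time, and each
# growth step redistributes only the one bucket being split.

class LinearHashTable:
    def __init__(self, initial_buckets=4, max_load=0.75):
        self.n0 = initial_buckets      # bucket count at level 0
        self.level = 0                 # completed doubling rounds
        self.split = 0                 # next bucket index to split
        self.buckets = [[] for _ in range(initial_buckets)]
        self.size = 0
        self.max_load = max_load

    def _index(self, key):
        h = hash(key)
        i = h % (self.n0 * (1 << self.level))
        if i < self.split:
            # Buckets before the split pointer have already been split this
            # round, so they are addressed with the next level's hash.
            i = h % (self.n0 * (1 << (self.level + 1)))
        return i

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for j, (k, _) in enumerate(bucket):
            if k == key:
                bucket[j] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.buckets) > self.max_load:
            self._split_one_bucket()   # incremental growth: one bucket per step

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _split_one_bucket(self):
        # Add one new bucket and redistribute only the entries of the bucket
        # currently pointed at by the split pointer.
        self.buckets.append([])
        old_entries = self.buckets[self.split]
        self.buckets[self.split] = []
        self.split += 1
        if self.split == self.n0 * (1 << self.level):
            self.split = 0
            self.level += 1
        for k, v in old_entries:
            self.buckets[self._index(k)].append((k, v))
```

The trade-off is a slightly more involved addressing function, and mid-round the already-split buckets are spread over twice as many slots as the not-yet-split ones.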
Programs have different phases and jitter might be relevant only in some of them.
Imagine hash tables used to store configuration data. You don't care about the performance of the "reload config" operation, but once the config is loaded and the worker is handling traffic, read operations must be fast.
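And if you control the phases like that, you don't necessarily need a special hash table at all. A sketch of the pattern, assuming a plain dict and made-up names: do all the building (and therefore all the rehashing) during reload, then swap the finished table in, so the read path never sees a resize.

```python
# Illustrative sketch (names made up): pay all the hashing and resizing
# during the reload phase, then publish the finished table with a single
# reference swap, so the hot read path never hits a resize.

class ConfigStore:
    def __init__(self):
        self._config = {}                      # current, fully built table

    def reload(self, raw_items):
        # Slow phase: building the new dict does all the growing/rehashing.
        new_config = dict(raw_items)
        self._config = new_config              # swap the finished table in

    def get(self, key, default=None):
        # Fast phase: reads hit an already-sized table.
        return self._config.get(key, default)
```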
I agree. I’m rarely using hash tables for growing data sets. In my experience, I’ve been more inclined to redesign the algorithm or use a different data structure at the point where a migrating hash table makes sense.
> Hash tables generally work on the assumption that operations are O(1) in the amortized sense: individual inserts might be expensive because of rehashing, but in aggregate they are very cheap.
Hash tables aren't O(1) in an amortized sense; they are O(1) in a statistical sense. The O(1) complexity of hash tables depends entirely on the size of the underlying array, the method of collision resolution, and the quality of the (pre)hash function in relation to the distribution of data you're storing in the table. Whether you insert 1 item or 1 million items into the table, you're still getting O(1) computational complexity if the table is designed well. If not, you could be getting O(n) complexity in the worst case.
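A toy way to see that dependence on the hash function (BadKey and the timings here are purely illustrative): with a well-distributed hash the dict behaves as O(1) per operation, while with a constant hash every key collides and the same operations degrade toward O(n).

```python
import time

# BadKey is a deliberately terrible key type: a constant hash puts every
# entry in the same probe chain, so the same dict that is O(1) per
# operation with well-distributed keys degrades toward O(n) per operation.

class BadKey:
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42                      # constant hash: every key collides
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

def build_and_probe(make_key, n=1000):
    start = time.perf_counter()
    table = {make_key(i): i for i in range(n)}   # n inserts
    for i in range(n):
        assert table[make_key(i)] == i           # n lookups
    return time.perf_counter() - start

print("well-distributed keys:", build_and_probe(int))     # ~linear total time
print("all-colliding keys:   ", build_and_probe(BadKey))  # ~quadratic total time
```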
While the fixed cost of hashing is irrelevant to the O(1) nature of hash table operations, it does mean that a hash table in practice could be slower (in terms of wall time) than a linked list for insert/find/remove operations. Despite a linked list being O(n) for those operations, it doesn't have to pay the fixed cost of hashing, so for low values of n the linked list might be a better choice. I think this is what you mean by "O(1) in the amortized sense", but it's not what we mean when we say the computational complexity of these operations on a hash table is O(1).
What the OP is talking about is an example of amortization of resize(), essentially adding a fixed cost to insert() and remove() so you don't have to pay a huge sum when you call resize(). But that doesn't mean insert() and remove() are no longer O(1).
If you use the terminology precisely, the worst-case cost of a regular hash table insertion is O(n) because of the potential resize, but its amortized cost is O(1) because that's what it "averages out to".
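For completeness, here is a sketch of the kind of incremental resize the OP is describing, with illustrative names and details that real implementations handle differently: keep the old and new arrays side by side and migrate a few entries on every operation, so the resize cost is spread out instead of landing on a single insert.

```python
# Sketch of an incremental resize, with illustrative names; real
# implementations differ in plenty of details. Old and new arrays live side
# by side, and every operation migrates a little of the old one, so the
# resize cost is spread out instead of hitting a single insert.

class IncrementalHashTable:
    MIGRATE_PER_OP = 4                 # minimum entries to move per operation

    def __init__(self, capacity=8, max_load=0.75):
        self.old = None                # table being drained, or None
        self.old_cursor = 0            # next old bucket to migrate
        self.new = [[] for _ in range(capacity)]
        self.size = 0
        self.max_load = max_load

    def _slot(self, table, key):
        return table[hash(key) % len(table)]

    def _step_migration(self):
        if self.old is None:
            return
        moved = 0
        # Move whole buckets until at least MIGRATE_PER_OP entries have been
        # relocated or the old table is fully drained.
        while self.old_cursor < len(self.old) and moved < self.MIGRATE_PER_OP:
            for k, v in self.old[self.old_cursor]:
                self._slot(self.new, k).append((k, v))
                moved += 1
            self.old[self.old_cursor] = []
            self.old_cursor += 1
        if self.old_cursor == len(self.old):
            self.old = None            # migration finished

    def insert(self, key, value):
        self.remove(key)               # advances migration; drops any old entry
        self._slot(self.new, key).append((key, value))
        self.size += 1
        if self.old is None and self.size / len(self.new) > self.max_load:
            # Start (not finish) a resize: allocate the bigger array and let
            # subsequent operations drain the old one gradually.
            self.old, self.new = self.new, [[] for _ in range(2 * len(self.new))]
            self.old_cursor = 0

    def get(self, key):
        # While a migration is in flight, a key may still live in the old array.
        tables = (self.new, self.old) if self.old is not None else (self.new,)
        for table in tables:
            for k, v in self._slot(table, key):
                if k == key:
                    return v
        raise KeyError(key)

    def remove(self, key):
        self._step_migration()
        tables = (self.new, self.old) if self.old is not None else (self.new,)
        for table in tables:
            bucket = self._slot(table, key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    del bucket[i]
                    self.size -= 1
                    return True
        return False
```

Note that lookups have to check both arrays while a migration is in flight, which is exactly the per-operation overhead mentioned upthread.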