Hacker Newsnew | past | comments | ask | show | jobs | submit | loopingoptimism's commentslogin

In principle both HAMT and CHAMP compress sparse associative arrays by construction. However, the bitmap encoding of CHAMP results in a better compression for map data structures, enabled by permuting the elements that are stored in the node's array.

This is in particular the case when compared to Clojure's PersistentHashMap. Clojure reserves two array slots per entry to either store a) two pointers inline (i.e., key and value tuple), or b) a single reference to a sub-trie in the second slot. At runtime Clojure checks if the first slot is equal to 'null' for distinguishing cases a)/b) and doing the correct type casts at runtime. The thesis visualizes this in chapter 3, page 42.


No, the linked work describes eager data structures that use techniques such as path-copying for persistence. No laziness required.


The talk at Clojure West wasn't given by myself, but I found it worthwhile linking. Rather it was given by someone who independently picked up my research results and replicated them in the context of ClojureScript (https://github.com/bendyworks/lean-map). The authors independently confirmed the performance improvements of CHAMP over HAMT that I was observing.

You can also have a look at the JVMLS'16 talk (https://www.youtube.com/watch?v=pUXeNAeyY34) for a high-level overview of the work that is covered in my thesis.


From what I've read, it seems that the performance benefits disappear when compound data structures/objects are used as keys. Is this true?

Let's say that I were to use a vector of coordinates as the key to something in a map. Would CHAMP still vastly outperform HAMT?


Why should that be the case? Can you point to sources where you read that? In my experience, CHAMP clearly has advantages over HAMT in this scenario.

To answer your questions about using vectors of coordinates of keys: it depends on the design implementation of the vector's hash code, regardless if you use HAMT or CHAMP.

Using collections as keys in other collections is in general a performance sensitive subject. The available HAMT implementations in Clojure and Scala fail to deliver here. The case study in Chapter 3.7 nests hash-sets into hash-sets (i.e., Set.Immutable<Set.Immutable<K>> sets). The CHAMP implementation yields minimal speedups of ~10x over Clojure and Scala due to the way it calculates and incrementally updates the collection's hash code.


Adoption (especially in industry) takes time, but there's definitely interest in the Clojure and Scala communities.

The CHAMP data structure was already picked up by the ClojureScript community (https://github.com/bendyworks/lean-map), but still requires some work to be upstreamed to Clojure I guess.

I personally (I'm the author of the thesis linked above) plan to work together with the Scala folks at Lightbend for the collections overhaul that's planned for Scala 2.13 (see https://github.com/scala/collection-strawman).


Any idea about CHAMP and Haskell? IIRC the reception of HAMT was somewhat lukewarm back in the day, and AFAIK to this day the Haskell standard library doesn't use them.


I do not know well the current state of HAMT data structures in Haskell, but there is a related thread at the Haskell subreddit that you might be interested in: https://www.reddit.com/r/haskell/comments/6tmbju/efficient_i...


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: