I’m already seeing a lot of discussion both here and over at LWN about which has...

KMag · on Feb 4, 2020

SHA-256 is probably the right choice, but I don't think it's as obvious as you suggest, given SHA-512/256.

SHA-512/256 is a standard peer-reviewed and well-studied way to run SHA-512 with a different initial state and then truncate output to 256 bits.
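As a rough illustration of the "different initial state" point, the sketch below compares SHA-512/256 with a plain SHA-512 digest cut to 32 bytes. It assumes Python's `hashlib` is linked against an OpenSSL build that exposes the `sha512_256` name, which is not guaranteed; the helper name `sha512_256_vs_truncated` is made up for this example.

```python
import hashlib

def sha512_256_vs_truncated(data: bytes):
    """Return (SHA-512/256 digest, first 32 bytes of SHA-512), or None if
    this Python/OpenSSL build does not provide sha512_256."""
    try:
        real = hashlib.new("sha512_256", data).digest()
    except ValueError:
        # The algorithm name is not available in this build.
        return None
    truncated = hashlib.sha512(data).digest()[:32]
    return real, truncated

result = sha512_256_vs_truncated(b"abc")
if result is None:
    print("sha512_256 is not available in this build")
else:
    real, truncated = result
    print("SHA-512/256      :", real.hex())
    print("SHA-512 truncated:", truncated.hex())
    print("identical?", real == truncated)
```

Where the algorithm is available, the two digests should differ, because the initial state differs even though the compression function is the same.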

This is heavy bikeshedding, but SHA-512/256 would be a more conservative choice than SHA-256. Under standard assumptions, SHA-512/256 is no weaker than SHA-256. Its structure is extremely similar to SHA-256's, but a collision on intermediate state requires a collision on all 512 bits of state instead of 256.

On most 64-bit CPUs without dedicated hash instructions, SHA-512/256 is faster for messages longer than a couple of blocks, due to processing blocks twice as large in fewer than twice as many operations.
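The "twice as large in fewer than twice as many operations" argument can be sketched with the block sizes and round counts from FIPS 180-4. This is only back-of-the-envelope arithmetic: it assumes a 64-bit round costs about the same as a 32-bit round on a 64-bit CPU and ignores padding and per-call overhead.

```python
# (block size in bytes, rounds per block), per FIPS 180-4.
PARAMS = {
    "SHA-256": (64, 64),
    "SHA-512/256": (128, 80),  # same compression function as SHA-512
}

def rounds_per_byte(name):
    """Compression-function rounds spent per byte of input."""
    block_bytes, rounds = PARAMS[name]
    return rounds / block_bytes

for name in PARAMS:
    print(f"{name}: {rounds_per_byte(name):.3f} rounds per input byte")
```

Under those assumptions SHA-512/256 spends 0.625 rounds per byte against 1.0 for SHA-256, which is why it tends to win on long messages when neither has hardware acceleration.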

Currently, the latest server and laptop CPUs have SHA-256 hardware acceleration but not SHA-512 acceleration. I'm not sure how many phone CPUs support SHA-256 acceleration but not the ARMv8.2-SHA extensions (SHA-512). If it weren't for this difference in hardware acceleration, there would be few reasons to use SHA-256.

That being said, the current difference in hardware acceleration support probably makes SHA-256 the right choice here.

strenholme · on Feb 4, 2020

SHA-512/256 is a lot newer than SHA2-256 (usually called SHA-256, but I prefer the SHA2 prefix to make it clear that it’s a very different beast than SHA3-256), and its speed on 32-bit CPUs is less than optimal, so I don’t see it as being a more conservative choice. In terms of security, it uses the same 19-year-old unbroken algorithm as SHA2-256.

I am aware of the length extension issues, but they are not relevant for Git’s use case.

In terms of support, SHA-512/256 has, as you mentioned, less hardware acceleration support, and it’s also not supported in a lot of mainstream programs like GNU Coreutils. I also know that some companies mandate using SHA2-256 whenever a cryptographic hash is needed.

Git made the right choice with SHA2-256: It’s the most widely supported secure cryptographic hash out there.

jayflux · on Feb 4, 2020

> BLAKE is faster when using software to perform the hash

Is BLAKE 3 still faster than SHA-256 when using the CPU's specialized instructions? I think most modern desktop CPUs have built-in instructions for SHA-256.

I’m guessing when people compare BLAKE 3 to SHA 256 they’re comparing software to software, but this wouldn’t be the case in reality?

strenholme · on Feb 4, 2020

I haven’t seen any benchmarks for BLAKE3 vs. the Intel/AMD SHA extensions. My guess is that Intel hardware accelerated SHA-256 will be faster than BLAKE3 running in software for most real world uses.

I can tell you this much: It is only with Ice Lake, which was released in the last year, that mainstream Intel chips finally got native high-speed SHA-NI support. Coffee Lake and Comet Lake, which are still the CPUs in a lot of new laptops being sold right now, do not support SHA-NI.

wahern · on Feb 4, 2020

AMD Zen supports SHA extensions across all SKUs. Here are `openssl speed` numbers on an AMD EPYC 3201:

  type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
  blake2s256       46720.33k   187461.21k   305314.65k   373840.55k   398207.66k   401528.15k
  blake2b512       38423.44k   155318.81k   422325.08k   592401.75k   674843.31k   681743.70k
  sha256           84620.44k   279840.47k   723573.76k  1199678.81k  1484693.50k  1510484.65k
  sha512           33854.38k   135674.20k   275343.70k   444872.36k   545802.92k   554166.95k
  sha3-256         26146.35k   103860.27k   253944.92k   308119.21k   347477.33k   351906.47k
  sha3-512         26349.83k   105590.85k   144236.03k   173082.62k   189448.19k   189814.10k
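For anyone wanting a comparable measurement without the `openssl` binary, here is a minimal sketch using Python's `hashlib`. It reports thousands of bytes per second like `openssl speed` does, but the function name, buffer sizes, and timing window are choices made for this example, and the interpreter overhead on small inputs means the numbers will not line up with OpenSSL's own.

```python
import hashlib
import time

# All of these names are in hashlib.algorithms_guaranteed.
ALGORITHMS = ["blake2s", "blake2b", "sha256", "sha512", "sha3_256", "sha3_512"]

def throughput_kb_per_s(name, size, duration=0.1):
    """Hash a `size`-byte buffer repeatedly for roughly `duration` seconds
    and return the throughput in kB/s (1 kB = 1000 bytes)."""
    data = b"\x00" * size
    count = 0
    start = time.perf_counter()
    while True:
        hashlib.new(name, data).digest()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration:
            break
    return count * size / elapsed / 1000.0

if __name__ == "__main__":
    sizes = (16, 1024, 16384)
    print("type".ljust(12) + "".join(f"{s:>12} B" for s in sizes))
    for name in ALGORITHMS:
        row = "".join(f"{throughput_kb_per_s(name, s):>13.0f}k" for s in sizes)
        print(name.ljust(12) + row)
```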

It's possible that Blake3 might be faster than accelerated SHA-256 on large inputs, where Blake3 can maximally leverage its SIMD friendliness. OTOH, Blake3 really pushes the envelope in terms of minimal security margin. Performance isn't everything. SHA-3 is so slow because NIST wanted a failsafe.

OpenSSL info:

  OpenSSL 1.1.1c  28 May 2019
  built on: Tue Aug 20 11:46:33 2019 UTC
  options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr) 
  compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-D7S1fy/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2

NOTE: /proc/cpuinfo shows sha_ni detection, and the apt-get source of this version of OpenSSL confirms SHA extension support in the source code, but I didn't confirm that it was actually being used at runtime.
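The `/proc/cpuinfo` check mentioned above can be sketched as follows. This assumes an x86 Linux host, where the relevant line is keyed `flags`; the helper names are hypothetical. As the note says, it only shows that the CPU advertises the extension, not that OpenSSL actually uses it at runtime.

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` appears on any 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False

def host_has_sha_ni(path="/proc/cpuinfo"):
    """Return True/False for this host, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return has_cpu_flag(f.read(), "sha_ni")
    except OSError:
        return None

print("sha_ni advertised:", host_has_sha_ni())
```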

strenholme · on Feb 4, 2020

Assuming Blake3 will be across the board 43% faster (7 instead of 10 rounds) than 32-bit blake2s256, we would get:

  Blake3  SHA-256
   66743    84620  Tiny
  534057  1199679  Medium (1024 bytes)
  573611  1510485  Largeish (16384 bytes)

This is based on the parent’s numbers with a fudge factor to account for Blake3 being a faster version of blake2s256 (i.e. the 32-bit version of Blake2, which is the only version in Blake3).

Of course, this does not take into account that Blake3 has tree hashing and other modes which scale better to multiple cores.

(Edit: update figures; I need to scale up Blake2s256 not Blake2b512)
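The estimate above can be reproduced with a few lines of arithmetic. This sketch takes the blake2s256 and sha256 rows from the parent's `openssl speed` output and applies the 10/7 round-count ratio as the only adjustment; truncating the result to whole units is an assumption about how the table was produced.

```python
import math

# kB/s figures copied from the parent's `openssl speed` output.
BLAKE2S256 = {16: 46720.33, 1024: 373840.55, 16384: 401528.15}
SHA256 = {16: 84620.44, 1024: 1199678.81, 16384: 1510484.65}

# 7 rounds instead of 10, so roughly 10/7 (about 1.43x) the throughput.
SPEEDUP = 10 / 7

def estimate_blake3(blake2s_kb_per_s):
    """Scale a blake2s256 throughput by the round-count ratio."""
    return blake2s_kb_per_s * SPEEDUP

for size in BLAKE2S256:
    est = math.floor(estimate_blake3(BLAKE2S256[size]))
    sha = round(SHA256[size])
    print(f"{size:>6} bytes: Blake3 ~{est:>7}  SHA-256 {sha:>8}")
```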

oconnor663 · on Feb 5, 2020

The BLAKE3 tree mode also takes advantage of SIMD parallelism on a single core, which ends up being a larger effect than the reduced number of rounds. At 2-4 KiB of input (depending on the implementation) it's 2x faster than BLAKE2s on my laptop. Where AVX2 and AVX-512 are supported, those kick in at 8 KiB and 16 KiB of input respectively, widening the difference further. The red bar chart at https://github.com/BLAKE3-team/BLAKE3 is a single-threaded measurement on a machine that supports AVX-512.

vluft · on Feb 4, 2020

On my machine with sha extensions, blake3 is about 15% faster (single threaded in both cases) than sha256.

abecedarius · on Feb 4, 2020

Also, Blake3 has some kind of advantage in parallelizability, iirc.

vluft · on Feb 4, 2020

yeah, blake3 multi-threaded is about 11 times faster for me than sha256 single-threaded.

_verandaguy · on Feb 4, 2020

Honest question: what are the use cases in Git where hash computation speed is a meaningful optimization?

SQLite · on Feb 4, 2020

My experience in developing and maintaining Fossil is that the hashing speed is not a factor, unless you are checking in huge JPEGs or MP3s or something. And even then, the relative performance of the various hash algorithms is not enough to worry about.

_verandaguy · on Feb 5, 2020

Thanks for the insight. My intuition was kind of the same, but on modern hardware computing the digest-style (as opposed to cryptographic, slow-by-design) hash is essentially imperceptible for payloads in the low MBs -- and much above that is a use case for LFS.

strenholme · on Feb 4, 2020

It’s actually not a big deal with Git, which is why SHA2-256 is the right choice.

loeg · on Feb 4, 2020

Rewriting all repos from SHA1 to hash-next?

papreclip · on Feb 4, 2020

less wasted computation means less global warming