
Right, most people who try to really optimize these things do not have access to the parallelism tools that Google has built, and end up doing their own ad-hoc sharding schemes. Things that can be built by 1-3 people over the course of a few weeks to solve an immediate scaling problem. And of course BAM itself dates back to before standardized serialization formats were brought out of Google.

Even with potential optimizations, initiating a seek on GCS or S3 is far far slower than on a local SSD, so even if Google exposes fast cross-network seeks on objects inside an internal object store system, it is not readily accessible to the plebes like me and 99.9% of genomicists that use cloud systems or their own hardware.
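To make the gap concrete, here is a minimal sketch (file name and offsets are arbitrary) of what a single random read looks like locally versus what the same access turns into on an object store:

```python
import os
import tempfile

# A 1 MiB scratch file standing in for a BAM-like blob.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(range(256)) * 4096)
    path = f.name

# Local SSD: a random read is a single pread() syscall, typically
# ~100 microseconds or less on NVMe.
fd = os.open(path, os.O_RDONLY)
chunk = os.pread(fd, 4096, 131072)  # read 4 KiB at offset 128 KiB
os.close(fd)
os.remove(path)

# On S3/GCS the same access becomes a ranged HTTP GET, e.g.
#   GET /bucket/sample.bam   with header   Range: bytes=131072-135167
# and pays a full network round trip (often tens of milliseconds)
# per seek, hundreds of times the local cost.
print(len(chunk))
```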



You might be interested in our paper that just got published in Bioinformatics today as chance would have it: https://academic.oup.com/bioinformatics/advance-article-abst...


Thanks for sharing, I'll check it out. It would be interesting to see if it would help with these pretty astonishing Open Omics results that recently came out:

https://community.intel.com/t5/Blogs/Tech-Innovation/Artific...


Of course it’s slower. You’re using HTTPS to do something that’s meant to be raw binary. The overhead is killing you. Something like iSCSI is pretty quick compared to HTTPS as a storage protocol.


Which is exactly why using standard Unix/POSIX files makes sense as a universal interface for genomics programs that are run in highly heterogeneous environments across the world, even if it leads to software engineers wishing that their internal custom data storage systems were used instead.

If random access is needed in a cloud environment, use either that local ephemeral SSD, or a cloud block device which is probably just an iSCSI implementation underneath, or at least a close equivalent.

Operating fleets of compute and IO in cloud environments means that POSIX semantics generally work really well for genomics.

Folks who have reimplemented basic genomics algorithms on top of standardized protocol-buffer serialization still store them as BLOBs, and have not delivered benefits that can be realized in publicly available compute environments.
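That universality is easy to illustrate: the exact same random-access code runs unchanged whether the file sits on a local SSD, an NFS mount, or an iSCSI-backed cloud block device. A small sketch (file contents and offsets are made up):

```python
import mmap
import os
import tempfile

# Hypothetical data file; in practice it could live on local NVMe,
# NFS, or a cloud block device -- the code below is identical either
# way, which is the point of the POSIX interface.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"ACGT" * 1024)
    path = f.name

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        record = m[2048:2056]  # random access at an arbitrary offset
print(record)  # b'ACGTACGT'
os.remove(path)
```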


I’m still waiting for people to realize nvme doesn’t have to be local in newer kernels. That’s when the real fun will begin.
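For context, NVMe over TCP has been in mainline Linux since 5.0, and with nvme-cli a remote namespace attaches as an ordinary local block device. A sketch only; the address, port, and NQN below are placeholders, not a real target:

```shell
# Placeholders throughout -- substitute your target's address and NQN.
modprobe nvme-tcp                            # NVMe/TCP initiator (Linux >= 5.0)
nvme discover -t tcp -a 10.0.0.42 -s 4420    # list subsystems on the target
nvme connect  -t tcp -a 10.0.0.42 -s 4420 \
      -n nqn.2024-01.io.example:remote-namespace
# The remote namespace now shows up as e.g. /dev/nvme1n1 and is read
# with ordinary POSIX I/O, same as a local drive.
```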


I used to work on datacenter NVMe products. I wrote the tests which validated them (mostly functional not performance). I left that company before things got hot with fabrics, but I really want to see that stuff succeed. It looked really cool.


I've worked in web tech long enough to recognize that the overhead of HTTP does not explain the difference in performance between "raw binary" protocols and ones that have textual headers.

Put another way I've seen extremely low-latency https servers. The latency in S3 doesn't come from using https.


Raw binary, as in I send the CPU instructions to seek to a memory location (whether local or remote). iSCSI is usually on the same network (maybe even on the same machine, if using k8s to do this via Longhorn) and handled in kernel space.

I highly doubt HTTP has less latency than that.


No, but since that part of the data transfer wasn't the bottleneck, it doesn't matter.



