GPU databases are brilliant for cases where the working set can live entirely within the GPU's memory; that is their traditional niche. For most applications, with much larger (or more dynamic) working sets, the PCIe bus becomes a significant performance bottleneck.
That said, I've heard anecdotes from people I trust that heavily optimized use of CPU vector instructions is competitive with GPUs for database use cases.
> GPU databases are brilliant for cases where the working set can live entirely within the GPU's memory.
Probably true for current software. But there are numerous algorithms that allow groups of nodes to work together on database joins, even when the data doesn't fit on one node.
Consider Table A (1,000 rows) and Table B (1,000,000 rows). Let's say you want to compute A join B, but B doesn't fit in your memory (say you only have room for 5,000 rows). Well, you can split Table B into 250 pieces, each with 4,000 rows.
Table A (1,000 rows) + Table B (4,000 rows) is 5,000 rows, which fits in memory. :-)
You then compute A join B[0:4000], then A join B[4000:8000], etc. etc. In fact, all 250 of these joins can be done in parallel.
----------
As such, it's theoretically possible to perform database joins on parallel systems, even if the data doesn't fit into any particular node's RAM.
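A minimal sketch of that chunked join in Python (the table contents, key layout, and the hash_join helper are invented for illustration; a real engine would stream the chunks and dispatch each one to a separate node or GPU rather than looping in one process):

```python
# Minimal sketch, assuming rows are simple (key, value) tuples.
table_a = [(k, f"a{k}") for k in range(1_000)]        # 1,000 rows
table_b = [(k, f"b{k}") for k in range(1_000_000)]    # 1,000,000 rows

CHUNK = 4_000  # rows of B that fit in memory alongside all of A

def hash_join(a_rows, b_rows):
    """Join two row lists on their first column via a hash index built on A."""
    index = {key: val for key, val in a_rows}
    return [(key, index[key], b_val) for key, b_val in b_rows if key in index]

# B is split into 250 chunks of 4,000 rows. Each chunk's join is independent
# of the others, so the 250 joins could run in parallel on separate nodes.
joined = []
for start in range(0, len(table_b), CHUNK):
    joined.extend(hash_join(table_a, table_b[start:start + CHUNK]))

assert len(joined) == 1_000  # only keys 0..999 exist in both tables
```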
If you can afford it, much of the penalty from the PCIe bus goes away if you have a system with NVLink. You still need to transfer the data back to the CPU for the final results, but most of the filtering and reduction operations across GPU memory can be done on NVLink only.
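To illustrate just the "keep the heavy work in GPU memory, ship only the small result back" part: a sketch using CuPy on a single GPU (it doesn't exercise NVLink itself, and the column data is invented):

```python
import cupy as cp  # requires a CUDA-capable GPU

# Hypothetical column already resident in GPU memory.
prices = cp.random.uniform(0, 100, size=10_000_000)

# Filter and reduce entirely on the GPU...
total_over_90 = prices[prices > 90.0].sum()

# ...then copy only the single reduced value back to the CPU.
print(float(total_over_90))
```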
I'm clueless about the amount of bandwidth needed for larger applications. Will the eventual release of PCIe 4 and 5 have a big impact on this, or will it still be too slow?
PCIe 3.0 x16, the current generation, has a bandwidth of 15.75 GB/s.
Assuming you have 16 GB of RAM on the GPU, it would theoretically take ~1 second to load the GPU with that amount of data. Unfortunately, once you take huge data sets into consideration, you'll be able to saturate that with 5 M.2 SSDs each running at 3200 MB/s, assuming Disk -> RAM -> GPU DMA.
Those would also require at least 5 PCIe 2.0 x8 ports on a pretty performant setup. RAM bandwidth is assumed to be around 40-60 GB/s, so hopefully no bottlenecks there.
This is assuming your GPU could swizzle through 16 GB of data in a second; GPUs have a theoretical memory bandwidth of between 450 and 970 GB/s.
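The arithmetic behind those figures, as a quick sanity check (all numbers are the rough values quoted above, not measurements):

```python
pcie3_x16_gbs = 15.75        # PCIe 3.0 x16 bandwidth, GB/s
gpu_ram_gb    = 16.0
ssd_gbs       = 3.2          # one NVMe M.2 SSD at 3200 MB/s
gpu_mem_gbs   = (450, 970)   # GPU memory bandwidth range, GB/s

print(gpu_ram_gb / pcie3_x16_gbs)   # ~1.0 s to fill 16 GB over PCIe 3.0 x16
print(5 * ssd_gbs)                  # 16 GB/s from five SSDs -> bus saturated
print(gpu_ram_gb / gpu_mem_gbs[1],
      gpu_ram_gb / gpu_mem_gbs[0])  # ~0.017-0.036 s for the GPU to scan 16 GB
```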
Now realistically, per vendor marketing material, the fastest DB I've seen allows one to ingest at 3 TB/hour ≈ 1 GB/s.
So there must be more to it than the theoretical 16 GB/s business. PCIe 4.0 x16 doubles the speed to ~32 GB/s, but at this time that looks pointless to me.
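For comparison, converting that quoted ingest rate:

```python
ingest_gb_per_s = 3 * 1000 / 3600     # 3 TB/hour in GB/s
print(ingest_gb_per_s)                # ~0.83 GB/s
print(15.75 / ingest_gb_per_s)        # PCIe 3.0 x16 is roughly 19x faster
```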