
GPT-3 reportedly requires 700 gigabytes of GPU RAM. My cheapest computer-components retailer lists a 48-gigabyte GPU at $5k, so running the previous generation of GPT would cost me about $70k right now. When do you think I can expect to run GPT-4 on my consumer $device? :)
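The ~$70k figure follows from simple arithmetic, sketched here using the prices quoted above (the 700 GB and $5k numbers are the comment's own assumptions, not confirmed specs):

```python
import math

# Assumed figures from the comment: 700 GB of model weights,
# 48 GB per GPU, $5k per card.
model_gb = 700
gpu_gb = 48
gpu_price_usd = 5_000

cards = math.ceil(model_gb / gpu_gb)   # whole cards needed to hold the weights
total = cards * gpu_price_usd

print(f"{cards} cards, ${total:,}")    # -> 15 cards, $75,000
```

Rounding up to whole cards gives $75k; 700/48 = ~14.6 cards, which is where the looser "$70k" estimate comes from.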


I would be surprised if GPT-3 uses 700GB of RAM. It may be true, I don't know. But I am running 70B parameter models (quantized to 5 or 6 bits, biggest is 48GB loaded) on my 64GB Mac M2 Max Studio now and they are usable and the machine is still usable too. With an M2 Ultra and 192GB of RAM I imagine you could do a lot more.
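The "48GB loaded" figure is consistent with the usual rule of thumb: parameters × bits-per-weight / 8, plus some overhead for quantization scales and the KV cache. A minimal sketch (the 10% overhead factor is an assumption for illustration):

```python
def model_gb(params_billion: float, bits: float, overhead: float = 1.1) -> float:
    """Approximate loaded size in GB: params * bits / 8, plus overhead."""
    return params_billion * 1e9 * bits / 8 * overhead / 1e9

print(round(model_gb(70, 5), 1))    # ~48 GB: a 5-bit 70B model, as in the comment
print(round(model_gb(70, 16), 1))   # ~154 GB: the same model in fp16
```

This is why quantization is the difference between a 70B model fitting on a 64GB machine or not.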

I'm not arguing that these models hold up against GPT-3.5, and I still use GPT-4 when it matters. But they work, and it's more like the difference between the Premier League & Division 1, rather than the PL & a five-a-side team from Bracknell.

Even a few years ago I could not have imagined this.

Given the pace of work on optimisation and my assumption that the M3 Studio I buy next will probably have 256GB of RAM at much the same power levels as I use now, it seems eminently possible it's a year or two away.


First of all, you're off by an order of magnitude.

Second, I don't think it will be that long. There are already LLMs as good as GPT-3 running on average laptops and even phones.

In the next couple of years, you'll see:

- Ordinary PCs, tablets, and phones will ship with dedicated AI chips (like TPUs) tuned specifically for LLMs

- Mathematical and algorithmic optimizations will make existing LLMs faster on the same hardware

- Newer generations of LLMs will get even more useful with fewer parameters

The combination of all of these means that it's not at all unreasonable to expect that today's top-of-the-line LLM will be running locally on your device within just a couple of years.

Of course, LLMs in the cloud will advance even further, so there will always be a tradeoff, and there will always be demand for cloud AI, depending on the application.


I don't know. RAM is expensive, and currently usable LLMs need huge amounts of RAM with high bandwidth. I don't see a story for how future AI chips will solve that. Do you know of anything?


Where did you get the 700 gigabytes figure from? I don't think OpenAI even released the model size, although it's considered to be 175B parameters. Given how well quantization works at these sizes you would need less than 200 GB of GPU memory to run it.
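One plausible origin for the 700 GB figure, and why quantization brings it under 200 GB, sketched below (the 175B parameter count is the widely assumed size, not an official one):

```python
PARAMS = 175e9  # assumed GPT-3 size; OpenAI hasn't published VRAM requirements

# Bytes per parameter at common precisions.
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name}: {gb:.0f} GB")
# fp32: 700 GB, fp16: 350 GB, int8: 175 GB, 4-bit: 88 GB
```

Notably, 175B parameters in fp32 is exactly 700 GB, so the original figure is likely just the full-precision weight size; int8 already fits under 200 GB.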


That doesn't seem to make sense. I can run Llama 2 on my 12-year-old desktop PC with no compatible GPU and only 16GB of system RAM. It ain't quick, but it runs.


Maybe 700GB is what ChatGPT uses to serve zillions of users concurrently. If you're running your own individual instance, you obviously don't need as many resources.
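One concrete reason concurrent serving costs more memory than a single local instance: each in-flight sequence needs its own KV cache. A rough sketch using the commonly cited GPT-3 175B dimensions (96 layers, 12288 hidden size; OpenAI hasn't published serving details, so these are assumptions):

```python
LAYERS, HIDDEN, BYTES_FP16 = 96, 12288, 2

def kv_cache_gb(context_len: int, batch: int = 1) -> float:
    """Approximate KV-cache size: 2 tensors (K and V) per layer, per token."""
    return 2 * LAYERS * HIDDEN * BYTES_FP16 * context_len * batch / 1e9

print(round(kv_cache_gb(2048), 1))      # ~9.7 GB for one full-context sequence
print(round(kv_cache_gb(2048, 64)))     # ~618 GB for a batch of 64 users
```

So a server batching dozens of users can easily need hundreds of gigabytes beyond the weights themselves, while a single-user instance does not.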


You are now where the GUI was in 1986 or so.


And you think corporations will wait nearly forty years for the tech to catch up? No, they will force you to be always online, leading to absolute and total surveillance, where you voluntarily add dozens of mics and cameras to every home. If you thought telescreens à la 1984 were bad, just wait.


It's funny that being obsessed with "corporations" feels so specifically GenX. I think because it's part of the New Left movement. In reality governments have all the power here, which is why this would be illegal in at least CA/EU/China.

Another issue where this comes up is high housing costs and climate change, which are mostly caused by bad land-use laws (and the profiteers are landlords, who mostly own one or two properties). But people from the New Left era will literally refuse to believe you about this, because they can't accept that anything bad on Earth might not be caused by "corporations".



