Hacker News

Postgres can be scaled vertically, like Stack Overflow did. With an edge cache for popular reads if you absolutely must (but you most likely don't).

No need for microservices, or even synced read replicas (unless you are making a game). No load balancers. Just scale the RAM and CPU up, to TB levels for heavy real-world apps (99% of you won't ever run into this issue).

Seriously, it's so easy to create scalable backend services with PostgREST, RPCs, triggers, V8 (PL/V8), even queues now, all in Postgres. You don't even need the cloud. Even a mildly RAM'd VPS will do for most apps.
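For context, "queues in Postgres" usually refers to the `FOR UPDATE SKIP LOCKED` pattern. A minimal sketch (the `jobs` table and its columns are illustrative, not from the comment):

```sql
-- Hypothetical job queue table
CREATE TABLE jobs (
  id bigserial PRIMARY KEY,
  payload jsonb NOT NULL,
  status text NOT NULL DEFAULT 'pending',
  created_at timestamptz NOT NULL DEFAULT now()
);

-- Each worker claims one pending job. SKIP LOCKED makes
-- concurrent workers pass over rows another worker already
-- holds, so many workers can poll the same table safely.
UPDATE jobs
SET status = 'running'
WHERE id = (
  SELECT id FROM jobs
  WHERE status = 'pending'
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING id, payload;
```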

I got rid of Redis, Kubernetes, RabbitMQ, and a bunch of SaaS tools. I just do everything in Postgres and scale vertically.

One server. No serverless. No microservices or load balancers. It's sooo easy.



Stack Overflow absolutely had load balancers, and nine web servers, and Redis caches. They also used four SQL servers, so not entirely vertical either. And they were only serving 500 requests a second on average (peak was probably higher).


Was it? I read it was one huge-RAM server.


The details of their architecture are documented in a series of blog posts:

https://nickcraver.com/blog/2016/02/03/stack-overflow-a-tech...

I get what you're saying: they didn't do dynamic, "wild" horizontal scaling; they focused more on having an optimal architecture with beefy, vertically scaled servers.

Very much something we should focus on. These days horizontal scaling, microservices, Kubernetes, and just generally "throwing compute" at the problem are the lazy answer to scaling issues.



That's a primary and backup server for Stack Overflow and a primary/backup for SE. But they each hold the full dataset for their sites, so it's not actual horizontal scaling. Also, that page is just a static marketing tool, not very representative of their current stack. See: https://meta.stackexchange.com/questions/374585/is-the-stack...


Having most of the servers be loaded at about 5% CPU usage feels extremely wasteful, but at the same time I guess it's better to have the spare capacity for something that you really want to keep online, given the nature of the site.

However, if they have a peak of 450 web requests per second and somewhere between 11,000 and 23,800 SQL queries per second, that'd mean between 25 and 53 SQL queries to serve a single request. There are probably a lot of background processes and whatnot (and also queries needed for the web sockets) that cut the number down, and it's not that bad either way, but I do wonder why that is.

The apps with good performance that I've worked with generally attempted to minimize the number of DB requests needed to serve a user's request (e.g. sessions cached in Redis/Valkey, and DB views returning an optimized data structure that needs minimal transformations).

Either way, that's quite a beefy setup!


Having at least two web servers and a read-only DB replica for redundancy/high availability is very easy and much safer. Yes, setting up a single server is faster, but when your DB server dies - and at some point it will happen - the redundancy will save you not just a lot of downtime, but also a lot of stress and additional work.


Read replicas come with their own complexity, as you have to account for replication lag on the replica in your UX. This leads to a lot of unexpected quirks if it's not planned for.
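For reference, the lag in question can be measured directly with Postgres's built-in functions; a small sketch (connection details assumed, nothing here is from the comment):

```sql
-- On the replica: how far behind the primary is replay?
-- Returns NULL when the server is not in recovery (i.e. a primary).
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

-- On the primary (PostgreSQL 10+): per-standby lag breakdown.
SELECT client_addr, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
```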


That's true, but you can use your replica only for non-realtime reporting, or even just as a hot standby.

Edit: Careful with the non-realtime reporting though if you want to run very slow queries - those will pause replication and can be a PITA.


A hot standby / failover still meets this definition. That’s how I interpreted what was being described.


My startup has a similar setup (Elixir + Postgres). We use Aurora, so we get automated failover. It's more expensive, but it's just a cost of doing business.


Last time I looked at Aurora (just as it came out) it was hilariously expensive. Are the costs better now for a real use case?


> it was hilariously expensive

It still is. But you have to look at it in perspective. Do you have customers that NEED high availability and will pull out pitchforks if you are down for even a few minutes? I do. The peace of mind is what you're paying for in that case.

Plus, it's still cheaper than paying a DevOps engineer a full-time salary to maintain these systems if you do it on your own.


That works for the performance aspect, but doesn't address any kind of High Availability (HA).

There are definitely ways to make HA work, especially if you run your own hardware, but the point is that you'll need (at least) a 2nd server to take over the load of the primary one that died.


Sure, failover is recommended if you have HA commitments.


Thank you for sharing this! I have been diving into it.

How do you manage transactions with PostgREST? Is there a way to do it within PostgREST itself, or does it need to be in a good old endpoint/microservice? I can't find anything in their documentation about complex business logic beyond CRUD operations.


Transactions are done using database functions: https://docs.postgrest.org/en/v12/references/api/functions.h....
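To expand on that: PostgREST runs each request in a single transaction, so multi-statement business logic goes into a function it exposes under its `/rpc/` endpoint. A hedged sketch (the `accounts` table and `transfer` function are made up for illustration):

```sql
-- Hypothetical example: a transfer between accounts.
-- PostgREST would expose this as POST /rpc/transfer, and the
-- whole body runs in one transaction: both UPDATEs commit,
-- or the exception rolls them both back.
CREATE FUNCTION transfer(from_id bigint, to_id bigint, amount numeric)
RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
  UPDATE accounts SET balance = balance - amount WHERE id = from_id;
  UPDATE accounts SET balance = balance + amount WHERE id = to_id;
  IF EXISTS (SELECT 1 FROM accounts
             WHERE id = from_id AND balance < 0) THEN
    RAISE EXCEPTION 'insufficient funds';  -- aborts the transaction
  END IF;
END;
$$;
```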


Ah ok awesome thank you!


Yes, scaling vertically is much easier than scaling horizontally and dealing with replicas, caching, etc. But that certainly has limits and shouldn’t be taken as gospel, and is also way more expensive when you’re starting to deal with terabytes of RAM.

I also find it very difficult to trust your advice when you’re telling folks to stick Postgres on a VPS - for almost any real organization using a managed database will pay for itself many times over, especially at the start.


Looking at Hetzner benchmarks, I would say a VPS is quite enough to handle Postgres for the Alexa Top 1000. Once you approach the top 100, you will need more RAM than what is offered.

But my point is you won't ever hit this type of traffic. You don't even need Kafka to handle streams of logs from a fleet of generators in the wild. Postgres just works.

In general, the problem with modern backend architectural thinking is that it treats the database as some unreliable bottleneck, but that is an old-fashioned belief.

The vast majority of HN users and startups are never going to service anywhere near 1 million transactions per second. Even a medium-sized VPS from DigitalOcean running Postgres can handle a typical startup's load just fine.

Postgres is very fast and efficient, and you don't need to build your architecture around problems you won't ever hit, prepaying a premium for that <0.1% peak that happens so infrequently (unless you are a bank and receive fines for it).


I work at a startup that is less than 1 year old and we have indices that are in the hundreds of gigabytes. It is not as uncommon as you think. Scaling vertically is extremely expensive, especially if one doesn’t take your (misguided) suggestion to run Postgres on a VPS rather than using a managed solution like most do.


It shouldn't break the bank to handle that volume of indexes on a dedicated server.


> One server

What happens if this server dies?


Then your service is offline until you fix it. For many services that's a completely acceptable thing to happen once in a blue moon.

Most would probably get two servers with a simple failover strategy. But on the other hand, servers rarely die. At the scale of a datacenter it happens often, but if you have, like, six of them, buy server-grade stuff, and replace them every 3-5 years, chances are you won't experience any hardware issues.
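A simple failover pair like that is just streaming replication plus a promotion step; a sketch of the moving parts, assuming standard streaming replication (role name and host are illustrative):

```sql
-- On the primary: a role the standby connects with.
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'changeme';

-- On the standby (after pg_basebackup and creating standby.signal),
-- postgresql.conf points at the primary:
--   primary_conninfo = 'host=primary-host user=replicator password=changeme'

-- When the primary dies, promote the standby (PostgreSQL 12+):
SELECT pg_promote();
```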


If you can't risk this rarity, then get a failover server with equal specs.

Maybe add another for good measure. If the business needs extreme HA for insurance reasons, then absolutely have multiple failovers.

My point is you aren't doing extreme orchestration or routing.

Throw in Cloudflare DDoS protection too.


Eventually you get data-residency requirements to keep data in the right region, and for that you need horizontal partitioning of some kind.


Our backend at work does use a read replica purely for websockets. I always wondered if it was overkill; I'm not a backend developer, though.


Not sure what you're building, but I hope that was for a real-time multiplayer game; otherwise it doesn't make sense to have bi-directional communication when you only need reads.

Making read replicas also accept writes is needed for such cases, but as soon as you have more than one place to write, you run into edge cases and debugging complexity.


I think the reason is that pushes are sent out regularly in batches by some cron system, and rather than reading from the main database, it reads from the replica before pushing them out. I didn't really explain the context properly in my comment.


> Just up the RAM and CPU up to TB levels

Not sure what "CPU at TB levels" means, but I hope your wallet scales vertically as well.


They are definitely not on the cloud.


Aurora on AWS definitely offers extreme amounts of RAM.

It's not cheap at roughly $200/hr, but if you have this type of traffic, then you are (hopefully) generating revenue at much greater amounts.



