MyRocks – A RocksDB storage engine with MySQL

markpapadakis · on Dec 3, 2017

FWIW, we used MyRocks (went from InnoDB to MyISAM, to MyRocks) for some mySQL tables holding a few dozen million rows a few months ago.

InnoDB wasn't working out for us (INSERTs and SELECTs were too slow). MyISAM fared a lot better; INSERTs were a lot faster(obviously?), and SELECTs some 50% faster. But it too wasn't as great as we hoped it'd be. It got to where it was too slow for production use (some operations would take over 4s, which was a deal breaker for us).

We then switched to MyRocks. It was great initially -- much smaller on-disk footprint, INSERTs were fast, SELECTs were fast enough. But two weeks later it also got to where it was too slow. Slower than InnoDB and MyISAM even; also, our mySQL server would often starve for memory, and restarting it was the only practical way to “fix” it.

We would DELETE from those tables every few hours because we were only interested in the last 2 weeks worth of rows, so the dataset size was constrained/bounded, so slow downs weren't a result of tables getting larger.

In the end, we just gave up on MySQL, wrote our own thing that stores and accesses data on-disk directly. Disk footprint is over 2 orders of magnitude smaller, and all operations take constant time(no more than 20ms at 99pc), whereas in the past we ‘d get around 3s at best, and 10 or maybe 20s on average(not even at 99pc).

This is not about us doing anything “better” or about rolling your own alternative to generic datastores. It’s about MyRocks performance’s initially being good, but deteriorating very quickly - and how it compares with InnoDB and MyISAM, at least how it did for us. Also, using an RDBMS wasn't likely the right choice for what we were doing anyway, but for various reasons that's what we used.

It was in beta at the time, and I am sure there are tunables we could change to maybe get a better performance, but we didn’t really bother with any of that.

Twirrim · on Dec 3, 2017

At any stage, did you speak to a DBA? Those are some really weird choices you made along the way.

hendzen · on Dec 3, 2017

"We would DELETE from those tables every few hours because we were only interested in the last 2 weeks worth of rows, so the dataset size was constrained/bounded, so slow downs weren't a result of tables getting larger."

Yeah, this workload is hard on any LSM store, unless compaction is carefully tuned so that the tombstones are cleaned up often enough. Most people would recommend using partitioning...

markpapadakis · on Dec 3, 2017

Yes, we 'd insert new rows every 30 minutes or so. Also, we did try partitioning one of the 3 tables (which made sense) into multiple tables -- it didn't improve the situation at all.

Again, there are probably tunables and practices specific to MyRocks that we just didn't consider. We just didn't want to pursue this any further.

Shelnutt2 · on Dec 3, 2017

I believe hendzen was referring to using partition on a table in mysql.

Mariadb has an example walkthrough on this: https://mariadb.com/kb/en/library/partition-maintenance/#par... .

The only change that I'd recommend considering, is we use partition by list, instead of range, and keep a weeks worth of daily future partitions. This makes it so you don't have to reorganize the table. On the other hand it means if you get future data it can fail and if your partition jobs have issues for a week you can fail to load data all together.

Using partitions for time series data help also due to partition pruning, https://dev.mysql.com/doc/refman/5.7/en/partitioning-pruning... .

markpapadakis · on Dec 3, 2017

Thank you -- we didn't use table partitioning because we didn't have good results in the past, though we should probably have tried it, and maybe it'd have worked well. We just kind of gave up on mySQL use for that specific problem after failing (again, maybe its entirely because of our inability to use mySQL "properly") to get good results and moved on to something else.

innagadadavida · on Dec 3, 2017

You new here? Reinventing the wheel and blogging about it has glory. Where is the glory in writing some SQL that just works? Moreover it makes the code base very readable and maintainable, making your job redundant.

markpapadakis · on Dec 3, 2017

I suspected that folks would think this is about rolling your own thing as opposed to relying on existing solutions, and I thought I was clear what I said was not about that — I just thought it would be somewhat valuable to someone how MyRocks compared to InnoDB and MyISAM for our use case. That’s just one datapoint.

Doesn’t mean others will have a similar experience(in terms of performance and scalability) with us. Obviously. YMMV.

dbattaglia · on Dec 3, 2017

I think you need to just expect and brace for a certain amount of rage on HN if you say that you did something outside the norm. It’s unfortunate that people with literally 1 or 2 sentences of context can be so judgmental and completely miss the point.

Twirrim · on Dec 3, 2017

Note: in my original comment I rather deliberately did not call them the wrong choices, just weird. Hence asking if they'd engaged a DBA. They look like the kinds of choices that someone would make if they were just trying random things.

Twirrim · on Dec 3, 2017

You could build your stack out as a full LAMP setup, all on a single box. As things get a little busy you could spin up a second server, running all the software, adding master<->master replication between the databases and split the load between them. Then you could add a third, or even a fourth, creating a nice database replication loop between them (1->2->3->4->1) to ensure all db instances remain in sync. It would work. You could probably get a nice shiny blog post out of it about how easy it is to horizontally scale a web application. After all, you didn't even need to deal with all that hassle of splitting up writes and reads off to master and slave servers as appropriate. "We easily scaled our application horizontally, you could too!" Eventually you're going to run in to all sorts of hell with trying to keep the databases in sync. As your write load increases, the fleet won't be able to keep up, and everything on it will start fighting for IO. Approaching horizontal replication like that is such a horrible idea.

Alternative take, sadly from personal experience: At one job, I got asked to help out a customer who just couldn't get the performance they needed out of their database. They wanted me to help migrate them to a larger server, which I did. Not long afterwards they came back asking for help tuning their database, because it was still slow as molasses. Turns out it was a site trying to be like a yellow pages. Their central bottleneck came from a single field in a single table that detailed the businesses. That field was "categories", a nice FULL TEXT field. Every category had a four letter short code. They wanted to have businesses exist in multiple categories, and so what they'd do was have an alphabetically sorted list of short codes, semicolon separated, e.g. "DOCT;DENT;MEDI" (for a business that offered Doctor and Dentist services). When you went to look at the medical table, the query would do something roughly of the form "SELECT * FROM businesses WHERE categories LIKE '%MEDI;%';". This would have been about 2009. There was no way to index off that field, every query would have to do a full table scan of that column to find out every relevant business. It worked, and when just a small handful of people were using the site at the same time, everything was A-OK as the server was really powerful and brute force was viable. Add any more users and the thing would fall to its knees. It wasn't that MySQL was the problem, it wasn't, it was that they were using it wrong. Switching engines wouldn't have fixed the fundamental flaw in the schema. I even showed them how they could solve all of their problems with a relatively simple schema change, but they wouldn't change things. They did spend a lot of time ranting about how it was MySQL's problem and "How can Google manage it but MySQL can't", when it really, absolutely and truly wasn't MySQL's fault. From the company perspective, everything was mostly great. They were paying for database servers far bigger than they needed, and paying for my time to read their rants, and have my advice ignored. You can lead a horse to the water, but you can't make it drink, I guess?

Just because you can use a tool one way, doesn't mean it's the right way to use it.

From the perspective of your average jack-of-all-trades sysadmin, who has had to dabble in DBA work from time to time, here's what seemed really strange to me:

InnoDB -> MyISAM. That was a really, really strange choice. When you mention that at the outset as a change made, that's the kind of choice that rings alarm bells in my head and makes me think "They really don't know what they're doing". MyISAM is an older technology than InnoDB and has a large number of drawbacks to it. A few key ones:

* MyISAM writes lock up the entire table, vs InnoDB row level locking. The whole table becomes read only until the write is finished. You can only have one write or update happening at a time, forcing you in to effective serialisation for all writes. That introduces a nasty scaling limitation.

* MyISAM doesn't support transactions. It doesn't barf when it sees transaction related instructions, it just ignores them.

* MyISAM's crash recovery is next to negligible. InnoDB has transaction logs around and the like that help it to recover from a crash gracefully without data loss, along with a host of better approaches to data storage on disk.

* Development on it virtually ceased a long time ago. Lots of effort around query optimisations for modern architecture, multi-processing etc. etc. have gone in to InnoDB etc. and not in to MyISAM.

The only advantage MyISAM had over InnoDB for a while was lack of FULL TEXT column support in InnoDB, but that was added in version 5.6, which went RC about 5 years ago or so.

I can't imagine a single person who knows anything about MySQL considering that change.

You also indicated that the engines start out fine, but performance dropped over time, and that you were deleting data older than a certain length of time. That makes me think a few things:

1) Your tables were getting badly fragmented. By constantly deleting data older than a certain age, you were forcing reads and writes to be all over the place. The impact is worse if you're not using partitioned tables. Which leads into..

2) Not using MySQL native table partitioning. This introduces some major advantages with queries, allowing more parallelisation of various actions underneath it. It also has the advantage of limiting the scope of any operation, particularly index updates (as I understand it, index updates on a partitioned table only end up modifying the index for a single partition, rather than having to modify an index for the entire table).

markpapadakis · on Dec 3, 2017

Re: myISAM qualities: 1. Writes are infrequent; every 30 minutes or so we insert/update rows. Requests rate is very flow. This is for a specific report -- and said report was infrequently requested. Only updated by a single producer/process. All that means that locking wasn't a concern for us.

2. We didn't need transactions - if the producer would fail while it was executing the REPLACE statements, we 'd start it over and it wouldn't be a problem (idempotency)

3. We didn't care for crash recovery either -- if aything would go wrong, we 'd rebuild those tables (we only cared for 2 weeks or so worth of rows, rebuilding them wouldn't take long).

I think you ignore that, despite MyISAM's deficiencies, it's really fast if you don't care for the aforementioned properties/warranties provided by more modern engines. And it was -- for our dataset, it was almost twice as fast as InnoDB.

We have been running mySQL in production since release 3.x; we moved to it from mSQL. It may not mean much, but we know it mySQL well, at least some of our folks do.

As I said in another reply, we didn't use native table partitioning because we didn't get the expected benefits in a different use-case/dataset, but we certainly should have considered it.

Thank you for the suggestions though :)

continuations · on Dec 3, 2017

What did you use MyRocks for and what kind of data store you ended up writing on your own?

Over 2 orders of magnitude reduction in disk footprint and going from 20s to 20ms in query performance - those are incredible improvements. Would love to learn more about how you pulled that off.

markpapadakis · on Dec 3, 2017

It's not that incredible; for most use-cases there is often an optimal(or "better") way to implement a solution that's specifically designed and implemented to support it, as opposed to relying on systems and designs that are broadly applicable/useful (e.g RDBMS).

We just figured out exactly what we wanted, stored the data in chunks, compressed(we used 2 var-int encoding schemes, and snappy compression), indices(skip-lists) for each file and each chunk and for queries, we parellize access to as files required across multiple OS threads (scatter-gather). In fact, I am sure we could have gotten better performance if we wanted to spend more time on that problem. It wasn't novel or particularly interesting or hard anyway. Just something that needed to be done to help us solve a problem.

nwrk · on Dec 3, 2017

More juicy details is here

https://github.com/facebook/mysql-5.6/wiki

Also Percona backs it: https://www.percona.com/blog/2016/10/24/announcing-myrocks-i...

bstill · on Dec 3, 2017

And MariaDB: https://mariadb.com/kb/en/library/about-myrocks-for-mariadb/

perlgeek · on Dec 3, 2017

Since there seems to be a lot of confusion around transactions in MyRocks:

https://github.com/facebook/mysql-5.6/wiki/Transaction-Isola...

> MyRocks supports two of the four transaction isolation levels: Read committed and Repeatable reads

Repeatable Reads in the default isolation level you get with InnoDB as well (though you can also use "serializable", which MyRocks does not support): https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-i...

MySQL does not support transactions that span multiple storage engines (to the best of my knowledge), so you should try to avoid tables with different storage engines within the same database. At least if you need transactions.

falcolas · on Dec 3, 2017

Can anyone point at the performance differences with TokuDB? InnoDB is a great starting point, but it has a lot of limitations that TokuDB does not have.

dbks · on Dec 4, 2017

you might want to check out the tests conducted by Mark in his blog - https://smalldatum.blogspot.com/2017/12/io-bound-linkbench-a.... There's a lot more there.

vadimberman · on Dec 3, 2017

I couldn't find it in the docs (sorry if I missed something obvious), so a question to those who know the subject better: will there be a way to plug in an existing RocksDB database and run SQL queries, or they are going to be two different incompatible flavours?

kevincox · on Dec 3, 2017

This sounds like a different problem then the one being solved here. To do this it would probably be better to go for the "foreign table" interface then you can just point it at the db.

ddorian43 · on Dec 3, 2017

with what columns/serialization ?

vadimberman · on Dec 3, 2017

No idea how it's implemented, but why not come up with dummy names like "key" and "value"?

paulryanrogers · on Dec 3, 2017

Anyone know if they are MVCC based? ACID? Mobile site is Spartan on details.

EDIT: They do mention it's not transactional.

antoncohen · on Dec 3, 2017

Not transactional in what way?

Their "Limitations" doc doesn't mention that it lacks transactions (https://github.com/facebook/mysql-5.6/wiki/MyRocks-limitatio...). They have a "Transaction Isolation" doc that describes some options and differences (https://github.com/facebook/mysql-5.6/wiki/Transaction-Isola...).

paulryanrogers · on Dec 3, 2017

Perhaps I misunderstood:

"Using mixed storage engines is not recommended in production because it is not really transactional, ..."

http://myrocks.io/docs/getting-started/#migrating-from-innod...

electrum · on Dec 3, 2017

I believe that is saying that MySQL transactions across two storage engines are not transactional, as that would require two-phase-commit (2PC) or similar. It might also have implications for replication.

MyRocks itself definitely has transactions. As the wiki explains, it uses snapshots (MVCC) for transaction isolation.

VintageCool · on Dec 3, 2017

MySQL does have a system for two phase commit. I've seen it accidentally triggered on a system that executed queries that joined two different engines (InnoDB and TokuDB) and had the binlog turned off. It's called XA mode and it requires an fsync per commit to a transaction log file.

cakoose · on Dec 3, 2017

Where do they say it's not transactional?

1000units · on Dec 3, 2017

Very unprofessional name.

crispinb · on Dec 3, 2017

This misuse of ‘professionalism’ to refer to the faux Victorian mannerliness fashionable amongst lesser suits is just one more troubling aspect of loss of conceptual resources inculcated by corporate culture.

1000units · on Dec 3, 2017

It's objectively a matter of bad optics for a product marketed towards general professional audiences. I'm personally unoffended, but I recognize it as a mistake because I know business. I don't think you know how to size a suit.

If you're interested in making this about my personality, click on my username.

matt4077 · on Dec 3, 2017

I hate these mid-level pencil-pushers that "aren't personally offended" but terribly fear others might be.

There's an airline named Virgin, a morning-after pill "Plan B", one of SpaceX's vessels is called "Of Course I Still Love You", and the Secret Service uses a weapon called YAR ("Yet another rifle"). Insisting everything should be named "Widget X2000-F1" is the telltale sign of mediocrity.

degenerate · on Dec 3, 2017

Every time I see "The Boring Company" mentioned somewhere, I crack a smile. The business world needs more fun and liveliness, so I'm appreciative when someone names a product something neat.

crispinb · on Dec 3, 2017

Whether it is ‘objectively’ so or not has little to do with professionalism, which kind of makes my point (which is that an important concept is falling into disrepair because the word for it is being misused). Anyway it seems I offended you, for which I apologise. My ire was roused by the usage, not the writer. I have no doubt I do worse often enough.

jrs95 · on Dec 3, 2017

This really should have ended with a Jesse Pinkman-esque "Bitch".

crispinb · on Dec 3, 2017

Dare I admit I had no idea who that was? I only wasn’t laughed at because my mates already know TV references are wasted on me.

jaysoncena · on Dec 3, 2017

Can you give more context?

1000units · on Dec 3, 2017

It evokes the gesture of one grabbing himself and shouting something impolite, as depicted in any of a number of De Niro films, for example.

bdcravens · on Dec 3, 2017

Presumably that's a colloquialism not widely understood? That meaning really didn't occur to me until I read your comment.

jrs95 · on Dec 3, 2017

That's funny I was just doing that in response to your comments ;)

1000units · on Dec 3, 2017

I've got your database right here, pal!

kentor · on Dec 3, 2017

Maybe if it weren't plural. You're way over thinking things.

1000units · on Dec 3, 2017

There's two of them.

IMTDb · on Dec 3, 2017

Maybe I don't get it, but...why ? According to comment, some people might be offended by the name, but I really can't see how :/