I always like it when a company focuses on its real problems in job interviews and manages to avoid the brain-teaser trap. That way you get a feel for the work you would actually be doing there, and can see whether you really like both the work and the people.
It is closely aligned with what our core service does (distributing and streaming files), and it's a great chance to talk with the interviewee and figure out where their strengths are.
I totally encourage other people to interview this way. It's what I've done at the past few companies I've been at, and it's worked excellently: just think of a problem you're working on or have worked on, and distill it into an interview problem.
That said, I intend to publish some sort of performance comparison (code and results). The downsides of me doing it are: 1) I know the sparkey code much better than I know LevelDB or any other solution, so the tuning parameters will probably be suboptimal for the other solutions. 2) I will only focus on our specific use case (writing large bulks, doing lots of random reads), which may seem a bit unfair to the more general solutions.
The sparkey usage is fairly optimized, but I just threw something together for LevelDB, so consider the results extremely biased.
How does your test compare to http://symas.com/mdb/microbench/ ? If you're going to try to talk about numbers, talk about them in a meaningful context. Right now you're just handwaving.
On Monday I added some slightly more proper benchmark code; you can find it at https://github.com/spotify/sparkey/blob/master/src/bench.c
I didn't add the LevelDB code to this benchmark, however, since I 1) didn't want to manage that dependency and 2) didn't know how to write optimized code against it.
I'm using very small records: a couple of bytes each for key and value. The insert order is strictly increasing (key_0, key_1, ...), though that doesn't really matter for sparkey, since it uses a hash for lookups instead of ordered lists or trees.
As for the Symas MDB microbench, I only looked at it briefly, but it seems like it's not actually reading the value it's fetching, only doing the lookup of where the value is. Is that correct?
"MDB's zero-memcpy reads mean its read rate is essentially independent of the size of the data items being fetched; it is only affected by the total number of keys in the database."
Doing a lookup and not using the values seems like a very unrealistic use case.
Here's the part of the benchmark I'm referring to:

  for (int i = 0; i < reads_; i++) {
    const int k = rand_.Next() % reads_;
    key.mv_size = snprintf(ckey, sizeof(ckey), "%016d", k);
    mdb_cursor_get(cursor, &key, &data, MDB_SET);
    FinishedSingleOp();
  }
LevelDB, on the other hand, supports concurrent writes and provides features to handle data consistency and cheap gradual reindexing.
A quick glance over LevelDB's features gives me the impression that its bulk-write performance would not be sufficient.
What problem are you solving?
What hurdles did you face with existing solutions, and how did you overcome them with your custom one?
How do you compare from a performance standpoint? (Granted, they still need to do this, but at least put in a section about it.)
Not only that, but it provides lightning-fast conjunctive normal form queries, a.k.a. logical combinations of primitive keys. Plus it has Python and Erlang bindings.
The command-line argument processing is also quite haphazardly done; it's not as if using getopt or whatever would pose compatibility issues. And is writing and packaging with a Makefile really that difficult?
I think it's a miracle they produced something they feel comfortable sharing with the world. If you write a database in-house, and the toolchain and the argument processing are the only things done haphazardly, then hats off to you :)
I wrote something like this to heavily optimize disk seeks, by returning an 8-byte reference and keeping a hashtable in memory. It was a mostly-append-only record store that allowed mutations of the same key and rounded blob sizes up to a power of 2. It was written to optimize the storage layer for Membase.