Scribe: Transporting petabytes per hour via a distributed, buffered queueing (opens in new tab)

(engineering.fb.com)

124 pointsgluegadget6y ago42 comments

42 comments

braggy PR is misleading: the 25GB/s coming from CERN is after they filter the data down from 600TB/s because there are no commercial systems that can capture data at higher rates.

breck6y ago

This is a good point!

Just for fun, for more perspective on big data, a human body generates around 1-10M new cells per second, and a cell contains about 10-100GB of information. So a single human is generating 1-100PB/s of data just in the new cells! (Give or take a few OOM)

throwaway_bad6y ago

Are you trying to quantify the "information" by the size of the DNA? I think this is a pretty meaningless number to multiply since most of the DNA will be exact copies and DNA alone doesn't capture all the information about a cell.

OTOH the amount of "information" needed to perfectly simulate a cell is probably unbounded. Just a corollary of the fact that we currently don't know how to perfectly simulate reality. Even a single "real" number can take up infinite space.

3 more replies

gnufx6y ago

I'm not sure what that means, but the Cori filesystem is rated at 700GB/s and Summit's 2.5TB/s. See https://docs.nersc.gov/filesystems/cori-scratch/ and https://www.olcf.ornl.gov/olcf-resources/compute-systems/sum...

dekhn6y ago

it's pretty simple. the physical data acquisition devices (ATLAS is an example) collect data at rates in the 100s of terabytes/sec https://home.cern/science/computing/processing-what-record)

No storage system can store that data (and most of it is not useful) so they have a series of hardware triggers and buffers that reduce the data down to roughly what modern (general purpose) hardware is capable of handling. They tune the thresholds to match what consumer hardware is capable of.

With regard to supercomputer filesystems: nobody wants to use GPFS. CERN's EOS sustained (theoretical) 3.3TB/sec in Apr 2015, so it's not like they're uncompetitive with the largest supercomputer...

1 more reply

packetslave6y ago

it's adorable how you call it "braggy PR" when almost every major technology company these days (FB, Google, Amazon, Uber, Pinterest, etc., pretty much everybody except Apple) has an engineering blog where they share possibly-interesting work they've done.

mkasu6y ago

Apple also has a blog where they discuss (some of) their machine learning results[1].

1: https://machinelearning.apple.com/

londons_explore6y ago

Keep in mind the network cost of petabytes per hour cross continent.

Those of us who don't own our own cross-ocean fiber can't afford to design systems like this.

carapace6y ago

Never underestimate the bandwidth of a cargo ship full of SSDs.

(I'm paraphrasing an old, old joke.)

noir_lord6y ago

Fairly high latency though I guess.

1 more reply

gnulinux6y ago

Sounds like a startup idea.

liveoneggs6y ago

Level 3 and Vodafone have a bit of a first mover advantage here

100k6y ago

Naming is hard! Facebook used to have a _different_ Scribe (https://en.wikipedia.org/wiki/Scribe_(log_server)).

We used it at a company I worked for, but it had long-since been deprecated, so I was confused when I saw this Scribe.

johndoe3456y ago

This is the same scribe. Facebook closed-sourced it because it was too hard to maintain an open version and a version that addresses Facebook's needs.

naringas6y ago

I wonder why couldn't they just make the open source version address their needs...

1 more reply

saisundar6y ago

>"The growing number of complex components made it difficult to retain an open source version stripped of our internal specifications. This complexity is the main reason that we archived the open source Scribe project. The current version of Scribe’s architecture is detailed below, with a focus on the components that comprise the data plane. ". Quoted from the article.

gnud6y ago

Scribe is also an enterprise integration/ETL tool. For _many_ years before FB existed.

javiermaestro6y ago

Interesting! Just found out about this, and I googled it out. TIBCO acquired Scribe Software, and Scribe Software was running as early as 1998-1999! :D

https://web.archive.org/web/20120301000000*/http://www.scrib...

The initial stuff seems to not be so related, but the current description of what they do seems much much closer to what Facebook's Scribe does today :)

Naming is hard! :D

dividuum6y ago

Is it normal for these internal system to not implement any kind of access control? From the post it seems every reader can access every stream?

javiermaestro6y ago

That's actually not the case, there's access control :)

The article just focuses on certain areas of the system and doesn't go into the security and privacy parts, that's all.

(I work in Scribe)

sjg0076y ago

Aka not a product requirement... Users are trusted etc... Works great until it doesn't..

pvlak6y ago

why can't Producer/Scribed write directly to LogDevice storage. Any reasons for routing through WriteService.

thetrooperer6y ago

That's a good question. There are multiple reasons for this. I'll briefly mention two of them. One is the high fan in ratio - millions of machines are writing relatively small blobs of data, so the middle layer serves as an aggregator (which saves backend's IOPs, number of connections, etc). Another reason is the volume of metadata - it would be inefficient to keep all the LogDevice-level metadata on each of the producer hosts.

pvlak6y ago

Will the WriteService(Aggregator) make sense for environments having thousands of machines(not in millions) and they are all within the DataCenter. In our company, we are moving away from this design of having aggregators, to directly writing to Storage whereever possible, as it reduces the message loss.

On the volume of metadata held by Producers, will there be any significant difference between holding WriteService & LogDevice meta.

1 more reply

bradhe6y ago

So, if I'm reading this correctly, 2.5GB/s of log data being generated? If we assume (aggressively) that they have 5mil machines in their infrastructure, doesn't that mean that each machine would have to be generating 500kB/s of log data?

Despite that, I find the claims to be underwhelming. So your system can process massive amounts of data by scaling massively horizontally...neat.

javiermaestro6y ago

The number in the article is 2.5 TB/s, not GB/s :)

(disclaimer: I work in Scribe)

bradhe6y ago

Right—sorry. But point still stands. Under what circumstances was that much data being generated from (what I’m assuming is) normal logging?

1 more reply

ninju6y ago

How does this compare to a robust Splunk infrastructure?

packetslave6y ago

For what you'd pay for a Splunk license that can handle petabytes/hour of data, it would probably be cheaper to just buy Facebook and use Scribe. :-)

j / k navigate · click thread line to collapse

42 comments

dekhn6y ago

braggy PR is misleading: the 25GB/s coming from CERN is after they filter the data down from 600TB/s because there are no commercial systems that can capture data at higher rates.

breck6y ago

This is a good point!

throwaway_bad6y ago

3 more replies

gnufx6y ago

dekhn6y ago

it's pretty simple. the physical data acquisition devices (ATLAS is an example) collect data at rates in the 100s of terabytes/sec https://home.cern/science/computing/processing-what-record)

With regard to supercomputer filesystems: nobody wants to use GPFS. CERN's EOS sustained (theoretical) 3.3TB/sec in Apr 2015, so it's not like they're uncompetitive with the largest supercomputer...

1 more reply

packetslave6y ago

mkasu6y ago

Apple also has a blog where they discuss (some of) their machine learning results[1].

1: https://machinelearning.apple.com/

londons_explore6y ago

Keep in mind the network cost of petabytes per hour cross continent.

Those of us who don't own our own cross-ocean fiber can't afford to design systems like this.

carapace6y ago

Never underestimate the bandwidth of a cargo ship full of SSDs.

(I'm paraphrasing an old, old joke.)

noir_lord6y ago

Fairly high latency though I guess.

1 more reply

gnulinux6y ago

Sounds like a startup idea.

liveoneggs6y ago

Level 3 and Vodafone have a bit of a first mover advantage here

100k6y ago

Naming is hard! Facebook used to have a _different_ Scribe (https://en.wikipedia.org/wiki/Scribe_(log_server)).

We used it at a company I worked for, but it had long-since been deprecated, so I was confused when I saw this Scribe.

johndoe3456y ago

This is the same scribe. Facebook closed-sourced it because it was too hard to maintain an open version and a version that addresses Facebook's needs.

naringas6y ago

I wonder why couldn't they just make the open source version address their needs...

1 more reply

saisundar6y ago

gnud6y ago

Scribe is also an enterprise integration/ETL tool. For _many_ years before FB existed.

javiermaestro6y ago

Interesting! Just found out about this, and I googled it out. TIBCO acquired Scribe Software, and Scribe Software was running as early as 1998-1999! :D

https://web.archive.org/web/20120301000000*/http://www.scrib...

The initial stuff seems to not be so related, but the current description of what they do seems much much closer to what Facebook's Scribe does today :)

Naming is hard! :D

dividuum6y ago

Is it normal for these internal system to not implement any kind of access control? From the post it seems every reader can access every stream?

javiermaestro6y ago

That's actually not the case, there's access control :)

The article just focuses on certain areas of the system and doesn't go into the security and privacy parts, that's all.

(I work in Scribe)

sjg0076y ago

Aka not a product requirement... Users are trusted etc... Works great until it doesn't..

pvlak6y ago

why can't Producer/Scribed write directly to LogDevice storage. Any reasons for routing through WriteService.

thetrooperer6y ago

pvlak6y ago

On the volume of metadata held by Producers, will there be any significant difference between holding WriteService & LogDevice meta.

1 more reply

bradhe6y ago

Despite that, I find the claims to be underwhelming. So your system can process massive amounts of data by scaling massively horizontally...neat.

javiermaestro6y ago

The number in the article is 2.5 TB/s, not GB/s :)

(disclaimer: I work in Scribe)

bradhe6y ago

Right—sorry. But point still stands. Under what circumstances was that much data being generated from (what I’m assuming is) normal logging?

1 more reply

ninju6y ago

How does this compare to a robust Splunk infrastructure?

packetslave6y ago

For what you'd pay for a Splunk license that can handle petabytes/hour of data, it would probably be cheaper to just buy Facebook and use Scribe. :-)

j / k navigate · click thread line to collapse