Stenographer – A full-packet-capture utility (github.com)

61 points by dionyziz 11y ago | 21 comments

dryicerx 11y ago

A bit surprised to see no support for DPDK. While using AF_PACKET makes the project a bit more portable, you'll be able to save a lot of cycles by skipping the kernel altogether.

kcudrevelc 11y ago

I blame blissful ignorance: I'm not at all familiar with DPDK. I'll definitely read up, though!

I wonder if O_DIRECT writes can happen from DPDK memory space? If not, we don't gain anything, since we'd need to copy packets into RAM for writes anyway.

Supporting stock Linux is definitely a nice-to-have... I'd like to make this a relatively easily installed deb. Currently, all dependencies are available via apt-get in stock Ubuntu.

akadien 11y ago

I would add netmap to that list, too.

micheloosterhof 11y ago

Interesting!

An open source high performance rolling packet dump with packet index for incident response.

To be honest, I had imagined Google already had solutions like this internally :)

These have been commercially available for a few years; look at RSA Security Analytics (formerly NetWitness) or BlueCoat Security Analytics (formerly Solera), or the (open source) Bro Time Machine. I used to work with one of these products in the past.

What makes systems like this a lot more powerful is more and easier search and retrieval. While indexing IP addresses and port numbers is good, it gets much more useful if you can connect it to something like 'bro' to get session-level data, and then index filenames, user-agents, file hashes, and other pieces of information. I'm sure you can see the use cases.

Having an easy way to query 'all traffic with this particular user agent', together with the full packet capture, which allows you to write new rules, can significantly increase the efficiency of a security team.

Apart from the streaming analytics, once the PCAP data is stored, you can use MapReduce-type operations on it to search through yesterday's data with today's IDS signatures (look at PacketPig/what Packetloop does). Maybe a lambda architecture is the way to go, or just reprocess old data through the same stream processing.
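Re-running new detection logic over stored captures boils down to a scan over saved PCAP. A minimal sketch, assuming the classic libpcap file layout (24-byte global header, 16-byte per-record header, microsecond timestamps) and a hypothetical `signature` predicate over raw packet bytes; this is not how PacketPig or Packetloop work internally, and the demo payloads are toy bytes rather than real Ethernet frames.

```python
import io
import struct
from typing import BinaryIO, Callable, Iterator, List, Tuple

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap magic number (microsecond timestamps)


def write_pcap(f: BinaryIO, packets: List[Tuple[float, bytes]], linktype: int = 1) -> None:
    """Write (timestamp, bytes) records as a little-endian classic pcap stream."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype.
    f.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, linktype))
    for ts, data in packets:
        sec = int(ts)
        usec = int(round((ts - sec) * 1e6))
        # Record header: ts_sec, ts_usec, captured length, original length.
        f.write(struct.pack("<IIII", sec, usec, len(data), len(data)))
        f.write(data)


def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp_seconds, packet_bytes) from a classic pcap stream."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return  # end of file (or truncated record header)
        sec, usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        data = f.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record
        yield sec + usec / 1e6, data


def replay(f: BinaryIO, signature: Callable[[bytes], bool]) -> List[float]:
    """Return timestamps of stored packets that match a (new) signature."""
    return [ts for ts, pkt in read_pcap(f) if signature(pkt)]


if __name__ == "__main__":
    buf = io.BytesIO()
    write_pcap(buf, [(1.5, b"GET / HTTP/1.1\r\nUser-Agent: evil\r\n"), (2.0, b"hello")])
    buf.seek(0)
    print(replay(buf, lambda pkt: b"User-Agent: evil" in pkt))  # [1.5]
```

Because the scan is a pure function of the stored bytes and the predicate, the same loop could be sharded per file in a MapReduce-style job.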

Cool work though! I'm curious where this will go next.

mlacitation 11y ago

The design document is a good read and includes high-level details of how they're grabbing packets (AF_PACKET), the packet index format (leveldb), and the defensive actions they took (fuzz testing via AFL, setcap, seccomp):

https://github.com/google/stenographer/blob/master/DESIGN.md

kcudrevelc 11y ago

Hey, thanks! If you have any additional questions about the design process, internals, etc, feel free to ask. I'm the primary author of the project, and I'll be refreshing the HN post for the next hour or so trying to answer questions as they come up, and/or updating the docs.

MichaelGG 11y ago

What kind of performance do you see when searching over, say, 10TB/day or two of captured data? It seems like the query would have to open a file for every minute? Have you considered a higher level index, to tell which minute files are worth inspecting? (I realize this only helps when searching for more unique characteristics.)

Is LevelDB the best choice out there for write once KV pairs? For, say, IP address indexing, what's the final bits/packet overhead of indexing?

I didn't see any compression for the packet data. Did you consider high perf compression like LZ4?

Is AF_PACKET better than PF_RING+DNA? It's been a while since I looked but with hardware accel they boasted massive perf advantages.

Excellent design docs and cool work!

kcudrevelc 11y ago

Hey, great questions!

Query Performance: Right now, we've got test machines deployed with eight 500GB disks for packets plus one indexing disk (all 15K RPM spinning disks). They're kept at 90% full, or roughly 460GB/disk, about 1K files/disk. Querying over the entire corpus (~4TB of packets) for something innocuous like 'port 65432' takes 25 seconds to return ~50K packets (that's after dropping all disk caches). The same query run again takes 1.5 sec, with disk caches in place. Of course, the number of packets returned is a huge factor in this... each packet requires a seek in the packets file. Searching for something that doesn't exist (host 0.0.0.1) takes roughly 5 seconds. Note that time-based queries, like "port 4444 and after 3h ago and before 1h ago", only query the files in that time range, taking advantage of the fact that we name files by microsecond timestamp and we flush files every minute.
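The time-based file pruning described above can be sketched as follows. This assumes, as the comment says, that each packet file is named by its start time in microseconds and that files are rotated about once a minute; the helper name `files_for_range` and the rule that a file covers everything up to the next file's start are illustrative assumptions, not Stenographer's actual code.

```python
from typing import List

FLUSH_INTERVAL_US = 60 * 1_000_000  # assumption: a new packet file about once a minute


def files_for_range(names: List[str], after_us: int, before_us: int) -> List[str]:
    """Pick packet files whose time span can overlap [after_us, before_us].

    Assumes each name is the file's start time in microseconds since the epoch,
    and that a file covers everything up to the next file's start time.
    """
    ordered = sorted(names, key=int)
    selected = []
    for i, name in enumerate(ordered):
        start = int(name)
        # The newest file has no successor; assume it spans one flush interval.
        end = int(ordered[i + 1]) if i + 1 < len(ordered) else start + FLUSH_INTERVAL_US
        if start <= before_us and end > after_us:
            selected.append(name)
    return selected


if __name__ == "__main__":
    minute = 60_000_000
    names = [str(t * minute) for t in range(4)]  # files started at minutes 0, 1, 2, 3
    # Roughly "after minute 1 and before minute 2": only two files need opening.
    print(files_for_range(names, after_us=minute + 5, before_us=2 * minute + 5))
```

Only the selected files' indexes then need to be consulted, which is why narrow time windows avoid touching most of the ~1K files per disk.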

A big part of query performance is actually over-provisioning disks. We see disk throughput of roughly 160-180MB/s. If we write 160MB/s, our read throughput is awful. If we write 100MB/s, it's pretty good. Who would have thought: disks have limited bandwidth, and it's shared between reads and writes. :)

We actually don't use LevelDB... we use the SSTables that underlie LevelDB. Since we know we're write-once, we use https://github.com/google/leveldb/blob/master/include/leveld... directly for writes (and its Go equivalent for reads). I'm familiar with the file format (they're used extensively inside Google), so it was a simple solution. That said, it's been very successful... we tend to have indexes in the 10s of MBs for 2-4GB files. Of course, index size/compressibility is directly correlated with network traffic: more varied IPs/ports would be harder to compress. The built-in compression of LevelDB tables is also a boon here... we get prefix compression on keys, plus snappy compression on packet seek locations, for free.
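The write-once index idea can be illustrated with a toy stand-in for an SSTable: buffer (key, packet offset) pairs while a packet file is being written, sort them once when it closes, then answer lookups by binary search. This is a sketch of the concept only; it does not use LevelDB's table format or API, and the `port_key` encoding is a hypothetical example.

```python
import bisect
import struct
from typing import List, Tuple


class WriteOnceIndex:
    """Toy stand-in for an SSTable: buffer entries, sort once, then only read."""

    def __init__(self) -> None:
        self._pending: List[Tuple[bytes, int]] = []
        self._keys: List[bytes] = []
        self._offsets: List[int] = []
        self._frozen = False

    def add(self, key: bytes, offset: int) -> None:
        """Record that a packet matching `key` starts at byte `offset` of the packet file."""
        if self._frozen:
            raise RuntimeError("index is write-once and already frozen")
        self._pending.append((key, offset))

    def freeze(self) -> None:
        """Sort all entries once, e.g. when the packet file is closed."""
        self._pending.sort()
        self._keys = [k for k, _ in self._pending]
        self._offsets = [o for _, o in self._pending]
        self._pending = []
        self._frozen = True

    def lookup(self, key: bytes) -> List[int]:
        """Binary-search for every offset stored under `key`."""
        if not self._frozen:
            raise RuntimeError("freeze() the index before reading it")
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._offsets[lo:hi]


def port_key(port: int) -> bytes:
    """Hypothetical key encoding: big-endian so byte order matches numeric order."""
    return b"port:" + struct.pack(">H", port)


if __name__ == "__main__":
    idx = WriteOnceIndex()
    idx.add(port_key(443), 4096)
    idx.add(port_key(80), 0)
    idx.add(port_key(443), 128)
    idx.freeze()
    print(idx.lookup(port_key(443)))  # [128, 4096]
```

Sorted, fixed-width keys like these share long prefixes with their neighbours, which is the property that prefix compression of keys in LevelDB tables takes advantage of.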

We currently do no compression of packets. Doing so would definitely increase our CPU usage per packet, and I'm really scared of what it would do to reads. Consider that reading packets in compressed storage would require decompressing each block a packet is in. On the other hand, if someone wanted to store packets REALLY long term, they could easily compress the entire blockfile+index before uploading to more permanent storage. I expect this would be better than having to do it inline. Even if we did build it in, we'd probably do it tiered (initial write uncompressed, then compress later on as possible).
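The read-side cost mentioned above can be seen in a small sketch: if a closed blockfile is compressed in independent fixed-size chunks, fetching one packet at a given offset still means decompressing every chunk the packet touches. This uses the standard-library `zlib` purely for illustration (LZ4 is not in the Python standard library), and the 64 KiB chunk size is an arbitrary assumption.

```python
import zlib
from typing import List

CHUNK_SIZE = 64 * 1024  # assumed chunk size; a real system would tune this


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Compress a closed (no longer written) blockfile as independent chunks."""
    return [zlib.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def read_range(chunks: List[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read `length` bytes at `offset`, decompressing every chunk the range touches."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    out = bytearray()
    for idx in range(first, last + 1):
        out += zlib.decompress(chunks[idx])  # whole chunk, even for a tiny packet
    start = offset - first * chunk_size
    return bytes(out[start:start + length])


if __name__ == "__main__":
    blockfile = bytes(range(256)) * 1024  # 256 KiB of stand-in packet data
    chunks = compress_chunks(blockfile)
    # A 100-byte "packet" at offset 70000 still forces a full 64 KiB decompress.
    packet = read_range(chunks, 70_000, 100)
    print(packet == blockfile[70_000:70_100])
```

Smaller chunks reduce the per-read penalty but compress worse; compressing the whole blockfile at once, as suggested for long-term storage, maximizes the ratio at the cost of making any single read expensive.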

AF_PACKET is no better than PF_RING+DNA, but I also don't think it's any worse. They both have very specific trade-offs. The big draw for me for AF_PACKET is that it's already there... any stock Linux machine will already have it built in and working. Thus steno should "just work", while a PF_RING solution has a slightly higher barrier to entry. I think PF_RING+DNA should give similar performance to steno... but libzero currently probably gives better performance because packets can be shared across processes. This is a really interesting problem that I'm wondering if we could also solve with AF_PACKET... but that's a story for another day. Short story: I wanted this to work on stock linux as much as possible.

signa11 11y ago

> The design document is a good read ...

and large chunks of the code are in Go :), with only the performance-related stuff (i.e., packet capture) done in C++. Pretty cool.

e28eta 11y ago

They probably don't want to give away too much (like security details of their network), but I think it'd be more compelling with some examples of how to use this for Intrusion Detection.

It's a topic I don't know much about, and I think it'd reinforce the claim this isn't for user monitoring.

CHY872 11y ago

Ok so this isn't a Google product. In brief (please correct if I'm wrong), Google lets its employees work on their own side projects on company resources if they assign copyright to Google. This means that it gets published on the Google github account, but is then denoted to not be a Google product - it's someone's side project.

I do however have at least anecdotal experience with how these sorts of systems work. The idea is that as a large company, you traditionally pump all of your internet through a firewall, which scans it all online, does deep packet inspection etc to look for attackers.

Then, because it takes up a lot of space, you ditch it, and perhaps keep finer grained logfiles - perhaps just the DNS requests or headers or suspicious packets etc.

The idea here is that for many companies, this isn't helpful when you do get owned - you'll have deleted most of the relevant data (showing exactly what got exfiltrated etc, how it happened etc) and you might have some logfiles showing TCP addresses but you know little else.

Since a company of 1000 will use no more than around 1-10TB per day for its staff, it's actually now feasible to store every packet that is sent in and out of your network - you could store for 90 days on around 0.1-1PB - which is actually fairly affordable for a company of that size.
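The retention estimate above is simple arithmetic; a quick check, assuming decimal units (1 PB = 1000 TB) and a flat 90-day window:

```python
def retention_storage_tb(tb_per_day: float, days: int) -> float:
    """Storage (TB) needed to keep `days` of full capture at `tb_per_day`."""
    return tb_per_day * days


if __name__ == "__main__":
    for daily in (1, 10):
        total_tb = retention_storage_tb(daily, 90)
        # Decimal units assumed: 1 PB = 1000 TB.
        print(f"{daily} TB/day x 90 days = {total_tb:.0f} TB = {total_tb / 1000:.2f} PB")
```

This prints 90 TB (0.09 PB) and 900 TB (0.90 PB), i.e. roughly the 0.1-1PB range quoted.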

Then, you either run large (more expensive than can be done in a firewall) jobs over the data offline to look for intrusions, or wait for a breach and then drill down on the data to try to learn exactly what happened.

The reason why this isn't really a tool for monitoring users is:

a) What can you do to track users that you couldn't already do with systems that don't store all the data?

b) The target seems to be corporate networks, which can and should monitor what their users are doing on their network.

c) Because this sort of data isn't really indexed, any specific search would be very expensive, perhaps requiring run-throughs of terabytes of data. So individually spying on many people isn't really doable without further processing; this is really just a big packet dumper.

If you were going to try and monitor random Joe Public, then you'd certainly be fitting a device like this to a computer their traffic would be passing through, but this isn't useful for someone who's not an ISP or nation state (and in that case, there'd probably be smarter ways of doing this, since here you can only sniff local connections). For Google, the most they'd be able to sniff is communications from their users to their own servers, which isn't a huge bonus for the costs.

Even for an ISP, it'd just be massively expensive and unhelpful - a UK ISP (Plusnet) I just looked up has around 800,000 ADSL users, and at peak time they see total usage of roughly 130Gbps. Even assuming an average of half that, 65Gbps, that's still 702TB a day. That's a massive amount of data to store for any reason. The reason you (bad person) only store the metadata is because the metadata is the valuable part!
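The 702TB/day figure follows from converting a sustained bit rate into bytes per day. A small check, assuming decimal units (1 TB = 1000 GB) and 8 bits per byte:

```python
SECONDS_PER_DAY = 86_400


def gbps_to_tb_per_day(gbps: float) -> float:
    """Convert a sustained rate in gigabits/s into terabytes/day (decimal units)."""
    gigabytes_per_second = gbps / 8  # 8 bits per byte
    return gigabytes_per_second * SECONDS_PER_DAY / 1000  # GB/day -> TB/day


if __name__ == "__main__":
    print(gbps_to_tb_per_day(65))   # 702.0, half of the ~130Gbps peak
    print(gbps_to_tb_per_day(130))  # 1404.0, if the peak were sustained all day
```

65Gbps is 8.125GB/s, and 8.125GB/s × 86,400s = 702,000GB, or 702TB per day.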

I welcome corrections :)

kcudrevelc 11y ago

No corrections necessary, you're right on the money.

This is a 20% project. While it's one we plan to use internally, it's not a "supported" Google product. It's just another open-source project along with the many others we use to keep our networks secure.

Also, it's designed specifically to do one thing (packet history) and do it well. In no way is it a complete solution; this is a building block for network detection and response.

To reiterate some of the salient points:

1) Disk is REALLY cheap these days.

2) NIDS don't store lots of history, because they're optimized for detecting patterns and signatures. So they might find something in the middle of a TCP stream and send an alert, but you don't have much context around it. This allows you to build that context by requesting all packets from that stream during a (possibly very long) time range.

3) There's a ton of reasons why this isn't used to monitor users:

* it's wrong: I'd flat-out refuse to build something designed to monitor users

* it wouldn't work #1: most interesting user traffic is encrypted on the wire

* it wouldn't work #2: our production network architecture is not good at single aggregation points

* it wouldn't work #3: there aren't enough disks in the world to handle our production network load

* it's redundant: applications can already do per-application, structured monitoring as necessary for debugging/auditing/etc.

e28eta 11y ago

Thanks!

> Then, you either run large (more expensive than can be done in a firewall) jobs over the data offline to look for intrusions, or wait for a breach and then drill down on the data to try to learn exactly what happened.

I was thinking in terms of offline jobs, and don't have a good intuition for what those rules would look like. I'm also skeptical that your average company would have the expertise to write a good set of rules. So I was interested to see that "half" of an IDS tool.

I think the real answer is that it truly is just a rolling packet dump, and it's up to you to use it however you choose.

I can think of uses outside of network security: capturing traffic from your mobile devices on your home network (maybe this is just IDS if you're watching for the contents of your address book to be exfiltrated by a malicious app), or snooping on people through an Internet cafe, library, or other (small) open network that you administer.

For these uses, just like IDS, you'd want to run offline jobs against the data. Whether that's a full scan for something interesting, or an indexing pass that extracts (portions?) into a more easily viewable form.

kcudrevelc 11y ago

Offline jobs are an interesting idea, but they weren't what we were really thinking of. Instead, we use stenographer more like a database of recent traffic. Consider this as a simple use case for intrusion detection:

  set up snort and steno
  foreach snort alert
    request all packets in stream from steno: srcIP,srcPort,dstIP,dstPort match
    OR request all packets on that srcIP,dstIP, to get OTHER connections between those hosts
    store pcap to directory (or central DB, or whatever)

Then, when a human analyst wants to investigate the alert, instead of getting the very limited PCAP that comes out of snort, they get a ton of data they can use to build context, write new detection rules, etc.
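The alert-driven loop above might start with a helper like the one below. It is hypothetical, not part of Stenographer or Snort: it only builds query strings in the style of the examples quoted earlier in this thread (`host …`, `port …`, `after 3h ago`, joined with `and`). Sending the queries to Stenographer and storing the returned PCAP are left out, and the `Alert` fields and one-hour look-back are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alert:
    """Hypothetical subset of fields taken from a NIDS (e.g. Snort) alert."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def queries_for_alert(alert: Alert, lookback: str = "1h") -> List[str]:
    """Build two steno-style queries: the alerting stream, then all traffic between the hosts."""
    window = f"after {lookback} ago"
    hosts = f"host {alert.src_ip} and host {alert.dst_ip}"
    # 'port N' matches either direction, so this covers both halves of the stream.
    stream = f"{hosts} and port {alert.src_port} and port {alert.dst_port} and {window}"
    # Dropping the ports widens the request to OTHER connections between the hosts.
    between_hosts = f"{hosts} and {window}"
    return [stream, between_hosts]


if __name__ == "__main__":
    alert = Alert("10.0.0.5", 51234, "203.0.113.7", 443)
    for q in queries_for_alert(alert):
        print(q)
```

The first query matches the five-tuple request from the pseudocode; the second is the wider "all packets between these hosts" fallback.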

warmwaffles 11y ago

What does this offer that tcpdump doesn't?

ithkuil 11y ago

1) Performance. Zero copy ("The kernel writes them from the NIC to shared memory, then the kernel uses that same shared memory for O_DIRECT writes to disk. The packets transit the bus twice and are never copied from RAM to RAM."). Parallelism.

2) Disk management. Rotates old data, etc

3) Indexing and supports efficient retrieval while writing.

It lets you analyse the traffic after the fact, at 10Gbps line speed.
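The disk-management point (rotating out old data) can be sketched as an oldest-first eviction policy. The 90% high-water mark mirrors the deployment numbers quoted earlier in the thread; the function name and the assumption that file names are sortable microsecond start timestamps are illustrative, not Stenographer's actual implementation.

```python
from typing import Dict, List


def files_to_evict(sizes: Dict[str, int], capacity: int, high_water: float = 0.90) -> List[str]:
    """Return the oldest files to delete so usage drops to high_water * capacity or below.

    `sizes` maps a file name (assumed: microsecond start timestamp) to its size in bytes.
    """
    limit = int(capacity * high_water)
    used = sum(sizes.values())
    evict = []
    for name in sorted(sizes, key=int):  # oldest first
        if used <= limit:
            break
        evict.append(name)
        used -= sizes[name]
    return evict


if __name__ == "__main__":
    disk = {"300": 40, "100": 30, "200": 30}  # toy sizes, 100 units used
    print(files_to_evict(disk, capacity=100))  # evicts the oldest file, "100"
```

Deleting whole files (each with its own index) keeps rotation cheap: no index rewriting is needed when old data ages out.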

akadien 11y ago

You can get zero-copy for tcpdump with PF_RING or netmap.

ithkuil 11y ago

I'm aware of libpcap's ability to share memory with a user buffer, but I didn't find any mention that the tcpdump utility is actually written to exploit it for extra-fast writes.

Look here at how they handle this in stenographer: https://github.com/google/stenographer/blob/65fb928e6bce276c...

I guess that in principle they could have patched tcpdump, but it's probably easier to have a smaller software written to do exactly what you want rather than extend a general purpose mature complex tool such as tcpdump.
