Introducing Heka (opens in new tab)

(blog.mozilla.org)

299 pointscrankycoder197513y ago57 comments

Mozilla Services announces Heka - a tool for high performance data gathering, analysis, monitoring, and reporting.

57 comments

Uh, yet another collector/grapher. That's nice but..

We have tons of collectors. And tons of graphers. What we have not is a little bit of smarts in that tools. Ability to predict and ability to react.

Predict. We have Holt-Winters Forecasting Algorithm implemented in RRDTool from 2005 and a couple of papers.

React. I'm not talking about 'fix it automagically'. But everyone wants to know 'wtf was that peak on this graph last night?'. Usually your never know, except the simplest cases. Because you cannot collect everything about everything all the time. But monitoring system could enable 'collect everything we can' for short period of time when it detects something. Something wrong or something strange, something out of the pattern. Does anybody hear about system with something like that?

aba_sababa13y ago

We're working on it here at Etsy :)

It'll be released in a week or two. In the meantime, I've been speaking about it: http://devslovebacon.com/conferences/bacon-2013/talks/bring-...

EFruit13y ago

Is what you're referring to something like Growl that pops up and says 'Hey, metrics.sessions.active just dropped by 70%", or something pre-configured to spin up additional VMs/instances/dynos when some metrics misbehave? TL;DR: How autonomous is it?

snaky13y ago

That's interesting. Can I subscribe to announce?

nonsequitarian13y ago

It's certainly not available out-of-box, and it would take a bit of work, but it should be possible to build something like this with Heka. You could write a dynamic filter plugin that watched for peaks in a specific graph (or any other arbitrary trigger) and, if found, generate a message to trigger the activation of a bunch of new dynamic filters, quickly turning on a more comprehensive set of data analyzers.

snaky13y ago

It's definitely possible, at least with RRDtool and 30 lines of Python (only 20 lines for react to measured peak, not forecasting one).

1 more reply

jennyinc13y ago

We're working on algorithms to improve the 'predict' part (way beyond Holt-Winters!): ow.ly/jRKrT

If you're interested to see our algorithms in action, email me jennyinc at gmail.com

berkay13y ago

Having implemented similar solutions, it's clear to me that developers did their homework and designed accordingly. I find myself agreeing with almost every decision I could see. - Go: light, no dependencies. This is key. If you ever deployed something in a non homogeneous environment with 100s/1000s of servers, you'd know the pain.

- Plugin system: Only way to scale the development of the solution

- Lua for plugins: Yes! Language is not important, but not having to stop and restart the application for changes in logic, etc. is essential.

- Routing. Sounds great, can't wait to take a deeper look.

Kudos to devs. Nicely done!

Sven713y ago

Not very familiar with Go, could someone please elaborate on what "Go: no dependencies" mean? Thanks

berkay13y ago

Basically it means that you can have statically linked executables that do not have dependencies to other libraries, and can be deployed by simply copying the files.

Alternatives written is scripting language like ruby, python, etc. require runtimes and libraries, and it can get quite complicated to deploy them (especially if the env. has lots of different OS versions, etc.), keep track of all dependencies, deal with conflicts with other apps that may require a different version of the runtimes & libraries. As such for operational reasons it's very appealing to have a distributable binary that works without having to worry about what prerequisites are and whether it would impact anything else on the server.

grosskur13y ago

I've been experimenting with a different metrics toolchain of shh + log-shuttle + l2met recently (also written in Go):

https://github.com/freeformz/shh

https://github.com/ryandotsmith/log-shuttle

https://github.com/ryandotsmith/l2met

shh can be extended with custom pollers written in Go, but focuses on collecting system-level metrics. log-shuttle is a general-purpose tool for shipping logs over HTTP. l2met receives logs over HTTP and can be extended with custom outlets written in Go, but requires log statements in a specific format ("measure.db.latency=20" or "measure=db.latency val=20").

It's great to see so many new tools in this space. Previously I had a bunch of one-off "carbonize" scripts running out of cron, each collecting a specific kind of metric and sending it to Graphite or statsd. This worked OK but required quite a bit of code to get things done. Heka's plugin system looks like a nice way to structure things.

themgt13y ago

Very interesting. Does this fit in conceptually with circus at all? It seems like there's a fair amount of overlap between the process/HTTP management done by circus and this stats/data collection/analysis (specifically hekad agent in the architecture diagram): http://heka-docs.readthedocs.org/en/latest/architecture/inde...

I'm curious if Mozilla is using these two tools in combination internally, and what that architecture looks like.

https://github.com/mozilla-services/circus

nonsequitarian13y ago

Heka and Circus have very different goals. Heka is about generic data gathering, processing, and routing. Circus is about task and process monitoring and management. There's some overlap in that Circus needs to gather and process a bit of data to do its job, and it would certainly be possible to use Heka as a part of that, but we're not doing so. We'll probably use Circus to manage at least a few of our `hekad` processes, though.

judofyr13y ago

Seems similar to Riemann: http://riemann.io/

buro913y ago

Riemann is top of our list to be implemented in an environment made up of PostgreSQL, Memcached, Python (Django) and Go.

But as Hekad has plugins for all of the above (except Go - but I'm sure it's possible): http://heka-docs.readthedocs.org/en/latest/architecture/inde...

Well I guess we'll now be evaluating whether Hekad looks like it might be a more promising fit.

I particularly like the bullet points on aggregation counters, filters and transformations. We'll have to see how they work in practise though. The docs are very pretty, but as is usual with early releases it seems a little difficult to picture the whole and how it will actually work in practise from the soup of detail that Sphinx spits out.

nonsequitarian13y ago

Unfortunately, the diagram you linked to is misleading; we don't have all of those plugins built yet. We've got a (quite rich) Python client (https://github.com/mozilla-services/heka-py) and a (rudimentary) Go client (https://github.com/mozilla-services/heka/tree/master/client) but Memcached and PostgreSQL connectors aren't yet in place. Fleshing out our plugin set (especially the inputs) is one of our highest priorities, so those should be coming Real Soon Now(tm). Contributions welcome!

sciurus13y ago

Parts of it seem similar to Riemann (which receives events pushed to it and processes then), and parts of it seem similar to collectd (which has plugins to gather just about any kind of metric and push them to just about any kind of storage imaginable). Collectd recently gained the ability to do some aggregation, but it can't do the level of message processing that hekad does. Collectd also lacks the message routing capabilities of hekad.

If you're not familiar with collectd's capbilities, you can get a quick overview of the official plugins at http://git.verplant.org/?p=collectd.git;a=blob;hb=master;f=R...

dkhenry13y ago

Off the top of my head this is a reimplementation of the following * SNMP * CollectD * Carbon * JMX * WMI * CMIP

And a whole host of other proprietary transports. So its cool and looks awesome, but what does it give me that the entirety of other monitoring protocols doesn't

fizx13y ago

I'm not affiliated with OP, but I wrote perhaps the most similar OSS project, so I have some perspective here.

There's a bunch of things going on on your boxes (logs, jmx, syslog, etc), and you want to get them out in a useful unified format. You have to do some ugly things (e.g. parse rails logs for latencies), and then emit the data, preferably in some structured format that knows that render=17ms is a duration so that you can graph it.

They chose their own transport to speak between heka nodes, because it maps perfectly to their internal representation, but it looks like they are willing to speak any of those protocols you mentioned to the outside world. It's useful to do a limited amount of munging inside the hekasystem before sending the data to logstash, graphite, etc, so it looks like they spent quite a bit of time building a framework for that initial work, so you can move it as close to the edges as you'd like.

To me, the transport and/or protocol isn't interesting, it's that you have a flexible, lightweight agent that's also capable of doing pre-processing and rollups.

crankycoder1975OP13y ago

One of the driving motivations was simplicity for developers and get a reasonable out-of-the-box experience.

This comes from a couple things.

Go compiles to a single static library so you don't have to worry about having dozens of "the right" library installed on your machine. Grab the heka binary and run with it.

This greatly eases our operations work as we have fewer dependency conflicts to deal with when we push things to production.

mapleoin13y ago

That doesn't make a lot of sense. You don't have to write a monitoring software from scratch just because you want statically compiled bundled libraries. You can do that with any programming language.

1 more reply

dkhenry13y ago

Interesting so its more of a ease of use then a performance issue ? The numbers you quotes for performance seemed impressive.

1 more reply

rayiner13y ago

SNMP -> shivers

samatman13y ago

with reference to the name: "and I do live in South Berkeley / North Oakland…"

I had a feeling. We may hope your code is hella tight...

samatman13y ago

I get it, no jokes on HN. This was a quote from the developer, who has, I suppose, a better sense of humor than y'all.

fixxer13y ago

I didn't know we could say "y'all" on HN, either.

1 more reply

zobzu13y ago

i see it more as a syslog replacement. It does a lot more than syslog of course, but tit doesnt do what "collectd" and whatever else does. Heka seems to "just" do logging/routing/etc and be extremely fast and reliable doing so. And has no dependencies/small footprint.

Which is what syslog can't do.

nonsequitarian13y ago

Actually, there is a fair amount of overlap between collectd and Heka. And Heka provides mechanisms for in-flight data processing and graphing output, in addition to logging and routing. But you're also right that Heka might in some cases also be used in place of syslog.

Of course, both syslog and collectd have been around and battle-hardened for many(!) years, whereas this first Heka release is being called "0.2-beta-1" for a reason. I wouldn't go rush into replacing any mission-critical infrastructure just yet. ;)

keyle13y ago

Off topic but I did smile to see it's written in Go, and not Rust. I guess Rust isn't there yet.

coldtea13y ago

What's to smile about?

Nobody ever claimed that Rust is "there yet".

The core Rust developers all say that Rust is still in flux, and that a stable version is still many months in the future, possibly 2014. And they advise not to use it in production.

vitno13y ago

The design goals and philosophies are really quite different.

kevinmeredith13y ago

Is this similar to piwik?

nonsequitarian13y ago

Not really. Piwik is focused on web traffic analytics. You could build a piwik-type system using Heka, but Heka is a lower level tool w/ a wider focus.

mh-13y ago

in that they're both on the internet.

i believe that's about it, though.

doun13y ago

seems that this DO NOT support Windows?

trink13y ago

Windows support will be added. https://github.com/mozilla-services/heka/issues/145

j / k navigate · click thread line to collapse

57 comments

snaky13y ago

Uh, yet another collector/grapher. That's nice but..

We have tons of collectors. And tons of graphers. What we have not is a little bit of smarts in that tools. Ability to predict and ability to react.

Predict. We have Holt-Winters Forecasting Algorithm implemented in RRDTool from 2005 and a couple of papers.

aba_sababa13y ago

We're working on it here at Etsy :)

It'll be released in a week or two. In the meantime, I've been speaking about it: http://devslovebacon.com/conferences/bacon-2013/talks/bring-...

EFruit13y ago

snaky13y ago

That's interesting. Can I subscribe to announce?

nonsequitarian13y ago

snaky13y ago

It's definitely possible, at least with RRDtool and 30 lines of Python (only 20 lines for react to measured peak, not forecasting one).

1 more reply

jennyinc13y ago

We're working on algorithms to improve the 'predict' part (way beyond Holt-Winters!): ow.ly/jRKrT

If you're interested to see our algorithms in action, email me jennyinc at gmail.com

berkay13y ago

- Plugin system: Only way to scale the development of the solution

- Lua for plugins: Yes! Language is not important, but not having to stop and restart the application for changes in logic, etc. is essential.

- Routing. Sounds great, can't wait to take a deeper look.

Kudos to devs. Nicely done!

Sven713y ago

Not very familiar with Go, could someone please elaborate on what "Go: no dependencies" mean? Thanks

berkay13y ago

Basically it means that you can have statically linked executables that do not have dependencies to other libraries, and can be deployed by simply copying the files.

grosskur13y ago

I've been experimenting with a different metrics toolchain of shh + log-shuttle + l2met recently (also written in Go):

https://github.com/freeformz/shh

https://github.com/ryandotsmith/log-shuttle

https://github.com/ryandotsmith/l2met

themgt13y ago

I'm curious if Mozilla is using these two tools in combination internally, and what that architecture looks like.

https://github.com/mozilla-services/circus

nonsequitarian13y ago

judofyr13y ago

Seems similar to Riemann: http://riemann.io/

buro913y ago

Riemann is top of our list to be implemented in an environment made up of PostgreSQL, Memcached, Python (Django) and Go.

But as Hekad has plugins for all of the above (except Go - but I'm sure it's possible): http://heka-docs.readthedocs.org/en/latest/architecture/inde...

Well I guess we'll now be evaluating whether Hekad looks like it might be a more promising fit.

nonsequitarian13y ago

sciurus13y ago

If you're not familiar with collectd's capbilities, you can get a quick overview of the official plugins at http://git.verplant.org/?p=collectd.git;a=blob;hb=master;f=R...

dkhenry13y ago

Off the top of my head this is a reimplementation of the following * SNMP * CollectD * Carbon * JMX * WMI * CMIP

And a whole host of other proprietary transports. So its cool and looks awesome, but what does it give me that the entirety of other monitoring protocols doesn't

fizx13y ago

I'm not affiliated with OP, but I wrote perhaps the most similar OSS project, so I have some perspective here.

To me, the transport and/or protocol isn't interesting, it's that you have a flexible, lightweight agent that's also capable of doing pre-processing and rollups.

crankycoder1975OP13y ago

One of the driving motivations was simplicity for developers and get a reasonable out-of-the-box experience.

This comes from a couple things.

Go compiles to a single static library so you don't have to worry about having dozens of "the right" library installed on your machine. Grab the heka binary and run with it.

This greatly eases our operations work as we have fewer dependency conflicts to deal with when we push things to production.

mapleoin13y ago

1 more reply

dkhenry13y ago

Interesting so its more of a ease of use then a performance issue ? The numbers you quotes for performance seemed impressive.

1 more reply

rayiner13y ago

SNMP -> shivers

samatman13y ago

with reference to the name: "and I do live in South Berkeley / North Oakland…"

I had a feeling. We may hope your code is hella tight...

samatman13y ago

I get it, no jokes on HN. This was a quote from the developer, who has, I suppose, a better sense of humor than y'all.

fixxer13y ago

I didn't know we could say "y'all" on HN, either.

1 more reply

zobzu13y ago

Which is what syslog can't do.

nonsequitarian13y ago

keyle13y ago

Off topic but I did smile to see it's written in Go, and not Rust. I guess Rust isn't there yet.

coldtea13y ago

What's to smile about?

Nobody ever claimed that Rust is "there yet".

The core Rust developers all say that Rust is still in flux, and that a stable version is still many months in the future, possibly 2014. And they advise not to use it in production.

vitno13y ago

The design goals and philosophies are really quite different.

kevinmeredith13y ago

Is this similar to piwik?

nonsequitarian13y ago

Not really. Piwik is focused on web traffic analytics. You could build a piwik-type system using Heka, but Heka is a lower level tool w/ a wider focus.

mh-13y ago

in that they're both on the internet.

i believe that's about it, though.

doun13y ago

seems that this DO NOT support Windows?

trink13y ago

Windows support will be added. https://github.com/mozilla-services/heka/issues/145

j / k navigate · click thread line to collapse