This looks great. My primary attraction is the potentially low memory footprint of this program compared to Filebeat. A secondary attraction is how easy it appears to be to enable transformations.
Now, if I may make a suggestion for your next/additional project: a neat system metrics collector in Rust that can export to Prometheus, built on the same principles:
Low memory footprint,
Rust,
Single binary,
Customizable with a single config file without spending hours in manuals,
Stdin/stderr -> transform -> Prometheus.
I'm learning Rust and eventually plan to build such a solution, but I think a lot of this project could be repurposed for what I described much faster than building a new one from scratch.
Cheers on this open source project. I will contribute whatever I can. Thanks!!
It's still slightly rough around the edges, but Vector can actually ingest metrics today, in addition to deriving metrics from log events. We have a source component that speaks the statsd protocol, which can then feed into our prometheus sink. We're planning to add more metrics-focused sources and sinks in the future (e.g. graphite, datadog, etc.), so check back soon!
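To illustrate, a statsd-to-Prometheus pipeline might look roughly like this in Vector's TOML config (a sketch based on the docs; the component IDs are made up and the exact option names/addresses may differ between versions):

```toml
# Listen for statsd metrics on UDP
[sources.statsd_in]
type = "statsd"
address = "127.0.0.1:8125"

# Expose everything as a Prometheus scrape endpoint
[sinks.prometheus_out]
type = "prometheus"
inputs = ["statsd_in"]
address = "0.0.0.0:9598"
```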
There was discussion earlier this year about creating a design doc for OpenCensus to handle logs. I'm not sure if that got finished, or whether it was sidelined while the OpenCensus & OpenTracing merger was worked on. The two projects will combine under the OpenTelemetry name.
I've been quite happy with the OpenCensus instrumentation SDKs.
I think the logs & metrics space is interesting, especially because there is so much overlap: both are just ways of representing data about an event that occurred in the software.
OpenCensus is fairly widely backed: Google, Microsoft, Etsy, Scalyr... [0]
[0] https://github.com/census-instrumentation/opencensus-service...
A big part of the reason we started building Vector was to integrate that kind of functionality into a larger project, so people wouldn't have to get over the hump of discovering, setting up, and rolling out a whole separate tool.
We're definitely not as mature as mtail yet, but we're working really hard to get there.
Veneur is more metrics-focused, but might offer inspiration as you work on metrics support in Vector - in particular the SSF source, internal aggregation, and Datadog and SignalFX sinks.
As you mentioned, it seems they've focused more on metrics out of the gate, while we've spent more of our time on the logging side of things (for now). We're working to catch up on metrics functionality, but interoperability via SSF is an interesting idea!
I did really like Kibana; ultimately, we had to ditch it (because we ditched ES). This turned out to be a good thing, as more than once I degraded ingest on the ES cluster just by using Kibana to do some aggressive filtering. Clickhouse handles these queries without a problem.
I think a more complete world view may be to pipe logs into kafka, and ingest them into Clickhouse/Druid for different types of analysis/rollups.
Our current logging volume now exceeds ~10b log lines per day. Clickhouse handles this ingest almost too well (we have 3 16-core nodes that sit at 5% CPU). This is down from a ~20-node ES cluster that basically sat pegged on CPU... and our log volume then was only ~1b/day.
For more ad-hoc analysis, we just use clickhouse-cli to query the dataset directly. We are also tangentially investigating using Superset with it.
Specifically, I'm ingesting nginx logs in JSON format, cleaning up invalid UTF-8 bytes (usually sent in the forwarded-for header to exploit security vulnerabilities), and sending to Elasticsearch with an automated 90-day retention policy (daily indexes).
Seems like a fairly common use case for webservers.
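For reference, the core of that pipeline (tail the file, parse the JSON, ship to ES) might be sketched in Vector's TOML config like this; the component IDs, host, and index pattern are illustrative, the exact option names may vary by version, and the UTF-8 cleanup step is elided:

```toml
# Tail the nginx access log
[sources.nginx_logs]
type = "file"
include = ["/var/log/nginx/access.log"]

# Parse each line as JSON into structured fields
[transforms.parse_json]
type = "json_parser"
inputs = ["nginx_logs"]

# Ship to daily Elasticsearch indexes; retention is
# handled by an ES-side policy, not by Vector
[sinks.es]
type = "elasticsearch"
inputs = ["parse_json"]
host = "http://localhost:9200"
index = "nginx-%Y-%m-%d"
```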
I don't know why the author didn't put a correctness tick mark on it.
https://github.com/timberio/vector-test-harness/tree/master/...
Definitely open to feedback on what we're doing wrong.
https://github.com/influxdata/telegraf
The biggest thing that pops out to me is the Lua engine (seems amazing :) )
But to answer your question: Telegraf is very heavily metrics-focused, and its logging support appears to be limited (reducing logs to metrics only). Vector is _currently_ focused on logging with an eye towards metrics, but still has work to do on the metrics front.
For example, we opened the door with the `log_to_metric` transform (https://docs.vector.dev/usage/configuration/transforms/log_t...) to ensure our data model supports metrics, but we still have a lot of work to do when it comes to metrics as a whole. Our end goal is to eventually replace telegraf and be a single, open, vendor neutral solution for both logs and metrics.
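As a rough sketch of what that transform looks like (adapted from the linked docs; the component IDs and upstream input are assumptions, and option names may differ by version), this counts log events by their `status` field:

```toml
# Derive a counter metric from structured log events
[transforms.status_counts]
type = "log_to_metric"
inputs = ["parse_json"]

[[transforms.status_counts.metrics]]
type = "counter"
field = "status"
```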
Happy to clarify further :)
Does either Filebeat or Logstash support config hot reloading, as mentioned in Vector's docs? https://docs.vector.dev/usage/administration/reloading
Edit - Found It - https://www.elastic.co/guide/en/logstash/current/reloading-c...
https://github.com/timberio/vector-test-harness/tree/master/...
Logstash's reload is not graceful. Our testing shows that it basically gets shut down and started again.
If so, how come log processing generally runs at such low throughput?
That is not to talk down your achievements (per your benchmark page, you do better than similar projects in terms of throughput), but I'm genuinely curious why modern machines with 40 Gbit/s of memory bandwidth are capped at (in your case) 76.7 Mbit/s. What's the bottleneck?
Given that the reported values don't care about the casing of the "m" (lowercase would mean "milli", which clearly doesn't make sense for bytes), I don't think we can rely on the casing of the "b" to tell us the answer.
There are obviously plenty of ways to accomplish that same thing today, but we believe Vector is somewhat unique in allowing you to do it with one tool, without touching your application code or nginx config, and with enough performance to handle serious workloads. And Vector is far from done! There's a ton more we're working to add moving forward (thinking about observability data from an ETL and stream processing perspective should give you a rough idea).
So is Vector like the Splunk forwarder, or more than that?
https://github.com/mozilla-services/hindsight
Unfortunately, deploying Hindsight isn't as nice as Heka, since you need to compile it yourself with all the Lua extensions you need, and the documentation is very disorganized.
Vector looks great on those counts; I'll be excited to try it once it gets features like reliable Vector-to-Vector transport and more flexible file delimiters.
(Disclosure: I work for Mozilla on the team that runs services used by firefox users and developers)
While better performance is always great, most of these tools are already plenty fast for the majority of use cases.
Their main power comes from the multitude of inputs and outputs, and Vector has a lot of catching up to do there. But if it manages to offer a noteworthy performance gain on top... one more option is always a good thing.
PS: the Logstash numbers seem suspiciously low. I'd bet it's some JVM config issue; Logstash can slow to a crawl if it does not have enough memory.
As for the multitudes of inputs/outputs, covering the 95% most-used sources and sinks is a great starting point. I think Vector got that list right in this case.
Regardless, Vector looks very nice and I'll be testing it out :)
Reported an issue in their test harness.
Also, the documentation seems to be missing information about which sinks support TLS.
We're currently looking for a distributed-over-the-internet logging setup and are interested in evaluating Rsyslog/RELP alternatives.
Do you have a specific module for it?