You can ingest data using the OpenTelemetry Protocol (OTLP), Vector Logs, and the Zipkin API. You can also use the OpenTelemetry Collector to collect Prometheus metrics or receive data from Jaeger, X-Ray, Apache, PostgreSQL, MySQL, and many more.
The latest Uptrace release introduces support for OpenTelemetry Metrics which includes:
- User interface to build table-based and grid-based dashboards.
- Pre-built dashboard templates for Golang, Redis, PostgreSQL, MySQL, and host metrics.
- Metrics monitoring (alerting rules) inspired by Prometheus.
- Notifications via email/Slack/PagerDuty using AlertManager integration.
There are 2 quick ways to try Uptrace:
- Using the Docker container - https://github.com/uptrace/uptrace/tree/master/example/docke...
- Using the public demo - https://app.uptrace.dev/play
I will be happy to answer your questions in the comments.
We recently tried SigNoz and Grafana Tempo, and while I can't say anything about Uptrace yet (I will definitely try it out), I want to list some pros and cons of the other two.
Grafana Tempo
Pros:
- Easy and smooth integration into our existing Grafana instance, no additional frontend needed
- No new storage engine needed (no additional ClickHouse, Postgres, etc.) as it saves its data to S3
- Supports OTLP
Cons:
- Search is limited by param size and unique params (since everything is baked into the index)
- Ingestion is not in real time, but configurable (time to finish span)
Signoz:
Pros:
- Supports OTLP
- Integrates Logs and Metrics within the same service (for Grafana you need Loki then)
- Supports real time querying
Cons:
- Uses a new storage engine (or extends the software stack) by adding ClickHouse
- Adds an additional frontend (might not be relevant for everyone)
- Doesn't provide SSO yet, so you need to manage users differently
Interesting to see that Uptrace also chose ClickHouse (btw, I love ClickHouse!)
Some questions:
- Can I easily disable certain features? (e.g. alerting)
- Is there support for SSO for self-hosted installation?
- Are there any recommendations for scaling (e.g. benchmarks) on how many spans/s are supported on what hardware?
Thanks in advance!
>- Are there any recommendations for scaling (e.g. benchmarks) on how many spans/s are supported on what hardware?
With Uptrace, I was able to achieve 10k spans/second on a single core by running the binary with GOMAXPROCS=1. That is 1-3 terabytes of compressed data each month, which is more than most users need.
Practically, you are limited by the $$$ you are willing to spend on ClickHouse servers, not by Uptrace ingestion speed.
So my recommendation is to scale Uptrace vertically by throwing more cores at it. That will allow you to go very very far.
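As a rough sanity check on those numbers, here is the back-of-the-envelope math (the ~100 bytes per compressed span is an illustrative assumption, not a measured figure):

```python
# Back-of-the-envelope: spans/second -> compressed storage per month.
spans_per_second = 10_000
seconds_per_month = 60 * 60 * 24 * 30           # ~2.6M seconds
bytes_per_span_compressed = 100                  # illustrative assumption

spans_per_month = spans_per_second * seconds_per_month
tb_per_month = spans_per_month * bytes_per_span_compressed / 1e12

print(f"{spans_per_month / 1e9:.1f}B spans -> {tb_per_month:.1f} TB/month")
```

With that assumed span size, 10k spans/s works out to ~26 billion spans and ~2.6 TB of compressed data per month, which matches the 1-3 TB range above.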
>- Is there support for SSO for self-hosted installation?
So far the only way to add new users is via the YAML config. We are considering adding a REST API or a CLI tool for the same purpose, but it is not clear how that would work with the YAML.
Regarding SSO, it would be nice if you could point to an app that already does it well so we can better estimate the complexity. But so far we don't have such plans.
>- Can I easily disable certain features? (e.g. alerting)
Yes, most YAML sections can simply be removed or commented out to disable the corresponding feature.
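For example, something along these lines (the section name and keys here are illustrative, check the bundled uptrace.yml for the actual ones):

```yaml
# Commenting out a section disables that feature:
#
# alerting:
#   rules:
#     - name: ...
```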
I have no other good examples for SSO than Grafana.
But something I would love to see more of for logins is application tokens. We use Cloudflare Access for team logins, which sends such a token in a header; the application can use it to authorize the user and, based on which group the user is in, enable or disable features:
https://developers.cloudflare.com/cloudflare-one/identity/au...
This solved our need for multiple sign-in options, as everything is now managed through Cloudflare Access, but it is obviously not a solution for everyone.
Thanks for laying out the points in the Pros section. We also recently launched logs with v0.11.0, so you may want to give it a try again - we now have metrics, logs, and traces in a single app.
Would love to understand in more detail a few points you mentioned in the Cons for SigNoz.
> - Uses new storage engines (or extends the software stack) with adding ClickHouse

Can you explain a bit more about the concern here?
> - Doesn't provide SSO yet, so you need to manage users differently
This is in our roadmap and we will be shipping it soon.
- https://vector.dev/docs/reference/configuration/sources/dock...
- https://uptrace.dev/get/ingest/vector.html
If you are having trouble making it work, feel free to open an issue on GitHub and I will provide a complete example.
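For reference, a minimal Vector config along the lines of those docs might look like this (the endpoint URL and header name are assumptions from memory, check the Uptrace Vector docs above for the exact values):

```toml
# Sketch: forward Docker container logs to Uptrace over HTTP.
[sources.docker]
type = "docker_logs"

[sinks.uptrace]
type = "http"
inputs = ["docker"]
uri = "https://api.uptrace.dev/api/v1/vector/logs"  # assumed endpoint
encoding.codec = "json"

[sinks.uptrace.request.headers]
uptrace-dsn = "<your project DSN>"
```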
I have yet to see new profiling options. When I think of APM, I think of CPU profiling and automatic instrumentation of black-box systems, not request tracing. I should be able to see which function calls are slow/problematic without having to add code to the application.
But I find CPU profiling a lot less useful, because production profiles tend to be too broad and it is hard to make sense of them; for example, the Uptrace profile mostly consists of memory allocations and network calls.
So I would not say that CPU profiling is superior/better/can replace tracing.
Nothing wrong with the parent comment but do we really need a ‘yes it is’ / ‘no it isn’t’ back and forth that goes on seemingly forever almost EVERY TIME?
I’m gonna need a ‘license nitpick remover’ to complement my adblocker!
Out of curiosity, what makes you uncomfortable about the license?
I'm guessing they aligned the license terms with those of ClickHouse, which is the underlying data store for Uptrace. From my understanding, if you use Uptrace and ClickHouse to manage your internal telemetry and don't offer it to clients, you should be fine. Still, non-standard license terms give pause, as there is always the possibility they will be restricted further in a bait-and-switch like those pulled by MongoDB or Elasticsearch.
Since you're already aware of it, could you update the OP from "open source" to "source available", for the sake of transparency? If the edit option is not available, you can ask the mods / dang.
I can't speak for the OP, but my view is that all these semi-open (AGPL, BSL, etc.) licenses do is muddy the waters. It's essentially giving the developer's lawyers enough grey area to work with in order to find something they can pin on you.
IMHO a company's code should either be closed source or open source. Wishy-washy no-man's-land wordings in the middle don't really help anyone (except the lawyers' bank balances, of course).
> The Licensor hereby grants you the right to copy, modify, create derivative works, redistribute, and make non-production use of the Licensed Work.
So what is the point of even trying to install it in other environments?
Currently using Apache Skywalking myself, because it's reasonably simple to get up and running, as well as integrate with some of the more popular stacks: https://skywalking.apache.org/
I do wonder how ClickHouse (which Uptrace uses) would compare with something like ElasticSearch (which is used by Skywalking and some others) and how badly/well an attempt to use something like MariaDB/MySQL/PostgreSQL for a similar workload would actually go.
I mean, something like Matomo Analytics already uses a traditional RDBMS for storing its data, albeit it might be an order of magnitude or two off from the typical APM solution.
I guess ElasticSearch is still relevant when it comes to searching text, but ClickHouse is much faster when it comes to filtering and analyzing the data.
Give ClickHouse a try and you won't be disappointed.
The difference between the old days of APM and tracing, as I understand it, comes down to two things.
1. Originally APM was single-process and language-aware, usually sampling stack traces to find where time is being spent, plus some very well-known places to instrument for exact timing, say response time or query time.
Tracers work more by instrumenting framework/server/runtime methods at well-known points and capturing the timing. In many ways it's a lot coarser, as it might not know about a hot loop in my code. But it can trace very well, with exact timing, at framework boundaries like web, cache, DB, etc.
2. APM tools were primarily single-process and couldn't really show a different service/process, which doesn't work in a microservice/distributed world.
The way I understand it, tracers would allow me to narrow down to the service/component very easily. Whether I can find out why that component is slow might not be as easy (I'm not sure at what granularity tracing happens inside a component).
I wonder if this understanding of mine is correct.
The second thing I am really unsure about is sampling and overhead. What's the usual overhead of a single trace (I know it varies), and generally, are they more expensive at the single-request level? Also, do they usually sample, and is there a good/recommended way to sample? I forget exactly who (probably New Relic), but someone said they collect all traces (like every request?) and discard the ones that are not anomalous (to save on storage). But does that mean taking a trace is very cheap? And is that end-of-request sampling decision something that's common, or a totally unique capability some have?
>Whether I can find out why that component is slow might not be as easy (not sure what granularity tracing happens inside a component).
It is true that you can't always guess which operation is going to be slow and instrument it, but it is almost always a network or database call. There is still no way to tell *why* it is slow, but the more data you have, the more hints you get.
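Conceptually, instrumenting such a call is just wrapping it in a timed span at the boundary; here is a minimal stdlib-only sketch (real instrumentations, e.g. OpenTelemetry's, add context propagation, attributes, and export on top of this):

```python
import time
from contextlib import contextmanager

spans = []  # collected spans; a real tracer would export these

@contextmanager
def span(name, **attrs):
    """Record the wall-clock duration of the wrapped operation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({"name": name, "attrs": attrs,
                      "duration_ms": (time.perf_counter() - start) * 1000})

# Instrument the "database call" at the boundary, not inside it:
with span("db.query", statement="SELECT 1"):
    time.sleep(0.01)  # stand-in for the actual network/database work

print(spans[0]["name"], round(spans[0]["duration_ms"]), "ms")
```

Since the span only records two clock reads around a call that already spends milliseconds on the network, the relative overhead stays small.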
>What's the usual overhead of a single tracing
Depending on your base comparison point, the answer can be very different.
Usually, you trace or instrument network/filesystem/database calls, and in those cases the overhead is negligible (a few percent at most).
>But does that mean taking a trace is very cheap?
What you've described is tail-based sampling, and it only helps with reducing storage requirements. It does not reduce the sampling overhead. Check https://uptrace.dev/opentelemetry/sampling.html
But is taking a trace cheap? Definitely. Billions of traces? Not so much.
>request sampling decision something that's common or that's a totally unique capability some have.
It is a common practice for reducing costs when you have billions of traces, but it is an advanced feature, because it requires the backend to buffer incoming spans in memory so it can decide whether the trace is anomalous or not.
Besides, you can't store only anomalous traces, because you would lose a lot of useful details, and you can't really detect an anomaly without knowing what the norm is.
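A toy illustration of the buffering that tail-based sampling requires (the names, thresholds, and baseline rate are made up for the sketch):

```python
import random

random.seed(0)  # deterministic for the example

BUFFER = {}            # trace_id -> list of spans, held in memory
KEEP_BASELINE = 0.05   # keep 5% of normal traces to preserve the "norm"
SLOW_MS = 500          # illustrative anomaly threshold

kept = []

def on_span(trace_id, span):
    # Every span must be buffered: the keep/drop decision can only
    # be made once the whole trace has arrived.
    BUFFER.setdefault(trace_id, []).append(span)

def on_trace_end(trace_id):
    trace = BUFFER.pop(trace_id)
    anomalous = any(s["duration_ms"] > SLOW_MS or s.get("error")
                    for s in trace)
    if anomalous or random.random() < KEEP_BASELINE:
        kept.append(trace)   # export; everything else is discarded

on_span("t1", {"duration_ms": 900})   # slow -> anomalous, always kept
on_trace_end("t1")
on_span("t2", {"duration_ms": 20})    # normal -> kept only by baseline chance
on_trace_end("t2")
print(len(kept))  # -> 1
```

Note that both traces were fully recorded and buffered; only the storage decision happened at the end, which is exactly why tail-based sampling saves storage but not recording overhead.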
Hopefully that helps.
As for the overhead of tracing, I wanted to roughly compare (obviously it depends a lot on the application) stack-trace sampling vs. tracing-based approaches. Are they usually of similar overhead, or is tracing lighter?
I was thinking tail-based sampling could be a lot more expensive, because, say, head-based sampling traces 10% of requests, whereas, regardless of how many samples are kept, tail-based traces 100%. So the tracing overhead would be much higher, right?
I'm not sure why head-based sampling is called accurate in your doc? Isn't it the least accurate, in the sense that it's purely statistical, and a rare outlier like a latency spike or an error could be missed?
And yes, obviously tail-based sampling has to be something like: trace 5% of requests at random (or 1 in every 5), plus any outlier identified from the captured trace.
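To put that overhead comparison in numbers (the per-span recording cost and span count are assumed figures, just to make the ratio concrete):

```python
# Per-request recording work: head-based samples the decision up front,
# tail-based must record every request and decide afterwards.
requests = 1_000_000
spans_per_request = 50
cost_us_per_span = 2          # assumed recording cost in microseconds

head_rate = 0.10              # head-based: trace 10% of requests
head_cost_s = requests * head_rate * spans_per_request * cost_us_per_span / 1e6
tail_cost_s = requests * 1.0 * spans_per_request * cost_us_per_span / 1e6

print(f"head: {head_cost_s:.0f}s, tail: {tail_cost_s:.0f}s of recording work")
```

Whatever the per-span cost actually is, tail-based recording work scales with the full request volume, so at a 10% head rate it is 10x the head-based overhead.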
A BSL license is not open source. To be fair, Uptrace's restrictions are relatively light, but it is still a source-available project, not open source.
Uptrace tries hard to stay simple while providing almost the same set of features. It can also be self-hosted without paying anything, which can save a lot of money.
Uptrace aims to be an open source alternative to DataDog, but realistically we are not there yet.
> OpenTelemetry Protocol (OTLP)
> OTLP
> OLTP
I'm going back to bed.