You can ingest data using the OpenTelemetry Protocol (OTLP), Vector Logs, and the Zipkin API. You can also use the OpenTelemetry Collector to collect Prometheus metrics or receive data from Jaeger, X-Ray, Apache, PostgreSQL, MySQL, and many more.
The latest Uptrace release introduces support for OpenTelemetry Metrics which includes:
- User interface to build table-based and grid-based dashboards.
- Pre-built dashboard templates for Golang, Redis, PostgreSQL, MySQL, and host metrics.
- Metrics monitoring (alerting rules) inspired by Prometheus.
- Notifications via email/Slack/PagerDuty using AlertManager integration.
There are 2 quick ways to try Uptrace:
- Using the Docker container - https://github.com/uptrace/uptrace/tree/master/example/docke...
- Using the public demo - https://app.uptrace.dev/play
I will be happy to answer your questions in the comments.
We recently tried SigNoz and Grafana Tempo, and while I can't say anything about Uptrace yet (I will definitely try it out), I want to list some pros and cons of the other two.
Grafana Tempo
Pros:
- Easy and smooth integration into our existing Grafana instance, no additional frontend needed
- No new storage engine needed (no additional ClickHouse, Postgres, etc.) as it saves its data to S3
- Supports OTLP
Cons:
- Search is limited by param size and unique params (since everything is baked into the index)
- Ingestion is not in real time, but configurable (time to finish span)
Signoz:
Pros:
- Supports OTLP
- Integrates Logs and Metrics within the same service (for Grafana you need Loki then)
- Supports real time querying
Cons:
- Uses a new storage engine (or extends the software stack) by adding ClickHouse
- Adds an additional frontend (might not be relevant for everyone)
- Doesn't provide SSO yet, so you need to manage users differently
Interesting to see that Uptrace also chose ClickHouse (btw, I love ClickHouse!)
Some questions:
- Can I easily disable certain features? (e.g. alerting)
- Is there support for SSO for self-hosted installation?
- Are there any recommendations for scaling (e.g. benchmarks) on how many spans/s are supported on what hardware?
Thanks in advance!
>- Are there any recommendations for scaling (e.g. benchmarks) on how many spans/s are supported on what hardware?
With Uptrace, I was able to achieve 10k spans/second on a single core by running the binary with GOMAXPROCS=1. That is 1-3 terabytes of compressed data each month, which is more than most users need.
Practically, you are limited by the $$$ you are willing to spend on ClickHouse servers, not by Uptrace ingestion speed.
So my recommendation is to scale Uptrace vertically by throwing more cores at it. That will allow you to go very very far.
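As a rough sanity check on those numbers, here is the back-of-the-envelope math (the ~100 bytes per compressed span is an illustrative assumption, not a measured figure):

```python
# Back-of-the-envelope: spans/second -> compressed storage per month.
spans_per_second = 10_000
seconds_per_month = 60 * 60 * 24 * 30           # ~2.6M seconds
bytes_per_span_compressed = 100                  # illustrative assumption

spans_per_month = spans_per_second * seconds_per_month
tb_per_month = spans_per_month * bytes_per_span_compressed / 1e12

print(f"{spans_per_month / 1e9:.1f}B spans -> {tb_per_month:.1f} TB/month")
```

With that assumed span size, 10k spans/s works out to ~26 billion spans and ~2.6 TB of compressed data per month, which matches the 1-3 TB range above.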
>- Is there support for SSO for self-hosted installation?
So far the only way to add new users is via the YAML config. We are considering adding a REST API or a CLI tool for the same purpose, but it is not clear how that would work with the YAML.
Regarding SSO, it would be nice if you could point to an app that already does it well so we can better estimate the complexity. But so far we don't have such plans.
>- Can I easily disable certain features? (e.g. alerting)
Yes, most YAML sections can simply be removed or commented out to disable the corresponding feature.
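For example, something along these lines (the section name and keys here are illustrative, check the bundled uptrace.yml for the actual ones):

```yaml
# Commenting out a section disables that feature:
#
# alerting:
#   rules:
#     - name: ...
```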
I have no other good examples for SSO than Grafana.
But something I would love to see more of for logins is application tokens. We use Cloudflare Access for team logins, which sends such a token in a header; the application can use it to authorize the user and, based on which group the user is in, enable or disable features:
https://developers.cloudflare.com/cloudflare-one/identity/au...
This solved our need for multiple sign-in options, as everything is now managed through Cloudflare Access, but it is obviously not a solution for everyone.
Thanks for laying out the points in the Pros section. We also recently launched logs with v0.11.0, so you may want to give it a try again - we now have metrics, logs, and traces in a single app.
Would love to understand in more detail a few points you mentioned in the Cons for SigNoz.
> - Uses new storage engines (or extends the software stack) with adding ClickHouse

Can you explain a bit more about the concern here?
> - Doesn't provide SSO yet, so you need to manage users differently
This is in our roadmap and we will be shipping it soon.
- https://vector.dev/docs/reference/configuration/sources/dock...
- https://uptrace.dev/get/ingest/vector.html
If you are having trouble making it work, feel free to open an issue on GitHub and I will provide a complete example.
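For reference, a minimal Vector config along the lines of those docs might look like this (the endpoint URL and header name are assumptions from memory, check the Uptrace Vector docs above for the exact values):

```toml
# Sketch: forward Docker container logs to Uptrace over HTTP.
[sources.docker]
type = "docker_logs"

[sinks.uptrace]
type = "http"
inputs = ["docker"]
uri = "https://api.uptrace.dev/api/v1/vector/logs"  # assumed endpoint
encoding.codec = "json"

[sinks.uptrace.request.headers]
uptrace-dsn = "<your project DSN>"
```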
I have yet to see new profiling options. When I think of APM, I think of CPU profiling and automatic instrumentation of black-box systems, not request tracing. I should be able to see which function calls are slow/problematic without having to add code to the application.
But I find CPU profiling a lot less useful, because production profiles tend to be too broad and it is hard to make sense of them; for example, the Uptrace profile mostly consists of memory allocations and network calls.
So I would not say that CPU profiling is superior/better/can replace tracing.
Nothing wrong with the parent comment but do we really need a ‘yes it is’ / ‘no it isn’t’ back and forth that goes on seemingly forever almost EVERY TIME?
I’m gonna need a ‘license nitpick remover’ to complement my adblocker!
Out of curiosity, what makes you uncomfortable about the license?
I'm guessing they aligned the license terms with those of ClickHouse, which is the underlying data store for Uptrace. From my understanding, if you use Uptrace and ClickHouse to manage your internal telemetry and don't offer it to clients, you should be fine. Still, non-standard license terms give pause, as there is always the possibility they will be restricted further in a bait-and-switch like those pulled by MongoDB or Elasticsearch.
Since you're already aware of it, could you update the OP from "open source" to "source available", for the sake of transparency? If the edit option is not available, you can ask the mods / dang.
I can't speak for the OP, but my view is that all these semi-open (AGPL, BSL, etc.) licenses do is muddy the waters. It's essentially giving the developer's lawyers enough grey area to work with in order to find something they can pin on you.
IMHO a company's code should either be closed source or open source. Wishy-washy no-man's-land wordings in the middle don't really help anyone (except the lawyers' bank balances, of course).
> The Licensor hereby grants you the right to copy, modify, create derivative works, redistribute, and make non-production use of the Licensed Work.
So what is the point of even trying to install it in other environments?
Currently using Apache Skywalking myself, because it's reasonably simple to get up and running, as well as integrate with some of the more popular stacks: https://skywalking.apache.org/
I do wonder how ClickHouse (which Uptrace uses) would compare with something like ElasticSearch (which is used by Skywalking and some others) and how badly/well an attempt to use something like MariaDB/MySQL/PostgreSQL for a similar workload would actually go.
I mean, something like Matomo Analytics already uses a traditional RDBMS for storing its data, albeit it might be an order of magnitude or two off from the typical APM solution.
I guess ElasticSearch is still relevant when it comes to searching text, but ClickHouse is much faster when it comes to filtering and analyzing the data.
Give ClickHouse a try and you won't be disappointed.
The difference between the old days of APM and tracing, as I understand it, comes down to two things.
1. Originally APM was single-process and language-aware, usually sampling stack traces to find where time is being spent, plus some very well-known places to instrument for exact timing, say response time or query time.
Tracers work more by instrumenting framework/server/runtime methods at well-known points and capturing the timing. In many ways it's a lot coarser, as it might not know about a hot loop in my code. But it can trace very well, with exact timing, at framework boundaries like web, cache, DB, etc.
2. APM tools were primarily single-process and couldn't really show a different service/process, which doesn't work in a microservice/distributed world.
The way I understand it, tracers would allow me to narrow down to the service/component very easily. Whether I can find out why that component is slow might not be as easy (I'm not sure at what granularity tracing happens inside a component).
I wonder if this understanding of mine is correct.
The second thing I am really unsure about is sampling and overhead. What's the usual overhead of a single trace (I know it varies), and generally, are they more expensive at the single-request level? Also, do they usually sample, and is there a good/recommended way to sample? I forget exactly who (probably New Relic), but someone said they collect all traces (like every request?) and discard the ones that are not anomalous (to save on storage). But does that mean taking a trace is very cheap? And is that end-of-request sampling decision something that's common, or a totally unique capability some have?
>Whether I can find out why that component is slow might not be as easy (not sure what granularity tracing happens inside a component).
It is true that you can't always guess which operation is going to be slow and instrument it, but it is almost always a network or database call. There is still no way to tell *why* it is slow, but the more data you have, the more hints you get.
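Conceptually, instrumenting such a call is just wrapping it in a timed span at the boundary; here is a minimal stdlib-only sketch (real instrumentations, e.g. OpenTelemetry's, add context propagation, attributes, and export on top of this):

```python
import time
from contextlib import contextmanager

spans = []  # collected spans; a real tracer would export these

@contextmanager
def span(name, **attrs):
    """Record the wall-clock duration of the wrapped operation."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({"name": name, "attrs": attrs,
                      "duration_ms": (time.perf_counter() - start) * 1000})

# Instrument the "database call" at the boundary, not inside it:
with span("db.query", statement="SELECT 1"):
    time.sleep(0.01)  # stand-in for the actual network/database work

print(spans[0]["name"], round(spans[0]["duration_ms"]), "ms")
```

Since the span only records two clock reads around a call that already spends milliseconds on the network, the relative overhead stays small.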
>What's the usual overhead of a single tracing
Depending on your base comparison point, the answer can be very different.
Usually, you trace or instrument network/filesystem/database calls, and in those cases the overhead is negligible (a few percent at most).
>But does that mean taking a trace is very cheap?
What you've described is tail-based sampling, and it only helps with reducing storage requirements. It does not reduce the sampling overhead. Check https://uptrace.dev/opentelemetry/sampling.html
But is taking a trace cheap? Definitely. Billions of traces? Not so much.
>request sampling decision something that's common or that's a totally unique capability some have.
It is a common practice for reducing costs when you have billions of traces, but it is an advanced feature, because it requires the backend to buffer incoming spans in memory so it can decide whether the trace is anomalous or not.
Besides, you can't store only anomalous traces, because you would lose a lot of useful details, and you can't really detect an anomaly without knowing what the norm is.
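A toy illustration of the buffering that tail-based sampling requires (the names, thresholds, and baseline rate are made up for the sketch):

```python
import random

random.seed(0)  # deterministic for the example

BUFFER = {}            # trace_id -> list of spans, held in memory
KEEP_BASELINE = 0.05   # keep 5% of normal traces to preserve the "norm"
SLOW_MS = 500          # illustrative anomaly threshold

kept = []

def on_span(trace_id, span):
    # Every span must be buffered: the keep/drop decision can only
    # be made once the whole trace has arrived.
    BUFFER.setdefault(trace_id, []).append(span)

def on_trace_end(trace_id):
    trace = BUFFER.pop(trace_id)
    anomalous = any(s["duration_ms"] > SLOW_MS or s.get("error")
                    for s in trace)
    if anomalous or random.random() < KEEP_BASELINE:
        kept.append(trace)   # export; everything else is discarded

on_span("t1", {"duration_ms": 900})   # slow -> anomalous, always kept
on_trace_end("t1")
on_span("t2", {"duration_ms": 20})    # normal -> kept only by baseline chance
on_trace_end("t2")
print(len(kept))  # -> 1
```

Note that both traces were fully recorded and buffered; only the storage decision happened at the end, which is exactly why tail-based sampling saves storage but not recording overhead.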
Hopefully that helps.
As for the overhead of tracing, I wanted to roughly compare (obviously it depends a lot on the application) stack-trace sampling vs. tracing-based approaches. Are they usually of similar overhead, or is tracing lighter?
I was thinking tail-based sampling could be a lot more expensive, because, say, head-based sampling traces 10% of requests, whereas, regardless of how many samples are kept, tail-based traces 100%. So the tracing overhead would be much higher, right?
I'm not sure why head-based sampling is called accurate in your doc? Isn't it the least accurate, in the sense that it's purely statistical, and a rare outlier like a latency spike or an error could be missed?
And yes, obviously tail-based sampling has to be something like: trace 5% of requests at random (or 1 in every 5), plus any outlier identified from the captured trace.
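To put that overhead comparison in numbers (the per-span recording cost and span count are assumed figures, just to make the ratio concrete):

```python
# Per-request recording work: head-based samples the decision up front,
# tail-based must record every request and decide afterwards.
requests = 1_000_000
spans_per_request = 50
cost_us_per_span = 2          # assumed recording cost in microseconds

head_rate = 0.10              # head-based: trace 10% of requests
head_cost_s = requests * head_rate * spans_per_request * cost_us_per_span / 1e6
tail_cost_s = requests * 1.0 * spans_per_request * cost_us_per_span / 1e6

print(f"head: {head_cost_s:.0f}s, tail: {tail_cost_s:.0f}s of recording work")
```

Whatever the per-span cost actually is, tail-based recording work scales with the full request volume, so at a 10% head rate it is 10x the head-based overhead.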
A BSL license is not open source. To be fair, Uptrace's restrictions are relatively light, but it is still a source-available project, not open source.
Uptrace tries hard to stay simple while providing almost the same set of features. It can also be self-hosted without paying anything, which can save a lot of money.
Uptrace aims to be an open source alternative to DataDog, but realistically we are not there yet.
> OpenTelemetry Protocol (OTLP)
> OTLP
> OLTP
I'm going back to bed.