> Grafana Cloud offers a hosted Grafana, Prometheus metrics scraping and storage, and Log tailing and storage (via Loki)
I haven't looked at their pricing before, but for small-ish environments, their standard plan looks really good and simple. None of the "per host, but also per function, and extra for each feature, and extra for usage" approach like other providers (datadog, I'm looking at you).
> None of the "per host, but also per function, and extra for each feature, and extra for usage" approach like other providers (datadog, I'm looking at you).
I was thinking "God, this is exactly why I hate Datadog" as I was reading your description and got a great laugh when I reached the end. Their billing is absolutely byzantine.
I don't know that I've ever seen a company that had such a stark difference between great engineering/product and awful business/sales practices. Their product is really the best turn-key option out there, but I'm always hesitant to use its features without double checking it's not going to add 50% to my bill. Their sales teams are some of the worst I've dealt with, and I deal with a lot of vendors. They're starting to get a really bad reputation as well.
I'm a customer that uses most of their tools (no network performance monitoring since it's less useful than a service mesh and no logging because we need longer history than most and cost would be prohibitive).
Is it really that expensive when compared to other vendors? I thought their newer logging tool was a lot cheaper than Splunk, and their APM tool for distributed tracing is also pretty cheap compared with something like New Relic. Sure, it's more expensive than free tools that you need to set up yourself. But the velocity it gives your teams is so much better than having to use something like Grafana with tools like Prometheus. Again, sure, it can be done for cheap, but the time it takes to manage those tools and the velocity you lose doing that don't seem worth it for smaller companies, though I can see it making more sense as you scale a company.
It's not the cost per se, though I do think they're pretty high for some features. It's the pricing models and the associated patterns.
For instance, you have to pay Datadog per host you install the agent on. In addition to the per-host cost, you have to pay per container you run on that host (past a very small baseline per host), and the per-container cost turns out to be nearly as high as the per-host cost if you have reasonable density. Why am I paying Datadog per container I run? Aside from a not particularly useful dashboard, why does a process namespace and some cgroup metrics nearly double my bill? They are literally just processes on a server. Because Datadog wants you to run more hosts, so you install more agents.
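To make the arithmetic concrete, here's a rough sketch of that billing shape. Every number in it is a made-up placeholder (not Datadog's actual price list), and the function and parameter names are mine, purely for illustration.

```python
def monthly_bill(hosts, containers_per_host, host_price,
                 container_price, free_containers_per_host):
    """Return (host_cost, container_cost) for a per-host plus
    per-container pricing model. All prices are hypothetical inputs."""
    # Only containers past the per-host baseline are billed.
    billable_per_host = max(0, containers_per_host - free_containers_per_host)
    host_cost = hosts * host_price
    container_cost = hosts * billable_per_host * container_price
    return host_cost, container_cost


# Hypothetical example: 10 hosts, 23 containers each, 5 included per host,
# 10.0 per host and 0.5 per extra container (placeholder prices).
host_cost, container_cost = monthly_bill(10, 23, 10.0, 0.5, 5)
```

With those placeholder inputs the host line comes to 100.0 and the container line to 90.0, so the containers alone almost double the bill, which is the effect described above.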
Every feature they add also seems to be charged separately, but is not behind any sort of feature gate. This means new features just show up for my developers, and they have no clue if it costs money to use them. I can't just disable or cap, for example, their custom metrics per user, per project, or at all. So when my developers see a useful feature and start using it, all of a sudden I have an extra $10k on my monthly bill. Even more fun are features that show up and are initially free but then start charging.
This is such a pain that we've had to tell dev teams not to use Datadog features outside of a curated list. Every product has some rough edges, but with Datadog the patterns are all set up such that you end up paying them thousands of extra dollars. Again, great product, but not a business I would be interested in associating with again given the choice.
It's not so much the total cost, but the fact that there's so much nickel-and-diming. When Trace Analytics came out they tried getting us to turn it on, and it's like... we're already paying for APM and you want to charge us more? At least tell us how much more, and they couldn't. I think it probably ended up not being a ton of money, but just the question was enough for us to not do it. From working with other providers, it's also much easier working with our finance team if we can say 'it costs at most this' instead of 'it costs at least this'.
> Their product is really the best turn-key option out there, but I'm always hesitant to use its features without double checking it's not going to add 50% to my bill.
You might want to check out New Relic One, especially with the new pricing model. I think they even added a Prometheus integration recently?
It depends where you are in the world. When I was working in Switzerland, most SaaS prices were no-brainers for us. But since I work in Latin America for small companies with local customers, all the different services and tools you might want to use, with prices targeted at "western" customers, much more quickly add up to the equivalent of having multiple people on staff full time.
Still it is often not worth to roll your own, so it is nice to have alternatives for different price points and company scales.
Exactly this. We operate in Eastern Europe with local clients, offering on-prem SaaS. If I added all my clients' servers to Datadog it would very easily eat through our profit margins.
> Still it is often not worth to roll your own
I tried hard not to, but in the end, after spending a week trying to set up netdata and failing, I decided not to spend another week trying to set up Grafana/Influx/Prometheus (lots of docs to go through), and just have some bash scripts send metrics to a $10 DigitalOcean node service that sends me emails/SMS when something "looks bad" (e.g. high CPU temperature, stopped Docker containers, etc.).
I gave up on aggregated logging for the time being, since I can just ssh on each server and check journal and docker logs if I need to (as long as the hard drives don't crash).
Yeah, having looked at what the script does I decided to 'containerize' the agent, and that led to other issues like configuring email alerts etc.
I was already a week deep into looking at various options and had to deliver on basic metrics and alerting, so I figured a couple of bash scripts that log into local files with log rotation, systemd, and a dump/memory-only receiving end running on Node.js for the alerts would be much faster and easier to maintain.
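For anyone curious what a memory-only receiving end like that boils down to, here's a minimal sketch. It's in Python rather than Node.js, and the metric names (`cpu_temp_c`, `stopped_containers`), the 80 C threshold, and the class itself are all assumptions for illustration, not the actual setup described above. Actually sending the email/SMS is left out.

```python
from dataclasses import dataclass, field


@dataclass
class AlertReceiver:
    """Keeps only the latest metrics per host in memory (no storage)."""
    max_cpu_temp_c: float = 80.0  # hypothetical threshold
    latest: dict = field(default_factory=dict)

    def ingest(self, host, metrics):
        """Store a host's latest metrics and return any alert messages."""
        self.latest[host] = metrics
        alerts = []
        # Flag CPU temperature above the configured threshold.
        if metrics.get("cpu_temp_c", 0.0) > self.max_cpu_temp_c:
            alerts.append(
                f"{host}: high cpu temperature ({metrics['cpu_temp_c']} C)"
            )
        # Flag every container the host reports as stopped.
        for name in metrics.get("stopped_containers", []):
            alerts.append(f"{host}: container stopped ({name})")
        return alerts


receiver = AlertReceiver()
alerts = receiver.ingest(
    "web-1", {"cpu_temp_c": 91.5, "stopped_containers": ["db"]}
)
```

A bash script on each server would post a small payload like the one above on a timer, and whatever `ingest` returns gets handed to the email/SMS sender. Since nothing is persisted, a restart of the receiver loses state, which seems like an acceptable trade for this kind of setup.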
I guess it refers to them servicing the software (updating, reacting to downtime, etc.) while it is deployed on premise, in contrast to the client's IT department doing so.
If you're in an established company making money - yes. If you're bootstrapping a service and counting on $50 total monthly cost while initial users are signing up - no.