I want improvements like “dropdown menu of tags not limited to 50 items” and “one-button forecast on any graph”. I’m not sure any SQL query language will solve my annoyance that stuff that was one click in Datadog is 14 clicks in Grafana, or impossible without consulting a Book of the Arcane.
Heck, I’m dying for “I can copy a graph between dashboards”. Grafana allows this, but if the graph uses a variable that doesn’t exist in the destination dashboard, pasting just creates an empty graph.
For Grafana graphs, I think the library panels feature solves that problem, but it's been a couple of years since I seriously used Grafana, so I don't remember the specifics.
As for forecasting/trends, that's usually offloaded to the underlying data storage system, so it really depends on where the data is being stored (Prometheus, Influx, Mimir, Elastic, etc.).
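For example, if the data happens to be in Prometheus, a rough forecast can live in the query itself via predict_linear (a minimal sketch using a standard node_exporter metric; you still have to build the panel by hand):

    # Projected free bytes on the root filesystem 4 hours from now,
    # linearly extrapolated from the last 6 hours of samples.
    predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 4 * 3600)

It's nowhere near the one-click forecast button being asked for, but it covers simple linear trends.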
I use New Relic and I'm not sure I would recommend it. Sure, it was easy to install on all the servers, but now I keep getting alerts every month (on like the 2nd day) that I've run out of some "quota" of 100GB. I thought that was impossible since I barely use it, but apparently they (used to) send a list of running processes every second by default, and that gunks things up.
Also, I set up alerts for >50% CPU or >90% disk full. I do get an alert, but it doesn't say which volume, how full it is, or what actual value triggered the alert. WTF.
That's really a problem with all of them (generating too much data, which ends up being expensive), but the defaults definitely could have been better/clearer there.
We migrated from newrelic to datadog (for cost reasons LMAO) a while back and I miss NRQL every single day I'm building a dashboard.
I enjoy having everything instrumented and in one spot; it's super powerful. But I am currently advocating for self-hosting Loki so that we can have debug+ level logs across all environments at a much, much lower cost. Datadog is really good at identifying anomalies, but the cost for logs is so high that there's a non-trivial amount of savings in sampling and minimizing logging. I HATE that we have told devs "don't log so much" -- that misses the entire point of building out a haystack. And sampling logs at 1% and only logging warnings+ in prod makes it even harder to identify anomalies in lower environments before a prod release.
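For what it's worth, once everything is in Loki, the cross-environment haystack search is just an ad-hoc LogQL query (a sketch; the label names are made up and depend on how you configure Promtail or whatever collector you run):

    {env=~"dev|staging|prod", app="checkout"} |= "order-12345"

That's the kind of query that only works if the debug-level logs were actually kept.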
Last hot take: the UX in Kibana in 2016 was better than anything else we have now for rapidly searching through a big haystack and identifying and correlating issues in logs.
Datadog log ingest isn't too expensive; it still enriches the logs and can dump them to S3 (log archive), where you can query them with Athena/Trino.
Log indexing is $$$ though, for sure.
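Roughly, the archive route looks like this once Athena (or Trino) is pointed at the bucket; the table and column names here are made up and depend on how you map the archived JSON:

    -- Pull error logs for one service out of the S3 log archive.
    -- Assumes an external table ("datadog_archive") defined over the archive bucket.
    SELECT event_time, service, status, message
    FROM datadog_archive
    WHERE service = 'checkout'
      AND status = 'error'
      AND event_time BETWEEN timestamp '2024-06-01 00:00:00'
                         AND timestamp '2024-06-02 00:00:00'
    ORDER BY event_time
    LIMIT 100;

Much slower than indexed search, of course, but workable for the occasional deep dive into cold logs.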
Curious about Loki's cost. When I priced out ELK at a smaller company, it didn't come in much cheaper than the $0.50/GiB everyone seems to charge (30-day retention, 2 shards, object storage backups). Back when I worked at JPMC, their internal service was also billed right around there.
They have a SQL-like query language that I think can do most of what you're describing.