
My solution is to just be OK with HTTP status checking (run a webserver on important machines), and use a service like updown.io, which is so cheap it's almost free.

E.g., for 1 machine, hourly checking is ~$0.25/year.



Do you do regular backups? If your backup system breaks and stops making new backups, what will let you know? What if your RAID is failing, running out of space, or remounted read-only after an error?
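To make those questions concrete, here is a rough Python sketch of three such checks: backup freshness, free disk space, and a read-only remount. The function names and thresholds are my own illustrative choices, not anything from a particular tool; the read-only check assumes a POSIX system, and RAID health is not covered because it depends on your setup.

```python
import os
import shutil
import time


def backup_is_fresh(path, max_age_seconds, now=None):
    """Return True if `path` exists and was modified within `max_age_seconds`."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable backup counts as stale.
        return False
    return (now - mtime) <= max_age_seconds


def has_free_space(path, min_free_fraction):
    """Return True if the filesystem holding `path` has enough free space."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return False
    return usage.free / usage.total >= min_free_fraction


def is_read_only(path):
    """Return True if the filesystem holding `path` is mounted read-only (POSIX only)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)
```

Something like `backup_is_fresh("/backups/latest.tar", 26 * 3600)` (a hypothetical path and window) could run from cron, with the result reported to whatever alerting service you use.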

I have found that "machine is online" is usually not what I need monitoring for, at all. I'll notice if it's down. It's all the mission-critical-but-silently-breakables that I bother to monitor.


Not OP, but https://healthchecks.io is great for monitoring automated tasks like backup scripts. It also has the option to immediately signal failure and send an alert: https://healthchecks.io/docs/signaling_failures/
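For illustration, a minimal sketch of wrapping a job so it reports success or failure. It assumes the convention described in the linked docs, where appending `/fail` to a check's ping URL signals failure; the helper names and the injectable `send` parameter are hypothetical, and the ping URL is whatever your check gives you.

```python
import subprocess
import urllib.request


def ping_url(base_url, succeeded):
    """Build the URL to request: the plain ping URL on success, '/fail' appended on failure."""
    return base_url if succeeded else base_url.rstrip("/") + "/fail"


def _http_get(url):
    """Send a best-effort GET; monitoring must never crash the job itself."""
    try:
        urllib.request.urlopen(url, timeout=10)
    except OSError:
        pass


def run_and_report(command, base_url, send=None):
    """Run `command` (a list of arguments) and report its outcome to the ping URL."""
    send = send or _http_get
    result = subprocess.run(command)
    ok = result.returncode == 0
    send(ping_url(base_url, ok))
    return ok
```

If the job never runs at all, no ping is sent, and the missing check-in is what triggers the alert; the explicit failure ping just gets you the alert sooner.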


That's what I use for cron-type things. Experience has been great. I also run it as a watchdog in my alertmanager container, so I am alerted if the alerts are broken.


updown.io also has a relatively new feature called cron monitoring[0] that lets you check in regularly to signal success. If there has been no check-in within the configured time, it will alert you. For backups, you could add a simple curl somewhere in your backup process to do just that.

[0] https://updown.io/doc/how-pulse-cron-monitoring-works
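The "alert if no check-in within a configured time" idea is a dead man's switch. Here is a minimal sketch of the rule on the monitoring side; this is my own illustration of the general technique, not updown.io's actual implementation, and the parameter names and the grace period are assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_checkin, expected_every, grace, now):
    """Return True if no check-in arrived within the expected period plus grace."""
    return now - last_checkin > expected_every + grace


# Example: a daily backup with a one-hour grace period.
last = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
daily = timedelta(days=1)
grace = timedelta(hours=1)

on_time = is_overdue(last, daily, grace, datetime(2024, 1, 2, 3, 30, tzinfo=timezone.utc))
too_late = is_overdue(last, daily, grace, datetime(2024, 1, 2, 4, 30, tzinfo=timezone.utc))
```

The grace period keeps a backup that runs a little long from triggering a false alarm.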



