How do you monitor your servers / VPSes?

Hello selfhosters.

We all have bare-metal servers, VPSes, containers and other things running. Some of them may be exposed openly to the internet, which is populated by autonomous malicious actors, and some may reside on a closed-off network since they contain sensitive data.

And there are a lot of solutions to monitor your servers, since none of us want our resources to be part of a botnet, mine bitcoins for APTs, or simply have confidential data fall into the wrong hands.

Some of the tools I’ve looked at for this task are check_mk, netmonitor and monit: all of these monitor metrics such as CPU, RAM and network activity. Other tools such as Snort or Falco are designed specifically to detect suspicious activity. There are also solutions that are cobbled together, like fail2ban actions combined with Pushover to get notified of intrusion attempts.

So my question to you is: how do you monitor your servers, and with what tools? I need some inspiration to decide what tooling to settle on in order to detect unwanted external activity on my resources.
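
For readers unfamiliar with the cobbled-together approach mentioned above, here is a rough Python sketch of the detection half: counting failed SSH password attempts per source IP, which is loosely the kind of matching fail2ban does. It is not fail2ban's implementation. The log line format is the usual OpenSSH one, and the threshold and sample addresses are illustrative assumptions.

```python
import re
from collections import Counter

# Matches the usual OpenSSH "Failed password" log line. Other auth methods
# and other daemons log differently, so treat this pattern as an assumption.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def count_failed_logins(lines):
    """Return a Counter mapping source IP -> number of failed password attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


def offenders(counts, threshold=5):
    """Return the IPs with at least `threshold` failures, sorted."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)


# Made-up sample lines using documentation-only IP ranges.
SAMPLE_LOG = [
    "Jan  1 00:00:01 host sshd[1]: Failed password for root from 203.0.113.5 port 4000 ssh2",
    "Jan  1 00:00:02 host sshd[1]: Failed password for invalid user admin from 203.0.113.5 port 4001 ssh2",
    "Jan  1 00:00:03 host sshd[1]: Accepted publickey for me from 198.51.100.7 port 4002 ssh2",
]

print(offenders(count_failed_logins(SAMPLE_LOG), threshold=2))
```

In practice you would feed it the lines of your auth log (often /var/log/auth.log on Debian-like systems) and hand the result to whatever notifier you prefer.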

Strit, 1 year ago

I’m pretty old school, but as I only have 1 server, I just use ssh, df, du and top.

loudwhisper, 1 year ago

I run Prometheus on a separate cluster, so I install node_exporter on my servers and scrape their metrics. I then alert with Grafana. To be honest, the setup is heavier (resource usage-wise) than I would like for my use case, but it’s what I am used to, and it scales well to multiple machines.

JonnyJaap, 1 year ago

I used Zabbix at some point, but I never looked at the data so I stopped. Zabbix shows all kinds of stuff.

I have Cockpit on my bare-metal server, which shows some stats, and Netdata on my firewall. I do not track any of my VMs (except with vnstat, which runs on every device).

Cyberflunk, 1 year ago

Reduce your threat profile:

Run sslh, so that port 443 handles both SSL and SSH.

Adjust your host-based firewall to allow just 443.

Attack yourself on that port and identify the resulting log entries.

Add the new profiles to fail2ban.

Enable fail2ban email notifications.

If you don’t like email, use a service that translates email into notifications. Ntfy.sh offers free notifications.

Or… use something like Tailscale and don’t offer a remote login to the general Internet.

I submitted your post to GPT; here’s what it thought:

shareg.pt/Tz0El4k
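
As a rough illustration of the notification step, ntfy accepts a plain HTTP POST to a topic URL, with optional `Title` and `Priority` headers. The sketch below only builds the request. The topic name and message are made up, and the function names are hypothetical helpers, not part of any ntfy client library.

```python
import urllib.request


def build_ntfy_request(topic, message, title="Server alert", priority="high",
                       server="https://ntfy.sh"):
    """Build (but do not send) a POST request that publishes `message` to a topic."""
    return urllib.request.Request(
        url=f"{server}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example (not executed here); "my-server-alerts" is a made-up topic name:
# send(build_ntfy_request("my-server-alerts", "fail2ban banned 203.0.113.5"))
```

Keep in mind that on the public ntfy.sh server anyone who knows the topic name can subscribe to it, so pick a hard-to-guess name or self-host.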

TheGreenGolem, 1 year ago

It cannot notify you, you have to check it manually, but: I use DaRemote on my phone to periodically check my bare metal.

avidamoeba, 1 year ago (edited 1 year ago)

Prometheus.

It’s open source, it’s easy to set up, its agents are available for nearly anything including OpenWrt, and it can serve the simplest use case of “is it down” as well as much more complicated ones that stem from its ability to collect data over time.

Personally I’m monitoring:

Is it up?

Is the storage array healthy?

Are the services I care about running?

I used to run it ephemerally, wiping data on restart. I recently started persisting its data so I can see trends over the longer run.
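
A minimal sketch of the “is it up?” check, assuming a Prometheus server reachable at `http://localhost:9090` (an assumption, adjust to your setup). Prometheus records an `up` series per scrape target (1 for a successful scrape, 0 for a failed one), and the HTTP API's `/api/v1/query` endpoint returns it as JSON. The helper names here are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def down_targets(payload):
    """Given a decoded /api/v1/query response for `up`, list targets reporting 0."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for sample in payload["data"]["result"]:
        _timestamp, value = sample["value"]  # the value arrives as a string
        if float(value) == 0:
            labels = sample["metric"]
            down.append((labels.get("job", "?"), labels.get("instance", "?")))
    return sorted(down)


def query_up(base_url="http://localhost:9090", timeout=10):
    """Run the instant query `up` against a Prometheus server (URL is an assumption)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (not executed here): print(down_targets(query_up()))
```

For real alerting you would more likely write an alert rule inside Prometheus or Grafana; this only shows what the underlying data looks like.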

surewhynotlem, 1 year ago

What do you use to see the data? Prometheus itself is easy to set up, but getting to the data seemed complicated.

lud, 1 year ago

You can use Grafana to visualise the data.

Grafana isn’t too hard to use.

taladar, 1 year ago

Icinga2 works reasonably well for us. It is easy to write new checks as small shell scripts (or any other binary that can print a message and set an exit status code).
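
To make that concrete, here is a sketch of such a check in Python rather than shell. It follows the common Nagios-style plugin convention that Icinga2 also uses: print one status line and exit with 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. The thresholds and the output wording are illustrative assumptions.

```python
import shutil
import sys

# Exit codes of the Nagios-style plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to (exit_code, status_line)."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk(path="/"):
    """Measure usage of the filesystem containing `path` and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return evaluate(100.0 * usage.used / usage.total)


def main():
    """Print the status line and return the exit code for the given path."""
    code, line = check_disk(sys.argv[1] if len(sys.argv) > 1 else "/")
    print(line)
    return code


# When installing this as a plugin, end the script with: sys.exit(main())
```

Keeping the threshold logic in `evaluate` separate from the measurement makes the check easy to test without touching a real disk.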

dataprolet, 1 year ago

Uptime-Kuma

vegetaaaaaaa, 1 year ago (edited 1 year ago)

Netdata (agent only, not the cloud-based features), a bunch of scanners running from cron/systemd timers, and rsyslog for logs (plus Graylog for larger setups).

My base ansible role for monitoring.

Since your question is also related to securing your setup, inspect and harden the configuration of all running services and the OS itself. Here is my common Ansible role for basic stuff. Find (preferably official) hardening guides for your distribution and implement hardening guidelines such as DISA STIG, CIS benchmarks, ANSSI guides, etc.

lemann, 1 year ago

I used to pass all the data through to Home Assistant and show it on some dashboards, but I decided to move over to Zabbix.

Works well but is quite full-featured, maybe more so than necessary for a self-hoster. I made a media type integration for my annunciator system so I hear issues happening with the servers, as well as updates on things, so I don’t really need to check manually. I also made a custom SMART template that populates the disk’s physical location/bay (as the built-in one only reports SMART data).

It’s notified me of a few hardware issues that would have gone unnoticed on my previous system, and helped with diagnosing others. A lot of the sensors may seem useless, but trust me, once they flag up you should 100% check on your hardware. Hard drives losing power during high activity because of loose connections, and a CPU fan failure to name two.

It has a really steep learning curve though, so I’m not sure how much I can recommend it over something like Grafana+Prometheus - something I haven’t used, but the combo looks just as comprehensive as long as you check your dashboard regularly.

Just wish there were more Android apps.

its_me_gb, 1 year ago (edited 1 year ago)

Prometheus for metrics

Loki for logs

Grafana for dashboards.

I use node exporter for host metrics (Proxmox/VMs/SFFs/RaspPis/Router) and a number of other *exporters:

exportarr

plex-exporter

unifi-exporter

bitcoin node exporter

I use the OpenTelemetry collector to collect some of the above metrics, rather than Prometheus itself, as well as docker logs and other log files before shipping them to Prometheus/Loki.

Oh, I also scrape metrics from my Traefik containers using OTEL as well.

lud, 1 year ago

Have you tried the Proxmox exporter? I tried it briefly for a Grafana lab and it seemed pretty good.

github.com/…/prometheus-pve-exporter

its_me_gb, 1 year ago

I haven’t, but it looks like I’ve got another exporter to install and dashboard to create 😁

lud, 1 year ago

If you want to run the exporter without docker (like I did) and you get problems with installing the exporter try using this guide: github.com/…/PVE-Exporter-on-Proxmox-VE-Node-in-a…

namelivia, 1 year ago

What does having OpenTelemetry improve? I have a setup similar to yours but data goes from Prometheus to Grafana and I never thought I would need anything else.

its_me_gb, 1 year ago

Not a whole lot, to be honest. But I work with OpenTelemetry every day for my day job, so it was a little exercise for me.

Though, OTEL does have some advantages in that it is a vendor-agnostic collection tool, allowing you to use multiple different collection methods and switch out your backend easily if you wish.

makingrain, 1 year ago

Uptime Kuma and ntfy.

possiblylinux127, 1 year ago

I don’t do much in the way of monitoring. I guess I should do that.

johntash, 1 year ago

UptimeKuma is great. I use it for the simple “are my services up?” question, and it is what I pay most attention to.

I still use Zabbix for finer-grained monitors though, like checking RAID status, smartctl, disk space, temperatures, etc.

I’ve been trying out LibreNMS with more custom SNMP checks too and am considering going that route instead of Zabbix in the future.

namelivia, 1 year ago

Prometheus, Loki and Grafana.

johannes, 1 year ago

Golden! We use the same :)
