Blackbox Exporter SSL Probe Alerting: A Practical Guide
As engineers, we've all been there: a certificate expires, a service goes down, and suddenly, the pagers are screaming. Monitoring SSL/TLS certificate expiry is a critical, yet often overlooked, aspect of maintaining reliable systems. While many dedicated solutions exist, a common approach for teams already invested in the Prometheus ecosystem is to leverage blackbox_exporter.
blackbox_exporter is a versatile tool designed to probe endpoints over various protocols (HTTP, HTTPS, TCP, ICMP, DNS) and expose the results as Prometheus metrics. It's excellent for basic "is it up?" checks. But can it truly be your sole guardian against certificate expiry nightmares? This article dives into how to configure blackbox_exporter for SSL/TLS certificate monitoring, its strengths, its limitations, and what you need to consider before relying on it exclusively.
Understanding blackbox_exporter for SSL/TLS Monitoring
At its core, blackbox_exporter acts as a probe. You tell it an endpoint (e.g., https://your-service.com), and it attempts to connect, performs some checks, and then reports back. For SSL/TLS, it can connect to a secure endpoint and inspect the certificate presented by the server.
The key metrics blackbox_exporter provides for SSL/TLS monitoring are:
- probe_ssl_earliest_cert_expiry: The earliest expiry timestamp among the certificates presented by the server, in Unix seconds. This is your primary metric for expiry alerts.
- probe_ssl_last_chain_expiry_timestamp_seconds: The expiry timestamp of the last verified certificate chain, in Unix seconds. It is only meaningful when chain verification succeeds.
- probe_ssl_last_chain_info: Information about the leaf certificate of the last verified chain, exposed as labels (for example subject and issuer; the exact label set depends on the exporter version).
- probe_ssl_server_name: The server name used for SNI (Server Name Indication) during the TLS handshake.
- probe_success: A boolean (1 or 0) indicating whether the probe itself succeeded. This is crucial for detecting connectivity issues that would otherwise hide an expiring certificate.
Metric availability varies between exporter versions, so confirm each name against the /probe output of your own instance before alerting on it.
By querying probe_ssl_earliest_cert_expiry, you can determine how much time remains until a certificate expires and set up alerts accordingly.
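As a minimal sketch of that calculation, a Prometheus recording rule could precompute the remaining lifetime in days. The group name and the recorded series name below are hypothetical and not part of blackbox_exporter; only probe_ssl_earliest_cert_expiry and time() come from the text above.

```yaml
groups:
  - name: ssl_certificate_recording  # hypothetical group name
    rules:
      # Remaining certificate lifetime in days (negative once expired).
      - record: instance:probe_ssl_cert_remaining:days  # hypothetical series name
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400
```

A precomputed series like this is mostly a convenience for dashboards; alert rules can work directly from the raw timestamp instead.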
Setting Up blackbox_exporter for SSL Probes
First, you need blackbox_exporter running. It typically listens on port 9115.
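One way to get an instance running is Docker Compose. The sketch below assumes the official prom/blackbox-exporter image, which at the time of writing reads its configuration from /etc/blackbox_exporter/config.yml by default. The service name and the local blackbox.yml path are assumptions, chosen to match the examples later in this guide.

```yaml
services:
  blackbox-exporter:  # assumed service name, reused as the hostname in the Prometheus config
    image: prom/blackbox-exporter:latest
    ports:
      - "9115:9115"  # default blackbox_exporter listen port
    volumes:
      # Mount the local module definitions over the image's default config file.
      - ./blackbox.yml:/etc/blackbox_exporter/config.yml:ro
```

In production you would likely pin a specific image tag instead of latest.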
blackbox.yml Configuration
You'll define a module in blackbox.yml that specifies how to perform the TLS probe. Here's an example for a basic HTTPS check:
modules:
  http_2xx_tls:
    prober: http
    http:
      preferred_ip_protocol: "ip4"  # Valid values are "ip4" and "ip6"
      method: GET
      follow_redirects: true  # Replaces the deprecated no_follow_redirects: false
      fail_if_not_ssl: true
      fail_if_ssl: false
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      tls_config:
        insecure_skip_verify: false  # Set to true only if you must ignore certificate validation issues (e.g., self-signed)
        # If your target uses an internal CA, specify it here:
        # ca_file: /etc/ssl/certs/internal-ca.crt
In this configuration, http_2xx_tls is our module name. We're using the http prober, ensuring that fail_if_not_ssl is true (so we only succeed if TLS is used), and insecure_skip_verify is false (meaning we want proper certificate validation).
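For TLS endpoints that do not speak HTTP (an SMTPS or database port, say), a similar module can be sketched with the tcp prober. The module name tcp_tls and the 5s timeout are illustrative assumptions; the tls and tls_config keys are the tcp prober's own options.

```yaml
modules:
  tcp_tls:  # hypothetical module name
    prober: tcp
    timeout: 5s  # assumed value, tune to your network
    tcp:
      tls: true  # Perform a TLS handshake after the TCP connection is established
      tls_config:
        insecure_skip_verify: false  # Keep certificate validation enabled
```

Targets for a module like this are given as host:port pairs rather than URLs.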
Prometheus Configuration
Next, configure Prometheus to scrape blackbox_exporter for your desired targets. You'll list the URLs you want to probe under static_configs, and relabeling then passes each URL to blackbox_exporter as the target parameter while pointing the actual scrape at the exporter's own address.
Let's say you want to monitor the SSL certificate for google.com.
scrape_configs:
  - job_name: 'blackbox_ssl'
    metrics_path: /probe
    params:
      module: [http_2xx_tls] # Use the module defined in blackbox.yml
    static_configs:
      - targets:
          - https://google.com # The actual target to probe
          - https://github.com # Another example target
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115 # Address of your blackbox_exporter instance
Here, https://google.com and https://github.com are passed as the target parameter to blackbox_exporter. Prometheus then scrapes blackbox-exporter:9115/probe?module=http_2xx_tls&target=https://google.com (and https://github.com for the second target). The relabel_configs ensure that the instance label in Prometheus reflects the actual service being probed, rather than the blackbox_exporter itself.
Crafting Alerting Rules with Prometheus and Alertmanager
With the metrics flowing into Prometheus, you can now define alerting rules. You'll typically want to be notified well in advance of an expiry. A common practice is to set up multiple alerts: a warning at 30 days, and a critical alert at 7 days.
Create a file like alerting_rules.yml and include it in your Prometheus configuration.
groups:
  - name: ssl_certificate_expiry
    rules:
      - alert: SSLCertificateExpiresSoon
        expr: |
          probe_ssl_earliest_cert_expiry - time() < 86400 * 30
          and
          probe_success == 1
        for: 5m # Ensure the condition persists for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "SSL Certificate for {{ $labels.instance }} expires in less than 30 days"
          description: "The SSL certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}. Please renew it soon."
      - alert: SSLCertificateExpiresCritical
        expr: |
          probe_ssl_earliest_cert_expiry - time() < 86400 * 7
          and
          probe_success == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "SSL Certificate for {{ $labels.instance }} expires in less than 7 days"
          description: "ACTION REQUIRED: The SSL certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}. Renew immediately!"
      - alert: SSLCertificateProbeFailed
        expr: |
          probe_success == 0
          and
          probe_http_status_code != 404 # Skip targets that answer 404: the server is reachable, only the probed path is missing
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Blackbox Exporter failed to probe {{ $labels.instance }}"
          description: "Blackbox Exporter failed to successfully probe {{ $labels.instance }}. This could indicate a network issue, DNS problem, or the service being down."
These rules calculate the remaining time until expiry (probe_ssl_earliest_cert_expiry - time()). 86400 is the number of seconds in a day. The probe_success == 1 condition is vital; you don't want an expiry alert if the probe itself is failing due to a network outage or the service being down. A separate alert for probe_success == 0 is also included to catch connectivity issues.
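For Prometheus to load alerting_rules.yml, the file has to be listed under rule_files in prometheus.yml. A minimal sketch, assuming the rule file sits next to prometheus.yml and that Alertmanager is reachable at alertmanager:9093 (a hypothetical address):

```yaml
rule_files:
  - alerting_rules.yml  # assumed to live alongside prometheus.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093  # hypothetical Alertmanager address
```

Running promtool check rules alerting_rules.yml before reloading Prometheus is a cheap way to catch syntax errors in the rule file.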
These alerts will then be sent to Alertmanager, which can route them to various notification channels like Slack, email, PagerDuty, etc.
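As a rough illustration of that routing, an alertmanager.yml could send warnings to a chat channel and escalate critical alerts to an on-call service, keyed on the severity label set in the rules. The receiver names, Slack channel, webhook URL, and PagerDuty key below are all placeholders.

```yaml
route:
  receiver: team-slack  # default receiver (placeholder name)
  group_by: ['alertname', 'instance']
  routes:
    # Escalate anything labeled severity=critical to the on-call receiver.
    - matchers:
        - severity="critical"
      receiver: team-pagerduty

receivers:
  - name: team-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK  # placeholder
        channel: '#alerts'  # placeholder channel
  - name: team-pagerduty
    pagerduty_configs:
      - routing_key: REPLACE_WITH_INTEGRATION_KEY  # placeholder
```

With this layout, the 30-day warning stays in chat while the 7-day and probe-failure alerts page someone.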
Pitfalls and Edge Cases of blackbox_exporter for SSL Monitoring
While blackbox_exporter is a powerful tool, it's essential to understand its limitations when it comes to comprehensive SSL/TLS certificate monitoring:
- Single Endpoint Focus:
blackbox_exporter monitors the certificate presented by a specific endpoint (IP address and port). If you have a load balancer fronting multiple servers, and only one