Troubleshooting
Service logging
Logs produced by Sensu services (sensu-backend and sensu-agent) are often the best place to start when troubleshooting a variety of issues.
Log levels
Each log message is associated with a log level, indicative of the relative severity of the event being logged:
Log level | Description |
---|---|
panic | Severe errors causing the service to shut down in an unexpected state |
fatal | Fatal errors causing the service to shut down (status 1) |
error | Non-fatal service error messages |
warn | Warning messages indicating potential issues |
info | Informational messages representing service actions |
debug | Detailed service operation messages to help troubleshoot issues |
You can configure the log level by setting log-level in the service configuration file (e.g. agent.yml or backend.yml) or by passing the --log-level command line flag:
sensu-agent start --log-level debug
Changing the log level through the configuration file or a command line flag requires restarting the service. For guidance on restarting a service, see the Operating section of the agent reference or the backend reference.
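Because each log message is a single JSON object with a level field, counting messages per level is a quick way to gauge how noisy a service is at its current log level. The function below is a minimal sketch, assuming a bash-compatible shell, one JSON object per line, and no space between "level": and its value (as in the log samples on this page). The name sensu_level_summary is hypothetical.

```shell
# Hypothetical helper: count Sensu log messages per level.
# Reads JSON log lines on stdin and prints "<level> <count>" pairs,
# sorted alphabetically by level. Lines without a "level" field are skipped.
sensu_level_summary() {
  sed -n 's/.*"level":"\([a-z]*\)".*/\1/p' | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, journalctl -u sensu-backend.service -o cat | sensu_level_summary summarizes the backend journal; the -o cat output mode prints only the message of each journal entry, without journald metadata.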
Log file locations
Linux
Sensu services print structured log messages to standard output. To capture these log messages to disk or another logging facility, Sensu services rely on the service management capabilities of the underlying operating system. For example, logs are sent to journald when systemd is the service manager, whereas log messages are redirected to /var/log/sensu when running under SysV init schemes. If you are running systemd as your service manager and would rather have logs written to /var/log/sensu/, see the guide to forwarding logs from journald to syslog.
The table below lists the common logging targets and example commands for following those logs. Substitute the name of the desired service, e.g. backend or agent, for the ${service} variable.
Platform | Version | Target | Command to follow log |
---|---|---|---|
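RHEL/CentOS | >= 7 | journald | journalctl --follow --unit sensu-${service} |
RHEL/CentOS | <= 6 | log file | tail --follow /var/log/sensu/sensu-${service} |
Ubuntu | >= 15.04 | journald | journalctl --follow --unit sensu-${service} |
Ubuntu | <= 14.10 | log file | tail --follow /var/log/sensu/sensu-${service} |
Debian | >= 8 | journald | journalctl --follow --unit sensu-${service} |
Debian | <= 7 | log file | tail --follow /var/log/sensu/sensu-${service} |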
NOTE: The platform versions above are for reference only and do not supersede the documented supported platforms.
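If you manage hosts that use both kinds of service management, a small wrapper can pick the right command for you. This is a minimal sketch, assuming bash, that systemd is detected by the presence of the /run/systemd/system directory (the check described in sd_booted(3)), and that SysV init writes to /var/log/sensu/sensu-${service}; adjust that path if your init scripts log elsewhere. The function name follow_sensu_log and the DRY_RUN and SYSTEMD_RUN_DIR variables are hypothetical, added so the command can be previewed or tested without following a live log.

```shell
# Hypothetical helper: follow the log of a Sensu service ("agent" or "backend").
# Uses journalctl when systemd is running, otherwise tails the SysV log file.
# Set DRY_RUN=1 to print the command instead of running it.
follow_sensu_log() {
  local service="$1"
  # systemd is considered active when this directory exists (see sd_booted(3)).
  local run_dir="${SYSTEMD_RUN_DIR:-/run/systemd/system}"
  if [ -d "$run_dir" ]; then
    ${DRY_RUN:+echo} journalctl --follow --unit "sensu-${service}"
  else
    # Assumed SysV log file location; adjust to match your init script.
    ${DRY_RUN:+echo} tail --follow "/var/log/sensu/sensu-${service}"
  fi
}
```

Running follow_sensu_log backend follows the backend log, while DRY_RUN=1 follow_sensu_log agent only prints the command it would run.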
Narrow your search to a specific timeframe
Use the journalctl --since flag (optionally together with --until) to refine the basic journalctl commands and narrow your search by timeframe. Here are a few examples.
Retrieve all the Sensu backend logs since yesterday:
journalctl -u sensu-backend.service --since yesterday | tee sensu-backend-$(date +%Y-%m-%d).log
Retrieve the Sensu backend logs from 09:00 today until one hour ago:
journalctl -u sensu-backend.service --since 09:00 --until "1 hour ago" | tee sensu-backend-$(date +%Y-%m-%d).log
Retrieve the Sensu backend logs for a specific date range:
journalctl -u sensu-backend.service --since "2015-01-10" --until "2015-01-11 03:00" | tee sensu-backend-$(date +%Y-%m-%d).log
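The timeframe flags combine well with a level filter when you only care about the most severe messages. The function below is a minimal sketch, assuming one JSON log object per line as in the samples on this page; the name sensu_logs_at_level is hypothetical. It accepts a single level or an extended-regex alternation such as error|fatal|panic.

```shell
# Hypothetical helper: print only log lines whose "level" matches $1.
# $1 may be one level ("error") or an ERE alternation ("error|fatal|panic").
# The closing quote in the pattern prevents "warn" from matching "warning".
sensu_logs_at_level() {
  grep -E "\"level\":\"($1)\""
}
```

For example: journalctl -u sensu-backend.service --since "1 hour ago" -o cat | sensu_logs_at_level 'error|fatal|panic'. Note that the etcd messages embedded in backend logs use warning rather than warn, so pass 'warn|warning' to catch both.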
Windows
The Sensu agent stores service logs at the location specified by the log-file configuration flag (default: %ALLUSERSPROFILE%\sensu\log\sensu-agent.log, which is C:\ProgramData\sensu\log\sensu-agent.log on standard Windows installations).
For more information about managing the Sensu agent for Windows, see the agent reference.
You can also view agent events using the Windows Event Viewer, under Windows Logs, as events with source SensuAgent.
If you’re running a binary-only distribution of the Sensu agent for Windows, you can follow the agent log with the following PowerShell command (adjust the path if you set a different log-file location):
Get-Content -Path "C:\ProgramData\sensu\log\sensu-agent.log" -Wait
Sensu backend startup errors
The following warnings are expected when starting a Sensu backend with the default configuration:
{"component":"etcd","level":"warning","msg":"simple token is not cryptographically signed","pkg":"auth","time":"2019-11-04T10:26:31-05:00"}
{"component":"etcd","level":"warning","msg":"set the initial cluster version to 3.3","pkg":"etcdserver/membership","time":"2019-11-04T10:26:31-05:00"}
{"component":"etcd","level":"warning","msg":"serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!","pkg":"embed","time":"2019-11-04T10:26:33-05:00"}
The serving insecure client requests message is an expected warning from etcd. TLS configuration is recommended but not required. For more information, see the etcd security documentation.
Permission issues
Files and folders within /var/cache/sensu/ and /var/lib/sensu/ need to be owned by the sensu user and group. If there is a permission issue with either sensu-backend or sensu-agent, you will see a logged error similar to the following:
{"component":"agent","error":"open /var/cache/sensu/sensu-agent/assets.db: permission denied","level":"fatal","msg":"error executing sensu-agent","time":"2019-02-21T22:01:04Z"}
{"component":"backend","level":"fatal","msg":"error starting etcd: mkdir /var/lib/sensu: permission denied","time":"2019-03-05T20:24:01Z"}
You can use a recursive chown to resolve permission issues with sensu-backend:
sudo chown -R sensu:sensu /var/cache/sensu/sensu-backend
or the sensu-agent:
sudo chown -R sensu:sensu /var/cache/sensu/sensu-agent
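Before or after running chown, you can list anything that is still not owned by the sensu user and group. This is a minimal sketch built on find; the function name sensu_check_ownership and the SENSU_USER and SENSU_GROUP overrides are hypothetical, and you will typically need root privileges so that find can read every directory.

```shell
# Hypothetical helper: list files and directories under the given paths
# whose owner or group differs from the Sensu service account.
# SENSU_USER and SENSU_GROUP default to "sensu".
sensu_check_ownership() {
  local user="${SENSU_USER:-sensu}" group="${SENSU_GROUP:-sensu}"
  find "$@" \( ! -user "$user" -o ! -group "$group" \) -print
}
```

In a root shell, sensu_check_ownership /var/cache/sensu /var/lib/sensu prints every path with the wrong owner or group; no output means ownership looks correct.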
Troubleshooting handlers and filters
Whether you are implementing new workflows or modifying existing ones, it’s sometimes necessary to troubleshoot various stages of the event pipeline. In many cases, generating events with the agent API will save you time and effort compared with modifying existing check configurations.
Here’s an example that uses curl with the API of a local sensu-agent process to generate a test-event check result:
curl -X POST \
-H 'Content-Type: application/json' \
-d '{
"check": {
"metadata": {
"name": "test-event"
},
"status": 2,
"output": "this is a test event targeting the email_ops handler",
"handlers": [ "email_ops" ]
}
}' \
http://127.0.0.1:3031/events
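If you send test events often, wrapping the curl call in a function saves retyping the JSON body. This is a minimal sketch, assuming bash and check and handler names that contain no characters needing JSON escaping. The function names and the SENSU_AGENT_API variable are hypothetical; the default address is the local agent API address used in the example above.

```shell
# Hypothetical helper: build the JSON body for a test check result.
# Arguments: check name, exit status, then one or more handler names.
sensu_test_event_payload() {
  local name="$1" status="$2"
  shift 2
  local handlers="" h
  # Join handler names into a quoted, comma-separated JSON list.
  for h in "$@"; do
    handlers="${handlers:+$handlers, }\"$h\""
  done
  printf '{"check": {"metadata": {"name": "%s"}, "status": %s, "output": "test event for %s", "handlers": [%s]}}\n' \
    "$name" "$status" "$name" "$handlers"
}

# Hypothetical helper: POST the payload to the local agent API.
# curl reads the request body from stdin via "-d @-".
sensu_send_test_event() {
  sensu_test_event_payload "$@" |
    curl -X POST -H 'Content-Type: application/json' -d @- \
      "http://${SENSU_AGENT_API:-127.0.0.1:3031}/events"
}
```

For example, sensu_send_test_event test-event 2 email_ops sends a critical test result to the email_ops handler.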
Additionally, it’s frequently helpful to see the full event object being passed to your workflows. We recommend using a debug handler like this one to write an event to disk as JSON data:
type: Handler
api_version: core/v2
metadata:
  name: debug
spec:
  type: pipe
  command: cat > /var/log/sensu/debug-event.json
  timeout: 2
{
"type": "Handler",
"api_version": "core/v2",
"metadata": {
"name": "debug"
},
"spec": {
"type": "pipe",
"command": "cat > /var/log/sensu/debug-event.json",
"timeout": 2
}
}
With this handler definition installed in your Sensu backend, you can add debug to the list of handlers in your test event:
curl -X POST \
-H 'Content-Type: application/json' \
-d '{
"check": {
"metadata": {
"name": "test-event"
},
"status": 2,
"output": "this is a test event targeting the email_ops handler",
"handlers": [ "email_ops", "debug" ]
}
}' \
http://127.0.0.1:3031/events
The event data should be written to /var/log/sensu/debug-event.json for inspection. Every event sent to the debug handler overwrites the contents of this file.
NOTE: When multiple Sensu backends are configured in a cluster, event processing is distributed across all members. You may need to check the filesystem of each Sensu backend to locate the debug output for your test event.
Troubleshooting assets
Asset filters let you scope an asset to a particular operating system or architecture; you can see an example in the asset reference documentation. An improperly applied asset filter can prevent the desired entity from downloading the asset. In that case, the agent logs that it is not installing the asset, and the backend receives check results showing that the command was not found:
Agent log entry
{
"asset": "check-disk-space",
"component": "asset-manager",
"entity": "sensu-centos",
"filters": [
"true == false"
],
"level": "debug",
"msg": "entity not filtered, not installing asset",
"time": "2019-09-12T18:28:05Z"
}
Backend event
{
"timestamp": 1568148292,
"check": {
"command": "check-disk-space",
"handlers": [],
"high_flap_threshold": 0,
"interval": 10,
"low_flap_threshold": 0,
"publish": true,
"runtime_assets": [
"sensu-plugins-disk-checks"
],
"subscriptions": [
"caching_servers"
],
"proxy_entity_name": "",
"check_hooks": null,
"stdin": false,
"subdue": null,
"ttl": 0,
"timeout": 0,
"round_robin": false,
"duration": 0.001795508,
"executed": 1568148292,
"history": [
{
"status": 127,
"executed": 1568148092
}
],
"issued": 1568148292,
"output": "sh: check-disk-space: command not found\n",
"state": "failing",
"status": 127,
"total_state_change": 0,
"last_ok": 0,
"occurrences": 645,
"occurrences_watermark": 645,
"output_metric_format": "",
"output_metric_handlers": null,
"env_vars": null,
"metadata": {
"name": "failing-disk-check",
"namespace": "default"
}
},
"metadata": {
"namespace": "default"
}
}
If you see messages like these, review your asset definition: they indicate that the entity wasn’t able to download the required asset because of filter restrictions. If you don’t have the definition at hand, you can retrieve it from the backend with:
sensuctl asset info sensu-plugins-disk-checks --format yaml
or
sensuctl asset info sensu-plugins-disk-checks --format json
One common filter issue is conflating an operating system with the family it belongs to. Although Ubuntu is part of the Debian family of Linux distributions, Ubuntu != Debian. For example, consider the following filter:
...
- entity.system.platform == 'debian'
- entity.system.arch == 'amd64'
This filter would not allow an Ubuntu system to download the asset. Instead, the filter should look like:
...
- entity.system.platform_family == 'debian'
- entity.system.arch == 'amd64'
or
- entity.system.platform == 'ubuntu'
- entity.system.arch == 'amd64'
Either of these filters would allow the asset to be downloaded onto an Ubuntu entity.
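To see why a filter does or does not match, compare it with the values the entity actually reports under entity.system. This is a minimal sketch, assuming sensuctl’s JSON output places each system field on its own line; the function name sensu_entity_platform is hypothetical.

```shell
# Hypothetical helper: keep only the entity.system fields that asset
# filters commonly test (os, platform, platform_family, platform_version, arch).
# Reads sensuctl JSON output on stdin.
sensu_entity_platform() {
  grep -E '"(os|platform|platform_family|platform_version|arch)":'
}
```

For example, sensuctl entity info sensu-centos --format json | sensu_entity_platform shows whether the entity reports platform ubuntu with platform_family debian, which tells you which of the filters above will match.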