As an OpenStack cloud is composed of so many different services, there are a large number of log files. This chapter aims to assist you in locating and working with them and describes other ways to track the status of your deployment.
Most services use the convention of writing their log files to
subdirectories of the /var/log directory
, as listed in Table 13.1, “OpenStack log locations”.
Table 13.1. OpenStack log locations

Node type | Service | Log location
---|---|---
Cloud controller | nova-* | /var/log/nova
Cloud controller | glance-* | /var/log/glance
Cloud controller | cinder-* | /var/log/cinder
Cloud controller | keystone-* | /var/log/keystone
Cloud controller | neutron-* | /var/log/neutron
Cloud controller | horizon | /var/log/apache2/
All nodes | misc (swift, dnsmasq) | /var/log/syslog
Compute nodes | libvirt | /var/log/libvirt/libvirtd.log
Compute nodes | Console (boot-up messages) for VM instances | /var/lib/nova/instances/instance-&lt;instance id&gt;/console.log
Block Storage nodes | cinder-volume | /var/log/cinder/cinder-volume.log
OpenStack services use the standard logging levels, at increasing severity: DEBUG, INFO, AUDIT, WARNING, ERROR, CRITICAL, and TRACE. That is, messages only appear in the logs if they are more "severe" than the particular log level, with DEBUG allowing all log statements through. For example, TRACE is logged only if the software has a stack trace, while INFO is logged for every message including those that are only for information.
To disable DEBUG-level logging, edit
/etc/nova/nova.conf
as follows:
```
debug=false
```
Logging for horizon is configured in /etc/openstack_dashboard/local_settings.py.
Because horizon is a Django web application, it follows the Django
Logging framework conventions.
The first step in finding the source of an error is typically to search for a CRITICAL, TRACE, or ERROR message in the log starting at the bottom of the log file.
Here is an example of a CRITICAL log message, with its corresponding TRACE (Python traceback) immediately following:

```
2013-02-25 21:05:51 17409 CRITICAL cinder [-] Bad or unexpected response from the storage volume backend API: volume group cinder-volumes doesn't exist
2013-02-25 21:05:51 17409 TRACE cinder Traceback (most recent call last):
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/bin/cinder-volume", line 48, in <module>
2013-02-25 21:05:51 17409 TRACE cinder     service.wait()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 422, in wait
2013-02-25 21:05:51 17409 TRACE cinder     _launcher.wait()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 127, in wait
2013-02-25 21:05:51 17409 TRACE cinder     service.wait()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
2013-02-25 21:05:51 17409 TRACE cinder     return self._exit_event.wait()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2013-02-25 21:05:51 17409 TRACE cinder     return hubs.get_hub().switch()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
2013-02-25 21:05:51 17409 TRACE cinder     return self.greenlet.switch()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
2013-02-25 21:05:51 17409 TRACE cinder     result = function(*args, **kwargs)
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 88, in run_server
2013-02-25 21:05:51 17409 TRACE cinder     server.start()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 159, in start
2013-02-25 21:05:51 17409 TRACE cinder     self.manager.init_host()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 95, in init_host
2013-02-25 21:05:51 17409 TRACE cinder     self.driver.check_for_setup_error()
2013-02-25 21:05:51 17409 TRACE cinder   File "/usr/lib/python2.7/dist-packages/cinder/volume/driver.py", line 116, in check_for_setup_error
2013-02-25 21:05:51 17409 TRACE cinder     raise exception.VolumeBackendAPIException(data=exception_message)
2013-02-25 21:05:51 17409 TRACE cinder VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: volume group cinder-volumes doesn't exist
2013-02-25 21:05:51 17409 TRACE cinder
```

In this example, cinder-volume failed to start and provided a stack trace: its volume backend was unable to set up storage because the volume group cinder-volumes does not exist.
Here is an example of an ERROR log message:

```
2013-02-25 20:26:33 6619 ERROR nova.openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 23 seconds.
```

In this error, a nova service failed to connect to the RabbitMQ server and received a connection refused error.
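If you want to automate this bottom-up search, here is a minimal sketch; the cinder-volume log path follows Table 13.1, and the match count of 20 is arbitrary:

```bash
# Read the log bottom-up and show the 20 most recent high-severity lines
tac /var/log/cinder/cinder-volume.log | grep -E "CRITICAL|ERROR|TRACE" | head -n 20
```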
When an instance fails to behave properly, you will often have to
trace activity associated with that instance across the log files of
various nova-*
services and across both the cloud controller
and compute nodes.
The typical way is to trace the UUID associated with an instance across the service logs.
Consider the following example:
```
$ nova list
+--------------------------------+--------+--------+---------------------------+
| ID                             | Name   | Status | Networks                  |
+--------------------------------+--------+--------+---------------------------+
| fafed8-4a46-413b-b113-f1959ffe | cirros | ACTIVE | novanetwork=192.168.100.3 |
+--------------------------------+--------+--------+---------------------------+
```
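Using the ID from this listing, here is a sketch of the cross-service search; the log paths follow Table 13.1, with the first command run on the cloud controller and the second on the compute node hosting the instance:

```bash
# On the cloud controller: which nova services handled this instance?
grep fafed8-4a46-413b-b113-f1959ffe /var/log/nova/nova-*.log

# On the compute node that hosts the instance
grep fafed8-4a46-413b-b113-f1959ffe /var/log/nova/nova-compute.log
```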
If there is not enough information in the existing logs, you may
need to add your own custom logging statements to the nova-*
services.
The source files are located in /usr/lib/python2.7/dist-packages/nova.
To add a logging statement, the following lines should be near the top of the file. For most files, these should already be there:

```
from nova.openstack.common import log as logging
LOG = logging.getLogger(__name__)
```
To add a DEBUG logging statement, you would do:
```
LOG.debug("This is a custom debugging statement")
```
You may notice that existing logging messages are preceded by an underscore and surrounded by parentheses, for example:

```
LOG.debug(_("Logging statement appears here"))
```
This formatting is used to support translation of logging messages into different languages using the gettext internationalization library. You don't need to do this for your own custom log messages. However, if you want to contribute the code back to the OpenStack project that includes logging statements, you must surround your log messages with underscores and parentheses.
Aside from connection failures, RabbitMQ log files are generally not useful for debugging OpenStack-related issues. Instead, we recommend you use the RabbitMQ web management interface. Enable it on your cloud controller:
```
# /usr/lib/rabbitmq/bin/rabbitmq-plugins enable rabbitmq_management
# service rabbitmq-server restart
```
**Note**: Ubuntu 12.04 installs RabbitMQ version 2.7.1, which serves the management interface on port 55672. RabbitMQ versions 3.0 and above use port 15672 instead. You can check which version of RabbitMQ is running on your local Ubuntu machine with:

```
$ dpkg -s rabbitmq-server | grep "Version:"
Version: 2.7.1-0ubuntu4
```
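Once the plugin is enabled, a quick sanity check is to query the management API; this sketch assumes the default guest account and the version 2.7.1 port from the note above:

```bash
# The management plugin exposes a JSON API alongside the web UI
curl -u guest:guest http://localhost:55672/api/overview
```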
Because your cloud is most likely composed of many servers, you must check logs on each of those servers to properly piece an event together. A better solution is to send the logs of all servers to a central location so that they can all be accessed from the same area.
To begin, configure all OpenStack components to log to syslog in addition to their standard log file location. Also configure each component to log to a different syslog facility. This makes it easier to split the logs into individual components on the central server:
nova.conf:

```
use_syslog=True
syslog_log_facility=LOG_LOCAL0
```
glance-api.conf and glance-registry.conf:

```
use_syslog=True
syslog_log_facility=LOG_LOCAL1
```
cinder.conf:

```
use_syslog=True
syslog_log_facility=LOG_LOCAL2
```
keystone.conf:

```
use_syslog=True
syslog_log_facility=LOG_LOCAL3
```
By default, Object Storage logs to syslog.
Next, create /etc/rsyslog.d/client.conf with the following line:
```
*.* @192.168.1.10
```

This instructs rsyslog to forward all logs to the server at the IP address listed.
Designate a server as the central logging server. The best
practice is to choose a server that is solely dedicated to this purpose.
Create a file called /etc/rsyslog.d/server.conf
with the following contents:
```
# Enable UDP
$ModLoad imudp
# Listen on 192.168.1.10 only
$UDPServerAddress 192.168.1.10
# Port 514
$UDPServerRun 514

# Create logging templates for nova
$template NovaFile,"/var/log/rsyslog/%HOSTNAME%/nova.log"
$template NovaAll,"/var/log/rsyslog/nova.log"

# Log everything else to syslog.log
$template DynFile,"/var/log/rsyslog/%HOSTNAME%/syslog.log"
*.* ?DynFile

# Log various openstack components to their own individual file
local0.* ?NovaFile
local0.* ?NovaAll
& ~
```
This example configuration handles the nova service only. It first configures rsyslog to act as a server that runs on port 514. Next, it creates a series of logging templates. Logging templates control where received logs are stored. Using the last example, a nova log from c01.example.com goes to the following locations:
/var/log/rsyslog/c01.example.com/nova.log
/var/log/rsyslog/nova.log
This is useful, as logs from c02.example.com go to:
/var/log/rsyslog/c02.example.com/nova.log
/var/log/rsyslog/nova.log
You have an individual log file for each compute node as well as an aggregated log that contains nova logs from all nodes.
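To confirm that forwarding works end to end, here is a minimal sketch using the standard logger utility; the hostname c01.example.com stands in for whichever client you test from:

```bash
# On a client node: emit a test message on the nova facility (local0)
logger -p local0.info "central logging test"

# On the central server: the message should land in both nova templates
# (replace c01.example.com with the client's hostname)
tail /var/log/rsyslog/c01.example.com/nova.log /var/log/rsyslog/nova.log
```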
StackTach is a tool created by Rackspace to collect and report the
notifications sent by nova
. Notifications are essentially the
same as logs but can be much more detailed. A good overview of
notifications can be found at System Usage Data.
To enable nova to send notifications, add the following to nova.conf:
```
notification_topics=monitor
notification_driver=nova.openstack.common.notifier.rabbit_notifier
```
Once nova
is sending notifications, install and
configure StackTach. Since StackTach is relatively new and constantly
changing, installation instructions would quickly become outdated. Please
refer to the StackTach GitHub
repo for instructions as well as a demo video.
There are two types of monitoring: watching for problems and watching usage trends. The former ensures that all services are up and running, creating a functional cloud. The latter involves monitoring resource usage over time in order to make informed decisions about potential bottlenecks and upgrades.
A basic type of alert monitoring is to simply check and see
whether a required process is running. For example, ensure that the nova-api
service is running on the cloud controller:
```
# ps aux | grep nova-api
nova     12786  0.0  0.0  37952   1312 ?     Ss  Feb11  0:00 su -s /bin/sh -c exec nova-api --config-file=/etc/nova/nova.conf nova
nova     12787  0.0  0.1 135764  57400 ?     S   Feb11  0:01 /usr/bin/python /usr/bin/nova-api --config-file=/etc/nova/nova.conf
nova     12792  0.0  0.0  96052  22856 ?     S   Feb11  0:01 /usr/bin/python /usr/bin/nova-api --config-file=/etc/nova/nova.conf
nova     12793  0.0  0.3 290688 115516 ?     S   Feb11  1:23 /usr/bin/python /usr/bin/nova-api --config-file=/etc/nova/nova.conf
nova     12794  0.0  0.2 248636  77068 ?     S   Feb11  0:04 /usr/bin/python /usr/bin/nova-api --config-file=/etc/nova/nova.conf
root     24121  0.0  0.0  11688    912 pts/5 S+  13:07  0:00 grep nova-api
```
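To check several services at once, here is a minimal sketch along the same lines; the service list is illustrative and should be adjusted to what actually runs on the node:

```bash
#!/bin/bash
# Report any expected service that has no matching running process
for svc in nova-api nova-scheduler glance-api glance-registry; do
    pgrep -f "$svc" > /dev/null || echo "$svc is NOT running"
done
```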
Rather than running such checks by hand, you can have a monitoring tool such as Nagios do it for you and alert you on failure. For example, create the following service definition on the Nagios server to watch nova-compute on a compute node:

```
define service {
    host_name               c01.example.com
    check_command           check_nrpe_1arg!check_nova-compute
    use                     generic-service
    notification_period     24x7
    contact_groups          sysadmins
    service_description     nova-compute
}
```
Then on the actual compute node, create the following NRPE configuration:
```
command[check_nova-compute]=/usr/lib/nagios/plugins/check_procs -c 1: \
    -a nova-compute
```
Nagios checks that at least one nova-compute
service is running at all times.
Resource alerting provides notifications when one or more resources are critically low. While the monitoring thresholds should be tuned to your specific OpenStack environment, monitoring resource usage is not specific to OpenStack at all—any generic type of alert will work fine.
Some of the resources that you want to monitor include:

- Disk usage
- Server load
- Memory usage
- Network I/O
- Available vCPUs
For example, to monitor disk capacity on a compute node with Nagios, add the following to your Nagios configuration:

```
define service {
    host_name               c01.example.com
    check_command           check_nrpe!check_all_disks!20% 10%
    use                     generic-service
    contact_groups          sysadmins
    service_description     Disk
}
```
On the compute node, add the following to your NRPE configuration:
```
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c \
    $ARG2$ -e
```
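You can also run the plugin by hand on the compute node to confirm the thresholds behave as expected; the 20%/10% values mirror the Nagios service definition above:

```bash
# -w/-c set the warning/critical free-space thresholds; -e reports only
# the filesystems that breach them
/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -e
```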
An integrated OpenStack project (code-named ceilometer) collects metering data and provides alerts for Compute, Storage, and Networking. Data collected by the metering system could be used for billing. Depending on the deployment configuration, metered data may be accessible to users. The Telemetry service provides a REST API documented at http://api.openstack.org/api-ref-telemetry.html. You can read more about the project at http://docs.openstack.org/developer/ceilometer.
Resources such as memory, disk, and CPU are generic resources that
all servers (even non-OpenStack servers) have and are important to the
overall health of the server. When dealing with OpenStack specifically,
these resources are important for a second reason: ensuring that enough
are available to launch instances. There are a few ways you can see
OpenStack resource usage. The first is through the nova
command:
```
# nova usage-list
```
Next, the nova database contains three tables that store usage information. The nova.quotas and nova.quota_usages tables store quota information. If a tenant's quota differs from the default quota settings, its quota is stored in the nova.quotas table. For example:
```
mysql> select project_id, resource, hard_limit from quotas;
+----------------------------------+-----------------------------+------------+
| project_id                       | resource                    | hard_limit |
+----------------------------------+-----------------------------+------------+
| 628df59f091142399e0689a2696f5baa | metadata_items              | 128        |
| 628df59f091142399e0689a2696f5baa | injected_file_content_bytes | 10240      |
| 628df59f091142399e0689a2696f5baa | injected_files              | 5          |
| 628df59f091142399e0689a2696f5baa | gigabytes                   | 1000       |
| 628df59f091142399e0689a2696f5baa | ram                         | 51200      |
| 628df59f091142399e0689a2696f5baa | floating_ips                | 10         |
| 628df59f091142399e0689a2696f5baa | instances                   | 10         |
| 628df59f091142399e0689a2696f5baa | volumes                     | 10         |
| 628df59f091142399e0689a2696f5baa | cores                       | 20         |
+----------------------------------+-----------------------------+------------+
```
The nova.quota_usages
table keeps track of how many
resources the tenant currently has in use:
```
mysql> select project_id, resource, in_use from quota_usages where project_id like '628%';
+----------------------------------+--------------+--------+
| project_id                       | resource     | in_use |
+----------------------------------+--------------+--------+
| 628df59f091142399e0689a2696f5baa | instances    | 1      |
| 628df59f091142399e0689a2696f5baa | ram          | 512    |
| 628df59f091142399e0689a2696f5baa | cores        | 1      |
| 628df59f091142399e0689a2696f5baa | floating_ips | 1      |
| 628df59f091142399e0689a2696f5baa | volumes      | 2      |
| 628df59f091142399e0689a2696f5baa | gigabytes    | 12     |
| 628df59f091142399e0689a2696f5baa | images       | 1      |
+----------------------------------+--------------+--------+
```
By comparing a tenant's hard limit with their current resource usage, you can see their usage percentage. For example, this tenant is using 1 floating IP out of 10, so they are using 10 percent of their floating IP quota. Rather than doing the calculation manually, you can use SQL or the scripting language of your choice to create a formatted report:

```
+-----------------------------------+------------+------------+---------------+
|                               some_tenant                                    |
+-----------------------------------+------------+------------+---------------+
| Resource                          | Used       | Limit      |               |
+-----------------------------------+------------+------------+---------------+
| cores                             | 1          | 20         | 5 %           |
| floating_ips                      | 1          | 10         | 10 %          |
| gigabytes                         | 12         | 1000       | 1 %           |
| images                            | 1          | 4          | 25 %          |
| injected_file_content_bytes       | 0          | 10240      | 0 %           |
| injected_file_path_bytes          | 0          | 255        | 0 %           |
| injected_files                    | 0          | 5          | 0 %           |
| instances                         | 1          | 10         | 10 %          |
| key_pairs                         | 0          | 100        | 0 %           |
| metadata_items                    | 0          | 128        | 0 %           |
| ram                               | 512        | 51200      | 1 %           |
| reservation_expire                | 0          | 86400      | 0 %           |
| security_group_rules              | 0          | 20         | 0 %           |
| security_groups                   | 0          | 10         | 0 %           |
| volumes                           | 2          | 10         | 20 %          |
+-----------------------------------+------------+------------+---------------+
```
The preceding information was generated by using a custom script that can be found on GitHub.
**Note**: This script is specific to a certain OpenStack installation and must be modified to fit your environment. However, the logic should be easily transferable.
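If you would rather stay in SQL than adapt a script, here is a minimal sketch of the same report, using only the two nova tables shown above. Note that nova.quotas holds only tenants whose quotas differ from the defaults, so tenants on default quotas will not appear in this join:

```bash
# Join current usage against hard limits to get a usage percentage
mysql nova -e "
    SELECT u.project_id, u.resource, u.in_use, q.hard_limit,
           ROUND(100 * u.in_use / q.hard_limit) AS percent_used
    FROM quota_usages u
    JOIN quotas q
      ON q.project_id = u.project_id AND q.resource = u.resource;"
```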
Intelligent alerting can be thought of as a form of continuous integration for operations. For example, you can easily check to see whether the Image Service is up and running by ensuring that the glance-api and glance-registry processes are running or by seeing whether glance-api is responding on port 9292.
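Here is a sketch of that port check; the host is illustrative and the five-second timeout is arbitrary:

```bash
# Exit status 0 means something is listening on the glance-api port
nc -z -w 5 localhost 9292 && echo "glance-api: port open" || echo "glance-api: port closed"
```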
To go beyond a process or port check, you can exercise the service itself. The following script uploads an image to the Image Service to verify the full request path:

```bash
#!/bin/bash
#
# assumes that reasonable credentials have been stored at
# /root/openrc
. /root/openrc
wget https://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-disk.img
glance image-create --name='cirros image' --is-public=true \
    --container-format=bare --disk-format=qcow2 < cirros-0.3.0-x86_64-disk.img
```
**Note**: You must remove the image after each test. Even better, test whether you can successfully delete an image from the Image Service.
Trending can give you great insight into how your cloud is performing day to day. You can learn, for example, if a busy day was simply a rare occurrence or if you should start adding new compute nodes.
Trending takes a slightly different approach than alerting. While alerting is interested in a binary result (whether a check succeeds or fails), trending records the current state of something at a certain point in time. Once enough points in time have been recorded, you can see how the value has changed over time.
All of the alert types mentioned earlier can also be used for trend reporting. Some other trend examples include:

- The number of instances on each compute node
- The types of flavors in use
- The number of volumes in use
- The number of Object Storage requests each hour
- The number of nova-api requests each hour
- The I/O statistics of your storage services

For example, to get an approximate count of nova-api requests, look for INFO messages in /var/log/nova/nova-api.log:
```
# grep INFO /var/log/nova/nova-api.log | wc
```
You can obtain further statistics by looking for the number of successful requests:
# grep " 200 " /var/log/nova/nova-api.log | wc
A tool such as collectd can be used to store this information. While collectd is beyond the scope of this book, a good starting point would be to use collectd to store the result as a COUNTER data type. More information can be found in collectd's documentation.
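As a minimal stand-in until collectd is wired up, here is a cron-able sketch that appends a timestamped request count; the output path is illustrative:

```bash
#!/bin/bash
# Append an epoch-timestamp,count pair; plotting the file later shows the trend
echo "$(date +%s),$(grep -c ' 200 ' /var/log/nova/nova-api.log)" \
    >> /var/log/nova-api-trend.csv
```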
For stable operations, you want to detect failure promptly and determine causes efficiently. With a distributed system, this means tracking the right items to meet a service-level target. Knowing where each service keeps its logs, in the file system and via the API, gives you an advantage. This chapter also showed how to read, interpret, and manipulate information from OpenStack services so that you can monitor effectively.