Chapter 3. Designing for Cloud Controllers and Cloud Management

OpenStack is designed to be massively horizontally scalable, which allows all services to be distributed widely. However, to simplify this guide, we have decided to discuss services of a more central nature, using the concept of a cloud controller. A cloud controller is just a conceptual simplification. In the real world, you design an architecture for your cloud controller that enables high availability so that if any node fails, another can take over the required tasks. In reality, cloud controller tasks are spread out across more than a single node.

The cloud controller provides the central management system for OpenStack deployments. Typically, the cloud controller manages authentication and sends messages to all the systems through a message queue.

For many deployments, the cloud controller is a single node. However, to have high availability, you have to take a few considerations into account, which we'll cover in this chapter.

The cloud controller manages the following services for the cloud:

- Databases: track current information about users and instances, typically with one database instance per service
- Message queue services: receive and relay AMQP messages for all services through the queue broker
- Conductor services: proxy database requests made by compute nodes
- Authentication and authorization for identity management: determine which users can perform what actions on certain cloud resources
- Image-management services: store and serve images, with metadata on each, for launching in the cloud
- Scheduling services: decide which resources to use first, for example, spreading out where instances are launched based on an algorithm
- User dashboard: provides a web-based front end for users to consume OpenStack cloud services
- API endpoints: offer each service's REST API access, with the API endpoint catalog managed by the Identity service

For our example, the cloud controller has a collection of nova-* components that represents the global state of the cloud; it talks to services such as authentication, maintains information about the cloud in a database, communicates with all compute nodes and storage workers through a queue, and provides API access. Each service running on a designated cloud controller may be broken out onto separate nodes for scalability or availability.
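
For a quick view of this global state, you can query the list of nova services registered with the controller. The following sketch uses the legacy python-novaclient library; the credential placeholders (USERNAME, PASSWORD, PROJECT_NAME, AUTH_URL) and host name are assumptions to replace with your own values, and the exact constructor arguments vary by novaclient version (newer releases authenticate through keystoneauth sessions).

    from novaclient import client as nova_client

    # Placeholder admin credentials -- substitute values for your deployment.
    USERNAME = "admin"
    PASSWORD = "secret"
    PROJECT_NAME = "admin"
    AUTH_URL = "http://cloud-controller.example.com:5000/v2.0"

    nova = nova_client.Client("2", USERNAME, PASSWORD, PROJECT_NAME, AUTH_URL)

    # Each entry shows a nova-* service (scheduler, conductor, compute, ...),
    # the host it runs on, and whether it is up -- the global state the
    # cloud controller maintains.
    for svc in nova.services.list():
        print(svc.binary, svc.host, svc.status, svc.state)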

As another example, you could use pairs of servers for a collective cloud controller, with one server active and one on standby, each redundant pair providing a given set of related services, such as:

- Front-end web for API requests, the scheduler for choosing which compute node to boot an instance on, Identity services, and the dashboard
- Database and message queue server (such as MySQL, RabbitMQ)
- Image service for image management

Now that you see the myriad designs for controlling your cloud, read more about the further considerations to help with your design decisions.

Hardware Considerations

A cloud controller's hardware can be the same as a compute node, though you may want to further specify based on the size and type of cloud that you run.

It's also possible to use virtual machines for all or some of the services that the cloud controller manages, such as the message queuing. In this guide, we assume that all services are running directly on the cloud controller.

Table 3.1, “Cloud controller hardware sizing considerations” contains common considerations to review when sizing hardware for the cloud controller design.

Table 3.1. Cloud controller hardware sizing considerations

Consideration: How many instances will run at once?
Ramification: Size your database server accordingly, and scale out beyond one cloud controller if many instances will report status at the same time and if scheduling where a new instance starts up needs computing power.

Consideration: How many compute nodes will run at once?
Ramification: Ensure that your message queue handles requests successfully and size it accordingly.

Consideration: How many users will access the API?
Ramification: If many users will make multiple requests, make sure that the CPU load on the cloud controller can handle it.

Consideration: How many users will access the dashboard versus the REST API directly?
Ramification: The dashboard makes many requests, even more than the API access, so add even more CPU if your dashboard is the main interface for your users.

Consideration: How many nova-api services do you run at once for your cloud?
Ramification: You need to size the controller with a core per service.

Consideration: How long does a single instance run?
Ramification: Starting and deleting instances is demanding on the compute node but also on the controller node because of all the API queries and scheduling needs.

Consideration: Does your authentication system also verify externally?
Ramification: External systems such as LDAP or Active Directory require network connectivity between the cloud controller and the external authentication system. Also ensure that the cloud controller has the CPU power to keep up with requests.
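
To make these ramifications concrete, the following back-of-the-envelope sketch turns a few of the considerations above into rough numbers. The rules of thumb in it (a core per API service, extra headroom that grows with the number of compute nodes, RAM that grows with the instance count) are illustrative assumptions rather than official sizing guidance; replace them with figures measured in your own environment.

    # Illustrative controller sizing estimate; all constants are assumptions.
    def estimate_controller_size(instances, compute_nodes, api_services):
        cores = api_services                   # roughly one core per API service (see table)
        cores += max(2, compute_nodes // 50)   # assumed headroom for queue and scheduler load
        ram_gb = 8 + 2 * (instances // 100)    # assumed base RAM plus growth with instance count
        return {"cores": cores, "ram_gb": ram_gb}

    print(estimate_controller_size(instances=1000, compute_nodes=40, api_services=6))
    # -> {'cores': 8, 'ram_gb': 28}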

Separation of Services

While our example contains all central services in a single location, it is possible and indeed often a good idea to separate services onto different physical servers. Table 3.2, “Deployment scenarios” is a list of deployment scenarios we've seen and their justifications.

Table 3.2. Deployment scenarios

Scenario: Run glance-* servers on the swift-proxy server.
Justification: This deployment felt that the spare I/O on the Object Storage proxy server was sufficient and that the Image Delivery portion of glance benefited from being on physical hardware and from having good connectivity to the Object Storage back end it was using.

Scenario: Run a central dedicated database server.
Justification: This deployment used a central dedicated server to provide the databases for all services. This approach simplified operations by isolating database server updates and allowed for the simple creation of slave database servers for failover.

Scenario: Run one VM per service.
Justification: This deployment ran central services on a set of servers running KVM. A dedicated VM was created for each service (nova-scheduler, rabbitmq, database, and so on). This assisted the deployment with scaling because administrators could tune the resources given to each virtual machine based on the load it received (something that was not well understood during installation).

Scenario: Use an external load balancer.
Justification: This deployment had an expensive hardware load balancer in its organization. It ran multiple nova-api and swift-proxy servers on different physical servers and used the load balancer to switch between them.

One choice that always comes up is whether to virtualize. Some services, such as nova-compute, swift-proxy and swift-object servers, should not be virtualized. However, control servers can often be happily virtualized—the performance penalty can usually be offset by simply running more of the service.

Conductor Services

In previous versions of OpenStack, all nova-compute services required direct access to the database hosted on the cloud controller. This was problematic for two reasons: security and performance. With regard to security, if a compute node is compromised, the attacker inherently has access to the database. With regard to performance, nova-compute calls to the database are single-threaded and blocking. This creates a performance bottleneck because database requests are fulfilled serially rather than in parallel.

The conductor service resolves both of these issues by acting as a proxy for the nova-compute service. Now, instead of nova-compute directly accessing the database, it contacts the nova-conductor service, and nova-conductor accesses the database on nova-compute's behalf. Since nova-compute no longer has direct access to the database, the security issue is resolved. Additionally, nova-conductor is a nonblocking service, so requests from all compute nodes are fulfilled in parallel.

Note

If you are using nova-network and multi-host networking in your cloud environment, nova-compute still requires direct access to the database.

The nova-conductor service is horizontally scalable. To make nova-conductor highly available and fault tolerant, just launch more instances of the nova-conductor process, either on the same server or across multiple servers.
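
One way to confirm that the conductor layer is scaled out is to list the registered nova-conductor services and check that more than one is up. This sketch again uses the legacy python-novaclient interface with placeholder credentials and host name; adjust it to the client version and authentication method used in your cloud.

    from novaclient import client as nova_client

    # Placeholder admin credentials -- substitute values for your deployment.
    nova = nova_client.Client("2", "admin", "secret", "admin",
                              "http://cloud-controller.example.com:5000/v2.0")

    # Several "up" nova-conductor entries (on one host or spread across hosts)
    # indicate the service is running redundantly; compute nodes reach whichever
    # instance answers on the shared message queue.
    for svc in nova.services.list(binary="nova-conductor"):
        print(svc.host, svc.status, svc.state)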

Extensions

The API Specifications define the core actions, capabilities, and media types of the OpenStack API. A client can always depend on the availability of this core API, and implementers are always required to support it in its entirety. Requiring strict adherence to the core API allows clients to rely upon a minimal level of functionality when interacting with multiple implementations of the same API.

The OpenStack Compute API is extensible. An extension adds capabilities to an API beyond those defined in the core. The introduction of new features, MIME types, actions, states, headers, parameters, and resources can all be accomplished by means of extensions to the core API. This allows the introduction of new features in the API without requiring a version change and allows the introduction of vendor-specific niche functionality.
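
You can ask a Compute endpoint which extensions it exposes by querying its extensions resource. The sketch below assumes a typical deployment with nova-api listening on its default port 8774 and a previously obtained Identity token; the endpoint URL and token are placeholders to replace with your own.

    import requests

    # Placeholders -- substitute a valid token and your Compute API endpoint.
    TOKEN = "replace-with-a-valid-token"
    COMPUTE_ENDPOINT = "http://cloud-controller.example.com:8774/v2.1"

    resp = requests.get(COMPUTE_ENDPOINT + "/extensions",
                        headers={"X-Auth-Token": TOKEN})
    resp.raise_for_status()

    # Each extension advertises an alias and a human-readable name.
    for ext in resp.json()["extensions"]:
        print(ext["alias"], "-", ext["name"])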

Scheduling

The scheduling services are responsible for determining the compute or storage node where a virtual machine or block storage volume should be created. The scheduling services receive creation requests for these resources from the message queue and then begin the process of determining the appropriate node where the resource should reside. This process is done by applying a series of user-configurable filters against the available collection of nodes.

There are currently two schedulers: nova-scheduler for virtual machines and cinder-scheduler for block storage volumes. Both schedulers are able to scale horizontally, so for high-availability purposes, or for very large or high-schedule-frequency installations, you should consider running multiple instances of each scheduler. The schedulers all listen to the shared message queue, so no special load balancing is required.
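
The filter-based approach is easy to picture with a toy example. The sketch below is a simplified illustration of the idea (candidate hosts pass through a chain of filters and one survivor is picked), not the actual nova-scheduler or cinder-scheduler code; the host data and filters are invented for demonstration.

    import random

    # Invented host inventory for illustration only.
    hosts = [
        {"name": "compute1", "free_ram_mb": 4096,  "free_vcpus": 2},
        {"name": "compute2", "free_ram_mb": 16384, "free_vcpus": 8},
        {"name": "compute3", "free_ram_mb": 2048,  "free_vcpus": 4},
    ]

    def ram_filter(host, request):
        return host["free_ram_mb"] >= request["ram_mb"]

    def core_filter(host, request):
        return host["free_vcpus"] >= request["vcpus"]

    def schedule(hosts, request, filters=(ram_filter, core_filter)):
        # Apply each filter in turn, then pick one of the remaining hosts.
        candidates = [h for h in hosts if all(f(h, request) for f in filters)]
        if not candidates:
            raise RuntimeError("No valid host found")
        return random.choice(candidates)

    print(schedule(hosts, {"ram_mb": 8192, "vcpus": 4})["name"])  # -> compute2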

Dashboard

The OpenStack dashboard (horizon) provides a web-based user interface to the various OpenStack components. The dashboard includes an end-user area for users to manage their virtual infrastructure and an admin area for cloud operators to manage the OpenStack environment as a whole.

The dashboard is implemented as a Python web application that normally runs in Apache httpd. Therefore, you may treat it the same as any other web application, provided it can reach the API servers (including their admin endpoints) over the network.
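
A quick way to verify that requirement is to probe the API endpoints from the host running the dashboard. The URLs below are assumptions based on common default ports (Identity on 5000 and 35357, Compute on 8774) and a hypothetical controller host name; substitute the endpoints from your own service catalog.

    import requests

    # Hypothetical endpoint list -- replace with your deployment's URLs.
    ENDPOINTS = [
        "http://cloud-controller.example.com:5000/v3",    # Identity (public)
        "http://cloud-controller.example.com:35357/v3",   # Identity (admin, older deployments)
        "http://cloud-controller.example.com:8774/",      # Compute API
    ]

    for url in ENDPOINTS:
        try:
            resp = requests.get(url, timeout=5)
            print(url, "->", resp.status_code)
        except requests.RequestException as exc:
            print(url, "-> unreachable:", exc)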
