next up previous contents index
Next: Index Up: SCE: Scalable Cluster Environment Previous: SQMS Batch Scheduling   Contents   Index

Subsections


KSIX Cluster Middleware


Overview

KSIX is a user level software, no kernel modification are required to run KSIX. This feature allows an easy installation and highly portable. KSIX is started by a bootstrapping utility called kxboot and stop with a utility kxhalt. After KSIX has been loaded, application can use KSIX services by enrolling into KSIX environment. This is done by calling a function cpi_init(). KSIX functions are in the form of cpi_class_name (CPI comes from Cluster Programming Interface). The term CPI has been used to separate the ideas of API and implementation in the same fashion as MPI. We view KSIX as one possible implementation of these API and services. KSIX Services and APIs can be classified into classes as follows.


Global Process Space

As the application calls KSIX to spawn a new task, this task will be distributed to a node in the cluster. This node has been selected using KSIX scheduling policy which will be open to modification in the future. KSIX allocates a global process ID and process group to this task. This id is used for task identification. There are 3 modes of task supported.

The reasons that we separate task into classes is to support a future implementation of process migration and high-throughput computing. Scheduler system can employ KSIX to take care of these tasks which help simplify the implementation of batch scheduling system dramatically. KSIX also support the creation of tasks groups, this will simplify the implementation of parallel programming system such as MPI2 since KSIX can help maintain group context which map directly to communicator concept in MPI. Finally, KSIX process control APIs also support the sending of UNIX signal, getting process information etc. The APIs are summarized in table 7.1.


Table 7.1: KSIX Process Management APIs.
API Description
int cpi_spawn( char *task, int flag, char *where, int ntask, int *tid, int *gid, int pclass) Spawning tasks
int cpi_spawnIO( char *task, int flag, char *where, int ntask, int *tid, int *gid, char *output, char *error, int pclass) Spawning tasks with specific location of output
int cpi_waitpid(int pid, int *status, int timeout) Wait for process termination
int cpi_setpmode(int pid, int mode) Change class of process
int cpi_setgmode(int gid, int mode) Change class group of process
int cpi_pkill(int pid, int signal) Send signal to process
int cpi_gkill(int gid, int signal) Send signal to process
int cpi_allps(KxProcStat *result) Report process status of all process
int cpi_userps(KxProcStat *result) Report process status of user process


Naming Services

In KSIX, processes can locate each other through naming service. The naming service APIs are as shown in table 7.2. Naming support allow service client and server to be connected logically and KSIX will help maintain low-level association information. This will simplify many programming tasks when using the cluster as a large server.


Table 7.2: Naming Services APIs
API Description
int cpi_ds_reg(int, char *, struct servinfo, int *) Register server with naming service
int cpi_ds_unreg(int, char *, int, int) Unregister server with naming service
int cpi_ds_getinfo(int, char * ,struct servinfo, struct returninfo **) Query information of server
int cpi_ds_free(struct returninfo *) Free dynamic memory

With this service,a server process can register to a logical service name. Then, client process can bind with server using this logical services name. This allows the service server to be restarted or migrated to other node without any disruption of the service. Using this feature, we has built a feature called Fault-Tolerance RPC as shown in table 7.3. This feature can be used to link between stateless server and client and provides a basic level of high-availability. To test the concept, we has developed a simple textual database. First we start database server using cpi_spawn with migration mode. Client connected to server using this fault tolerant RPC. When the server has been kill, it has been automatic started on other node. This action is totally transparent to user client and the connection of frpc is virtually undisturbed.


Table 7.3: Fault Tolerant RPC APIs
API Description
KxFD *cpi_FRPC_cinit(char *service_name) Initialize cpient
KxFD *cpi_FRPC_sinit(char *service_name) Initialize server
KxFD *cpi_FRPC_accept(KxFD *cpifd) Accept a connection on a socket
void cpi_FRPC_close(KxFD *cpifd) Close a socket descriptor
int cpi_FRPC_send(KxFD *cpifd, void *buf, int size) Send a message
int cpi_FRPC_recv(KxFD *cpifd, void *buf, int size) Receive a message


Event Services

Distributed event notification and delivery is crucial part for the implementation of many high level services including High Availability services. KSIX also support event delivering between processes. Process can bind itself to named event. As the event is invoked or trigged by any process on any node. KSIX will reliably deliver the notification to the registered event owner. The APIs are as shown in table 7.4


Table 7.4: Event Services APIs.
API Description
int cpi_em_reg ( int, void *, struct servinfo, int * ) Register event handler
int cpi_em_unreg ( int, void *, int, int ) Unregister event handler
int cpi_em_raise ( int, char *, struct servinfo, char *, int, struct answer **, int ) Raise event
int cpi_em_read ( int *, char *, int, int *, struct timeval ) Event handler read message from event manager or raw TCP/IP
int cpi_em_write ( int, char *, int ) Event handler write message to event manager
int cpi_em_free ( struct answer *) Free dynamic memory


Ensemble Management

For large cluster, system software, tools and application must be acknowledge about the change in cluster topology. KSIX subsystem that responsible for this task are called ensemble management. KSIX delete mal-function node from the ensemble automatically system it has been detected. KSIX also automatic add a new node to ensemble after the boot up process. The APIs for this class of service are illustrated in table 7.5.


Table 7.5: Ensemble Management APIs.
API Description
int cpi_addhost(char *hostname) Add host to KSIX system
int cpi_delhost(char *hostname) Delete host from KSIX system
int cpi_gethostbyrank(int rank, char *result) Convert rank to hostname
int *cpi_getrankbyhost(char *hostname) Convert hostname to rank
int cpi_getallhost(KxHostInfo *hostinfo) Get array of hostname sort by rank


KSIX Implementation

KSIX architecture consists of 2 layers of software: KSIX Kernel Layer and KSIX Services Layer(See Figure 7.2). KSIX Kernel layer provides basic services such as process control and ensemble management. This layer consists of KSIX daemon named KXD running on each node. Also, there is a system master controller called KSIX partition manager(KXPM) KXPM keeps track of global process ID and status of every processes in KSIX.

Figure 7.1: KSIX Architecture
\begin{figure}
\begin{center}
\epsfig {file=images/kx1,width=2.3in}\\
\vspace{0...
... {file=images/kx2,width=2.3in}\index{KSIX!Architecture}
\end{center}\end{figure}

Currently, KXPM and KXD organized into a form of tree as also shown in Figure 7.2. This allows us an easy support for global heartbeat and collective communication mechanism. KSIX Service layer is on top of KSIX kernel and provides high level KSIX services to KSIX based application. Each service will consist of service server, services and API library that link client of service to the service server itself. Current available services are Event service and Directory Service (DS). Since KSIX operate in pure message passing manner, KSIX services can be used by application on any node in the cluster.


Experimental Evaluation

One effective use of KSIX application is the use of fast process creation to support scalable unix commands. Currently, our SCMS cluster management tool provides a set of shell based parallel unix command with the package. KSIX can be used in stead of rsh to improve the performance especially for a short command. We have conducted the experiments as follows.

The experimental system is our Beowulf Cluster System called PIRUN. The system configuration is as listed below:

The experiments has been conducted using a program that start a set of remote tasks on multiple hosts. This program will be referred to as master task. Master task starts a set of slave tasks on multiple hosts and wait for the reply messages from all of the slave tasks. Once started, each slave task sends a UDP message back to the master task. After received all UDP messages from every task, master tasks terminate. The time from the start of all the slave tasks to when the master task received all the reply has been measured. We compared two versions of the experiment. One using ``rsh'' mechanism to start the remote tasks and one using KSIX process service to start the remote tasks. The experiment has been repeated for five times on 4,8,16,32, and 64 processes on 4,8,16, and 32 nodes. The results shown is the average value obtained from each data point. The purpose of this test is to simulate the working of scalable unix command and measure how fast the scalable unix command can work using KSIX support. The results of the test are as shown in Figure 7.3.

Figure 7.2: Comparison of KSIX and rsh based remote process creation time
\begin{figure}
\begin{center}
\epsfig {file=images/kxspawn.ps,width=3.0in,height=3.0in}\end{center}\end{figure}

Figure 7.3: Speedup of KSIX over rsh based remote process creation time
\begin{figure}
\begin{center}
\epsfig {file=images/speedup.ps,width=3.0in,height=3.0in}\end{center}\end{figure}

The results can be summarized as follows.

The efficiency provided by KSIX is very important for scalable unix command. The reason is that it allows users to use these commands frequently without degrading the system performance.


Implementation Status

Currently all of the functions and APIs mentioned in this paper has already been implemented. KSIX version 1.1 is now available for download at http://smile.cpe.ku.ac.th/software. KSIX has been tested on 76 nodes Pentium III Beowulf Cluster that we currently have.


Conclusion

KSIX is a very improtant part of our full cluster environment called SCE (SMILE Cluster Environment) which consists of SCMS Cluster Management System, KCAP Web based management and Visualization system, and SQMS Queuing System. KSIX is a rapidly evolving project. Many improvements are currently working on such as:

  1. Extending the APIs to meet more needs in cluster development. We are currently designing a simple file services. We do not intended to implement full support for parallel file system since there are many works such as PVFS that can be used. KSIX file service is intend for user to move and manipulate single file object easily and transparently.
  2. Enhancing the scalability. Current implementation still have many centralize part since we focus on exploring API and services and deliver working system to researchers first. The plan is to make KSIX to scale to $10^{4}$ nodes in the future.
  3. Adding the high-availability support. We are extending KSIX so it can detect many performance problems at an early stage. When a malfunction has been detected, KSIX will help notify a correct subsystem that handle the recovery using KSIX event mechanism.
  4. Implementing an extensible service architecture that allows developer to easily plug in a new service to KSIX.

With all these features, we hope to build a very effective open source environment for Beowulf cluster system that allows user to fully exploit the power of Beowulf cluster for their works.


next up previous contents index
Next: Index Up: SCE: Scalable Cluster Environment Previous: SQMS Batch Scheduling   Contents   Index
Sugree Phatanapherom
2001-06-21
I also have a line of punk t-shirts and art t-shirts featuring Bas Couture, artcore designs