When an application calls KSIX to spawn a new task, the task is dispatched to a node in the cluster. The node is selected by the KSIX scheduling policy, which will be open to modification in the future. KSIX allocates a global process ID and a process group to the task; the global ID is used for task identification. Three task modes are supported.
We separate tasks into classes to support a future implementation of process migration and high-throughput computing. A scheduler can rely on KSIX to take care of these tasks, which dramatically simplifies the implementation of a batch scheduling system. KSIX also supports the creation of task groups. This simplifies the implementation of parallel programming systems such as MPI-2, since KSIX can maintain a group context that maps directly to the communicator concept in MPI. Finally, the KSIX process control APIs also support sending UNIX signals, getting process information, and so on. The APIs are summarized in Table 7.1.
API | Description |
int cpi_spawn( char *task, int flag, char *where, int ntask, int *tid, int *gid, int pclass) | Spawning tasks |
int cpi_spawnIO( char *task, int flag, char *where, int ntask, int *tid, int *gid, char *output, char *error, int pclass) | Spawning tasks with specific location of output |
int cpi_waitpid(int pid, int *status, int timeout) | Wait for process termination |
int cpi_setpmode(int pid, int mode) | Change class of process |
int cpi_setgmode(int gid, int mode) | Change class of process group |
int cpi_pkill(int pid, int signal) | Send signal to process |
int cpi_gkill(int gid, int signal) | Send signal to process group |
int cpi_allps(KxProcStat *result) | Report process status of all processes |
int cpi_userps(KxProcStat *result) | Report process status of user processes |
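The spawn path described above (pick a node through a replaceable policy, then assign a global process ID and a group ID) might be sketched as follows. This is only an illustration: the paper does not specify the scheduling policy, so round-robin is used here as a placeholder, and every name in the block (`kx_policy_fn`, `kx_round_robin`, `kx_alloc`, `kx_task`, `kx_place_tasks`) is hypothetical and not part of the KSIX API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy signature: given the node count and the number of
 * tasks placed so far, return the rank of the node for the next task. */
typedef int (*kx_policy_fn)(int nnodes, int placed);

/* Placeholder policy (an assumption, not the KSIX policy): round-robin. */
int kx_round_robin(int nnodes, int placed) {
    return placed % nnodes;
}

/* Hypothetical record of one placed task. */
typedef struct {
    int gpid;  /* cluster-wide process ID */
    int gid;   /* group ID shared by all tasks of one spawn call */
    int node;  /* rank of the node chosen by the policy */
} kx_task;

/* Hypothetical allocator state, assumed to live in a single controller. */
typedef struct {
    int next_gpid;
    int next_gid;
    int placed;
} kx_alloc;

/* Place ntask tasks on nnodes nodes and fill out[0..ntask-1].
 * Returns the group ID assigned to the whole batch. */
int kx_place_tasks(kx_alloc *a, kx_policy_fn policy,
                   int nnodes, int ntask, kx_task *out) {
    assert(a != NULL && policy != NULL && out != NULL);
    assert(nnodes > 0 && ntask > 0);
    int gid = a->next_gid++;
    for (int i = 0; i < ntask; i++) {
        out[i].gpid = a->next_gpid++;
        out[i].gid = gid;
        out[i].node = policy(nnodes, a->placed++);
    }
    return gid;
}
```

Passing the policy as a function pointer mirrors the statement that the policy is meant to be replaceable: a load-aware policy could be swapped in without touching the ID allocation.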
In KSIX, processes can locate each other through a naming service. The naming service APIs are shown in Table 7.2. Naming support allows service clients and servers to be connected logically, while KSIX maintains the low-level association information. This simplifies many programming tasks when the cluster is used as a large server.
API | Description |
int cpi_ds_reg(int, char *, struct servinfo, int *) | Register server with naming service |
int cpi_ds_unreg(int, char *, int, int) | Unregister server with naming service |
int cpi_ds_getinfo(int, char * ,struct servinfo, struct returninfo **) | Query information of server |
int cpi_ds_free(struct returninfo *) | Free dynamic memory |
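The idea behind the naming service, a logical service name that resolves to whatever location the server currently has, can be illustrated with a small in-memory table. This is a sketch under assumptions, not the KSIX directory service: the `ds_*` names, the fixed capacity, and the (node, port) location format are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define DS_MAX 16       /* illustrative capacity */
#define DS_NAME_LEN 32  /* illustrative name limit, including the NUL */

/* Hypothetical directory entry: logical name -> current server location. */
typedef struct {
    char name[DS_NAME_LEN];
    int node;  /* rank of the node hosting the server */
    int port;
    int used;
} ds_entry;

typedef struct {
    ds_entry e[DS_MAX];
} ds_table;

void ds_init(ds_table *t) {
    assert(t != NULL);
    memset(t, 0, sizeof *t);
}

/* Index of a registered name, or -1 if it is not registered. */
int ds_find(const ds_table *t, const char *name) {
    for (int i = 0; i < DS_MAX; i++)
        if (t->e[i].used && strcmp(t->e[i].name, name) == 0)
            return i;
    return -1;
}

/* Register a name, or update its location if it already exists
 * (for example after a restart on another node). Returns 0 or -1. */
int ds_register(ds_table *t, const char *name, int node, int port) {
    assert(t != NULL && name != NULL);
    if (strlen(name) >= DS_NAME_LEN)
        return -1;
    int i = ds_find(t, name);
    if (i < 0) {
        for (i = 0; i < DS_MAX && t->e[i].used; i++)
            ;
        if (i == DS_MAX)
            return -1;  /* table full */
        strcpy(t->e[i].name, name);
        t->e[i].used = 1;
    }
    t->e[i].node = node;
    t->e[i].port = port;
    return 0;
}

/* Resolve a logical name to its current location. Returns 0 or -1. */
int ds_lookup(const ds_table *t, const char *name, int *node, int *port) {
    int i = ds_find(t, name);
    if (i < 0)
        return -1;
    *node = t->e[i].node;
    *port = t->e[i].port;
    return 0;
}

int ds_unregister(ds_table *t, const char *name) {
    int i = ds_find(t, name);
    if (i < 0)
        return -1;
    t->e[i].used = 0;
    return 0;
}
```

Because clients hold only the logical name, re-registering the same name with a new location is enough to redirect later lookups.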
With this service, a server process can register under a logical service name, and a client process can then bind to the server using that name. This allows the server to be restarted or migrated to another node without disrupting the service. Using this feature, we have built a facility called Fault-Tolerant RPC (FRPC), shown in Table 7.3. FRPC can be used to link a stateless server with its clients and provides a basic level of high availability. To test the concept, we developed a simple textual database. First, we started the database server using cpi_spawn in migration mode. A client then connected to the server using FRPC. When the server was killed, it was automatically restarted on another node. This action is completely transparent to the client, and the FRPC connection is virtually undisturbed.
API | Description |
KxFD *cpi_FRPC_cinit(char *service_name) | Initialize client |
KxFD *cpi_FRPC_sinit(char *service_name) | Initialize server |
KxFD *cpi_FRPC_accept(KxFD *cpifd) | Accept a connection on a socket |
void cpi_FRPC_close(KxFD *cpifd) | Close a socket descriptor |
int cpi_FRPC_send(KxFD *cpifd, void *buf, int size) | Send a message |
int cpi_FRPC_recv(KxFD *cpifd, void *buf, int size) | Receive a message |
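One way a client library could hide a server restart, as in the textual database test described above, is to drop a stale binding when a send fails, resolve the logical name again, and resend. The sketch below shows only that retry loop against a simulated cluster. It is not the FRPC implementation: `frpc_client`, `frpc_send_retry`, and the `sim_*` helpers are hypothetical, and resending the same message is safe only under the stateless-server assumption stated in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks: resolve a logical name to a node rank (0 on success),
 * and send a message to a node (bytes sent, or -1 on failure). */
typedef int (*frpc_resolve_fn)(const char *service, int *node);
typedef int (*frpc_send_fn)(int node, const void *buf, int size);

/* Hypothetical client handle: a logical name plus a cached binding. */
typedef struct {
    const char *service;
    int node;  /* -1 when unbound */
} frpc_client;

/* Send with rebinding: on failure, forget the cached node, resolve the
 * name again, and retry. Returns size on success, -1 after max_attempts. */
int frpc_send_retry(frpc_client *c, frpc_resolve_fn resolve,
                    frpc_send_fn send_to, const void *buf, int size,
                    int max_attempts) {
    assert(c != NULL && resolve != NULL && send_to != NULL);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (c->node < 0 && resolve(c->service, &c->node) != 0) {
            c->node = -1;  /* server not registered (yet); try again */
            continue;
        }
        if (send_to(c->node, buf, size) == size)
            return size;
        c->node = -1;      /* stale binding; force a fresh lookup */
    }
    return -1;
}

/* Simulated cluster state, used only to exercise the loop above. */
int sim_registered_node = -1;  /* node the naming service reports */
int sim_live_node = -1;        /* node where the server really runs */

int sim_resolve(const char *service, int *node) {
    (void)service;
    if (sim_registered_node < 0)
        return -1;
    *node = sim_registered_node;
    return 0;
}

int sim_send(int node, const void *buf, int size) {
    (void)buf;
    return (node >= 0 && node == sim_live_node) ? size : -1;
}
```

A production version would presumably wait between attempts so that the restarted server has time to register again; the sketch omits any delay to stay short.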
Distributed event notification and delivery is a crucial part of the implementation of many high-level services, including high-availability services. KSIX therefore also supports event delivery between processes. A process can bind itself to a named event. When the event is raised by any process on any node, KSIX reliably delivers the notification to the registered event owner. The APIs are shown in Table 7.4.
API | Description |
int cpi_em_reg ( int, void *, struct servinfo, int * ) | Register event handler |
int cpi_em_unreg ( int, void *, int, int ) | Unregister event handler |
int cpi_em_raise ( int, char *, struct servinfo, char *, int, struct answer **, int ) | Raise event |
int cpi_em_read ( int *, char *, int, int *, struct timeval ) | Event handler read message from event manager or raw TCP/IP |
int cpi_em_write ( int, char *, int ) | Event handler write message to event manager |
int cpi_em_free ( struct answer *) | Free dynamic memory |
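The bind-then-raise pattern described above can be illustrated with a single-process dispatcher. This is a minimal sketch, assuming handlers are plain callbacks: the real event manager delivers across nodes and processes, which is not modeled here, and the `em_*` names below are hypothetical rather than the `cpi_em_*` calls from the table. The sketch also allows several handlers per event name, which the paper does not state.

```c
#include <assert.h>
#include <string.h>

#define EM_MAX 16       /* illustrative capacity */
#define EM_NAME_LEN 32  /* illustrative name limit, including the NUL */

/* Hypothetical handler type: event name, message, and user context. */
typedef void (*em_handler)(const char *event, const char *msg, void *ctx);

typedef struct {
    char name[EM_NAME_LEN];
    em_handler fn;  /* NULL marks a free slot */
    void *ctx;
} em_slot;

typedef struct {
    em_slot s[EM_MAX];
} em_table;

void em_init(em_table *t) {
    assert(t != NULL);
    for (int i = 0; i < EM_MAX; i++) {
        t->s[i].name[0] = '\0';
        t->s[i].fn = NULL;
        t->s[i].ctx = NULL;
    }
}

/* Bind a handler to a named event. Returns a slot id, or -1 on failure. */
int em_bind(em_table *t, const char *name, em_handler fn, void *ctx) {
    assert(t != NULL && name != NULL && fn != NULL);
    if (strlen(name) >= EM_NAME_LEN)
        return -1;
    for (int i = 0; i < EM_MAX; i++) {
        if (t->s[i].fn == NULL) {
            strcpy(t->s[i].name, name);
            t->s[i].fn = fn;
            t->s[i].ctx = ctx;
            return i;
        }
    }
    return -1;  /* table full */
}

int em_unbind(em_table *t, int id) {
    if (id < 0 || id >= EM_MAX || t->s[id].fn == NULL)
        return -1;
    t->s[id].fn = NULL;
    return 0;
}

/* Raise an event: call every handler bound to that name.
 * Returns the number of handlers that were notified. */
int em_raise(em_table *t, const char *name, const char *msg) {
    int delivered = 0;
    for (int i = 0; i < EM_MAX; i++) {
        if (t->s[i].fn != NULL && strcmp(t->s[i].name, name) == 0) {
            t->s[i].fn(name, msg, t->s[i].ctx);
            delivered++;
        }
    }
    return delivered;
}

/* Example handler: counts notifications in the int that ctx points to. */
void em_count_handler(const char *event, const char *msg, void *ctx) {
    (void)event;
    (void)msg;
    ++*(int *)ctx;
}
```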
In a large cluster, system software, tools, and applications must be informed of changes in the cluster topology. The KSIX subsystem responsible for this task is called ensemble management. KSIX automatically deletes a malfunctioning node from the ensemble once the failure has been detected, and automatically adds a new node to the ensemble after it boots. The APIs for this class of service are listed in Table 7.5.
API | Description |
int cpi_addhost(char *hostname) | Add host to KSIX system |
int cpi_delhost(char *hostname) | Delete host from KSIX system |
int cpi_gethostbyrank(int rank, char *result) | Convert rank to hostname |
int *cpi_getrankbyhost(char *hostname) | Convert hostname to rank |
int cpi_getallhost(KxHostInfo *hostinfo) | Get array of hostnames sorted by rank |
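The rank and hostname mapping behind these calls can be pictured as an ordered host list. The sketch below is an assumption-laden illustration, not KSIX code: the `ens_*` names are hypothetical, and it assumes ranks stay contiguous, so deleting a host shifts the ranks of the hosts after it. The paper does not say how KSIX renumbers ranks.

```c
#include <assert.h>
#include <string.h>

#define ENS_MAX 128      /* illustrative capacity */
#define ENS_HOST_LEN 64  /* illustrative hostname limit, including the NUL */

/* Hypothetical ensemble: hostnames stored in rank order. */
typedef struct {
    char host[ENS_MAX][ENS_HOST_LEN];
    int n;
} ensemble;

void ens_init(ensemble *e) {
    assert(e != NULL);
    e->n = 0;
}

/* Hostname -> rank, or -1 if the host is not in the ensemble. */
int ens_rank_of(const ensemble *e, const char *hostname) {
    for (int i = 0; i < e->n; i++)
        if (strcmp(e->host[i], hostname) == 0)
            return i;
    return -1;
}

/* Rank -> hostname, or NULL if the rank is out of range. */
const char *ens_host_of(const ensemble *e, int rank) {
    if (rank < 0 || rank >= e->n)
        return NULL;
    return e->host[rank];
}

/* Append a host; returns its rank, or -1 if full, too long, or duplicate. */
int ens_add(ensemble *e, const char *hostname) {
    if (e->n == ENS_MAX || strlen(hostname) >= ENS_HOST_LEN ||
        ens_rank_of(e, hostname) >= 0)
        return -1;
    strcpy(e->host[e->n], hostname);
    return e->n++;
}

/* Remove a host and close the gap so that ranks stay contiguous. */
int ens_del(ensemble *e, const char *hostname) {
    int r = ens_rank_of(e, hostname);
    if (r < 0)
        return -1;
    for (int i = r; i < e->n - 1; i++)
        strcpy(e->host[i], e->host[i + 1]);
    e->n--;
    return 0;
}
```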
The KSIX architecture consists of two layers of software: the KSIX Kernel Layer and the KSIX Services Layer (see Figure 7.2). The KSIX Kernel Layer provides basic services such as process control and ensemble management. This layer consists of a KSIX daemon named KXD running on each node, together with a system master controller called the KSIX partition manager (KXPM). KXPM keeps track of the global process ID and status of every process in KSIX.
Currently, KXPM and the KXDs are organized as a tree, as also shown in Figure 7.2. This makes it easy to support a global heartbeat and collective communication mechanisms. The KSIX Services Layer sits on top of the KSIX kernel and provides high-level KSIX services to KSIX-based applications. Each service consists of a service server and a service API library that links clients of the service to the service server. The currently available services are the Event service and the Directory Service (DS). Since KSIX operates in a pure message-passing manner, KSIX services can be used by applications on any node in the cluster.
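To see why a tree helps with a global heartbeat, consider a bottom-up reduction: each daemon adds its own liveness flag to what its children report and forwards one number to its parent, so the root learns the total in a number of steps proportional to the tree height instead of contacting every node itself. The sketch below assumes a complete k-ary tree stored in array order with the root (KXPM) at index 0. The paper does not give the fan-out or layout, and the `tree_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Parent of node i (i > 0) in a complete k-ary tree stored as an array. */
int tree_parent(int i, int k) {
    assert(i > 0 && k > 0);
    return (i - 1) / k;
}

/* Number of hops from node i up to the root at index 0. */
int tree_depth(int i, int k) {
    assert(i >= 0 && k > 0);
    int d = 0;
    while (i > 0) {
        i = (i - 1) / k;
        d++;
    }
    return d;
}

/* Bottom-up heartbeat reduction over n nodes. alive[i] is nonzero if node i
 * answered. subtotal[i] receives the number of live nodes in the subtree
 * rooted at i. Returns the total seen at the root. */
int tree_collect_alive(const int *alive, int n, int k, int *subtotal) {
    assert(alive != NULL && subtotal != NULL && n > 0 && k > 0);
    for (int i = 0; i < n; i++)
        subtotal[i] = alive[i] ? 1 : 0;
    /* Children have larger indices than their parents, so walking the
     * array backwards finishes every subtree before its parent uses it. */
    for (int i = n - 1; i > 0; i--)
        subtotal[tree_parent(i, k)] += subtotal[i];
    return subtotal[0];
}
```

The sketch only counts flags. It does not model the fact that a failed interior node would also cut off the reports of its subtree, which a real heartbeat protocol has to handle.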
One effective use of KSIX is fast process creation in support of scalable UNIX commands. Currently, our SCMS cluster management tool ships with a set of shell-based parallel UNIX commands. KSIX can be used instead of rsh to improve performance, especially for short commands. We conducted the following experiments.
The experimental system is our Beowulf Cluster System called PIRUN. The system configuration is as listed below:
The experiments were conducted using a program that starts a set of remote tasks on multiple hosts. This program is referred to as the master task. The master task starts a set of slave tasks on multiple hosts and waits for reply messages from all of them. Once started, each slave task sends a UDP message back to the master task. After receiving the UDP messages from all slave tasks, the master task terminates. We measured the time from the start of the slave tasks to the point at which the master task had received all replies. We compared two versions of the experiment: one using the rsh mechanism to start the remote tasks and one using the KSIX process service. The experiment was repeated five times with 4, 8, 16, 32, and 64 processes on 4, 8, 16, and 32 nodes. The results shown are the averages over the runs at each data point. The purpose of this test is to simulate the operation of a scalable UNIX command and to measure how fast such a command can run with KSIX support. The results of the test are shown in Figure 7.3.
The results can be summarized as follows.
The efficiency provided by KSIX is very important for scalable UNIX commands because it allows users to run these commands frequently without degrading system performance.
All of the functions and APIs mentioned in this paper have already been implemented. KSIX version 1.1 is available for download at http://smile.cpe.ku.ac.th/software. KSIX has been tested on the 76-node Pentium III Beowulf cluster that we currently have.
With all these features, we hope to build a very effective open source environment for Beowulf cluster systems that allows users to fully exploit the power of a Beowulf cluster in their work.