An Analysis of Aircraft Assignment in CTAS
6.894, Fall 1998
Team A: Daniel Jackson, Jay Corbett and John Chapin
October 6, 1998
DRAFT VERSION
Overview
This document describes the results of an analysis of the CTAS code to
determine how aircraft are assigned to Route Analyzers (RAs). We chose
to investigate this aspect because much of the code is devoted to it, and
because we have been told by NASA that it has been a source of difficulty,
in both performance and code structuring, and is therefore a good candidate
for redesign.
We confined our analysis to the following aspects of the Communications
Manager (CM):
-
how aircraft records enter the CM;
-
how aircraft records leave the CM (ie, are deleted from its internal data
structures);
-
how aircraft processing is assigned to RAs.
Our concern has been the primary modes of operation, so we have provided
little or no detail regarding:
-
fake and trial plans;
-
entry of flight plans from the user interface.
We considered only aircraft proper, and not blocked slots (which are essentially
reservations made in anticipation of the arrival of aircraft).
Our aim was to produce two kinds of documentation. First, we wanted
to understand how the existing code is organized; we documented this with
traditional call graph representations. Second, we wanted to represent
more fundamental properties of the problem domain and the system, at a
more abstract level, that might serve as a specification for our subsequent
design work. We documented these properties with object models and entity
life histories.
Approach
We analyzed the code with a variety of tools:
-
Lexical. Much of the code reading was done with the help of editor
search facilities and grep. This was particularly useful for semantic properties
that are highly data-dependent. For example, determining where a given
message is sent is easily done by searching for the constant that identifies
the message type.
-
Syntactic. We used Team B's call graph generator and function concordance
to find callers for functions and show the dependency relationships between
files. This was often a useful first step. Its main deficiency is that
is often hard to get a graph that is small enough but still shows functions.
Lackwit addresses this to some extent, by allowing a query to confine the
call graph to functions that handle a particular structure.
-
Semantic. We used Lackwit, a static analyzer based on type inference
to find code that read or wrote values that flowed into and out of relevant
fields of the aircraft record. For example, we determined which functions
were responsible for updating ac->active, ac->ra_index, and ac->landed
by querying these fields of the ac structure passed to the remove_flight_plan
function. Unlike lexical methods, Lackwit finds functions that read or
write values that flow into locations of interest (thus directing us, for
example, to get_next_available_ra, which produces a value for ac->ra_index
but does not set it), and does not depend on mention of a particular identifier
or type (especially uninformative here, since all fields of interest have
primitive types). We also used Lackwit to generate a call graph showing
functions that passed around the ac data structure. These include essentially
all of the functions mentioned below. Lackwit's main deficiency is that
it does not always find code connected through type casts: the function
load_output_buffer, for example, was not found to be relevant to the
ac data structure passed into remove_flight_plan.
Informal Description
Basic Flows
Aircraft data enters the CM primarily by messages from the Input Source
Manager (ISM). Trial plans may be entered at the user interface, but these
are not our concern. Two categories of data about aircraft are received:
track data and flight plan information. Track data gives basic details
of physical state (position, velocity, etc). Flight plan information is
given in three kinds of message:
-
flight plan add: a flight plan is obtained for an aircraft for the first
time. Such a message may be received long in advance of an aircraft's arrival.
-
flight plan amend: an amendment to an existing flight plan. This may include
the fact that an aircraft has been diverted from a center airport to an
airport outside the center.
-
flight plan deletion: indicates that the aircraft is known to have landed,
or perhaps that the flight never occurred as planned.
The information from these messages is stored in the CM's aircraft database
(the "aircraft tree"). The information associated with some messages is
not saved. A flight plan addition for an aircraft that is deemed to be
too far from the center, for example, is not recorded. An amendment for
an aircraft that is not known by the system is therefore not an extraordinary
occurrence. Relevant functions: ism_socket:ism_socket_read, update_aircraft:add_flight_plan,
amend_flight_plan, remove_flight_plan, etc.
If the CM is in 'playback mode', received flight information from a
prerecorded file, the aircraft tree is updated instead from the file. Relevant
functions: update_aircraft:update_tree_with_active_aircraft.
Flight plan amendments and deletions are forwarded immediately to RAs.
Relevant
functions: ism_socket:ism_handle_fp_amd, cm_send:forward_flight_plan_amend,
send_category_amendment, ism_socket:ism_handle_fp_del, update_aircraft:remove_flight_plan,
cm_send:forward_flight_plan_delete.
The aircraft tree is then walked to find new additions; these are forwarded
to RAs in batches whose size is determined by the constant UAC_CM2RA_FPS_COUNT_MAX.
This strategy is designed to avoid overload that would otherwise occur
when a large of new flight plans are received in a short period of time:
for example, when opening the connection to the ISM for the first time,
or when switching from a mode in which departures are not tracked to one
in which they are. Relevant functions: update_aircraft:find_and_send_fps,
cm_send:forward_flight_plan_add.
The tree is then walked to find aircraft that are active (ie, for which
tracks have been received) but which have not been assigned to RAs. Assignment
is done to spread load evenly: each such aircraft is assigned to the RA
with the fewest aircraft already assigned to it. A message is sent to the
chosen RA giving it the identifier of the aircraft, and the tree record
is updated accordingly. It should be noted that information for an aircraft
is broadcast to all RAs, and all RAs essentially hold a copy of the aircraft
tree; this assignment determines 'ownership' of an aircraft by an RA. Relevant
functions: distribute_ac:distribute_aircraft_to_ras, cm_send:handoff_aircraft_to_ra.
Track information is passed to the RAs, in contrast, for the entire
aircraft tree. Relevant functions: update_aircraft:load_output_buffer,
cm_send:send_aircraft.
The tree is walked to find aircraft that have either landed or become
inactive. Landed aircraft are deleted from the tree. Inactive aircraft
are marked as such. Whether an aircraft is inactive appears to be determined
by how recently track data was received (but we have not looked into this
properly). Relevant functions: update_aircraft:weed_out_old_aircraft,
mark_deletable, delete_aircraft, set_inactive.
Periodically, a check is performed to determine whether aircraft are
uniformly distributed amongst RAs. Unevenness arises because RA processes
may fail, and have their load passed on to other RAs; new RAs may be added
at any time. If the spread between the most and least loaded RAs exceeds
some constant, the aircraft are redistributed in round-robin fashion amongst
RAs. Relevant functions: distribute_ac:check_and_redistribute_aircraft,
cm_send:handoff_aircraft_to_ra.
A periodic check for landed aircraft is also made. An aircraft that
has been within the landing zone for some airport-specific amount of time
is deemed to have landed; a message is broadcast to all RAs, and the aircraft
tree record is updated accordingly. Relevant functions: update_aircraft:check_for_landed_aircraft,
cm_send:send_aircraft_landed_to_all.
We have not yet analyzed the handling of RA sockets. Roughly, information
about the state of RA's is maintained in a global array. In each execution
of the main loop, the status of all socket connections is examined; if
a socket appears to be malfunctioning, the connection is closed and the
state is recorded in the array. When a new RA process is added, it replaces
an array entry for a closed socket. There appears to be no code that marks
the connection to an RA as temporarily degraded. Sockets seem to be associated
with sectors: why? (see cm_struct:Socket_st). Relevant functions:
cm_sock_util:check_sockets, check_socket, read_ra_socket, close_socket,
connect_to_ra.
Distribution of Aircraft to RAs
Each active aircraft is assigned to a route analyzer (RA). This assignment
is stored in the ra_index field of the aircraft record in the
communications manager (CM) and set as follows:
-
When an aircraft record is initialized in response to a call to add_flight_plan,
the ra_index is initialized to NOT_SET (i.e., no assignment
is made at this time).
-
On every iteration of the main loop, update_aircraft calls distribute_aircraft_to_ras,
which assigns any active unassigned aircraft to RAs. The RA with the fewest
number of assigned aircraft is chosen to receive the next unassigned aircraft.
The CM keeps a count of how many aircraft are assigned to each RA for this
purpose.
-
On every iteration of the main loop, update_aircraft calls check_and_redistribute_aircraft_to_ras,
which redistributes the aircraft among the RAs if the distribution has
become too unbalanced. In particular, if the difference between the maximum
number of aircraft assigned to any RA and the minimum number of aircraft
assigned to any RA is greater than some threshold, then a redistribution
occurs. A redistribution assigns all aircraft to RAs in a round-robin fashion
(the function get_index_of_next_available_ra keeps a static counter
that cycles through the RAs).
-
If an aircraft is added to the system from the PGUI via function read_pgui,
then it is immediately assigned to the next available RA in round-robin
order (using get_index_of_next_available_ra).
-
If an RA shuts down, the code that handles the closing of the socket connection
will set the ra_index fields of all aircraft owned by that RA
to NOT_SET (the distribute_aircraft_to_ras function will
reassign these aircraft).
-
If an aircraft is marked as inactive by set_inactive (which occurs,
for example, when the flight plan is removed), the aircraft is unassigned
-- its ra_index field is set to NOT_SET.
Whenever the assignment of an aircraft to an RA is modified, messages are
sent to the RAs informing them of this change. If the assignment is changed
from one RA to another, both of these RAs are notified. If the assignment
is changed from unassigned to some RA, then all RAs are notified.
Call Graph
We have constructed a call graph that shows the calling relationships amongst
the principal functions responsible for the behaviour described above.
Postscript, GIF.
Analysis
In this section, we summarize our understanding of this aspect of the CM's
behaviour in a pair of models: an object model and an entity life history
model.
The entity life history model designates some significant events in
the life of aircraft, and specifies constraints on their ordering.
The object model describes the state of an aircraft (at the grossest
level) and the relationships between aircraft and RAs, and how these change
when events occur.
The models are given here in textual form. Diagrammatic versions will
be added later.
Entity Life History Model
The following events in the life of an aircraft are significant as far
as allocation of aircraft to RA's is concerned:
-
fp_add: A flight plan is received for the aircraft for the first time.
-
fp_amend: The aircraft's flight plan is amended. This may involve a category
amendment, in which an aircraft is reclassified (eg, from arrival to departure).
-
fp_delete: The flight plan is deleted; this is a signal that the aircraft
is no longer of interest to CTAS. A flight plan deletion may be caused
by the aircraft landing, or by it leaving the center (?), or may reflect
the fact that the aircraft had not actually taken off and the flight was
cancelled (?).
-
become_active: A radar track is received for the first time, or is received
after a period during which tracks have not been received.
-
become_inactive: Radar tracks have not been recently received, and the
aircraft is deemed not to merit analysis by CTAS.
-
enter_landing_zone: Radar tracks indicate that the aircraft has entered
a landing zone at a center airport.
-
presume_land: The aircraft has been within the landing zone of an airport
for an airport-specific period of time, on account of which it has been
deemed to have landed.
-
assign: The aircraft's processing is assigned to an RA.
-
unassign: The assignment of the aircraft's processing to an RA is cancelled.
The following entity life histories each constitute a role of the aircraft
entity. We have not modelled the RA entity; if we did, we would show, for
example, that an unassign event always follows an assign event to the same
RA.
fp_add (fp_amend | become_active | become_inactive | enter_landing_zone
| presume_land)* [fp_delete]
// most events must follow fp_add and cannot follow fp_delete
[(become_active become_inactive)* [become_active]]
// might never become active, or might become active once, or might
have periods of activeness
[become_active assign (unassign assign)* [unassign]]
// cannot assign until active
((unassign assign) | fp_amend | become_inactive)*
// no fp_amend or become_inactive between unassign and assign
[enter_landing_zone presume_landed]
// only presume_landed after enter_landing_zone
[become_active enter_landing_zone]
// only enter_landing_zone if once active
Questions to resolve:
Can fp_amend occur after enter_landing_zone?
Can become_inactive occur after enter_landing_zone but before presume_landed?
Object Model
There are two domains:
-
AIRCRAFT: An aircraft is a vehicle tracked by CTAS. An aircraft might not
be physically within the center, since aircraft information is entered
when flight plan information is received, perhaps long in advance of the
aircraft's arrival.
-
RA: A route analyzer is a process that executes at runtime whose main purpose
is to computer (estimated and scheduled) arrival times. We shall regard
RA's as identified by their process ids, so an RA cannot be killed and
restarted: the second process is a distinct RA.
AIRCRAFT are divided amongst the following subsets:
-
ACTIVE_AC: aircraft for which recent radar data exists and whose trajectories
are being computed by RAs.
-
INACTIVE_AC: defined to be all aircraft that are not active.
-
ENTERED_LANDING_ZONE_AC: aircraft whose radar tracks indicate that they
have entered a landing zone at an airport within the center.
-
LANDED_AC: aircraft that have been deemed to have landed on account of
having been within the landing zone for more than some airport-specific
period.
-
ASSIGNED_AC: aircraft whose processing has been assigned to RA's.
The formal object model (below) makes the following assertions:
-
landed aircraft have entered the landing zone;
-
each assigned aircraft is assigned to exactly one RA;
-
landed aircraft are inactive.
We also specify, for each event, the effect it has on the abstract state.
The specifications are unremarkable, except insofar as they specify event
ordering constraints in their (implicit) preconditions. For example, the
specification of presume_landed asserts that the aircraft must already
have entered the landing zone.
model AircraftAssignment {
// Uses Alloy textual notation
// Diagram is equivalent to domain and state parts
// Invariants were given as additional annotations to diagrams in lecture
// Operations have not been discussed yet
domain {AIRCRAFT, RA}
state {
partition ACTIVE_AC, INACTIVE_AC : AIRCRAFT
ENTERED_LANDING_ZONE_AC : AIRCRAFT
LANDED_AC : ENTERED_LANDING_ZONE_AC
ASSIGNED_AC : AIRCRAFT
assigned_to : ASSIGNED_AC -> RA !
}
def INACTIVE_AC {
INACTIVE_AC = AIRCRAFT - ACTIVE_AC
}
inv {
LANDED_AC in INACTIVE_AC ???
}
op fp_add (ac: AIRCRAFT' !) {
INACTIVE_AC := INACTIVE_AC + ac
}
op fp_delete (ac: AIRCRAFT !) {
AIRCRAFT := AIRCRAFT - ac
}
op fp_amend (ac: AIRCRAFT !) {
}
op assign (ac: AIRCRAFT !) {
ac in ACTIVE_AC - ASSIGNED_AC
ASSIGNED_AC := ASSIGNED_AC + ac
}
op unassign (ac: AIRCRAFT !) {
ac in ASSIGNED_AC
ASSIGNED_AC := ASSIGNED_AC - ac
}
op enter_landing_zone (ac: AIRCRAFT !) {
ac not in ASSIGNED_AC
ASSIGNED_AC := ASSIGNED_AC + ac
}
op presume_landed (ac: ENTERED_LANDING_ZONE
!) {
LANDED := LANDED + ac
}
op become_active (ac: INACTIVE_AC !) {
ACTIVE_AC := ACTIVE_AC + ac
}
op become_inactive (ac: ACTIVE_AC !) {
INACTIVE_AC := INACTIVE_AC + ac
}
}
Questions: can become_active occur for an active aircraft?
Distinguish abstract operation from execution of function?
Abstraction Functions
The abstract state given by the object model is related to the values of
aircraft record fields as follows:
Abstract state |
Concrete state |
aircraft ac is active, inactive otherwise |
ac->active is TRUE |
aircraft ac has landed |
ac->landed is TRUE |
aircraft ac is assigned |
ac->ra_index is not NOT_SET |
aircraft ac is assigned to RA ra |
Ra [ac->ra_index] = ra |
A rough assignment of operations to functions in the code can be made.
The difficulty is that we would like to associate operations with leaves
of the call graph, but in most cases, the leaf function performs only one
concrete aspect of the operation (eg, setting the aircraft record field
or sending a message). It is therefore necessary to find a function higher
in the call graph whose execution maintains all code invariants (see below).
There are no doubt invariants we have missed; the functions listed here
are those that appear to maintain the invariants discussed below.
Function |
Operations performed |
update_aircraft:add_flight_plan* |
fp_add |
ism_socket:ism_handle_fp_amd |
fp_amend |
ism_socket:ism_handle_fp_del |
fp_delete |
update_aircraft:set_inactive |
become_inactive, unassign |
update_aircraft:update_inactive_ac_specific_data |
become_active, unassign (for which RA, ac?) |
update_aircraft:update_active_ac_specific_data |
become_inactive ? |
cm_send:handoff_aircraft_to_ra |
assign, unassign |
cm_send:new_category_set_ra_owner |
assign ? |
update_aircraft:check_entry_to_landed_zone |
enter_landing_zone |
update_aircraft:check_for_landed_aircraft |
presume_landed |
*Broadcast of flight plan addition and updating of aircraft tree and
performed in distinct phases.
Code Invariants
A number of invariants are maintained by the code. Most significantly,
the state of the aircraft tree is synchronized with the state of the RAs:
for example, an RA should believe that it has ownership of an aircraft
when that aircraft's entry in the tree indicates ownership by that RA.
We assumed that the messages received by the RA update its internal state
appropriately. The relevant messages are:
-
UNSET_RA_OWNERSHIP: ownership of a given aircraft by a given RA is cancelled.
-
SET_RA_OWNERSHIP: an RA is assigned ownership of a given aircraft.
-
AIRCRAFT_INACTIVE: an RA is informed that an aircraft is inactive.
It seems that the updating of the aircraft tree and the sending of messages
is related as expected, with the following exceptions:
-
The function cm_send:new_category_set_ra_owner sends a SET_RA_OWNERSHIP
message to the RA given by ac->ra_index, but does not mutate the aircraft
record. This seems to assume that the RA specified in the aircraft record
does not in fact believe that it owns the aircraft. The case arises when
a flight plan amendment has been received that switches the category of
an aircraft from one that is not of interest to one that is. This issue
should be investigated. We have not yet found code that prevents flight
plans in categories not of interest from being distributed to RAs. (There
are other anomalies in this function. Its header asserts that it should
be called only from read_ra and read_pgui. It may also set ra_index to
zero, which seems to have no rationale: this is not the NOT_SET value,
but the first index in the array.)
-
The function update_aircraft:check_entry_to_landed_zone broadcasts
AIRCRAFT_INACTIVE but does not set ac->inactive.
-
The function update_aircraft:update_active_ac_specific_data broadcasts
an AIRCRAFT_INACTIVE message but does not update the aircraft tree record
(by setting ac->active to false). This seems to arise in the case in which
an aircraft has passed its freeze horizon.
-
Explicit initializations of the aircraft tree record fields appear to be
missing. The function update_aircraft:initialize_aircraft sets the
ac->ra_index field. The fields ac->active and ac->landed do not appear
to be explicitly initialized anywhere.
The balancing of aircraft load amongst RAs is aided by a global array distribute_ac:number_of_ac_in_each_ra.
The expected invariant that this does indeed track assignments correctly
appears to be maintained, although the function
update_aircraft:load_and_send_one_aircraft sends ownership messages
but does not update the array. This is perhaps because the function is
only called for trial flight plans entered at the user interface, and maybe
there are few enough of these for them to be safely ignored.
Anomalies
Functionality is not systematically allocated amongst the files ism_socket,
update_aircraft and cm_send. For example, cm_send:forward_flight_plan_amend
is called directly by ism_socket:ism_handle_fp_amend, but cm_send:forward_flight_plan_delete
is called by update_aircraft:remove_flight_plan, which in turn is
called by ism_socket:ism_handle_fp_del. Flight plan addition is
by necessity treated quite differently, since broadcast and tree updating
are in separate phases.
The code associated with category amendments appears to be a late addition.
The function cm_send:send_category_amendment is documented incorrectly:
its header claims that it is only called by read_ra and read_pgui,
but it is in fact called by ism_socket:ism_handle_fp_amd. Moreover,
it sets ac->ra_index to zero, thus assigning the aircraft to the first
RA, without apparent justification.
The array number_of_ac_in_each_ra is not updated by update_aircraft:
load_and_send_one_aircraft, violating the invariant that the array
gives the number of aircraft assigned to RAs. Is this because the aircraft
represents a trial plan with only a transient assignment?
The function cm_send:handoff_aircraft_to_ra appears to unnecessarily
broadcast an UNSET_RA_OWNERSHIP message to RAs when the ra_index of the
aircraft is not set. This presumes that an RA might believe it has ownership
of an aircraft even though the aircraft tree record does not indicate it.
Why? Can the CM die and be restarted while RAs remain running? If this
were so, we would have expected more fault tolerance code of this sort
throughout the CM.
The function update_aircraft:update_active_ac_specific_data broadcasts
an AIRCRAFT_INACTIVE message but does not update the aircraft tree record
(by setting ac->active to false). This seems to arise in the case in which
an aircraft has passed its freeze horizon.
Followup Issues
Look into RA socket handling: include dead RAs in object model?
Categories of flight plans: code for changing category of desired flights.
What exactly does active mean? Look at RA code?
Look into unbuffered/buffered ism track input.
Lackwit search on ac->landed to check operation table.
Freeze horizon and active aircraft? See update_aircraft:update_inactive_ac_specific_data.
Add a freeze event to model?
Aircraft ignored if outside center: subsequent handling.
Flight plan amendments after entering landing zone?