6.863 Project: Policy Parser

Results

We have fully implemented the Policy Parser, which parses constrained natural language sentences into AIR policies, and deployed it as a Firefox Sidebar extension. The following are example sentences for which our system successfully generates AIR policies:

MIT may use prox card data for criminal investigations
MIT students can not transfer prox cards to another student
A person may access data on the private mit domain if the person has permission for that data
A service provider may not use phone records to deny service to a customer
TSA can transfer data to the FBI if the data is associated with terrorism
Anyone can access data in the possession of the state of Massachusetts
Police may search people's homes if the police have people's permission

This is only a small subset of the sentences that our system is able to handle. Because our underlying CFG uses spelling change automata, it can parse all sentence variations that these automata recognize, for example adding "s" to pluralize a noun, or recognizing the 1st-person and 3rd-person forms of a verb with the same root. In addition, the system (in particular the grammar and the lexicon) has been designed so that reasonable variations of a sentence give the same values for the features of interest. For example, the following two sentences result in the generation of the same AIR policies in RDF:

MIT can give records to FBI to investigate crimes.
MIT can give FBI records for criminal investigations.

Our results indicate that the system accurately extracts semantic information from natural language policies and is fairly flexible in handling different forms of an input sentence. We were able to parse a variety of natural language policies into AIR policies and use them with the reasoner to check compliance of data logs. While we only consider a constrained set of sentences and do not handle many interesting phenomena (e.g. wh-movement, relating pronouns to the appropriate nouns in the sentence, quantifiers, etc.), we believe that the system can be extended to address these shortcomings. We did not attempt these more ambitious tasks due to the limited time frame of the project, and our system is only a first step towards the eventual goal of automatically parsing a wide variety of natural language policies.

Three components of the system are central to the performance (generality, accuracy, etc.) of the Parser. The first is the FCFG (available here), which determines the structure of the sentences that the Parser is able to parse. As discussed on the Implementation page, the current FCFG recognizes sentences of the form:

subject [mod] action object [secondary_object] [for/to purpose] [if condition (and condition)*]

The FCFG also determines the structure of each of these components and how they are assigned feature values. The FCFG can be extended to parse additional policy templates.
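To make the template concrete, the following is a minimal sketch of how a sentence lines up with its coarse slots. It is not the project's FCFG: it uses a plain regular expression, the function name split_policy and the modal list MODALS are hypothetical, and it assumes the optional modifier is present. It deliberately leaves the object, secondary object and purpose together in a single "rest" slot, since separating them (for instance, deciding whether "to the FBI" is a recipient or a purpose) is exactly what the grammar is needed for.

```python
import re

# Hypothetical modal list for illustration; the real lexicon is not reproduced here.
# Longer alternatives come first so "can not" is preferred over "can".
MODALS = ("may not", "can not", "cannot", "may", "can")

def split_policy(sentence):
    """Split a policy sentence into coarse template slots, or return None."""
    text = sentence.strip().rstrip(".").lower()
    # Everything after the first " if " is the condition part,
    # with individual conditions joined by " and ".
    main, _, cond_text = text.partition(" if ")
    conditions = [c.strip() for c in cond_text.split(" and ")] if cond_text else []
    modal_pattern = "|".join(re.escape(m) for m in MODALS)
    match = re.match(
        rf"(?P<subject>.+?) (?P<mod>{modal_pattern}) (?P<action>\w+) (?P<rest>.+)$",
        main,
    )
    if match is None:
        return None
    slots = match.groupdict()
    slots["conditions"] = conditions
    return slots

slots = split_policy(
    "TSA can transfer data to the FBI if the data is associated with terrorism"
)
print(slots)
```

For the sentence shown, the subject slot is "tsa", the modifier is "can", the action is "transfer", and the single condition is "the data is associated with terrorism".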

The second vital component is the lexicon together with the spelling change rules, which provide the terminals used in the FCFG. As mentioned before, this allows one to parse a wider variety of sentences by simply modifying the lexicon and/or the spelling change rules while using the same grammar. (Note: the lexicon does not, at the moment, recognize any words containing capital letters; all recognized words use lower-case letters only.)
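The interaction between the lexicon and the spelling change rules can be illustrated with a small sketch. This is an assumption-laden simplification: the project uses spelling change automata, whereas the code below substitutes simple suffix stripping, and the LEXICON contents and the lookup function are hypothetical.

```python
# Hypothetical toy lexicon of lower-case root forms and their categories.
LEXICON = {
    "student": "N",
    "card": "N",
    "record": "N",
    "use": "V",
    "transfer": "V",
    "access": "V",
}

def lookup(word):
    """Return (root, category) for a word, or None if it is not recognized."""
    if word in LEXICON:
        return word, LEXICON[word]
    # Simplified spelling change rules: try stripping "es", then "s".
    for suffix in ("es", "s"):
        if word.endswith(suffix):
            root = word[: -len(suffix)]
            if root in LEXICON:
                return root, LEXICON[root]
    return None

print(lookup("cards"))     # plural noun resolved to its root
print(lookup("accesses"))  # 3rd-person verb resolved to its root
print(lookup("Student"))   # capitalized input is not recognized
```

Adding a new root to LEXICON immediately covers its inflected forms without touching the grammar. As with the lexicon described above, the sketch only matches lower-case words, so input would have to be lower-cased before lookup.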

The third component that determines the capacity of the Policy Parser is the set of domain ontologies (combined into a single file), which act as vocabularies during RDF generation. Even if the system is able to parse (NLTK parser) and interpret (Policy Interpreter) an input policy, it cannot generate the corresponding RDF if the constituents of the policy cannot be mapped to corresponding components of an ontology. However, once all ontologies relevant to a domain (e.g. MIT prox card data access) have been defined, anyone can very easily write new rules or modify existing ones, and it is this fact that makes such a system valuable.
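The dependence on the ontologies can be sketched as a simple vocabulary lookup. Everything in this example is hypothetical: the ex: prefix, the term names, the ONTOLOGY dictionary and the map_to_ontology function are made up for illustration and do not reflect the project's ontology file or its RDF generation code.

```python
# Hypothetical vocabulary mapping interpreted phrases to ontology terms.
ONTOLOGY = {
    "mit": "ex:MIT",
    "use": "ex:Use",
    "prox card data": "ex:ProxCardData",
    "criminal investigations": "ex:CriminalInvestigation",
}

class UnmappedConstituent(Exception):
    """Raised when a policy constituent has no ontology term."""

def map_to_ontology(slots):
    """Map every interpreted constituent to an ontology term, or fail."""
    mapped = {}
    for role, phrase in slots.items():
        term = ONTOLOGY.get(phrase)
        if term is None:
            # A sentence can parse and be interpreted, yet still fail here.
            raise UnmappedConstituent(f"no ontology term for {role}: {phrase!r}")
        mapped[role] = term
    return mapped

# Constituents of "MIT may use prox card data for criminal investigations".
policy = {
    "subject": "mit",
    "action": "use",
    "object": "prox card data",
    "purpose": "criminal investigations",
}
print(map_to_ontology(policy))
```

In this sketch, a policy whose subject is "fbi" would raise UnmappedConstituent until a term for it is added to the vocabulary, which mirrors why defining the domain ontologies up front is what makes writing new rules easy.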

Together, these three components define the limits of what our system can do, and each can be extended to make the system more flexible. For example, the FCFG can be modified to handle a wider variety of sentence structures, the lexicon can be expanded, and new ontologies can be added to generate more AIR policies. These possible extensions are discussed more thoroughly in the Future Directions section.

Policy Parser

6.863 Spring 2008 Project
