Hampi

A Solver for String Constraints



Hampi is a solver for string constraints. Hampi solves constraints that contain regular expressions and (bounded-size) context-free grammars.

Example (simplified SQL injection)
Example (bounded CFG intersection)
Example (bounded CFG / regexp intersection)
Input language
Formal grammar
Standalone mode
Server mode
System requirements

Example (SQL injection)

The following is an example Hampi constraint. The example illustrates how Hampi can be used in creating SQL injection attack strings. The constraint expresses the following: "find a 12-character string v such that SELECT msg FROM messages WHERE topicid='v' is a valid SQL statement that contains a tautology in the OR condition".
//string variable of fixed size 12 characters
var v : 12;
//simple SQL context-free grammar
cfg SqlSmall := "SELECT " (Letter)+ " FROM " (Letter)+ " WHERE " Cond;
cfg Cond := Val "=" Val | Cond " OR " Cond;
cfg Val := (Letter)+ | "'" (LetterOrDigit)* "'" | (Digit)+;
cfg LetterOrDigit := Letter | Digit;
cfg Letter := ['a'-'z'] ;
cfg Digit := ['0'-'9'] ; //the SELECT query
val q := concat("SELECT msg FROM messages WHERE topicid='", v, "'");

//constraints
assert q in SqlSmall;        //q is valid SQL
assert q contains "OR '1'='1'"; //q contains tautology

NOTES:
  • The var keyword declares the string variable. In the example, variable v has size 12 characters. The variable is the unknown for which Hampi finds a value.
  • The cfg keywords introduce a context-free grammar for a fragment of the SQL grammar of SELECT statements.
  • The val keyword introduces temporary variables. In the example, q is declared as a concatenation of constant strings and the string variable v.
  • The assert keyword defines regular-language constraints. In the example, the first constraint is a regular-language constraint because the size of strings on the left-hand side is bounded and any bounded language is regular. The Hampi solver finds values of the string variable that satisfy all constraints.
  • The reg keyword introduces regular languages (see examples below).
  • Comments are C/Java style: /*block comment*/, //line comment

Example (bounded CFG intersection)

A Hampi input can declare more than one CFG, so it's easy to compute a string from a bounded intersection of two CFGs, e.g.,
var v : 1 .. 10;
cfg E1 := "()" | "(" E1 ")" | E1 E1 ;
cfg E2 := | "(" E2 ")" ; //first production is an epsilon-production
assert v in E1 ;
assert v in E2 ;


Example (bounded CFG / regexp intersection)

Here is how you can find a string from the intersection of a bounded context-free language and a regular language;
var v : 5 .. 10;
cfg E := "()" | "(" E ")" | E E ;
reg R := star("()");
assert v in E ;
assert v in R ;

Input language

A Hampi input (a Hampi constraint) consists of a variable declaration followed by a series of context-free grammar declarations, regular language declarations, temporary variables and constraints. See the formal grammar for the full description.

String variable

Each Hampi input must declare one string variable, of bounded size. The goal of Hampi is to find a string that, when assigned to the string variable, satisfies all the constraints. An example variable declaration (of size between 2 and 18) is:
var v : 2 .. 18;

Context-free grammar declarations

Context-free grammars are defined using a notation similar to Extended Backus-Naur Form. Terminal symbols are quoted. Each nonterminal must be defined. Declarations may contain repetitions (+ and *), and options (?). Character ranges declare sets of terminals, e.g., ['a'-'z'], and are syntactic shortcuts.

Regular-expression declarations

Regular languages can be defined by regular expressions, e.g.,
reg Bstar := star("b")                                    // b*
reg EvenA := star(concat(Bstar, "a", Bstar, "a", Bstar)); //(b*ab*ab*)*

Regular languages can also be defined by bounding context-free languages using the fix operator, e.g.,
cfg E := "()" | "(" E ")" | E E ; //all balanced parentheses
reg R = fix(E, 10); //10 balanced parentheses

Temporary-variable declarations

Temporary variables are shortcuts that help to express constraints on expressions that are concatenations of the string variable and constants, e.g.,
var v : 2;
val S := concat("(((", v, ")))"); //v with fixed prefix and suffix
Now, you can put constraints on S, e.g.,
assert S contains ")(";

Assertions

Assertions specify membership of variables in context-free and regular languages, e.g.,
assert v in R; //where R is a defined regular language 
assert v not contains "()()";
assert v in E; //where E is a defined context-free language

Running Hampi in standalone mode


Put the above constraint in a file a.hmp and run
./hampi a.hmp

Hampi outputs a result like this:
{VAR(v)=b' OR '1'='1}

which means that Hampi found the value of v that satisfies the constraints.


Running Hampi in server mode

Running Hampi in server mode improves solving time for small constraints.

Run the server, (4444 is an available port on localhost),
./hampi_server.sh 4444

Run a query,
./hampi_client.sh 4444 a.hmp

Shut down the server,
./hampi_client.sh 4444 SHUTDOWN

System Requirements

Hampi has been tested on Linux (32 and 64bit). Hampi is a Java program but uses STP, which works only on Linux. With some effort, it should be possible to make Hampi work on other platforms. Hampi requires Java 6.