Taint tracking

Note: These lecture notes were slightly modified from the ones posted on the 6.858 course website from 2014.

Android security policies

What problem does the paper try to solve?

What does Android malware actually do?

TaintDroid overview

TaintDroid tracks sensitive information as it propagates through the system.

Examples:

int lat = gps.getLatitude();
                // The lat variable is now
                // tainted!

The Dalvik VM is a register-based machine,
so taint assignment happens during the
execution of Dalvik opcodes [see Table 1].

   move_op dst src          // dst receives src's taint
   binary_op dst src0 src1  // dst receives union of src0
                            // and src1's taint
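The two rules above can be sketched as a toy register file. This is a minimal sketch, not TaintDroid's code: the class `TaintedRegisters`, its methods, and the tag constants are hypothetical, and it assumes a tag is a bit vector with one bit per taint source, so that "union" is a bitwise OR.

```java
// Hypothetical toy register file; names and tag encoding are assumptions.
final class TaintedRegisters {
    // Assumed encoding: one bit per taint source.
    static final int TAINT_NONE = 0;
    static final int TAINT_LOCATION = 1 << 0;
    static final int TAINT_IMEI = 1 << 1;

    final int[] values;
    final int[] tags;

    TaintedRegisters(int count) {
        values = new int[count];
        tags = new int[count];
    }

    // Taint source: store a value together with its tag.
    void load(int dst, int value, int tag) {
        values[dst] = value;
        tags[dst] = tag;
    }

    // move_op dst src: dst receives src's value and src's taint.
    void move(int dst, int src) {
        values[dst] = values[src];
        tags[dst] = tags[src];
    }

    // binary_op dst src0 src1 (addition as the example operation):
    // dst receives the union of both operands' taint.
    void add(int dst, int src0, int src1) {
        values[dst] = values[src0] + values[src1];
        tags[dst] = tags[src0] | tags[src1];
    }
}
```

Note that `move` overwrites the destination's old tag, so a register that once held sensitive data becomes clean again when it is reassigned from an untainted source.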

Interesting special case, arrays:

   char c = //. . . get c somehow.
   char uppercase[] = {'A', 'B', 'C', . . .};
   char upperC = uppercase[c];
                     // upperC's taint is the
                     // union of c and uppercase's
                     // taint.
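To see why the index's taint has to be included, here is a minimal sketch of the array rule. The types `TaintedChar` and `TaintedCharArray` are hypothetical, and the sketch assumes one tag per array and bit-vector tags. Without the union, a lookup table full of untainted constants would launder a tainted index into a clean result.

```java
// Hypothetical sketch of the array-lookup rule; names are assumptions.
final class TaintedArrayLookup {
    static final class TaintedChar {
        final char value;
        final int tag;

        TaintedChar(char value, int tag) {
            this.value = value;
            this.tag = tag;
        }
    }

    // Assumes a single tag covers the whole array.
    static final class TaintedCharArray {
        final char[] data;
        final int tag;

        TaintedCharArray(char[] data, int tag) {
            this.data = data;
            this.tag = tag;
        }
    }

    // The result's taint is the union of the array's taint and the index's taint.
    static TaintedChar get(TaintedCharArray array, int index, int indexTag) {
        return new TaintedChar(array.data[index], array.tag | indexTag);
    }
}
```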

How are taint flags represented in memory?

Example:

                 .
                 .
        |        .         |
        +------------------+
        |     local0       |
        +------------------+
        | local0 taint tag |
        +------------------+
        |     local1       |
        +------------------+
        | local1 taint tag |
        +------------------+
                 .
                 .
                 .

TaintDroid uses a similar approach for class fields, object fields, and arrays: it puts the taint tag next to the associated data.
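The interleaved layout in the diagram can be sketched with a plain array in which each local variable occupies two adjacent slots. This is only an illustration: the class `InterleavedFrame` and its methods are hypothetical, and a real VM frame would hold more than `int` locals.

```java
// Hypothetical frame layout: [local0, tag0, local1, tag1, ...].
final class InterleavedFrame {
    private final int[] slots;

    InterleavedFrame(int localCount) {
        slots = new int[2 * localCount];
    }

    // Store a local and its tag in adjacent slots.
    void set(int local, int value, int tag) {
        slots[2 * local] = value;
        slots[2 * local + 1] = tag;
    }

    int value(int local) {
        return slots[2 * local];
    }

    int tag(int local) {
        return slots[2 * local + 1];
    }

    // Copy of the raw layout, for inspection.
    int[] rawSlots() {
        return slots.clone();
    }
}
```

Keeping the tag adjacent to the value means that one index computation locates both, and the pair is likely to sit in the same cache line.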

So, given all of this, the basic idea in TaintDroid is simple: taint sensitive data as it flows through the system, and raise an alarm if that data tries to leave via the network!
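The sink side of this idea can be sketched as a check performed just before data leaves the device. The class `NetworkSinkMonitor`, the `TaintedString` type, and the alarm format are all hypothetical. The sketch records an alarm instead of blocking the send, and it does not actually transmit anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical network sink that raises an alarm on tainted payloads.
final class NetworkSinkMonitor {
    static final class TaintedString {
        final String value;
        final int tag;

        TaintedString(String value, int tag) {
            this.value = value;
            this.tag = tag;
        }
    }

    final List<String> alarms = new ArrayList<>();

    // Called where a real system would write to a socket.
    void send(String destination, TaintedString payload) {
        if (payload.tag != 0) {
            alarms.add("tainted data (tag 0x" + Integer.toHexString(payload.tag)
                    + ") sent to " + destination);
        }
        // A real sink would transmit payload.value here.
    }
}
```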

The authors find various ways that apps misbehave:

TaintDroid's rules for information flow might lead to counterintuitive/interesting results. Imagine that an application implements its own linked list class.

    class ListNode{
        Object data;
        ListNode next;
    }

Suppose that the application assigns tainted values to the "data" field. If we calculate the length of the list, is the length value tainted?

Adding to a linked list involves:

  1. Allocating a ListNode
  2. Assigning to the data field
  3. Patching up next pointers

Note that Step 3 doesn't involve tainted data! So, "next" pointers are not tainted, meaning that counting the number of elements in the list would not generate a tainted value for length.
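The argument can be checked with a small sketch that carries tags by hand. The `dataTag` and `nextTag` fields are hypothetical stand-ins for the per-field tags a taint-tracking VM would keep, and the `push` and `length` helpers are illustrative.

```java
// Hypothetical hand-tagged linked list; tags are bit vectors (an assumption).
final class TaintedListLength {
    static final int TAINT_IMEI = 1 << 1;

    static final class ListNode {
        Object data;
        int dataTag;
        ListNode next;
        int nextTag;
    }

    static ListNode push(ListNode head, Object data, int dataTag) {
        ListNode node = new ListNode();   // Step 1: allocate
        node.data = data;                 // Step 2: assign the data field
        node.dataTag = dataTag;           //         (the tag travels with it)
        node.next = head;                 // Step 3: patch up the next pointer
        node.nextTag = 0;                 //         from an untainted reference
        return node;
    }

    // Returns {length, lengthTag}.
    static int[] length(ListNode head) {
        int count = 0;
        int countTag = 0;
        for (ListNode node = head; node != null; node = node.next) {
            count = count + 1;            // operands: count and a constant
            countTag |= node.nextTag;     // conservatively fold in each pointer followed
        }
        return new int[] {count, countTag};
    }
}
```

The traversal never reads a `data` field, so no tainted tag is ever folded into the count, even though the length plainly depends on how many tainted items were inserted.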

What are the performance overheads of TaintDroid?

Questions and answers

Q: Why not track taint at the level of x86 instructions or ARM instructions?

A: It's too expensive, and there are too many false positives.

Q: Taint tracking seems expensive---can't we just examine inputs and outputs to look for values that are known to be sensitive?

A: This might work as a heuristic, but it's easy for an adversary to get around it.
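A small sketch of why value matching is easy to evade: the scanner below looks for the sensitive string in outgoing data, and a trivial XOR encoding defeats it while remaining reversible by the receiver. The class `NaiveLeakScanner`, its methods, and the sample IMEI string are made up for illustration.

```java
// Hypothetical output scanner and a trivial evasion; names are assumptions.
final class NaiveLeakScanner {
    // Heuristic: flag outgoing data that contains the known-sensitive value.
    static boolean containsSecret(String outgoing, String secret) {
        return outgoing.contains(secret);
    }

    // Adversary's encoding: XOR every character with a fixed key.
    // Applying it twice with the same key restores the original string.
    static String xorEncode(String text, char key) {
        StringBuilder encoded = new StringBuilder();
        for (char c : text.toCharArray()) {
            encoded.append((char) (c ^ key));
        }
        return encoded.toString();
    }

    public static void main(String[] args) {
        String imei = "356938035643809"; // made-up sample value
        String encoded = xorEncode(imei, 'k');
        System.out.println(containsSecret("id=" + imei, imei));
        System.out.println(containsSecret("id=" + encoded, imei));
        System.out.println(xorEncode(encoded, 'k').equals(imei));
    }
}
```

Taint tracking does not have this weakness for explicit flows, because the XOR is a binary operation and its result inherits the IMEI's taint.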

Implicit flows

As described, taint tracking cannot detect implicit flows.

Implicit flows happen when a tainted value affects another variable without directly assigning to that variable.

     if (imei > 42) {
         x = 0;
     } else {
         x = 1;
     }

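After the branch, x reveals whether the IMEI is greater than 42, even though the IMEI was never assigned to x. Instead of assigning to x, each branch could just as well send a different message over the network, leaking information about the IMEI!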

Implicit flows often arise because of tainted values affecting control flow.
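Implicit flows can leak more than one bit. The sketch below copies a tainted value bit by bit using only branches and constants, so a tracker that follows only explicit flows leaves the copy untagged. The class `ImplicitFlowLeak` and its methods are hypothetical, tags are carried by hand, and the sketch assumes the secret is non-negative.

```java
// Hypothetical demonstration of laundering taint through control flow.
final class ImplicitFlowLeak {
    // Explicit flow: a direct copy keeps the tag. Returns {copy, copyTag}.
    static long[] explicitCopy(long secret, int secretTag) {
        return new long[] {secret, secretTag};
    }

    // Implicit flow: secret is assumed to be tainted, but it appears only in
    // branch tests, so explicit-only rules never propagate its tag.
    // Returns {copy, copyTag}.
    static long[] launder(long secret) {
        long copy = 0L;
        int copyTag = 0;
        for (int bit = 0; bit < 63; bit++) {
            if (((secret >> bit) & 1L) == 1L) {  // branch on tainted data
                copy = copy | (1L << bit);       // operands: copy and a constant
                copyTag = copyTag | 0;           // union of two empty tags
            }
        }
        return new long[] {copy, copyTag};
    }
}
```

For any non-negative `secret`, `launder` returns the same value as `explicitCopy` but with an empty tag.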

We can try to catch implicit flows by assigning a taint tag to the program counter (PC), updating it with the taint of each branch test, and applying the PC's taint to every value assigned inside the if-else clauses. However, this can lead to a lot of false positives.

Example:

     if (imei > 42) {
         x = 0;
     } else {
         x = 0;
     }

     // The taint tracker thinks that
     // x should be tagged with imei's
     // taint, but there is no information
     // flow!
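A minimal sketch of the PC-taint scheme, with tags carried by hand. The class `PcTaintTracker` and the `run` helper are hypothetical. The values assigned in each branch are parameters so that both examples from these notes (0/1 and 0/0) can be tried.

```java
// Hypothetical hand-instrumented if-else with a PC taint tag.
final class PcTaintTracker {
    // Returns {x, xTag}.
    static int[] run(int imei, int imeiTag, int thenValue, int elseValue) {
        int pcTag = 0;
        int x;
        int xTag;

        int savedPcTag = pcTag;
        pcTag |= imeiTag;          // the branch test depends on imei
        if (imei > 42) {
            x = thenValue;
            xTag = 0 | pcTag;      // constant's tag unioned with PC taint
        } else {
            x = elseValue;
            xTag = 0 | pcTag;
        }
        pcTag = savedPcTag;        // restore PC taint after the branches join

        return new int[] {x, xTag};
    }
}
```

With branch values 0 and 1 the tag on x is justified. With 0 and 0 the tag is a false positive, because x is the same no matter what the IMEI is, and the tracker has no way to know that without analyzing both branches.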

Applications

Interesting application of taint tracking: keeping track of data copies.

Tightlip

TaintDroid detects leaks of sensitive data, but requires support from the Java VM: the VM must implement taint tags. Can we track sensitive information leaks without support from a managed runtime? What if we want to detect leaks in legacy C or C++ applications?

Decentralized information flow control

TaintDroid and Tightlip assume no assistance from the developer... but what if developers were willing to explicitly add taint labels to their code?

  int {Alice --> Bob} x;  // Means that x is controlled
                          // by the principal Alice, who
                          // allows that data to be seen
                          // by Bob.

Input channels: The read values get the label of the channel.

Output channels: Labels on the channel must match a label on the value being written.
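The two channel rules can be sketched with a tiny label type. Everything here is hypothetical: `Label`, `LabeledInt`, `read`, and `canWrite` are illustrative names, and the sketch takes the strictest reading of "match" (same owner and same set of permitted readers). A real system would likely define a more permissive ordering on labels.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical label check for input and output channels.
final class LabeledChannels {
    // {owner --> readers}: the owner allows the listed readers to see the data.
    static final class Label {
        final String owner;
        final Set<String> readers;

        Label(String owner, String... readers) {
            this.owner = owner;
            this.readers = new HashSet<>(Arrays.asList(readers));
        }

        // Assumption: "match" means identical owner and reader set.
        boolean matches(Label other) {
            return owner.equals(other.owner) && readers.equals(other.readers);
        }
    }

    static final class LabeledInt {
        final int value;
        final Label label;

        LabeledInt(int value, Label label) {
            this.value = value;
            this.label = label;
        }
    }

    // Input channel: the value read gets the label of the channel.
    static LabeledInt read(int rawValue, Label channelLabel) {
        return new LabeledInt(rawValue, channelLabel);
    }

    // Output channel: the write is allowed only if the labels match.
    static boolean canWrite(LabeledInt value, Label channelLabel) {
        return channelLabel.matches(value.label);
    }
}
```

For example, a value read from a channel labeled `{Alice --> Bob}` could be written to another `{Alice --> Bob}` channel, but not to one that also lets Eve read.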