Belief: compiler == faithful translator

int main()
{
    ...
    return 0;
}
...
movq %rsp, %rbp
xorl %eax, %eax
popq %rbp
ret


Not true if your code invokes undefined behavior

  • Security implications

Example: compiler discards sanity check

char *buf        = ...;
char *buf_end    = ...;
unsigned int off = /* read from untrusted input */;
if (buf + off >= buf_end) return; /* validate off: buf+off too large*/
if (buf + off < buf) return; /* validate off: overflow, buf+off wrapped around */
/* access buf[0..off-1] */
buf buf_end buf + off
  • C spec: pointer overflow is undefined behavior
    • gcc: buf + off cannot overflow (different from hardware!)
    • gcc: if (buf + off < buf) ⇒ if (false)
  • Attack: craft a large off to trigger buffer overflow

Undefined behavior allows such optimizations

Undefined behavior: the spec “imposes no requirements,”
typically for certain arguments to an operation

  • Original goal: emit efficient code
  • Example: C spec imposes no requirements if program accesses an out-of-bounds pointer (doing so is undefined behavior)
  • Compiler: no need to emit bounds checks; assume bug-free code
  •     *p = 42;         /* store 42 to p */
           
        mov $42, (%rdi)  /* no bounds checks */
    
  • Compilers assume a program never invokes undefined behavior
  • if (buf + off < buf) ⇒ if (false) legal: any affected program invoked undefined behavior (by overflowing pointers)

Examples of undefined behavior in C

Meaningless checks from real code: pointer p; signed integer x

Pointer overflow: if (p + 100 < p)
Signed integer overflow: if (x + 100 < x)
Oversized shift: if (!(1 << x))
Null pointer dereference: *p; if (p)
Absolute value overflow: if (abs(x) < 0)

Problem: unstable code confuses programmers

    Unstable code: compilers discard code due to undefined behavior

    ...
    if (p + 100 < p)
      return;
    
    ...
    nop
    nop
    
    • Security checks discarded
    • Weakness amplified
    • Unpredictable system behavior

Contributions

  • A case study of unstable code in real world
  • An algorithm for identifying unstable code
  • A static checker STACK
    • 160+ previously unknown bugs confirmed and fixed
    • Users: companies (e.g., Intel), several open-source projects, ...

Example: broken check in Postgres

Implement 64-bit signed division x/y in SQL

  • Zero check: y == 0
  • Overflow check: y == -1 && x == -263
    • Result not representable in [-263,+263-1]
if (y == -1 && x < 0 && (x / y < 0))   /* -263/-1 < 0? */
  error();

Example: fix check in Postgres

Our proposal:

if (y == -1 && x == INT64_MIN) /* INT64_MIN is -263*/

Developer’s fix:

if (y == -1 && ((-x < 0) == (x < 0)))
  • Still unstable code: time bomb for future compilers
    • “it’s an overflow check so it should check for overflow”
    • “we don’t want the constant INT64_MIN; it’s less portable”
This will create MAJOR SECURITY ISSUES in ALL MANNER OF CODE. I don’t care if your language lawyers tell you gcc is right. . . . FIX THIS! NOW!
a gcc user
bug #30475 - assert(int+100 > int) optimized away
I am sorry that you wrote broken code to begin with . . . GCC is not going to change.
a gcc developer
bug #30475 - assert(int+100 > int) optimized away

Test existing compilers

12 C/C++ compilers
gcc clang
aCC (HP) armcc (ARM)
icc (Intel) msvc (Microsoft)
open64 (AMD) pathcc (PathScale)
suncc (Oracle) xlc (IBM)
ti (TI’s TMS320C6000) windriver (Wind River’s Diab)

Examples of unstable code

Meaningless checks from real code: pointer p; signed integer x

Compilers often discard unstable code

if(p+100<p) if(x+100<x) if(!(1<<x)) ∗p; if(!p) if(abs(x)<0)
gcc-4.8.1 O2 O2 O2 O2
clang-3.3 O1 O1 O1
aCC-6.25 O3
armcc-5.02 O2
icc-14.0.0 O1 O2
msvc-14.0.0 O1
open64-14.0.0 O1 O2 O2
pathcc-1.0.0 O1 O2 O2
suncc-5.12 O3
ti-7.4.2 O0 O0
windriver-5.9.2 O0
xlc-12.1 O3

Compilers become more aggressive over time

if(p+100<p) if(x+100<x) if(!(1<<x)) ∗p; if(!p) if(abs(x)<0)
(1992) gcc-1.42
(2001) gcc-2.95.3 O1
(2006) gcc-3.4.6 O1 O2
(2007) gcc-4.2.1 O0 O2 O2
(2013) gcc-4.8.1 O2 O2 O2 O2
(2009) clang-1.0 O1
(2010) clang-2.8 O1 O1
(2013) clang-3.3 O1 O1 O1

Observations

  • Different compilers behave in different ways
    • Change/upgrade compiler ⇒ broken system
  • Compilers perform many legitimate optimizations
    • Hard to distinguish unstable code from legal optimizations
  • Unstable code eliminated without warnings
  • Need a systematic approach

Our approach: precisely flag unstable code

C/C++ source → LLVM IR → STACK → warnings

% ./configure
% stack-build make        # intercept cc & generate LLVM IR
% poptck                  # run STACK in parallel

STACK provides informative warnings

res = x / y;
if (y == -1 && x < 0 && res < 0)
  return;

The check at line 2 is simplified into false, due to division at line 1

model: |                          # possible optimization
  %cmp3 = icmp slt i64 %res, 0
  -->  false
stack:                            # location of unstable code
  - div.c:2
core:                             # why optimized away
  - div.c:1
    - signed division overflow

Design overview of STACK

  • What’s the difference, compilers vs most programmers?
    • Assumption Δ: programs don’t invoke undefined behavior
  • What can compilers do only with assumption Δ?
    • Optimize away unstable code
  • STACK: mimic a compiler that selectively enables Δ
    • Phase I: optimize w/o Δ
    • Phase II: optimize w/ Δ
    • Unstable code: difference between the two phases

Example of identifying unstable code

res = x / y;
if (y == -1 && x < 0 && res < 0)
  return;
  • Assumption Δ:
    • No division by zero: y ≠ 0
    • No division overflow: y ≠ -1 OR x ≠ INT_MIN
  • STACK can optimize “res < 0” to “falseonly with Δ
    • Phase I: is “res < 0” equivalent to “false” in general? No.
    • Phase II: is “res < 0” equivalent to “false” with Δ? Yes!
  • Report “res < 0” as unstable code

Compute assumption Δ

One must not trigger undefined behavior at any code fragment

  • Reach(e): when to reach and execute code fragment e
  • Undef(e): when to trigger undefined behavior at e
Δ = ∀e: Reach(e) → ¬Undef(e)

Example: compute assumption Δ

One must not trigger undefined behavior at any code fragment

Δ = ∀e: Reach(e) → ¬Undef(e)
res = x / y;
if (y == -1 && x < 0 && res < 0)
  return;
Δ = true → ¬((y == 0) ∨ (x == -1 ∧ y == INT_MIN)) # line 1
  ∧ true → ¬false                                 # line 2
  ∧ ((y == -1) ∧ (x < 0) ∧ (x/y < 0)) → ¬false    # line 3

Δ = ¬((y == 0) ∨ (x == -1 ∧ y == INT_MIN))

Find unstable code by selectively enabling Δ

res = x / y;
if (y == -1 && x < 0 && res < 0)
  return;
(res < 0) ≡ false
w/o Δ?
(res < 0) ≡ false
w/ Δ?
do nothing
(typical dead code)
unstable code
do nothing
(not dead code)
N
N
Y
Y
solver
Phase I
Phase II

Summary of STACK

  • Compute assumption Δ: no undefined behavior
  • Two-phase framework: w/o and w/ Δ
    • Report unstable code from difference
  • Limitations
    • Missing unstable code: Phase II not powerful enough
    • False warnings: Phase I not powerful enough

Implementation of STACK

  • LLVM
  • Boolector solver
  • ~4,000 lines of C++ code
  • Per-function for better scalability
    • Could miss bugs

Evaluation

  • Is STACK useful for finding unstable code?
  • How precise are STACK’s warnings?
  • How much time to analyze a large code base using STACK?
  • How prevalent is unstable code?

STACK finds new bugs

  • Applied STACK to many popular systems
  • Inspected warnings and submitted patches to developers
    • Binutils, Bionic, Dune, e2fsprogs, FFmpeg+Libav, file, FreeType, GMP, GRUB, HiStar, Kerberos, libX11, libarchive, libgcrypt, Linux kernel, Mosh, Mozilla, OpenAFS, OpenSSH, OpenSSL, PHP, plan9port, Postgres, Python, QEMU, Ruby+Rubinius, Sane, uClibc, VLC, Wireshark, Xen, Xpdf
  • Developers accepted most of our patches
    • 160+ new bugs

STACK warnings are precise

  • Kerberos: STACK produced 11 warnings
    • Developers accepted every patch
    • No warnings for fixed code
    • Low false warning rate: 0/11
  • Postgres: STACK produced 68 warnings
    • 9 patches accepted: server crash
    • 29 patches in discussion: developers blamed compilers
    • 26 time bombs: can be optimized away by future compilers
    • 4 false warnings: benign redundant code
    • Low false warning rate: 4/68

STACK scales to large code bases

Intel Core i7-980 3.3 GHz, 6 cores

build time analysis time # files
Kerberos 1 min 2 min 705
Postgres 1 min 11 min 770
Linux kernel 33 min 62 min 14,136

Unstable code is prevalent

  • Applied STACK to all Debian Wheezy packages
    • 8,575 C/C++ packages
    • ∼150 days of CPU time to build and analyze
  • STACK warns in ∼40% of C/C++ packages

BSD/OS X: broken random number seeding

unsigned long junk;              /* left uninitialized on purpose */
struct timeval tv;

gettimeofday(&tv, NULL);
srandom(junk
^ (getpid() << 16) /* process id */ ^ tv.tv_sec ^ tv.tv_usec /* time */
);

CPU emulator: undefined behavior + more

uint64_t mul(uint16_t a, uint16_t b) {
    uint32_t c = a * b;
    return c;
}

Question: what’s the result of mul(0xffff, 0xffff)?

  • (a) 1
  • (b) 0xfffe0001
  • (c) 0xfffffffffffe0001

Integer promotion rule: unsigned shortint for multiplication

Answer: “a * b” overflows int and invokes undefined behavior;
any answer is legal, gcc generates both (b) and (c).

Undefined behavior in PHP

  • High-level languages have fewer undefined constructs
  • But their interpreters are often written in C/C++

Question: what’s the output of the following PHP code?

<?php
  echo (1<<63) * 2
?>
  • (a) 18446744073709551616 or 264
  • (b) 0
  • (c) -4294967296 or -232
  • (d) -1.844674407371E+19

Answer: PHP VM invokes undefined behavior, prints (b), (c), or (d)

How to avoid unstable code

  • Programmers
    • Fix bugs
    • Workaround: disable certain optimizations
  • Compilers & checkers
    • Many bug-finding tools fail to model C spec correctly
    • Use our ideas to generate better warnings
  • Language designers: revise the spec
    • Eliminate undefined behavior? Perf impact?
    • C++ committee now exploring this question (SG12)

Other application

Reflections on trusting trust [Thompson84]

  • Hide backdoors
    • Submit a new feature with unstable code
    • Could easily slip through code review

Summary

  • Compilers optimize away unstable code
    • Subtle bugs
    • Significant security implications
  • Compiler writers: use our techniques to generate better warnings
  • Language designers: trade-off between performance & security
  • Programmers: check your C/C++ code using STACK

http://css.csail.mit.edu/stack/