alien goo
a lightweight c embedding facility
jonathan bachrach

introduction

a big challenge in dynamic language design is how best to interface to c code. problems stem from type, syntax, and semantic mismatches between languages. many solutions have been proposed through the years. we first briefly present four c extension facilities for python. we then introduce a simple yet powerful c embedding facility for goo which permits direct inlining of c inside goo code, escaping back into goo as needed.

previous work

swig is a language-neutral semi-automatic header parsing mechanism that produces separate c files that get compiled and linked into your application. users are allowed to tailor type mappings according to their application.

ctypes is a python facility that allows one to import dlls, to enumerate their exported symbols, and to manually specify their type interfaces with a special python-based c type system.

pyinline permits the definition of c code fragments as python objects to be executed later. c snippets are specified as python strings.

finally, pyrex allows the intermixing of c and python code, implicitly requesting c code by defining python variables with c types and using them within expressions. in this way, loops can be performed completely in low-overhead c if all loop-related variables are defined with c types.

the following presents a pros/cons evaluation of these four facilities:

name      pros                 cons
swig      semi-automatic       heavyweight
ctypes    loads dlls           another c type system
pyinline  lighter weight       cumbersome and no python escapes
pyrex     even lighter weight  whole other python dialect

in summary, the solutions are either too heavyweight or too complicated. weight is measured in terms of space and speed as well as the amount of extra glue code; complication is measured in terms of the amount of extra mechanism and nuisance to use. pyrex is the most similar in spirit in terms of mixing c and python code, but it introduces a whole other c type system and makes compilation dependent on the types of variables.

the alien gooway

what we think you really want is simply to inline c code directly into goo, escaping back into goo when necessary, and to rely on c for its type system instead of mirroring it in goo. this approach is simpler than pyrex in that execution proceeds either entirely in c or entirely in goo, and the distinction is syntactically obvious. inlining c turns out to be quite straightforward and appropriate for goo because goo relies heavily on a c backend (including during dynamic evaluation).

c statements

as a running example, consider the construction of a simple goo layer on top of a 2d subset of opengl. we start by defining a goo method for a simplified version of initializing the graphics system:

(dm gl-setup ()
  #{ glutInitWindowSize( 640, 480 ); })

where the #{ ... } form escapes to c, executes a series of c statements, and evaluates to false.

goo escapes

next we define a drawing function as follows:

(dm gl-vertex (x|<flo> y|<flo>)
  #{ glVertex2f($x, $y); })

where the $ operator escapes back into goo, evaluating the subsequent s-expression (a la unquote in a quasiquote expression). unfortunately, in this case, the x and y variables contain goo format floats and must first be exported (i.e., unboxed) to c format as follows:

(dm gl-vertex (x|<flo> y|<flo>)
  #{ glVertex2f($(to-c x), $(to-c y)); })

where to-c unboxes the float. to-c methods are defined for all the basic goo value types (e.g., <log>, <chr>, <int>, <flo>, and <str>). furthermore, users can define their own to-c methods.

unfortunately, embedding direct calls to to-c is verbose, and thus we introduce a shorthand '@s' which is equivalent to $(to-c s). now gl-vertex becomes:

(dm gl-vertex (x|<flo> y|<flo>)
  #{ glVertex2f(@x, @y); })

in more advanced situations, the $ operator can also be used to assign and create pointers to goo variables (as shown below in the macros section).
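
the export step performed by to-c (and by the '@' shorthand) can be pictured in plain c. the following is a minimal sketch, assuming a tagged-union representation of goo values; the type goo_obj and the functions box_int, box_flo, to_c_int, and to_c_flo are hypothetical names invented for illustration and are not taken from the goo runtime:

```c
#include <assert.h>

/* hypothetical boxed value: a type tag plus a payload (an assumption,
   not the actual goo object layout) */
typedef enum { GOO_INT, GOO_FLO } goo_tag;

typedef struct {
    goo_tag tag;
    union { long i; double f; } as;
} goo_obj;

/* box: wrap a raw c value in a tagged goo-style object */
goo_obj box_int(long i)   { goo_obj o; o.tag = GOO_INT; o.as.i = i; return o; }
goo_obj box_flo(double f) { goo_obj o; o.tag = GOO_FLO; o.as.f = f; return o; }

/* unbox: what to-c conceptually does before a value reaches c code;
   the assert guards against unboxing a value of the wrong type */
long   to_c_int(goo_obj o) { assert(o.tag == GOO_INT); return o.as.i; }
double to_c_flo(goo_obj o) { assert(o.tag == GOO_FLO); return o.as.f; }
```

under this sketch, a call such as f(@x) on a float argument would correspond to f(to_c_flo(x)) on the boxed value.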

c expressions

users will often need to get values back from c in a functional style. for this we introduce the c expression form #ex{ ... }, which is the same as the c statement form except that its value is the value of the enclosed c expression. the x modifier in #ex{ ... } specifies how to interpret the expression as a goo value. valid modifiers are i for <int>, f for <flo>, s for <str>, c for <chr>, b for <log>, and l for <loc>. for example, one can grab an integer c macro constant as follows:

(dv $gl-line-loop #ei{ GL_LINE_LOOP })
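
the modifier letters listed above can be summarized as a small lookup. the sketch below only restates that list in c; the function name goo_type_for_modifier is hypothetical and does not appear in goo itself:

```c
#include <stddef.h>

/* map a #ex{ ... } modifier letter to the name of the goo type it selects;
   returns NULL for a letter that is not one of the listed modifiers */
const char *goo_type_for_modifier(char m) {
    switch (m) {
    case 'i': return "<int>";
    case 'f': return "<flo>";
    case 's': return "<str>";
    case 'c': return "<chr>";
    case 'b': return "<log>";
    case 'l': return "<loc>";
    default:  return NULL;
    }
}
```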

top level c

c definitions can be placed at goo top level with #{ }. for example, a callback can be defined as follows:

#{ int gl_idle(int x) { $(gl-idle); } }

this can also be used to introduce structure definitions, typedefs, includes, etc.
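
a top-level c function is useful for callbacks because c libraries accept plain function pointers. the sketch below imitates that arrangement without opengl; register_idle, run_idle_once, goo_gl_idle, and idle_trampoline are hypothetical stand-ins (goo_gl_idle plays the role of a goo function reached through a $ escape) and are not part of glut or goo:

```c
#include <stddef.h>

/* the kind of function pointer a c library expects for a callback */
typedef void (*idle_fn)(void);

idle_fn registered_idle = NULL;

/* stand-in for a library registration call */
void register_idle(idle_fn f) { registered_idle = f; }

/* stand-in for one turn of the library's event loop */
void run_idle_once(void) { if (registered_idle != NULL) registered_idle(); }

/* stand-in for the goo function that the callback escapes into */
int goo_idle_calls = 0;
void goo_gl_idle(void) { goo_idle_calls++; }

/* the top-level c trampoline: plain c on the outside, goo on the inside */
void idle_trampoline(void) { goo_gl_idle(); }
```

the library only ever sees idle_trampoline, which is an ordinary c function; everything language-specific stays behind it.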

macros

now suppose that one wants to define a goo layer to a large and regular c library. for example, consider writing a bignum module using the gnu multiprecision library (aka gmp). we show how alien goo can be used in conjunction with macros, greatly amplifying its power. we start by defining the bignum class in goo, along with goo-to-gmp and gmp-to-goo conversion functions in embedded c at top level. from there we would start defining each of the goo arithmetic methods. let's start with addition:

(dm + (x|<bignum> y|<bignum> => <int>)  
  (let ((res 0))
    #{ mpz_t z;
       mpz_init_zero(z);
       mpz_add(z, bignum_to_mpz($x), bignum_to_mpz($y));
       $res = mpz_to_goo(z); }
    res))

now, given that we're going to be defining a large number of these methods, it makes sense to invent some macro machinery. first, let's make returning values easier in c's statement-oriented world:

(ds with-returning (,res ,@body)
  `(let ((,res #f)) ,@body ,res))

making the original look as follows:

(dm + (x|<bignum> y|<bignum> => <int>)  
  (with-returning res
    #{ mpz_t z;
       mpz_init_zero(z);
       mpz_add(z, bignum_to_mpz($x), bignum_to_mpz($y));
       $res = mpz_to_goo(z); }))

it turns out that a large number of the bignum methods are going to have a similar form, starting with gmp variable initialization, followed by a method-specific body, and ending with conversion back to goo. we can thus make a body-defining macro:

(ds with-gmp-returning (,z ,body)
  (let ((res (gensym))
        (zc  (to-str z)))
    `(with-returning ,res
       #{ mpz_t $,zc; 
          mpz_init_zero($,zc);
          $,body
          $,res = mpz_to_goo($,zc); })))

notice how quasiquote's unquote works inside an embedded c form: it follows a goo escape, turning goo evaluation back on. the unquoted goo expression becomes more embedded c if it evaluates to a string; otherwise it is evaluated at runtime in goo. now our original addition method becomes:

(dm + (x|<bignum> y|<bignum> => <int>)  
  (with-gmp-returning z
    #{  mpz_add(z, bignum_to_mpz($x), bignum_to_mpz($y)); }))    
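
the rule that a string-valued unquote becomes more embedded c amounts to textual splicing at macro-expansion time. the sketch below mimics that splice with snprintf; emit_gmp_body is a hypothetical helper, and the generated text reuses the c names from the examples above purely as strings (nothing here calls gmp):

```c
#include <stdio.h>

/* build the c text that a with-gmp-returning style macro would emit;
   zc is the c variable name and body is the caller-supplied c statement.
   returns the number of characters snprintf would have written. */
int emit_gmp_body(char *out, size_t n, const char *zc, const char *body) {
    return snprintf(out, n,
                    "mpz_t %s; mpz_init_zero(%s); %s res = mpz_to_goo(%s);",
                    zc, zc, body, zc);
}
```

because the variable name is spliced in everywhere it is used, the caller's body and the generated prologue and epilogue agree on which c variable holds the result.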

it turns out that a large number of bignum methods are going to have an even more constrained form, differing from one arithmetic function to the next only in the gmp function to be called. we can thus finally define a method-defining macro as follows:

(ds def-b-b (,name ,c-fun)
  `(dm ,name (x|<bignum> y|<bignum> => <int>)
     (with-gmp-returning z
       #{ $,c-fun(z, bignum_to_mpz($x), bignum_to_mpz($y)); })))

the addition method can now be defined more declaratively and succinctly as:

(def-b-b + "mpz_add")
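
for readers more at home in c, the def-b-b pattern resembles a preprocessor macro that stamps out one wrapper per operation. this is an analogy only: DEF_B_B, toy_add, toy_sub, wrapped_add, wrapped_sub, and the use of long in place of a bignum are all assumptions made to keep the sketch self-contained without gmp:

```c
/* toy operations in the gmp calling style: result through the first argument */
void toy_add(long *rop, long a, long b) { *rop = a + b; }
void toy_sub(long *rop, long a, long b) { *rop = a - b; }

/* stamp out a two-argument wrapper: init a temporary, call c_fun, return it */
#define DEF_B_B(name, c_fun)            \
    long name(long x, long y) {         \
        long z = 0;                     \
        c_fun(&z, x, y);                \
        return z;                       \
    }

/* each line defines a complete function, as each def-b-b form defines a method */
DEF_B_B(wrapped_add, toy_add)
DEF_B_B(wrapped_sub, toy_sub)
```

the c macro can only paste tokens, whereas the goo macro also manages the boundary between goo values and c values, which is where most of the hand-written glue would otherwise go.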

from here, we can define the other method-defining macros for mixed-type inputs (e.g., fixnum x bignum).

conclusions

we have introduced a lightweight c embedding facility for goo that makes simple c call-outs easy, c interfaces simple, and more complicated interfaces manageable. we have shown how alien goo can be used in conjunction with macros, making it considerably more powerful. this facility can be used to define interfaces in a declarative manner while avoiding excess interface layers, combining calls, type conversions, and goo-specific operations. all of these features make alien goo an extremely lightweight and powerful c interface mechanism. the major limitations of alien goo are that it relies on a c compiler and some amount of compilation to c, does not do error checking on the embedded c code, and still requires a certain amount of manual intervention.

acknowledgements

james knight provided the original inspiration and has provided helpful feedback along the way.