Previous: , Up: The Language Compiled   [Contents][Index]


3.6 Suggestions for writing fast code

The following suggestions may help you to write well-optimizable and fast code for the hobbit-scm combination. Roughly speaking, the main points are:

Here come the details.

  1. Immediate arithmetics (ie using small, up to 30 bits integers) is much faster than generic (reals and bignums) arithmetics. If you have to use generic arithmetic in your program, then try to use special immediate arithmetics operations %=, %<=, %+, %*, … for speed-critical parts of the program whenever possible.

    Also, if you use bitwise logical operations, try to use the immediate-integer-only versions

    logand logior logxor lognot logsleft logsright
    

    and not logical:logand or ash, for example.

  2. Due to its inner stack-based architecture, the generic (not escape-only) continuations are very slow in SCM. Thus they are also slow in compiled code. Try to avoid continuations (calls to the procedure call-with-current-continuation and calls to the continuations it produces) in speed-critical parts.
  3. In speed-critical parts of your program try to avoid using procedures which are redefined or defined by set!:
    (set! bar +)
    (set! f (lambda (x) (if (zero? x) 1 (* x (f (- x 1))))))
    

    anywhere in the compiled program. Avoid using compiler flags (see Hobbit Options):

    (define compile-all-proc-redefined t)
    (define compile-new-proc-redefined t)
    
  4. Do not use complicated higher-order procedures in speed-critical parts. By complicated we mean not clonable, where clonability is defined in the following way (Note Bene: the primitives ‘map’ and ‘for-each’ are considered clonable and do not inflict a speed penalty).

    A higher-order procedure (HOP for short) is defined as a procedure with some of its formal arguments occuring in the procedure body in a function position, that is, as a first element of a list. Such an argument is called a higher-order argument.

    A HOP ‘bar’ is clonable iff it satisfies the following four conditions:

    1. bar’ is defined as
      (define bar (lambda …))
      

      or

      (define (bar …) …)
      

      on top level and bar is not redefined anywhere.

    2. the name ‘bar’ occurs inside the body of bar only in a function position and not inside an internal lambda-term.
    3. Let f be a higher-order argument of bar. Any occurrence of f in bar has one of the following two forms:
      • f occurs in a function position,
      • f is passed as an argument to bar and in the call it occurs in the same position as in the argument list.
    4. Let f be a higher-order argument of bar. f does not occur inside a lambda-term occurring in bar.

      Examples:

      If ‘member-if’ is defined on top level and is not redefined anywhere, then ‘member-if’ is a clonable HOP:

      (define (member-if fn lst)
         (if (fn (car lst))
             lst
             (member-if fn (cdr lst)) ))
      

      member-if-not is not a clonable HOP (fn occurs in a lambdaterm):

      (define (member-if-not fn lst)
        (member (lambda (x) (not (fn x))) lst) )
      

      show-f is not a clonable HOP (fn occurs in a non-function position in (display fn)):

      (define (show-f fn x)
          (set! x (fn x))
          (display fn)
          x)
      
    5. In speed-critical parts avoid using procedures which return procedures.

      Eg, a procedure

      (define plus
        (lambda (x)
           (lambda (y) (+ y x)) ))
      

      returns a procedure.

    6. A generalisation of the previous case 5:

      In speed-critical parts avoid using lambda-terms except in non-set! function definitions like

      (define foo (lambda …)),
      (let ((x 1) (f (lambda …))) …)
      (let* ((x 1) (f (lambda …))) …)
      (let name ((x 1) (f (lambda …))) …)
      (letrec ((f (lambda …)) (g (lambda …))) …)
      

      or as arguments to clonable HOP-s or primitives map and for-each, like

      (let ((x 0)) (map (lambda (y) (set! x (+ 1 x)) (cons x y)) list))
      (member-if (lambda (x) (< x 0)) list)
      

      where member-if is a clonable HOP.

      Also, avoid using variables with a procedural value anywhere except in a function position (first element of a list) or as an argument to a clonable HOP, map or for-each.

      Lambda-terms conforming to the current point are said to be liftable.

      Examples:

      (define (bar x) (let ((f car)) (f (f x))))
      

      has ‘car’ in a non-function and non-HOP-argument position in (f car), thus it is slower than

      (define (bar x) (let ((f 1)) (car (car x))))
      

      Similarly,

      (define (bar y z w)
        (let ((f (lambda (x) (+ x y))))
           (set! w f)
           (cons (f (car z))
                 (map f z) )))
      

      has ‘f’ occurring in a non-function position in (set! w f), thus the lambda-term (lambda (x) (+ x y)) is not liftable and the upper ‘bar’ is thus slower than the following equivalent ‘bar’ with a liftable inner lambda-term:

      (define (bar y z w)
        (let ((f (lambda (x) (+ x y))))
           (set! w 0)
           (cons (f (car z))
                 (map f z) )))
      

      Using a procedure bar defined as

      (define bar (let ((x 1)) (lambda (y) (set! x y) (+ x y))))
      

      is slower than using a procedure bar defined as

      (define *bar-x* 1)
      (define bar (lambda (y) (set! *bar-x* y) (+ *bar-x* y)))
      

      since the former definition contains a non-liftable lambda-term.

    7. Try to minimize the amount of consing in the speed-critical program fragments, that is, a number of applications of cons, list, map, quasiquote (‘) and vector->list during the time program is running. ‘cons’ (called also by ‘list’, ‘map’ and ‘quasiquote’) is translated into a C call to an internal cons procedure of the SCM interpreter. Excessive consing also means that the garbage collection happens more often. Do (verbose 3) to see the amount of time used by garbage collection while your program is running.

      Try to minimize the amount of creating new vectors, strings and symbols in the speed-critical program frgaments, that is, a number of applications of make-vector, vector, list->vector, make-string, string-append, *->string, string->symbol. Creating such objects takes typically much more time than consing.

    8. The Scheme iteration construction ‘do’ is compiled directly into the C iteration construction ‘for’. We can expect that the C compiler has some knowledge about ‘for’ in the optimization stage, thus it is probably faster to use ‘do’ for iteration than non-mutual tailrecursion (which is recognized by hobbit as such and is compiled into a jump to a beginning of a procedure) and certainly much faster than non-tail-recursion or mutual tailrecursion (the latter is not recognized by hobbit as such).
    9. Declare small nonrecursive programs which do not contain let-s or lambdaterms as being inlinable.

      Declare globally defined variables which are never set! or redefined and whose value is a small integer, character or a boolean, as being inlinable. See Hobbit Options.

    10. If possible, declare vectors as being stable. See Speeding up vectors. This gives a minor improvement in speed.
    11. If possible, declare critical global vars as being uninterned. See Speeding up and hiding certain global variables. This gives a minor improvement in speed. Declare the global variables which are never set! and have an (unchanged) numeric or boolean value as being inlined. See Hobbit Options.

    In addition, take the following into account:

    • When using the C compiler to compile the C code output by hobbit, always use strong optimizations (eg. ‘cc -xO3’ for cc on Sun, ‘gcc -O2’ or ‘gcc -O3’ for gcc). Hobbit does not attempt to do optimizations of the kind we anticipate from the C compiler, therefore it often makes a big difference if the C compiler is run with a strong optimization flag or not.
    • hobbit does not give proper tailrecursion behaviour for mutual tailrecursion (foo calling bar, bar calling foo tailrecursively).

      Hobbit guarantees proper tailrecursive behaviour for non-mutual tailrecursion (foo calling foo tailrecursively), provided that foo is not redefined anywhere and that foo is not a local function which occurs also in a non-function and non-clonable-HOP-argument position (i.e. cases 3 and 6 above).


Previous: Force and Delay, Up: The Language Compiled   [Contents][Index]