
Re: One Man's Search for Smaller Codebases

It's always nice to see thoughtful comparisons of C/C++ (especially VC++/COM) with high-level languages.
As someone who has written a fair bit of both Common Lisp and C++/ATL/COM, I suggest that the real difference
in codebase size/maintainability comes with larger systems.  In C++, I find myself copying and modifying code because
it is too difficult to abstract common operations properly.  These are cases where I use Lisp macros extensively, especially in
network apps where you repeatedly do things like set up and process requests.  Combine this with the expressiveness of
CLOS, and I think you really have a chance of limiting redundancy.

Unfortunately not many software engineers, myself included, take the time to do the recoding experiment you describe.  
I know there have been some industrial-level comparisons of C++ and Lisp on large projects; can anyone speak to those?
Certainly some of the Symbolics guys on this list could speak to pros and cons of maintaining large Lisp systems.

What does this have to do with lightweight languages?  Good Lisp programs have varying levels of abstraction, where
high-level logic can be implemented with Scheme-like simplicity, while lower-level code is highly optimized with type
declarations and more obscure syntax.  I tend to carry this principle over to "mainstream languages" by implementing
high-level logic in socially acceptable scripting languages and packaging the optimized bits as C++ components.  I feel like
macros and a flexible object system in my lightweight language would help me develop my core system logic as a concise,
maintainable code base.

Thanks again for sharing your experience!


Sent by:        owner-ll1-discuss@ai.mit.edu

To:        <LL1-Discuss@ai.mit.edu>
Subject:        One Man's Search for Smaller Codebases


Someone just wrote and told me to post this note on your thread, saying it
would be appropriate. He also said I would find interesting company here...
so far I do see some interesting names, Steele, McKay, Felleisen, and
several others... So here goes...

I posted this on the OCaml thread last night, and on the comp.lang.lisp
thread too... Understand that part of the motivation for this study is that
I have 300 KLOC of working codebase to maintain against bit-rot, and induced
bit-rot via OS changes beneath me... I'm eager to find a way to minimize
this maintenance activity, do it correctly, and robustly.


I just finished my experiment to reduce the size of a fielded application by
recoding in either Lisp or OCaml. I had early indications that, aside
from pure ease of programming in these HLL's, the overall code base would be
drastically reduced (5x to 6x). That is certainly true if you count all the
source code needed to produce the application, but an honest, impartial
comparison of the lines I actually had to write, of non-reusable,
application-specific code, shows somewhat disappointing results on this basis.

The application is a system network server that performs recursive prefix
mappings of file pathnames, including environment variable substitutions.
This is a variation on the system provided by the Sprite experimental OS
developed at UCB by John Ousterhout, et al. in the late 1980's and early
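
As a rough illustration of what "recursive prefix mapping with environment
variable substitution" can mean, here is a minimal Python sketch. It is not
the server's actual code: the table format, the function names (expand_env,
map_path), the $NAME / ${NAME} syntax, and the depth limit are all
assumptions made for the example.

```python
import re

# Matches $NAME or ${NAME}; the variable syntax is an assumption.
_VAR = re.compile(r"\$(\w+)|\$\{(\w+)\}")


def expand_env(path, env):
    """Substitute variables from env; unknown names are left untouched."""
    def repl(match):
        name = match.group(1) or match.group(2)
        return env.get(name, match.group(0))
    return _VAR.sub(repl, path)


def map_path(path, prefixes, env, max_depth=16):
    """Rewrite the longest matching prefix until no prefix applies.

    prefixes maps a logical prefix to its replacement, which may itself
    begin with another prefix or contain variables (hence "recursive").
    """
    for _ in range(max_depth):
        path = expand_env(path, env)
        # Match whole path components only, so "/src" does not hit "/srcfoo".
        hits = [p for p in prefixes if path == p or path.startswith(p + "/")]
        if not hits:
            return path
        best = max(hits, key=len)
        path = prefixes[best] + path[len(best):]
    raise ValueError("prefix mapping did not terminate")


# Hypothetical table and environment, for illustration only.
PREFIXES = {"/src": "/proj/app/src", "/proj": "$ROOT/projects"}
ENV = {"ROOT": "/vol1"}
print(map_path("/src/main.c", PREFIXES, ENV))
```

The depth limit is one simple way to guard against a table whose entries
map back onto themselves; a real server might detect cycles differently.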

The existing version was coded in M$ VC++ making heavy use of STL. It is a
COM/OLE process server based on M$ ATL. All three versions retain a machine
generated ATL wrapper code for this COM/OLE behavior -- I only needed to
write a few lines of IDL to produce the basic skeleton, and all three
versions use identical stuff here...

For the application specific coding, the scores are:

Existing App:
C/C++ = 1106 LOC

Lisp Version:
C/C++ = 284 LOC, Lisp = 798 LOC  --> Total = 1082 LOC

OCaml Version:
C/C++ = 284 LOC, Lisp = 58 LOC, OCaml = 453 LOC --> Total = 888

These LOC counts do *not* include blank lines and comment only lines.
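
For readers who want to reproduce this kind of count, the rule above can be
sketched as follows. This is an assumed approximation, not the tool used for
the figures in this post: it only recognizes line comments (the marker is a
parameter), so comment-only lines inside /* ... */ blocks would still be
counted.

```python
def count_loc(source, line_comment="//"):
    """Count lines that are neither blank nor comment-only.

    Only single-line comment markers are recognized; block comments
    are not handled in this sketch.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(line_comment):
            count += 1
    return count


cpp_sample = "int x = 1; // set x\n\n// comment only\nreturn x;\n"
lisp_sample = "; comment only\n(defun f () 1)\n"
print(count_loc(cpp_sample))                      # C/C++ style comments
print(count_loc(lisp_sample, line_comment=";"))   # Lisp style comments
```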

On the basis of code-base size reduction, these results are nearly a tie.

But on the basis of ease of programming, I have to award Lisp first,
followed by OCaml, and distantly trailed by C/C++. The reasons for this are:

1. Lisp is a huge language with nearly everything you need already built in.
But it produces very bulky DLL's -- on the order of 15 MBytes.

2. OCaml is as terse as Lisp, or even slightly better, but needs a fair
amount of additional support routines written, to cover the application
needs. Some of this is in C/C++ (very little) but most has to do with
providing things like unwind-protect, generalized string handling,
generalized list operations. It produces very fast runtime code (not needed
here) and quite reasonably sized DLL's -- about 300 KBytes (50x smaller than
the Lisp DLL's).

3. C/C++, making heavy use of classes and STL, is nearly unreadable, took a
long time to program, and is frightening to revisit after some time away
from it (1 year or more since original writing). C/C++ retains the
capability to utilize Unicode (FWIW -- I don't really need it), but it was
written with some embedded bugs that I found only when I was able to remain
at the abstract levels permitted by HOL's.
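
For readers unfamiliar with the unwind-protect mentioned in item 2: it is the
Common Lisp form that runs cleanup code whether or not the protected code
exits normally. A minimal sketch of the idea, written in Python rather than
OCaml and not taken from the support routines described above (the function
and variable names are made up for the example):

```python
def unwind_protect(body, cleanup):
    """Call body() and always call cleanup() afterwards, even on error."""
    try:
        return body()
    finally:
        cleanup()


events = []


def failing_request():
    events.append("processing")
    raise RuntimeError("request failed")


try:
    unwind_protect(failing_request, lambda: events.append("released"))
except RuntimeError:
    events.append("error reported")

print(events)
```

The cleanup runs before the exception reaches the caller, which is the
property a server needs when releasing handles or locks after a failed
request.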

Both the Lisp and OCaml versions were written in the course of 2-3 hours.
Writing the C/C++ version took the better part of 1 week. Prior to that I
had written experimental versions in Lisp and had more than a year of
playing with the system to get an understanding of the needed algorithms.

I will say that both Lisp and OCaml allowed me to spot some errors in the
C/C++ implementation, fix those errors, and add some extra capability (about
20 LOC in both Lisp and OCaml for the extra stuff).  I estimate the time
needed to go back and refamiliarize myself with STL and the internal
architecture of the existing application -- in order to fix the bugs I
discovered and add the additional capabilities -- would be several days.

I find it remarkable that OCaml has a slight edge on Lisp for terseness of
expression. OCaml is a highly expressive syntax and you can say quite a lot
in a few keystrokes. Lisp tends to be more wordy, use longer identifiers,
and the code is quite a bit sparser for semantic content over a given number
of LOC.

This is as close as I can come to providing an honest, impartial comparison
of these languages for the purpose of rewriting existing code to be more
maintainable, robust, and correct. I definitely think the effort is
worthwhile, but not entirely for the reasons I had originally anticipated.


- David McClain, Sr. Scientist, Raytheon Systems Co., Tucson, AZ