http://people.csail.mit.edu/jaffer/DBManifesto

Database Manifesto

Most nontrivial programs contain databases: Makefiles, configure scripts, file backup, calendars, editors, source revision control, CAD systems, display managers, menu GUIs, games, parsers, debuggers, profilers, and even error reporting are all rife with databases. Coding databases is such a common activity in programming that many may not be aware of how often they do it.

A database often starts as a dispatch in a program. The author, perhaps because of the need to make the dispatch configurable, the need for correlating dispatch in other routines, or because of changes or growth, devises a data structure to contain the information, a routine for interpreting that data structure, and perhaps routines for augmenting and modifying the stored data. The dispatch must be converted into this form and tested.

The programmer may need to devise an interactive program for enabling easy examination and modification of the information contained in this database. Often, in an attempt to foster modularity and avoid delays in release, intermediate file formats for the database information are devised. It often turns out that users prefer modifying these intermediate files with a text editor to using the interactive program in order to do operations (such as global changes) not forseen by the program's author.

In order to address this need, the conscientious software engineer may even provide a scripting language to allow users to make repetitive database changes. Users will grumble that they need to read a large manual and learn yet another programming language (even if it almost has language "xyz" syntax) in order to do simple configuration.

All of these facilities need to be designed, coded, debugged, documented, and supported; often causing what was very simple in concept to become a major developement project.

This view of databases just outlined is somewhat the reverse of the view of the originators of the Relational Model of database abstraction. The relational model was devised to unify and allow interoperation of large multi-user databases running on diverse platforms. A fairly general purpose "Comprehensive Language" for database manipulations is mandated (but not specified) as part of the relational model for databases.

One aspect of the Relational Model of some importance is that the "Comprehensive Language" must be expressible in some form which can be stored in the database. This frees the programmer from having to make programs data-driven in order to use a database.

This package includes as one of its basic supported types Scheme expressions. This type allows expressions as defined by the Scheme standards to be stored in the database. Using slib:eval retrieved expressions can be evaluated (in the top-level environment). Scheme's lambda facilitates closure of environments, modularity, etc. so that procedures (which could not be stored directly in most databases) can still be effectively retrieved. Since slib:eval evaluates expressions in the top-level environment, built-in and user defined procedures can be easily accessed by name.

This package's purpose is to standardize (through a common interface) database creation and usage in Scheme programs. The relational model's provision for inclusion of language expressions as data as well as the description (in tables, of course) of all of its tables assures that relational databases are powerful enough to assume the roles currently played by thousands of ad-hoc routines and data formats.

Such standardization to a relational-like model brings many benefits:

Tables, fields, domains, and types can be dealt with by name in programs.
The underlying database implementation can be changed (for performance or other reasons) by changing a single line of code.
The formats of tables can be easily extended or changed without altering code.
Consistency checks are specified as part of the table descriptions. Changes in checks need only occur in one place.
All the configuration information which the developer wishes to group together is easily grouped, without needing to change programs aware of only some of these tables.
Generalized report generators, interactive entry programs, and other database utilities can be part of a shared library. The burden of adding configurability to a program is greatly reduced.
Scheme is the "comprehensive language" for these databases. Scripting for configuration no longer needs to be in a separate language with additional documentation.
Scheme's latent types mesh well with the strict typing and logical requirements of the relational model.
Portable formats allow easy interchange of data. The included table descriptions help prevent misinterpretation of format.

Unresolved Issues for the SLIB Relational Model

Although `rdms.scm' is not large, I found it very difficult to write (six rewrites). I am not aware of any other examples of a generalized relational system (although there is little new in CS). I left out several aspects of the Relational model in order to simplify the job. The major features lacking (which might be addressed portably) are views, transaction boundaries, and protection.

Protection needs a model for specifying priveledges. Given how operations are accessed from handles it should not be difficult to restrict table accesses to those allowed for that user.

The system catalog has a field called view-procedure. This should allow a purely functional implementation of views. This will work but is unsatisfying for views resulting from a selection (subset of rows); for whole table operations it will not be possible to reduce the number of keys scanned over when the selection is specified only by an opaque procedure.

Transaction boundaries present the most intriguing area. Transaction boundaries are actually a feature of the "Comprehensive Language" of the Relational database and not of the database. Scheme would seem to provide the opportunity for an extremely clean semantics for transaction boundaries since the builtin procedures with side effects are small in number and easily identified.

These side-effect builtin procedures might all be portably redefined to versions which properly handled transactions. Compiled library routines would need to be recompiled as well. Many system extensions (delete-file, system, etc.) would also need to be redefined.

There are 2 scope issues that must be resolved for multiprocess transaction boundaries:

Process scope: The actions captured by a transaction should be only for the process which invoked the start of transaction. Although standard Scheme does not provide process primitives as such, dynamic-wind would provide a workable hook into process switching for many implementations.
Shared utilities with state: Some shared utilities have state which should not be part of a transaction. An example would be calling a pseudo-random number generator. If the success of a transaction depended on the pseudo-random number and failed, the state of the generator would be set back. Subsequent calls would keep returning the same number and keep failing. Pseudo-random number generators are not reentrant; thus they would require locks in order to operate properly in a multiprocess environment. Are all examples of utilities whose state should not be part of transactions also non-reentrant? If so, perhaps suspending transaction capture for the duration of locks would solve this problem.

I am a guest and not a member of the MIT Computer Science and Artificial Intelligence Laboratory. My actions and comments do not reflect in any way on MIT.
	agj @ alum.mit.edu	Go Figure!