Date: Fri, 23 Sep 1994 20:49:27 -0500
From: gasarch@cs.UMD.EDU (William Ian Gasarch)
To: rivest@theory.lcs.mit.edu
Subject: coltbib README update

******************************************************************************
COLT MAILING LIST (colt@cs.uiuc.edu)
The following message was received for distribution to subscribers of the colt mailing list.
******************************************************************************

This is being sent in late September 1994.

I am sending this update ahead of schedule because there are some changes:

1) Information about them is in a file called bibinfo, which is part of the package you can access, OR you can just ftp that file separately.
2) We are using a program, bibclean, in our software. It made our software more maintainable and shorter but should have no effect on the user.
3) We are no longer supplying coltbib.dvi.Z. We are instead supplying coltbib.tex, from which you can create coltbib.dvi.Z. dvi files do not email that well. If enough people object to this change, let me know and I'll change it back.
4) I've updated the list of available files in this README file. It's in the section called WHAT FILES ARE AVAILABLE?

--------------------------------------

REQUEST: PLEASE check papers that YOU wrote that may require updated references. Send updates by just submitting new entries (I will take care of duplicates). I would like updates by OCTOBER 30 to put into the next version.

PLEASE look at the file LOOKUP and if you have any info to clarify those entries please let me know.

____________________________________________________________

HISTORY AND OVERVIEW

This bibliography, an ongoing project whose goal is to offer a reasonably complete database of publications in computational learning theory, came into existence in 1993 after the publication of the Computational Geometry Column in the Spring 1993 SIGACT News, describing the bibliography maintained by that community.
At the suggestion of Lenny Pitt, Bill Gasarch and Anselm Blumer volunteered to begin a similar effort for computational learning theory. Dana Angluin, Sanjay Jain, Phil Long and Ron Rivest provided invaluable help by allowing their existing online bibliographies to be merged to provide a starting point. The tables of contents from all the COLT conferences were then added, as well as those papers from STOC and FOCS which seemed the most relevant. Gisli Hjaltason provided invaluable help by programming awk scripts to put everything into a standard format and find duplicates.

WHAT FILES ARE AVAILABLE?

Several files are available via anonymous ftp or email. (The next section says HOW to access them.) The files available are the following.

1) colt.bib.Z. This is a compressed bibtex file. If you decompress it (command: uncompress colt.bib.Z) then you will have a file colt.bib, which is a latex biblio file.
2) coltbib.tex. This is a short tex file. If you have the colt bib file in a file named colt.bib, and you latex coltbib.tex, then you will get a file that, if printed out, would have EVERY ENTRY as it would appear in a bibliography. You probably do not want to print this out, but you might want to preview it.
3) coltbib.ps.Z. This is a compressed file that when decompressed is a postscript file that, when printed out, gives you all the entries as they would appear in a bibliography.
4) authority. This file contains the conventions that we use in naming conferences and journals in the bibliography file. (e.g. FOCS conference is referred to by Proc. Nth Annu. IEEE Sympos. Found. Comput. Sci.)
5) This README file.
6) coltall. One file that when executed produces the five files above. To execute the file type
       sh coltall
   or type
       coltall
7) SOFT is a directory that contains our software and misc files. There is no real need to look in this directory and it is not organized for public use. It's only there to make the job of maintaining this bib easy.
8) pick- a tool for looking through the bib
9) tools- a file with some advice on where to find more sophisticated tools.
10) bibinfo- a file with some advice on finding even more sophisticated tools that are becoming standard.
11) LOOKUP- a file of entries, in pairs, that I am trying to find more about so I can better classify them.

The file you will be working with the most is colt.bib. You are encouraged to use it actively, make additions and corrections, and send the changes to coltbib@cs.umd.edu for integration into a new revision. After unpacking the file you should preserve the original colt.bib source file in some read-only form (called oldcolt.bib, say) and make a writeable copy for your actual changes. Later on, you should email me your changes by emailing me either entries that are not in the file or updated versions of entries that are in the file. (Details in next section.)

HOW TO ACCESS

You can access these files in two ways.

1) Using ftp. Establish an anonymous ftp connection to cs.umd.edu, then change directories to pub/coltbib, then retrieve colt.bib.Z with ftp in binary mode. (This site is on Eastern Standard Time and although we place no restrictions on access time, the hours at which you can retrieve files efficiently may vary with the load on intervening networks.)

    $ ftp cs.umd.edu
    Name: anonymous
    Password: yourname@yoursite
    ftp> cd pub/coltbib
    ftp> dir
    total 924
    -rw-r--r--  1 gasarch    28679 Dec 10 17:57 README
    drwxr-xr-x  2 gasarch     1024 Dec 10 17:56 SOFT
    -rw-r--r--  1 gasarch     7342 Nov 19 15:14 authority
    -rw-r--r--  1 gasarch    87055 Nov 19 15:14 colt.bib.Z
    -rwxr-xr-x  1 gasarch   518953 Nov 19 15:17 coltall
    -rw-r--r--  1 gasarch   112879 Nov 19 15:14 coltbib.dvi.Z
    -rw-r--r--  1 gasarch   149048 Nov  4 15:13 coltbib.ps.Z
    ftp> binary
    ftp> get colt.bib.Z
    ftp> quit

(you could get any of these files this way)

Unpack it with

    uncompress colt.bib.Z

If you get the file coltall then put it into an empty directory and execute it.
It will produce all the files in compressed form.

2) Using email. If you mail to coltbib@cs.umd.edu then you will get the file coltall in return. If you want all five files or if you cannot ftp then this is the best way to get them. If you just want colt.bib then ftp is better to use.

WARNING: Standard bibtex comes with a limit of 750 entries per bibliography, which is laughably small for us. People will be able to produce hardcopy according to their own tastes only once they get their local bibtex reconfigured.

HOW TO SUBMIT

Email new entries or updated versions of old entries to coltadd@cs.umd.edu. They will be integrated into the next version. What forms are acceptable for entries is discussed below.

We will be updating this bibliography on a regular schedule (perhaps every four months). We will email the colt mailing list a short while before each deadline to announce the upcoming update, though updates can be sent at any time.

BIBLIOGRAPHERS

This bibliography is and always has been a product of the good will of volunteers from our community. An essential idea of this project is that it ought to be a social effort, and that everyone who is using this bibliography should feel willing to contribute his or her share in improving it.

The following are the volunteers who have helped produce this electronic bibliography, whether in getting it up and running, or converted to bibtex, or continuing to improve its coverage and accuracy.

    Dana Angluin, Anselm Blumer, Bill Gasarch, Gisli Hjaltason, Sanjay Jain, Bill Jones, Phil Long, Joseph O'Rourke, Ron Rivest

However, if the project is to continue, it can only do so through widespread participation. We ask that, if you are using this bibliography, find it helpful, and wish it to carry on, you ``pay forward'' a fraction of the time which it saves you by joining us and contributing updates as described herein.

CREATING ENTRIES

What goes in?
Papers relevant to computational learning theory, which for us means the study of the computational complexity of well-defined learning problems. Thus we are talking algorithms, data structures, analysis of time and storage, lower and upper bounds, etc. We interpret relevance in a rather broad sense, although we prefer that references from cognate areas (such as statistics, recursion theory, psychology of learning, etc.) emphasize books or survey articles rather than individual papers, and that these be included only if they would be referenced by several different papers in the bibliography. In the end, your judgement as a working researcher decides what is relevant and worth inclusion. (A pragmatic test: have you cited or would you cite the item in your own papers?) Future maintenance is easiest if you include only papers which are "stable"; i.e. published and openly available at least in the form of a numbered techreport, and preferably in a conference proceedings. However, it is okay to include preprints too. If the paper is slated to appear somewhere else, that information can usefully annotate the entry for an existing appearance. Mary-Claire van Leunen's book _A Handbook for Scholars_ (Knopf, New York, 1979) suggests that for utmost scholarship nothing short of the original title page should be trusted: "To write a reference, you must have the work you're referring to in front of you.... The temptation to write a reference without having the work before you will be powerful. Resist it. A vague recollection is worthless; a vivid recollection is probably the result of your imagination --- ingenious, no doubt, but of little use to your reader. Don't rely on your memory.... If you must not rely on your own memory, even less should you rely on someone else's. If your only access to a reference is through a secondary source, then you must refer to the secondary source as well as the primary one." 
We are less concerned with the sheer volume of what you add to the bibliography than with its accuracy and relevance. But please bear in mind that there is a minimum overhead of at least an hour to process each submission in the merging process, making larger submissions more efficient than tiny ones. Coordinating your changes with those of other colleagues, grad students, and so on at your site before sending is greatly appreciated.

We are always open to suggestions on how to capture data with best efficiency and least overlap, but at the moment would suggest the following approach:

1) use the bibliography as a bibtex database when typesetting references for your own papers, so that adding entries and making corrections can happen as a natural side effect of your own work.
2) during that process, you will likely wind up referring to papers from some conference or journal year which isn't known to have been covered by the bibliography (see the list below). It would be very helpful if you took the time for at least one such paper to check through the whole volume and ensure that all relevant learning papers have been incorporated in the bibliography. (This doesn't take so long as you might think: by keeping an entry template with repetitive details ready in your editor, within an hour you can enter a full conference of about 50 papers.)
3) please look carefully at entries for papers written by you, or by people at your institution, to ensure their correctness. No one else can do this more accurately or more efficiently.
4) if you are caught up with current events and looking for a pastime, you can work on something from our open problems list; or you can check back through unexamined years of a journal or conference to ensure that all relevant papers are included, correct, and keywordized.
FORMATTING ENTRIES

Because of the distributed nature of updates, it seems desirable to have some written guidelines for the format of entries, in order that the final product have a consistent style. The following suggestions are based on common practice where discernible, established authorities where possible, and personal opinion where unavoidable. You will likely find existing entries in disagreement with these guidelines. Either the entry or the guidelines should be fixed. If some entry can't be decently handled by the current guidelines, or you think they're just plain wrong in any case, please let us know about it.

In the hope of keeping future input work reasonably simple and error-free, a few lexical conventions were set at the time of bibtex conversion, as follows. Where possible you should use lower case (for simplicity), a leading comma ``, volume = 12'' (to make missing commas obvious), and put all text for a field on a single line (to avoid spending time on prettyprinting). If you must break lines, such as in the abstract= or annote= or comments= or note= fields, start subsequent lines with a tab.

Special characters and diacriticals should be entered as specified on p.52 of the TeXbook. The common single-letter ones are described below.

    \'   acute          sup{\'e}rieur    {\'O}'D{\'u}nlaing   [\*' in troff -ms]
    \`   grave          probl{\`e}me     Bruy{\`e}re          [\*` in troff -ms]
    \^   circumflex     m{\^e}me         Lema{\^i}tre         [\*^ in troff -ms]
    \~   tilde          ma{\~n}ana       N{\'u}{\~n}ez        [\*~ in troff -ms]
    \v   hacek          h{\'a}{\v c}ek   Matou{\v s}ek        [\*C in troff -ms]
    \c   cedilla        fran{\c c}ais    {\'S}wi{\c a}tek     [\*, in troff -ms]
    \"   umlaut         f{\"u}r          G{\"u}ting           [\*: in troff -ms]
    \H   Hung. umlaut   Erd{\H o}s

Note that diacriticals precede the letter affected.

A complication is that in TeX, control sequences specified using letters must somehow be separated from the ordinary letters that follow. A simple way is to use spaces as in "Erd\H os", but this will look like two separate words to bibtex.
Another is to use braces as in "Erd\H{o}s", but this too is confounded by bibtex, which (1) normally wants to decapitalize text in titles not protected by braces, to support variant capitalization styles, and (2) will interpret an umlaut \" as the end of a quoted string, unless specially protected. Initially it might seem enough to put braces around the whole word when it contains either a fussy diacritical or (in a title field) a capital letter. However, it turns out that, because of how it handles the author field, bibtex dictates the convention to follow. Since adding a feature to recognize and handle accented characters in author fields (for benefit of the alpha bibliography styles), bibtex requires that we "place the entire accented character in braces; in this case either {\"o} or {\"{o}} will do .... furthermore these braces must not themselves be enclosed in braces (other than the ones that might delimit the entire field or the entire entry), and there must be a backslash as the very first character inside the braces". Thus you should use, for example, {\'O}'D{\'u}nlaing, Matou{\v s}ek, G{\"u}ting, and Erd{\H o}s, and we recommend that for consistency you treat all accents this way in whatever bibtex fields they appear. However you will further have to embrace the whole of any capitalized name that appears in a title field. Such is life with bibtex. Mathematical expressions, including numbers in titles, should always be entered in TeX notation. Author, title, and page information from other than the title page of the paper itself is untrustworthy: you might want to do data entry from a proceedings table of contents for speed, but please take time to proofread against title pages for accuracy. Below is a quick naming of parts for entries in the database, with discussions of the conventions that have evolved. More detailed information on entry formats can be found in the bibtex documentation. 
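
As an aside on the accent convention recommended above (the whole accented character in braces, with the backslash first), here is a minimal Python sketch for contributors whose source data is in Unicode rather than TeX notation. It is not part of the coltbib software; the function name to_bibtex_accents and the ACCENTS table are hypothetical, and only the eight single-letter accents from the table above are handled. Letters carrying more than one mark, and letters with no Unicode decomposition, are left for hand editing.

```python
import unicodedata

# Unicode combining marks mapped to the TeX accent commands from the table
# in this README. Letter-named commands get a trailing space, as in {\v s}.
ACCENTS = {
    "\u0301": "\\'",   # acute
    "\u0300": "\\`",   # grave
    "\u0302": "\\^",   # circumflex
    "\u0303": "\\~",   # tilde
    "\u030c": "\\v ",  # hacek
    "\u0327": "\\c ",  # cedilla
    "\u0308": '\\"',   # umlaut
    "\u030b": "\\H ",  # Hungarian umlaut
}


def to_bibtex_accents(text):
    """Rewrite accented letters as {\\cmd letter}, one brace group per letter."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    i = 0
    while i < len(decomposed):
        ch = decomposed[i]
        nxt = decomposed[i + 1] if i + 1 < len(decomposed) else ""
        if nxt in ACCENTS:
            # Base letter followed by a known combining mark.
            out.append("{" + ACCENTS[nxt] + ch + "}")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_bibtex_accents("Erd\u0151s"))     # Erd{\H o}s
    print(to_bibtex_accents("Matou\u0161ek"))  # Matou{\v s}ek
```

In a title field you would still have to brace the whole of any capitalized name by hand, as noted above.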
Entry type: we ignore some of the fine distinctions available in bibtex and map most everything onto the types of article, book, inbook, incollection, inproceedings, mastersthesis, phdthesis, and techreport. Preprints (a.k.a. "Manuscripts") are considered unnumbered techreports for our purposes, since they are often later distributed in that form. If present entries in the bibliography are any guide, you should rarely need other entry types. In particular, note that low-grade items like personal communications should not be included since our charter is to cover only openly available materials. If you need such an entry in your papers' reference lists, please keep it in a supplementary bibliography file until it is published. For example, "\bibliography{mine,mygroup,geom}" specifies a search path of three files bibtex can use to satisfy references.

Citation tag: it's easy to come up with citetags that are mnemonic, short, or unique, but not to have all three at the same time. The system we use is a compromise. Our citetags consist of an author part (first letter of surname of each author), a title part (first letter or digit string of each significant word in the title, up to 7 characters), and a year part (last two digits of year of publication), separated by dashes. Thus "J. O'Rourke, Art Gallery Theorems and Algorithms, 1987" reduces to "o-agta-87".

The tricky part of this is how to define "significant" words of the title, particularly when punctuation and mathematical strings are involved. Here are the formal rules, which are intended to produce a commonsense result as often as possible:

- remove articles, conjunctions, and prepositions (i.e.
  words which wouldn't be capitalized in a title)
- convert Roman numerals to Arabic, remove diacriticals and braces, remove quotes and apostrophes, convert other punctuation to spaces
- retain only the first alpha/numeric token within $...$ delimiters
- take the first letter, or first digit string, of remaining words
- take the first 5 characters so produced

For just under 99% of entries the citetag generated by this procedure will not conflict with that of any existing entry. But if it does, you'll have to find some way to break the tie. In our experience, collisions at this stage have come about only from various forms of the ``same paper'' conflicting with each other, and one of the following tiebreaking rules suffices:

- for multipart papers, add the part number, in Arabic, to the title field (c-lbors1-90, c-lbors2-90)
- for other variations on a theme, add a letter from a distinguishing word to the title field (s-mmdpsl-90, s-mmdpsp-90)
- for alternate publications of a paper, append "a" for article, "i" for incollection or inbook or inproceedings, or "t" for techreport, to the year field (kkt-ptots-90i, kkt-ptots-90t)
- otherwise, punt and discriminate using whatever you can (g-gramq-cga-88, g-gramq-edbt-88)

If the resulting citetags don't match your favourite descriptor for the reference, you can still use the old familiar version if you declare a mapping between the two in your TeX source, such as the following:

    \newcommand{\smawk}{akmsw-gamsa-87}

You needn't go through the process for entries you don't cite: the merging software will automatically generate citetags for contributed entries lacking them. It also keeps entries in the colt.bib file sorted in order of author, title, and year, in order to bring various appearances of the same paper together. (This should match the order of the default softcopy.)

Fields: we support all fields from the bibtex standard styles, plus a few common extensions like abstract=, annote=, isbn=.
Fields cites=, comments=, keywords=, precedes=, succeeds=, and update= are our own. Conventions for entering all of these are as follows.

Quotes aren't necessary when the field value is entirely digits (true for volume, number, and year, normally). You should use the empty string "" for fields you can't complete just yet (e.g. pages = "" for a conference or journal paper to appear). Some older entries use a visible placeholder like "??"; if you need to cite them, please fill in the hole, or change to "", as feasible.

abstract: verbatim from the original item (optional)
    - little used in present entries, it's best added only when short and sweet
address: city of publication
    - use for books, techreports, theses, and obscure irregular conferences; otherwise discouraged
    - use only first city if publisher lists several
    - add two-letter state/province codes for US/Canada cities; for others, add country
    - give English-language name, with correct diacriticals (e.g. Munich rather than M{\"u}nchen, Saarbr{\"u}cken rather than Saarbruecken)
    - city/country names to use are those in effect at time of publication (e.g. West and East Germany from 1949 until 3 October 1990, Germany thereafter)
annote: explanatory or critical comment on item content (optional)
    - little used in present entries, but welcomed
author:
    - separate multiple author fields with " and ", order same as in reference
    - author's names in name, initials order
    - use braces to enclose capitalized or comma-separated elements of a compound surname, e.g. {Van Wyk} or {Lipski, Jr.}
    - instead of full given names you may follow the custom of mathematical literature and use initials, space-separated (exceptions to avoid collision: Ta. Asano, Te. Asano)
    - [van Leunen p.155] by "strict and narrow propriety" we should cite precisely the name which appears on the item, even if it leads to irregularities.
      While it is reasonable to fix up such typographical glitches (attributable to coauthors, copy editors, and the pressure of deadlines) as you are certain the author would want you to, inconsistent practice is for the author and not the bibliographer to worry about.
booktitle: title of book or proceedings containing item
    - for English items, capitalize first word, first word after a colon, and all other words except articles and unstressed conjunctions and prepositions. Otherwise follow capitalization conventions of the native language, if you know them. (According to the MLA Handbook, for French, German, Italian, Latin, and Spanish, capitalization in titles is the same as in normal prose.) There is no need for braces on capitalized words in this field.
    - abbreviations for some popular conferences are in the authority file. The merging software will recognize and convert most variant abbreviations to standard form.
chapter: chapter or section number, where item is part of a monograph
    - use entry type of incollection if chapter has its own title, inbook otherwise
cites: citations made by item (optional)
    - give as list using biblio citetags, such as
      cites = "bs-dcms-76, gjpt-tsp-78, o-agta-87"
    - needn't be an exhaustive copy of the item's citations, but if used should at least give the significant ones. You can say
      cites = "(complete) bs-dcms-76, ..."
      if the list is exhaustive.
comments: bibliographic marginal notes
    - supplemental information not a part of the reference proper: notes on an item's source language, or relation to other items, or a UMI order number and page count, or a Computing Reviews or Math Reviews number...
    - separate multiple comments with a semicolon
    - "to appear in", "submitted to", "in press" and the like require fixup later, at which time changes in other info such as title (and thus label) tend to be overlooked; so please use these only as comments on the future of an entry already published in some form
edition: of a book
    - use numbered ordinal, e.g. "2nd"
editor:
    - editors of proceedings not needed, and discouraged
    - otherwise, use guidelines for author
institution: publisher of a techreport
    - include any relevant department, and list in minor-to-major order (e.g. "Dept. Comput. Sci., Univ. California Berkeley")
isbn: of book (optional)
    - worthwhile only for obscure or otherwise hard-to-find items
    - give with hyphens as specified by publisher
journal:
    - abbreviations for some popular journals are in the authority file. The merging software will recognize and convert most variant abbreviations to standard form.
    - separate journal series are considered separate journals, e.g. journal = "J. Combin. Theory Ser. A" rather than series or volume A
keywords:
    - use to supplement, for searching or descriptive purposes, terms already present in the item's title
    - separate multiple keywords with commas
    - keywords need only be attached to the newest of a paper's appearances, if identical for all
    - use those in authority file, by preference
    - additions to the list of keywords are expected and welcomed, within reason
month: month of publication
    - encouraged for techreports and theses, discouraged otherwise
    - use bibtex standard abbreviations (three letters, lower case, no quotes)
note:
    - use for supplemental information which should appear in a citing paper's reference list; otherwise use comments field
    - e.g. note = "Errata in 2(1981), 105"
    - for theses, give techreport type and number, if known, e.g.
      note = "Report TR-86-103"
number: of techreport, work in a series, or issue of journal
    - essential for true techreports (nolle techreportum sine numeratum)
    - for journals, necessary iff there exists more than one "page 1" per volume (e.g. proceedings as separately-paginated issue of journal), and discouraged otherwise
    - use "--" for combined issues, e.g. "3--4"
pages:
    - use double dash "--" in a number range
precedes/succeeds: pointer lists for temporal relationships among entries
    - for example
        precedes = "oy-nubks-88"    points to new & improved paper
        succeeds = "k-cmgcl-77"     is backpointer from it
publisher:
    - see authority file for standard names of some publishers
school: granting degree, for thesis
    - include any relevant department, since this assists inquiries about availability or contents, and list in minor-to-major order (e.g. "Dept. Comput. Sci., Univ. California Berkeley")
series: of books
    - e.g. "Lecture Notes in Computer Science"
title: of item
    - for English non-books, you need only capitalize first word and proper names, and enclose latter capitalized words in braces so that bibtex will leave them alone. If you prefer, you may capitalize other words in a title to get full uppers & lowers capitalization, but take care not to embrace them. (Full capitalization is optional because it's more complex, and we know of no journals still requiring it.)
    - omit qualifiers like "(extended abstract)"
    - [van Leunen p.170] regardless of the style of the original, use colon to separate title from subtitle (edit if necessary): for example change "Serial science. {I}. Definitions" to "Serial science, {I}: Definitions"
    - otherwise "correct" only what you're certain the author would want you to
    - enclose math expressions (including numbers) in {$ ... $}, and express in TeX notation
type: of techreport or thesis
    - e.g. "Technical Report" or "Manuscript" or "M.{Phil}. Thesis"
    - for theses, give the actual degree name, and supplement with keywords "master thesis" or "doctoral thesis" accordingly
    - if the thesis was distributed as a numbered report, then give its type and number in the note= field
    - capitalized words after the first need braces in this field
update: date and bibliographer corresponding to last change
    - maintained by the merging software (so don't bother)
volume: in journal or multivolume book
year: year of publication
    - "to appear", "submitted", "in press", etc? See the comments field.
'%' lines before entries: marginal comments wrt bibliography maintenance
    - use to flag entries with errors you can't fix just now ("% wrong volume number"), or to flag truthful data that may look erroneous ("% yes, ``connexion''")
    - the string "###" can be used to call attention where you believe something is missing or wrong. Feel free to fix such entries if you have the correct details handy.

MISCELLANEOUS COMMENTS and OPEN PROBLEMS

Johnson's STOC/FOCS index issued in 1991 contains a wealth of information on papers originally published through those conferences. Look to it first if you need supplementary information on such a paper.

Standard bibtex comes with a limit of 750 entries per bibliography, which is laughably small for us. People will be able to produce hardcopy according to their own tastes only once they get their local bibtex reconfigured.

bibtex (or at least its standard style files) has some fixed ideas about how things can be published; for instance, the following won't print completely and elicits a warning on the grounds that a proceedings cannot have both volume and issue numbers (i.e. be published as a journal issue). However SIGGRAPH's position is that its papers are published *in* a proceedings and that the proceedings is *in* an issue of the journal.

    @inproceedings{awg-psg-78
    , author = "P. Atherton and K. Weiler and D. P. Greenberg"
    , title = "Polygon shadow generation"
    , booktitle = "Proc. SIGGRAPH '78"
    , journal = "Comput. Graph."
    , volume = "12"
    , number = "3"
    , year = "1978"
    , pages = "275--281"
    , oldlabel = "geom-20"
    }

Patashnik's Bibtexing notes suggest that we ``don't take the field names too seriously'' and adjust the entry until it prints right. For the moment, entries of this sort (the only ones are from SIGGRAPH) have been given the article type, and the booktitle field won't print. But if SIGGRAPH's position is correct, then it is bibtex which should adapt in the long run. (Perhaps bibtex might better have used a general inclusion field ``in = "siggraph78"'' rather than crossref with its hardwired list of permitted situations.)

BIB SOFTWARE AVAILABLE ELSEWHERE

The computational geometry community has made some programs available for working with bibliographies. They can be retrieved via anonymous ftp to cs.usask.ca [128.233.128.5], in file pub/geometry/geombib.tar.Z. They include programs to translate between `bibtex' format and the older `refer' format, and programs to search quickly in large bibliographies.

A new program "bibview", which is an X tool for manipulating bibtex databases, has been published in comp.sources.x/v18i099. Its README says it "supports the user in making new entries, searching for entries and moving entries from one bib to another. It is possible to work with more than one bib simultaneously. bibview is implemented with Xt and Athena Widgets." bibview is available for ftp from (among other places) comp.sources.x archive sites, from cs.orst.edu:/pub/src/printers/bibview.tar.Z, and from dsrbg2.informatik.tu-muenchen.de:/pub/tex/bibview-1.0.tar.Z.
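
APPENDIX: A CITETAG SKETCH

For contributors who want to predict the citetag of a new entry, the rules given under "Citation tag" in FORMATTING ENTRIES can be sketched in Python roughly as follows. This is an illustration only and is not the merging software. The function names, the STOPWORDS list (the README does not enumerate the articles, conjunctions, and prepositions it drops), and the restriction of Roman numerals to upper-case I, V, and X are all assumptions made for this example. Tiebreaking for colliding tags is not attempted, and titles are assumed to use TeX notation for accents rather than raw non-ASCII letters.

```python
import re

# Assumed stopword list; the README gives the rule but no explicit list.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "nor", "for", "of", "on", "in",
    "to", "at", "by", "with", "from", "into", "via", "as",
}

# Assumption: only numerals built from upper-case I, V, X are converted.
ROMAN = {"I": 1, "V": 5, "X": 10}


def roman_to_arabic(token):
    """Return the value of an upper-case I/V/X numeral, or None otherwise."""
    if not re.fullmatch(r"[IVX]+", token):
        return None
    total = 0
    for ch, nxt in zip(token, token[1:] + " "):
        value = ROMAN[ch]
        # A smaller numeral before a larger one is subtracted (IV = 4).
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total


def strip_tex(text):
    """Drop TeX accent commands, braces, quotes, and apostrophes."""
    text = re.sub(r"\\(?:[`'^~\"]|[cvH](?=[\s{]))\s*", "", text)
    return re.sub(r"[{}\"'`]", "", text)


def surname_initial(name):
    """First letter of the surname in 'Surname, Inits' or 'Inits Surname'."""
    name = strip_tex(name).strip()
    surname = name.split(",")[0] if "," in name else name.split()[-1]
    for ch in surname:
        if ch.isalpha():
            return ch.lower()
    return ""


def title_part(title, limit=5):
    """First letter or digit string of each significant word, truncated."""
    def first_token(match):
        # Keep only the first alphanumeric token of a $...$ span.
        found = re.search(r"[A-Za-z0-9]+", match.group(1))
        return " " + found.group(0) + " " if found else " "

    text = re.sub(r"\$([^$]*)\$", first_token, title)
    text = strip_tex(text)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # other punctuation to spaces
    chars = []
    for word in text.split():
        if word.lower() in STOPWORDS:
            continue
        value = roman_to_arabic(word)
        if value is not None:
            word = str(value)
        digits = re.match(r"\d+", word)
        chars.append(digits.group(0) if digits else word[0].lower())
    return "".join(chars)[:limit]


def citetag(authors, title, year):
    """Build author-title-year; `authors` uses bibtex ' and ' separators."""
    author_part = "".join(surname_initial(a) for a in authors.split(" and "))
    return "{}-{}-{:02d}".format(author_part, title_part(title), int(year) % 100)


if __name__ == "__main__":
    # The worked example from the Citation tag discussion.
    print(citetag("J. O'Rourke", "Art Gallery Theorems and Algorithms", 1987))
```

Run on the README's own example, this prints o-agta-87, and the SIGGRAPH entry shown above comes out as awg-psg-78. Treat any tag it produces as a guess: the merging software has the final say, and it generates tags for contributed entries that lack them.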