Substituting Images for Books: The Economics for Libraries

Michael Lesk



Does it yet make economic sense for a library to replace its books with electronic copies? How should this be done? This talk reviews the balance between the cost of building shelf space and the cost of scanning, both of which are now approaching $30/volume (from opposite sides). Scanning old books just about makes economic sense now, and should be widespread in a few years.

1. Introduction

The cost of scanning a 300 page book is about $30. The cost of building an on-campus space for such a book is almost as much. Every large library is always planning new space construction, since libraries keep buying books and don't throw them out. Over 30 North American libraries buy more than 100,000 books a year. Does it make sense for a library to try scanning books instead of building shelf space?

Basically, it is not quite economical yet. It will be soon. And it is already economical for multiple libraries to share an electronic copy; this will push libraries towards cooperation. Furthermore, the economic decisions involve not only inter-university cooperation, but also trading capital and operational costs; neither of these is historically a strong point of university budgeting.

The whole question of the relation of publishing, tenure and economics is also now up for discussion. Although this is basically beyond the scope of this paper, libraries have to be aware of the possibility that online publication may replace large numbers of the existing printed journals, with a resulting shift in the demand from their users. Online publication might also cause a change in the size or number of traditional journals, also impacting libraries [Odlyzko 1994]. As users shift to electronic materials, they acquire the skills and equipment needed to use them, and make it easier for libraries to convert other collections.

2. Electronic Acquisitions

Libraries today frequently buy bits, not paper; some spend half or more of their purchase funds on electronic materials. Most libraries now own CD readers and make various databases available, buying much new material on CD-ROM. Most libraries also use Internet terminals and access materials on the Web. In general, the material on the Web is not usually also available in print, whereas the CD-ROMs are usually versions of print publications.

An exception for the Web material is the availability of classic literature. Combining sources such as the Oxford Text Archive and Project Gutenberg, basic texts of many out-of-copyright literary works are now available online. The best scholarly edition may not be the one which can be found, but the works are normally available free. There are many literary works available in languages other than English as well; in fact the complete corpus of classic Greek literature is available through sources such as the Thesaurus Linguae Graecae and the Perseus CD-ROM [Crane 1991].

Many current print publications can be purchased on CD-ROM; these include, for example, the Adonis disks of biomedical journals [Stern 1990] and the IEEE journals on CD-ROM. Elsevier has suggested that in addition to the Tulip experiment (materials science journals) [McKnight 1993] they will make many other journals available in electronic form at about 10% over paper cost. Many corporate libraries, in areas such as pharmaceuticals, now spend more on electronic acquisitions than on paper acquisitions.

The availability of these sources of information is training users in the use of online information, just as the use of OPACs is training people in search systems. In fact, many people today in fields such as computer science and physics complain if information they want is not available online. They are no longer accustomed to using paper resources for most of their information needs. And many of them believe that electronics will make possible much cheaper access to information.

Similarly, many large collections in the humanities and social sciences are being digitized. Again, these range from university projects to commercial activities. Columbia University, on the one hand, is digitizing the drawings and photographs of the Avery Architectural Library, plus the files of the Rosenberg papers; on the commercial side, Corbis (a company controlled by Bill Gates) has bought the Bettmann archive of photographs, and the Oxford English Dictionary has been available in machine-readable form for about ten years.

Note that digitization can involve either preparing an ASCII version of the text, which can be searched and reformatted, or scanned images of the pages, which when displayed look just like the original form but cannot be searched directly. Although it is commonplace for the digitized material to be online and free, while the scanned material is on CD-ROM and charged for, this is not universal. The table below shows examples of all categories. Note that nonprofit material on the Net is usually provided free, while for CD-ROM distribution some charge is normally made to cover the costs of actually sending the disks out.

Examples of Machine-Readable Material

Nonprofit, online
  Text: Project Gutenberg: versions of many classic texts
  Images: American Memory files from the Library of Congress; Beowulf manuscript from the British Library
Nonprofit, CD-ROM
  Text: Data Collection Initiative: texts for use with linguistic research
  Images: Harvard Judaica posters
Commercial, online
  Text: Lexis/Nexis (Mead Data Central), full text books on Dialog and many other online vendors
  Images: AT&T-UCSF RightPages project with Springer-Verlag and other publishers
Commercial, CD-ROM
  Text: The Chadwyck-Healey English Poetry Database; many abstract services on CD-ROM
  Images: UMI's sales of IEEE publications on CD-ROM; many other publishers

Although every cell in this table is populated, some are much fuller than others. For example, there are relatively few vendors selling images on the Net; both the slow transmission time and the copyright risks discourage this form of presentation.

The development of the widespread use of computerized books has meant that libraries are considering whether or not to convert some of their collections. Does it make sense to take large numbers of old books and convert them? In addition to providing access to this material which might be preferred by many users, it also addresses two major problems of libraries. One is that books produced on acid paper (including most books published between 1850 and 1950) are deteriorating rapidly and must be converted to some alternate format. Another is that all major libraries are running out of space, and books are more compact in digitized form than in their original form. The problem libraries face is that they do not charge for their services, and if libraries provide the electronic books for free this produces no revenue to fund the conversion. And even if libraries were prepared to break with their traditional ethic and charge for access to electronic material (as they do for photocopying and many modern charged databases), it is not likely that much money can be raised by charging for access to old books, since the demand for them is low.

The most obvious bargain that can be struck is to trade the cost of conversion for the cost of building storage space. In this calculation, I will assume that all the books are published before 1920, and thus out of copyright in the United States. At the moment we have insufficient experience in obtaining copyright releases for post-1920 material to understand what the costs of doing this are likely to be. Even if publishers are prepared, as they often are, to waive fees for reproducing material that has been out of print for fifty years, the administrative costs of doing the paperwork are significant.

3. Scanning Costs

Bulk scanning is now relatively cheap. Consider two different models. One is based on relatively strong paper which can be fed through a stacker. Scanning costs for such material are relatively low, particularly the hardware costs. Today one can buy a double-sided 40ppm scanner for $5K. If it runs at full speed for half of one shift for two years, that would be 2000 operating hours, in which time it would scan about 5 million pages. If we paid $10/hour for someone to run the scanner, that would be another $40,000. Thus the cost would be about 1 cent per page, most of which is labor cost. This implies that the work should probably be done by a professional scanning operation, which can perhaps share the worker across several scanners and achieve greater efficiencies. In any case, at 1 cent per page the cost to scan a 300-page book would be about $3; other costs are likely to double that (fetching the book from the shelf, cutting the pages loose, etc.).

On the CORE project, we also scanned modern printing on strong paper. We converted four years of twenty chemical journals published by the American Chemical Society. The illustrations in these journals were overwhelmingly line drawings. As a result, we decided not to use grey-scale and scanned at 300 dpi, one bit per pixel. We could achieve about 15 pages per minute through our scanner (which cost about $15,000 back in 1990), and paid somebody about $12.50/hr to stand in front of it. Unfortunately for the economics, we only had about 300,000 pages to scan. As a result the scanner was idle most of the time, and the cost to us was perhaps $20,000 total or 7 cents per page, most of which was amortized hardware cost [Entlich 1996]. We also converted additional microfilm frames from earlier years of the journals, for which we paid a commercial scanning company about ten cents per page.
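The two per-page figures above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only. It assumes that the $40,000 labor figure in the stacker-fed model means paying the operator for the full 4000-hour shift, and that the CORE operator was paid only while the scanner was running; neither assumption is stated in the text.

```python
def cost_per_page(hardware_cost, pages_per_minute, scanning_hours,
                  hourly_wage, paid_hours):
    """Return (pages scanned, total cost in dollars, dollars per page)."""
    pages = pages_per_minute * 60 * scanning_hours
    total = hardware_cost + hourly_wage * paid_hours
    return pages, total, total / pages

# Stacker-fed model: $5K scanner at 40 pages/minute for 2000 hours.
# Assumption: $10/hour is paid for the whole 4000-hour shift.
stacker = cost_per_page(5_000, 40, 2_000, 10.00, 4_000)

# CORE project: $15,000 scanner at 15 pages/minute, 300,000 pages.
# Assumption: the operator is paid only for the hours spent scanning.
core_hours = 300_000 / (15 * 60)
core = cost_per_page(15_000, 15, core_hours, 12.50, core_hours)

for name, (pages, total, per_page) in [("stacker", stacker), ("CORE", core)]:
    print(f"{name}: {pages:,.0f} pages, ${total:,.0f}, "
          f"{100 * per_page:.1f} cents/page")
```

Under these assumptions the stacker-fed model comes to a little under 1 cent per page and the CORE project to between 6 and 7 cents per page, in line with the rounded figures quoted above. In the first case nearly all the cost is labor; in the second nearly all of it is hardware.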

If material has to be placed on a flatbed scanner, costs are much higher. The best study of this was the Cornell/Xerox CLASS project, which scanned 19th century mathematics books [Kenney 1992]. Cornell used a flatbed scanner for several reasons. One was that some of their books were sufficiently brittle to be unsuitable for a stack feeder. The other was that they used a different scanning method for pages with illustrations and needed an operator to select such pages and indicate the locations of the illustrations. This allowed them to optimize for high contrast on normal print but to keep enough information about grey level to handle photographs when those appeared. However, it meant that each page had to be examined manually. Today, given the speed of some grey level scanners, it is likely that a more efficient process is to scan everything with grey level and decide automatically whether there are pictures on the pages.

The two best measured reformatting projects are the just-mentioned Xerox/Cornell CLASS project and the Mellon/University of Michigan JSTOR project. In the CLASS project, funded by Xerox and the Commission on Preservation and Access, 500 old mathematics books were converted to digital image format and reprinted. The scanning was done initially at 400 dpi, 8 bits per pixel, but converted immediately to 600 dpi one-bit-per-pixel which matched the printing requirements of the Xerox Docutech printer. This allowed the equivalent of photocopies to be made using the intermediate online form. The CLASS project kept careful track of its costs and paid approximately $30 per book scanned for the actual scanning; there were additional overhead and reprinting costs which made the cost of a substitute book on the shelf between $50 and $60. No effort was made in CLASS to perform OCR on the material scanned. Instead, traditional catalog records were used to find the relevant books. CLASS did, of course, update the necessary records to indicate the preservation of these books (none of which had been reprinted or filmed).

The JSTOR project is now underway and is converting all pre-1990 issues of ten widely held journals in economics and history [Bowen 1995]. JSTOR is funded by the Andrew W. Mellon Foundation as one of its many admirable projects to assist libraries. The intent is to evaluate the practicality of substituting electronic versions of journals for paper backfiles, thus saving storage space. JSTOR is specifying scanning at 600 dots per inch, plus OCR and correction. They are being quoted about 20 cents per page for everything, or about $60 per 300-page volume.

Note that both CLASS and JSTOR used scanning at 600 dpi one-bit-per-pixel. This is becoming a typical spec for scanning high-contrast material. It may well be excessive. Fax scanning is 200x100 dpi, which is basically enough to read conventional text. We started scanning 4000 pages of chemical journals at 200 dpi, and the result was readable but sufficiently hard in a few places to cause us to switch to 300 dpi. At that resolution, the problem is screen space, not readability. However, since the primary scanning cost is not the machine, but the labor of handling the paper, using a better quality scan at 600 dpi is not likely to increase the total project cost very much. As a result, many librarians feel that they should make the higher quality scan, to decrease the chance that someone else will have to do it again later. Unfortunately, particularly for bad paper, it would be better to put the extra scanning bits into grey-scale rather than resolution. Figure 1 below shows a sample of a badly deteriorated document (a 1958 Union Pacific timetable) as scanned both in one-bit-per-pixel and grey level.

Obviously the grey bits make the result much more readable. If a bilevel result is wanted (e.g., for OCR) it can be made easily from the grey scan.
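The trade-off between resolution and grey-scale can be made concrete by counting raw bits. The Python sketch below assumes a hypothetical 6 by 9 inch page (the text gives no page size) and ignores compression. It also shows the simplest way to derive a bilevel image from a grey scan, a fixed threshold, which is an illustrative choice and not necessarily what any of the projects described here used.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Hypothetical 6 x 9 inch page (an assumption, not a figure from the text).
bilevel_600 = raw_bytes(6, 9, 600, 1)   # 600 dpi, one bit per pixel
grey4_300 = raw_bytes(6, 9, 300, 4)     # 300 dpi, 16 grey levels
grey8_300 = raw_bytes(6, 9, 300, 8)     # 300 dpi, 256 grey levels

def to_bilevel(grey_rows, threshold=128):
    """Threshold 8-bit grey pixels (0 = black, 255 = white).

    Returns rows of 1 (ink) and 0 (paper). A fixed threshold is the
    simplest possible rule; it is used here only for illustration.
    """
    return [[1 if pixel < threshold else 0 for pixel in row]
            for row in grey_rows]

sample = [[250, 240, 90, 30],
          [245, 120, 60, 200]]

print(bilevel_600, grey4_300, grey8_300)
print(to_bilevel(sample))
```

On this accounting a 600 dpi bilevel scan uses exactly as many raw bits as a 300 dpi scan with four bits of grey per pixel, so the choice between them is a choice of where to spend the same budget.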

The problem of identifying the images in scanned pages automatically has been solved several times. The CORE project solution is shown in Figure 2, which displays a sample column of text, along with several functions computed from it. The ``density'' column is simply the number of dark bits across each scan line. Notice that where there is text, this forms a regular series of peaks, one for each line of text. In the area of the illustration, it takes on irregular low values. The problem is to distinguish the regular bumps from the irregular ones; we do this with an autocorrelation function, taking the density function and multiplying it by itself after shifting one line spacing. This function, labeled ``Autocor,'' takes on high values where there are printed lines and low values where there are not. Note that it does not depend on having any of the characters recognized by OCR; in fact it is normally done on 100dpi-equivalent images to save processor time. It does, however, rely on pages which have been deskewed. We also have a deskewing algorithm which relies on finding the left edge of the page. This is not as generally applicable as some of the better algorithms which find the interline blanks, but it is adequate on the well-formatted chemical journals and faster.
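The density and shift-and-multiply steps can be sketched in a few lines of Python. This is not the CORE code. The synthetic column, the windowed sum used to smooth the products, and the cutoff value are all assumptions made for illustration; the text specifies only the density function and its product with a copy shifted by one line spacing.

```python
def density(rows):
    """Number of dark (1) bits on each scan line of a bilevel image."""
    return [sum(row) for row in rows]

def windowed_autocor(dens, spacing, window):
    """Sum of dens[j] * dens[j + spacing] over a window starting at each line.

    The windowed sum is an assumption of this sketch, added so that the
    score stays high between the peaks of a text region.
    """
    n = len(dens) - spacing
    products = [dens[j] * dens[j + spacing] for j in range(n)]
    return [sum(products[i:i + window]) for i in range(n)]

WIDTH, SPACING = 40, 10   # hypothetical column width and line spacing

def scan_line(dark):
    """One scan line with `dark` dark bits followed by blank paper."""
    return [1] * dark + [0] * (WIDTH - dark)

# Synthetic column: 50 scan lines of text (4 dark lines, 6 blank, repeated)
# followed by 50 scan lines of a sparse "illustration".
text = [scan_line(30 if i % SPACING < 4 else 0) for i in range(50)]
figure = [scan_line((i * i) % 7) for i in range(50)]

dens = density(text + figure)
score = windowed_autocor(dens, SPACING, SPACING)
is_text = [s > 1000 for s in score]   # cutoff chosen by eye for this data

print(score[0], max(score[50:]))
```

On this synthetic column the score stays high throughout the text region and collapses in the illustration, which is the behavior described for the Autocor function above. A real page would need the line spacing estimated from the data and a cutoff chosen accordingly.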

The result is shown in Figure 3, which includes a scanned page image with a figure marked out by the program.

We achieved fairly good accuracy at identifying the figures in the journals this way, using in addition a search for the explicit bitmap for ``Figure.'' Some of the less well identified chemical schemes were not identified properly, often having several schemes run together. Part of the problem is not just identifying the figure, but linking it to the actual text that invokes it so that a hyperlink can be established. One solution is to use thumbnails rather than unlabeled links in the final interface, so that the user can see when a mistake has been made isolating a graphic.

On balance, we see costs per volume ranging from $30 to $60, with a likelihood that the cost of scanning will decline to something less than $10 over time. In addition, of course, it is necessary to buy something on which to store the images, which will require 15 Mbytes per book; at present disk prices of 25 cents per megabyte that is about $4. With the rapid fall in disk prices (see Figure 4) the storage cost is much smaller than the scanning costs.

Offline storage is considerably cheaper than disks. The lowest cost is magnetic tape; 8mm cartridges hold 5 GB at a price of about 1/6 cent per Mbyte, or 3 cents per book. Recordable CD blanks cost about 1 cent per megabyte or 15 cents a book, and may well be a better choice since they have much greater durability than tape. However, offline storage of any form is likely to be primarily a backup system. Libraries in general can not afford either the cost of having staff retrieve and mount items in response to user requests or the risk of letting users do this themselves. Exceptions may exist for very lightly used material.
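The per-book storage figures for disk, tape and recordable CD all follow from the 15 Mbytes per book estimate. A short Python sketch of the arithmetic, using only the prices quoted above:

```python
MBYTES_PER_BOOK = 15  # estimate from the text

# Media prices in dollars per megabyte, as quoted in the text.
price_per_mbyte = {
    "magnetic disk": 0.25,
    "recordable CD": 0.01,
    "8mm tape": 0.01 / 6,   # about 1/6 cent per Mbyte
}

cost_per_book = {medium: MBYTES_PER_BOOK * price
                 for medium, price in price_per_mbyte.items()}

# A 5 GB cartridge, taken as 5000 Mbytes, holds this many books.
books_per_cartridge = 5000 // MBYTES_PER_BOOK

for medium, cost in cost_per_book.items():
    print(f"{medium}: ${cost:.3f} per book")
print(f"books per 8mm cartridge: {books_per_cartridge}")
```

This gives $3.75 per book on disk, 15 cents on recordable CD and 2.5 cents on tape, matching the rounded values in the text.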

4. Storage Costs

The other side of the equation, of course, is how much it costs a library to build a space on a shelf to put the book. There are two widely different kinds of costs, depending on whether the stacks to be built are on a crowded central campus or off somewhere out of sight. The offsite solution, often called the Harvard solution after the Harvard Depository, can use both much cheaper land and much cheaper construction techniques, since fancy architecture is not required. All construction costs, of course depend on local conditions.

Examples of recent book stack construction include the new addition to Olin Library at Cornell, which cost about $20/book and the new stack at Berkeley, which runs about $30/book (including earthquake resistance). Both of these buildings are fairly expensive, since they are largely underground. Underground construction, however, is the kind of requirement imposed today by the lack of space on many university campuses. A more conventional library is the new library at the University of Kentucky which holds 1.2M books and 4,000 reader seats for $58M, with construction costs of about $165 per square foot.

A survey done by Cooper and quoted by Bowen (see the JSTOR reference earlier) gives construction costs ranging from $21 to $41 per book, amortizing to $2.57 for capital cost; adding maintenance, utilities and the like gives $3.07 as the cost of keeping a book on a shelf for a year. Retrieving and reshelving a book, at New York Public Library, a closed-shelf library, is about $2. An open-shelf library can reduce this to 60 cents to circulate a book, but this is not an option for an off-site library.

More expensive libraries exist, of course, but using them for comparison is not appropriate. The new British Library building is about $75/book, and the new French national library around $100/book. Both of these buildings include offices, reader services, and many other facilities besides just book stacks. As national libraries, they also need large exhibition halls and include artworks beyond what any university would normally include in a library building program.

By comparison, the Harvard Depository costs about $2/book to build. It is built in modules, each holding about 500,000 books, and Harvard fills a module every few years (Harvard buys about a quarter of a million volumes each year). Although the buildings are built economically, and at a far lower cost than could be achieved if they had to be placed on the crowded Cambridge campus, they still provide a secure (staff-only) and air-conditioned place to put books. The Depository is approximately 35 miles away from the main Harvard campus, and as such there is a cost to retrieve books when requested. Donald Waters of Yale University has analyzed the costs involved in retrieving material from such a Depository compared with the costs of scanning [Garrett 1995]. The current cost of storage in the Harvard Depository is estimated at 24 cents per volume per year, while the cost of retrieving a book is estimated at about $4/retrieval (this is higher than the New York Public Library cost quoted above since it includes a 35-mile van trip). Waters found the costs for electronic storage and retrieval to be higher (over $2/volume for storage and $6/volume for retrieval), but judged that the costs for the paper library would rise about 4% per year while the computer costs declined 50% every five years. Under these assumptions, the costs of the digital and traditional library operations cross over in about 5 years. In 10 years electronic storage has a major cost advantage, with access costs of $2.70 per book rather than $5.60 (as estimated by Waters).
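The effect of the two growth assumptions can be illustrated with a simplified calculation. The Python sketch below is not Waters' model: it applies the 4% annual rise and the five-year halving to the storage and retrieval figures quoted above, one at a time, and solves for the year in which each pair of costs meets. The $2.00 starting value for electronic storage is an assumption, since the text says only "over $2".

```python
import math

def crossover_years(paper_now, digital_now,
                    paper_growth=0.04, halving_years=5.0):
    """Years until a falling digital cost meets a rising paper cost.

    Solves paper_now * (1 + g)**t == digital_now * 0.5**(t / h) for t.
    """
    rate = math.log(1 + paper_growth) + math.log(2) / halving_years
    return math.log(digital_now / paper_now) / rate

# Annual storage: 24 cents on paper vs. an assumed $2.00 electronic.
storage = crossover_years(0.24, 2.00)
# Retrieval: $4 on paper vs. $6 electronic.
retrieval = crossover_years(4.00, 6.00)

print(f"storage costs meet after {storage:.1f} years")
print(f"retrieval costs meet after {retrieval:.1f} years")
```

Taken separately, retrieval costs meet in a little over two years and storage costs in roughly twelve. The overall crossover of about 5 years reported above lies between these two, and where it falls depends on how often each book is retrieved, which this sketch does not model.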

Electronic libraries have some less easily quantified advantages over paper libraries. Books are never off-shelf or inaccessible because someone else is using them (unless publishers adopt per-user license agreements). More important, books do not become inaccessible because they are stolen, and they are not torn or damaged in the process of use. Access to electronic books can be faster than to paper books, although since users do not pay for access today it is not easy to recover cash from them in exchange for this faster access. Most important, access can involve searching and thus much more effective use of the material.

For some kinds of material, electronic libraries offer enormous advantages [Robinson 1993]. Extremely fragile items which must be provided in surrogate form can be distributed widely in electronic form. The number of people who will be able to look at the Beowulf manuscript in the form of its electronic images will far exceed those who would have been able to travel to London and justify to the British Library staff their need to handle the unique and extremely fragile manuscript. Furthermore, the scanning of this manuscript has been done using advanced photographic techniques including UV scanning and backlit scanning, so that parts of the electronic images are legible even where the corresponding portions of the original manuscript are not legible (the Beowulf original was damaged by a fire in the 18th century and further damaged in repair attempts after the fire) [Kiernan 1995].

Electronic libraries also have some disadvantages. Many users are accustomed to paper, and either dislike or fear on-screen books. Some find the screens difficult to read. In circumstances where people are asked to provide their own equipment, this is expensive (although many colleges already ask every student to buy a computer). The equipment may have a low-quality screen, making reading even more difficult. If the library is providing the reading equipment, there may be times when it is overloaded, and users must wait to get at it. It may also be broken or under maintenance, so that although a single book is never ``offshelf'' it may well be that a great many books are sometimes unavailable because of a machine failure. The Bellcore library has an electronic catalog; at first it was unavailable on Sundays, during machine maintenance. The concept of the catalog being unavailable even though the room was open would not make any sense to a traditional librarian.

All these economic calculations are done solely from the view of a single library. Of course, the scanning operation need be done only once, and making additional copies is cheap. By contrast, storage, delivery and preservation of paper copies are per-copy costs. Libraries have already learned to cooperate in the purchase and storage of paper copies, sharing the acquisition of expensive foreign materials. But in an electronic world the pressure for cooperation will be much stronger, and libraries will have to adjust their attitudes to deal with this, as described in the final section of the paper.

Libraries also will have the problem of comparing money spent on buildings with money spent on services. In the past, many universities have not viewed capital expenses as comparable to running expenses; many have not even monetized space, for example. There is a need to be able to persuade a university administration that money should be moved from construction budgets to computer budgets, and this may be difficult. Even with the best understanding and budgeting at the university level, for example, state legislation may impact such changes in public universities. And donors have historically been more willing to donate large sums for buildings than for less tangible activities, particularly computer systems which they are aware become obsolete at a rapid clip.

5. Electronic Preservation

Libraries have been struggling with the problems of acid paper books that are deteriorating badly, and must be preserved in some way. The major alternatives are deacidification, photocopying, microfilming and digitization. Microfilming has been the most economical alternative and has been favored for many books, while at the same time research has continued on bulk deacidification processes (since page by page treatment is too expensive to imagine using on the 80 million or so post-1850 books which are in danger). However, deacidification (and photocopying) are treatments which must be applied in each library; microfilming and scanning are methods which when done to one copy of a book, provide a cheap solution for other libraries wishing to have permanent access to the same book.

Electronic scanning has many attractions for preservation. It can be done once and cheaply copied by other libraries, and it eliminates future worries about paper deterioration (most deacidification techniques slow the disintegration but do not reverse it). Books are no longer at risk of vandalism. As explained above, electronics allows more effective access to the books. Since digital copying can be done without error, it also means that future reproduction does not introduce additional differences from the original. Technology can provide images which are comparable to (or, in the case of some manuscripts, exceed) the quality of the original.

A potential source of worry for librarians, however, is the lifetime of the new digital material. The story of the 1960 census, written on now-obsolete media and in practice less accessible than the 1860 census, is well known. Similarly, librarians know of problems with magnetic tape deteriorating over time, and of double-sided videodisks delaminating [Bogart 1995]. With a book, visual inspection is adequate to tell whether it is still usable, and if the book is turning yellow there will still be a period of time in which it can be photocopied or microfilmed. By contrast, to confirm that a tape or disk is usable it must be mounted on some device and it is possible that the material will be found to be worthless, without any advance warning.

Computer media differ widely in durability, but the primary danger is one of technological obsolescence. Punched cards, for example, were made of quite strong paper and if stored properly will last many decades, but the equipment to read them is no longer easily available. Other media such as 8-inch floppies, punched paper tape, and various kinds of magneto-optical drives have become obsolete. Even such once-common media as 5.25-inch diskettes and 1/2-inch 9-track tape are clearly becoming rare. The computer industry develops so rapidly that a library should expect to have to copy everything every five to ten years. Safety in digital preservation does not depend on durable objects but on having an adequate number of copies and refreshing them regularly.
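Refreshing is only safe if each new copy can be confirmed identical to the old one. One common way to do this, not discussed in the text and offered here only as a sketch, is to compare cryptographic checksums of the old and new files. The file names below are hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def refresh(source, destination):
    """Copy a file to new media and report whether the copy is identical."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)

with tempfile.TemporaryDirectory() as workdir:
    old = Path(workdir) / "page0001.old"   # hypothetical file names
    new = Path(workdir) / "page0001.new"
    old.write_bytes(bytes(range(256)) * 64)
    copy_ok = refresh(old, new)
    # Simulate one damaged byte on the new medium.
    new.write_bytes(b"\xff" + new.read_bytes()[1:])
    damage_detected = sha256_of(old) != sha256_of(new)

print(copy_ok, damage_detected)
```

Unlike visual inspection of a book, a check of this kind can be run automatically over an entire collection at each migration, which addresses some of the worry about media failing without warning.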

This, in itself, is disquieting to some librarians. Even if the resources can be found today to digitize something, they worry that the resources will not be available for these future migrations to new media. After all, there are well-known examples (such as some satellite observations of Brazil) which were lost for failure to copy tapes in time. Traditionally, librarians have considered preservation as a one-time expense; a book would be rebound or photocopied and then put back on the shelf and forgotten. The idea that an obligation is being placed on successor librarians without a clear source of funding to correspond bothers the current generation.

Fortunately, the costs of the copying are decreasing rapidly. If we assume a 50% decrease in cost every five years, then the long-term cost of doing the migrations is small compared to the first copy. The librarians are right that the future must involve this work, but the work will appear small compared with current costs.
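The claim can be checked as a geometric series. The Python sketch below assumes a migration every five years and takes the cost of one migration at today's prices as the unit; both the interval and the 50-year horizon are illustrative choices.

```python
def migration_costs(cost_today, years, interval=5, halving_years=5):
    """Cost of each re-copy at `interval`-year steps, when the cost of
    copying halves every `halving_years` years."""
    return [cost_today * 0.5 ** (year / halving_years)
            for year in range(interval, years + 1, interval)]

# One migration at today's prices costs 1.0 (an arbitrary unit).
costs = migration_costs(1.0, 50)

print(costs[:3])
print(sum(costs))
```

The successive migrations cost 1/2, 1/4, 1/8 and so on of today's price, so however long the series runs, all future migrations together cost less than a single migration would cost today.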

More serious, unfortunately, is the problem of software format obsolescence as well as hardware format obsolescence. There are many more software formats, and they come and go much more rapidly than do hardware formats. Librarians and others must worry about the problems of out-of-date word processor and spreadsheet formats, which must somehow be converted to standard forms if they are to be effectively preserved. We may find ourselves needing a new kind of professional, a ``digital paleographer,'' who specializes in understanding the formats used by bankrupt software vendors.

6. Usability

A key question for the users is whether material in electronic form is as suitable for use as paper. Microfilm, for example, is very durable, very compact, and yet not really accepted by many users. However, the studies that have been made of utilization of electronic material show that people generally do like it and are able to use it effectively. Online catalogs, despite one Luddite (and error-prone) article in the New Yorker, are accepted in all libraries, and the conventional wisdom in university libraries is that once about a third of the books are available in the online system the students stop using the card catalog which has the remaining books.

One study of the comparative effectiveness of paper and electronic forms was the CORE experiments done by Dennis Egan and collaborators. An experimental group of 36 chemistry students at Cornell University were presented with five kinds of tasks to be done using 1000 articles from the Journal of the American Chemical Society. One-third of the students got the journals on paper; one-third in ASCII displays (the SuperBook interface); and one-third using a page image display. To oversimplify, the results were that simple reading tasks were as fast with the computer displays as with the original printed journals, and that searching was much faster with the computer systems [Egan 1991].

Other studies have also shown that students can learn as fast using computer systems as they can with traditional teaching materials. In experiments at Stevens Institute of Technology, for example, students who used online versions of the texts for a class in the history of science did as well as students using the texts on paper. There is no reason for libraries to fear conversion to electronic form for texts in general. Computer-aided instruction in general shows a 30% or so improvement in training time (since students can learn at their own pace) [Eberts 1988], and reading in particular can be done 25% faster and 25% more accurately with appropriate interfaces [Egan 1989].

7. Copyright Issues

Libraries face some legal issues related to conversion of materials and electronic preservation [Oakley 1990]. The most serious one is that under the US copyright law as of the time of writing (January 1996), libraries have a right to copy for preservation purposes, but only in analog format (section 108). Although libraries have special permission to make copies of copyrighted works for the purpose of preserving a deteriorating original, they may only do so in ``facsimile'' form, a word apparently intended (and understood by lawyers and librarians) to mean analog copying. A revision to the US law is under consideration and, again at time of writing, the text would extend this exception to permit digital copying. If this does not pass in some form (and there is considerable controversy about other aspects of the proposed new law) libraries will have great difficulty converting material published after 1920, which may be in copyright. There is a project at Cornell converting the core agricultural literature which will produce some data on the difficulty of obtaining copyright releases for a set of books between 1920 and 1950, but even the administrative cost of finding the name of the copyright owner is significant.

Another legal problem is that of format conversion. As mentioned before, the rapid change in the software market makes it desirable for libraries to store their material in some kind of standard format, which means that there will have to be conversion from various word processors or other formats to a standard, e.g. SGML for text. In the ``look-and-feel'' cases being pursued by Lotus against Paperback and Borland, Lotus is claiming that imitation of the macro commands of the Lotus 1-2-3 spreadsheet software is an infringement of their rights. Depending on the final court decisions after the appeals are finished, a broad view of the Lotus claim would prohibit libraries from converting from one format to another. This would be a major problem for libraries attempting to preserve electronic materials; I hope that I am right in my prediction that even if Lotus wins its cases, the scope of protection granted will be narrow enough to remove the problem. As of this writing the appeals court has found against Lotus and the Supreme Court has split evenly, leaving the appeals court decision in place.

8. Social Effects

Libraries will need to expand their cooperative activities. Today, most libraries are judged and budgeted on the basis of how many books they have. The Chronicle of Higher Education publishes a regular table showing the ranks of libraries, recently showing Michigan listed as 6th while Columbia is 7th, partly because Michigan has 6,584,081 books and Columbia has only 6,532,0266. It cannot really matter, in evaluating a library or a university, what the third significant digit in the number of books owned might be.

In the digital future, possession of books is likely to be much less important than access to them. Many university faculty and students already get much information either from the Web or from commercial online services, when once they might have depended entirely on the university library. Libraries themselves, of course, are purchasing many electronic resources. Just as libraries have decreased their purchase of printed abstracts journals, replacing them with access to online search systems or CD-ROMs, they are starting to replace printed journals and books with electronic copies. Electronic books need not be stored locally, and for rarely used material, libraries are certain to start sharing the storage responsibilities whenever copyright law permits. As mentioned before, Cornell is working on a project that will shed some light on the difficulties of obtaining permission for copyrighted books before 1950.

Libraries already have many cooperative activities such as cataloging, interlibrary loan, and shared collecting responsibilities. The New England Deposit Library, for example, is a shared storage facility which dates back to the 1940s, and similar facilities exist in many other places. However, digitization will greatly increase the value of sharing, since there will be no delay to physically transmit a book from one place to another. Other interesting possibilities occur as well. It is substantially cheaper to scan a book if the paper is strong and can be fed through a stack feeder, rather than requiring manual handling of each page. So if a large library in an urban center with humid and polluted air and many users wishes to scan an acid-paper book, it may turn out that a small library located in a rural and cold mountain location with few readers and clean air has a copy in much better shape, and one that can be scanned more economically. But of course the small library, by itself, has less motivation to scan a book which is not yet deteriorating and rarely used; a cooperative program is needed to make sense of who should scan what.

Of course, as more and more library usage becomes remote access to electronic records, we will see a different organization of libraries. Many institutions that now maintain several small libraries in different locations will decide that in an electronic world, there is no longer a need for the staffed location, and just let users in those locations access the collection over a data network and the librarians over a voice network. And this raises a further possibility, namely of entire library operations being outsourced. If the students in a university library cannot tell whether the books they are reading come from the local network or from a remote library, does it matter whether the university library actually owns any of its own books? Perhaps some organizations, whether large university libraries catering to small colleges, or commercial organizations such as publishers, distributors, or bookstores, will offer a college administration a complete replacement service, and financial pressures will encourage the administration to accept it.

Libraries, in this scenario, must begin to document the services they provide beyond just having books on shelves. Unfortunately, the tendency to value libraries by the size of their collections has meant that inadequate attention is paid to the assistance provided by staff to faculty and students. The services and training given by local staff to the university community are the aspects of the library which an outside vendor will have most difficulty providing. On many campuses the library building also doubles as a student center, or a study hall, or other functions. Library management is going to have to think through what functions are valuable and how this value can be explained to university administrators.

The job of managing this transition in a university library is not going to be an easy one. Since only one digital copy need be created to serve many users, sharing of copies, whether cooperatively or competitively, is going to happen. I fear that competition may develop between libraries to supply services, and that many college administrators will jump at the chance to outsource library functions. To avoid this, librarians will have to emphasize the value of the library staff aside from putting books on shelves, and look at other information uses within a university and how they can contribute to them. The danger to all of us is that much lightly-used research material may be bypassed in a rush to provide the heavily used and commercially lucrative current textbooks and important reference material online, and that the broader mission of keeping our entire cultural heritage will be overlooked. Digitization should be a way of increasing memory and diversity, not a way of standardizing everything and abolishing university institutions.


[Bogart 1995]. John Van Bogart Magnetic Tape Storage and Handling: A Guide for Libraries and Archives Commission on Preservation and Access and the National Media Laboratory (June 1995).

[Bowen 1995]. W. G. Bowen; "JSTOR and the Economics of Scholarly Communication," Economics of Information Conference, Washington, DC (1995). Web:

[Crane 1991]. Gregory Crane Perseus 1.0 Yale University Press (1991).

[Eberts 1988]. Ray Eberts, and John Brock; "Computer-Based Instruction," pages 599-627 in Handbook of Human-Computer Interaction, ed. M. Helander, Elsevier North-Holland, (1988).

[Egan 1989]. D. Egan, J. Remde, L. Gomez, T. Landauer, J. Eberhart, and C. Lochbaum; "Formative Design-Evaluation of SuperBook," ACM Transactions on Information Systems 7 (1) pp. 30-57 (1989).

[Egan 1991]. D. E. Egan, M. E. Lesk, R. D. Ketchum, C. C. Lochbaum, J. R. Remde, M. Littman, and T. K. Landauer; "Hypertext for the electronic library? CORE sample results," Proc. Hypertext '91, San Antonio (Dec. 1991).

[Entlich 1996]. Richard Entlich, Lorrin Garson, Michael Lesk, Lorraine Normore, Jan Olsen, and Stuart Weibel Making a Digital Library: The Contents of the CORE Project to appear.

[Garrett 1995]. John Garrett, and Don Waters Preserving Digital Information Research Libraries Group and Commission on Preservation and Access (1995). Web:

[Kenney 1992]. Anne Kenney, and Lynne Personius Joint Study in Digital Preservation, Commission on Preservation and Access (1992). ISBN 1-887334-17-3.

[Kiernan 1995]. K. Kiernan; "The Electronic Beowulf," Computers in Libraries pp. 14-15 (February 1995). Web:

[McKnight 1993]. C. McKnight; "Electronic journals -- past, present. . .and future?," ASLIB Proc. 45 pp. 7-10 (1993).

[Oakley 1990]. Robert L. Oakley Copyright and Preservation: A Serious Problem in Need of a Thoughtful Solution Commission on Preservation and Access (September 1990).

[Odlyzko 1994]. Andrew Odlyzko; "Tragic Loss or Good Riddance: The impending demise of traditional scholarly journals," Notices of the AMS (1994).

[Robinson 1993]. Peter Robinson The Digitization of Primary Text Sources Office for Humanities Communication, Oxford University Computing Services (1993).

[Stern 1990]. Barrie T. Stern; "ADONIS-a vision of the future," pages 23-33 in Interlending and Document Supply, eds. G. P. Cornish and A. Gallico, British Library, (1990).