KEYWORDS: Technology development, sociotechnical systems, system design assumptions, library services, codevelopment experiences, collaboration.
This paper describes some reflections on our work in order to invite collaboration around the design and use of electronic libraries and large collections of digital documents. In addition to libraries' technical needs, we discovered the assumptions we made as vendors of technology, and some of the assumptions that we perceived were being made by our library customers. By codeveloping and coproducing prototypes of electronic libraries we also came to understand the consequences of these assumptions. Two projects are illustrative of our experience. A joint development project with Cornell University provided the real experience necessary to develop viable technology and to investigate the work practices needed to put this technology into actual use.[4][3] The second project (at Indiana University Purdue University at Indianapolis) explored the viability of a World Wide Web service to support electronic reserve readings. Both of these projects are installations of Xerox image-based systems for creating and managing electronic documents from existing paper-based materials.
In 1989 Cornell University, with support from the Commission on Preservation and Access (CPA), began collaborating with Xerox to investigate the value of digital image scanning as an alternative to microfilm and photocopy reformatting of brittle library materials. Cornell wanted to build a digital library, the CPA was interested in data about the cost effectiveness of digital scanning methods, and Xerox wanted to explore new market opportunities for image-based products. There were several outcomes of this project. First, Cornell has developed a prototype of an Internet-based on-line digital library, and continues to add materials to a growing electronic corpus. Second, data were generated on the costs and on the practices required to support production level digital scanning and reformatting of brittle books. Finally, Xerox has started to understand the system requirements for, and constraints on, image-based document management products.[2][4]
When we first began this project, we knew some things and, of course, we assumed some things. We knew, for example, that Cornell wanted to explore new methods of providing access to its materials. At that time, the on-line card catalog indexed materials from 1972 onward, while the paper card catalog indexed materials from before 1972. The librarians pointed out that materials indexed by the paper card catalog were used rarely as compared to materials indexed by the on-line catalog. Thus we assumed that, at the very least, there had to be access to the image library via the on-line catalog. And, given a link to the catalog, we extrapolated that a critical mass of on-line intellectual property would bring a need for entirely new and (hopefully) powerful search mechanisms. Indeed, significant resources were spent developing and testing the initial framework for such things as global uniqueness, full text retrieval, expanding search scopes, search visualization techniques, etc. The surprise is that, five years later, the search infrastructure remains untried. A document can still only be located via the catalog with a Boolean operation whose scope is confined to a specific collection or set of collections.
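The scoped Boolean lookup described above can be sketched roughly as follows. This is only an illustration and not the CLASS or Cornell catalog implementation: the record fields (`doc_id`, `collection`, `keywords`) and the function `boolean_search` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """Hypothetical catalog entry; the field names are assumptions."""
    doc_id: str
    collection: str
    keywords: frozenset


def boolean_search(records, scope, all_of=(), any_of=(), none_of=()):
    """Return doc_ids of records inside `scope` that satisfy the
    AND (all_of), OR (any_of), and NOT (none_of) keyword terms."""
    hits = []
    for rec in records:
        # The search never looks outside the chosen collections.
        if rec.collection not in scope:
            continue
        if not all(term in rec.keywords for term in all_of):
            continue
        if any_of and not any(term in rec.keywords for term in any_of):
            continue
        if any(term in rec.keywords for term in none_of):
            continue
        hits.append(rec.doc_id)
    return hits
```

The point of the sketch is the `scope` argument: a document outside the selected collections can never be found, however well it matches the query terms, which is the limitation the broader search infrastructure was meant to remove.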
An interesting activity related to this use question began when Cornell placed a terminal to the image access system in their mathematics library. They witnessed an immediate and dramatic increase in the access rate to certain 19th century texts. Subsequently they removed that terminal and eventually replaced it with access to their Cornell Digital Library (CDL) system. Our understanding was that they wanted a less expensive access method to the digital images than an imaging terminal provided, a guaranteed "read only" access port, and a long term archive that was independent of a single vendor and a single product. This was one stimulus to create a World Wide Web interface to the collections of digital documents managed by our image server. In this way we could provide a repository that supported a common, open, "read only" access method and a server design implemented with recognized, industry standard components. Such a server could be supported by the customer or third-party market agents should the vendor become unable to support it.
IUPUI library experiment in electronic reserves
In 1994 the library of Indiana University Purdue University at Indianapolis (IUPUI) entered into negotiations to acquire a Xerox image document management system and its prototype World Wide Web image viewing and navigation services. IUPUI is largely a commuter campus, and they are interested in placing reserve readings on-line thereby relieving the bottlenecks and delays experienced by patrons trying to use these reserved readings. They thought that a World Wide Web interface would fit their needs. Xerox is interested in testing new products in situations of actual use, and is trying to find ways to test and develop new products concurrently. In January 1995, several classes at IUPUI began experimenting with, and relying on, on-line reserve readings. In addition, IUPUI is providing access to these documents from all workstations in the library.
While the IUPUI library is new and the Cornell library is relatively old, they both reported an acute lack of shelving space. Indeed, one of the secondary goals for the Cornell project was the reduction of shelf space devoted to any one document. At the recently constructed IUPUI library, we encountered essentially the same concern. There are stacks, and nearly all of them are located in public spaces. Moreover, the library building has been designed with sufficient floor loading to allow high density, movable shelving to be installed everywhere. Their goal is to eliminate the paper inventory as nearly as possible and replace it with easy user access to electronic masters which can be printed by users wherever they happen to be. This could be in the library building or it could be at home. It is the only library we have encountered that is focused on decentralized and remote printing rather than on centralized reproduction.
We find this situation particularly interesting. Our library customers and collaborators typically stress image quality as an important characteristic of any system they would use. Initially this was a surprise to us because it contradicts the low resolution, minimal storage, low cost-per-transaction focus many traditional customers have maintained for image-based reprographics systems. Our other library collaborators have restricted themselves to older, out of copyright corpora in order to maintain high image quality and enable experimentation with new forms of access and delivery services. Much of the material in the IUPUI collections, on the other hand, is of recent authorship. Offering such material via a system configured to permit remote printing raises interesting questions of intellectual property rights management.
Searching is a high priority capability.
As previously noted, at Cornell many ideas were generated early on relating to directly connecting the image document repository and the existing Cornell library catalog. However, after two years of work, the search tools for the CLASS system had not been improved over their initial very simple capabilities. Our initial projects on the digital reformatting of brittle materials did not require sophisticated search mechanisms, or any substantial changes to the existing automated card catalog as the indexing method into the holdings.
There are two reasons for this change in the priority of search capabilities. First, significant work was needed to develop and support the desired production scanning capability. This involved upgrades to scanning hardware and software, continual changes to the user interface software that enabled end user control of the scanner, and repeated experiments with the physical layout of equipment and the establishment of regular, repeatable, manual procedures for efficiently scanning books, page by page.
Second, Cornell's implementation of its digital library needed to proceed on many fronts and on its own schedule, and could not be kept waiting for the unavoidable delays associated with developing our prototype system. For this and other reasons Cornell elected to develop their own digital library system independent of the CLASS prototype. This enabled Cornell to work concurrently on access issues, and to import scanned documents from the CLASS system whenever it was most practicable. By separating these concerns, a sophisticated search capability never became a serious requirement for the technology we were developing. The truth is we were disappointed the project never actively experimented with searching. We remain convinced that the on-line card catalog will prove unsatisfactory for sophisticated searching or for very large collections. On the other hand, the librarians certainly didn't hold searching to be as important as the other goals the project actually achieved.
In our case, it was both practical and important to separate the work of building a corpus of electronic material from the work of supporting access and navigation. However, separating these activities is consequential and requires that attention be paid to joining the results. Significant work is required in electronic document import, export, and conversion. Document interchange formats and standards must be developed.
Navigation and access are independent of media.
It will be difficult to determine how current holdings, records, and bibliographies might best be modified until some significant collections of electronic documents are available. The IUPUI library is actively experimenting with combining automated card catalog searching with World Wide Web access from their "Scholar's Workstations." These remote entries will, in time, also include access to the electronic document. Whether these extended catalogs and guaranteed access will suffice for all users, researchers, and scholars is an open question. Our experience is that these experimental capabilities will raise expectations. For example, the ability to search selectively the content of the works referenced in the catalog may be an important extension that scholars will want in exchange for the loss of the proximity searches implicit in the layout of the physical card catalog. Finally, interesting navigation problems are raised by hybrid documents.[6] These are partially electronic and partially paper (or other paper-like media) and seem particularly problematic for hyper-linked electronic catalogs. One example of a hybrid document is a paper document that is referenced by an on-line document, perhaps via a hyperlink, but that is not itself available on line. How will these materials be made available to patrons? What are the practicable ways of adapting current search, navigation, and lending practices to experiment with making these complex hybrids available?
Copyright assumptions
Copyright issues can be avoided, or can they?
The CLASS project led to experiments in electronic course pack production. These illustrate how differently copyright issues can be handled when prototyping electronic libraries. The Cornell library's interest in developing production level digital reformatting capabilities was easily accommodated by scanning books whose copyright had expired. In fact, by creating a relatively small corpus from older brittle materials the CLASS project was able to work continuously on developing scanning work practices and improving the technology required to support scanning work. Copyright issues were specifically avoided.
However, moving the scanning and printing technology to the Cornell bookstore to produce course packs raised a number of issues related to reprinting of copyrighted materials. For the Cornell Bookstore and Xerox this represented an opportunity to prototype another system that could support permission tracking, accounting, and producing customized collections of copyrighted materials. Copyright management turns out to be more important to publishers, printers, and resellers than to libraries.
At IUPUI, on the other hand, the experimental system is a World Wide Web interface to reserve readings. For experimental purposes, the documents made available have been essentially restricted to students in the classes using the materials, and the library can legitimately claim "fair use." However, the availability of reserve readings immediately requires the development of security and authentication technologies, so that "fair use" can be honored, as library services are extended beyond the confines of the library building. Copyright becomes a library concern in a new way. Even though libraries are enabling viewing and reprinting of their holdings, it is still possible to carry out electronic library experiments without facing copyright issues. The current Cornell library project to digitize documents that illustrate the building of the material, economic, and social infrastructure of the United States is an example of how much can be done without solving the problems of copyright. A common solution to copyright issues will evolve slowly under the influences of many forces and common preferences. Clearly, moving a technology into a new situation can change its relation to broad social and legal institutions such as copyright.
Automation assumptions
Automation is about simplifying work.
One major goal of the CLASS project was to evaluate new equipment and procedures for preservation reformatting. One expectation was that actually scanning the pages of brittle books would be a relatively simple job. Manually scanning brittle materials, page by page, is indeed very tedious, but rather than being simple and low-skilled, the scanning job proved to be relatively highly skilled. In fact, regardless of their job level (in terms of pay or prestige), the Cornell library scanning technicians have become very discriminating in judging the archival quality of scanned materials. They have acquired a good deal of specialized knowledge regarding digital imaging of line drawings and photographs. And, furthermore, they have become skilled in using computer-based systems to manage archives and backups of digitized books. As the Cornell projects move their electronic corpus onto the Internet, the library staff will become more experienced and knowledgeable about how to build and manage network repositories.
In addition to bringing digital scanning and computer-based document management know-how into library communities, electronic libraries involve librarians in different working relationships with computer systems staff, and with computer systems vendors. In the CLASS project, engineers and library staff worked together, freely crossing the organizational boundary between Xerox and Cornell University.[1] The relationship with IUPUI in general maintains the organizational separation, but the engineering team has been active in supporting the activities and learning of the IUPUI library and systems staff. As vendors we approach these codevelopment relationships as the most effective way of experimenting with and enhancing the utility of new technologies. However, the multiple relationships and dependencies are complex and require deliberate nurturing. The development and maintenance of these relationships is crucial to the successful deployment of effective digital library systems.
Computers are often adopted in the hope that manual tasks, and especially onerous tasks, will simply be automated; and we often hope that the computers can be simply delivered. But our collaborations have never been simple, and what can actually happen is that existing work practices get replaced with new work and new ideas about work, and then the business changes to something it couldn't have been without computer support.
System design assumptions
Working with users is different, but it will be straightforward.
In the beginning, CLASS project members assumed that even though they knew very little about the details of a digital scanning and printing system, and virtually nothing about preservation activities or the deployment of computer-based systems in library settings, understanding the requirements and delivering a working system would be relatively straightforward. In terms of the equipment, the project did not depend on engineering or scientific breakthroughs. We knew working with a prototype in another institution would be different; we thought it might be difficult, as unforeseen problems arose; but we assumed that it would not be especially problematic.
Looking back on the project, none of us could have predicted the kinds of systems issues that were the most difficult. One major issue was finding and constructing descriptions of the system, and users' problems with it, that could be shared by librarians, technicians, and software engineers throughout the project. Understanding and representing library work in engineering terms started with dataflow diagrams and led to experiments using videotape to assist with observation and evaluation. These practices have been developed and extended by colleagues into a variety of approaches to representing work practices in ways that support the needs of system design.[10]
We have detailed the work required to bridge enterprise boundaries elsewhere.[1] The CLASS project overcame some of these inter-enterprise boundary problems by including Cornell library staff in standard Xerox engineering project activities; by employing a user advocate/technical writer as a senior systems engineer; and by responding honestly to problems with the system and the difficulties in maintaining the planned schedules for deliverables. One lesson for continuing work in building digital libraries is that significant effort is required in constructing working relationships between members of different enterprises and communities of expertise.
Open systems aren't that complicated: we'll use standards.
Another early CLASS system design issue was the need to develop a system that was open and standardized. The technology chosen to support the Cornell preservation activities came with standards already built in. Two examples are the choice of the Tagged Image File Format (TIFF) as the format for document image files and CCITT Group 4 as the compression standard. These choices determined several attributes of the system, from the cost of components to the need for, and availability of, document interchange practices and procedures. For the IUPUI World Wide Web interface into electronic reserve readings, similar system attributes have already been determined. IUPUI's early commitment to building a multi-media library and the proliferation of document types on the World Wide Web highlight these issues of multiple standards. At the moment digital library applications fall into multiple markets with differing standards. Furthermore, there are no generally established practices for managing a digital library. No one can say today what will constitute an open digital library system, and what standards will be the most prevalent. One current risk in building open library systems is the frequent need to forsake previous work. The desire to build on top of facilities that are already in place and performing satisfactorily is perfectly reasonable. But it is likely that fundamental building blocks and accepted standards will need to change for digital library systems to remain open.
Performance and usability criteria are readily determined.
Early attempts in the CLASS project to develop metrics to help determine system usability proved very difficult. The users' hopes and expectations were tacit, while the engineers wanted these to be explicit and framed as formal specifications. One user finally said, "we'll know how usable it is when we use it!" The IUPUI experiments with World Wide Web access to electronic reserve readings have yielded useful results in terms of performance and utility. Student criticisms range from frustration with network transmissions of document images (broken images in WWW browsers), to difficulty rendering document images on VDTs and printers. While some of these problems are specific to the systems being used, not all of them would go away if the Xerox system were replaced by a different system. These digital library experiments depend on many components (applications and infrastructures) and it is too early in the cycle of system maturation to determine a priori how well the integration of components will support electronic library applications and services. The truth is that usability and performance criteria are only determined through use, and are not based on any single component of a digital system.
Preservation and archive assumptions
Digital media will help solve preservation problems.
The evaluation of digital reformatting at Cornell highlighted another issue for the development of digital repositories. Electronic libraries seem to offer ready solutions to many of the well-known preservation and archive problems with paper materials. However, an electronic library contains its own preservation problems. The binary digits (analogous to ink) which represent the information content do not decay, but the containing medium (analogous to paper) does decay, and both the medium and the semantics associated with the bits will suffer from obsolescence.[8]
In the CLASS project, many brittle books were reformatted and returned to circulation. However, the images are also stored on a networked file server maintained and supported by the Cornell Information Technologies staff. In addition to this on-line archive, the scanning technicians retain a bookshelf of magneto-optical disks containing archival versions of the digital images of these reformatted books. Even today, less than five years since the project started, the availability of reliable hardware and software to read these magneto-optical disks is in jeopardy.
The near term expectations are that new technical breakthroughs, coupled with market pressure, will continue to drive rapid changes in available technology (communications, mastering techniques, language facilities, rendering devices, identification methods, storage systems, data management techniques, etc.). These changes imply a regular introduction of works produced in novel and evolving representations. Multimedia authoring is a ready example. Archival information created in such an environment simply cannot be wed to any particular style or generation of electronic media. That means each library system must be designed with an information migration (reformatting) strategy in mind. Crafting that strategy will require ongoing collaboration between librarians and system designers.
The issue of preservation is not avoided by migration to an electronic library. Indeed, it seems that preservation of the artifact is more difficult, while preservation of the content can be more tractable than for paper documents. Paper documents can easily be rendered on electronic media, and are easily converted to different media. Wholly electronic documents (including hyperlinks, video, and audio) are either very difficult or impossible to render on paper. And they may even be difficult or impossible to render on electronic media different from that on which they were originally created. Paper documents are "low tech;" i.e., they often can be maintained with ubiquitous and mundane tools such as glue, paper, scissors, erasers, etc. Floppy disks, magnetic tapes, CD-ROMs, etc., are only maintained with expensive and highly specialized facilities. As newer devices such as holographic memory cubes become available, the preservation of artifacts will be increasingly difficult and require increasingly specialized support.
How people will respond to document forms other than paper is similarly uncertain. Books, by their very nature, appeal to us at many levels.[7] Originally, a book was the work of many individual craftspeople; some to compose the content, to make the paper, to write the text, to illuminate the pages, to bind the whole. The inks, the paper, and the leather covers are pleasurable to smell and feel. And a book's content is directly accessible with normal human perception. For centuries, curling up with a good book has been something to be anticipated. By contrast, curling up with a CD-ROM doesn't evoke the same response. The CD-ROM is a sterile little disk whose content cannot be appreciated without the mediation of a mechanical chaperon. It is not yet clear what cultural position these new forms will assume. In addition to experiential changes, as books are transformed to other media, there are serious questions about identity and authenticity. What happens to scholarly research as potentially richer electronic document forms are mutated by ever newer media?
What's more, the form of the CD-ROM and its associated player is changing rapidly. For example, a group of CD-ROM manufacturers is now agreeing upon a revised standard which increases the data density more than tenfold, thus rendering existing player devices obsolete. If the form and function of the electronic documents themselves are in rapid flux, then so too are the form and function of the digital library that manages and circulates these documents. And therefore, the work practices and expectations of both the librarians and the users will also continue to change in an effort to adapt. Since it is impractical to redesign the library in response to each introduction of a new document form or library system, it is necessary to discover design dimensions which allow continuous migration and growth while retaining full access to the retrospective collections. This challenge is largely social in nature and must be included in our construction of the work necessary to create digital libraries.
AN INVITATION FOR COLLABORATIONS IN DESIGN AND USE
The questions of system design are not just technical
As technologists and engineers we often assume that technical issues can be cleanly separated from other issues. So we enter into system design activities with detailed questions and information about computer platforms, networks, and infrastructure, but with little or no understanding of the work these computers are expected to support, or of the social and political nature of the organizations in which the resulting systems will be used. Even the CLASS project engineers, grounded in observations of existing preservation and reformatting work in the library, represented these observations with engineering flow diagrams and requests for system usability criteria. It took a good deal of time and effort to develop shared work goals between the engineering team and the Cornell University library staff.[3]
However, shared goals are not enough, because technical and political decisions are made about individual products in many contexts. For example, consider decisions made about document interchange formats. The current Xerox image-based document management system encodes information about a scanned document's constituent images in the ISO Office Document Architecture and Interchange Format. Although this is an international standard format, it is not a US library document format. Here is an example of a decision made outside the context of the collaboration and driven by the available standards. The decision by Xerox to use this format was made in a business and technical context that extended beyond the CLASS project. The Cornell Digital Library project, for reasons related to existing practices of university computing and library communities, designed their own document structure and interchange format.[9] These technical choices, influenced and determined by situations of use, have implications for the longevity and convertibility of documents between formats; they affect the ways in which this product is used to support electronic libraries and their users in general and Cornell's experiments in particular.
Innovations in technology have been studied and framed by social scientists as networks of relationships and interactions among humans, machines, documents, and their environments.[5] When this kind of view is applied to library systems, the field of analysis that is used to provide requirements grows to include a larger number of components and a more complex set of interactions. Looking at the whole might yield different attributes for the resultant systems. These systems might, in turn, stimulate new ways of looking at the practices involved in using electronic libraries, providing electronic library services, and developing the new technologies, organizations, and infrastructures required to support these new practices.
One outcome of looking at digital library systems as part of sociotechnical systems that involve librarians, library users, engineers, libraries, computers, books, electronic documents, etc., is to take seriously the notion that the technologies and the practices associated with their actual use are codeveloped and coproduced by all the participants. While the physical limitations of computers and software play an important role in determining the attributes and utility of the systems that are eventually put in place, existing and evolving social and political arenas in workplaces and communities also determine system attributes and capabilities, as was illustrated above. In order to develop and deliver technology to support new work, such as that envisioned for digital libraries, it is important to find ways for users and customers to participate in collaborative design and development activities.
Electronic document formats and interchange
Providing access and retrieval capabilities to electronic repositories requires librarians and computer systems vendors to define and determine practical systems for experimenting with and eventually standardizing document formats and interchange practices. Associated with changes in digital storage and transmission technologies are the challenges of maintaining access to documents stored in varied and changing media and formats. It is not always clear which formats will stand the tests of time and the competition of business. Although a prudent position for specific libraries might be to wait until early adopters have determined the most popular and prevalent information representations, this prevents librarians from influencing the shape of these standards. On the other hand, the demands of patrons may force libraries into investing in solutions that prove costly to maintain. Ultimately, systems supporting electronic libraries need to provide tools and techniques that permit ready transformations among electronic document formats. The investigation of how to support document formats and conversions among formats will require collaborations between libraries, vendors, and users.
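One way to picture "ready transformations among electronic document formats" is as a registry of available converters that can be chained. The sketch below is illustrative only and does not describe any tool built in these projects: the function `conversion_path` and the format labels used with it are hypothetical, and each converter is reduced to a (source, target) pair.

```python
from collections import deque


def conversion_path(converters, source, target):
    """Breadth-first search for the shortest chain of formats leading
    from `source` to `target`, given (from_format, to_format) pairs.
    Returns the list of formats visited, or None if no chain exists."""
    if source == target:
        return [source]
    graph = {}
    for src, dst in converters:
        graph.setdefault(src, []).append(dst)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt == target:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None
```

A result of `None` is the interesting case for a library: it identifies holdings in a format from which no chain of available converters leads to the representation patrons now need.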
Document revision and configuration management
In addition to providing a repository in which electronic documents are stored, and from which documents can be retrieved, electronic libraries will also need to maintain and manage the integrity of the corpus. Two issues stand out. First, if hypertext links among documents become a standard way of referencing archived publications, then it may be important to maintain a parts list for an electronic document. Second, since electronic documents are easily changed, it may also be important for libraries to discover and maintain the revision history for electronic documents. System developers have been dealing with issues of revision and configuration control for years. It is common for software systems to be composed of different revisions of constituent modules. Furthermore, engineering tools and practices have been established that support revision control for individual components as well as system configurations. This is an area of collaboration in which system vendors may be able to explore and support the evolving needs of librarians to manage easily-changed electronic documents.
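The two issues named above, a parts list and a revision history, can be sketched by borrowing the shape of software configuration management. This is a minimal illustration under stated assumptions, not a description of the Xerox or Cornell systems: the classes `Revision` and `ElectronicDocument`, the function `missing_parts`, and the convention that a part is a (document id, revision number) pair are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    note: str
    # Parts list: (doc_id, revision_number) pairs this revision links to.
    parts: tuple = ()


@dataclass
class ElectronicDocument:
    doc_id: str
    revisions: list = field(default_factory=list)

    def add_revision(self, note, parts=()):
        """Append a new revision; revision numbers start at 1."""
        rev = Revision(len(self.revisions) + 1, note, tuple(parts))
        self.revisions.append(rev)
        return rev

    def current(self):
        return self.revisions[-1]


def missing_parts(doc, library):
    """List the parts of the current revision of `doc` that `library`
    (a dict of doc_id -> ElectronicDocument) cannot resolve."""
    missing = []
    for part_id, part_rev in doc.current().parts:
        target = library.get(part_id)
        if target is None or part_rev > len(target.revisions):
            missing.append((part_id, part_rev))
    return missing
```

Pinning each link to a specific revision mirrors the way a software configuration names particular revisions of its constituent modules; whether libraries would want links pinned in this way, or always resolved to the latest revision, is exactly the kind of question the proposed collaboration would need to settle.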
Librarians will need new practices to manage electronic library assets, and these practices are likely to change as digital technologies evolve and mature. System developers and vendors will also need new practices as we enter into inter-organizational collaborations that support and extend our businesses. It is through these codevelopment efforts that practical and useful digital libraries will be built and maintained.