<html>
<head>
<title>
DL94: Intellectual Realities and the Digital Library
</title>
</head>

<body>

<!--#include virtual="/DL94/header.ihtml" -->

<h1>
Intellectual Realities and the Digital Library
</h1>
<p>
Francis L. Miksa and Philip Doty<i><p>
Graduate School of Library and Information Science, The University of Texas at
Austin, Austin, Texas 78712-1276, {franmiks, pdoty}@uts.cc.utexas.edu<p>
<p>
<p>
<p>
</i>
<b>
Abstract
</b>
<p>
The question is asked, "Why should a digital library be called a 'library'?"
Three aspects of the traditional library as a collection of information sources
in a place are examined in order to shed light on their meaning to a digital
library.<p>
The idea of a collection is examined from the standpoint of both pragmatic and
necessary boundaries.  The idea of information sources is examined from the
standpoint of a source's "work" attributes and of the incommensurateness of
such works in a collection.  And the idea of the library in a place is examined
from the standpoint of logical space.  While no final conclusions are drawn,
the three concepts provide a basis for considering similar issues in the
digital library.<b><p>
<p>
Keywords:  </b>Digital library, library, library collections, information
resources, intellectual organization, incommensurate data, work (intellectual
entity).<b><p>
<p>
<p>
<p>
1.  Introduction</b><p>
We begin with a question.  Why should a digital library, or an electronic
library, or a virtual library--for the purposes of the remarks here these three
terms will be considered synonymous--why should such a phenomenon be called a
"library"?  A digital library might well be called something else--a digital
information system, or a digital publishing system, to name two possible
alternatives.  But such alternatives have not been chosen.  Instead, "library"
has been the term of choice.  And this choice has been made not by librarians
(who might have been expected to choose it) but rather by computer and
information scientists who have been in the forefront of the development of
electronic information communications systems over the past three decades.<p>
Now, the term library might seem like a natural choice.  Then again it might
not be an enduring choice in the same way that "horseless carriage" did not
ultimately become the enduring term for the new technology, automobility.  The
purpose of these remarks is not, however, to quibble over a word, but rather to
reflect on certain aspects of the term library that might appear to be
instrumental in choosing it for this new technology, but which may well contain
implications not intended.  What we hope to do here is to explore the question
of whether a digital library should be called a library at all (implying, of
course, that it might actually be something else entirely) by examining
particular traditional aspects of the term.  However, in order to keep this
discussion to manageable proportions, here we will comment only on those
traditional aspects of the idea of a library which have to do with its internal
or intellectual realities as opposed to its external or social realities.
Topics to be discussed are highlighted in a statement which defines a
traditional library, that is, that a library is a collection of information
sources in a place.  Section 2 below will discuss the idea of a collection;
Section 3, the idea of information sources; and Section 4, the idea of a
place.<b><p>
<p>
2. The Library as a "Collection."</b><p>
The traditional library has always been defined as a "collection"--an
aggregation of informational objects.  Usually, discussion of a collection
focuses on such things as its topical scope, so as to determine what should be
in it, or its proposed clientele, so as to determine what access arrangements
should be made.  These are legitimate concerns, of course,
but in quickly shifting to them we miss the most obvious thing that the idea of
a collection conveys--its implied boundaries.<p>
A collection implies boundaries in the sense that, while collecting at its root
simply means gathering, the act of gathering implies discrimination, some
objects being included and others excluded.  We might suppose that this simply
means that informational objects are included and non-informational objects
excluded.  But this is not the case at all.  <p>
In a traditional library objects are normally included or excluded first of all
on the basis of pragmatic considerations.  It has always been that way.
Expense both in acquiring and housing informational objects and the difficulty
of obtaining certain ones of them have always demanded limits to collections.
And the purpose for which a library is assembled has meant that not all
possible objects have been wanted in the collection.  After all, even where the
informational objects might be few for some given realm, it may not serve the
purpose of the collection to include all possible things--items in an
unreadable foreign language, items which are plainly redundant or derivative,
items representing incompetence, and the like.<p>
A much more trenchant basis for boundaries is not pragmatic, however, but
necessary:  boundaries appear to be required by the nature of things.  When a
collection is assembled it represents a segment of what Patrick Wilson has
aptly called the "bibliographical universe" [4] and which is defined elsewhere
as,<p>
an abstraction of the accumulated items or objects of recorded knowledge of
humankind in all of their various forms--for example, in the form of books,
periodicals, graphics (pictorial representations of one kind or another), sound
recordings, motion pictures, electronic data compilations, business,
governmental and personal records, and so on. [2]<p>
The abstraction called the bibliographical universe has limitations, of course.
For example, it includes only a single copy of each unique recorded knowledge
object.  And it necessarily excludes items which have become lost or which are
intentionally kept from the public eye.  The result is that not every
imaginable object of recorded knowledge is included in the abstraction.  <p>
Even with such limitations, however, the bibliographical universe remains
enormous.  Think for a moment of the world's production of recorded
informational objects even in a single year--perhaps a million new editions of
books, countless periodical publications where even those recognized for
scientific and scholarly merit number in excess of 150,000 titles, not to
mention those which have ceased publication but are still useful.  And these
figures do not include serial publications of much more limited purview,
government publications at any level, audio-visual items of all kinds, archival
and other organizational records numbering in the trillions.  To this we add a
growing number of electronic sources, especially those representing visual
images and multimedia (including transmission of satellite imagery).  These may
best be spoken of in terms of terabytes of data and they are growing in
numbers at exponential rates.<p>
All of the foregoing is mentioned because the reality of a collection, at least
traditionally, is that it necessarily represents only a very small segment of a
gigantic whole.  Access to these materials has been traditionally provided by
creating multiple libraries or collections, where the expense and labor in
assembling, describing, storing, and making them available is shared among
countless people and organizations.<p>
Within the context of a digital library, does the idea of a collection with
both pragmatic and natural boundaries still hold?  We ask this question because
it does not seem clear in the literature if what is meant by the digital
library is a series of such collections or one such collection.  For example,
at one of the exploratory seminars prior to the NSF Digital Library initiative,
Michael Lesk suggested that one of the purposes of the seminars was to express
the idea of a digital library forcefully enough so that Congress would invest
monies for creating it--that is, for creating <u>a</u> digital library for the
United States.[1]  What possibly could this mean, however, if the very notion
of a library implies boundaries of the kind described here?  Is not the idea of
a digital library also necessarily tied to the idea of boundaries as well?  In
short, is it not to be expected that there will be many digital collections
(i.e., libraries) just as there are presently many traditional libraries, or
does the idea of a digital library preclude boundaries in some extraordinary
way?  And finally, if boundaries are to be recognized, what would they be?<b><p>
<p>
3.  The Digital Library as a Collection of "Information Sources."</b><p>
The traditional library ordinarily implies the idea of a collection of
"information sources" which have the potential of informing in one way or
another those who consult them.  While the informative nature of an information
source seems straightforward, there would appear to be more to the idea of such
a source than its informativeness.  One primary attribute among many that
information sources have is that they have regularly been consulted and used in
terms of their separate and unique identities as intellectual or artistic
entities.  And as separate and unique entities they are by nature highly
incommensurate in the ways they assemble information, the reasons for
assembling information, the intended audiences for the act of assembling
information, and the like.<p>
The idea that an information source has a separate and unique identity as an
intellectual or artistic entity arises from the view that each such
informational source has two different but interrelated kinds of
attributes--those related to its content and those related to its container or
medium of transmission [2].<p>
Content attributes include such things as a source's intellectual form or
genre, its topicality, its intended audience, and so on.  But of special
importance among content attributes are those which in library cataloging are
clustered around the idea of the "work" or "works" an informational source
contains.  A work is an entity constituting the intellectual or artistic effort
of an intelligent being in representing his or her knowledge.  It is called a
work in the same sense that one designs and creates, say, a sidewalk or a
flower arrangement.  Each of the latter is the result of a direct effort that
begins with the mind.  In the case of information sources the result is a
discrete intellectual or artistic entity or product which has an intentional
structure, a storyline, arguments, and so on, with an identifiable beginning,
middle, and end.+ <p>
All information sources have the attribute of being a work in the sense spoken
of here, although not all will have been expressed with equal amounts of
intellectual or artistic skill and control.  Indeed, there is enormous variety
among them.  For example, some works may represent the expressions of single
individuals, but others will represent layers of participation by several
persons as in the creation of a work by one or more persons, its translation by
one or more others, its augmentation with illustrations or commentary and its
editing by still others.  In contrast, other information sources represent the
"utterances" (metaphorically) of corporate agencies.  Still others will
represent the confluence of human and computer capabilities, as in a database
software package.  It will have been designed by a team of persons.  But
afterwards someone will use the package to formulate a
specific database structure, someone else will input data into the structure,
still someone else may well devise a report program for the data, and a report
will be produced by the computational power of the computer which runs the
program.<p>
Likewise, there is immense variety in the way expression in information sources
has been shaped, not simply in terms of intellectual structure, but also in the
kind of expressive medium used--that is, whether the expression is textual and
discursive, textual and elliptical (as in textual tables), numeric, graphic,
composed in a special language such as mathematical or chemical notation, and
the like.  Indeed, some information sources may be distinctive by appearing to
be combinations of many of these motifs or by appearing to be chaotic in their
conception and execution, at least in terms of some standard of expression.<p>
The intersection of all of these kinds of variation in information sources is
so multifaceted, in fact, that it is fair to describe the whole universe of
such sources (or even a collection of them) as highly incommensurate in the way
they present information.  In fact, outside of differences that owe merely to
variations in the container in which information sources are
found,&Dagger; it seems fair to suppose that all information resources
represent unique variants in intellectual structure, formatting, expression,
etc.<p>
The reason for pointing out the "work" attributes of information sources as
well as their incommensurateness is that much (though not all) information
seeking is shaped by and inextricably connected to these attributes.  Ever
since information sources first came into existence they have regularly been
consulted in terms of the intellectual or artistic entities they contain and in
the light of the incommensurateness of their expressions of information.
Indeed, a large portion of humankind's intellectual activity has been devoted
to the task of intellectually grappling with such objects in these terms, often
assembling them as families of items based on how they refer to one another, or
because they refer to common ancestor sources, or because they contain common
text upon which comments are made, or because they have common sources
(including publishers), or because of any number of other attributes which show
their "family" relationships as whole intellectual or artistic entities.<p>
This aspect of the phenomenon of information sources in the traditional library
is pointed out because of undercurrents found in discussions of libraries in
electronic environments that speculate that the new technology will and should
promote something altogether different both in the nature of information
sources within the environment and in the nature of their use.  These
speculations stress the possibility of new kinds of information sources,
sources which, for example, evolve through emendations and changes put forth by
a variety of successive participants and which, therefore, would seem to have
few, if any, of the work attributes of information sources just described.
Likewise, access to sources is occasionally described as a matter of
hypertextual linkages so finely wrought and controlled that retrieval would
resemble MEMEX-like navigation over bits and pieces of large numbers of sources
with little concern for the individual work attributes of any particular one.
Indeed, from the latter picture it would appear to be only a short leap to
information sources that are all controlled as to their structure the better to
access their parts.  Or it might suggest a situation in which information
sources are all entered into an electronic environment in such a way as to
blend them together conceptually (much like Doug Lenat's CYC project at MCC) so
that with a skillful search engine it would be possible and desirable for the
electronic mechanism to respond to the inquirer much like a person answering a
question.<p>
The point to be made here is not somehow to deny such possibilities or to
intone against them as "ought nots."  Who knows what this new technology might
become?  Few in 1905 could have imagined what the horseless carriage has now
become nor how over the years it would change the idea of personal and social
mobility.  Nor would it do to point out that endless and evolving information
sources already exist (for example, in the form of legal codes) and that some
forms of searching for information have always disregarded the work attributes
of information sources.  In the latter respect, for example, an administrator
in charge of enrollment probably neither cares nor reflects on the fact that
information related to the current enrollment of S. Smith comes from the
structured digital entity called the school's "Student Database."  Likewise, a
person interested in the highest batting average in the American League in 1949
will not likely be too concerned that the information came from the <i>World
Almanac</i>.<p>
Rather, the point to be made is that the "work" attributes spoken of here have
been integral to human intellectual endeavors ever since information sources
came into existence millennia ago.  Further, the idea of a library has been
inextricably attached to the idea of providing access to such sources in terms
of their attributes as works (among other things) in response to that
intellectual pattern. Thus, while new possibilities may well become part of the
scene, if a digital library is, in fact, to be a library, would it not appear
that this aspect of a library must be an integral part of its conception?  Or,
is a digital library to be something else altogether?<b><p>
<p>
4.  The Library as a Collection of Information Sources "in a place."</b><p>
A third aspect of the traditional library that merits our attention is the
relationship it has to the idea of location in that a traditional library may
be defined usefully as a collection of information sources "in a place."  In
fact, one commonly thinks of a library in terms of its physical and, therefore,
its spatial location.<p>
Here especially, one might think, is the point where the idea of the digital
library really does distinguish itself from that of the traditional library.  A
digital library plainly does not need to exist in one place.  It can be
distributed over many different servers and clients in many different places.
Nevertheless, there is an aspect of the idea of "the library as a collection in
a place" that shows that an important factor about location which makes any
given collection of information sources a library is not actually a place as a
physical location but rather a place as an intellectual construct--a logical or
intellectual space, if you please--where location implies a rationalized set of
relationships imposed on the members of the collection.<p>
It is precisely this aspect of a traditional library that ties its divergent
elements together as an integrated entity and makes it more than merely a loose
assemblage of items.<p>
It is the reality of location as a logical or intellectual space which makes
it doubtful whether even well-designed consortia and systems of separate
collections are libraries in and of themselves.  It is also this same sense of
logical or intellectual space which certainly contrasts with something as
pervasive as Internet gopher space today.  In the latter, one is faced with a
huge variety of useful sources not tied together as a single intellectual
construct, neither in the sense of structure nor in the sense of access
methods.  It is this reality that makes us readily conclude that gophers
considered together do not make a library; that something more is needed.<p>
The latter situation is not unlike the specter of the shopping resources of a
large city, where even directories like the yellow pages cannot overcome the
disparate nature of the way goods and services are organized, represented, and
made available.  Despite the friendly injunction to "let your fingers do the
walking," an individual has to amass a large amount of personal knowledge about
individual stores and agencies and the various ways they organize their wares
and conduct their business in order to negotiate them successfully for even a
single item.<p>
One answer to the same confusion with respect to information sources has always
been to make a library of them, where the idea of the library includes the
construction of a set of arrangements that overcomes the disparateness of the
individual sources by relating them to one another in terms of a single,
operational, intellectually structured whole.  This reality is one of the
inventions of the modern library of the past century or so, and, regardless of
how well or poorly we might think the result has been, or how relevant or
irrelevant, the lessons of that institutional experience are highly
instructive.<p>
The original vision of the library as propounded by nineteenth-century pioneers
like Melvil Dewey and Charles A. Cutter (followed by others equally or even
more notable in the twentieth century) was more than simply a set of pragmatic
devices such as catalogs, classification systems, and reference desk
procedures.  It began in reality with a strong (as opposed to a weak) view of
the cohesive and interrelated nature of knowledge itself, of humankind's
accumulated social knowledge.  To these pioneers, organizing information
sources into a cohesive intellectual structure, regardless of the form of the
structure, was derivative of that preliminary vision of knowledge.  Their
efforts were shaped by what they assumed about that knowledge structure.  When
implemented in the form of bibliographic control practices, that same structure
provided a pathway to humankind's social knowledge.<p>
There can be little argument that the systems these people created (and with
which present-day libraries still contend) have severe limitations with respect
to modern information needs.  Some of the limitations have arisen from their
assumptions about knowledge itself, not only in how they viewed its
organization (i.e., linearly, chiefly in a two-dimensional hierarchical
structure, with monothetic classes) but also that there was only one true way
to organize it or that there was only one purpose for organizing (i.e., for
document retrieval).  Other limitations arose from the technology they had at
their disposal (for example, single-entry book catalogs strapped by printing
cost limits, and card catalogs strapped by individual record space limits) or
in how they applied the technology (for example, classification limited in
application primarily to single-entry shelf sequences, at least in the U.S.).
Still other limitations resided in inadequate ideas about users' habits in
looking for information, including the failure to move beyond pre-coordinated
exact-match equations of document representation and controlled vocabulary
searching protocols.<p>
These people did contribute two essential components to the idea of a library
as a collection in a place, however.  First, they focused heavily on the idea
that what makes a library in large measure a library is the intellectual or
logical space necessary to accommodate the information sources a library
collects. Second, they appropriately assessed the reality that creating such a
space will necessarily require a great deal of time and effort.  The latter was
not unlike creating a commercial empire or, in academia, writing a
comprehensive treatise which, as an introduction to a topic, is so complete that
it is not likely to be surpassed for years to come.  The result of their work was the
rather incredible (when one thinks about it) possibility of being able to enter
through the door of that agency called a library with a reasonable expectation
that the information sources collected there have been organized into a
sensible whole and that, even if the client does not understand the structure,
it would provide a basis for finding sources that fit his or her need.  In this
respect the metaphor of the library as a door through which one goes to find an
intellectually organized set of information resources is very provocative.<p>
Here, as in the first two points of this paper, we ask the obvious question:
if a digital library is to be a library at all, must it not contend with
this extraordinary need to create a logical space, one that in reality
accommodates the boundaries among individual information sources as works,
including their incommensurability?  There has been talk of extraordinary
solutions to some of these problems in the new electronic
environment--knowbots, for instance, which can search for information in
hyperspace apart from any organization of that space, or gigantic parallel
processing mechanisms that would obviate the initial natural act of information
searching which is the exclusion of searching routes, the latter commonly in
terms of semantic relationships inherent in the idea being focused on and in
the light of the structure of knowledge in general.  Such alternatives may
ultimately become the appropriate way to proceed, of course.  But then, would
not the result appear to be something different than what is understood to be a
library?<b><p>
References</b><p>
[1]	Lesk, Michael.  1993.  "The Digital Library:  What is it?  Why Should it be
Here?"  In <i>Source Book on Digital Libraries.</i>  E. A. Fox, ed.
Blacksburg, Va.:  Department of Computer Science, Virginia Tech University, TR
93-35.  (Print version of electronic file.)<p>
<p>
[2]	Miksa, Francis.  1994.  "The Universe of Knowledge, the Bibliographic
Universe, and Bibliographic Control."  Ch. 1 in <i>Library Cataloging and
Bibliographic Control.</i>  Austin, Tx.: Ginny's Copy Service.<p>
<p>
[3]	O'Neill, Edward T. and Vizine-Goetz, Diane.  1989.  "Bibliographic
Relationships:  Implications for the Function of the Catalog."  In <i>The
Conceptual Foundations of Descriptive Cataloging.</i>  San Diego:  Academic
Press, 167-179.<p>
<p>
[4]	Wilson, Patrick.  1983.  "The Catalog as Access Mechanism:  Background
Concepts." <i>Library Resources and Technical Services </i>27, no. 1 (Jan/Mar):
4-17.<p>
<p>
<p>
<p>
<p>
<p>
<hr>
+ We exclude here for the sake of simplicity the phenomenon of an "incomplete"
work which occurs because its creator was unable to complete it or uninterested
in doing so.<p>
&Dagger; <i>The Expedition of Humphrey Clinker,</i> an epistolary novel by
Tobias Smollett first published in 1771, has in excess of 100 different records
in the OCLC database [3], but it is doubtful that they represent more than one
intellectually unique structured entity or work.  Different editions have
arisen mainly from republication in different physical formats.

<!--#include virtual="/DL94/footer.ihtml" -->
Last Modified: <!--#echo var="LAST_MODIFIED" --> <br>

</body></html>

