<html>
<head>
<!-- This document was created from RTF source by rtftohtml version 2.7.3 -->
<title>
DL94: Translating Data to Knowledge in Digital Libraries
</title>
</head>
<body>

<!--#include virtual="/DL94/header.ihtml" -->

<h1>
Translating Data to Knowledge in Digital Libraries
</h1>

<p>
<author>
Gordon K. Springer<sup>1</sup> and Timothy B. Patrick<sup>2</sup>
</author>
<p>
<i>
<sup>1</sup>Department of Computer Science, 
<sup>2</sup>Medical Informatics Group, School of Medicine, 
University of Missouri-Columbia, Columbia, Missouri, USA, 65211, 
{springer, patrick}@condor.cs.missouri.edu
</i>
<p>
<h3>
1.  Introduction</h3>
For the first time in more than 1000 years, an opportunity exists to change
the way libraries are organized and the way the data they contain are
accessed.  In the classical sense, a library is an organized collection of
data or artifacts.  The collection is organized so that a user of the library
has a procedure or method for identifying a desired item and extracting that
item from the collection for perusal or use.  This implies the existence of a
classification scheme that can be used to store and retrieve items in the
library or collection.<p>

The classification schemes, the physical organization, and the methods of
access in traditional libraries are bounded by the fact that these procedures
are focused on storing or extracting <i>documents</i> from a
finite-dimensional space.  The dimension is usually three.  Books, journals,
and the like are stored on the shelves of a library, and the method of access
is to go to the point in 3-space where the particular book or document can be
found.  Similarly, the classification schemes utilized are broadly divided
into author, title, and subject headings.  A fourth division, keywords,
attempts to quantify the content of a document.  Even so, the classification
schemes simply mirror the physical organization of the collection and, in
doing so, limit the ability of users to extract information or knowledge in an
intelligent way.  This is not to belittle the user or the library, but to
point out the shortcomings of a classical library, which was designed to store
and retrieve documents, not information or knowledge.<p>

With the evolution of the digital library, the traditional limitations of
collecting, organizing, and retrieving items from a finite-dimensional space
are no longer present.  The opportunity and the challenge are to take
advantage of this freedom and focus on extracting information and knowledge
from the digital collections.  The dimensionality of the information space is
increased only if analysis tools or <i>filters</i> are utilized as an inherent
part of the process of searching for and extracting information, instead of
data, from the library collections.  It would be a serious injustice to
continue extracting only <i>documents</i>.  An excellent overview of these
challenges is presented in [1].<p>

<h3>
2.  Discussion</h3>
In
order to translate data to knowledge, access to large quantities of data is
necessary, and information must be extracted from these data.  Digital
libraries provide access to these data much more readily than is possible in
the traditional library. To be truly useful, the classification schemes used to
locate data in the massive, distributed collections must be extensive and
fine-grained.  In addition, it is necessary to be able to quickly and precisely
locate the desired digital collections needed to satisfy an information
request.  Thus an organized methodology must be in place both to improve the
finding of pertinent data for a request and to limit the number of discrete
data collections that must be accessed to extract the desired
information.<p>

The problem of extracting information from data is not solved simply by
developing better classification schemes, by organizing data collections using
newer and better database schemas, or by making the data accessible to the
entire world by quickly transporting it across the evolving computer networks
or <i>data highways</i>.  <i>Filters</i> are needed that can analyze the
massive collections of data stored in digital form and derive information or
knowledge from them.  Moreover, information extracted from one source should
be usable as input to extract additional information from another source.
Thus, it is not simply a single, general-purpose filter that is needed.  What
is required is a very large number of filters that are discipline- and
user-specific.  The challenge is to make it possible for a wide variety of
filters to be utilized, when appropriate, to process the data and information
available and to extract the desired information.<p>
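The idea that the output of one filter can serve as the input to another can
be illustrated with a minimal sketch.  It assumes that records are plain
strings and that a filter is simply a function from a list of records to a
list of records.  The names <i>keep_matching</i>, <i>summarize_gc</i> and
<i>compose</i> are hypothetical and chosen only for illustration; they are not
taken from the system discussed in this paper.
<pre>
```python
from typing import Callable, List

# Assumption for this sketch: a "filter" is any function that maps a list
# of records to a (usually smaller, more informative) list of records.
Filter = Callable[[List[str]], List[str]]


def keep_matching(motif: str) -> Filter:
    """Build a filter that keeps only the sequences containing `motif`."""
    def apply(records: List[str]) -> List[str]:
        return [r for r in records if motif in r]
    return apply


def summarize_gc(records: List[str]) -> List[str]:
    """Replace each non-empty sequence with a one-line GC-content summary."""
    summaries = []
    for r in records:
        if not r:
            continue  # skip empty records to avoid dividing by zero
        gc = sum(1 for base in r if base in "GC") / len(r)
        summaries.append(f"{r}: GC={gc:.2f}")
    return summaries


def compose(*filters: Filter) -> Filter:
    """Chain filters so that each one consumes the previous one's output."""
    def pipeline(records: List[str]) -> List[str]:
        for f in filters:
            records = f(records)
        return records
    return pipeline


# Hypothetical usage: select sequences with a start codon, then summarize.
pipeline = compose(keep_matching("ATG"), summarize_gc)
result = pipeline(["ATGCGC", "TTTTAA", "GGATGA"])
print(result)
```
</pre>
Because every filter shares the same signature, a discipline-specific or
user-specific filter can be added to a chain without changing the others.<p>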

We are developing a system that is based upon the need for providing the user
with information rather than just data.  It involves the integration of
autonomous programs and analysis tools, which can be viewed as filters, to
extract the maximum amount of information that can be obtained about genetic
sequence data in the biomedical sciences.  This system utilizes servers that
are based upon open-system, distributed computing concepts. These servers offer
various kinds of services to users with information needs [2].  Integral to
this system is the ability of a given server to <i>advertise</i> its services,
which prospective users can then utilize quickly and efficiently.  The user
is unaware of where the services are located and of what is entailed in
accessing the servers.  What the users do know is that they receive
information, not data, in
response to their queries.  A discussion of the mechanisms used in this system
can be found elsewhere [3].<p>
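The notion of a server advertising a service that users then invoke by name,
without knowing where it runs, can be sketched as follows.  This is only an
in-process illustration under stated assumptions: the class
<i>ServiceRegistry</i>, its methods, and the service name are hypothetical,
and the sketch does not reproduce the actual mechanisms, which are described
in [3].
<pre>
```python
from typing import Callable, Dict


class ServiceRegistry:
    """Toy, in-process stand-in for servers that advertise services."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[str], str]] = {}

    def advertise(self, name: str, handler: Callable[[str], str]) -> None:
        """Called by a server to make a service available under `name`."""
        self._services[name] = handler

    def request(self, name: str, query: str) -> str:
        """Called by a user; the user never sees where the handler lives."""
        handler = self._services.get(name)
        if handler is None:
            raise LookupError(f"no server advertises {name!r}")
        return handler(query)


def reverse_complement(seq: str) -> str:
    """Example analysis service: reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# Hypothetical usage: one server advertises, one user sends a request.
registry = ServiceRegistry()
registry.advertise("reverse-complement", reverse_complement)
print(registry.request("reverse-complement", "ATGC"))
```
</pre>
In a distributed setting the registry lookup would resolve to a remote server
rather than a local function, but the user's request would look the same.<p>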

<h3>
3.  Summary</h3>
The digital library brings with it the need to break with the traditions of
the classical library.  We need to seek out better ways to increase the
dimensionality of the information space to provide a wider variety of pathways
to the information contained within the space.  This necessitates the use of
analysis tools or filters to process the data contained in the search space so
that information rather than documents is returned to the user.<p>

The
<b>National Information Infrastructure</b> is going to be saturated with data
flowing from one location to another if the process continues to focus on
<i>document retrieval</i>.  Without the use of analysis tools or filters to
translate the data into information as an integral part of the process, we will
continue to be buried in a sea of data.  With the use of these filters, we will
be able to take full advantage of the digital library technology and provide
users with the information they need to effectively carry out their desired
activities.<p>

<h3>


Acknowledgments
</h3>
This
work was supported in part by grants LM07089 and LM05513 from the National
Library of Medicine, and also by Pittsburgh Supercomputing Center grant number
NCR930001P from the NIH National Center for Research Resources.  Its contents
are solely the responsibility of the authors and do not necessarily represent
the official views of the National Library of Medicine or the Pittsburgh
Supercomputing Center.<p>

<h3>


References
</h3>

[1] Garrett, J. R., 1993.  Digital Libraries, The Grand Challenges, EDUCOM
Review, July-August 1993, pp. 17-21.<br>
<br>
[2] Springer, G. K., 1994.  A National Scientific Computing Environment for the
Biological Sciences, Proceedings of the 27th Annual Hawaii International
Conference on System Sciences, IEEE Computer Society Press, Los Alamitos, CA,
1994, Volume V, pp. 87-88.<br>
<br>
[3] Patrick, T. B., Springer, G. K., Sista, S. M., and Davison, S., 1994.
Methods for Shared Access to Medical Internet Information Sources, Proceedings
of the American Medical Informatics Association Spring 1994 Congress, American
Medical Informatics Association, Bethesda, MD, p. 123.<br>
<br>
<br>
<br>

</p>

<!--#include virtual="/DL94/footer.ihtml" -->
Last Modified: <!--#echo var="LAST_MODIFIED" --> <br>
</body>
</html>
