<html>
<head>
<title>
DL94: Research Agenda for the Intelligent Digital Library 
</title>
</head>

<body>

<!--#include virtual="/DL94/header.ihtml" -->

<h1>
Research Agenda for the Intelligent Digital Library 
</h1>

Elizabeth D. Liddy[1], Michael B. Eisenberg[1], Charles R. McClure[1], Kim Mills[2], Susan Mernit[3], and James D. Luckett[4] <p>

[1] <em>School of Information Studies, Syracuse University, liddy@mailbox.syr.edu, mike@ericir.syr.edu, cmcclure@suvm.syr.edu</em><p>

[2] <em>Northeast Parallel Architecture Center, Syracuse University, kim@npac.syr.edu</em> <p>

[3] <em>Scholastic Press, Inc., New York City, New York, susanim@aol.com</em><p>

[4] <em>NYSERNet,  Liverpool, New York,  luckett@nysernet.org</em><p>

<h3>Abstract</h3>
We propose to meet the challenge facing the emerging global digital library by
using HPCC technology to provide "intelligent" access to an extensive testbed of
digital resources for a diverse group of real users. The user environment is
K-12 education, which offers a range of electronically connected users with a
rich variety of uses. AskERIC currently provides question-answering service to
K-12 educators through the Internet via expert human intermediaries who
interpret users' queries, search relevant databases, and communicate the
information back to the user. AskERIC is a success due to its ability to
provide intelligent retrieval through these human intermediaries. The purpose
of our project is to similarly add intelligence to information seeking and
retrieval via unmediated natural language processing of texts and queries. The
DR-LINK system, developed under ARPA funding, will provide the base technology
for our "Digital Librarian" whose performance will be compared directly with
that of AskERIC's information specialists. Equally important and integral to
establishing access to and use of digital libraries are impact and policy
issues. Therefore, we will investigate the impact of digital library access and
resources in relation to users' behavior and will develop and assess policy
options, models, and recommendations.<p>
<b><p>
Keywords:</b>  Intelligent information retrieval, natural language processing,
HPCC, K-12 education, information policy, user studies.<p>
<p>
<p>
<b><p>
Goals &amp; Objectives</b><p>
The Intelligent Digital Library Project has two primary goals:<p>
<p>
1.	<u>Intelligent Retrieval</u>:  To accomplish a level of non-mediated
intelligent retrieval in a networked environment that is now possible only with
the assistance of a human intermediary.<p>
<p>
Based on the ARPA-funded DR-LINK natural language processing (NLP) approach to
representation and retrieval, we will create and test a Digital Librarian to
simulate the human intermediary's ability to retrieve information on the basis
of what people mean in their query, not just what they say. Documents will be
"understood" in the same way--their content represented at the conceptual level
of expression. DR-LINK is a prototype system that has been extensively tested
on hundreds of queries and gigabytes of documents. Porting the system to a new
user-base, broader ranges of document types, and a distributed environment will
entail basic NLP and information retrieval research. The performance of the
Digital Librarian will be tested in relation to the human intermediary and
current retrieval systems. Our objectives associated with this goal are to
investigate: <p>
<p>
<ul>
<li> The ability of NLP to add intelligence to retrieval<p>
<li> An adaptive, connectionist NLP system based on the original symbolic NLP
system<p>
<li> A direct comparison of machine and human provision of information<p>
<li> Information access and retrieval in an HPCC environment<p>
<li> Image retrieval via text-based and visual indexing and querying.<p>
</ul>
<p>
2.	<u>Development, Impact, and Policy Issues</u>:  To investigate the
development, impact, and policy issues associated with creating the first
significant digital library collection of learning materials for K-12 educators
and students.<p>
<p>
The testbed will include a rich and useful collection of primary materials in
math and science, a significant collection of teacher education materials, and
extensive resources focusing on key issues and aspects of teaching, learning,
and management. It will also contain a broad array of newspapers, journals, and
video clips. Specific objectives are to investigate:<p>
<p>
<ul>
<li> Models for creating quality digital library collections integrating digital
text and multimedia collections from independent providers<p>
<li> Effects of access and use on individual and group behavior in teaching,
learning, planning, administration, and decision making.<p>
<li> Policy implications related to costs, charging, accessibility, property
rights, and standards<p>
</ul>
<b><p>
The Research Team</b><p>
Leading the project is a research team drawn from academic units, research
centers, and commercial information providers:<p>
<p>
<ul>
<li> Syracuse University's School of Information Studies (IST), a leading academic
and research institution in the areas of information science, retrieval,
management, and policy<p>
<li> AskERIC, a federally funded, Internet-based education information service
managed by the Educational Resources Information Center (ERIC) Clearinghouse on
Information &amp; Technology (ERIC/IT)<p>
<li> Scholastic, Inc., the largest educational publisher and distributor of K-12
educational materials (books, magazines, and software) in the world and creator
of the Scholastic Network, a telecommunications information service targeted at
the K-12 community<p>
<li> Northeast Parallel Architecture Center (NPAC), a state and federally funded
research and development center that is dedicated to HPCC and works in
partnership with numerous corporations in a range of markets<p>
<li> NYSERNet, a high-speed data network connecting 200 organizations in New York
State (from large universities and research centers to small public libraries
and K-12 schools) to the global community of the Internet<p>
</ul>
<p>
Joining this team are contributing partners from a wide range of commercial and
educational institutions. They will be integrally involved with creating the
digital library, conducting formal user research, developing software, making
network connections, and investigating policy and impact concerns. These
partners include:<p>
<p>
<ul>
<li> Bell Atlantic<p>
<li> Discovery Communications<p>
<li> CNN Newsroom<p>
<li> NYNEX<p>
<li> Dialog Information Systems<p>
<li> Xerox Palo Alto Research Center (PARC)<p>
</ul>
In addition, a number of state, regional, and local schools, school districts,
and educational organizations, with varying levels of bandwidth access, have agreed
to participate as formal test sites.<b><p>
<p>
Project Organization</b><p>
To achieve the project goals, our research and development efforts are
organized into five major components. Figure 1 shows a unified model of the
project, integrating the various components.<p>
<p>
(1)	<u>Users and Uses</u>:  K-12 teachers, students, librarians, and
administrators will seek out resources in the digital library for:<p>
<p>
<ul>
<li> Planning and delivering curriculum and teaching<p>
<li> Completing assignments and projects<p>
<li> Managing, planning, and making decisions about education.<p>
</ul>
<p>
The formal test groups will include the full range of "real users," ranging
from those at sites with low-end platforms and access (e.g., VT100 terminals at 2,400 bits
per second) to those with high-end platforms and access (e.g., graphics clients
at T1 and beyond). AskERIC and NYSERNet will coordinate the user groups in
terms of training, support, and sampling for the various research studies.
NPAC, with assistance from NYSERNet, will coordinate access.<p>
<p>
(2)	<u>Digital Library Testbed and Facility</u>:  The testbed will integrate
text, bibliographic, image, and video resources from a number of partners.
Scholastic, Inc. will coordinate, with assistance from AskERIC, the development
of the digital library testbed. NPAC will provide the host environment and
networking facilities for the testbed.<p>
<p>
(3)	<u>Intelligent Retrieval</u>:  Users will be able to access the digital
library through an intelligent Digital Librarian that the research team will
develop, in addition to accessing it through existing Internet retrieval
methods and the AskERIC human intermediary service. This will enable direct
empirical comparisons between the intelligent Digital Librarian and the
existing human-mediated computer-based systems. The Digital Librarian will also
be extended into a more adaptive, connectionist NLP system. IST will coordinate
the research, development, and testing, with assistance from NPAC and Xerox
PARC, and will coordinate the image retrieval work with Scholastic, Inc. and Dialog.<p>
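Direct comparisons of this kind are conventionally scored with precision and recall against relevance judgments. The sketch below is a minimal illustration only; the document identifiers, the relevance judgments, and the two result lists are hypothetical and do not represent project data or results.<p>

```python
from fractions import Fraction


def precision_recall(retrieved, relevant):
    """Return (precision, recall) as exact fractions for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = Fraction(hits, len(retrieved)) if retrieved else Fraction(0)
    recall = Fraction(hits, len(relevant)) if relevant else Fraction(0)
    return precision, recall


# Hypothetical relevance judgments for a single question, and two result lists.
relevant = {"d1", "d2", "d3", "d4"}
results = {
    "System A": ["d1", "d2", "d3", "d7", "d9"],
    "System B": ["d2", "d5", "d6", "d8"],
}

for system, retrieved in results.items():
    p, r = precision_recall(retrieved, relevant)
    print(f"{system}: precision={float(p):.2f} recall={float(r):.2f}")
```

Averaging such per-question scores over a shared set of questions is one common way to compare a machine system with human intermediaries on equal terms.<p>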
<p>
(4)	<u>HPCC System Environment</u>:  The technological foundation for the
testbed, retrieval systems, and interactions with users will be the HPCC
environment. NPAC, working with IST and Scholastic, Inc., will integrate an
evolving high-performance retrieval system with a range of delivery modes.
Research on scalable NLP systems with distributed, multimedia databases will
provide needed insight into digital library performance issues. 

<p>
(5)	<u>Impact and Policy</u>:  Underlying all of these components as well as
the ultimate success of digital libraries is the need to determine the impact
on performance and behavior and to understand the issues related to financing,
access, collection management, and standardization. IST will coordinate the
impact and policy studies with assistance from Scholastic, Inc. and AskERIC. A
number of partners will also participate in conducting assessments and forming
recommendations.<p>
<img src="figures/liddy1.gif"><p>
<i>Figure 1. The Intelligent Digital Library Model.</i><p>
<b><p>
Research Plan</b><p>
The proposed research covers a wide spectrum of disciplines and addresses a
rich set of questions that must be answered in order to provide the fullest
possible digital library services--from constructing the testbed, to networking a large
number of sites, to developing an HPCC implementation, to conducting basic
research in information retrieval, to evaluating the digital library's impact
on its users' everyday lives, to analyzing the policy issues that surround this rich new
service. The basic research we will conduct in this project will cover the
following major areas:<p>
<p>
(1)	Intelligent retrieval via a natural language text retrieval system
(DR-LINK)<p>
(2)	Scale-up of intelligent retrieval in an HPCC environment<p>
(3)	Evaluation of digital libraries' impact on users' behavior and
performance<p>
(4)	Policy implications of the creation, use, and availability of a digital
library<p>
<p>
Together, the outcomes from these major research efforts will provide the means
for moving toward the digital library of the future that is effective,
efficient, and widely beneficial.<p>
<b><p>
Research Area 1:  Intelligent Retrieval</b><u><p>
Problem</u>:  Information seekers want to express their information needs
naturally and with all necessary detail. They want the system to understand the
underlying meaning of their query in all its complexity and subtlety. Equally
important, they want the system to have represented the contents of documents
at a sufficiently rich level that it can respond to their query at its deep,
conceptual level. Instead, information seekers currently find that networked
retrieval tools cannot respond adequately to substantive, complex queries
expressed in full, natural language. Most searching and
retrieval via the networks is currently performed at the surface
string-matching level--word-for-word matching with no accounting for the
complexity and ambiguity of the language. Furthermore, users cannot find the
information they need in an easy, straightforward manner that would encourage
their continued use of the network as an information resource. Although some
information seekers overcome the hurdles in their path, the available network
tools do not encourage access, nor do the circuitous routes whereby one may
find the gem lodes (given persistence, good luck, and sufficient time) invite
extensive usage.<p>
<u><p>
Solution</u>:  To overcome these barriers, we are proposing to develop a
Digital Librarian that can understand and act upon the same type of
wide-ranging natural language queries that users so easily and naturally ask of
a human intermediary. The documents, as well as the text accompanying images,
in the digital library testbed will be processed into a linguistic
representation as rich as that of the queries so that truly conceptual-level
retrieval can be performed.<p>
The testbed's intelligent retrieval capability will come from DR-LINK, a
prototype text-retrieval system developed under the auspices of ARPA's recent
TIPSTER Initiative (Liddy &amp; Myaeng, 1993). DR-LINK (Document Retrieval
Using LINguistic Knowledge) was developed to serve the information needs of
analysts who have complex, high-precision/high-recall information needs, which
are input in unrestricted natural language. Although the average network
inquirer may not currently be in the practice of posing queries as detailed as
the analysts', research has shown that users will easily produce rich, complex
statements of their information needs if they know that the system can use such
detailed relevance requirements to produce better results (Oddy et
al., 1992).<p>
The NLP approach implemented in DR-LINK represents and matches documents and
queries at all levels of linguistic expression (morphological, lexical,
syntactic, semantic, and discourse) at which meaning is conveyed for purposeful
communication by human beings. In concert, these levels of representation
provide the capabilities whereby the system can truly aspire to intelligent
retrieval. Given this full range of linguistic processing, DR-LINK has the
potential for providing a level of intelligent retrieval that is now available
only through a human intermediary. That is, DR-LINK will:<p>
<p>
* comprehend the subtlety of a user's information need;<p>
* recognize the user's multiple dimensions of relevance requirements;<p>
* understand the complexity of ways in which relevant information might be
expressed in various information sources; and<p>
* retrieve those documents that are responsive to a query at the conceptual
level.<p>
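The gap between surface string matching and concept-level matching can be illustrated with a toy example. The sketch below is not DR-LINK's method: it assumes a small, hypothetical table mapping words to concept labels and stands in for only one of the levels named above (recognizing semantic equivalences), with no syntactic or discourse processing.<p>

```python
import re

# Hypothetical concept table: surface words mapped to shared concept labels.
# DR-LINK's actual lexical and semantic resources are far richer than this.
CONCEPTS = {
    "car": "AUTOMOBILE", "cars": "AUTOMOBILE",
    "automobile": "AUTOMOBILE", "automobiles": "AUTOMOBILE",
    "pollution": "POLLUTION", "smog": "POLLUTION", "emissions": "POLLUTION",
    "students": "STUDENT", "pupils": "STUDENT",
}


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def string_match(query, doc):
    """Surface matching: words that appear verbatim in both texts."""
    return set(tokens(query)) & set(tokens(doc))


def concept_match(query, doc):
    """Concept matching: concept labels shared by both texts."""
    def to_concepts(text):
        return {CONCEPTS[t] for t in tokens(text) if t in CONCEPTS}
    return to_concepts(query) & to_concepts(doc)


query = "How do cars cause pollution?"
doc = "Automobile emissions are a major source of smog in cities."
print(string_match(query, doc))   # prints set(): no word appears in both
print(concept_match(query, doc))  # both concepts are shared
```

A word-for-word matcher would score this clearly relevant document at zero, while matching at the level of concepts finds both of the query's concepts in it.<p>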
<u><p>
Research Topics</u>:  We will investigate three basic research topics within
the project's intelligent retrieval component, framed as the following research
goals:<p>
<p>
A.	To develop an intelligent retrieval system that will use NLP to respond to
information queries with performance comparable to the information provision
capabilities of human intermediaries.<p>
<p>
	This basic research will involve the development of the Digital Librarian
which will emulate the information skills of the human intermediary whose
understanding of both the user's information need and the information content of
documents is not limited to surface-level lexical comprehension; rather, the
intermediary interprets queries and documents at the multiple levels at which
meaning is conveyed in human language, from pure lexical pattern matching, to
recognition of all semantic equivalencies of needed concepts, to the discourse
level where structuring of information content conveys important aspects of a
text's meaning. We propose to bring this same level of natural expression and
system understanding to the broad range of users who will access the digital
library.<p>
	Within this project, we will create a Digital Librarian that simulates the
human intermediary's ability to retrieve information on the basis of what
queries and text mean, not just what they say.  This will be done by extending
DR-LINK's successful but still nascent NLP approach to the representation of
the queries that will be put to the digital library and to the processing of
the library's contents into the enriched linguistic representation provided by
the DR-LINK system.<p>
 <p>
B.	To achieve a major paradigm shift from the symbolic NLP approach used for
processing documents in the current DR-LINK information retrieval system to a
more adaptive, robust, connectionist NLP approach.<p>
<p>
	The NLP approach we have pursued in DR-LINK is symbolic in nature. Successful
and impressive at the conceptual level, it relies on intensive domain knowledge and a rule base.
However, the need to extend its use to new types of texts and queries has
caused us to re-evaluate the basic paradigm. Although the system is not
domain-dependent, its symbolic NLP approach requires much human effort in the
discovery and coding of the necessary regularities (rules) in text and the
development of linguistic knowledge bases. Applying DR-LINK to a new
environment can therefore be more time-consuming than necessary,
undermining the versatility and practicality of our system. To significantly
reduce this overhead of human effort, the DR-LINK system needs to be
transformed into an adaptive system that, when applied to a new environment,
can automatically extract (or learn) the rules (or regularities) and encode the
knowledge. The connectionist (neural network) approach possesses several
important properties that can be exploited to build the adaptive version of the
DR-LINK system.<p>
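To make concrete what it means for a network to learn a regularity from examples instead of having it hand-coded, the sketch below trains a small feedforward network with backpropagation on XOR, a standard non-linear mapping that a network without hidden units cannot represent. The task, layer sizes, learning rate, and epoch count are illustrative assumptions, not the project's design; in DR-LINK the inputs would instead encode cues such as lexical clues.<p>

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FeedforwardNet:
    """One hidden layer of sigmoid units, trained by backpropagation on squared error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds one hidden unit's input weights; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # Output weights, one per hidden unit plus a bias.
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden]
        hb = h + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, y

    def train_step(self, x, target, lr):
        xb, hb, y = self.forward(x)
        # Output delta: derivative of 0.5 * (y - t)^2 through the sigmoid.
        delta_out = (y - target) * y * (1 - y)
        # Hidden deltas: output error propagated back through the (old) output weights.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1 - hb[j])
                        for j in range(len(self.w_hidden))]
        # Gradient-descent weight updates.
        for j in range(len(self.w_out)):
            self.w_out[j] -= lr * delta_out * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= lr * delta_hidden[j] * xb[i]

    def mse(self, data):
        return sum((self.forward(x)[2] - t) ** 2 for x, t in data) / len(data)


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

net = FeedforwardNet(n_in=2, n_hidden=4, seed=1)
initial_error = net.mse(XOR)
for epoch in range(5000):
    for x, t in XOR:
        net.train_step(x, t, lr=0.5)
final_error = net.mse(XOR)
print(f"MSE before: {initial_error:.3f}, after: {final_error:.3f}")
for x, t in XOR:
    print(x, t, round(net.forward(x)[2], 2))
```

No rule for XOR is written anywhere in the code; the mapping is encoded in the weights by training on four examples, which is the property that motivates the adaptive version of the system.<p>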
<p>
	In building an adaptive version of the DR-LINK system that can learn the
statistical regularities and rules, we will use two types of neural networks,
the feedforward neural networks trained with the backpropagation algorithm
(BPFFNNs; Rumelhart et al., 1986) and the simple recurrent networks (Elman, 1990; 1991). The former
possesses the connectionist properties referred to in the previous paragraph, notably
the ability to learn rules and regularities from examples rather than having them hand-coded, and
will be used to learn the required, and possibly non-linear, I/O mappings in
the DR-LINK system (e.g., the mapping between lexical clues and text
structure components). The simple recurrent network is an extension of the
BPFFNN and is capable of learning and recording temporal information. This
additional property is extremely useful for an NLP system such as DR-LINK,
because it enables us to build a connectionist NLP model that retains many
properties of conventional symbolic models, such as context sensitivity and
sensitivity to the compositional structures (Sharkey &amp; Reilly, 1992). In
the process of constructing such an adaptive system, some related issues also
need to be investigated. They include the determination of the appropriate
network size and topology, the proper construction of the training set
(samples) that reflect the essential characteristics of a domain, and the
evaluation of the new system's performance in comparison with the symbolic
version of the DR-LINK system.<p>
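The property that distinguishes the simple recurrent network can be shown with a forward pass alone: a copy of the previous hidden state (the context units) is fed back as additional input at each step, so the final state depends on word order. The sketch below uses small, arbitrary fixed weights, a hypothetical three-word vocabulary, and tanh units for illustration; training is omitted.<p>

```python
import math

# Hypothetical one-hot vocabulary for illustration.
VOCAB = {"dog": [1, 0, 0], "bites": [0, 1, 0], "man": [0, 0, 1]}

# Arbitrary fixed weights for 2 hidden units; a real network would learn these.
W_IN = [[0.5, -0.3, 0.8],     # input -> hidden unit 0
        [-0.6, 0.4, 0.2]]     # input -> hidden unit 1
W_CONTEXT = [[0.9, -0.7],     # context (previous hidden state) -> hidden unit 0
             [0.3, 0.6]]      # context -> hidden unit 1


def srn_final_state(words):
    """Run an Elman-style simple recurrent network over a word sequence.

    At each step the hidden state is computed from the current word and the
    context units, which hold a copy of the previous hidden state.
    """
    context = [0.0, 0.0]
    for word in words:
        x = VOCAB[word]
        hidden = [
            math.tanh(sum(w * v for w, v in zip(W_IN[j], x))
                      + sum(w * c for w, c in zip(W_CONTEXT[j], context)))
            for j in range(len(W_IN))
        ]
        context = hidden  # copy the hidden state into the context units
    return context


a = srn_final_state(["dog", "bites", "man"])
b = srn_final_state(["man", "bites", "dog"])
print([round(v, 3) for v in a])
print([round(v, 3) for v in b])  # same words, different order, different state
```

A feedforward network fed the same three words as an unordered set would receive identical input in both cases; the recurrent context is what lets the state reflect "dog bites man" versus "man bites dog".<p>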
<p>
C.	To understand the information-seeking behavior of image seekers in order to
extend the Digital Librarian's capabilities to include the intelligent
retrieval of images, first via the natural language annotations that accompany
the images in our testbed, and eventually via retrieval on the features of
images themselves.<p>
<p>
	In the area of image retrieval, there will be two levels of effort. From the
first day that the testbed is accessible, we will be able to provide images as
well as text as an information source by using the text that accompanies the
majority of images in our testbed (e.g., captions on photographs or lesson
plans accompanying CNN footage). DR-LINK will process the accompanying text,
enabling us to provide text-based image retrieval by means of the NLP retrieval
capabilities available in the system. <p>
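At its simplest, text-based image retrieval means indexing each image by its accompanying text. The sketch below builds an inverted index over captions and ranks images by how many query words their captions contain. The image identifiers and captions are hypothetical, and plain word overlap stands in for DR-LINK's linguistic processing, which the sketch does not attempt.<p>

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "for", "to", "show", "me"}


def terms(text):
    """Lowercase word tokens with a few common words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_caption_index(captions):
    """Map each caption term to the set of image ids whose caption contains it."""
    index = defaultdict(set)
    for image_id, caption in captions.items():
        for term in terms(caption):
            index[term].add(image_id)
    return index


def search_images(index, query):
    """Rank image ids by how many distinct query terms their captions contain."""
    scores = defaultdict(int)
    for term in set(terms(query)):
        for image_id in index.get(term, ()):
            scores[image_id] += 1
    return sorted(scores, key=lambda i: (-scores[i], i))


# Hypothetical testbed items: photograph captions and a lesson plan for video footage.
captions = {
    "photo-017": "Volcano erupting on the island of Hawaii",
    "video-042": "Lesson plan: how volcanoes form, with classroom footage",
    "photo-103": "Students planting trees on Earth Day",
}
index = build_caption_index(captions)
print(search_images(index, "Show me a volcano erupting"))
```

Note that the query term "volcano" does not match the caption word "volcanoes" here; handling such variants is what a morphological level of processing, one of the levels DR-LINK represents, would supply.<p>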
<p>
	As the project advances, image retrieval will move to a second level of
investigation--a basic inquiry into preferred methods of searching for images.
We will conduct an exploratory, observational study of users searching for
images. To understand which aspects of images (both fixed and moving) are
important for searching, and what mechanisms are needed for both indexing and
retrieving images, we propose to investigate the types of searches real users
engage in when looking for a particular image, and which aspects of the images
are important for successfully retrieving it. We will observe the testbed's
users to learn how they search image collections. We are interested in
learning:<p>
<p>
	*	What aspects of images are important for searching?<p>
	*	What types of searches do users engage in when looking for an image?<p>
<p>
	We will first incorporate what we learn from these observations into the
DR-LINK query sublanguage analyzer. We will then empirically test text-based vs.
icon-based searching for images, as well as the use of DR-LINK's natural
language query analyzer to map users' free-text queries into a controlled
vocabulary searching thesaurus.<p>
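Mapping free-text queries into a controlled vocabulary can be pictured as a lookup from entry terms to preferred descriptors, in the way thesaurus cross-references point from a non-preferred term to the preferred one. The sketch below assumes a tiny, hypothetical thesaurus for image searching and uses greedy longest-phrase matching; DR-LINK's query analyzer would do considerably more than this lookup.<p>

```python
import re

# Hypothetical entry-term -> preferred-descriptor table.
THESAURUS = {
    "pupils": "Students",
    "students": "Students",
    "schoolchildren": "Students",
    "photo": "Photographs",
    "photos": "Photographs",
    "pictures": "Photographs",
    "outer space": "Space Exploration",
    "space travel": "Space Exploration",
    "astronauts": "Astronauts",
}
MAX_PHRASE = max(len(k.split()) for k in THESAURUS)


def map_to_descriptors(query):
    """Greedy longest-match mapping of query words and phrases to descriptors."""
    words = re.findall(r"[a-z]+", query.lower())
    descriptors, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in THESAURUS:
                if THESAURUS[phrase] not in descriptors:
                    descriptors.append(THESAURUS[phrase])
                i += n
                break
        else:
            i += 1  # no entry term starts at this word; skip it
    return descriptors


print(map_to_descriptors("pictures of astronauts in outer space for my pupils"))
```

Matching the longest phrase first keeps "outer space" from being treated as two unrelated words.<p>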
<b><p>
Research Area 2:  High-Performance Digital Library System</b><u><p>
Overview</u>:  The electronic digital library will inevitably be distributed
over the Internet; this does not present a barrier to meeting the needs of
users, because it is easier to move from one network location to another than
to walk into an adjoining room in a physical library.<p>
Each digital library site of the future must be capable of storing massive
amounts of text and image resources and of providing both local and remote
users with intelligent retrieval methods. These requirements dictate the need
for an HPCC environment.<p>
At each site, there will be overhead and potential bottlenecks associated with
the activities of (a) locating documents, which can be computationally
intensive, and (b) retrieving and delivering documents, which relies for speed
on file retrieval hardware and software, on network bandwidth, and on
communication protocols. A cost-effective and efficient site will support these
activities in a system-balanced manner, such that, for example, the available
bandwidth is matched to the speed with which documents can be retrieved and
these are matched to the rate at which requests can be processed by the
computational engine.<p>
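The balancing argument can be made concrete with simple arithmetic: a site's sustained delivery rate is bounded by its slowest stage. The sketch below expresses each stage's capacity in documents per second and names the bottleneck. The compute and storage rates and the average document size are hypothetical figures chosen for illustration; only the T1 line rate of 1.544 Mbit/s is a standard value.<p>

```python
AVG_DOC_BITS = 50_000 * 8  # hypothetical average document size: 50,000 bytes

# Hypothetical per-stage capacities, expressed in documents per second.
stages = {
    "query processing (compute engine)": 20.0,
    "file retrieval (storage I/O)": 40.0,
    "network delivery (T1, 1.544 Mbit/s)": 1_544_000 / AVG_DOC_BITS,
}


def bottleneck(stage_rates):
    """Return (stage name, rate) for the slowest stage, which caps end-to-end throughput."""
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


name, rate = bottleneck(stages)
print(f"throughput at most {rate:.2f} docs/s, limited by {name}")
```

With these figures the link carries only about 3.9 documents per second, so most of the compute and storage capacity would sit idle; a balanced design would either raise the bandwidth or spend less on the other stages.<p>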
Achieving a balanced system requires monitoring all aspects for bottlenecks and
often implies using data so acquired to reconfigure the system. Research,
prototyping, evaluation, and feedback cycles will be important for all aspects
of system and network performance. An intelligently modular design is also
important so that changes can be made with little impact on system stability.<p>
<u><p>
Problem</u>:  To adequately meet the demands of an effective
digital library, it will be necessary to create a balanced system that will
efficiently handle very large digital libraries of text and multimedia, that is
responsive to the need for different types of searches and different
requirements for organizing information, and that is distributed and
modularized in such a way that it will maintain its integrity and performance
as it is (a) indefinitely expanded and (b) upgraded at the hardware and
software component level.<p>
<u><p>
Solution</u>:  To meet this need, we propose to create a fully distributed
client/server document retrieval system operable across platforms and at any
bandwidth. Specifically, we plan to:<p>
<p>
* Develop parallel versions of DR-LINK and document retrieval<p>
* Develop multiple types of servers for handling the various retrieval and
application functionalities<p>
* Develop effective message-passing that has good performance for both small
messages and large data transfers and can convey all necessary information
among modules<p>
* Separate document storage management from information management<p>
* Work closely with users to effect a user access design that reflects the ways
users work naturally or intuitively with the system<p>
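One common way to make a single message-passing layer serve both small control messages and large document transfers is length-prefixed framing, with large payloads split into bounded chunks. The sketch below uses a hypothetical wire format (a 1-byte message type, a 4-byte big-endian payload length, then the payload); it is not a protocol the project specifies, and it assumes the byte string holds only complete frames.<p>

```python
import struct

HEADER = struct.Struct(">BI")   # message type (1 byte), payload length (4 bytes)
CHUNK_SIZE = 64 * 1024          # hypothetical upper bound on one frame's payload

MSG_QUERY, MSG_DOC_CHUNK, MSG_DOC_END = 1, 2, 3  # hypothetical message types


def frame(msg_type, payload):
    """Prefix a payload with its type and length."""
    return HEADER.pack(msg_type, len(payload)) + payload


def frame_document(data):
    """Split a large document into chunk frames followed by an end marker."""
    frames = [frame(MSG_DOC_CHUNK, data[i:i + CHUNK_SIZE])
              for i in range(0, len(data), CHUNK_SIZE)]
    frames.append(frame(MSG_DOC_END, b""))
    return b"".join(frames)


def read_frames(stream):
    """Yield (type, payload) pairs from a byte string of concatenated frames."""
    offset = 0
    while offset < len(stream):
        msg_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        yield msg_type, stream[offset:offset + length]
        offset += length


wire = frame(MSG_QUERY, "volcanoes for grade 5".encode()) + frame_document(b"x" * 150_000)
for msg_type, payload in read_frames(wire):
    print(msg_type, len(payload))
```

A small query costs only five header bytes, while capping each frame's size means a server could interleave short messages with a long transfer instead of blocking behind it; that scheduling is not shown here.<p>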
<u><p>
Research Topics</u>:  The HPCC component of the research centers on three
fundamental research goals. These are:<p>
<p>
A. 	To enhance scalability by networking the DR-LINK environment and developing
a high-performance version of DR-LINK document processing.<p>
		<p>
B. 	To integrate DR-LINK with other types of electronic information, such as
multimedia and relational databases.<p>
		<p>
C. 	To scale the digital library testbed environment so that it can adequately
handle the massive amounts of diversified traffic it will be called upon to
support.<p>
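A common strategy for goal A is document partitioning: split the collection into shards, search every shard in parallel, and merge the per-shard results (scatter-gather). The sketch below illustrates that general strategy with hypothetical shards and a simple term-count score standing in for DR-LINK's matching. It uses threads for brevity; in CPython, CPU-bound scoring would need a process pool or separate server nodes to run truly in parallel.<p>

```python
import heapq
import re
from concurrent.futures import ThreadPoolExecutor


def score(query_terms, text):
    """Toy relevance score: occurrences of query terms in the document."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(words.count(t) for t in query_terms)


def search_shard(shard, query_terms, k):
    """Return the top-k (score, doc_id) pairs with a nonzero score in one shard."""
    scored = [(score(query_terms, text), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, [s for s in scored if s[0] > 0])


def parallel_search(shards, query, k=3):
    """Scatter the query to every shard, then gather and merge the top-k hits."""
    query_terms = re.findall(r"[a-z]+", query.lower())
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: search_shard(s, query_terms, k), shards)
        merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged)


# Hypothetical collection split across three shards (e.g., three server nodes).
shards = [
    {"d1": "photosynthesis in green plants", "d2": "fractions and decimals"},
    {"d3": "plants need light", "d4": "the water cycle"},
    {"d5": "lesson plan: plants, photosynthesis, and more photosynthesis"},
]
print(parallel_search(shards, "photosynthesis plants"))
```

Because each shard returns only its own top k hits, the merge step handles at most k results per shard, regardless of how large each shard's collection grows.<p>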
<p>
These goals raise a number of issues that we will need to address.  These
include:<p>
<p>
* Transaction management for DR-LINK natural language query processing<p>
* Parallel processing strategies for optimizing text processing<p>
* Balancing I/O with the central processing unit (CPU) processing both for
creating the text database index and for handling multiple complex queries<p>
* Design of client interfaces, incorporating user-driven options<p>
* Refinement of the design by evaluating user interactions with the system<p>
* Continuing research on the use of computer processes and procedures as
sources of real information in virtual documents (i.e., information creation by
simulation, data analysis, database mining)<p>
* Integration of text search with heterogeneous data types and with
complementary search methodologies, such as image pattern recognition and
Structured Query Language (SQL) queries<p>
* Efficient delivery systems that are performance-balanced from file storage to
display<p>
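Integrating text search with SQL queries can be pictured as a relational filter on structured metadata combined with a text-relevance ranking over the rows that pass the filter. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the schema, the grade-level and media fields, and the word-overlap score are hypothetical, the last standing in for DR-LINK's ranking.<p>

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, media TEXT, grade INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [
        ("d1", "text", 4, "Why leaves change color in autumn"),
        ("d2", "video", 4, "Classroom footage: leaves, light, and photosynthesis"),
        ("d3", "text", 11, "Photosynthesis and the chemistry of chlorophyll"),
        ("d4", "image", 5, "Photograph of autumn leaves"),
    ],
)


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def hybrid_search(conn, query, max_grade, media=None):
    """Filter rows on metadata with SQL, then rank the survivors by word overlap."""
    sql = "SELECT id, text FROM docs WHERE grade <= ?"
    params = [max_grade]
    if media is not None:
        sql += " AND media = ?"
        params.append(media)
    q = words(query)
    ranked = []
    for doc_id, text in conn.execute(sql, params):
        overlap = len(q & words(text))
        if overlap:
            ranked.append((overlap, doc_id))
    ranked.sort(key=lambda r: (-r[0], r[1]))  # best overlap first, then by id
    return [doc_id for _, doc_id in ranked]


print(hybrid_search(conn, "autumn leaves photosynthesis", max_grade=5))
print(hybrid_search(conn, "autumn leaves photosynthesis", max_grade=5, media="video"))
```

Letting the database apply exact constraints such as grade level or media type keeps the more expensive text matching confined to the rows that can actually be returned.<p>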
<p>
Supporting research will include strategies for facilitating fast multimedia
file retrieval, developing and optimizing network transfer protocols,
evaluating network traffic, and evaluating the total HPCC query processing and
document delivery system for bottlenecks as the basis for design refinement.<p>
<b><p>
Research Area 3:  Impact of Digital Libraries</b><u><p>
Problem</u>:  Past introductions of new information technologies and systems
suggest there will be both intended and unintended users, uses, and barriers to
the use of digital libraries. And although there has been much fanfare in the
media about the significance of the new "information highway," little is known
empirically about this new technology's effect on the behavior, practices, and
outcomes of individuals and organizations.<p>
<u><p>
Solution</u>:  Given the almost immediate availability of our digital library
testbed to the K-12 environment, we will be well positioned
to conduct longitudinal evaluation studies of its impact over the span
of the project and perhaps beyond. <p>
We will identify and collect data about specific services or activities,
establish criteria by which their success can be assessed, and determine
both the quality of the service or activity and the degree to which the service
or activity accomplishes stated goals and objectives (Van House, Weil, and
McClure, 1990). In this way, our evaluation will (1) ensure that the
highest-quality services are provided to the intended users of those services,
and (2) assist decision makers in allocating necessary resources to those
activities and services that best facilitate the accomplishment of
organizational goals and objectives (Hernon and McClure, 1990).<p>
<u><p>
Research Topics</u>:  This project will provide the educational community with
network access to the first K-12 digital library.  For this model to be used
for more extensive application, we must know what impact the project has on its
targeted user groups.  Our evaluation studies will be guided by the following
research topics:<p>
<p>
A.	To what degree do digital library services and materials resolve
user information needs?<p>
		<p>
	Ongoing monitoring and evaluation of how users assess the quality of digital
library service, and for what purposes digital information is being used (as
compared to traditional types of information), can help the testbed's designers
to better respond to the needs of users and potential users.<p>
<p>
B. 	What are the costs for developing the digital library as a whole, as well
as for specific individual services?<p>
		<p>
	Planners of digital libraries need information as to what costs can be
expected in the development of digital library products and services. However,
the traditional cost categories that are used in planning conventional
libraries may not be appropriate for use in a digital library context. Being
able to conduct cost analyses of selected services is essential for conducting
the policy research on pricing services.<p>
C. 	What are appropriate performance measures for assessing the efficiency,
effectiveness, extensiveness, and impact of digital libraries?<p>
		<p>
	In addition to determining the degree to which digital libraries resolve user
information needs and identifying digital library costs, it is essential to
extend research on developing performance measures for digital libraries. Work
by Van House, Weil, and McClure (1990) has demonstrated the importance of
developing validated performance measures for academic libraries. This research
will be extended into the digital library environment.<p>
<p>
	The research questions identified here are significant and suggest that
ongoing user-based evaluation of the digital library's impacts should accompany
the design and implementation of specific services. These key questions will
help the research team to better understand the impact of digital libraries and
provide ongoing, user-based, formative assessments that will inform decisions
related to the design and management of this and future digital libraries.<p>
 <b><p>
Research Area 4:  Policy Research on Digital Libraries</b><u><p>
Problem</u>:  The networked information environment, so far, has been driven
primarily by the technology of computers, electronic networks, and
telecommunications facilities. Government and the larger society are now
witnessing the preliminary results of a technology-driven approach to the
information infrastructure. These results include disparities between those who
can and those who can't access networked resources; conflicts over privacy and
access to information; reluctance on the part of information creators to make
their information available because of concerns about cost/pricing and intellectual
property protection; the competing, sometimes conflicting, use of technical standards, or
their complete absence; and other societal and political issues and
concerns.<p>
<u><p>
Solution</u>:  Our research includes a specific component to investigate policy
issues related to the creation, use, availability, and effectiveness of a
network-based multimedia digital library. The effort will draw upon, in part,
the digital information developed for inclusion in the digital library as well
as policy analysis of these topics from broader perspectives (e.g., existing
Federal laws and regulations). After identifying existing policy in the
areas addressed by the research questions detailed below, and the issues raised
by those questions, we will develop and assess policy options and
recommendations in the areas of:<p>
<p>
* Financing and pricing information resources and services in a networked
environment<p>
* Providing universal access to and universal usability of digital libraries<p>
* Managing the information resources contained within a digital library and
addressing such critical concerns as intellectual property, privacy, and
censorship<p>
* Developing and implementing technical standards and standardization policy
for digital libraries<p>
<p>
Policy research and analysis in these key areas will provide the foundation for
counteracting the current technology-driven nature of the emerging information
infrastructure. In this way, the research will assist policy makers, users, and
other stakeholders in resolving the more complicated social, political, and
economic questions that have emerged with the development of the networked
information environment.<p>
<u><p>
Research Topics</u>:  The following research questions address critical
concerns for digital library and networked information environment operations
and use.<p>
<p>
A. 	What collection and access management policies can effectively address the
characteristics of networked information services and resources? <p>
	<p>
	Traditional, print-based libraries have developed policies to manage their
collections and access to them. The networked environment calls into question
these traditional notions of collection development as well as the patterns of
access and use of information resources. It also calls into question
long-standing intellectual property arrangements among publishers, information
providers, libraries, and their users. Publishers and electronic information
providers are currently experimenting with various models to ensure copyright
protections and other intellectual property rights in the electronic
environment. Collection development by digital libraries must accommodate
information providers' concerns regarding these rights.<p>
	<p>
	Traditional libraries also have developed effective safeguards on patron
privacy, and many libraries have in place statements and policies on
intellectual freedom and censorship. Digital libraries will need similar
policies, modified to reflect the specific characteristics of the networked
environment. This research will identify policy categories related to
collection and access management and detail the priority issues in each
category. Our goal is to develop and recommend a policy framework for this
broad area of collection and access management.<p>
	<p>
B.	How will digital libraries be financed, and how will information resources
and services be priced, in a networked environment?<p>
	<p>
	Networked services and resources are not free, although for many users they
appear to be "free" because there are no direct charges attached. The operation
of a digital library will require funding, and it is still unclear (1) how
digital libraries will be supported, and (2) how they will determine the costs
of their services and set prices for them.<p>
	<p>
	Pricing is a key policy issue that has yet to be addressed and resolved for
the delivery of any networked information service, resource, or product.
Indeed, it is impossible to address pricing policy without considering other
policy issues, such as the public and private sectors' roles in financing
digital libraries and access to them. There exists a well-established body of
literature related to the economics of information, but there is less knowledge
about the economics of digital information, especially digital information
within the context of a digital library.<p>
	<p>
	This research question will guide an investigation into the costs of digital
libraries (e.g., network connections, digitization of information,
distribution, etc.); their pricing of resources, products, and services; and
funding mechanisms to support digital libraries. Models for these three areas
will be developed that can inform policy options and recommendations.<p>
	<p>
C.	What constitutes universal access to digital libraries, and who is
responsible for ensuring that such universal access occurs?<p>
	<p>
	The Clinton Administration's <em>National Information Infrastructure: Agenda for
Action</em> (1993) identifies "universal access" and "universal service" as key
operating principles for the evolving networked information environment. These
terms have yet to be operationalized, and, equally important, some critics have
questioned the purposes to which universal access will be put. There appears to
be a close connection between the policy of universal access and the resulting
utility and effectiveness of that access.<p>
	<p>
	Barriers that impede universal access to networked information need to be
identified and policy must be developed to minimize the impact of those
barriers. Some of these barriers relate to technical and network literacy,
that is, the set of technical and other skills users will need to use digital
libraries effectively.<p>
	<p>
D.	What technical standards are needed for the components of the digital
library, and what are the mechanisms for evolving these standards?<p>
	<p>
	Technical standards are embedded within the technology that currently is
driving the networked environment. The range of standards that may be useful
for digital libraries is extensive and includes standards for
telecommunications and computer protocols, data formats, and data interchange,
among others. Unfortunately, conflicting and competing standards and standards
development processes and organizations may reduce the likelihood of seamless
interoperability and interworking of the components of the networked
environment. In addition, the dynamic nature of existing and emerging
information technology calls into question whether the traditional standards
development processes can respond in a timely manner with useful (i.e.,
implementable) standards.<p>
	<p>
	Policy research is needed to identify which standards will be necessary for an
effective, interoperable, and user-based digital library. An inventory of
existing and proposed standards will assist in determining the degree to which
they meet or conflict with emerging operating requirements for a digital
library.<p>
	<p>
	Research is also necessary to determine the mix of responsibilities among the
government, public, and private sectors in the development of standards. Since
no overarching policy currently guides the development of standards for the
information infrastructure, this research will
identify the components and responsibilities for a model of standards
development that responds to the needs of digital libraries.<p>
<p>
<b>Significance</b><p>
This research directly addresses the key concerns of emerging digital
libraries. Given the research team's range of expertise, we are well prepared
to investigate the following topics. In particular, we will:<p>
<p>
Categorize/index/organize electronic information:<p>
<p>
* Integrate public and commercial text, still-image, and video collections in
one testbed. Most of the collection has already been digitized, which will
allow us to focus on content and access issues from the first day of the
project<p>
* Produce a level of conceptual indexing well beyond the capabilities of
conventional keyword indexing techniques by using the multiple levels of linguistic
processing produced by DR-LINK<p>
* Provide subject classification of, and seamless subject access to, documents
stored in distributed databases via the proven Subject Coding approach of
DR-LINK<p>
<p>
Develop advanced software for searching/filtering/browsing:<p>
<p>
* Extend the discourse-level processing performed by DR-LINK to new genres and
query types for improved intelligent retrieval<p>
* Achieve a new level of intelligent retrieval via NLP of both texts and
queries, and compare it empirically with human retrieval performance<p>
* Improve DR-LINK's robustness and portability by developing an adaptive,
connectionist model of the current symbolic NLP system<p>
* Develop a data-grounded understanding of users' needs and search behaviors in
image retrieval and produce a model for feature-based indexing<p>
* Empirically compare text-based and icon-based indexing and retrieval of
images<p>
<p>
Address networking standards/protocols/policies:<p>
<p>
* Investigate protocol design as well as scalability issues in an HPCC system
implementation<p>
* Conduct user-based studies to learn the impact of the digital library on its
users' behavior and performance<p>
* Develop and validate new performance measures for use in evaluating digital
libraries<p>
* Provide recommendations from broad-based policy studies on costs, ownership,
financing, access, and other crucial issues<p>
<p>
The strengths of the research project include the breadth and significance of
the research issues, the coordinated approach, and the extensive expertise of
the research team members. The project also incorporates a number of
perspectives (academic, public-sector, and commercial) into a unified
approach, and its research findings will be directly applicable and
transferable to a wide range of business, government, and recreational uses
and settings. Finally, we will produce a digital library and a Digital
Librarian that will be made available to all educators and students.<p>

<!--#include virtual="/DL94/footer.ihtml" -->
Last Modified: <!--#echo var="LAST_MODIFIED" --> <br>

</body>
</html>

