<html>
<head>
<title>
DL94: The CoLib Project
</title>
</head>

<body>

<!--#include virtual="/DL94/header.ihtml" -->

<h1>The CoLib Project: Enabling Digital Botany for the 21st Century</h1>
<p>
John L. Schnase[1], John J. Leggett[2],
Ted Metcalfe[1], Nancy R. Morin[3],
Edward L. Cunnius[1], Jonathan S. Turner[4], 
Richard K. Furuta[2], Leland Ellis[5],
Michael S. Pilant[6], Richard E. Ewing[6], Scott W. Hassan[1], 
and Mark E. Frisse[1]<p>
<i>
[1] Advanced Technology Group, School of Medicine Library, Washington University School of Medicine; 
[2] Hypermedia Research Laboratory, Department of Computer Science, Texas A&amp;M University; 
[3] Missouri Botanical Garden; 
[4] Department of Computer Science, Washington University; 
[5] W.M. Keck Center for Genome Informatics, Institute of Bioscience and Technology, Texas A&amp;M University; 
[6] Institute for Scientific Computation, Texas A&amp;M University<p>
</i>

<b><p>
Abstract</b><p>
The CoLib Project is a multi-institutional, inter- and intra-disciplinary
effort aimed at establishing a large-scale, distributed, botanical digital
library and research facility. Its initial focus is on enabling the Flora of
North America Project (FNA), a collaborative effort to gather and disseminate
information on all the plants of North America.  CoLib's long-range goals,
however, are the following:  (1) to scale FNA into a comprehensive, world-wide,
internetworked, botanical digital library, (2) to enable the scientific
practice of botany, (3) to use digital library technology to create new
opportunities for cross-disciplinary sharing, and (4) to create a context
wherein the librarianship that is intrinsic to systematic botany can be
extended to the work that is currently underway on digital libraries.<p>
Large-scale hypermedia technology and broadband communications will be key
components of future digital libraries. CoLib's research agenda focuses on
collaborative hypermedia library systems, interactive multimedia computing and
communications systems, and librarianship in the botanical digital library.
The CoLib Project is providing a rich environment for research on large-scale,
distributed, digital library systems and organizational processes associated
with the use of digital libraries.  <p>
<b><p>
Keywords </b>-- ATM, botanical informatics, collaborative hypermedia library
systems, hypermedia-in-the-large, scientific practice, systematics, digital
library machine.<b><p>
<p>
<p>
<p>
1.  Introduction</b><p>
In the future, our work will be mediated through rapid, coordinated access to
shared information.  Digital libraries will provide the substrate for this
dialog. Through shared digital libraries, people will collaborate with
colleagues across geographic and temporal distances. They will use these
libraries to organize personal information spaces and to read, write, teach,
learn, and create.  In traditional fashion, their intellectual work will be
shared with others through the medium of the library--but their contributions
and interactions will be elements of a global and universally accessible
library that can be used by many different people and many different
communities. By increasing the effectiveness and speed with which information
is communicated and used, digital libraries are likely to potentiate major
paradigm shifts in science.  They will advance existing areas of study, promote
disciplinary fusions, and enable entirely new discourses of study.  <p>
The CoLib Project is a multi-institutional, inter- and intra-disciplinary effort
aimed at establishing a large-scale, distributed, botanical digital library and
research facility.  The technology component of CoLib's research agenda focuses
on collaborative hypermedia library systems and interactive multimedia
computing and communications systems.  We are using Asynchronous Transfer Mode
(ATM) network technology for our communications infrastructure.  We believe
that large-scale, collaborative hypermedia technology and broadband
communications will play a critical role in enabling scientific practice. <p>
As a starting point, CoLib is focusing on enabling the Flora of North America
Project (FNA), a major collaborative effort to gather and make available, in
digital and printed form, the most up-to-date information on the 20,000 species
of vascular plants and bryophytes of the continental United States and Canada.
CoLib will scale FNA into a comprehensive, worldwide, internetworked, botanical
digital library.  This will allow the librarianship that is intrinsic to the
practice of systematic botany to be extended to the broad spectrum of work that
is currently underway on digital libraries.  More important, however, we
approach CoLib as a means of opening many scientific disciplines through the
clarifying and unifying frame of biological diversity.<p>
The major participants in the CoLib Project include Washington University,
Texas A&amp;M University, and the Missouri Botanical Garden.  We call the
initiative "CoLib" because its focus is on expanding the range of
collaborations, participation, and activities that can occur when digital
libraries enable scientific practice.  Our hope is that the CoLib Project will
provide an opportunity for theoretical and empirical research that is firmly
grounded in real-world needs and will help bring the conduct of botanical and
library sciences into a new age where computer systems, high-speed networks,
and ubiquitous access to information will revolutionize existing concepts of
publication and research.<p>
This paper provides an overview of the CoLib Project.  We present our rationale
for CoLib, describe the project, and then discuss its implications for the
science of botany and for research on digital libraries in general. <b><p>
<p>
2.  Why Botany?</b><p>
The fate and economic prosperity of human populations are inextricably linked
to the natural world.  Plants, in particular, are critical to our ability to
maintain livable communities.  All life on Earth depends on plants, algae, and
certain bacteria to convert the energy of the sun into food and other forms of
usable chemical energy. Beyond their value as a biological resource, plants
contribute to the aesthetic quality of life, moderate climates,  and increase
our understanding of environmental processes. <p>
It is not surprising that information relating to plants is vital to a wide
range of scientific, educational, commercial, and governmental uses.
Unfortunately, much of this information exists in forms that are not easily
used.  From rare book manuscripts to scattered and eclectic databases and
physical specimens mounted on sheets of paper and preserved in herbaria
throughout the world, our record of botanical information is largely
inaccessible.  Except for traditional, manual practices, there exists no
comprehensive technology or organizational framework that allows this
information to be used effectively by research scientists or other potential
client communities.  <p>
There are, therefore, several important reasons why a large-scale,
internetworked, botanical digital library would be of value:<p>
*	<b><i>Botanical information is important</i></b>.  There are currently about
250,000 species<a href="#fn0">[1]</a> of plants on Earth.  At least 25,000
species probably await discovery.  While this represents only a small percentage
of the estimated 100 million existing species of organisms, the importance of
plants is unequaled [29].
 For example, about 100 species of plants are directly or indirectly the source
of virtually all the calories consumed by the human population.  Most
medicines, including 20 with the largest worldwide sales, contain ingredients
derived from plants.  Plants contribute to the integrity of global ecosystems
and are a critical component of America's economic base.  The direct and
indirect value of our botanical resource and its related information is
virtually impossible to estimate [15].<p>
*	<b><i>Botanical information is abundant and media-rich</i></b>.  It is
difficult to estimate the quantity of information required to describe living
species.  If all the diversity of the world were revealed and described
one-page-to-a-species and then published in bound thousand-page volumes, it
would take nearly six kilometers of shelving to house this information [29].
 This is the size of a medium-size public library. However, the complete record
of biological information is orders of magnitude greater than this and exists
in media types far more complex than paper.  Botanical information certainly
resides in traditional library holdings, such as books, monographs, and
journals.  It also exists in scores of institutional and individual databases
and in hundreds of laboratory and personal field journals scattered throughout
the country.  The use of spatial information, geographic information,
simulation, and visualization techniques is proliferating, along with an
increasing reliance on two- and three-dimensional images, full motion video,
and sound.  Depending on the area of research, information may exist in the
form of genetic maps of plant chromosomes or electron micrographs of
subcellular structures. <p>
Two important forms of botanical information are Floras<a href="#fn1">[2]</a>
and herbarium specimens<a href="#fn2">[3]</a>. A botanical digital library
built on next-generation communications and multimedia computing technologies
can bring coherence to this vast and diverse assemblage of information.<p>
*	<b><i>A large-scale, botanical digital library would be an important national
resource</i></b>.  Over the years, the United States has created laws and
policies to encourage responsible stewardship of botanical resources.
Regulatory programs, acquisition of public lands, private conservation efforts,
etc., have aimed at understanding how to sustainably support the plant-related
goods and services upon which we all depend.  Despite these endeavors, the
nation's botanical biodiversity is in decline, and our understanding of key
biological processes is incomplete.  Critical long-term environmental decisions
continue to be made based on inadequate information.  There is no effective
cross-institutional framework for identifying and conducting research of the
highest priority, coordinating among current and future research activities, or
making botanical information available in coherent and usable ways to the many
agencies and organizations that have responsibilities for protecting,
restoring, and managing our biological heritage [15].
 A national botanical digital library would provide an organized framework for
collaboration among federal, regional, state, and local organizations in the
public and private sectors; provide improved programmatic efficiencies and
economies of scale through better coordination of efforts; and provide an
extensive and common information base that could be used to anticipate and
lessen potential conflicts about biological resources.<p>
*	<b><i>A botanical digital library can focus ongoing independent research
efforts</i></b>.  Numerous national and regional efforts are underway to
improve the way researchers access collections data and communicate with one
another, and significant funding from private and public sources sustains the
ongoing collections management and informatics efforts of our nation's
free-standing museums of natural history and botanical gardens [1].
 Additionally, the National Research Council Committee on the Formation of the
National Biological Survey confirmed the need for a National Biological Survey
for resource management and future research.  It has also been recommended that
work on plants have high priority [15].
 A botanical digital library would provide a common focus for these independent
efforts and would build upon the substantial investments that have been and will be
made in components of such a library.  A botanical digital library would do
more than simply complement ongoing informatics efforts; it would significantly
leverage these activities by creating a more global and neutral context for
sharing information, thus broadening participation and creating opportunities
for greater "buy in" from new client communities.<p>
*	<b><i>A botanical digital library would serve many client groups</i></b>.  Of
the various types of scientific information, botanical information is of value
to perhaps the widest range of client groups. Botanists, ecologists, scientists
and researchers from other communities, forest managers, farmers, municipal and
regional planners, agriculture consultants and extension agents, weed and pest
controllers, recreation managers and planners, flood control engineers,
landscape architects, interior designers, plant breeders, seed companies,
animal feed companies, dermatologists, rare and endangered species agents,
poison control centers, publishers, artists, and teachers are among those who
would use information contained in a convenient, network-accessible botanical
digital library.  It is important to realize that most of these client groups
are currently isolated from botanical information either because the
information is not available in an electronic form, or because the lack of a
coherent and usable information infrastructure results in ineffective use of
existing digital information [1, 14, 15].<b><p>
<p>
4.  The CoLib Project</b><p>
We assume that botanical information in the future will be produced,
transmitted, and consumed primarily in electronic form [1,14].

This will represent a major paradigm shift for the science of botany and will
require substantial rethinking of how to design, implement, and use botanical
information systems at all levels.  In order to achieve effective solutions and
bring coherence to the field, we feel that it is necessary to view the
challenges of botanical informatics from a digital libraries perspective.  <p>
The CoLib Project consists of the following components:  (1) a large-scale,
geographically-distributed, collaborative research project that will benefit
from the application of digital library technologies, (2) a substantial
collection of botanical  information whose development, analysis, access, and
associations with other resources will be extended  by the effort, (3) specific
client communities and software environments that will utilize CoLib's
technology and information products now and in the future, (4) high-capacity
ATM networks as the primary communications infrastructure, (5) institutional
infrastructure to support the library, and (6) a long-term plan for management
of the library and coordination of research activities and commercial and other
development.   We now briefly describe these elements.<b><p>
<p>
4.1.  The Flora of North America Project</b><p>
As indicated above, the Flora of North America Project (FNA) is at the heart of
the CoLib Project.  FNA will gather and make available, in digital and printed
form, information on the 20,000 species of vascular plants and bryophytes of
the continental United States, Canada, and Greenland--the first overall account
of the plants of this vast area.  The FNA workforce currently numbers over 500
scientists, including professional plant taxonomists in North America and other
parts of the world and biologists in government agencies, such as the U.S. Forest
Service, Bureau of Land Management, U.S. Fish and Wildlife Service, and state
conservation and biological survey offices.  The FNA editorial committee
consists of 34 plant taxonomists distributed throughout the United States and
Canada.  Thirty institutions have committed major staff time and facilities to
the successful completion of the Flora.  The project began in 1985 and is
expected to be completed early in the next century [13,14].<p>
FNA researchers study plants by examining herbarium specimens and critically
evaluating published reports of previous work.  Sometimes it is necessary to
perform detailed biochemical, electron microscopic, or other studies of the
plants.  The assemblage of information resulting from the analysis of a plant
is referred to as a "treatment."  Every treatment includes original research as
well as a fresh consideration of any previous studies, and each treatment is
prepared by a taxonomic specialist.  Treatments fully integrate knowledge about
the plant from vast and diverse sources of information.  Many of the treatments
present for the first and only time knowledge resulting from a researcher's
lifetime of study.  The  treatments also incorporate results of recent research
that otherwise might not become available to nonspecialists for many decades.<p>
The Missouri Botanical Garden is the Organizational Center for the Flora of
North America Project.  At present, the following work is done at the
Organizational Center once a treatment is received:  information is
incorporated into the FNA database; the treatment is technically edited and
sent for bibliographic and nomenclatural editing; the final sequence of taxa is
checked, and information is added to tracking files for maps and illustrations;
synonymy is checked against lists of names accepted in major floras and
checklists; distribution is checked against maps; maps are scanned or redrawn;
illustrations, which have been prepared through close consultation with the
author, based on living material, photographs, and herbarium specimens, are
completed.  Manuscripts, maps, and illustrations are usually sent out for
review by regional collaborators.  In addition to regional reviewers, special
reviewers in the areas of weeds, agriculture, and horticulture are asked to
review all relevant treatments, and two editorial committee members act as
additional outside reviewers.  Twenty-six regional floristic specialists have
agreed to review treatments of taxa that occur in their areas. Overall
coordination of the effort is the responsibility of the convening editor (Dr.
Nancy Morin).<p>
Through a cooperative agreement with Oxford University Press, the Flora of
North America Project is being published as 14 printed volumes at the rate of
approximately one per year.  Plans are also underway for Oxford University
Press to produce CD-ROM versions of the Flora and to make the Flora available
on-line. The information produced by FNA will provide a unified framework for
basic and applied research dealing with North American plants and plant
products.  The project has also become the model and focus of debate on the
future of floristics and systematic botany.<b><p>
<p>
4.2.  Collaborating Institutions</b><p>
CoLib's primary institutional collaborators include:  <p>
*	<b><i>Missouri Botanical Garden</i></b>. The Missouri Botanical Garden,
founded by Henry Shaw and opened to the public in 1859, is the oldest botanical
garden in the United States.  The Garden's Research Division conducts the
world's most active program for collecting and studying vascular plants and
bryophytes from throughout the world, especially those of the New World and
African tropics.  Because of the size of its herbarium, the size of its
library's holdings, its large proportion of historical specimens, and its
formal and informal collaborative agreements with numerous other institutions,
the Garden is recognized as one of the most significant global repositories and
resources of botanical information.<p>
*	<b><i>Washington University</i></b>.  The Computer Science Department at
Washington University is recognized for its role in the intellectual and
practical development of Asynchronous Transfer Mode (ATM) network technology.
Project Zeus focuses on the practical deployment of ATM technology in
university and metropolitan settings. Its multipoint switching technology has
become the flagship product of one of the leading commercial vendors of ATM LAN
switches.  Washington University School of Medicine is the nation's fourth
largest institution for biomedical research and has been a leader in medical
education and education reform since the turn of the century.  The School of
Medicine Library is one of the most active medical libraries in the country and
is the center for research and deployment of advanced information systems and
communications technologies within the medical school. <p>
<p>
*	<b><i>Texas A&amp;M University</i></b>.  The Hypermedia Research Laboratory
at Texas A&amp;M University is known for its work on large-scale hypermedia
systems, hypermedia databases (hyperbases), and the application of these
technologies to collaborative work processes.  Its early work in sponsoring the
Dexter Workshops has led to one of the most influential hypermedia models in
the literature today. Texas A&amp;M is the home of the nation's largest
agricultural college and, in cooperation with the USDA and the W.M. Keck
Center for Genome Informatics, maintains the nation's collection of
scientific information relating to cotton and sorghum, two of the most
important agricultural plants.  Texas A&amp;M is also recognized for its work
in high-performance computing, scientific computation, and scientific
visualization.  Its Institute for Scientific Computation has pioneered the use
of wavelets for multiple-resolution visualization and compression.<p>
*	<b><i>Southwestern Bell Telephone Company</i></b>.  The laboratories
affiliated with Southwestern Bell have a long and distinguished record of
research in broadband communications and collaborative systems technologies.
Southwestern Bell has been a major corporate sponsor of large-scale ATM testbed
development in the St. Louis area.  The company  is committed to expanding the
availability of high-speed communications into new application domains and to
new client communities, such as those participating in the CoLib Project.<p>
<p>
The CoLib Project also includes an important array of secondary collaborators
who will become the first extended tier of clients to utilize the CoLib Library.
These are discussed in Section 4.4.<b><p>
<p>
4.3.  Information Resources</b><p>
The Missouri Botanical Garden is the primary source of information for the
botanical digital library during the initial phase of CoLib's development.  In
subsequent phases, this base will be expanded to include the resources
contained in the network-accessible repositories of virtually all the nation's
major botanical institutions.  The information associated with the Flora of
North America Project is a subset of the vast amount of information managed by
the Garden and its affiliated institutions.  The following are some of the
specific elements that will be included in the CoLib
Library.<u><p>
TROPICOS</u>.  TROPICOS, developed and managed by the Missouri Botanical
Garden, is the world's largest research- and education-oriented botanical
taxonomic database [1].
 As of September 1993, TROPICOS contained information on 600,000 plant names,
50,000 of which are needed for FNA.  It also contained 501,000 specimen records
and 57,120 bibliographic records.  Extensive authority files are used in the
system, including 25,443 records for persons and 18,183 records for periodicals
and books.  In collaboration with The New York Botanical Garden and INGRES, a
program is underway to significantly refine and expand the TROPICOS database.
<p>
TROPICOS supports a number of programs at the Garden, including herbarium
specimen label production, all moss-related literature references published in
the past 20 years, all scientific data associated with the Flora of North
America Project, the index to plant chromosome numbers, and data associated
with numerous worldwide projects, such as the Flora of Peru, Flora of
Madagascar, Flora Mesoamericana, and Flora of China Projects.<u><p>
Herbarium</u>.  The cornerstone of the research program at the Garden is its
herbarium.  The herbarium collection presently consists of nearly 4.5 million
mounted specimens, making it the fourth largest in the United States and among
the top fifteen of the world's 2600-plus herbaria.  The herbarium's accession
rate is about 200,000 specimens a year, the highest in the world.  Use of the
Garden's collection and its other facilities by researchers worldwide is high,
with hundreds of research scientists using the collection each year and nearly
160 mounted specimens per day being loaned from St. Louis.  The collection
contains many specimens of historical and nomenclatural importance, including
an estimated 150,000 type specimens<a href="#fn3">[4]</a>.   An important
component of CoLib is an imaging initiative that will result in the creation of
a digital archive of the type specimens and rare book manuscripts associated
with the Flora of North America Project.<u><p>
Libraries</u>.  The Missouri Botanical Garden houses one of the most
comprehensive collections of literature devoted to systematic botany and
floristics.  The general collection contains over 215,000 volumes, including
110,000 volumes of monographs and journals.  The library has seven special book
collections containing between five and six thousand rare books, including the
first printing of the first edition of Charles Darwin's <i>On the Origin of
Species</i>, John James Audubon's <i>Birds of America</i>,  and the Linnean
Collection from the 1700s that includes most of Carl Linnaeus'<a
href="#fn4">[5]</a> botanical works in original editions and subsequent
revisions.  Among its non-book collections are more than 220,000 archived items
including professional papers, historic manuscripts, photographs, oral
histories, and more than 40,000 microfiche images of plant specimens from other
herbaria.<p>
Creating a digital archive of many of these manuscripts is important, because
they contain the original descriptions of species corresponding to the physical
type specimens contained in the herbarium collection.  At least 25 percent of
all citations in the current literature of systematic botany refer to
publications that date from 1900 or earlier. Because these materials exist only
in non-digital form, researchers must often travel to St. Louis to view type
specimens together with their associated published information.<u><p>
Publications</u>.  The Garden is responsible for the coordination, editing, and
publication of articles and studies in numerous scientific journals and books.
These include scientific journals critical to the field, such as <i>Annals of
the Missouri Botanical Garden</i>, <i>Monographs in Systematic Botany</i>,
<i>Index to Plant Chromosome Numbers</i>, and other publications.  Typically,
over 200 manuscripts are being processed at one time by the publications
department.  <p>
The material that we have chosen for initial inclusion in the CoLib library
represents a rich cross section of informational forms. However, other
important collections will eventually be incorporated as well.<u><p>
Herbarium Consortium</u>.  Formal and informal relationships between the
Missouri Botanical Garden and the nation's largest herbaria will enable future
expansion of the CoLib Project to include the country's most important sources
of botanical information.  The ten institutions are the Academy of Natural
Sciences, the Bishop Museum, the California Academy of Sciences, the Field
Museum of Natural History, Harvard University, the New York Botanical Garden,
the Smithsonian Institution, the University of California at Berkeley, the
University of Michigan, and the Missouri Botanical Garden.  The aggregate
holdings of these herbaria are over 28 million specimens, including 84 percent
of the North American type specimens.  Their collections are the most
frequently used in the United States, and they maintain the most active
specimen loan program in the world [1].<p>
These herbaria are similar to the Missouri Botanical Garden in the kinds of
information that they manage.  Each has associated libraries, publications
departments, laboratories, etc.  The primary difference among them is the plant
groups or geographic areas upon which their collections and scientific research
focus.  Other differences are the extent to which their information exists in
digital form and the sophistication of their computing infrastructure.  However,
all are initiating large-scale, individual and cooperative efforts to bring
their information on-line, and arrangements are in place for the inclusion of
their information in CoLib.<u><p>
W. M. Keck Center for Genome Informatics</u>.  Under the auspices of the USDA,
the Keck Center maintains the national working collection of approximately
7,000 species for the genus <i>Gossypium</i> (cotton)--economically, one of the
nation's most important agricultural crops. Data and herbarium specimens are
maintained on the A&amp;M campus for each species.  The Keck Center is
transitioning this information into electronic, network-accessible form and is
expanding the information base to include image data and molecular genetic
data.   A similar effort is underway with sorghum, where the responsibility for
a national repository is split between Texas A&amp;M University and the
University of Georgia.  The sorghum collection includes approximately 20,000
accessions.<p>
The inclusion of these information repositories in a botanical digital library
creates two important interfaces.  Washington University School of Medicine and
the Keck Center have important ties to the Plant and Human Genome Initiative,
thereby creating a unique and potentially valuable interface between the
botanical and agricultural research communities and the biomedical research
community.  Perhaps more important to the broader issues relating to the
conduct of science, using digital libraries as the context for sharing
information between molecular biologists and botanical systematists can help
promote the convergence of these disciplines [30].<b><p>
<p>
4.4.  Client Groups</b><p>
The Flora of  North America Project is a gold mine of information for the
botanical research community.  As we have indicated, however, the CoLib Project
seeks to enable more than this single client group--it intends to use FNA to
catalyze the formation of a large-scale, distributed, botanical digital library
that will be useful to many other people and communities. In addition to the
primary collaborators, we will include among our clients one of the nation's
major science centers, two magnet schools, a major national educational
consortium, the National Biological Survey, and the pharmaceutical
industry.<u><p>
St. Louis Science Center</u>.  The St. Louis Science Center, with over 1.7
million visitors a year, is ranked third among science centers in the U.S. and
fourth in the world. The Center collaborates with the Missouri Botanical
Garden, the St. Louis Zoo, and the St. Louis public schools on a wide range of
educational programs [6,17].
 The Center is a recognized leader in the use of multimedia educational
materials in its education, community outreach programs, and museum exhibits.
<u><p>
Curriculum Development Collaborative</u>.  The Curriculum Development
Collaborative is a national effort involving the Ontario Institute for Studies
in Education, the University of California at Berkeley, Vanderbilt University,
and the St. Louis Science Center.  The goals of the project are to deploy
technology-based curricula in science-math magnet schools throughout the
country and to increase collaboration between students and active research
communities [6,17].
 The CoLib testbed will include two St. Louis-area math-science magnet schools
associated with this national initiative:  the Mullanphy Botanical Garden
Investigative Learning Center (an elementary school affiliated with the
Missouri Botanical Garden) and the St. Louis Science Center Middle School and
High School.<p>
Inclusion of these institutions and programs as part of CoLib enables a diverse
program of research on the use of digital libraries in education.  Since the
Missouri Botanical Garden has an active graduate program and most of the
botanists at the Garden also share academic appointments at Washington
University or the University of Missouri, the addition of these magnet schools
provides an educational interface that spans grades K-12, undergraduate and
graduate programs, and general community outreach. <u><p>
National Biological Survey</u>.  The U.S. Department of the Interior has formed
a new agency, the National Biological Survey (NBS).  The mission of this new
agency is to gather, analyze, and disseminate the information necessary for the
wise stewardship of our nation's natural resources and to foster an
understanding of our biological systems and the benefits they provide to
society.  The NBS will have responsibilities to inventory, map, and monitor
biotic resources.  It will also help coordinate basic and applied research on
species, populations, and ecosystems, and create a basis for sound
environmental decision-making.  The NBS will provide, for the first time, an
organized framework for collaboration among federal, regional, state, and local
organizations over an extensive and common base of biological information [15].
 The CoLib botanical digital library will significantly contribute to this
effort. <u><p>
Pharmaceutical Industry</u>.  The rosy periwinkle (<i>Catharanthus roseus</i>)
is a rare plant native only to Madagascar.  However, it produces two alkaloids,
vinblastine and vincristine, that cure most people of two of the deadliest of
cancers, Hodgkin's  disease and acute lymphocytic leukemia.  The income
produced from the manufacture and sale of these two substances alone exceeds
$180 million a year.  Most medicines are obtained directly from plants or are
synthetic versions of molecules first discovered in nature.  Yet fewer than
3 percent of the world's flowering plants have been examined for alkaloids
such as these anticancer agents [29].
 The Missouri Botanical Garden, in cooperation with the National Cancer
Institute and the Monsanto Company, collects plants around the world to be
screened for potential therapeutic value.  Biodiversity is the foundation of
biotechnology, and we expect that the pharmaceutical industry will extensively
utilize the information contained in the CoLib library to "prospect" for new
drugs. <b><p>
<p>
4.5.  Communications Infrastructure</b><p>
CoLib will expand Washington University's experimental ATM network to support
research on digital libraries.  Project Zeus is a long-term effort aimed at
introducing advanced Asynchronous Transfer Mode (ATM) network technology into
the university campus and metropolitan St. Louis area [2].
An ATM switch, developed in conjunction with Project Zeus, has been
licensed to SynOptics Communications and is now being sold commercially. Eight
switches are currently in place in St. Louis, serving about fifty users.<p>
Each of the six main CoLib sites (Missouri Botanical Garden, Washington
University Medical School Library, Washington University Computer Science
Department, Texas A&amp;M University Computer Science Department, Institute of
Bioscience and Technology, Institute for Scientific Computation) is deploying
SynOptics switches.  Each switch supports 16 links operating at 155 Mb/s.  The
switches are being used to connect researchers at each site to local servers
and to other sites in the same geographic location.  <p>
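As a rough, back-of-the-envelope illustration of these figures, the following
Python sketch computes the aggregate line rate of one switch and an upper bound
on the payload rate of one link.  The link count and link rate come from the
text; the 53-byte cell with 48 bytes of payload is the standard ATM cell
format; the 50-megabyte image size is a hypothetical value chosen only for
illustration, and physical-layer framing and higher-layer protocol overhead are
ignored.<p>

```python
# Figures taken from the text: 16 links per switch, 155 Mb/s per link.
LINKS_PER_SWITCH = 16
LINK_RATE_MBPS = 155.0

# Standard ATM cell format: 53 bytes per cell, 48 of which carry payload.
CELL_BYTES = 53
PAYLOAD_BYTES = 48


def aggregate_rate_mbps(links=LINKS_PER_SWITCH, rate=LINK_RATE_MBPS):
    """Total line rate across all links of one switch, in Mb/s."""
    return links * rate


def payload_rate_mbps(rate=LINK_RATE_MBPS):
    """Upper bound on one link's payload rate after cell-header overhead."""
    return rate * PAYLOAD_BYTES / CELL_BYTES


def transfer_seconds(size_megabytes, rate_mbps):
    """Seconds needed to move size_megabytes (10**6 bytes each) at rate_mbps."""
    return size_megabytes * 8 / rate_mbps


if __name__ == "__main__":
    image_megabytes = 50  # hypothetical specimen image size, not from the paper
    print("aggregate Mb/s per switch:", aggregate_rate_mbps())
    print("payload Mb/s per link:", round(payload_rate_mbps(), 1))
    print("seconds per image:",
          round(transfer_seconds(image_megabytes, payload_rate_mbps()), 2))
```

Under these assumptions, one switch terminates about 2.5 Gb/s of aggregate line
capacity, and a single link could in principle deliver the hypothetical image
in roughly three seconds.<p>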
While long-distance ATM services among collaborating sites are not yet readily
available, we anticipate that this situation will change dramatically in the
near future.  Initially, the project will rely on existing Internet connectivity
to support collaboration between the St. Louis and A&amp;M sites and between
Consortium institutions and FNA scientists.  CoLib will move to higher-speed
connectivity as inexpensive ATM services become available, either commercially
or through anticipated developments within NSFNET, Southwestern Bell, or other
telecommunications companies.<b><p>
<p>
4.6.  Research Program</b><p>
The advent of digital libraries poses a staggering array of research problems,
and our understanding of the technological and social complexities of digital
libraries will not progress in the absence of sound, well-designed theoretical
and practical studies.  Fundamental research is the overarching mission of the
CoLib Project.  Its research program is focusing on three major areas: (1)
collaborative hypermedia library systems, (2) interactive multimedia computing
and communications systems, and (3) librarianship in the botanical digital
library.  <p>
Each of the projects in these areas requires an interdisciplinary, team
approach and, in most cases, an interinstitutional collaboration.  The specific
details of the research projects will be presented in future publications;
here, we briefly describe the major issues that are being addressed: <p>
*	<b><i>Collaborative Hypermedia Library Systems</i></b>.  It is important to
challenge the limited conception of information-seeking that colors much of the
work currently being done on digital libraries.  Digital libraries permit a
departure from the archaic indexing and presentation schemes derived from
static, paper-based libraries, and we must avoid retaining the limitations
inherent in the representational medium of print on paper. Collaborative
hypermedia systems enable flexible and efficient mechanisms for locating,
organizing, and personalizing information.  They also permit multiple
conceptual mappings over an information space and allow multiple users engaged
in a common task to interact synchronously or asynchronously over shared
resources. This is clearly one of the most promising computer technologies for
use in digital libraries of the future [7-9, 18, 19, 23].
 <p>
In CoLib, we are extending collaborative hypermedia technology into the domain
of digital libraries through research that focuses on hypermedia-in-the-large.
We define hypermedia-in-the-large as open, extensible, large-scale systems that
support hypermedia-based collaboration across high-speed, wide-area networks [7,10].
 We are also examining the use of interface agents in large-scale,
hypermedia-based digital library systems [16].
 Finally, we are studying issues relating to the use of hypermedia digital
library systems in the computer-augmented environment where users interact with
information through whiteboards, laptop computers, tablets, and personal
communicators [21, 22, 28].<p>
<p>
*	<b><i>Interactive Multimedia Computing and Communications
Systems</i></b>.  Distributed multimedia digital library systems of the future
will require computing platforms that are closely coupled to high-speed
networks and capable of processing a potentially large number of continuous
high-bandwidth data streams.  They will be expected to handle huge
computational demands for everything from handwriting and speech recognition to
real-time image processing.  They will need to support massive amounts of
storage as applications make greater use of visual information.  These demands
require new ideas in system architecture and a rethinking of how operating
systems, computer hardware, and networks interact. <p>
In CoLib, we are examining issues relating to multimedia conferencing, focusing
on ways of optimizing the interplay between hardware and collaborative software
in order to enable <i>n</i>-way audio and video conferencing in the context of
a digital library.  We are also studying the design of flexible,
high-performance computing platforms that will be required by the types of
distributed multimedia applications that will be developed in CoLib.  Finally,
we are examining the feasibility of using scalable ATM network technology to
deliver multimedia digital library materials and to support the types of
interactivity that will occur over digital library information [2, 26, 27].<p>
*	<b><i>Librarianship in the Botanical Digital Library</i></b>. Historically,
research libraries provided a service to the research community on the basis of
internal workings that were essentially isolated from the communities that they
served.  Networked information services have begun to change this.  Today, the
library is often a subset of these services and is involved, to varying
degrees, in informatics research, computing services, and technology education.
As a result, libraries have become flatter, less formalized institutions that
are no longer the specialized purview of trained librarians.<p>
In CoLib, we hope to strengthen the integration of  libraries into the
organizational challenges of large-scale, internetworked information systems.
Research in this area focuses on combining technologies with work practices to
create new possibilities for librarianship.  The Flora of North America Project
is an example of editing- and authoring-in-the-large and, as such, allows CoLib
to explore new tools and roles for a librarianship that is better integrated
with community research, decision-making, and communication.  We are examining
the use of discipline-oriented structured documents and collaborative
publication in the setting of a botanical digital library.  We are also
studying ways to enhance image-based collections management through wavelet
compression techniques for multiple-resolution imaging of herbarium specimens,
cataloging and retrieval by image features, and 3D visualization [3-5, 24, 25].
 <p>
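The multiple-resolution idea can be sketched with the Haar wavelet, the
simplest wavelet transform.  This is an illustration only: the text does not
say which wavelet family or image format is used, and the function names below
are hypothetical.  Each step replaces an image by a half-resolution image of
2x2 block averages plus three bands of detail coefficients from which the finer
image can be rebuilt exactly; a compressor would additionally quantize or
discard small detail coefficients, a step that is omitted here.<p>

```python
def haar_level(img):
    """One 2D Haar step on a grayscale image with even height and width.

    img is a list of rows of numbers. Returns (approx, details), where approx
    holds the 2x2 block averages and details holds the three difference bands
    (horizontal, vertical, diagonal) needed to rebuild img exactly.
    """
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, len(img), 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]
            s, t = img[r + 1][c], img[r + 1][c + 1]
            a_row.append((p + q + s + t) / 4)
            h_row.append((p - q + s - t) / 4)
            v_row.append((p + q - s - t) / 4)
            d_row.append((p - q - s + t) / 4)
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, (horiz, vert, diag)


def haar_inverse(approx, details):
    """Rebuild the finer image from one level's averages and detail bands."""
    horiz, vert, diag = details
    img = []
    for r in range(len(approx)):
        top, bottom = [], []
        for c in range(len(approx[0])):
            a, h, v, d = approx[r][c], horiz[r][c], vert[r][c], diag[r][c]
            top += [a + h + v + d, a - h + v - d]
            bottom += [a + h - v - d, a - h - v + d]
        img += [top, bottom]
    return img


def pyramid(img, levels):
    """Return ([full, half, quarter, ...], [details per level])."""
    resolutions, details = [img], []
    for _ in range(levels):
        approx, det = haar_level(resolutions[-1])
        resolutions.append(approx)
        details.append(det)
    return resolutions, details
```

With such a pyramid, a viewer could fetch the coarsest level first and request
detail bands only for the region of a specimen that is being examined.<p>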
<p>
We embody the fusion of these fundamental technologies and processes in a
concept of the <i>Digital Library Machine</i>.  Digital Library Machines are
composed of hardware and software tools that enable the collaborative
workgroup practices of a community. Whether virtual or real, Digital Library
Machines support communities of scholars in the conduct of their day-to-day
intellectual activities, including collaborations with other researchers,
distribution of research results, publication in digital journals, access and
personalization of existing literature, and the education of future researchers
and students. <b><p>
<p>
5.  Discussion</b><p>
We believe that the CoLib Project has important implications both for botany
and for research on digital libraries. <b><p>
<p>
5.1.  Implications for Botany</b><p>
We believe that the emergence of digital libraries and advanced computing and
communications technologies will fundamentally change the practice of botany
and the way in which botanical information is used.  The delivery of high
resolution plant specimen images across gigabit networks will enable the
"virtual herbarium"--scientists and students anywhere in the world will be able
to retrieve images of plants, rotate them in 3D, and zoom in on key structural
elements.  "Distance microscopy" will enable physical specimens to be examined
closely even at remote sites.  Pen computers, global positioning systems,
remote sensors, and microminiature color CCD cameras will change the way
information is gathered in the field--high-resolution, 3D images will be
collected <i>in situ</i>  and immediately sent to the virtual herbarium for
processing along with associated collection data.<p>
Scientists will be able to forge links to related information and store these
links in private hyperbases.  They will be able to personalize the shared
information space with annotations and reuse it for new purposes.  If it is
necessary to collaborate with colleagues over this material, distributed
multimedia computing systems will be used for real-time interactions. In the
research labs, teams will gather in computer-augmented conference rooms, where
large, interactive displays will be used to plan their research, discuss
results, and share information. Computations, models, and simulations will be
as much a part of the information space as images and text.  New client
communities and new discourses of study will be enabled.  We predict that
an era of "telebotany" and "virtual science" will emerge based on the use of
digital libraries of botanical information [1, 11, 12, 14, 20, 21].
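<p>
The separation of personal links and annotations from shared library material
can be sketched as a small data structure.  This is a minimal illustration, not
the design of any CoLib system or hyperbase: the class names, fields, and
identifiers below are all hypothetical.<p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Anchor:
    """A location inside a shared library object (illustrative fields)."""
    object_id: str
    region: str


@dataclass(frozen=True)
class Link:
    """A directed association between two anchors, with an optional note."""
    source: Anchor
    target: Anchor
    note: str = ""


@dataclass
class Hyperbase:
    """One user's links, stored apart from the shared objects they refer to."""
    owner: str
    links: list = field(default_factory=list)

    def add_link(self, source, target, note=""):
        """Record a new link in this hyperbase and return it."""
        link = Link(source, target, note)
        self.links.append(link)
        return link

    def links_from(self, object_id):
        """All of this owner's links whose source lies in object_id."""
        return [link for link in self.links
                if link.source.object_id == object_id]


if __name__ == "__main__":
    # Hypothetical identifiers for a specimen image and a Flora treatment.
    specimen = Anchor("specimen-001", "leaf margin")
    treatment = Anchor("treatment-042", "description")
    private = Hyperbase(owner="botanist-a")
    private.add_link(specimen, treatment, note="compare with description")
    print(private.links_from("specimen-001"))
```

Because each hyperbase stores only references, two researchers can keep
different links and annotations over the same specimen image without altering
the shared object.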
<b><p>
<p>
5.2.  Implications for Digital Libraries</b><p>
There is an important but subtle reason why research in a botanical digital
library can have a significant and wide-ranging influence on other lines of
digital libraries research.  The key lies in understanding information as an
activity and systematic biology as a form of librarianship.<p>
Systematic botany attempts to group organisms according to their physical
characteristics in a way that reflects their evolutionary relationships. This
results in taxonomic categories that help to distinguish biological species.
Species, however, are not defined in terms of their intrinsic properties, but
<i>in relation</i> to other groups.  It is the goal of systematic biology to
infer these relationships through complex processes of categorization.<p>
This synthesis is subject to constant revision through debates and developments
in supporting biological disciplines.  Systematic botany thus pursues
categorization on a par with inquiry--categories are not a corollary of
information but, instead, are a mode of creation and argumentation, an activity
that provides the nomenclature and unifying context through which botanical
information and enigmas are communicated within and beyond the scientific
community.  In this way, botanical systematics performs a large-scale,
distributed, and collaborative scientific librarianship--it provides a
framework for the vast diversity of information vital to this scientific
discourse. <p>
Within many disciplines--linguistics, cognitive science, anthropology,
psychology, etc.--information is conceived as an activity.  It is not the
component elements stored on paper or on a networked fileserver; instead,
information is integrally constituted in the activities that involve these
elements. Libraries have traditionally provided an organizational frame in
which to use and develop the diversity of information.  Digital libraries of
the future will likewise be more than mere repositories for disembodied
ideas--they will be inseparable from the activities that model or stratify
information. As a discipline, systematic botany is in a unique position to help
our understanding of these issues.  What better circumstance to study digital
libraries than in the context of a science whose essential activity is the
progressive and natural categorization of diversity?<b><p>
<p>
6.  Conclusion</b><p>
This paper has provided a high-level overview of the CoLib Project.  In
summary, the following are among the major goals we hope to achieve:<p>
*	CoLib will increase our understanding of hypermedia-in-the-large, agency, and
ubiquitous computing in the context of a digital library;<p>
<p>
*	CoLib will  increase our understanding of the use of broadband communications
technology in the digital libraries setting;<p>
*	CoLib will promote the inter- and intradisciplinary development of botanical
information;<p>
<p>
*	CoLib will  make significant corpora of botanical information more readily
available to diverse communities that have been isolated until now from this
important information;<p>
<p>
*	CoLib will enable the transition from a specimen- and paper-based herbarium
to the virtual herbarium and telebotany and will become a central clearinghouse
for floristic taxonomic information for the world;<p>
*	CoLib will create a large-scale testbed for fundamental and applied research
on scientific digital libraries that is open to many research communities and
methodologies;<p>
*	CoLib will promote the fusion of systematic and molecular biology;<p>
*	CoLib will help botanists and the public come to understand what a botanical
digital library is, how it is used, and how it can influence the fundamental
conduct of science within the discipline.  It will yield an understanding of
digital librarianship.<p>
<p>
Research in the setting of a botanical digital library will allow the science
and librarianship that are intrinsic to the practice of systematic botany to be
extended to the broad spectrum of work underway on digital libraries. We
believe that CoLib will contribute to the basic understanding needed to build
digital library systems and organizational infrastructures to advance libraries
and botany into the twenty-first century.<b><p>
<p>
<p>
Acknowledgments</b><p>
Special thanks to Peter Raven, Chris McMahon, Debbie Kama, Jim Solomon, David
Brunner, Connie Wolf, Doug Stevens, Jerry Cox, Gil Jost, Mary Lamon, Dwight
Crandell, Christine Roman, Jim Myers, Mark Radle, Al Winterbauer, Jim
Carpenter, Mike McCarthy, Cindy Kunz, Myrna Harbison, Frank Almeida, Pat Gunn,
Paul Schoening, Martha Hill, and Jim Smith for their help in launching the
CoLib Project.<b><p>
<p>
<p>
References</b>

1.	Cooley, G.P., Harrington, M.B., and Lawrence, L.M. (Eds.).  1993. Analysis
and Recommendations for Scientific Computing and Collections Information
Management of Free-Standing Museums of Natural History and Botanical Gardens,
Vol. I, II. MITRE Corporation, McLean, VA.  (NSF-sponsored study.)<p>
<p>
2.	Cox, J., Jr., Gaddis, M., and Turner, J. 1993. Project Zeus: Design of a
broadband network and its application on a university campus. IEEE Network,
pp. 20-30. <p>
<p>
3.	Frisse, M.E. 1988. Searching for information in a hypertext medical
handbook.  Communications of the ACM, Vol. 31, No. 7, pp. 880-886.   <p>
<p>
4.	Frisse, M.E., Marrs, K., and Schoening, P.A. 1992. A method for publishing
genomic maps.  In Proceedings of the Sixteenth Annual Symposium for Computer
Applications in Medical Care (Washington, D.C.), pp.  376-382.<p>
<p>
5.	Furuta, R.K. and Stotts, P.D. 1989. Separating hypertext content from
structure in Trellis. In Proceedings of the Hypertext 2 Conference
(University of York, June 1989).<p>
<p>
6.	Lamon, M., Lee, E., and Scardamalia, M.  Cognitive technologies and peer
collaboration: The growth of reflection. In Design Experiments: School
Restructuring Through Technology. Collins, A. and Hawkins, J. (Eds.). Cambridge
University Press, New York.  (To appear.)<p>
<p>
7.	Leggett, J.J. and Schnase, J.L. 1994. Viewing Dexter with open eyes.
Communications of the ACM,  Vol. 37, No. 2,  pp. 76-86.<p>
<p>
8.	Leggett, J.J., Schnase, J.L., Fox, E.A., and Smith, J.B. (Eds.).  1993.
Proceedings of the NSF Workshop on Hyperbase Management Systems.  Department of
Computer Science Technical Report No. TAMU-HRL 93-002, Texas A&amp;M
University, College Station, TX.<p>
 <p>
9.	Lokken, S.T. and Leggett, J.J. 1993. Document representations in hypermedia
library systems.   (In prep.)<p>
<p>
10.	Malcolm, K.C., Poltrock, S.E., and Schuler, D.  1991. Industrial strength
hypermedia: Requirements for a large engineering enterprise. In Proceedings of
the Hypertext '91 Conference (San Antonio, TX, Dec.), pp. 13-24. <p>
<p>
11.	Milton, E.O., Ferris, H., Fortuner, R., and Diederich, J.R. (Eds.).  1990.
Artificial intelligence and modern computer methods for systematic studies in
biology (ARTISYST). University of California at Davis, Napa, CA.
(NSF-Sponsored Workshop)<p>
<p>
12.	Morain, S. 1993. Emerging technology for biological data collection and
analysis. Annals of the Missouri Botanical Garden,  Vol. 80, No. 2,  pp.
309-316. <p>
<p>
13.	Morin, N.R. 1991. Beyond the hardcopy: Databasing Flora of North America
information. In Proceedings of the International Congress for Systematic and
Evolutionary Biology IV (Portland, OR),  pp. 973-980. <p>
<p>
14.	Morin, N.R., Whetstone, R.D., and Tomlinson, K.L. (Eds.).  1989. Floristics
for the 21st Century. Monographs in Systematic Botany from the Missouri
Botanical Garden, Vol. 28. The Missouri Botanical Garden Press, St. Louis, MO.
<p>
15.	NRC. 1993. A Biological Survey for the Nation. National Academy Press,
Washington, DC. <p>
<p>
16.	S&aacute;nchez, J.A. 1993. HyperActive: Extending an Open Hypermedia
Architecture to Support Agency. M.S. Thesis. Department of Computer Science,
Texas A&amp;M University, College Station, TX.  <p>
<p>
17.	Scardamalia, M. and Bereiter, C. 1993.  Technologies for knowledge-building
discourse. Communications of the ACM,  Vol. 36, No. 5,  pp. 37-42. <p>
<p>
18.	Schatz, B.R.  1993. Building an electronic community system. In Readings in
Groupware and Computer-Supported Cooperative Work: Assisting Human-Human
Collaboration. Baecker, R.M. (Ed.). Morgan Kaufmann Publishers,  New York, pp.
550-560. <p>
<p>
19.	Schnase, J.L., Leggett, J.J., Hicks, D.L., N&uuml;rnberg, P.J., and
S&aacute;nchez, J.A. 1993.  Design and implementation of the HB1 hyperbase
management system.  Electronic Publishing - Origination, Dissemination and
Design, Vol. 6, No. 1, pp. 1-29.<p>
<p>
20.	Schnase, J.L., Grant, W.E., Maxwell, T.C., and Leggett, J.J.  1991. Time
and energy budgets of Cassin's Sparrow (Aimophila cassinii) during the breeding
season: Evaluation through modelling. Ecological Modelling, Vol. 55, No. 4, pp.
101-135. <p>
<p>
21.	Schnase, J.L. and Leggett, J.J.  1989. Computational hypertext in
biological modelling. In Proceedings of the Hypertext '89 Conference
(Pittsburgh, PA, Nov.),  pp. 181-198. <p>
<p>
22.	Schnase, J.L., Leggett, J.J., Hicks, D.L., N&uuml;rnberg, P. J., and
S&aacute;nchez, J.A.  1994. Open architectures for integrated, hypermedia-based
information systems. In Proceedings of the 27th Annual Hawaii International
Conference on System Science (HICSS '94) (Maui, HI, Jan.),  pp.  386-396. <p>
<p>
23.	Schnase, J.L., Leggett, J.J., Hicks, D.L., and Szabo, R.L. 1993.  Semantic
data modeling of hypermedia associations.  ACM Transactions on Information
Systems, Vol. 11, No. 1, pp. 27-50.<p>
<p>
24.	Stotts, P.D. and Furuta, R.K.  1989. Petri-net-based hypertext: Document
structure with browsing semantics. In ACM Transactions on Information Systems,
Vol. 7, No. 1, pp. 3-29. <p>
<p>
25.	Stotts, P.D., Furuta, R.K, and Ruiz, J.C. 1992. Hyperdocuments as automata:
Trace-based browsing property verification. In Proceedings of the European
Conference on Hypertext (ECHT '92), pp. 272-281. <p>
<p>
26.	Turner, J.S. 1992. Managing bandwidth in ATM networks with bursty traffic.
IEEE Network, Vol. 6.  <p>
<p>
27.	Turner, J.S. 1988. Design of a Broadcast Packet Network. IEEE Transactions
on Communications, Vol. 41.  <p>
<p>
28.	Weiser, M. 1993. Some computer science issues in ubiquitous computing.
Communications of the ACM,  Vol. 36, No. 7.  pp. 74-84. <p>
<p>
29.	Wilson, E.O. 1992. The Diversity of Life. Harvard University Press,
Cambridge, MA. <p>
<p>
30.	Wilson, E.O. and Raven, P.H. 1993.  A fifty-year plan for biodiversity
surveys.  Science, Vol. 258, pp. 1099-1100.  <b><p>
Authors' Addresses</b><i><p>
John L. Schnase, Edward S. Metcalfe, Edward L. Cunnius, Scott W. Hassan, and
Mark E. Frisse</i>: Advanced Technology Group, School of Medicine Library,
Washington University School of Medicine, 660 South Euclid Avenue (Campus Box
8132), St. Louis, Missouri, 63110. {schnase, metcalfe, edc, hassan, frisse}
@medicine.wustl.edu; <p>
<i><p>
John J. Leggett and Richard K. Furuta</i>:  Hypermedia Research
Laboratory, Department of Computer Science, Texas A&amp;M University, College
Station, Texas, 77843. {leggett, furuta}@bush.tamu.edu;  <i><p>
<p>
Nancy R. Morin</i>:  Missouri Botanical Garden, P.O. Box 299, St. Louis,
Missouri, 63166.  morin@mobot.org;  <i><p>
<p>
Jonathan S. Turner</i>:  Department of Computer Science, Washington University,
One Brookings Drive (Box 1045), St. Louis, Missouri, 63130.
turner@cs.wustl.edu; <i><p>
<p>
Leland Ellis</i>:  W.M. Keck Center for Genome Informatics, Institute of
Bioscience and Technology, Texas A&amp;M University, 2121 Holcombe Avenue,
Houston, Texas, 77030.  leland@straylight.tamu.edu; <i><p>
<p>
Michael S. Pilant and Richard E. Ewing</i>:  Institute for Scientific
Computation, College of Natural Sciences, Texas A&amp;M University, College
Station, Texas, 77843.  {mpilant, ewing}@isc.tamu.edu.<p>
<p>
<p>
<hr>
<a name="fn0">[1]</a>  A <i>species</i> is a category of biological
classification generally comprising organisms capable of interbreeding with one
another in natural conditions, but not with members of other species.  The
number of living species of all kinds of organisms currently known is
approximately 1.4 million.  This includes 250,000 plants, 750,000 insects,
280,000 other animals, and 132,500 species of protozoa, algae, fungi, viruses,
and bacteria.  (This definition for species best applies to sexually
reproducing organisms and is often referred to as the "biological-species"
concept.  Other definitions for species exist, e.g., genetic definitions,
ecological definitions, etc.  The concept of species is a complex and natural
frame for argumentation.)<p>
<a name="fn1">[2]</a>  <i>Flora</i> refers to the plants occurring within a
given region as well as to a publication describing those plants.  To
distinguish between the two, the word is generally capitalized when a
publication is meant.  A Flora may contain anything from a simple list of the
plants occurring in an area to a very detailed account of those plants.  Floras
almost always contain scientific and common names, literature references,
descriptions, habitats, geographical distribution, illustrations, flowering
times, and miscellaneous notes.  They also may include chemical information,
chromosome numbers, population occurrences, as well as identification devices,
such as "keys," that consist of mutually exclusive statements.<p>
<a name="fn2">[3]</a>  <i>Herbarium specimens</i> consist of pressed, dried
plant specimens that  have been mounted on  30x40 cm sheets of
archival-quality paper with a label that indicates date and place of
collection, collector, and associated information.  The sheets are given an
accession record and stored in sealed cabinets.  When properly preserved,
herbarium specimens retain indefinitely the features needed for accurate
identification.  Herbarium specimens are critical to documenting and studying
the world's flora.  The oldest portions of the Missouri Botanical Garden's
collection date from the mid-1700s, and include specimens collected by Carl
Linnaeus, Charles Darwin, and George Engelmann, one of the foremost botanists
of the 19th century.<p>
<a name="fn3">[4]</a>  A <i>type specimen</i> is a specimen that is cited in
the original publication to be the basis of a new species name.  It therefore
determines the correct application of that name and becomes a critically
important element of scientific information.  Types are consulted by
taxonomists to resolve discrepancies between descriptions of species and to
prevent the publication of further discrepancies.  Since types are the basis of
plant names, they must be consulted, along with original descriptions, as part
of any sound monographic or floristic research.<p>
<a name="fn4">[5]</a>  In 1753, Carl Linnaeus published <i>Species
Plantarum</i>, which established the binomial system of naming plants
that is followed today.  The most frequently used categories are family, genus,
and species.  Naming is governed by a set of rules established by an
international committee.  Each new species must be described in Latin and
published in a recognized scientific journal.

<!--#include virtual="/DL94/footer.ihtml" -->
</body></html>