<html>
<head>
<title>
DL94: Navigating and Searching in Hierarchical Digital Library Catalogs
</title>
</head>

<body>

<!--#include virtual="/DL94/header.ihtml" -->

<h1>Navigating and Searching in Hierarchical Digital Library Catalogs </h1>
<p>
Robert B. Allen 

<i>Bellcore, MRE 2A367, 445 South Street,
Morristown, NJ, rba@bellcore.com </i><p>

<b>Abstract</b><p>

Two interfaces are described for navigating large collections of document and
book records. An Online Public Access Catalog interface uses a classification
hierarchy to facilitate browsing and searching.  The system has been
implemented and currently runs with over 50,000 book records.  Interface
widgets allow the hierarchy to be displayed and traversed easily.  For example,
the Book Shelf dynamically updates itself to reflect searches and attribute
selections.  A second interface, not yet fully implemented, allows access to
the <i>ACM Computing Reviews</i> classification.<p>
<b><p>
Keywords</b>: Classification, hierarchies, hypertext, interface, OPAC,
retrieval, search.<b><p>
<p>
<p>
1. Navigation and Searching</b><p>
Hypertext systems allow a user to browse a highly structured network of links
and nodes.  Information Retrieval (IR) systems usually return ranked lists of
document records according to how well they match a query as determined by a
retrieval algorithm. Both approaches have proven effective, but many of the
issues in combining them remain to be explored.  One domain in which this
integration is needed is managing collections of document and book records.<p>
Hierarchies are the primary organizing principle for many book classification
systems. A hierarchical Online Public Access Catalog (OPAC) structure provides
an a priori similarity space  for locating related books.  For instance, a user
may  search to find a shelf area relevant to a query and then check the
surrounding books for other relevant items.  Organizing books by an a priori
similarity may be seen as a weak alternative to the variety of ad hoc
organizations made possible  by electronic searching.  However, a consistent
structure reflecting a commonly agreed upon organization of knowledge should
help orient the user. It could orient a casual user browsing the book
collection and it could be used to organize search results. Thus, the essential
question of this research is whether an a priori structure has advantages in
conjunction with derived similarity spaces for navigation and retrieval. <p>
While OPACs are widely used,  many of these interfaces are designed for ASCII
terminals and do not have advantages such as direct manipulation associated
with GUIs.  Some prototype systems introduce creative interfaces, but they may
not scale well [4, 12]. Other OPACs provide extensive term searching but do not
take advantage of the hierarchical organization [7]. Cataloging systems [e.g.,
11] also provide access to the hierarchical classifications. However, these
generally have only simple graphical interfaces and are not documented in the
literature.<p>
Interfaces for electronic books have now been widely studied, but relatively
little attention has been paid to the management of <i>collections  </i>of
books in these systems. The <A HREF="http://superbook.bellcore.com/SB/SBhome-page.html">SuperBook</A>[TM] browser [5, 6]
takes advantage of the hierarchical structure of documents. However, the
<A HREF="http://superbook.bellcore.com/SB/SBhome-page.html">SuperBook</A> browser itself is not effective for navigating the hierarchical
structure in an OPAC; it does not allow fielded search and it is not designed
for presenting short records.<p>
Section 2 describes an OPAC interface that exploits hierarchical organization.
Section 3 describes an interface for the  <i>ACM Computing Reviews
</i>categories. <p>
<b><p>
2. HOPAC Interface</b><p>
Figure 1 shows an interface that allows interaction with the Dewey hierarchy by
fielded search on the records and presentation of records on a dynamic shelf.
The interface is composed of three main groups of widgets that are described in
Section 2.2. The interface has been implemented in the X Window System with Motif
widgets.<b><p>
<p>
2.1.  Book Records and Classification Hierarchy</b><p>
The Dewey Classification System is probably the most widely used international
classification system.  It is also the purest hierarchy of the major library
classification systems. While part of the Library of Congress MARC system is
hierarchical, the "Cutter"  number extensions are  orthogonal to the main
classifications.  The Dewey Classification  System was designed  for cataloging
books [11], but it has also  been suggested as the basis for an interface for
access by the casual user [10]. With the introduction of high-powered personal
workstations and flexible GUI interfaces, achieving this goal for the casual
user is now much more practical.<p>
<img src="figures/allen1.gif"><p>
<i>Figure 1:  Hierarchical OPAC Interface</i><p>
The headings for a large part of the Dewey Decimal System were  obtained and
merged with the book records. While the Dewey hierarchy, like any
classification system, is not suitable for all tasks, it is useful for a large
range of tasks and is familiar to many users. In preparing the corpus, long
call numbers were truncated to 4 decimal places. In a few cases, the hierarchy
was not complete and filler headings were inserted. For instance, in the
Classification immediately below the first-level node <b>000.0 Generalities</b>
is the third-level node <b>001.0 Knowledge.</b>   A second-level heading
<b>000.0 General </b>was created to match other second-level headings under
<b>000.0 Generalities </b>such as  <b>010.0 Bibliography</b>.<p>
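The corpus preparation described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the system: the function names are hypothetical, and it assumes the usual Dewey convention that the first digit of a call number names the first-level node, the first two digits the second-level node, and all three digits the third-level node.<p>

```python
def truncate_call_number(call_number: str, places: int = 4) -> str:
    """Cut a Dewey call number down to at most `places` decimal digits."""
    whole, _, fraction = call_number.partition(".")
    return f"{whole}.{fraction[:places]}" if fraction else whole


def ancestor_path(call_number: str) -> list[str]:
    """Return the first-, second-, and third-level nodes above a call number.

    Assumption: levels follow the digits of the three-digit Dewey prefix.
    """
    whole = call_number.partition(".")[0].zfill(3)
    return [
        whole[0] + "00.0",   # first level, e.g. 000.0 Generalities
        whole[:2] + "0.0",   # second level, e.g. the filler 000.0 General
        whole + ".0",        # third level, e.g. 001.0 Knowledge
    ]


if __name__ == "__main__":
    print(truncate_call_number("621.38195"))
    print(ancestor_path("001.42"))
```

In this sketch a book under 001 always receives a second-level entry, which is the slot that a filler heading such as <b>000.0 General</b> occupies.<p>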
	Book and document records numbered by the Dewey Decimal Classification System
were obtained from the Bellcore Technical Libraries. They covered approximately
50,000 books and technical reports. Each record included the shelf number,
author, title, publisher, location, a subject field, and a list of the library
locations where the book was held.<p>
<b><p>
2.2.  Interface Widgets<p>
<p>
2.2.1.  Subject Hierarchy and Current Node Lists</b><p>
The upper left corner of Figures 1 and 2  shows the Subject Hierarchy and
Current Node lists.  These allow a user to navigate through the hierarchy and
serve a function similar  to the expandable Table of Contents (TOC) of the
<A HREF="http://superbook.bellcore.com/SB/SBhome-page.html">SuperBook</A> browser. In a deep and wide hierarchy, such as the Dewey
Classification System, the contents of the  expanding TOC would frequently
scroll out of view. Although less information is presented in separate Subject
Hierarchy and Current Node lists than in an expanding TOC, these lists yield a
more predictable display  and are especially suitable for the Dewey
Classification records where the shelf number provides an additional  pointer
into the hierarchy.  Moreover, hierarchies have looser semantic connections
between nodes at the same level than the tables of contents of documents and
books.
<p>

<p>
<p>
<img src="figures/allen2.gif"><p>
<i>Figure 2:  Interface after Search for "Shannon"</i><p>
<p>
The Current Node list displays items which allow the user to navigate deeper
into the hierarchy.  Initially the current nodes are the top-level
classification terms (as shown in Figure 1). When nodes lower in the hierarchy
exist, the higher-level nodes are marked with an "=". The Subject Hierarchy
list displays the hierarchy nodes above the  books currently being displayed on
the Book Shelf.  Clicking on one of the higher-level nodes causes the immediate
descendants of the selected node to be displayed in the Current Node list. In
addition, the Shelf displays books at the selected node. Figure 2 shows the
Current Node list with three choices.  A search on the author name "Shannon"
has just been completed and eight books were returned.  The Shelf displays only
those eight books and their immediate parent nodes in the default Hits Only
display mode.  The Subject Hierarchy has opened to the node that contains  the
first matched book.<p>
	Counts of search matches are posted beside the node labels and they can help
the user locate relevant items. For instance, in Figure 2 the user can see that
only 2 of the 8 books matching this search are under the heading of <b>000.0
Generalities. </b>This suggests it might be worthwhile to examine those books
under other parts of the hierarchy.<p>
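One way such per-node counts could be computed is to credit every matched book to each node above it. The sketch below is only an illustration under assumptions: the helper names are hypothetical, nodes are keyed by (level, code) so that a filler heading sharing a code with its parent stays distinct, and the shelf numbers are invented test data rather than the actual "Shannon" results.<p>

```python
from collections import Counter


def ancestors(shelf_number: str) -> list[tuple[int, str]]:
    """Hypothetical helper: (level, code) nodes above a Dewey shelf number."""
    whole = shelf_number.partition(".")[0].zfill(3)
    return [(1, whole[0] + "00"), (2, whole[:2] + "0"), (3, whole)]


def hit_counts(matched_shelf_numbers: list[str]) -> Counter:
    """Count, for every node, the matched books at or below that node."""
    counts = Counter()
    for shelf_number in matched_shelf_numbers:
        for node in ancestors(shelf_number):
            counts[node] += 1
    return counts


if __name__ == "__main__":
    # Invented shelf numbers for eight matched books
    hits = ["001.5", "003.5", "510.0", "519.2",
            "621.38", "621.38", "621.39", "621.3"]
    for node, count in sorted(hit_counts(hits).items()):
        print(node, count)
```

With this invented data, two of the eight hits are counted under the first-level node 000, and the remaining six are spread over other parts of the hierarchy.<p>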
<b><p>
2.2.2.  Book Shelf and Book Display Widgets</b><p>
The Book Shelf (right side in Figures 1 and 2) does not attempt to mimic a
physical book shelf. Rather, it is a very long list of records. The user
typically has only a partial view of the list.  The view of the Shelf is
limited by the number of items that can be displayed on the screen at any time
and by options that determine which records and which attributes of those
records are to be displayed. The selection of displayed attributes is
determined  in response to iterative queries that control a filter mask. Thus,
the Book Shelf is "dynamic" in the same sense as the dynamic graphical query
interface described in [15] and  used in  data viewers [e.g., 13]. Nodes in the
classification system immediately  above the selected books are also presented
on the Shelf. The default display for records on the Shelf shows titles. The
user can select other record attributes to be presented on the Shelf such as
the author name, the length (number of paper pages),  and the publisher. In the
current implementation, the Book Shelf widget list contains a very  large
number of records and it is slow to reinitialize.
<p>

<p>
<img src="figures/allen3.gif"><p>
<i>Figure 3:  Interface for Browsing Computing Literature by Computing Reviews Categories </i> <p>
<p>
When the user  clicks on a book title, a Book Display widget is  opened showing
the full record for that book. One Book Display option allows the user to
request Similar Books. This searches for books similar to the displayed book
where similarity is determined by one of the retrieval algorithms, rather than
by shelf proximity. This option spawns a new search which, when it follows an
initial search, is a type of relevance feedback. Because the book records
are short, the Similar Book requests yield some spurious matches. As with the
initial searches, posting similar-book hits against the Subject Hierarchy
allows the user to follow the classification semantics to identify relevant
items. The Book Display contains further options including one for presenting
other books by the same author.  This links books across leaf nodes of the
hierarchy.  It has not been fully implemented because many of the connections
would have to be made by hand. <b><p>
<p>
2.2.3. Fielded Search and Attribute Selection Widget</b><p>
The Fielded Search widget (lower left in Figure 1) generates searches on book
record fields  such as title, author, and subject descriptors.  Two search
algorithms are available. One uses a Boolean OR of matched terms.  The second
is based on term matches between the query and the document terms weighted by
term frequencies. <p>
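The two styles of matching can be contrasted with a small sketch. The exact weighting used by the system is not specified here, so the second function simply assumes that a record's score is the summed within-record frequency of the query terms; the function names and the sample records are invented for illustration.<p>

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def boolean_or(query: str, records: dict[str, str]) -> list[str]:
    """Return ids of records sharing at least one term with the query."""
    terms = set(tokenize(query))
    return [rid for rid, text in records.items() if terms & set(tokenize(text))]


def weighted_match(query: str, records: dict[str, str]) -> list[tuple[str, int]]:
    """Rank records by the summed frequency of the query terms they contain.

    Assumption: within-record term counts stand in for the weighting.
    """
    terms = set(tokenize(query))
    scored = []
    for rid, text in records.items():
        freq = Counter(tokenize(text))
        score = sum(freq[t] for t in terms)
        if score:
            scored.append((rid, score))
    # Highest score first; ties broken by record id for a stable order
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    records = {
        "b1": "information theory and coding",
        "b2": "coding theory coding practice",
        "b3": "library catalogs",
    }
    print(boolean_or("coding theory", records))
    print(weighted_match("coding theory", records))
```

The Boolean OR variant only separates matches from non-matches, while the weighted variant yields graded scores that can be used to order or threshold the results.<p>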
Attributes that may be used to select subsets of books, such as the library
location, whether the document has been checked out, and the type of document,
are controlled by menus. By selecting various library locations it is possible
to examine the virtual Shelf for any one location or any combination of
locations of the Bellcore Technical Libraries.<p>
<b><p>
2.3. Examining the Book Shelf after a Search </b><p>
Following a search the user can step forward and backward to the next matched
book  with the Up_Book and Down_Book buttons. These buttons provide a
convenient way to move quickly through the hierarchy while allowing the user to
keep a sense of the location within the hierarchy. The Up_Node and Down_Node
buttons allow the user to move even more quickly by jumping from one node which
contains hits to the next.<p>
	The hierarchical interface is most effective for comparing documents of
relatively similar retrieval values because it does not display information
about the quality of the matches.  That is, unlike typical IR systems that
present ranked similarity, the interface based on hierarchical structure does
not readily show graded retrieval scores. Thus, a <i>titration</i> procedure
was developed to select a reasonable number of titles to be displayed. In the
current implementation, the system attempts to find a threshold  to display
more than 5 but fewer than 100 books.<p>
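A titration step of this kind might look like the following sketch. It is an assumption-laden illustration rather than the system's procedure: the function name, the return convention, and the strategy of relaxing the threshold one distinct score at a time are all hypothetical.<p>

```python
def titrate(scores: list[float], low: int = 5, high: int = 100):
    """Pick the strictest threshold that displays more than `low` books.

    Returns (threshold, shown), or None when no threshold keeps the number
    of displayed books strictly between `low` and `high`.
    """
    # Relax the threshold step by step, starting from the best score
    for threshold in sorted(set(scores), reverse=True):
        shown = sum(1 for s in scores if s >= threshold)
        if shown >= high:
            return None  # relaxing further can only add more books
        if shown > low:
            return threshold, shown
    return None  # too few books matched at any threshold


if __name__ == "__main__":
    print(titrate([9, 9, 8, 7, 7, 7, 5, 5, 3, 1]))
```

When the sketch returns None, a caller would have to fall back on some other policy, such as showing every match.<p>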
The Previous_Book_In_Order and Next_Book_In_Order buttons let the user examine
books in the ranked order  in which they matched the query.  It is easy for the
user to lose orientation because the books are not necessarily in order and the
user viewing them  jumps around the hierarchy. Furthermore, if the user
requests Next_Book_In_Order after all of the books in the initial (titration)
set have been viewed, the set expands by relaxing the threshold. The user is
notified of this change in the  display on the Feedback Window (lower left in
Figure 1), but the hit counts are also updated and this may confuse the user.<p>
<b><p>
2.4.  Additional  Features</b><p>
Several  additional widgets are under development.  Graphics can often help
orient users with large amounts of data [9].  For the hierarchical OPAC, an
active dendrogram is being developed like the one in [3]. The graphical view
can be used in many ways, such as displaying search hits. Another feature that
is being developed is a personalized shelf on which the user can create
relevant collections.<b><p>
<p>
3.   <i>Computing Reviews</i>  Classifications </b><p>
The computer science literature is organized by the <i>ACM Computing Reviews
</i>(CR) classification system [1]. Unlike the Dewey Classification, documents
in the CR system may appear in several different parts of the hierarchy. There
are several relatively orthogonal dimensions in the CR classification system.
In that respect, it is like a facetted classification system [14].<p>
	Figure 3 shows a partially operational interface for browsing the computer
science literature by means of the Computing Reviews classification. Major
categories are chosen from the  Facets widget at the upper left.  These
selections open cascaded menus which display lower-level categories.  When the
"+" to the right of the facet label  is selected, the facet is added to the
Current Constraint list (lower left). In order to give context to the selected
constraints, their parents are displayed in parentheses on the Constraint List.
The constraints are ANDed together to determine which documents are displayed
in the Shelf. This is analogous to the Hits Only mode of the OPAC interface. Of
course, the constraints propagate to all their descendants. Constraints can be
dropped from the Constraint List by clicking on the "-". <p>
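The constraint logic can be sketched as follows, assuming that CR categories are written as dotted codes (for example H.3.3 below H.3 below H) and that each document carries a set of such codes. The function names and the sample documents are hypothetical.<p>

```python
def satisfies(doc_categories: set[str], constraint: str) -> bool:
    """True if any category of the document is the constraint or lies below it."""
    return any(
        cat == constraint or cat.startswith(constraint + ".")
        for cat in doc_categories
    )


def shelf(documents: dict[str, set[str]], constraints: list[str]) -> list[str]:
    """Return ids of documents that satisfy every constraint (logical AND)."""
    return [
        doc_id
        for doc_id, cats in documents.items()
        if all(satisfies(cats, c) for c in constraints)
    ]


if __name__ == "__main__":
    # Invented documents, each assigned to one or more CR categories
    documents = {
        "d1": {"H.3.3", "I.2.7"},
        "d2": {"H.3.5"},
        "d3": {"H.5.2", "I.2"},
    }
    print(shelf(documents, ["H.3"]))
    print(shelf(documents, ["H.3", "I.2"]))
```

Because a document may sit in several parts of the hierarchy, each constraint is tested against all of its categories, and an empty Constraint List leaves every document on the Shelf.<p>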
A second way to employ the CR Classification would be to search for an article
of interest and then find other articles that have similar classifications.
This is a type of <i>lateral link </i>across the hierarchy.  For instance,
among Doctoral Dissertations that were cited in <i>Computing Archive </i>[2] as
having been published in 1992, the most frequent associate of category H.3.3
(Information Storage and Retrieval) was H.3.5 (On-line Information Systems).
Thus, users who access articles under H.3.3 might be informed that articles
likely to be  related to their interest may be found  under H.3.5. <p>
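A lateral link of this kind could be derived by counting which categories co-occur with a given category across documents. The sketch below assumes each document is represented by its set of CR codes; the function name is hypothetical and the sample data is invented, not taken from <i>Computing Archive</i>.<p>

```python
from collections import Counter


def most_frequent_associate(documents: list[set[str]], category: str):
    """Return the category that most often co-occurs with `category`, or None."""
    co_counts = Counter()
    for cats in documents:
        if category in cats:
            co_counts.update(cats - {category})
    if not co_counts:
        return None
    # Break ties alphabetically so the result is deterministic
    return min(co_counts, key=lambda c: (-co_counts[c], c))


if __name__ == "__main__":
    # Invented category assignments for four documents
    documents = [
        {"H.3.3", "H.3.5"},
        {"H.3.3", "H.3.5", "I.2.7"},
        {"H.3.3"},
        {"I.2.7", "H.3.5"},
    ]
    print(most_frequent_associate(documents, "H.3.3"))
```

The returned category is the one a user browsing the given category might be pointed to as a related area.<p>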
<b><p>
4. Discussion</b><p>
Interfaces have been developed for accessing collections of book and document
archives.  Although no formal user testing has been undertaken, informal tests
suggest that the interfaces are intuitive. The greatest problem appears to be
complex interactions among features. For instance, with Hits Only mode there
are often too few selections to fill the Shelf Display; thus, the Up_Book and
Down_Book buttons have no effect.  In addition, some test users have suggested
that the elision in the Hits Only mode should apply to the TOC as well as the Book
Shelf.<p>
These interfaces could provide the basis for access to additional electronic
information sources. Clearly, it would be possible to have the short document
records point to the full text of the books and documents. Moreover,
encyclopedia articles describing authors could easily be presented. Likewise,
book reviews, citation statistics, circulation data, and user annotations could
be included as part of the Book Display. Conversely, an electronic encyclopedia
could access the OPAC for bibliographies.<p>
Overall, these interfaces attempt to demonstrate that the structure of a
classification system can be a useful aid for searching and navigating a
digital library catalog.  Techniques such as  titration  and lateral linking
show how IR and Hypertext approaches can be combined.  It is also worth noting
that similar approaches could be applied to a search-based OPAC [e.g., 7] and
display similar books  for items  that match  a query.  In any event, while the
Dewey  Classification System provides links to other, presumably related,
documents, there are many other dimensions of similarity  among collections of
books and documents (e.g., author, citations, publisher) that  could be used
for linking as well. It remains to be seen whether all of these dimensions can
be  coordinated  into usable interfaces.<b><p>
<p>
<p>
Acknowledgments</b><p>
 The Dewey Decimal Classification was used with the permission of the Online
Computer Library Center (OCLC). The collection of book records used here was
developed for test purposes and is not a Bellcore product. <b><p>
<p>
<p>
References</b><p>
[1]	ACM, ACM Computing Reviews Classification System. <i>ACM Computing Reviews
35 </i>(1994) 4-44.<p>
<p>
[2]	ACM,  <i>ACM Computing Archive, </i>1994, New York.<p>
<p>
[3]	Allen, R.B., Obry, P., and Littman, M., An Interface for Navigating
Clustered Document Sets Returned by Queries. <i>Proceedings of SIGOIS
</i>(Milpitas, CA, June) ACM, New York, 1993, 203-208.<p>
<p>
[4]	Borgman, C.L., Walter, V.A., Rosenberg, J.B., and Gallagher, A.L.,
Children's Use of a Direct Manipulation Library Catalog. <i>ACM SIGCHI Bulletin
</i>23(1991) 69-70.<p>
[5]	Egan, D., Lesk, M.E., Ketchum, D., Lochbaum, C.C., Remde, J.R., and
Landauer, T.K., Hypertext for the Electronic Library?  CORE Sample Results.
<i>Hypertext '89 </i>(Pittsburgh, Nov.) ACM, New York, 1989, 299-312.<p>
<p>
[6]	Egan, D., Remde, J.R., Gomez, L.M., Landauer, T.K., Eberhardt, J., and
Lochbaum, C.C., Formative Design and Evaluation of <A HREF="http://superbook.bellcore.com/SB/SBhome-page.html">SuperBook</A>. <i>ACM
Transactions on Information Systems 7 </i>(1989) 30-57.<p>
<p>
[7]	Fox, E.A., France, R.K., Sahle, E., Daoud, A., and Cline, B.E., Development
of a Modern OPAC: From REVTOLC to MARIAN. <i>Proceedings of SIGIR</i>
(Pittsburgh, June) ACM, New York, 1993, 248-259.<p>
<p>
[8]	Frisse, M.E., Cousins, S.B., and Hassan, S., WALT:  A Research Environment
for Medical Hypertext. <i>Hypertext '92 </i>(San Antonio, Nov.) ACM, New York,
1992, 389-394.<p>
<p>
[9]	Lesk, M.E., What To Do When There's Too Much Information?  <i>Hypertext '89
</i>(Pittsburgh, Nov.) ACM, New York, 1989, 305-318.<p>
<p>
[10]	Markey, K. and Demeyer, A.N. <i>Dewey Decimal Classification Online
Project: Evaluation of Library Schedule and Index Integrated into the Subject
Searching Capabilities of an Online Catalog,</i> OCLC, Dublin OH, 1986,
OPR/RR-86-1.<p>
[11]	OCLC (Forest Press), <i>Electronic Dewey</i>.  Dublin OH, 1993.<p>
<p>
[12]	Pejtersen, A.M., A Library System for Information Retrieval Based on a
Cognitive Task Analysis and Supported by an Icon-Based Interface.
<i>Proceedings of SIGIR </i> (Cambridge, MA, June) ACM, New York, 1989,
40-47.<p>
<p>
[13]	Swayne, D.F.,  Cook, D., and Buja, A., Interactive Dynamic Graphics in the
Xwindow System with a Link to S. <i>Proceedings of the Section on Statistical
Graphics of the American Statistical Association </i> (Atlanta) ASA , 1991,
1-8.<p>
<p>
[14]	Vickery, B.C.,  <i>Facetted Classification</i>. New Brunswick, Rutgers
University Press, 1965.<p>
<p>
[15]	Williamson, C. and Shneiderman, B., The Dynamic HomeFinder: Evaluating
Dynamic Queries in a Real-Estate Information Exploration System. <i>Proceedings
of SIGIR </i>(Copenhagen, June) ACM, New York, 1992, 338-346.<p>
<p>
<p>
<p>
<p>
<!--#include virtual="/DL94/footer.ihtml" -->
</body></html>
