Querying, Navigating and Visualizing a Digital Library Catalog
Aravindan Veerasamy, Shamkant Navathe
College of Computing
801, Atlantic Drive
Georgia Institute of Technology
Atlanta, Georgia 30332-0280, USA.
Phone: 1-404-894-8791
E-mail: {veerasam, sham}@cc.gatech.edu
ABSTRACT
We describe the design of a user interface for a ranked-output
information retrieval system that integrates querying, navigation and
visualization in a seamless fashion.
Highlights of the system include the following:
- Using a visualization scheme, the interface provides visual
feedback to the user about how the query words influence the ranking
of retrieved documents.
- Through simple drag-and-drop operations on objects on the screen,
the interface helps a naive end user construct complex
structured queries and provide relevance feedback.
- To suit the evolving information needs of the user, the
interface supports navigational features such as browsing
documents by specific authors and browsing the Table of Contents of
publications.
- The interface integrates an online thesaurus which provides
words related to the query that can be used by the user to expand
the original query.
By providing a rich set of features, the interface coherently supports
a wide spectrum of information gathering tactics for different classes
of users.
KEYWORDS: Visualization of results, visual query languages,
query processing, information retrieval
WALK-THROUGH OF A TYPICAL USER SESSION
A typical user session along with the response of
the interface for every user action is described below using an
example (refer to Figure 1).
Figure 1. Sample querying session. The window titled ``Positive
Objects'' is colored green and the window titled ``Negative Objects''
is colored red. All ``incantations'' of an object in the
display are colored green/red whenever it is classified as
positive/negative.
- The user types a free-form textual query in the query
window. In the example shown in Figure 1, the query is ``ozone
depletion and melanoma''.
- As every query word is typed in, the system consults an on-line
thesaurus and displays words and phrases related to the query word in
an adjacent window.
- At any point during the session the user can ``drag-and-drop''
(using the mouse) any
of the related words/phrases into the positive and negative
windows. Internally the system expands the query by treating the
positive words/phrases as synonyms of the corresponding query
word. The negative words/phrases are included in the query with a NOT
operator. For example, if for a query word ``bank'', the phrase
``financial institution'' is classified as positive and ``river bed''
is classified as negative, the corresponding internal query would be
``#SYNONYM( bank #2( financial institution )) #NOT( #2( river bed ))''.
The interface facilitates construction of such
structured queries by simple ``drag-and-drop'' operations of the mouse.
In the example in Figure 1, the phrase ``skin cancer'', which is
related to the query word ``melanoma'', has been classified as
positive. Internally the system treats the phrase as a synonym
of ``melanoma''.
- After the user types in the query, the system evaluates the
query and displays the titles of top-ranked documents in the ``Query
Results'' window.
- The user examines the query result. Clicking any title
with the mouse will bring up the full document.
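The query expansion described above for positive and negative phrases can be sketched as follows. This is a minimal illustration, not the prototype's code: the function names `phrase_term` and `build_query` are hypothetical, and the operator spellings (#SYNONYM, #2, #NOT) are simply copied from the ``bank'' example rather than checked against INQUERY's query language.

```python
from typing import Dict, List


def phrase_term(text: str) -> str:
    """Render a single word as-is and a multi-word phrase as #2( ... )."""
    words = text.split()
    if len(words) == 1:
        return words[0]
    return "#2( " + " ".join(words) + " )"


def build_query(query_words: List[str],
                positive: Dict[str, List[str]],
                negative: List[str]) -> str:
    """Treat positive phrases as synonyms of their query word; negate the rest."""
    parts = []
    for word in query_words:
        synonyms = positive.get(word, [])
        if synonyms:
            terms = " ".join([word] + [phrase_term(s) for s in synonyms])
            parts.append("#SYNONYM( " + terms + ")")
        else:
            parts.append(word)
    for phrase in negative:
        parts.append("#NOT( " + phrase_term(phrase) + ")")
    return " ".join(parts)


# The "bank" example from the text: one positive and one negative phrase.
bank_query = build_query(["bank"],
                         {"bank": ["financial institution"]},
                         ["river bed"])
print(bank_query)

# The Figure 1 session: "skin cancer" dropped into the positive window.
session_query = build_query(["ozone", "depletion", "melanoma"],
                            {"melanoma": ["skin cancer"]},
                            [])
print(session_query)
```

Keeping the positive phrases keyed by query word reflects the statement that a positive phrase is a synonym of the corresponding query word, while negative phrases apply to the query as a whole.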
Figure 2. Visualization of results for the base query.
- Figure 2 is a visualization of the query results for the base
query ``ozone depletion and melanoma''. The leftmost column of bars corresponds
to the top-ranked document, with the columns progressing to the right
representing progressively lower-ranked documents. We can see that
almost all of the top 150 documents were retrieved because they contained
the query words ``ozone'' and ``depletion''. Only 15 of the top 150
documents have anything to do with melanoma. Further, of those 15
documents, only one discusses ozone (the top-ranked document, shown as
the leftmost column in Figure 2). Thus we can clearly see that either there
are not many documents dealing with both melanoma and ozone, or the
ozone-layer concept drowns out melanoma during retrieval.
- The user can classify any document as relevant or
non-relevant by dragging and dropping it into the positive or
negative window. In the example in Figure 1, the user has classified two
documents titled ``CFC-free integral skin foams for steering wheels.''
and ``Video comparator system for early detection of cutaneous
malignant melanoma'' as positive. The document titled ``Symposium on
chemistry of the Atmosphere'' has been classified as negative.
- The user can also highlight a portion of a document and
``drag-and-drop'' that portion into the positive or negative window. The words in
the highlighted document portion are used to expand the query in the
next iteration.
- During the next iteration, the reformulated query with the
feedback information is processed by the system resulting in an
improved ranking of documents.
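How the words of a highlighted passage might be turned into expansion terms can be sketched as below. The text only states that these words are used to expand the query; the tokenization, the tiny stopword list and the function name `expansion_terms` are all assumptions made for illustration, and no term weighting is attempted.

```python
import re
from typing import List, Set

# Tiny illustrative stopword list (an assumption, not the system's own list).
STOPWORDS: Set[str] = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def expansion_terms(highlighted: str, query_words: List[str]) -> List[str]:
    """Return new, non-stopword terms from a passage, in first-seen order."""
    seen = {w.lower() for w in query_words}
    terms = []
    for token in re.findall(r"[a-z]+", highlighted.lower()):
        if token in STOPWORDS or token in seen:
            continue  # skip function words and words already in the query
        seen.add(token)
        terms.append(token)
    return terms


base_query = ["ozone", "depletion", "melanoma"]
passage = ("Video comparator system for early detection of "
           "cutaneous malignant melanoma")
new_terms = expansion_terms(passage, base_query)
print(new_terms)
```

Terms taken from a passage dropped into the positive window would be added to the query, and terms from the negative window would be negated; how the retrieval system weights them is outside this sketch.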
Figure 3. Visualization of results for query with feedback information.
- Figure 3 is a visualization of the results of the revised query
(i.e., the query with relevance feedback information). The figure
shows that there are four
documents dealing with melanoma and ozone. (Note that the documents
which deal with melanoma and its synonym skin cancer are displayed in
the same histogram titled ``melanoma'', since melanoma and skin cancer
represent the same query concept.) Thus there are three additional
documents retrieved due to the effect of classifying the phrase ``skin
cancer'' as a synonym of ``melanoma''. But still there are not many
documents about melanoma compared to ozone depletion.
Our experience with this visualization scheme has shown it to be a
useful tool for identifying the different facets of a query; in this
case, the facets are melanoma and ozone.
- Using any document as a starting point, the user can browse
through the list of other articles in the same journal issue or
conference proceedings with the help of an automatically generated
Table of Contents. This is useful in many cases, such as
when the user comes across a special issue of a journal devoted to the
search topic.
- The user can also browse through the list of articles written by
the same author. For example, an author who has written an article about
the effects of ozone layer depletion on skin cancer has probably
authored more articles along the same lines, and the user might want
to see them.
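The data behind the histograms of Figures 2 and 3 can be approximated with a small sketch: one row of bars per query concept and one column per ranked document, with synonyms folded into the concept they belong to. The per-document counts below are made up, using a raw term count as the bar height is an assumption (the measure actually plotted is not specified here), and the names `ranked_docs`, `concepts` and `concept_rows` are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-document term counts, already in rank order (best first).
ranked_docs: List[Dict[str, int]] = [
    {"ozone": 3, "depletion": 2, "skin cancer": 1},
    {"ozone": 2, "depletion": 1},
    {"melanoma": 4},
]

# Each query concept lists the terms that count toward its histogram.
concepts: Dict[str, List[str]] = {
    "ozone": ["ozone"],
    "depletion": ["depletion"],
    "melanoma": ["melanoma", "skin cancer"],
}


def concept_rows(docs: List[Dict[str, int]],
                 concept_terms: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """For every concept, return one bar height per document, in rank order."""
    return {
        name: [sum(doc.get(term, 0) for term in terms) for doc in docs]
        for name, terms in concept_terms.items()
    }


rows = concept_rows(ranked_docs, concepts)

# Text rendering: one row per concept, one fixed-width cell per document.
width = max(max(heights) for heights in rows.values())
for name, heights in rows.items():
    bars = " ".join(("#" * h).ljust(width, ".") for h in heights)
    print(f"{name:>10} | {bars}")
```

Reading down a column shows which concepts contributed to a given document's rank, which is the kind of observation made above about ozone dominating the base query.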
CONCLUSION & FUTURE WORK
We have implemented a prototype interface [VHN95], written in Tcl/Tk
[Ousterhout], on top of INQUERY [CCH92], a ranked-output information
retrieval system. The prototype runs over Compendex, a library catalog
containing about 300,000 documents. The interface facilitates the
inherently interactive nature of the information seeking process.
``Drag-and-drop'' operations (using the mouse) form the basis of
interaction; they encourage the user to provide feedback
information to the system and help the dialog between the user and
the system. Almost any information on the screen can be used by the
user to provide feedback information. An online thesaurus, WordNet
[Miller90a], is integrated with the interface to form a single
system.
The interface also supports a visualization scheme which illustrates
how the query results are related to the query words. Visualizing the
results of the query keeps the user more informed on how the system
computed the ranking of documents. With this information, the user is
better equipped to reformulate the query for the next iteration.
The interface also has facilities to browse the Table of Contents of
publications and to browse the list of articles written by a specific
author.
It is our opinion that integrating all of the above features in a
seamless interface leads to an interplay between different items that
is much more beneficial than the sum of the individual items in
isolation.
We are in the final stages of implementation. In the future, we intend
to test the effectiveness of the interface by
studying how library users, both experts looking for detailed
information and naive users, interact with the interface
and how they react to ranked-output systems as opposed to existing
Boolean systems. We plan to include a domain-specific thesaurus for
the engineering domain from Compendex and, if possible, a
collection-specific word-association thesaurus.
ACKNOWLEDGEMENTS
We are thankful to Dr. Bruce Croft for letting us use the INQUERY
retrieval system. We are indebted to the Dean of the Georgia Tech Library,
Ms. Miriam Drake, and to Engineering Information Inc., without whom it would
have been impossible to use Compendex data for the experiment.
Many thanks to Dr. Marti Hearst, whose Tcl/Tk code
for the SMART system was a helpful springboard for
writing the interface.
Support in part by ARPA contract No. F33615-93-1-1338 is also appreciated.
REFERENCES
[CCH92]
J.P. Callan, W.B. Croft, and S.M. Harding.
The INQUERY retrieval system.
In Third International Conference on Database and Expert Systems
Applications, September 1992.
[Miller90a]
George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and
Katherine J. Miller.
Introduction to WordNet: An on-line lexical database.
International Journal of Lexicography, 3(4):235--244, 1990.
[Ousterhout]
John K. Ousterhout.
Tcl and the Tk Toolkit.
Addison-Wesley, 1994.
[VHN95]
A. Veerasamy, S. Navathe, and S. Hudson.
Visual interface for textual information retrieval systems.
To appear in Proceedings of the Third Conference on Visual
Database Systems. IFIP 2.6, 1995.