<html>
<head>
<!-- This document was created from RTF source by rtftohtml version 2.7.3 -->
<title>
DL94: Mixed-Media Access
</title>

</head>
<body>
<!--#include virtual="/DL94/header.ihtml" -->

<h1>
Mixed-Media Access
</h1>

<address>
Francine Chen, Marti Hearst, Julian Kupiec, Jan Pedersen, and Lynn Wilcox
</address>
<i>
Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304<br>
{fchen,hearst,kupiec,pedersen,wilcox}@parc.xerox.com
</i>
<h3>
Overview</h3>
Effective
information access is crucial for any digital library: a rich online document
repository is of little use without a methodology for finding items of
interest.  In our view, an effective information access system lets users
search for and view information in a variety of ways, using conventional
search and browsing methods as well as less conventional methods such as
automated highlighting, emphasis detection, thematic thread-following, and
summarization.  Moreover, these access mechanisms should operate seamlessly
and robustly across multimedia document types.  Our work to date integrates
access to information in a wide variety of digital media, including scanned
text, scanned images, digitized audio, and digitized video, as well as
traditional plain-text collections.  Below we outline recent work that
illustrates our approach to multimedia access. <p>

<h3>
Mixed-Media Keyword Search</h3>
Keyword
search and similarity search, currently used to access textual information,
can be extended directly to other media types.  Plain text is typically input
with a keyboard and retrieved via a query posed using the same device.
Analogously, digitized speech can be queried by spoken keyword input; [WB92]
describes work in this vein.  Extending the analogy once more, scanned
textual images can be searched by selecting a region of the image containing
the desired keyword; [KB90] describes how this can be accomplished.<p>

We are also developing retrieval techniques that cross media.  In our
word-image spotting system, partially specified keywords or phrases entered
by a user through a keyboard are detected and located in images, implementing
a kind of image "grep" [CWB93].  We have also built a system that enables a
user to access documents in a large plain-text corpus by speaking the words
of a query.  The system exploits the fact that word alternatives arising from
recognition errors are unlikely to be semantically correlated, whereas the
words intended by the speaker are semantically related and generally occur
close together in text [KKB94].<p>
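The co-occurrence idea behind [KKB94] can be illustrated with a toy sketch (the corpus, the candidate word lists, and the document-overlap scoring below are invented for illustration, not the system's actual model): among the recognizer's candidate words, prefer the combination whose members appear together in the most documents.

```python
from itertools import product

# Toy corpus: each document is represented as a set of words.
corpus = [
    {"stock", "market", "prices", "fell"},
    {"weather", "rain", "forecast"},
    {"stock", "exchange", "market", "trading"},
]

def cooccurrence_filter(alternatives):
    """Pick one candidate per spoken word so that the chosen words
    co-occur in as many corpus documents as possible.  Recognition
    errors tend not to co-occur, so the intended words win out."""
    best, best_score = None, -1
    for combo in product(*alternatives):
        score = sum(1 for doc in corpus if set(combo) <= doc)
        if score > best_score:
            best, best_score = combo, score
    return list(best)

# The recognizer proposes confusable alternatives for each query word.
query = [["stock", "stalk"], ["market", "marked"]]
print(cooccurrence_filter(query))  # -> ['stock', 'market']
```

Enumerating every combination is exponential in the number of query words; a real system would prune candidates, but the sketch shows why semantically unrelated recognition errors filter themselves out.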

<h3>
Information Threads</h3>
Rather
than viewing information as a single uninterrupted sequence, we are developing
methods to identify information threads in multiple media.  For example, audio
may comprise multiple sound sources, as in a conversation between two or more
people.  We have developed a method for segmenting an audio stream based on
speaker identification [WCKB94].  Similarly, video contains scene changes that
may signal a change in topic or speaker, and we are developing a method for
identifying scene changes from the video signal.  We are also developing
techniques for identifying changes of topic in plain text [hearst94]; these
methods may also be applicable to scanned images of text and to
conversations. <p>
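The core of lexical topic segmentation in the style of [hearst94] can be sketched briefly (the sentence data, block size, and cosine scoring below are illustrative assumptions, not the published algorithm): compare the vocabulary of adjacent blocks of sentences and treat the gap with the lowest similarity as a candidate topic boundary.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def weakest_gap(sentences, k=2):
    """Return the sentence gap where the adjacent k-sentence blocks share
    the least vocabulary -- a candidate topic boundary."""
    scores = {}
    for i in range(k, len(sentences) - k + 1):
        left = Counter(w for s in sentences[i - k:i] for w in s.lower().split())
        right = Counter(w for s in sentences[i:i + k] for w in s.lower().split())
        scores[i] = cosine(left, right)
    return min(scores, key=scores.get)

text = [
    "the fed raised interest rates",
    "markets reacted to the rate decision",
    "rates may rise again next quarter",
    "the home team won the final match",
    "fans celebrated the match victory",
    "the coach praised the team",
]
print(weakest_gap(text))  # -> 3: the finance/sports boundary
```

A full segmenter would smooth the similarity curve and pick all sufficiently deep valleys rather than a single minimum, but the block-comparison idea is the same.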

<h3>

Mixed-Media Browsing and Summarization</h3>
The
user's goals should determine what kind of information is displayed after a
search.  When scanning for specific information, presentation of a portion of a
document may be all that is needed.  When trying to get an overview of a sample
of documents, summarization of the material may be more appropriate.  When a
user is trying to determine what is contained in a corpus, tools for browsing
may be desired.  We are developing these tools in a variety of media.  <p>

Browsing provides a way to view the contents of a text collection without
requiring the user to formulate search terms.  We have developed
Scatter/Gather, an unsupervised method for organizing the contents of very
large text collections.  The method produces semantically coherent clusters
that can be browsed; the user can then choose a subset of clusters, recombine
their contents, and browse the resulting sub-collection, which is again
organized into coherent clusters [CKPT92].  <p>
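The scatter/gather loop can be sketched as follows (the document set, seed choice, and word-overlap assignment are simplified stand-ins for the fast clustering described in [CKPT92]): scatter the collection into clusters, let the user gather a subset, then scatter that subset again.

```python
def scatter(docs, k):
    """Partition docs into k clusters by assigning each document to the
    seed document it shares the most words with -- a crude one-pass
    stand-in for the fast clustering used in Scatter/Gather."""
    step = max(1, len(docs) // k)
    seeds = [set(d.split()) for d in docs[::step]][:k]
    clusters = [[] for _ in range(k)]
    for d in docs:
        words = set(d.split())
        best = max(range(k), key=lambda i: len(words & seeds[i]))
        clusters[best].append(d)
    return clusters

def gather(clusters, chosen):
    """Merge the clusters the user selected into a new sub-collection,
    ready to be scattered again."""
    return [d for i in chosen for d in clusters[i]]

docs = [
    "stock market trading report",
    "market prices and trading volume",
    "rain forecast for the weekend",
    "weekend weather and rain",
]
round1 = scatter(docs, 2)      # two topical clusters
subset = gather(round1, [0])   # user keeps only the first cluster
```

Each scatter/gather round narrows the collection, so the user converges on a region of interest without ever typing a query.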

We have also developed methods for selecting excerpts to create summaries of
documents.  One method summarizes plain-text documents and has been extended
to scanned images of text.  We have also developed a summarizer for audio that
uses prosodic cues to detect emphatic (and thus likely important) excerpts,
which are then combined to form a summary [CW92].  In addition, we have
developed methods to automatically partition an audio signal based on speaker
identity, enabling quick scanning and browsing [WCKB94].<p>
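 
Excerpt selection of this kind can be sketched in miniature (the frequency-based scoring and the sample sentences below are illustrative assumptions; the actual summarizers use trained features and, for audio, prosodic cues): score each candidate excerpt, keep the top few, and present them in original order.

```python
from collections import Counter

def extract_summary(sentences, n=2):
    """Rank sentences by the average collection frequency of their words
    and return the top n in original order -- a minimal extractive
    summarizer standing in for the feature-based methods described above."""
    freq = Counter(w for s in sentences for w in s.lower().split())
    def score(s):
        words = s.lower().split()
        return sum(freq[w] for w in words) / len(words)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

report = [
    "The prototype indexes scanned pages and audio recordings.",
    "Indexing audio requires a speech recognizer.",
    "Scanned pages are indexed without full character recognition.",
    "Lunch was served at noon.",
]
print(extract_summary(report, 2))
```

Sentences whose vocabulary recurs across the document score highest, while the off-topic sentence is dropped; prosodic emphasis plays the analogous role for audio.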

<h3>
Summary</h3>
We
have developed many of the components of a system that will provide multiple
search and viewing techniques for multimedia information.  Our approach is to
integrate access methods across various media types and to provide mixed-media
access when appropriate.  In the future we plan to determine how best to
combine various search and display techniques to create a seamless interface to
multimedia information.  <p>

<h3>
References
</h3>

[CWB93] Francine R. Chen, Lynn D. Wilcox, and Dan S. Bloomberg.  Detecting and
locating partially specified keywords in scanned images using hidden Markov
models.  In Proceedings of the International Conference on Document Analysis
and Recognition, Tsukuba Science City, Japan, October 1993.  <br>
<br>
[CW92] Francine R. Chen and Margaret M. Withgott.  The use of emphasis to
automatically summarize a spoken discourse.  In Proceedings of the
International Conference on Acoustics, Speech and Signal Processing, San
Francisco, CA, March 1992.  <br>
<br>
[CKPT92] D. R. Cutting, D. R. Karger, J. O. Pedersen, and J. W. Tukey.
Scatter/gather: A cluster-based approach to browsing large document
collections.  In Proceedings of SIGIR'92, Copenhagen, Denmark, June 1992.  Also
available as Xerox PARC technical report SSL-92-02.  <br>
<br>
[hearst94] Marti A. Hearst.  Multi-paragraph segmentation of expository text.
In Proceedings of the 32nd Annual Meeting of the Association for Computational
Linguistics, Las Cruces, NM, 1994.  To appear.  <br>
<br>
[KB90] Gary Kopec and Steve Bagley.  Editing images of text.  In Proceedings of
Electronic Publishing '90, Cambridge, England, 1990. Cambridge University
Press.  <br>
<br>
[KKB94] Julian Kupiec, Don Kimber, and Vijay Balasubramanian.  Speech-based
retrieval using semantic co-occurrence filtering.  In Proceedings of the ARPA
Human Language Technology Workshop, Plainsboro, NJ, March 1994.  <br>
<br>
[WB92] Lynn D. Wilcox and Marcia A. Bush.  Training and search algorithms for
an interactive wordspotting system.  In Proceedings of the International
Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, March
1992.  <br>
<br>
[WCKB94] Lynn D. Wilcox, Francine R. Chen, Don Kimber, and Vijay
Balasubramanian.  Segmentation of speech using speaker identification.  In
Proceedings of the International Conference on Acoustics, Speech and Signal
Processing, Adelaide, Australia, April 1994.<br>
</p>

<!--#include virtual="/DL94/footer.ihtml" -->
Last Modified: <!--#echo var="LAST_MODIFIED" --> <br>
</body>
</html>
