<html>
<head>
<title> 
DL94: A Patterned Injury Digital Library for Collaborative Forensic Medicine
</title>
</head>

<body>
<!--#include virtual="/DL94/header.ihtml" -->

<h1>
A Patterned Injury Digital Library for Collaborative Forensic Medicine
</h1>
<p>
David Stotts[1], John Smith[1], Prasun
Dewan[1], Kevin Jeffay[1], F. Donalson
Smith[1], Dana Smith[1], Steven Weiss[1], 
James Coggins[1],
and William Oliver, MD[2,1]
<p>
[1]<em>Department of Computer Science, CB 3175, Sitterson Hall, Chapel Hill, NC
27599-3175</em><p>

[2]<em>Armed Forces Institute of Pathology, Walter Reed Army
Hospital, Washington, DC</em><p>

<h3>Abstract</h3><p>

The UNC CS Collaboratory, in conjunction with the  Armed Forces Institute of
Pathology and the Office of the North Carolina Chief Medical Examiner, is
constructing a digital library to augment the collaborative practice of
forensic medicine.  Called the Repository of Patterned Injury Data (RPID), the
project is putting into digital form several important and heavily used
collections of medical data, stored and presented to allow joint consultation by
several pathologists.  We are researching issues in data storage, retrieval,
and delivery; in shared distributed user interfaces; and in collaborative work
patterns for forensic medicine.<p>
<b><p>
Keywords:  </b>Forensic medicine, pathology, patterned injury, hypermedia,
collaboration, medical images, browsing, search, retrieval.<b><p>
<p>
<p>
<p>
1.  Project Overview</b><p>
Prevention of violent crime continues to be an important national priority, and
it is increasingly important in the daily lives of our citizenry.  Those
actively involved in investigating these crimes and in apprehending the people
who commit them must be able to pool their knowledge and expertise to provide
maximum effectiveness.  The digital library testbed we are creating and the
research we will do in relation to it will contribute to the more effective
interaction of forensic pathologists.  More specifically, it will enable
forensic pathologists to consult a large testbed of data from forensic
medicine, share new case data, apply enhancement and analysis algorithms to
images, and consult with one another.  Our research will also apply broadly to
other distributed groups of professionals who work intensively and over
extended periods of time with multimedia data, including substantial volumes of
image data.  Examples include astronomers, art historians, and biochemical
researchers.<p>
We are creating a testbed digital library, called the Repository of Patterned
Injury Data (RPID).  It includes text, images, audio and video sequences,
charts, diagrams, drawings, and numerical data.  Information in the repository
will support forensic pathologists collaborating on medical cases involving
patterned injuries.  Patterned injuries are those having distinctive
characteristics as a result of the object causing the injury, such as a hammer
or tire tread.  Identifying the source of an injury provides valuable
information not just for determining the cause of death but also for
apprehending and prosecuting those responsible in cases where a violent crime
is involved.<p>
Individual data objects are linked in RPID into a hypermedia graph structure to
support flexible browsing.  Access is to be provided by query-based search both
on attributes of images and on the text of accompanying material.  The computer
infrastructure provided by our project for working with the RPID will also
include image enhancement, collaboration, and communication functions as
integral parts of the environment.<p>
The initial data for the repository will come from the extensive archives of
AFIP; from the Office of the Armed Forces Medical Examiner's (OAFME) Lindenberg
collection of microscope slides and pictorial data; and from the Milton Helpern
Forensic Pathology Museum.  Other specialized collections, such as images of
tire tread marks, are maintained by individual pathologists and researchers; we
intend to bring these data into the repository, once it has been established.<p>
After the initial construction phase, we will proceed with connection to other
consultation sites.  Dr. Thomas Clarke in The Office of the Chief Medical
Examiner for the State of North Carolina is enthusiastic about participating in
this effort.  Dr. Clarke's office performs 1300 autopsies a year, and funds
2500 additional ones; all these cases generate data for their collection.
Their forensic archive goes back to 1970, with all data except photographs
already in digital form.  They are willing to incorporate this data into the
RPID, and they will be the first intermediate site we connect.<p>
We foresee approximately 2,000 images with associated case data being processed
in the initial year, with the addition of 5,000 images in year two, and growing
to at least 20,000 total images with case data by project's end.  With the
inclusion of autopsy case data from the NC Chief Medical Examiner's Office, and
with data being integrated from other experimental sites beginning in year 2,
the amount of forensic material could grow to as much as 70,000 images with the
associated case data.<p>
Development work is taking place at two sites.  The testbed collection of
digital data is being constructed by the Armed Forces Institute of Pathology
(AFIP), housed at the Walter Reed Army Hospital in Washington DC.  Work on the
computer and communications infrastructure and a related program of
collaboration user studies is being done by the Collaboratory Project at the
University of North Carolina at Chapel Hill (UNC).  This infrastructure will be
based on the Artifact-Based Collaboration (ABC) System developed by the UNC-CH
Collaboratory project under the NSF CTCT program [12, 13, 25].  This will allow
us to put a usable infrastructure in place quickly and also to carry out
further computer systems research to develop new capabilities that can be
tested and added incrementally to the installed system.<p>
Users of the completed testbed will include three types of forensic
professionals: consulting pathologists; forensic scientists specializing in
patterned injury; and field professionals, including pathologists,
criminologists, forensic scientists, and forensic odontologists.  They will be
situated at some twenty locations across the country, including AFIP; the UNC
Medical School; the U.S. Department of Justice; the New Mexico State Medical
Examiner's Office; the North Carolina Chief Medical Examiner's Office; and the
Office of the Armed Forces Medical Examiner (OAFME).  The initial repository
will grow as a result of this use by forensic pathologists throughout the
nation.<p>
Use of the testbed will be closely monitored by project researchers, both by
human observers and by automated tracking of users' actions, in order to
document changes in the working practices of this community and to evaluate
specific systems research components.  Issues we will investigate include how
to distribute multimedia data for effective access; how to organize the data
for both hypermedia access and query-based  search; how use of the digital
library will affect the working practices of forensic pathologists; how to
assist collaboration relative to the digital library; and how to build a
collaborative infrastructure that will adapt to heterogeneous environments,
both for mixed data forms and for mixed computer hardware and software
platforms.<p>
<b><p>
1.1. Relationship of RPID to efforts elsewhere</b><p>
While there are several projects reported in the literature that provide
hypermedia data in a medical context [7, 1], the RPID project has some unique
aspects.  The previous efforts have not been in the context of large or widely
distributed data sets.  They have allowed physicians to simulate in hypermedia
their research notebooks, for example; or they have provided centralized
facilities for small library subsets.  A unique emphasis of the RPID is the
enabling of collaboration among users of the library.  Another unique aspect of
RPID is the research focus on building an infrastructure that will allow the
easy integration of new user tools and source data as the library grows.  Thus,
we are designing for, and investigating the practicality of, continual expansion
of the data and increasing distribution among the library sites.<p>
There are also numerous projects that provide image storage and manipulation
facilities to medical personnel using images like radiographs.  These systems
tend to focus heavily on the graphics capabilities, and are little more than
traditional databases otherwise.<b><p>
<p>
2. Project Requirements and Scenarios</b><p>
Our research on the Repository of Patterned Injury Data (RPID) will produce a
large digital library of medical data (specifically, materials represented in
various media, comprising patterned injury case records) for the collaborative
use of forensic pathologists.  RPID data will include text, images, video
sequences, charts, diagrams, drawings, and numerical data.  The project will
make contributions to several areas of knowledge:<p>
<i><p>
Organization and Structure of Information</i><p>
<p>
* hypermedia data organization of a large collection of disparate materials<p>
* strategies for moving back and forth between IR (searching) and hypermedia
browsing<p>
* strategies for constructing argumentative paths within a hypermedia data
model<i><p>
<p>
Computer and Communication Technology</i><p>
<p>
* intelligent agents to traverse hypermedia structures and perform a variety of
functions, including searching and linking<p>
* tools and protocols for defining links into video data<p>
* distributed storage and access to data using local- and wide-area networks<p>
* adaptable caching and replication strategies for image-intensive data<p>
* new architecture and implementation for supporting computer conferencing of
arbitrary applications at the operating system level<i><p>
<p>
User Behavior</i><p>
<p>
* detailed task-based portraits of a user community prior to their use of the
digital library, during their adaptation to it, and after they become familiar
with it<p>
* studies of the effects of digital library and collaboration technologies on
communication patterns within the user community<p>
* evaluation of search and browsing behaviors<p>
* evaluation of the effect of specific computer and communication
infrastructure features on task behavior<p>
* examination of privacy and security issues within the user community and
development of appropriate policies<p>
* evaluation of effects of digital library technology on the conduct of science
within a specific professional community	<b><p>
<p>
2.1. Motivation</b><p>
Prevention of violent crime continues to be an important national priority, and
it is increasingly important in the daily lives of our citizenry.  Those
actively involved in investigating these crimes and in apprehending the people
who commit them must be able to pool their knowledge and expertise to provide
maximum effectiveness.  The digital library we will create, and our research
with it, will enable more effective interaction of forensic pathologists.
Specifically, it will enable forensic pathologists to consult a large testbed
of patterned injury data, share new case data, apply enhancement and analysis
algorithms to images, and consult remotely with one another.  Our research
results will apply broadly to other distributed groups of professionals who
work intensively and over extended periods of time with multimedia data,
including substantial volumes of image data.<p>
The analysis of patterned injury in forensic pathology presents a challenge
that draws uniquely upon both the medical and forensic expertise of the
investigator.  As a problem in wounding, it is a medical challenge.  As a
problem in image enhancement and pattern analysis, it is a forensic challenge.
This skill benefits heavily from experience.  The investigator must have both
experience in wound analysis and a knowledge of the universe of discourse,
wherein lies the object which caused the injury.<p>
For instance, discerning the general properties of an object (that it is
rounded, sharp-edged, etc.) is well within the general knowledge base of most
forensic pathologists.  Recognizing that a specific injury is likely to have
been caused by, say, a socket wrench requires the examiner not only to know
about pathology but also to have some knowledge of automotive tools.  This need
for an encyclopedic recall of objects and object properties makes much
patterned injury analysis difficult, especially when injuries occur in areas
which abound with specialized tools, implements, and objects (<i>e.g.,</i>
construction sites, factories, hair-dressing salons, etc.).  Moreover,
recognizing the types of marks made by partial or oblique blows is a further
challenge in analyzing the geometry of impressions.<p>
No forensic pathologist or odontologist can be expert in all areas of hardware
manufacture and utilization.  Instead, we must rely on the experience of our
colleagues.  Unfortunately, when an expert in the field retires, a wealth of
specialized experience is lost, usually along with a career's worth of valuable
patterned injury data and analysis results.  The RPID project is an attempt to
save such knowledge and to make such experience permanently available to
pathologists and investigators across the country.  We will do this by
establishing an electronic registry of solved patterned injuries, and by
developing a computing and communications infrastructure wherein pathologists
and investigators can electronically access the data, search for cases germane
to their current problems, and conveniently consult with one another.<b><p>
<p>
2.2. User community requirements</b><p>
Primary characteristics of the users of the library for forensic medicine
include:<p>
<p>
* Highly skilled and experienced group of expert users.<p>
* Users widely distributed throughout the nation.<p>
* Work intensively over sustained periods of time and require access for both
casual browsing and quick retrieval of specific information.<p>
* Work with multiple forms of data, including photographic images, computer
enhanced images, text, video, audio, and statistical data.<p>
* Consult and communicate with one another in direct reference to the data.<p>
* Produce new information (<i>e.g.,</i> sequences of images, both photographic
and generated) that is integrated with archival data.<b><p>
</b><p>
A computer and communications infrastructure that can support these users in
their accustomed patterns of work as well as enable them to work in new and
more productive ways in the future must have the following features:<p>
<p>
* Augment the skills and knowledge of users, not try to replace them or reduce
them to algorithm.<p>
* Provide support for multimedia, initially including image, text, video, and
statistical data, but extensible to other forms.<p>
* Provide mechanisms to link data in <i>ad hoc</i> as well as systematic
ways.<p>
* Support extensive, free-ranging browsing.<p>
* Support automated search according to coded attributes (<i>e.g.,</i> specific
characteristics of wounds, such as length, depth, shape, etc.).<p>
* Support easy integration of new material into the RPID.<p>
* Provide an open architecture that can accommodate arbitrary applications,
such as image analysis and enhancement tools, and integrate their output into
the RPID for future use.<p>
* Support collaborative viewing and interaction with RPID data and associated
applications.<p>
* Support conversations and discussions among small groups of users.<p>
* Support exchange of case-specific collections/structures of information,
including commentary, for asynchronous consultation/collaboration.<b><p>
2.3. System Usage</b><p>
The RPID will use the collaborative computing and communications infrastructure
developed by the Collaboratory at UNC-CH for dissemination of, exploration of,
and interactive consultation with the medical case data from AFIP.  There are
several distinct forms of interaction required by this digital library:<p>
<p>
* <i>exploration</i><p>
<p>
The RPID will be organized along feature vectors incorporating both image-based
geometric information and "expert" knowledge.  Associated with each wound image
will be the image information regarding the object which caused the wound, a
text file containing pertinent case data and notes, a text file containing
links for further searching, and other media as appropriate (<i>e.g.,</i> video
clips, scene images, site diagrams, etc.).  For each wound in the wound image,
a link will be established to adjoining images in the feature vector space.<p>
To explore this space, then, an investigator will choose a starting place, and
then move from image to image along change vectors.  For instance, assume an
investigator has a case and has found an image which is similar to, but still
significantly different from, the unknown mark in class characteristics.  Let's
assume that the unknown mark has the same general shape but is in some sense
"longer," and that "length" is one of the feature metrics upon which the graph
is built.  The investigator can then move to the next "longer" wound mark which
is similar in the other search dimensions.  At that image, he or she can then
continue along that dimension, or choose another dimension.  A similar
hypertext linking for the text portion of the nodes is also appropriate.<p>
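The move to the next "longer" wound mark can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the feature
names (length, width, curvature), the sample values, and the function
next_along are hypothetical and are not part of the RPID design.  Among the
images that exceed the current one in the chosen feature, the sketch picks the
one closest in the remaining features.<p>
<pre>
```python
# Hypothetical feature vectors for wound images; names and values are
# illustrative assumptions, not RPID data.
IMAGES = {
    "img_a": {"length": 2.0, "width": 1.0, "curvature": 0.2},
    "img_b": {"length": 3.0, "width": 1.1, "curvature": 0.2},
    "img_c": {"length": 3.5, "width": 4.0, "curvature": 0.9},
    "img_d": {"length": 5.0, "width": 1.0, "curvature": 0.5},
}


def next_along(images, current, feature):
    """Return the image that is larger in `feature` and most similar to
    `current` in every other feature, or None if no such image exists."""
    here = images[current]
    others = [name for name in here if name != feature]
    candidates = [
        name for name, vec in images.items()
        if vec[feature] > here[feature]
    ]
    if not candidates:
        return None

    def cost(name):
        vec = images[name]
        # Squared difference in the other dimensions; ties are broken by
        # the smallest step along the chosen dimension.
        off_axis = sum((vec[f] - here[f]) ** 2 for f in others)
        return (off_axis, vec[feature] - here[feature])

    return min(candidates, key=cost)
```
</pre>
Starting at img_a and repeatedly following the "length" dimension visits img_b
and then img_d; img_c is skipped because it differs too much in the other
dimensions.<p>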
<p>
* <i>searching</i><p>
<p>
To find a starting point for this search, it will be necessary to find a first
approximate match.  For this, once the appropriate feature vector space is
determined, an appropriate norm, perhaps weighted, will be chosen, and a
distance measure from the unknown image will be calculated.  This distance
function will then be used to find an appropriate starting point.<p>
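As a minimal sketch of this step, assuming a weighted Euclidean norm (the text
leaves the choice of norm open), the starting point is simply the library
image at the smallest weighted distance from the unknown mark.  The feature
layout (length, width, depth), the case names, and both function names are
hypothetical.<p>
<pre>
```python
import math

# Hypothetical feature vectors (length, width, depth) for library images.
LIBRARY = {
    "case_17": (4.0, 1.0, 0.5),
    "case_42": (2.0, 2.0, 0.4),
    "case_63": (4.2, 3.0, 0.5),
}


def weighted_distance(u, v, weights):
    """Weighted Euclidean norm of the difference between two vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def starting_point(library, unknown, weights):
    """Return the name of the library image nearest to the unknown mark."""
    return min(
        library,
        key=lambda name: weighted_distance(library[name], unknown, weights),
    )
```
</pre>
The weights express which features matter for a given search.  For an unknown
mark at (4.1, 2.1, 0.5), equal weights select case_63, whereas weighting only
the width selects case_42.<p>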
<p>
* <i>consultation</i><p>
<p>
While an investigator should be able quickly and intuitively to traverse the
graph which organizes the registry without assistance, the power of this
proposal is greatly enhanced by the ability to allow interactive consultation
on the images during exploration.  As stated in the introduction, patterned
injury analysis is a profoundly experience-determined skill - a picture may be
worth a thousand words, but it is only ten minutes of good experience.  The
RPID will allow an investigator in a local office to collaborate with
specialists at the AFIP to receive timely aid in the evaluation of his or her
images and in the exploration of the existing library.  By allowing real-time
interaction in image consultation, the RPID will increase the productivity of
forensic pathologists manyfold.<p>
By allowing interactive consultation to involve application software as well as
image display, the RPID will also enable pathologists to demonstrate their
image processing skills during a consultation.  This can be extremely important
for explaining the forensic use of image processing methods to criminal
investigators and lawyers, who often are not technologically sophisticated.<p>
<b><p>
2.4. Digital library structure</b><p>
The computer and communications infrastructure will be built by the UNC-CH
Collaboratory Project in close consultation with the American Registry of
Pathologists (ARP) and AFIP.  The system will integrate several basic
technologies: distributed data storage, hypermedia, information search and
retrieval, generalized computer conferencing, intelligent agents, and tools for
representing and browsing large databases.<p>
The core of the library will be a distributed hypermedia repository, based on
ABC/DGS, a distributed storage subsystem and applications-support layer
developed at UNC [12, 13, 25].  Users will interact with the repository and
with each other through a network, such as the Internet.  The hypermedia
repository will support many forms of information, including individual images,
blocks of text, statistical data, and audio/video sequences, as individual
content objects.  Individual objects will be organized as a large graph
structure that can be both browsed and searched.<p>
To support browsing, objects will be linked with one another along a variety of
dimensions.  For example, all of the data associated with a given case could be
linked into a tree that included as one branch the images, as another branch
the various reports associated with the case, and, as a third branch, video
clips of the crime scene.  However, images in one case could also be linked to
similar or related images in other cases according to specific features, such
as length of wound, depth, or shape.  Thus, users will be able to browse within
the material from a given case but also across the primary structure of the
library to data associated with other cases.  The system will record a trace of
users' browsing paths to facilitate subsequent analysis, as a learning aid, and
as a means of capturing one form of expert knowledge.<p>
To support search, data will be characterized in terms of specific features and
parameters on those features when they are entered in the repository.  Features
will be used to generate links in the hypermedia graph structure.  They will
also be stored in a conventional information retrieval system.  Consequently,
users will be able to submit queries to the retrieval system and obtain a set of
objects, which they may then view or otherwise access through the hypermedia
system.  Once a user has arrived at a given object, he or she will be able to
branch out using hypermedia browsing facilities to other objects linked to it.
Thus, the system will combine capabilities of both hypermedia browsing and
information retrieval.<p>
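The combination of the two access styles can be pictured with a small Python
sketch.  It is a toy model under stated assumptions, not the ABC/DGS
interface: the object names, the attribute keys, and the functions query and
browse are all hypothetical, and an in-memory dictionary stands in for both
the retrieval system and the hypermedia store.<p>
<pre>
```python
# Hypothetical content objects: coded attributes plus outgoing links.
OBJECTS = {
    "img_1": {"attrs": {"shape": "crescent", "length_cm": 3.0},
              "links": ["report_1", "img_7"]},
    "img_7": {"attrs": {"shape": "crescent", "length_cm": 6.5},
              "links": ["report_7"]},
    "report_1": {"attrs": {"kind": "autopsy report"}, "links": []},
    "report_7": {"attrs": {"kind": "autopsy report"}, "links": []},
}


def query(objects, **wanted):
    """Retrieval step: names of objects whose attributes match exactly."""
    return sorted(
        name for name, obj in objects.items()
        if all(obj["attrs"].get(key) == value for key, value in wanted.items())
    )


def browse(objects, start):
    """Browsing step: objects directly linked from `start`."""
    return list(objects[start]["links"])
```
</pre>
A query for shape="crescent" returns img_1 and img_7; from img_1 the user can
then branch out by browsing to report_1 or img_7.<p>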
The system will be based on an open architecture so that arbitrary applications
can be included in the  environment.  This will enable users to invoke
specialized tools - such as image enhancement programs, other software they are
accustomed to using, and new tools as they are developed - on data included in
the repository.  Once such applications have been run, their output can also be
stored in the repository for future use.  Thus, the system will provide a
flexible, easy-to-use environment for exploratory data analysis.  A
conferencing feature will permit two or more users, possibly located at remote
sites, to share the same view of the data, or of an application being used with
the data, through a computer network.  It will also permit users to move
smoothly from individual to collaborative work, and back.<p>
While a major use of the system will be through browsing or searches controlled
directly by the expert user, the system will also include intelligent agents that
can traverse the hypermedia store and apply algorithms to content objects.
Thus, for example, users will be able to provide an agent with an image of a
wound and with an image comparison algorithm and have the agent (operating
asynchronously and concurrently with  the user) explore the graph structure of
the repository searching for similar images.  When it finds candidate images,
it could construct a link between the original and the candidate for subsequent
consideration by the user.<p>
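A rough Python sketch of such an agent follows.  It assumes a breadth-first
traversal and reduces each image to a single number so that the comparison
algorithm can be a one-line stand-in; the repository contents and the
functions similar and run_agent are hypothetical.  The sketch runs
synchronously and does not model the asynchronous, concurrent operation
described above.<p>
<pre>
```python
from collections import deque

# Hypothetical repository: node -> (feature value, linked nodes).
REPOSITORY = {
    "n1": (3.0, ["n2", "n3"]),
    "n2": (3.2, ["n4"]),
    "n3": (9.0, []),
    "n4": (2.9, ["n1"]),
}


def similar(a, b, tolerance=0.5):
    """Stand-in for a user-supplied image comparison algorithm."""
    return tolerance >= abs(a - b)


def run_agent(repository, start, query_value, compare):
    """Breadth-first traversal from `start`; returns the candidate links
    (query -> node) the agent proposes for later review by the user."""
    proposed = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        value, neighbours = repository[node]
        if compare(query_value, value):
            proposed.append(("query", node))
        for nxt in neighbours:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return proposed
```
</pre>
The agent only proposes links; whether a proposed link is kept remains the
user's decision.<p>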
The system must also include provision for audio and, ideally, video
conversations/discussions.  Initially, this will be limited to channels
supplied by the telephone system.  As digital technologies mature, we hope to
include video and to support both forms through the computer network.  This
will enable the system to offer communication through the workstation as
opposed to requiring the user to go outside the computing environment.<p>
However, since consultation is so important and since it is so hard to get
people together at the same time, particularly if their skills are in high
demand, the system will include facilities to enable a user to organize bodies
of material, including his or her recorded statements, that can later be viewed
and responded to by another, consulting user.  Thus, collaborators will be able
to carry on extended, asynchronous "conversations" with regard to specific data
without both having to be available at the same time.  A similar facility for
collecting, recording, and playback will also be useful to forensic
pathologists when presenting evidence in the courtroom.<p>
<b><p>
3. Research Issues</b><p>
We have already mentioned that user studies are an integral component of the
RPID project, but we will not further discuss this aspect here.  The other
major dimension of our research is in technical support for the collaborating
users.  We have identified hypotheses in three broad technical areas: user
functions, distributed hypermedia data storage and access, and support for
collaboration between heterogeneous hardware and  software platforms.  All
three areas present fundamental problems that must be solved if the information
highway and the digital library concept are to be broadly useful for technical
and scientific work, but we only have space in this report to adequately
outline our investigations in the user functions area.<p>
Current hypermedia systems do not allow different data types and formats to be
mixed uniformly.  We are addressing this problem by developing a set of orthogonal
linking operations.  These allow full interconnection of heterogeneous data in
a uniform manner.  We are concentrating especially on video, seeking to make it
a first class hypermedia data form capable of being manipulated both by
sequence <i>and content</i> (as is text in current hypermedia systems).  We are
also comparing various hypermedia models, classifying them according to their
performance and suitability for the different data forms and uses in a large
digital library.<p>
<b><p>
3.1. Image analysis services within hypermedia</b><p>
In support of our system-required techniques for linking uniformity, and for
general support of forensic investigation, the RPID provides integrated image
analysis services.  Of all the various data types that will appear in the RPID,
photographic images (and by extension, frame-based video) require the most
expertise to address by content.  In addition, image analysis is used by
forensic pathologists to enhance difficult-to-see information from images, or
to enhance the appearance of images to highlight information.  The RPID must
offer an efficient and flexible image processing component.  Previous image
analysis research has shown that:<p>
<p>
* domain-specific libraries or packages fare better in terms of user
satisfaction and power than general image analysis packages;<p>
* when elaborate user interfaces are included, their care and feeding
tends to overwhelm the image analysis content of the package; and<p>
* the field of image analysis is fragmented such that every laboratory
has its own private image library structure, none of which is ideal for
PIDL.<p>
<p>
We are selecting specific image analysis tools for inclusion in the RPID
infrastructure.  Users may bring other tools they select into the environment
as well, using the results of our interoperability research.<p>
<b><p>
3.2. Orthogonal data types: linking uniformity</b><p>
We are creating algorithms that will allow users of video data to identify and
hypertextually annotate moving objects in a video sequence.  These methods will
work in real-time, and will apply both to stored video (<i>e.g.,</i> clips
from a crime scene[1]) and to real-time video (<i>e.g.,</i>
a teleconference). We will require algorithms to find object edges in an image
and outline them; algorithms for tracking these outlines from image to image in
the frame sequence; and integration of this information with a hypertext
interface for collecting and organizing information to be linked to the video
objects.<p>
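To make the idea of a link anchored to a moving video object concrete, here is
a deliberately simplified Python sketch.  It assumes each frame has already
been reduced to a binary mask of object pixels, and it replaces real edge
finding and outline tracking with a per-frame bounding box; the functions
bounding_box, track, and hit are hypothetical names used only for this
illustration.<p>
<pre>
```python
def bounding_box(frame):
    """Smallest (row0, col0, row1, col1) box holding all nonzero pixels,
    or None if the frame contains no object pixels."""
    cells = [(r, c) for r, row in enumerate(frame)
             for c, v in enumerate(row) if v]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


def track(frames):
    """Per-frame boxes for one object: a crude stand-in for edge finding
    and outline tracking."""
    return [bounding_box(frame) for frame in frames]


def hit(anchor, frame_index, row, col):
    """True if a click at (row, col) in the given frame falls on the object."""
    box = anchor[frame_index]
    if box is None:
        return False
    r0, c0, r1, c1 = box
    return row in range(r0, r1 + 1) and col in range(c0, c1 + 1)


# Two tiny frames in which a two-pixel object moves one column to the right.
FRAME_0 = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
FRAME_1 = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
ANCHOR = track([FRAME_0, FRAME_1])
```
</pre>
The list of boxes acts as the link anchor: a click at row 1, column 1 selects
the object in the first frame but misses it in the second, because the object
has moved.<p>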
We are basing our image manipulation algorithms on work done at the University
of Florida on face tracking in an image sequence [3, 4].  This project
demonstrated that tracking a face moving in a frame-based video sequence could be
done at a rate of 16 frames per second using an Intel 386 processor.  Initial
acquisition of a face, though, in the first frame took about 0.5 minute.  We
expect that considerable improvement can be obtained with workstations and new
algorithms.  Our goal is to be able to acquire and track objects in video in
real-time; this would allow hyperlinks to be anchored into video sequences as
they are generated (teleconferencing) in addition to after they are stored
(archival clips).<p>
The results of this work will allow the video (and audio) sequences to be
treated uniformly as a component of a hyperlinked information network,
alongside other forms of information such as text and still graphics.
Frame-based video technology is not specific to this environment and the work
will immediately apply to hypermedia systems in general.  In current hypermedia
systems, video clips are added to the basic information structure mainly as
"view-only" components.  A reader may play the video, reverse it, freeze it and
so on, as with a VCR.  However, no interesting interactions are possible with
the <i>content </i>of the video information, as is possible for the content of
text and static graphics.  This research will allow video information to be
more fully integrated into hypermedia systems, increasing the orthogonality of
the set of operations in such systems by allowing video information to be
interacted with and manipulated in the same ways that static information is
now.  The result for hypermedia will be systems that are less "modal" and more
seamless.<i><p>
<p>
Specific research issues:</i>  The questions we will seek to answer in our
experiments are these:<p>
<p>
* Can a good orthogonal set of linking operations be designed and implemented
for the varied data forms in RPID?<p>
* What data abstractions are needed to support uniform treatment of linking
among heterogeneous forms?<p>
* What linking operations can be devised for audio sequences?<p>
* Can object location and tracking algorithms be developed for content-based
video linking?<p>
* Can the video content-linking algorithms be made efficient enough to operate
in real-time (<i>e.g., </i>for linking into teleconferencing sessions)?<p>
<b><p>
3.3. Authoring and browsing in different data models</b><p>
Several major hypermedia data models have been proposed in the past decade, but
no comprehensive study has been done to determine the relative effectiveness of
each, or to compare them against one another for a classification of utility in
various domains.  We propose to perform such a comparative study using the
digital forensic medicine library.  Before describing the goals and structure
of our experiments, we outline the major models we will implement and work
with:<p>
<p>
* <i>Dexter (structured hierarchical graphs) [9].</i>  Dexter was an effort,
after a decade of applied hypermedia research, to formalize the features of
successful system developments.  The model owes much to earlier work on the HAM
[2].   The basic Dexter model contains a structured graph, and an execution
rule for presenting graph elements and traversing its links.<p>
<p>
* <i>ABC (hierarchical directed graphs) [12].</i>  ABC uses hierarchical
directed graphs as the basic structure for hypermedia information, much like
Dexter, but has some differences in link types and access methods.  It is also
the native model atop the current DGS system, and will be the primary
implementation vehicle for the initial RPID.<p>
<p>
* <i>WWW (unstructured flat graphs) [10].</i>  The basic data model for the
World Wide Web (WWW) hypermedia facility is a graph with no central definition,
and with nodes distributed very widely geographically.  The graph is flat, in
that no node itself contains a graph.  Each node is a "document component"
containing names of links to other content elements (Uniform Resource Locators,
or URLs). WWW graphs tend to be exceptionally dynamic, with a non-trivial
possibility of finding no nodes at the ends of some links.<p>
<p>
* <i>Hypersets (sets) [16].</i>  The basic model underlying the Hypersets
hypermedia system is defined in terms of mathematical set theory.  At any point
in browsing, a reader is presented with a set of nodes, all related by sharing
some characteristic (<i>e.g., </i> all concerning injuries from dog bites).
The reader may select any member of the set for viewing, or may "browse" by
requesting a list of all other sets of which a particular element is a member.
Thus, the notion of "link" found in other models is served in Hypersets by set
intersection.  Set-based hyperdocuments have been shown to be especially useful
for  taxonomic organization.<p>
<p>
* <i>Trellis (hierarchical parallel automata) [19].</i>  Trellis is
distinguished from other models by its use of Petri nets, a class of parallel
automata, to define both the static and the dynamic structures of a
hyperdocument.  Information elements are mapped to the places in a Petri net,
and links are mapped to the transitions in the net.  Browsing proceeds
as allowed by valid execution sequences of the net.  Browsing may create and
synchronize parallel paths of activity.  The Trellis model has been shown to be
effective for defining collaboration protocols within hyperdocuments [20].<p>
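The set-based browsing described for Hypersets above can be illustrated with a
small sketch.  This is only an illustration under our own assumptions, not code
from the Hypersets system: the class name, method names, and case identifiers
are all hypothetical.<p>

```python
# Hypothetical sketch of set-based browsing; names and data are illustrative.
from collections import defaultdict


class HypersetLibrary:
    """Nodes are grouped into named sets; there are no explicit links."""

    def __init__(self):
        self.members = defaultdict(set)      # set name -> node ids
        self.memberships = defaultdict(set)  # node id  -> set names

    def add(self, set_name, node):
        self.members[set_name].add(node)
        self.memberships[node].add(set_name)

    def view(self, set_name):
        """Present the reader with every node in one set."""
        return sorted(self.members[set_name])

    def other_sets(self, node, current):
        """'Browse': list all other sets that contain this node."""
        return sorted(self.memberships[node] - {current})

    def shared(self, set_a, set_b):
        """Nodes common to both sets: the intersection plays the role of a link."""
        return sorted(self.members[set_a].intersection(self.members[set_b]))


lib = HypersetLibrary()
lib.add("dog bites", "case-101")
lib.add("dog bites", "case-205")
lib.add("lacerations", "case-205")
lib.add("lacerations", "case-310")
lib.add("pediatric", "case-205")

print(lib.view("dog bites"))
print(lib.other_sets("case-205", "dog bites"))
print(lib.shared("dog bites", "lacerations"))
```

Here a reader viewing the "dog bites" set can move from case-205 to the
"lacerations" or "pediatric" sets without any stored link; membership alone
determines where browsing can go next.<p>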
<p>
These models have been extensively explored and reported on individually, but
very little research has compared them with one another.<p>
The ABC hypermedia system is built on top of the DGS distributed graph server
[25].  The DGS is a general distributed storage system with directed graphs as
its main storage abstraction (comparable to files in traditional storage
systems).  The interpretation of graphs required by the ABC system is supplied
collectively in two places:  in the DGS engine itself, and in each application
interface.  We can construct all these hypermedia models on top of the DGS, given
appropriate semantics for interpreting the stored graphs.<p>
Each hypermedia data model is either an interpreted graph (<i>e.g.,</i>  a
Petri net, a semantic net) or can be easily and efficiently represented as a
graph (<i>e.g.,</i> sets).  To implement this variety of models in the DGS we
will first create a <i>semantics layer</i> for the DGS.  This addition will
allow rules to be specified telling how the components of a graph will be used
at execution time.  For example, the Trellis model can be represented by
specifying that a graph has two node types (place, transition), that the graph
is bipartite, and that "execution" takes place by moving markers among the
place nodes in a certain way.  For initial experimentation with the semantics
layer concept we expect to specify these rules in a Prolog-like notation; after
determining the best way to encode structural and behavioral characteristics,
we will move to a compiled format such as an object library.<p>
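To make the Trellis example concrete, the sketch below encodes the three rules
just named (two node types, bipartiteness, and marker movement) over a plain
graph.  It is written in Python rather than the Prolog-like notation we expect
to use, and it does not reflect the DGS interface; the class, its methods, and
the node names are assumptions made for illustration.<p>

```python
# Illustrative only: a plain-Python stand-in for the proposed semantics layer.
class PetriNetGraph:
    def __init__(self):
        self.node_type = {}   # node id -> "place" or "transition"
        self.edges = set()    # (source, target) pairs
        self.marking = {}     # place id -> number of markers

    def add_node(self, node, kind):
        # Structural rule 1: exactly two node types exist.
        if kind not in ("place", "transition"):
            raise ValueError("unknown node type: " + kind)
        self.node_type[node] = kind
        if kind == "place":
            self.marking[node] = 0

    def add_edge(self, source, target):
        # Structural rule 2: the graph is bipartite, so every edge must
        # join a place to a transition or a transition to a place.
        if self.node_type[source] == self.node_type[target]:
            raise ValueError("edge must connect a place and a transition")
        self.edges.add((source, target))

    def inputs(self, transition):
        return [s for (s, t) in self.edges if t == transition]

    def outputs(self, transition):
        return [t for (s, t) in self.edges if s == transition]

    def enabled(self, transition):
        # Behavioral rule: a transition (link) may fire only when every
        # input place holds at least one marker.
        return all(self.marking[p] >= 1 for p in self.inputs(transition))

    def fire(self, transition):
        # Firing removes one marker from each input place and adds one
        # marker to each output place.
        if not self.enabled(transition):
            raise RuntimeError(transition + " is not enabled")
        for p in self.inputs(transition):
            self.marking[p] -= 1
        for p in self.outputs(transition):
            self.marking[p] += 1


net = PetriNetGraph()
for place in ("case_summary", "wound_photo", "autopsy_report"):
    net.add_node(place, "place")
net.add_node("open_details", "transition")
net.add_edge("case_summary", "open_details")
net.add_edge("open_details", "wound_photo")
net.add_edge("open_details", "autopsy_report")

net.marking["case_summary"] = 1   # the reader starts at the summary
net.fire("open_details")          # following the link
print(net.marking)
```

Firing the single transition moves the marker from the summary to both the
photo and the report, which is how one link can open two parallel paths of
browsing activity.<p>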
<i><p>
Specific research issues:</i>  The questions we will seek to answer in our
experiments are these:<p>
<p>
* Is one model more appropriate than the others for organizing very large data
sets?  Relatedly, is one model more capable than the others of being scaled
up?<p>
* Is there one model that is more appropriate for organizing heterogeneous data
sets?<p>
* Is one model more appropriate than the others for organizing information
systems that are meant to support collaborative work?<p>
* Can a hierarchy of functionality, or expressive power, be placed on these
models?  Which models can readily be described in terms of the others?  Is
there one model that can be said to be the most general?<b><p>
<p>
3.4. Search and retrieval within hypermedia</b><p>
Hypermedia provides an excellent browsing capability once a starting point has
been found.  But finding one or more data objects from which to initiate
hypermedia searching requires a different form of retrieval.  Earlier we
discussed a strategy for retrieving data objects based on the values of various
attributes found in those data objects (for example, wound length, or the
caliber of the weapon used).  The specific attributes to be used and the
specific values for these attributes are hand coded by trained specialists from
the various forms of information found in the data object.  To hypermedia
browsing and attribute-based retrieval we add a third strategy:  access to
data objects using classification and information retrieval techniques applied
to the text portion of the data object.<p>
In addition to photos, drawings, videos, etc., each data object will have a
text component.  This, for example, could be the report written by the police,
coroner, or attending physician, or could be the pathologist's report written
after analysis.  We propose to use text-based information retrieval techniques
on these reports as a means of entry into the hypermedia database.<p>
This is a particularly attractive application for text-based information
retrieval.  The textual material tends to be brief and to the point with very
little that is irrelevant to the case.  The context of the written material is
limited and, especially with material written by physicians, the vocabulary is
precise and restricted.<p>
We propose to experiment with several strategies for automatic classification
and retrieval with this portion of the database.  Initially, we propose to
classify textual material using a standard medical thesaurus with additional
terms unique to forensic pathology added where necessary [15].  This will
create for each text object a vector of keywords representing the content of
that piece of text.  Experimental results will determine whether any of the
various keyword weighting schemes are useful or whether simple boolean
weighting performs satisfactorily [26, 24].<p>
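The classification step above can be sketched as follows.  This is a minimal
illustration, assuming a thesaurus that maps surface words to controlled terms;
the entries shown are invented for the example and are not taken from MeSH or
from the RPID, and term frequency stands in for whichever weighting scheme the
experiments eventually favor.<p>

```python
# Hypothetical sketch: the thesaurus entries are invented for illustration.
import re
from collections import Counter

# Surface word -> controlled-vocabulary term (assumed, not real MeSH data).
THESAURUS = {
    "laceration": "Lacerations",
    "lacerations": "Lacerations",
    "cut": "Lacerations",
    "contusion": "Contusions",
    "bruise": "Contusions",
    "bruising": "Contusions",
    "gunshot": "Wounds, Gunshot",
    "forearm": "Forearm",
}


def classify(text):
    """Map free text to a frequency-weighted vector of controlled terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(THESAURUS[w] for w in words if w in THESAURUS)


def to_boolean(vector):
    """Simple boolean weighting: a term is either present (1) or absent."""
    return {term: 1 for term in vector}


report = "Two lacerations on the left forearm with bruising; one deep cut."
weighted = classify(report)
print(dict(weighted))
print(to_boolean(weighted))
```

Both "lacerations" and "cut" map to the same controlled term, so the weighted
vector counts it twice while the boolean vector records only its presence.<p>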
Users can access the textual database using three basic strategies.  First, the
user can enter a free form query which is then subjected to the same
classification process as was the data.  The query and data descriptor vectors
are then compared [26]; those data objects scoring highest are presented to the
user ordered by their score.  Second, the user can be presented with the
thesaurus and instructed to formulate his or her query from this controlled
vocabulary.  The user will then be presented with preliminary search results
(<i>e.g.,</i> "your query matched 50 data objects") and will be able to accept,
broaden, or narrow the query using the thesaurus hierarchy.  And third, we can
combine these techniques with relevance feedback [17, 5, 8], allowing the user to
formulate a free form query and then, after an initial search, have the system
revise the query based on explicit or implicit user interaction and on the data
objects that the user has judged relevant.<p>
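The first and third strategies can be sketched together.  The code below is an
illustration under our own assumptions rather than the RPID implementation: it
uses cosine similarity as the vector comparison and a positive-only,
Rocchio-style update for feedback, and the vectors, case identifiers, and the
alpha and beta weights are hypothetical.<p>

```python
# Illustrative sketch; vectors, case ids, and alpha/beta values are assumptions.
import math


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, doc id) pairs, highest score first."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def feedback(query, relevant, alpha=1.0, beta=0.75):
    """Positive-only Rocchio-style revision: move the query toward the
    centroid of the documents the user judged relevant."""
    revised = {t: alpha * w for t, w in query.items()}
    for vec in relevant:
        for t, w in vec.items():
            revised[t] = revised.get(t, 0.0) + beta * w / len(relevant)
    return revised


documents = {
    "case-101": {"Bites": 1.0, "Dogs": 1.0, "Forearm": 1.0},
    "case-205": {"Bites": 1.0, "Dogs": 1.0, "Child": 1.0},
    "case-310": {"Lacerations": 1.0, "Forearm": 1.0},
}
query = {"Bites": 1.0, "Forearm": 1.0}

first = rank(query, documents)
revised = feedback(query, [documents["case-101"]])
second = rank(revised, documents)
print([doc_id for _, doc_id in first])
print([doc_id for _, doc_id in second])
```

In this toy collection, judging case-101 relevant adds the term "Dogs" to the
query, which lifts case-205 above case-310 in the second ranking.<p>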
Additionally, we propose to subject the text portion of the database to
automatic classification and thesaurus construction strategies [24, 6, 14, 11]
to determine whether purely automatic techniques provide satisfactory results.
We have an advantage over traditional information retrieval applications in
that this portion of the system does not need high recall, and needs only
moderately high precision [26, 24], in order to be judged successful.  In
our application, retrieval from the text portion serves mainly to provide an
entry point into the hypermedia which is then browsed extensively.  A search is
successful if it provides one or more data objects from which a complete
(<i>i.e.,</i> high recall) search of the hypermedia can be made.<p>
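The distinction between the usual measures and our success criterion can be
stated in a few lines.  This sketch uses the standard definitions of precision
and recall; the function names and the sample identifiers are hypothetical, and
treating any relevant retrieved object as a usable starting point is a
simplifying assumption.<p>

```python
# Illustrative sketch; function names and sample data are assumptions.
def precision_recall(retrieved, relevant):
    """Standard definitions: precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved.intersection(relevant)
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def is_successful_entry(retrieved, relevant):
    """Entry-point criterion: at least one retrieved object is relevant,
    so the reader has somewhere to begin browsing."""
    return bool(set(retrieved).intersection(relevant))


relevant = {"case-%d" % n for n in range(1, 11)}          # ten relevant objects
retrieved = ["case-1", "case-2", "case-50", "case-51"]    # four returned

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall, is_successful_entry(retrieved, relevant))
```

A text search with precision 0.5 and recall 0.2 would be weak by conventional
standards, yet it still counts as a success here because it supplies two
starting points for browsing.<p>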
<p>
In the initial version of the RPID, the user will be presented with data
objects that match the query, but the user will not be given more detailed
information as to exactly what in the data object text matched the query.  As
an enhancement to the basic system, we propose to embed our existing full text
retrieval system [18, 22, 21, 23] into the hypermedia system.  This will direct
the user to those portions of the text where matches occur.  This will make it
much easier for the user to scan the text and judge relevance, and will also be
the basis for text-based hyperlinking from one data object text to
another.<b><p>
<p>
<p>
References</b><p>
[1]	A. M. Burger, B. D. Meyer, C. P. Jung, and K. B. Long.  The virtual
notebook system.  In <i>Proceedings of ACM Hypertext '91,</i> pages 395-401.
ACM, December 1991.<p>
<p>
[2]	Brad Campbell and Joseph M. Goodman.  HAM:  A general purpose hypertext
abstract machine.  <i>Communications of the ACM,</i> 31(7):856-861, July
1988.<p>
<p>
[3]	K. Deng and J. N. Wilson.  An approximation-based video tracking system.
In <i>Proc. of SPIE: Image Algebra and Morphological Image Processing II,</i>
volume 1568, July 1991.<p>
<p>
[4]	K. Deng and J. N. Wilson.  Contour estimation 	using global shape
constraints and local forces. In 	<i>Proc.  of SPIE: Geometric Methods in
Computer 	Vision, </i>volume 1570, July 1991.<p>
<p>
[5]	S. R. Friedman, J. A. Maceyak, and S. F. Weiss.  A relevance feedback
system based on document transformations.  In G. Salton, editor, <i>The SMART
Retrieval System.</i>  Prentice-Hall, 1971.<p>
<p>
[6]	E. Fox, T. Nutter, T. Ahlswede, M. Evens, and J. 	Markowitz.  Building a
large thesaurus for 	information retrieval.  In <i>Proc. of the Second
Conference on Applied Natural Language 	Processing.</i>  Association for
Computational 	Linguistics, February 1988.<p>
<p>
[7]	Mark E. Frisse.  Searching for information in a 	hypertext medical
handbook.  <i>Communications of  	the ACM, </i>31(7):880-886, July 1988.<p>
<p>
[8]	D. Harman.  Relevance feedback revisited.  In 	<i>Proc. of the fifteenth
annual international ACM 	SIGIR Conference on Research and Development 	in
Information Retrieval.</i>  June 1992.<p>
<p>
[9]	Frank Halasz and Mayer Schwartz.  The Dexter hypertext reference model.
In Judi Moline, Dan Benigni, and Jean Baronas, editors, <i>Proceedings of the
Hypertext Standardization Workshop,</i> pages 95-133.  National Institute of
Standards and Technology, February 1990.  NIST Special Publication 500-178.
Workshop held January 16-18, 1990.<p>
<p>
[10]	K. Hughes.  Entering the world-wide web.  	Technical report.  Honolulu
Community College,  	September 1993.<p>
<p>
[11]	K. Sparck Jones and D. M. Jackson.  The use of automatically obtained
keyword classification for information retrieval.  <i>Information Storage and
Retrieval, </i>5, 1970.<p>
<p>
[12]	K. Jeffay, J. K. Lin, J. Menges, F.D. Smith, and J. 	B. Smith.
Architecture of the artifact-based 	collaboration system matrix.  In
<i>Proceedings of 	CSCW '92 (Toronto),</i> pages 195-202. ACM Press, 	1992.<p>
<p>
[13]	K. Jeffay, J. Menges, J. K. Lin, F. D. Smith , and 	J. B. Smith.
Architecture of the artifact-based 	collaboration system matrix.  In <i>CSCW
'92:  	Proc. of the Conf. on Computer-Supported 	Cooperative Work. </i>ACM
Press, November 1992.<p>
<p>
[14]	K. Sparck Jones.  Automatic indexing. <i>Journal of 	Documentation,</i>
30, 1974.<p>
<p>
[15]	Mesh (medical subject headings 1988). 29(1, part 	2), January 1988.<p>
<p>
[16]	H. Van Dyke Parunak.  Don't link me in: Set-based hypermedia for
taxonomic reasoning.  In <i>Proceedings of Hypertext '91, </i>pages 233-242.
ACM, December 1991.<p>
<p>
[17]	J. J. Rocchio.  Relevance feedback in information 	retrieval.  In G.
Salton, editor, <i>The SMART 	Retrieval System.</i> Prentice Hall, 1971.<p>
<p>
[18]	G. Salton.  Another look at automatic text-retrieval systems.
<i>Communications of the ACM, </i>29(7), July 1986.<p>
<p>
[19]	P. David Stotts and Richard Furuta.  Petri-net-based hypertext:  Document
structure with browsing semantics.  <i>ACM Transactions on Information
Systems,</i> 7(1):3-29, January 1989.<p>
<p>
[20]	P. D. Stotts and R. Furuta.  Modeling and 	prototyping collaborative
software processes.  In 	<i>Proceedings of the NATO Advanced Research 	Workshop
on Integration of Information and 	Collaboration Models,</i> June 1993.  To
appear.  	Also published as Technical Report TR93-020, 	Computer Science
Collaboratory, Univ. of North 	Carolina at Chapel Hill, 1993; and as Tech
Report TAMU-HRL 93-006, Hypermedia 	Research Laboratory, Texas A&amp;M
University, 	July 1993.<p>
<p>
[21]	J. B. Smith, G. J. Ferguson, J. D. Bolter, M. 	Lansman, D. V. Beard, and
S. F. Weiss.  We: A 	writing environment for professionals.  In <i>Proc. 	of
the 1987 National Computer Conference,</i> 	1987.<p>
<p>
[22]	J. B. Smith, G. F. Ferguson, and S. F. Weiss.  A 	hypertext writing
environment and its cognitive 	basis.  In <i>Proc. of the Hypertext '87
Workshop, </i> 	November 1987.<p>
<p>
[23]	J. B. Smith, G. J. Ferguson, and S. F. Weiss.  	Microarras:  An advanced
full-text retrieval and 	analysis system.  In <i>1987 Intl. Conf. on Research
and Development in Information Retrieval, </i>1987.<p>
<p>
[24]	G. Salton and M. J. McGill.  <i>Introduction to 	Modern Information
Retrieval.  </i>McGraw Hill, 	1983.<p>
<p>
[25]	D. E. Shackelford, J. B. Smith, and F. D. Smith.  	The architecture and
implementation of a 	distributed hypermedia storage system.  In 	<i>Proceedings
of ACM Hypertext '93, </i>pages 1-13.  	ACM, November 1993.<p>
<p>
[26]	C. J. van Rijsbergen.  <i>Information Retrieval.</i>  Butterworths, 1979.<p>
<p>
<p>
<p>
<p>
<hr>
<a name="fn0">[1]</a>Consider the video we all saw recently on television, of
mobs dragging the bodies of U.S. soldiers through the streets of Mogadishu,
Somalia.

<!--#include virtual="/DL94/footer.ihtml" -->
Last Modified: <!--#echo var="LAST_MODIFIED" --> <br>

</body>
</html>
