Charles Kacmar (1), Dean Jue (2), David Stage (3), Christie Koontz (2)
(1) Department of Computer Science, Florida State University
(2) Florida Resources and Environmental Analysis Center, Florida State University
(3) Growth Management Data Network Coordinating Council, State of Florida
Authors' Addresses:
(1) 203 Love, Tallahassee, FL, 32306-4019, 904.644.9661, kacmar@cs.fsu.edu
(2) 130 Herb Morgan, Tallahassee, FL, 32306-4015, 904.644.3410, {djue,ckoontz}@opus.freac.fsu.edu
(3) 725 S. Calhoun, 112 Bloxham Bldg., Tallahassee, FL 32399-0950, 904.488.7986, davidstage@aol.com
KEYWORDS: Spatial, geographic, GIS, metadata
Support for business operations and management structures prompted the application of digital library technology to meet the needs of the Florida Growth Management Data Network Coordinating Council (GMDNCC). The Council is a consortium of the eleven major agencies in the state of Florida. It is based on a well-defined management structure and relies heavily on interorganizational cooperation and information sharing to support state operations. The digital library system supporting this governmental body is a network-based system that attempts to:
Human spatial information management concerns the human's ability to organize and access information by "placing" information in a logical space [13, 23]. Geographic spatial information management concerns organizing and accessing information by geographic characteristics, such as world quadrangle, latitude, longitude, or jurisdictional boundary. The need to support both aspects of spatiality requires that the system support collection, access, and retrieval using geographic elements, and that the system support textual and graphical views over the metadata for people in individual agencies and for the overall organization.
The remaining sections of this paper discuss the implementation of a network-based digital library system that supports the Florida Growth Management Data Network Coordinating Council. The motivation for this approach, elaborated in Section 2, concerns the focus on spatial data and a spatial access paradigm, discussed in Section 3. Section 4 discusses the mechanisms for collecting data and documents and facilities to manage the document collection. Section 5 discusses the operation and access facilities to these documents. Section 6 reports on the current status of the research, discusses future directions and differences between this approach and others. Section 7 provides a summary.
Another issue concerns the manner by which existing libraries and document repositories are maintained. Existing systems usually organize documents by convenience of the facility serving the information and not necessarily along content or conceptual means. Spatial data users require the creation and coordination of geographic maps with metadata and documents. To do this using the system structures, protocols, and network access methods listed above requires a level of human and computing resources that most governmental agencies cannot afford. Resources, in the form of labor and technical expertise, are needed to create and maintain access structures and manage document repositories. Maintaining access structures in any form - menus, hypermedia links, or queries - can be difficult and incur significant expense.
To address these problems, we extend the operational framework (i.e., buildings, floors, shelving, cataloging systems) of today's public libraries to create a model for organizing and partitioning electronic library collections on a global scale. We define and adopt the concept of information zone to support the management and facility structures within governmental bodies. This conceptual framework divides the digital library into geospatial-information zones, where each zone contains a collection of documents that are related. Access and organizational structures to zones are supported by software and provide for two types of partitioning - (1) mutually exclusive zones, where each zone contains a distinct set of documents, and (2) overlapping zones, where the same document may appear several times to provide for multiple paths of access. To the end-user, this partitioning is completely transparent - users are generally unaware and do not care where the information is stored. From a management perspective, this framework allows state information managers to parallel the real management structures within the government. For example, the system can support a traditional vertical organization within a specific agency, as well as support interrelationships among data, a necessary and important element of the eleven member agencies of the GMDNCC.
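The two kinds of partitioning can be illustrated with a minimal sketch. The class and method names below (`Zone`, `ZonedLibrary`, `place`, `zones_containing`) and the sample zone names are hypothetical and chosen only for illustration; they are not taken from the system described in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """A named information zone holding identifiers of related documents."""
    name: str
    documents: set = field(default_factory=set)


class ZonedLibrary:
    """Hypothetical container for zones; not the paper's implementation."""

    def __init__(self, exclusive):
        # exclusive=True  -> mutually exclusive zones (a document lives in one zone)
        # exclusive=False -> overlapping zones (a document may appear in several)
        self.exclusive = exclusive
        self.zones = {}

    def add_zone(self, name):
        self.zones[name] = Zone(name)

    def place(self, doc_id, zone_name):
        if self.exclusive:
            for zone in self.zones.values():
                if zone.name != zone_name and doc_id in zone.documents:
                    raise ValueError(f"{doc_id} already belongs to zone {zone.name}")
        self.zones[zone_name].documents.add(doc_id)

    def zones_containing(self, doc_id):
        """Return every access path (zone name) that leads to the document."""
        return sorted(z.name for z in self.zones.values() if doc_id in z.documents)


# Overlapping zones: the same document is reachable by agency and by county.
overlapping = ZonedLibrary(exclusive=False)
for name in ("Transportation", "Leon County"):
    overlapping.add_zone(name)
overlapping.place("road-survey", "Transportation")
overlapping.place("road-survey", "Leon County")
print(overlapping.zones_containing("road-survey"))
```

In this sketch the end-user only ever asks which zones lead to a document; where the document is physically stored is not part of the interface, which mirrors the transparency described above.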
Spatial data content and management is driven by three primary elements: attributes, time, and user tasks. Attributes define the contents and characteristics of spatial data as well as some of the limits of how the data can be used. For example, dimensional attributes can provide the height of a forest canopy, area of a city, width of a road, or coordinate reference (e.g., latitude and longitude). Some attributes supply value while others, such as a coordinate reference, serve to define the positional location of a graphical feature relative to other features. This relationship provides for the dynamic composition and analysis of spatial data from multiple sources with similar degrees of accuracy. For example, a digital soils map of a county could be superimposed on a digital hydrography map of a much larger region if both data files were gathered at similar levels of accuracy and their coordinate reference points indicated an overlap in the maps' area of coverage. The new map composition could then be used to analyze, for instance, soil types relative to the proximity of major rivers.
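The overlay condition in the soils and hydrography example can be sketched as a simple check. This is only an illustration under stated assumptions: coverage is reduced to a latitude/longitude bounding box, accuracy is a single number in meters, and "similar" accuracy is taken to mean within a factor of two. The `Coverage` class, the `max_ratio` threshold, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coverage:
    """Bounding box of a data set plus an assumed positional accuracy."""
    west: float
    south: float
    east: float
    north: float
    accuracy_m: float  # assumed positional accuracy in meters (must be > 0)


def overlaps(a, b):
    """True when the two bounding boxes share some area."""
    return (a.west < b.east and b.west < a.east
            and a.south < b.north and b.south < a.north)


def can_overlay(a, b, max_ratio=2.0):
    """True when the coverages overlap and their accuracies are comparable.

    max_ratio is an arbitrary illustrative threshold, not a standard.
    """
    coarse = max(a.accuracy_m, b.accuracy_m)
    fine = min(a.accuracy_m, b.accuracy_m)
    return overlaps(a, b) and coarse <= max_ratio * fine


# Made-up coverages: a county soils map and a larger regional hydrography map.
soils = Coverage(-84.5, 30.2, -84.0, 30.7, accuracy_m=10.0)
hydro = Coverage(-87.0, 29.0, -82.0, 31.0, accuracy_m=12.0)
print(can_overlay(soils, hydro))
```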
Temporal elements determine the scope of use for spatial data and provide a chronology for recording variations over time. These elements allow a researcher to use temporal data as a basis for analyzing variations. For example, a meteorologist may package a collection of map displays into an animation to better understand the path of a hurricane.
Spatial data are produced in two primary formats, raster and vector [3]. Raster is a grid-type format most often used to interpret color and gray-scale photographs of remotely-sensed scenes (e.g., satellite imagery). The imagery is stored as dots or pixels, each associated with a different shade, density, or other attribute value. Examples of data sets collected and stored in raster format include those collected by NASA's Earth-observing systems (EOS), the National Oceanic and Atmospheric Administration's Advanced Very High Resolution Radiometer (AVHRR) instrument data, and the Landsat multispectral data from Earth Observation Satellite Company (EOSAT).
The vector format represents spatial data by points, lines, or polygons. Small objects (e.g., water hydrant locations) are represented by points while linear features such as roads, rivers, and contour lines are defined by lines with x and y coordinate end points. Polygons, which are composed of individual lines, are used to represent areas (e.g., countries, voting districts). The Census Bureau's TIGER Line files, containing most street segments in the U.S., are an example of spatial data in vector format.
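A minimal sketch of how the three vector primitives might be represented follows. The `Feature` class, its validation rules, and the sample coordinates are assumptions made for illustration; real vector formats such as the TIGER/Line files carry considerably more structure.

```python
from dataclasses import dataclass

# Minimum number of (x, y) vertices each primitive needs in this sketch.
MIN_VERTICES = {"point": 1, "line": 2, "polygon": 3}


@dataclass(frozen=True)
class Feature:
    """A vector feature: a kind plus a tuple of (x, y) coordinate pairs."""
    kind: str
    coords: tuple

    def __post_init__(self):
        if self.kind not in MIN_VERTICES:
            raise ValueError(f"unknown feature kind: {self.kind}")
        if self.kind == "point" and len(self.coords) != 1:
            raise ValueError("a point has exactly one coordinate pair")
        if len(self.coords) < MIN_VERTICES[self.kind]:
            raise ValueError(f"a {self.kind} needs more vertices")

    def bounding_box(self):
        """Return (min_x, min_y, max_x, max_y) over all vertices."""
        xs = [x for x, _ in self.coords]
        ys = [y for _, y in self.coords]
        return (min(xs), min(ys), max(xs), max(ys))


hydrant = Feature("point", ((1.0, 2.0),))
road = Feature("line", ((0.0, 0.0), (3.0, 4.0)))
district = Feature("polygon", ((0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)))
print(road.bounding_box())
```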
Digital spatial data sets are viewed using geographic information systems (GIS). The GIS can combine, overlay, add, subtract, multiply, and divide spatial features and their attributes. This allows researchers to see patterns in spatial data as well as make dynamic interpretations of those data. Complex questions from the researcher can be presented to the GIS.
Traditional Boolean word operations are not optimal for determining whether a spatial document is relevant to a particular task. Providing a list of geographic named features at all levels of resolution included in a particular spatial document is a major problem because of the disagreement over names, the constant changing of names (e.g., Leningrad back to St. Petersburg), and the multiplicity of names for the same place. Work has begun in developing extensions to WAIS to support Boolean searches for spatial data [20], but the best approach for integrating spatial searching/querying with existing access tools is still unclear at this time. Research is also ongoing regarding the use of object-oriented structures for storage and retrieval of spatial digital data [7].
Spatial data files are complex objects, and it is often difficult to construct locator and description records for them because of the issues mentioned above as well as the ease with which spatial data may be misused. For example, users will often zoom in beyond the resolution of the spatial document. The display will then show no objects at a location in that document when, in reality, objects are present and would appear if a more detailed source had been used (the raw data were collected at a level of resolution different from the one currently displayed).
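One way a viewer could guard against this misuse is to compare the display scale with the scale at which the data were collected. The sketch below assumes both are expressed as scale denominators (for example, 100000 for 1:100,000); the function name `is_overzoomed` is hypothetical and is not a facility of the system described here.

```python
def is_overzoomed(display_scale_denominator, source_scale_denominator):
    """True when the view is drawn at a larger scale than the source data.

    A smaller denominator means a larger (more zoomed-in) scale, so a
    1:24,000 display of data collected at 1:100,000 is over-zoomed.
    """
    if display_scale_denominator <= 0 or source_scale_denominator <= 0:
        raise ValueError("scale denominators must be positive")
    return display_scale_denominator < source_scale_denominator


# Data collected at 1:100,000 but displayed at 1:24,000.
print(is_overzoomed(24000, 100000))
```

A check of this kind could be used to warn the user that the absence of objects at a location may reflect the resolution of the source rather than the state of the world.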
Thus, current methods of supporting access to spatial documents, using locator records and text-based organization and searching, are simply not adequate for spatial data [10]. Problems concerning data type and number of access points, and the need for spatial operators to provide effective retrieval, make spatial digital information somewhat distinct from other forms of digital data. As a result, the optimal access paradigm for spatial data is spatial access: navigation through a geographic space [1]. To support this, geographic spaces must manifest as digital maps, populated with selectable features that support navigation. More importantly, spatial digital libraries must provide for the collection, organization, and access of spatial data through spatially-oriented tools.
The system supports the management structure of the GMDNCC by allowing information managers in an agency to define metadata and documents produced by the agency. Access structures unique to that agency are supported automatically. Moreover, access paths to metadata and documents without regard for agency are also provided automatically. This enables the user to "cut across" the vertical structure of government to find, view, and retrieve information produced by various agencies.
The process of contributing information to the library begins when an agency information manager creates a metadata (catalog) record and identifies the publicly available documents associated with the metadata. Metadata includes attribute/value pairs that provide for the capture and analysis of spatial objects and features. Attribute/value pairs alone do not provide a sufficient level of abstraction for human access. For this reason, a higher level of representation is needed: a representation that allows users to document and express conceptual properties of data. This requires that the metadata contain high-level conceptual representations such as descriptions and abstracts. This information is captured using the metadata collection facility and used by librarians to facilitate the construction of access structures that are more appropriate to the task-based needs of information managers and end-users.
The design of the metadata collection facility was guided by several important factors. First, different spatial data sets need different metadata representations, and for this reason the tools need to present and collect metadata in different formats. Second, metadata needs to be checked to ensure accuracy. This is an extremely difficult task, especially since much of the metadata information is conceptual in nature or "codified." In some cases, spatial metadata cannot be validated automatically (e.g., the world quadrangle from which an environmental sample was collected). Third, metadata may need to be stored in a geographically distant location. In essence, the metadata collection facility must serve as a "front-end" to post-processing or distribution of the metadata record.
Figure 1: Metadata collection tool.
Referring to Figure 1, the metadata collection tool divides the screen into three regions. Menu/action buttons appear along the lower portion of the window, field names appear along the left side, and field data appears and is entered in the main portion of the window. The design of the screen is form fill-in [22], but it also supports radio menus and selection lists. Clicking on a field name provides the user access to the codebook for the field. Thus, the order and content of fields and the immediate availability of the codebook enhance the metadata collection and maintenance processes.
Figure 2: Modes of operation.
The metadata collection tool provides three important characteristics. First, it supports on-line access to the codebook. Second, the "menu/action buttons" that appear in the lower area of the screen are configurable and each is associated with specific processing facilities. This allows someone to decide what actions are available to users for each type of metadata record. Third, the fields that comprise the metadata records are based on a semi-structured format [17] and are modelled after the Internet Engineering Task Force (IETF) specification [6]. The metadata collector and metadata format provide for complete definition of metadata records. This allows the metadata content and format to be tailored specifically to the type of data and documents to be stored in the library. Figure 3 illustrates how this is accomplished.
Referring to Figure 3, the metadata template serves as a "blueprint" for describing the elements of spatial documents the user wants to register with the library. The first four lines define the "action buttons". Each button is associated with a shell script or program that is executed when that button is pressed. The remaining entries in the template define the fields that will comprise a metadata record of this type. Three types of fields exist in the template shown in the figure: radio button (R), numeric (N), and character (C). For a radio button field, the number following the field type gives the number of entries in the menu. For data entry fields, the numbers give, respectively, the number of lines of data in the field, the width of each line, and the number of instances of the field in the record (repeating field).
    #Add
    add_script.csh
    #Delete
    delete_script.csh
    Agency R 3
    Agriculture
    Commerce
    Transportation
    Contact_Name C 1 50 1
    Contact_Phone C 1 15 1
    ShortDescription C 1 80 1
    LongDescription C 5 80 1
    Intended_Scale C 1 10 1
    Percent_Complete C 1 5 1
    Update_Schedule C 5 80 1
    Update_Cost N 1 5 1
    Source C 3 80 3
    Contact_Name C 1 50 1
    Keywords C 8 25 1
Figure 3: Metadata record template.
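To make the template format concrete, the following sketch parses a shortened version of the template in Figure 3. It assumes a layout of one item per line: a line beginning with `#` names a button and the next line names its script; a radio field line is followed by its menu entries, one per line; and each data entry field occupies a single line. That layout, along with the `FieldSpec` class and the `parse_template` function, is an assumption made for illustration and not a description of the actual collector.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    name: str
    kind: str                                     # "R" radio, "N" numeric, "C" character
    choices: list = field(default_factory=list)   # menu entries (radio fields only)
    lines: int = 0                                # lines of data in the field
    width: int = 0                                # width of each line
    instances: int = 0                            # repeat count of the field


def parse_template(text):
    """Return (buttons, fields) parsed from a template in the assumed layout."""
    rows = [ln.strip() for ln in text.splitlines() if ln.strip()]
    buttons, fields = {}, []
    i = 0
    while i < len(rows):
        row = rows[i]
        if row.startswith("#"):
            # Button name on this line, its script or program on the next.
            buttons[row[1:]] = rows[i + 1]
            i += 2
            continue
        parts = row.split()
        name, kind = parts[0], parts[1]
        if kind == "R":
            count = int(parts[2])
            fields.append(FieldSpec(name, kind, choices=rows[i + 1:i + 1 + count]))
            i += 1 + count
        elif kind in ("N", "C"):
            lines, width, instances = (int(p) for p in parts[2:5])
            fields.append(FieldSpec(name, kind, lines=lines, width=width,
                                    instances=instances))
            i += 1
        else:
            raise ValueError(f"unknown field type {kind!r} for field {name}")
    return buttons, fields


TEMPLATE = """\
#Add
add_script.csh
#Delete
delete_script.csh
Agency R 3
Agriculture
Commerce
Transportation
Contact_Name C 1 50 1
Update_Cost N 1 5 1
Source C 3 80 3
"""

buttons, fields = parse_template(TEMPLATE)
for spec in fields:
    print(spec)
```

Because the record structure lives entirely in the template, supporting a new type of metadata record in this sketch requires a new template rather than new code.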
The entry screen into the library is shown in Figure 4. Five access paths are provided: (1) field-name index, (2) inverted index, (3) agency-specific view, (4) spatial/map, and (5) term query.
Figure 4: Entry screen.
Navigating into the field-name and inverted index paths presents the user with the screens shown in Figure 5. The user traverses the graph structure by selecting the desired item.
Figure 5: Screen shots of the field-name (top) and inverted index (bottom) paths.
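The two index paths can be sketched over a handful of metadata records. The record contents, the tokenization (lowercased, split on whitespace), and the `build_indexes` function below are assumptions for illustration only; the field names are borrowed from Figure 3.

```python
from collections import defaultdict


def build_indexes(records):
    """Build a field-name index and an inverted term index.

    field_index[field][value] -> set of record ids with that exact value
    inverted[term]            -> set of record ids containing that term
    """
    field_index = defaultdict(lambda: defaultdict(set))
    inverted = defaultdict(set)
    for rec_id, rec in records.items():
        for fname, value in rec.items():
            field_index[fname][value].add(rec_id)
            for term in value.lower().split():
                inverted[term].add(rec_id)
    return field_index, inverted


# Made-up records for illustration.
records = {
    "r1": {"Agency": "Transportation", "ShortDescription": "County road centerlines"},
    "r2": {"Agency": "Agriculture", "ShortDescription": "County soils survey"},
}

field_index, inverted = build_indexes(records)
print(sorted(field_index["Agency"]))   # values reachable under the Agency field
print(sorted(inverted["county"]))      # records containing the term "county"
```

The field-name path corresponds to browsing by field and then by value, while the inverted index path goes directly from a term to the records that contain it.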
Navigating into the spatial/map access path allows the user to locate and access metadata records and documents by clicking on a geographic area (grid).
Figure 6: Spatial/map access path.
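Resolving a click on the map to a grid cell, and the cell to its metadata records, might look like the following sketch. It assumes a map image divided into a regular grid of rows and columns with the pixel origin at the top-left corner; the image size, geographic extent, grid dimensions, and record identifiers are all made up for illustration.

```python
def cell_for_click(x, y, width, height, rows, cols):
    """Map a pixel click (origin at top-left) to a (row, col) grid cell."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("click is outside the map image")
    return int(y / height * rows), int(x / width * cols)


def cell_bounds(row, col, extent, rows, cols):
    """Return (west, south, east, north) of a cell within the map extent."""
    west, south, east, north = extent
    cell_w = (east - west) / cols
    cell_h = (north - south) / rows
    cell_west = west + col * cell_w
    cell_north = north - row * cell_h
    return (cell_west, cell_north - cell_h, cell_west + cell_w, cell_north)


# Hypothetical 800x600 map image covering a made-up extent, cut into 3x4 cells.
EXTENT = (-88.0, 24.0, -80.0, 30.0)
ROWS, COLS = 3, 4
records_by_cell = {(0, 2): ["soils-survey-17", "aerial-1993-042"]}

cell = cell_for_click(410, 50, width=800, height=600, rows=ROWS, cols=COLS)
print(cell, cell_bounds(*cell, EXTENT, ROWS, COLS))
print(records_by_cell.get(cell, []))
```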
Figure 7: Thematic view.
Figure 8: Aerial photography view.
The agency view access path supports access through a hierarchically structured directed graph. Shown in Figure 7 is the highest level access document into the thematic records and documents. Figure 8 shows the highest level document into the aerial photography records and documents.
Previous approaches (see Frank [10] for a meta-analysis) have resolved some of these problems through the creation of support tools that supplement geographic information systems and, in some cases, interpret and display data in spatial data sets. For example, the Louisiana Coastal Geographic Information System Network (LCGISN) [18, 11] and Northwest Land Information System Network (NWLISN) use text-based high-level menus to provide the user with advanced cataloging, search, and access services over a large centralized database of spatial metadata. When the user finds a relevant data set, it is viewed by launching the GIS from the access facility. MITMapper [8] provides users with both text and graphical access facilities to a centralized database of metadata. Metadata is partitioned into classes and is browsed through a series of text menus or by clicking on a map graphic. Once the user locates the spatial data set of interest, the GIS is launched from the metadata browser. The importance of these efforts lies in the variety of access paths they provide to users for locating relevant data.
In an effort to encourage and coordinate spatial data efforts throughout the U.S., the USGS began sponsoring additional research efforts that may lead to a national spatial data locator and access service [24]. Users will locate spatial data sets using metadata access facilities. This approach will provide both text (via WAIS [14]) and graphical-based (via Mosaic [19, 4, 25]) network access to spatial information.
The approach presented in this paper is consistent with the USGS effort and provides for an integration of components involving metadata collection, distribution, and network access within a spatially-oriented, navigation-based, graphical locator and access library system. This approach provides for agency-specific access paths to metadata and documents while, at the same time, enforcing state-wide standards to increase the consistency and integrity of metadata.
Over the past year, the digital library system described in this paper has supported a collection of the major spatial metadata records for the state of Florida. This collection, called the Florida Data Directory, consists of metainformation partitioned into three major data types: survey/thematic, aerial photography, and satellite imagery. To further demonstrate the ability of the system to support document and data set collection and access, the Florida Game and Fresh Water Fish Commission supplied an environmental habitat report and spatial data sets for all sixty-seven Florida counties. The data sets are linked to the environmental report, providing end-users access to all elements used in the production of the report.
Current work is focused on enhancing the library system in several ways. First, information managers require tools to catalog and distribute documents. These tools must capture all of the attributes associated with a document while at the same time enforcing metadata standards. Work on developing an enhanced metadata collector is in progress.
Second, the system is attempting to reduce the costs of government, eliminating redundancy and reducing the time required for information managers and other state employees to locate and retrieve information produced by other agencies. Data is an expensive resource that should be shared among all agencies. Locator and notification services are two mechanisms being used to enhance awareness of relevant data for state employees. Work is continuing to enhance automated awareness components.
Third, the system should operate within the state's infrastructure. Since most state agencies do not have the resources (human or hardware) to do their own document production and storage, the state library system, especially librarians, should play a critical role in every aspect of information management - cataloging, collection, distribution, reference, and access. Ideally, librarians will work with agencies to identify documents for cataloging and placement into repositories, verify and validate metainformation, work with organizations to establish relevant access paths into the data, and assist end-users in accessing information using terminals in public libraries.
Fourth, the system should provide better support for both the vertical and horizontal management structures that exist within state government. In the existing system, metadata and its associated geographic elements are defined using text, a single library structure is supported, and agency-specific views over metadata are provided but require some manual intervention. The new version, expected in early summer 1995, will broaden the focus of automation to support (1) metadata creation using graphical geographic elements; (2) multiple digital libraries within a logical management structure; and (3) automatic creation and management of the entire repository, which includes the depositing of metadata, documents, and the data.
A spatial digital library offers a solution to these problems by providing the information manager with tools that support cataloging, locator, and access tasks. This paper reports on the development and deployment of a digital library system that automates the creation and management of a geospatial digital library, improving end-user access to information through geographically-based (spatial) textual and graphical directed graphs. This approach is based on a spatial access metaphor that organizes digital libraries into information zones. The tools supporting this library provide for dynamic and structured partitioning of metadata and documents within a graph-based network of collection and distribution centers. The result is a digital library system that can model the management structure of an organization but also allows users to cross organizational boundaries during information creation, navigation, search, and retrieval. Ideally, librarians, with their cataloging expertise and infrastructure, will facilitate all aspects of digital library operations, from identifying relevant documents to include in the collection to assisting agencies in creating custom access paths for public users. Current work is focused on enhancing the existing prototype, with significant effort devoted to complete automation of the repository.
This work is supported in part by the Federal Geographic Data Committee of the U.S. Geological Survey under Cooperative Agreement No. 1434-94-A-1288.