ICADL 2010 Proceedings available online now.
Papers
Session 1b - Digital Libraries of Heritage Materials
A Visual Dictionary for an Extinct Language (Short Paper)
Kyle Williams, Sanvir Manilal, Lebogang Molwantoa, Hussein Suleman
Abstract. Cultural heritage artefacts are often digitised in order to allow
for them to be easily accessed by researchers and scholars. In the
case of the Bleek and Lloyd dictionary of the |xam Bushman language,
14000 pages were digitised. These pages could not be transcribed, however,
because the language and script are both extinct. A custom digital
library system was therefore created to manage and provide access to
this collection as a purely "visual dictionary". Results from user testing
showed that users found the system to be interesting, simple, efficient
and informative.
A Scalable Method for Preserving Oral Literature from Small Languages (Full Paper)
Steven Bird
Abstract. Can the speakers of small languages, which may be remote,
unwritten, and endangered, be trained to create an archival record of
their oral literature, with only limited external support? This paper describes
the model of "Basic Oral Language Documentation", as adapted
for use in remote village locations, far from digital archives but close to
endangered languages and cultures. Speakers of a small Papuan language
were trained and observed during a six week period. Linguistic performances
were collected using digital voice recorders. Careful speech versions
of selected items, together with spontaneous oral translations into
a language of wider communication, were also recorded and curated. A
smaller selection was transcribed. This paper describes the method, and
shows how it is able to address linguistic, technological and sociological
obstacles, and how it can be used to collect a sizeable corpus. We conclude
that Basic Oral Language Documentation is a promising technique
for expediting the task of preserving endangered linguistic heritage.
Digital Folklore Contents on Education of Childhood Folklore and
Corporate Identification System Design (Full Paper)
Ya-Chin Liao, Kuo-An Wang, Po-Chou Chan, Yi-Ting Lin, Jung-I
Chin, Yung-Fu Chen
Abstract. Digital artifacts preserved in digital repositories of museums are
mostly static images. However, the artifacts may be lost, degraded, or damaged
no matter how well the preservation and exhibition environments have been
controlled, which makes the artifacts difficult to recover. Furthermore, if not
properly inherited, information regarding making, function, and usage of an
artifact might be lost after several generations. Hence, in addition to digitizing
folklore artifacts, we have also recorded on video the crafts of making them and
the skills and rituals of using them. With abundant
digitized collections, the repository website is becoming more and more
popular for teachers and students, especially in kindergartens and elementary
schools, to extract and create useful teaching materials for folklore education.
Recently, folklore contents have been encouraged to be applied in the education
of English as a second language (ESL), social work, and mathematics. In this
study, we applied the digital folklore contents for developing story books to be
used in childhood folklore education and for instructing students to design
a corporate identification system (CIS) as a class exercise. The technology
acceptance model (TAM) was used to evaluate perceived usefulness (PU),
perceived ease of use (PEU), and behavior intention (BI) in using these digital
contents to accomplish their tasks. The results show that the scores of PU, PEU,
and BI are all greater than 3 (5-point Likert scale) indicating usefulness and
ease of use of the contents and website, as well as a positive attitude toward
continuous use of the contents in various educational areas.
Ancient-to-modern Information Retrieval for Digital Collections of
Traditional Mongolian Script (Short Paper)
Biligsaikhan Batjargal, Garmaabazar Khaltarkhuu, Fuminori Kimura,
Akira Maeda
Abstract. This paper discusses our recent improvements to the traditional
Mongolian script digital library (TMSDL), which can be used to access ancient
historical documents written in traditional Mongolian using a query in modern
Mongolian. The results of the experiment show that the percentage of
successfully retrieved queries was improved.
Session 2b - Annotation and Collaboration
A Collaborative Scholarly Annotation System for Dynamic Web
Documents - a Literary Case Study (Full Paper)
Anna Gerber, Andrew Hyland, Jane Hunter
Abstract. This paper describes ongoing work within the Aus-e-Lit project at the
University of Queensland to provide collaborative annotation tools for
Australian Literary Scholars. It describes our implementation of an annotation
framework to facilitate collaboration and sharing of annotations within research
sub-communities. Using the annotation system, scholars can collaboratively
select web resources and attach different types of annotations (comments, notes,
queries, tags and metadata), which can be harvested to enrich the AustLit
collection. We describe how rich semantic descriptions can be added to the
constantly changing AustLit collection through a set of interoperable annotation
tools based on the Open Annotations Collaboration (OAC) model. RDFa
enables scholars to semantically annotate dynamic web pages and contribute
typed metadata about the IFLA FRBR entities represented within the AustLit
collection. We also describe how the OAC model can be used in combination
with OAI-ORE to produce scholarly digital editions, and compare this approach
with existing scholarly annotation approaches.
The Relation between Comments inserted onto Digital Textbooks by
Students and Grades earned in the Course (Full Paper)
Akihiro Motoki, Tomoko Harada, Takashi Nagatsuka
Abstract. When students read textbooks in the classroom, they usually apply active reading. The practice of marking in university textbooks is a familiar one. They scribble comments on the margin, highlight elements, underline words and phrases, and correlate distinct parts to foster critical thinking. While the use of annotations during active reading supports the students themselves, these can also be useful for other readers. Investigations were carried out to evaluate the comments inserted by students onto their digital textbooks and how this relates to their eventual grade earned at the end of the course. The results of our study highlight two main factors influencing students' eventual grade: quantity and quality of annotation. Students who wrote a lot of comments and focused upon the more important keywords in the text tend to receive a higher grade. Accordingly, our analysis was based on number and quality of text word selection.
Visualizing and Exploring Evolving Information Networks in Wikipedia (Full Paper)
Ee-Peng Lim, Agus Trisnajaya Kwee, Nelman Lubis Ibrahim, Aixin
Sun, Anwitaman Datta, Kuiyu Chang, Maureen
Abstract. Information networks in Wikipedia evolve as users collaboratively edit articles that embed the networks. These information networks
represent both the structure and content of community's knowledge and
the networks evolve as the knowledge gets updated. By observing the
networks evolve and finding their evolving patterns, one can gain higher
order knowledge about the networks and conduct longitudinal network
analysis to detect events and summarize trends. In this paper, we present
SSNetViz+, a visual analytic tool to support visualization and exploration of Wikipedia's information networks. SSNetViz+ supports time-based network browsing, content browsing and search. Using a terrorism
information network as an example, we show that different timestamped
versions of the network can be interactively explored. As information networks in Wikipedia are created and maintained by collaborative editing
efforts, the edit activity data are also shown to help detect interesting
events that may have happened to the network. SSNetViz+ also supports temporal queries that allow other relevant nodes to be added so as
to expand the network being analyzed.
Session 3b - Mobility and Migration
Do Games Motivate Mobile Content Sharing? (Full Paper)
Dion Hoe-Lian Goh, Chei Sian Lee, Alton Yeow-Kuan Chua
Abstract. Indagator (Latin for explorer) is a game which incorporates multiplayer, pervasive gaming elements into mobile content sharing. Indagator allows users to annotate real world locations with multimedia content, and concurrently provides opportunities for play through creating and engaging interactive game elements, earning currency, and socializing. A user study of Indagator was conducted to examine the impact of the usability of Indagator's content sharing and gaming features, as well as demographic profiles on participants' motivation to use the application. Participants felt that the features in Indagator were able to support the objectives of content sharing and gaming, and that the idea of gaming could be a motivator for content sharing. In terms of motivation to use, usability of Indagator's gaming features, gender and participants' familiarity with mobile gaming emerged as significant predictors. Implications and future research directions are discussed.
A Multifaceted Approach to Exploring Mobile Annotations (Full Paper)
Guanghao Low, Dion Hoe-Lian Goh, Chei Sian Lee
Abstract. Mobile phones with capabilities such as media capture and location detection have become popular among consumers, and this has made possible the development of location-based mobile annotation sharing applications. The present research investigates the creation of mobile annotations from three perspectives: the recipients of the annotations, the type of content created, and the goals behind creating these annotations. Participants maintained a two-week-long diary, documenting their annotation activities. Results suggest a range of motivational factors, including those for relationship maintenance and entertainment. Participants were also more inclined to create leisure-related annotations, while the types of recipients were varied. Implications of our work are also discussed.
Model Migration Approach for Database Preservation (Full Paper)
Arif Ur Rahman, Gabriel David, Cristina Ribeiro
Abstract. Strategies developed for database preservation in the past
include technology preservation, migration, emulation and the use of a
universal virtual computer. In this paper we present a new concept of
"Model Migration for Database Preservation". Our proposed approach
involves two major activities. First, migrating the database model from
conventional relational model to dimensional model and second, calculating
the information embedded in code and preserving it instead of
preserving the code required to calculate it. This will affect the originality
of the database but improve two other characteristics: the information
considered relevant is kept in a simple and easier to understand format
and the systematic process to preserve the dimensional model is independent
of the DBMS details and application logic.
Session 3c - Natural Language Processing
Automated Processing of Digitized Historical Newspapers beyond the
Article Level: Finding Sections and Regular Features (Full Paper)
Robert B. Allen, Catherine Hall
Abstract. Millions of pages of historical newspapers have been digitized but in most cases access to these is supported by only basic search services. We are exploring interactive services for these collections which would be useful for supporting access, including automatic categorization of articles. Such categorization is difficult because of the uneven quality of the OCR text, but there are many clues which can be useful for improving the accuracy of the categorization. Here, we describe observations of several historical newspapers to determine the characteristics of sections. We then explore how to automatically identify those sections and how to detect serialized feature articles which are repeated across days and weeks. The goal is not the introduction of new algorithms but the development of practical and robust techniques. For both analyses we find substantial success for some categories and articles, but others prove very difficult.
Keyphrases Extraction from Scientific Documents: Improving Machine
Learning Approaches with Natural Language Processing (Full Paper)
Mikalai Krapivin, Aliaksandr Autayeu, Maurizio Marchese, Enrico Blanzieri, Nicola Segata
Abstract. In this paper we use Natural Language Processing techniques
to improve different machine learning approaches (Support Vector Machines
(SVM), Local SVM, Random Forests) to the problem of automatic
keyphrases extraction from scientific papers. For the evaluation we propose
a large and high-quality dataset: 2000 ACM papers from the Computer
Science domain. We evaluate by comparison with expert-assigned
keyphrases. Evaluation shows promising results that outperform state-of-the-art
Bayesian learning system KEA improving the average F-Measure
from 22% (KEA) to 30% (Random Forest) on the same dataset without
the use of controlled vocabularies. Finally, we report a detailed analysis
of the effect of the individual NLP features and data set size on the
overall quality of extracted keyphrases.
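The evaluation described above compares extracted keyphrases with expert-assigned ones and reports an F-Measure. The following is a minimal sketch of that kind of score for a single document, assuming exact matching after lowercasing and trimming; the abstract does not say how matches were decided, and the function name `f_measure` is hypothetical.

```python
def f_measure(extracted, expert):
    """Balanced F-measure of extracted keyphrases against expert keyphrases.

    Assumption: a keyphrase counts as correct only on an exact match
    after lowercasing and trimming whitespace.
    """
    extracted_set = {phrase.lower().strip() for phrase in extracted}
    expert_set = {phrase.lower().strip() for phrase in expert}
    hits = len(extracted_set & expert_set)
    if hits == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0
    precision = hits / len(extracted_set)
    recall = hits / len(expert_set)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


score = f_measure(
    ["digital library", "metadata", "svm", "ranking"],
    ["Digital Library", "SVM", "keyphrase extraction"],
)
print(round(score, 3))
```

An average over a dataset would be the mean of this score across documents, which is one common reading of "average F-Measure" but is likewise an assumption here.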
Measuring Peculiarity of Text using Relation between Words on the Web (Short Paper)
Takeru Nakabayashi, Takayuki Yumoto, Manabu Nii, Yutaka Takahashi, Kazutoshi Sumiya
Abstract. We define the peculiarity of text as a metric of information
credibility. Higher peculiarity means lower credibility. We extract the
theme word and the characteristic words from text and check whether
there is a subject-description relation between them. The peculiarity
is defined using the ratio of the subject-description relation between a
theme word and characteristic words. We evaluate the extent to which
peculiarity can be used to judge by classifying text from Wikipedia and
Uncyclopedia in terms of the peculiarity.
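A ratio-based score of this kind can be sketched as follows. This is only an illustration under stated assumptions: peculiarity is taken to be one minus the fraction of characteristic words that stand in a subject-description relation with the theme word, and the relation check is a caller-supplied predicate (`has_relation`, a hypothetical name) standing in for whatever Web-based test the authors use. The abstract does not give the exact formula.

```python
from typing import Callable, Iterable


def peculiarity(
    theme_word: str,
    characteristic_words: Iterable[str],
    has_relation: Callable[[str, str], bool],
) -> float:
    """Assumed definition: share of characteristic words that do NOT
    have a subject-description relation with the theme word."""
    words = list(characteristic_words)
    if not words:
        # No characteristic words: nothing to judge, treat as not peculiar.
        return 0.0
    related = sum(1 for word in words if has_relation(theme_word, word))
    return 1.0 - related / len(words)


# Toy relation table used in place of a Web lookup (illustrative only).
known_relations = {("moon", "satellite"), ("moon", "orbit")}
score = peculiarity(
    "moon",
    ["satellite", "orbit", "cheese", "mouse"],
    lambda theme, word: (theme, word) in known_relations,
)
print(score)
```

Under this reading, a higher score means fewer supported relations and therefore lower credibility, matching the direction stated in the abstract.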
Imitating Human Literature Review Writing: An Approach to Multi-Document Summarization (Short Paper)
Kokil Jaidka, Christopher Khoo, Jin-Cheon Na
Abstract. This paper gives an overview of a project to generate literature reviews from a set of research papers, based on techniques drawn from human summarization behavior. For this study, we identify the key features of natural literature reviews through a macro-level and clause-level discourse analysis; we also identify human information selection strategies by mapping referenced information to source documents. Our preliminary results of discourse analysis have helped us characterize literature review writing styles based on their document structure and rhetorical structure. These findings will be exploited to design templates for automatic content generation.
Session 4b - Metadata
A Study of Users' Requirements in the Development of Palm Leaf Manuscripts Metadata Schema (Full Paper)
Nisachol Chamnongsri, Lampang Manmart, Vilas Wuwongse
Abstract.
This paper presents the users' behavior, their needs and expectations with respect to palm leaf
manuscripts (PLMs) which are ancient Thai documents. We focus on access tools, access
points and how users select PLMs. The data were collected by in-depth interviews of 20 users
including researchers, local scholars and graduate students who are working on research in the
field and using PLMs for information and knowledge resources. The research results present
two important characteristics of user behaviors: previous knowledge of items, and exploratory
searches. Users adopt a 4-step pattern in searching for the PLMs. Finally, we discuss the
important information in searching for the PLMs and we compare this with the frequently
consulted bibliographic elements and Dublin Core elements.
Landscaping Taiwan's Cultural Heritages- The Implementation of TELDAP Collection-Level Description (Full Paper)
Hsueh-Hua Chen, Chiung-Min Tsai, Ya-Chen Ho
Abstract. This paper depicts the implementation process of the collection-level description of TELDAP. Our study looks into collection-level description in order to eliminate problems users might encounter when accessing and retrieving resources caused by having only large amounts of item-level metadata. The implementation process is divided into five stages. In order to facilitate the application of collection-level description, we have put forth revised schema for the usage of currently available description standards. In the future, we intend to fortify relationships between item-level and collection-level metadata, and provide versions in different languages, expanding the accessibility of valuable resources to more users.
GLAM Metadata Interoperability (Short Paper)
Shirley Lim, Chern Li Liew
Abstract. Both digitised and born-digital images are a valuable part of cultural
heritage collections in galleries, libraries, archives and museums (GLAM).
Efforts have been put into aggregating these distributed resources. High quality
and consistent metadata practice across these institutions are necessary to
ensure interoperability and the optimum retrieval of digital images. This paper
reports on a study that involves interviews with staff members from ten
institutions from the GLAM sector in New Zealand, who are responsible for
creating metadata for digital images. The objective is to understand how GLAM
institutions have gone about creating metadata for their image collections to
facilitate access and interoperability (if any) and the rationale for their practice,
as well as the factors affecting the current practice.
Metadata Creation: Application for Thai Lanna Historical and Traditional Archives (Short Paper)
Churee Techawut
Abstract. This paper describes the process of metadata creation of the
Thai Lanna historical and traditional archives (shortened to the Lanna
Archives) by applying the Singapore Framework for Dublin Core Application
Profiles. Its metadata model for scholarly works based on the
Functional Requirements for Bibliographic Records (FRBR) is adapted
to create a data model and metadata scheme for the Lanna Archives. The
proposed metadata scheme provides the level of detail that describing the
digital Lanna Archives requires and also supports information consistency
and information sharing.
Session 4c - Usability and Navigation
A User-Centric Evaluation of the Europeana Digital Library (Full Paper)
Milena Dobreva, Sudatta Chowdhury
Abstract. Usability of digital libraries is an essential factor for the user
attraction. Europeana, a digital library which is built around the idea to provide
a single access point to the European cultural heritage, is paying special
attention to the user needs and behaviour. This paper presents user-related
outcomes addressing the dynamics of user perception from a study which
involved focus groups and media labs in four European countries. While
Europeana was positively perceived by all groups in the beginning of the study,
some groups were more critical after performing a task which involved eight
types of searches. The study gathered opinions on the difficulties encountered
which help to understand better users' expectations within the content and
functionality domains of digital libraries which would be of possible interest to
all stakeholders in digital library projects.
Digital Map Application for Historical Photos (Full Paper)
Weiqin Chen, Thomas Nottveit
Abstract. Although many map applications are available for presenting, browsing and sharing photos over the Internet, historical photos are not given enough attention. In addition, limited research efforts have been made on the usability and functionalities of such map applications for photo galleries. This paper aims to address these issues by studying the role of digital maps in presenting, browsing and searching historical photos. We have developed a map application and conducted formative evaluation with users focusing on usability and user involvement. The evaluation has shown positive responses from users. The search and navigation functions in the map application were found especially useful. The map was found to be important in involving users to share local knowledge about historical photos.
Supporting Early Document Navigation with Semantic Zooming (Full Paper)
Tom Owen, George Buchanan, Parisa Eslambolchilar, Fernando Loizides
Abstract. Traditional digital document navigation found in Acrobat
and HTML document readers performs poorly when compared to paper
documents for this task. We investigate and compare two methods for
improving navigation when a reader first views a digital document. One
technique modifies the traditional scrolling method, combining it with
Speed-Dependent Automatic Zooming (SDAZ). We also examine the effect
of adding "semantic" rendering, where the document display is altered
depending on scroll speed. We demonstrate that the combination of
these methods reduces user effort without impacting on user behaviour.
This confirms both the utility of our navigation, and the minimal use
information seekers make of much of the content of digital documents.
Session 5b - Knowledge Structures
PODD: An Ontology-driven Data Repository for Collaborative
Phenomics Research (Full Paper)
Yuan-Fang Li, Gavin Kennedy, Faith Davies, Jane Hunter
Abstract. Phenomics, the systematic study of phenotypes, is an emerging
field of research in biology. It complements genomics, the study of
genotypes, and is becoming an increasingly critical tool to understand
phenomena such as plant morphology and human diseases. Phenomics
studies make use of both high- and low-throughput imaging and measurement
devices to capture data, which are subsequently used for analysis.
As a result, high volumes of data are generated on a regular basis, making
storage, management, annotation and distribution a challenging task.
Sufficient contextual information, the metadata, must also be maintained
to facilitate the dissemination of these data. The challenge is further complicated
by the need to support emerging technologies and processes in
phenomics research. This paper describes our effort in designing and developing
an ontology-driven, open, extensible data repository to support
collaborative phenomics research in Australia.
A Configurable RDF Editor for Australian Curriculum (Full Paper)
Diny Golder, Les Kneebone, Jon Phipps, Steve Sunter, Stuart A. Sutton
Abstract. Representing Australian Curriculum for education in a form
amenable to the Semantic Web and conforming to the Achievement Standards
Network (ASN) schema required a new RDF instance data editor for describing
bounded graphs—what the Dublin Core Metadata Initiative calls a 'description
set'. Developed using a 'describe and relate' metaphor, the editor reported here
eliminates all need for authors of graphs to understand RDF or other Semantic
Web formalisms. The Description Set Editor (ASN DSE) is configurable by
means of a Description Set Profile (DSP) constraining properties and property
values and a set of User Interface Profiles (UIP) that relate the constraints of the
DSP to characteristics of the user interface. When fully deployed, the editor
architecture will include a Sesame store for RDF persistence and a metadata
server for deployment of all RESTful web services. Documents necessary for
configuration of the editor including DSP, UIP, XSLT, HTML, CSS, and
JavaScript files are stored as web resources.
Thesaurus Extension using Web Search Engines (Full Paper)
Robert Meusel, Mathias Niepert, Kai Eckert, Heiner Stuckenschmidt
Abstract. Maintaining and extending large thesauri is an important
challenge facing digital libraries and IT businesses alike. In this paper
we describe a method building on and extending existing methods from
the areas of thesaurus maintenance, natural language processing, and
machine learning to (a) extract a set of novel candidate concepts from
text corpora and (b) to generate a small ranked list of suggestions for
the position of these concepts in an existing thesaurus. Based on a modification
of the standard tf-idf term weighting we extract relevant concept
candidates from a document corpus. We then apply a pattern-based machine
learning approach on content extracted from web search engine
snippets to determine the type of relation between the candidate terms
and existing thesaurus concepts. The approach is evaluated with a large-scale
experiment using the MeSH and WordNet thesauri as testbed.
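For orientation, the standard tf-idf weighting that the abstract takes as its starting point can be sketched as below. This shows only the textbook form (relative term frequency times the log of inverse document frequency); the authors' modification is not described in the abstract and is not reproduced here. The function name `tf_idf` and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term once per document it appears in.
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


corpus = [
    ["thesaurus", "concept", "concept"],
    ["thesaurus", "search"],
]
for doc_weights in tf_idf(corpus):
    print(doc_weights)
```

Terms that occur in every document receive a weight of zero under this form, which is why high-scoring terms are plausible candidates for new, corpus-specific concepts.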
Session 6b - Images and Retrieval
Preservation of Cultural Heritage: From Print Book to Digital Library - A Greenstone Experience (Short Paper)
Henny M. Sutedjo, Gladys Sau-Mei Theng, Yin-Leng Theng
Abstract. We argue that current development in digital libraries presents an opportunity to explore the use of DL as a tool for building and facilitating access to digital cultural resources. Using Greenstone, an open source DL, we describe a 10-step approach in converting an out-of-print book, 'Costumes through Times', and constructing a DL creation of costumes.
Improving Social Tag-Based Image Retrieval with CBIR Technique (Short Paper)
Choochart Haruechaiyasak, Chaianun Damrongrat
Abstract. With the popularity of social image-sharing websites, the
amount of images uploaded and shared among the users has increased
explosively. To allow keyword search, the system constructs an index
from image tags assigned by the users. The tag-based image retrieval
approach, although very scalable, has some serious drawbacks due to the
problems of tag spamming and subjectivity in tagging. In this paper, we
propose an approach for improving the tag-based image retrieval by exploiting
some techniques in content-based image retrieval (CBIR). Given
an image collection, we construct an index based on 130-scale Munsell-based
colors. Users are allowed to perform query by keywords with color
and/or tone selection. The color index is also used for improving ranking
of search results via the user relevance feedback.
Identifying Persons in News Article Images Based on Textual Analysis (Full Paper)
Choochart Haruechaiyasak, Chaianun Damrongrat
Abstract. A large portion of news articles contains images of persons
whose names appear in the news stories. To provide image search of
persons, most search engines construct an index from textual descriptions
(such as headline and caption) of images. The index search approach,
although very simple and scalable, has one serious drawback. A query
of a person name could match some news articles which do not contain
images of the target person. Therefore, some irrelevant images could be
returned as search results. Our main goal is to improve the performance
of the index search approach based on the syntactic analysis of person
name entities in the news articles. Given sentences containing person
names, we construct a set of syntactic rules for identifying persons in
news images. The set of syntactic rules is used to filter out images of
non-target persons from the results returned by the index search. From
the experimental results, our approach improved the performance over
the basic index search by 10% based on the F1-measure.
Kairos: Proactive Harvesting of Research Paper Metadata from Scientific Conference Web Sites (Full Paper)
Markus Hänse, Min-Yen Kan, Achim P. Karduck
Abstract. We investigate the automatic harvesting of research paper
metadata from recent scholarly events. Our system, Kairos, combines a
focused crawler and an information extraction engine, to convert a list of
conference websites into an index filled with fields of metadata that correspond
to individual papers. Using event date metadata extracted from
the conference website, Kairos proactively harvests metadata about the
individual papers soon after they are made public. We use a Maximum
Entropy classifier to classify uniform resource locators (URLs) as scientific
conference websites and use Conditional Random Fields (CRF)
to extract individual paper metadata from such websites. Experiments
show an acceptable measure of classification accuracy of over 95% for
each of the two components.
Session 9b
Oranges Are Not the Only Fruit: An Institutional Case Study Demonstrating Why Data Digital Libraries Are Not the Whole Answer to E-research (Full Paper)
Dana McKay
Abstract. Data sharing and e-research have long been touted as the future of research, and a general public good. A number of studies have suggested data digital libraries in some form or another as an answer to a perceived data deluge, and the focus in Australia is very much on digital libraries. Moreover, the Australian National Data Service positions the institution as the core unit for setting data policy and doing initial data management. In this paper we present the results of an institution-wide survey that shows that data digital libraries cannot be the only answer to the question of research data, at least at an institutional level, and that the current focus on digital libraries may actively alienate some researchers.
Open Access Publishing: an Initial Discussion of Income Sources, Scholarly Journals and Publishers (Short Paper)
Panayiota Polydoratou, Margit Palzenberger, Ralf Schimmer, Salvatore Mele
Abstract. The Study for Open Access Publishing (SOAP) project is one of the
initiatives undertaken to explore the risks and opportunities of the transition to open
access publishing. Some of the early analyses of open access journals listed in the
Directory of Open Access Journals (DOAJ) show that more than half of the open
access publishing initiatives were undertaken by smaller publishers, learned societies
and few publishing houses that own a large number of journal titles. Regarding
income sources as means for sustaining a journal's functions, "article processing
charges", "membership fee" and "advertisement" are the predominant options for the
publishing houses; "subscription to the print version of the journal", "sponsorship"
and somewhat less the "article processing charges" have the highest incidences for all
other publishers.