Gesellschaft für Informatik e.V.

Lecture Notes in Informatics


INFORMATIK 2009 - Im Focus das Leben P-154, 625-638 (2008).

Gesellschaft für Informatik, Bonn
2008


Editors

Stefan Fischer, Erik Maehle, Rüdiger Reischuk (eds.)


Copyright © Gesellschaft für Informatik, Bonn

Contents

The EDIT cyberplatform for taxonomy and the taxonomic workflow: selected components

Pepé Ciardelli, Patricia Kelbert, Andreas Kohlbecker, Niels Hoffmann, Anton Güntsch and Walter G. Berendsohn

Abstract


The EDIT Cyberplatform for Taxonomy is an EU-funded set of loosely coupled tools for the editing, management and presentation of taxonomic data in biology. This paper looks at the fundamental workflow issues the Cyberplatform is intended to address, then examines three of its main components from this workflow perspective. Using these components as examples, we will demonstrate concrete ways the Cyberplatform can improve and accelerate this workflow. The paper starts by describing the Cyberplatform and its goals of loose coupling and interoperability, then looks at how these are built into the Common Data Model (CDM) Java library which forms the foundation for most Platform components. The first Platform component we examine in depth is the EDIT Desktop Taxonomic Editor, which presents a modern solution to the challenges of capturing the taxonomic workflow in software, by using techniques such as drag-and-drop, on-the-fly parsing, and unobtrusive feedback. The EDIT Specimen Explorer, the second component examined, helps find taxonomically relevant specimen and observation data by searching the GBIF (Global Biodiversity Information Facility) index using checklist-based thesauri to deliver more complete and targeted results, thereby improving and accelerating the workflow for exploring the taxonomic data available in the community as a whole. Finally, we look at a pilot project for print publishing software, which aims to remove the final bottleneck in the taxonomic workflow, the back-and-forth between taxonomist and publishing house.

The European Distributed Institute of Taxonomy (EDIT) is an EU-funded project designed to help integrate the traditionally disparate field of scientific taxonomy as practiced in Europe. The EDIT Cyberplatform for Taxonomy brings the taxonomic workflow to the Internet, providing an open architecture to connect and integrate existing applications and developing new tools where bottlenecks and areas for improvement have been identified.
Areas for improvement addressed in this paper include: increasing the community's ability to exchange data by means of a common data model and improved tools for import, export and querying; improving upon the last generation of data input and editing tools, with more emphasis on user-friendly and intuitive frontend software; and putting as much of the preparation of pre-publication drafts as possible into the hands of the taxonomist, with the aim of accelerating publication and improving its quality.

The workflow the Cyberplatform seeks to optimize is in essence the “revisionary” process by which an existing classification of a group of organisms is revised, and by which previously unclassified organisms are assigned to a “taxon”, i.e. a class of organisms. As commonly modelled when developing taxonomic software, a taxon has a scientific name with a taxonomic rank (e.g. kingdom, species, etc.), and information is assigned to it, such as physical and media specimens; descriptive data, including geographical distribution and visual observations; and citations from existing literature. The workflow begins either in the field with the collection and transport of specimens, or with a review of existing literature and specimens. Either starting point is followed by further analysis of specimens, the resolution of the name according to community nomenclatural codes, and peer review and publication of the revisionary treatment, including new taxa (if any).

An example of a workflow bottleneck addressed by the Cyberplatform is the frequent difficulty in examining relevant specimen material, either because of delays inherent in obtaining physical specimens from cooperating institutions, or because of naming discrepancies in the way specimens are labelled; the component described in Chapter 4 tackles the second half of this problem.
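To make the taxon model and the naming-discrepancy bottleneck more concrete, the following is a minimal Python sketch. It is not the CDM library or the Specimen Explorer: the class `Taxon`, its fields, the functions `expand_names` and `find_specimens`, and the specimen identifiers are all hypothetical and chosen only for illustration. The sketch assumes a checklist that links an accepted name to its synonyms, and shows how expanding a query through that checklist finds specimens labelled under any of the linked names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxon:
    """Illustrative taxon record: a ranked scientific name plus attached data.

    All field names are assumptions made for this sketch.
    """
    name: str                                            # accepted scientific name
    rank: str                                            # e.g. "kingdom", "species"
    synonyms: list[str] = field(default_factory=list)    # other names in use
    specimens: list[str] = field(default_factory=list)   # specimen identifiers
    citations: list[str] = field(default_factory=list)   # literature references


def expand_names(query: str, checklist: list[Taxon]) -> list[str]:
    """Return all names the checklist links to `query` (case-insensitive).

    If the query matches no accepted name or synonym, it is returned unchanged.
    """
    wanted = query.strip().lower()
    for taxon in checklist:
        names = [taxon.name, *taxon.synonyms]
        if wanted in (n.lower() for n in names):
            return names
    return [query]


def find_specimens(query: str, checklist: list[Taxon],
                   labelled: dict[str, list[str]]) -> list[str]:
    """Collect specimens labelled under the query name or any linked name."""
    hits: list[str] = []
    for name in expand_names(query, checklist):
        hits.extend(labelled.get(name, []))
    return hits


if __name__ == "__main__":
    tomato = Taxon(name="Solanum lycopersicum", rank="species",
                   synonyms=["Lycopersicon esculentum"])
    # Made-up specimen identifiers, keyed by the name written on the label.
    labelled = {
        "Solanum lycopersicum": ["SPEC-001"],
        "Lycopersicon esculentum": ["SPEC-002"],
    }
    print(find_specimens("Solanum lycopersicum", [tomato], labelled))
```

A search on the label text alone would miss the specimen filed under the synonym; expanding the query through the checklist first is what lets both records be found.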
2 A platform for Cybertaxonomy

The EDIT Platform for Cybertaxonomy, henceforth called “the Platform”, covers the breadth of the taxonomic workflow, from fieldwork to publication. It provides taxonomists, and the life sciences in general, with a set of loosely coupled tools for: full, customized access to taxonomic data; editing and management of data; and collaborative work in teams. The Platform provides various tools to facilitate fieldwork, analyze data, assemble treatments, and publish efficiently. Reliability and reusability of data are key requirements for each of these tools and thus for the Platform as a whole. Development of the Platform is coordinated by the Dept. of Biodiversity Informatics at the Botanic Garden and Botanical Museum Berlin-Dahlem, and its various components are implemented by a team of 15 software developers and architects from multiple institutions all over Europe.

2.1 General architecture - loosely coupled components

Several existing applications support various facets of the taxonomic workflow, but until now there has been little interoperability between them. A main goal of the Platform is to provide an open architecture that allows existing applications to be connected and integrated, and to provide new developments where necessary. Thus, the Platform is not a monolithic application, but rather consists of independent, interoperable components. Platform components are generally either web services or applications, both desktop and web-based. Several components are designed to be shared by collaborating users within a community, as delimited for instance by taxonomic group or geographic focus. Each community shares a community web store based on the CDM (Common Data Model - see 2.3 below) to store and publish its data and to communicate with the public and other members. The EDIT data portal is the Platform's solution for publishing CDM data via the web, and can be easily customized by the non-technical user.
Other community components are modern communication tools such as blogs, forums, mailing lists and other collaborative tools based on the content management system Drupal [Dr09]. Other components include central services such as EDIT Map Services, a map generation service, and personal components, including special hardware such as water-resistant GPS/GIS handhelds with an integrated camera for efficient data acquisition in the field.

2.2 Integrating existing applications and data standards

Establishing interoperability between various existing applications and data standards is a major challenge. Even applications from the same domain of expertise often do not share the same data formats. Rather than attempting to implement data exchange functionality in every Platform-related application, we decided instead to create central transformation services and a central data store, whereby applications exchange data either via import-export functionality in all important data formats or via web services.

Fig. 1: The interactive EDIT Platform Cybergate gives an overview of the platform and of the all-important question of how dataflow between components can be achieved. (Screenshot from



Gesellschaft für Informatik, Bonn
ISBN 978-3-88579-241-3

