Gesellschaft für Informatik e.V.

Lecture Notes in Informatics

Datenbanksysteme für Business, Technologie und Web (BTW) P-214, 225-240 (2013).

Gesellschaft für Informatik, Bonn

Copyright © Gesellschaft für Informatik, Bonn


Experiences from developing the domain-specific entity search engine GeneView

Philippe Thomas, Johannes Starlinger and Ulf Leser


GeneView is a semantic search engine for the Life Sciences. Unlike traditional search engines, GeneView analyzes texts upon import to recognize and properly handle biomedical entities, relationships between those entities, and the structure of documents. This enables a number of advanced features required to work effectively with scientific texts, such as entity disambiguation, ranking of documents by entity content, linking to structured knowledge about entities, and user-friendly highlighting of entities. As of now, GeneView indexes approximately 21.4M abstracts and 358K full texts with more than 200M entities of 11 different types and more than 100K relationships. In this paper, we describe the architecture underlying the system, with a focus on the complex pipeline of advanced NLP and information extraction tools necessary for achieving the above functionality. We also discuss open challenges in developing and maintaining a semantic search engine over a large (though not web-scale) corpus.
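To make the idea of import-time entity recognition and ranking by entity content concrete, the following is a minimal sketch and not GeneView's actual pipeline. It assumes a toy synonym dictionary in place of the NLP tools the abstract mentions, and the names `SYNONYMS`, `tag_entities`, `EntityIndex`, and the identifiers `gene:G1` and `gene:G2` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Toy dictionary (an assumption for illustration): lower-cased surface form -> entity ID.
# Several spellings map to one ID, so a query by ID finds all variants.
SYNONYMS = {
    "brca1": "gene:G1",
    "brca-1": "gene:G1",
    "tp53": "gene:G2",
    "p53": "gene:G2",
}


def tag_entities(text):
    """Count entity IDs found in the text by simple dictionary lookup."""
    tokens = (tok.strip(".,;:()").lower() for tok in text.split())
    return Counter(SYNONYMS[tok] for tok in tokens if tok in SYNONYMS)


class EntityIndex:
    def __init__(self):
        # entity ID -> {document ID -> number of mentions}
        self.postings = defaultdict(dict)

    def add_document(self, doc_id, text):
        # Entities are recognized once, at import time, not at query time.
        for entity_id, count in tag_entities(text).items():
            self.postings[entity_id][doc_id] = count

    def search(self, entity_id):
        # Rank documents by mention count, breaking ties by document ID.
        hits = self.postings.get(entity_id, {})
        return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


index = EntityIndex()
index.add_document("d1", "BRCA1 interacts with p53.")
index.add_document("d2", "BRCA-1 and BRCA1 mutations; BRCA1 again.")
index.add_document("d3", "No genes here.")
print(index.search("gene:G1"))
```

Because the index is keyed by entity identifier rather than by word, the query for `gene:G1` returns documents regardless of which spelling they use, and the document with more mentions is listed first. A real system would replace the dictionary lookup with trained recognizers and a disambiguation step.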


ISBN 978-3-88579-608-4
