The well-known quip, ‘The nice thing about standards is that there are so many to choose from’, could well apply to biomedical databases and other information resources in the field.
Anyone used to accessing such resources on a daily basis will be familiar with their large number, their highly disparate structures and the enormous diversity in the kinds of information they contain. Apparently simple queries such as ‘Give me all publications that mention gene X’, ‘Are there any images showing the location and expression pattern of the product of gene Y?’, or ‘Why does a particular set of genes in cells taken from a patient display similar changes in expression?’ often result in hours of systematic work: finding the relevant databases, interrogating them, and then patching together an integrated picture from the results of repeated queries, each tailored to the particular demands of the interface of the database involved.
Of course, in the best of all possible worlds, none of this would be necessary. Biologists would have agreed on a single set of standards for database interoperability, federated their data into a single well-structured resource and provided an intuitive semantic web-based user interface. This has not yet happened. Until it does, there remains a need for search and retrieval tools that are capable of querying distributed resources, of making semantic interconnections between them and of retrieving information in a form that can be manipulated and analysed.
E-BioSci and ORIEL are two EU-funded projects that focus on providing services and tools to improve, for the research community, the interconnectivity of the scientific literature with molecular datasets and images. E-BioSci is a web platform whose current prototype (http://www.e-biosci.org) allows seamless searching and interconnection of literature text with molecular databases. ORIEL (http://www.oriel.org), besides providing some of the underlying technology for the E-BioSci platform, has supported the development of the image ontology-based BioImage database (http://www.bioimage.org) and has produced a number of state-of-the-art standalone tools that help researchers explore the scientific literature and extract and integrate the information it contains. One recent development, made by Robert Hoffman and Alfonso Valencia (CNB / CSIC, Madrid), is the iHOP system (Information Hyperlinked over Proteins; http://www.pdg.cnb.uam.es/UniPub/iHOP/), which converts the 14 million abstracts contained in the PubMed (National Library of Medicine) bibliographic database into a network of interlinked references to genes, proteins, mutations, diseases and (bio)chemical compounds.