Home Overview Publications People Contact Us Downloads Tools

BioMediator: Overview

 

The core BioMediator system consists of the components needed for a general-purpose, distributed, federated data integration system. BioMediator has a modular architecture designed to integrate data from multiple structured and semi-structured biological data sources in a flexible and reconfigurable way.


Fig. 1.

The current system consists of four tiers: a user query interface, a query processor, a semantic translation engine, and interfaces to each of the data sources. The biologist begins by querying the system for information about a particular entity (e.g. gene, protein, phenotype) using one of two methods: a graph-based GUI application or a servlet-based HTML form (Fig. 1G1 and 1G2, respectively). This seeding query is passed to the query processor (Fig. 1B), which provides an API for launching and managing queries posed against the mediated schema. The metawrapper (Fig. 1C) translates these mediated schema queries into source-specific queries using forward mapping rules. The wrappers (Fig. 1E) pass the remapped queries through to the data sources (Fig. 1F). The data sources return results in their native format, which the wrappers translate to XML syntax with native semantics. The metawrapper then applies reverse mapping rules to translate the XML result streams from native semantics to mediated schema semantics. Both the forward and reverse mapping rules employed by the metawrapper are generated by the plug-in (Fig. 1D). The query processor retrieves the XML data from the metawrapper, organizes it, and generates events that can be used to synthesize a navigable, graph-based representation of the result set. Once a result graph has been constructed, it may be repeatedly queried, expanded, and grown using the query processor's API.
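The round trip through the metawrapper described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `FORWARD_RULES`, `forward_map`, `reverse_map`, `toy_wrapper`, and `run_query`, the source name `sourceA`, and the sample records are hypothetical and are not part of BioMediator's API. The sketch also uses plain dictionaries where the real system exchanges XML, and it derives the reverse rules by inverting the forward rules, whereas in BioMediator both rule sets are generated by the plug-in.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical forward mapping rules for one source:
# mediated-schema attribute -> source-native field name.
FORWARD_RULES: Dict[str, Dict[str, str]] = {
    "sourceA": {"Gene.symbol": "gene_sym", "Gene.name": "descr"},
}


def forward_map(source: str, mediated_query: Record) -> Record:
    """Translate a mediated-schema query into a source-specific query."""
    rules = FORWARD_RULES[source]
    return {rules[attr]: value
            for attr, value in mediated_query.items() if attr in rules}


def reverse_map(source: str, native_record: Record) -> Record:
    """Translate a native-semantics record back to mediated-schema semantics.

    Assumption: the reverse rules are simply the inverse of the forward rules.
    """
    inverse = {native: mediated
               for mediated, native in FORWARD_RULES[source].items()}
    return {inverse[name]: value
            for name, value in native_record.items() if name in inverse}


def toy_wrapper(native_query: Record) -> List[Record]:
    """Stand-in for a wrapper: filters made-up rows held in memory."""
    rows = [
        {"gene_sym": "ABC1", "descr": "example gene one"},
        {"gene_sym": "XYZ2", "descr": "example gene two"},
    ]
    return [row for row in rows
            if all(row.get(k) == v for k, v in native_query.items())]


def run_query(source: str, mediated_query: Record,
              wrapper: Callable[[Record], List[Record]]) -> List[Record]:
    """Forward-map the query, call the wrapper, reverse-map each result."""
    native_query = forward_map(source, mediated_query)
    return [reverse_map(source, record) for record in wrapper(native_query)]


results = run_query("sourceA", {"Gene.symbol": "ABC1"}, toy_wrapper)
```

In this sketch the caller only ever sees mediated-schema attribute names such as `Gene.symbol`; the native field names stay confined to the rules and the wrapper.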


Fig. 2.

The system relies heavily on the source knowledge base (SKB), a central repository for the biologist’s mediated schema, which is represented in Protégé (Fig. 1A). The mediated schema represents a given biologist’s view of the biological concepts (entities) of interest and the relationships between them, independent of which data sources actually contain those entities and relationships. The mediated schema contains an entity (or class) hierarchy, which includes the attributes that are valid for each entity. These attributes connect instances (or resources) to data values (i.e., attributes are datatype properties). The mediated schema also enumerates the valid relationships (or object properties) that can be used to interrelate instances. A simplified example of a mediated schema typical of biology can be seen in Fig. 2. The SKB also includes a catalog that contains information about the underlying sources. For each data source, we capture the following pieces of information: a) the entities it contains, b) the relationships it contains, and c) the primary entity around which the database is organized (when a cross-reference is ambiguous, we assume it references the primary entity).
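As a rough illustration of the information the SKB holds, the following Python sketch models a small entity hierarchy and one catalog entry. It is not the Protégé representation used by BioMediator. The class names `Entity` and `SourceEntry`, the functions `valid_attributes` and `resolve_xref`, and the sample schema and source are all hypothetical. The sketch also assumes that an entity inherits the attributes of its ancestors, which the description above does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Entity:
    """One class in the mediated schema's entity hierarchy."""
    name: str
    attributes: Set[str] = field(default_factory=set)  # datatype properties
    parent: Optional[str] = None


@dataclass
class SourceEntry:
    """Catalog entry for one data source."""
    entities: Set[str]                         # a) entities it contains
    relationships: Set[Tuple[str, str, str]]   # b) (subject, relation, object)
    primary_entity: str                        # c) entity the source is organized around


def valid_attributes(schema: Dict[str, Entity], name: str) -> Set[str]:
    """Collect attributes of an entity and its ancestors (inheritance is assumed)."""
    attrs: Set[str] = set()
    current: Optional[str] = name
    while current is not None:
        entity = schema[current]
        attrs |= entity.attributes
        current = entity.parent
    return attrs


def resolve_xref(catalog: Dict[str, SourceEntry], source: str,
                 entity: Optional[str] = None) -> str:
    """Return the entity a cross-reference points at.

    When the cross-reference does not name an entity (it is ambiguous),
    fall back to the source's primary entity.
    """
    entry = catalog[source]
    if entity is None:
        return entry.primary_entity
    if entity not in entry.entities:
        raise KeyError(f"{source} does not contain entity {entity!r}")
    return entity


# Made-up example schema and catalog.
schema = {
    "BioEntity": Entity("BioEntity", {"id"}),
    "Gene": Entity("Gene", {"symbol"}, parent="BioEntity"),
    "Protein": Entity("Protein", {"sequence"}, parent="BioEntity"),
}
catalog = {
    "sourceA": SourceEntry(
        entities={"Gene", "Protein"},
        relationships={("Gene", "encodes", "Protein")},
        primary_entity="Gene",
    ),
}
```

Here `resolve_xref(catalog, "sourceA")` returns the primary entity of the hypothetical source, mirroring the rule for ambiguous cross-references described above.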


Fig. 3.

The design of the browser engine (Raven) naturally allows filtering plug-ins to be layered on top of it. This has allowed us to apply logic from expert systems and to store the engine’s pre-filtered output for subsequent re-filtering using different criteria. In the future, the event-generating nature of BioMediator’s third-generation query processor will allow for the seamless addition of new filters (such as inference rule engines), as well as new data visualization techniques and interfaces to guide query expansion.
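One way to picture filters layered over an event-generating engine, with the pre-filtered output retained for later re-filtering, is the short Python sketch below. It is only an illustration under stated assumptions: the `EventLog` class, its `record` and `replay` methods, the event fields `entity` and `score`, and the two example filters are hypothetical and do not reflect the actual plug-in interface of Raven or the query processor.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Filter = Callable[[Event], bool]


class EventLog:
    """Stores every event the engine emits, before any filtering."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, event: Event) -> None:
        """Called once per event generated by the (hypothetical) engine."""
        self._events.append(event)

    def replay(self, *filters: Filter) -> List[Event]:
        """Re-filter the stored output; an event passes only if every filter accepts it."""
        return [e for e in self._events if all(f(e) for f in filters)]


# Two hypothetical filter plug-ins.
def only_genes(event: Event) -> bool:
    return event["entity"] == "Gene"


def min_score(threshold: float) -> Filter:
    def accept(event: Event) -> bool:
        return float(event["score"]) >= threshold  # score is an assumed field
    return accept


log = EventLog()
for made_up_event in (
    {"entity": "Gene", "score": 0.9},
    {"entity": "Protein", "score": 0.8},
    {"entity": "Gene", "score": 0.2},
):
    log.record(made_up_event)

genes = log.replay(only_genes)                      # first set of criteria
strong_genes = log.replay(only_genes, min_score(0.5))  # same stored output, stricter criteria
```

Because the log keeps the unfiltered events, changing the criteria means calling `replay` again with different filters rather than re-running the original query.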


BioMediator™ is being developed at the University of Washington [UW] by collaborators in the Division of Biomedical and Health Informatics [DBHI], the Department of Pediatrics [Peds], the Department of Biological Structure [BIOSTR], and the Department of Computer Science and Engineering [CSE].
Copyright © 2006-2007, University of Washington. All rights reserved.