
Schema and Ontology Mapping

Schema mapping takes two schemas as input and produces a semantic correspondence between their elements. It is a key operation in many applications, such as data integration, ontology matching, data warehousing, message translation in e-commerce, and semantic query processing. Research on schema mapping and research on ontology mapping share many features because, from the database point of view, an ontology is also a representation of a conceptual model. Because our schema mapping study uses data-extraction ontologies, the method is suitable for ontology mapping scenarios as well.

This is a finished project in which I was involved. There are also some other research projects in which I have been or am currently involved.
Project Description


This study investigates a method for solving complex schema mappings, especially the general n:m mapping cardinality problem, a long-standing hard problem in schema mapping research. We use the term direct mapping to denote 1:1 mappings and the term indirect mapping to denote the remaining types: 1:n, n:1, and n:m.
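The direct/indirect distinction can be illustrated with a minimal Python sketch. The function names and the example element names (Address, Street, City, State) are hypothetical and chosen only for illustration; they are not taken from the system described here.

```python
def cardinality(source_elems, target_elems):
    """Return the cardinality label of one mapping between two element sets."""
    if not source_elems or not target_elems:
        raise ValueError("a mapping needs at least one element on each side")
    many_src = len(source_elems) > 1
    many_tgt = len(target_elems) > 1
    if many_src and many_tgt:
        return "n:m"
    if many_src:
        return "n:1"
    if many_tgt:
        return "1:n"
    return "1:1"


def is_direct(source_elems, target_elems):
    """A direct mapping is 1:1; every other cardinality is indirect."""
    return cardinality(source_elems, target_elems) == "1:1"


# Hypothetical example: one source element corresponds to three target elements.
label = cardinality({"Address"}, {"Street", "City", "State"})
```

Here the single Address element maps to three target elements, so the mapping is labeled 1:n and counts as indirect.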

In this study, we represent schemas as conceptual graphs and look for mappings between two graphs. In addition, we use two schema mapping resources to help establish mappings: data frames and domain ontology snippets. A data frame encapsulates the essential properties of everyday data items such as currencies, dates, weights, and measures. A data frame extends an abstract data type to include not only an internal data representation and applicable operations but also representational and contextual information that allows a string appearing in a text document to be classified as belonging to the data frame. Molecular components of extraction ontologies, which we call domain ontology snippets, describe a small conceptual aggregate of information, such as names being aggregates of titles, first names, surnames, and optional suffixes such as Jr. or Sr. Domain ontology snippets are similar to ontology base components as described in [1], and they serve the same purpose: to declare plausible domain fact types.
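As a rough illustration of the data frame idea, the following Python sketch bundles a value recognizer, context keywords, and a conversion to an internal representation. The class name DataFrameSketch, its fields, and the price pattern are all assumptions made for this example; the actual data frames used in the project are considerably richer.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class DataFrameSketch:
    """Hypothetical, simplified stand-in for a data frame."""
    name: str
    value_pattern: str                    # external (textual) representation
    context_keywords: Tuple[str, ...]     # words that tend to appear nearby
    to_internal: Callable[[str], object]  # conversion to internal representation

    def recognizes(self, text: str) -> bool:
        """True if the whole string looks like a value of this data frame."""
        return re.fullmatch(self.value_pattern, text.strip()) is not None

    def has_context(self, text: str) -> bool:
        """True if any context keyword occurs in the surrounding text."""
        lowered = text.lower()
        return any(keyword in lowered for keyword in self.context_keywords)


# Assumed pattern for US-style dollar amounts, for illustration only.
PRICE = DataFrameSketch(
    name="Price",
    value_pattern=r"\$\d+(?:,\d{3})*(?:\.\d{2})?",
    context_keywords=("price", "cost", "asking"),
    to_internal=lambda s: float(s.strip().lstrip("$").replace(",", "")),
)

is_price = PRICE.recognizes("$1,250.00")
amount = PRICE.to_internal("$1,250.00")
```

A string such as "$1,250.00" is recognized as a Price value and converted to the number 1250.0, while a string such as "March 5" is not recognized.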

We employ several different methods to establish both direct and indirect mappings between two input schemas. Many of these methods are surveyed in [2]. In terms of one classification criterion, our system uses both instance-level and schema-level matchers. Moreover, we use both object-set and structure matchers to perform schema mapping at different levels.

Object-set matchers find correspondences based on similarities between the object sets of two schemas. Widely adopted techniques include name and description matching at the schema level and value matching at the instance level [2]. In our work we have used three object-set matchers: a name matcher, a value-characteristic matcher, and a data-frame matcher.
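A name matcher can be sketched in a few lines of Python. This sketch is an assumption for illustration, not the matcher used in our system: it normalizes object-set names and scores them with the generic string similarity from Python's standard difflib module.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Split camelCase and punctuation, lowercase, and rejoin with spaces."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]
    return " ".join(t.lower() for t in tokens)


def name_similarity(name_a, name_b):
    """Similarity in [0, 1] between two object-set names."""
    return SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()


def best_match(name, candidates):
    """Return the candidate name most similar to the given name."""
    return max(candidates, key=lambda c: name_similarity(name, c))


# Hypothetical object-set names from two schemas.
score = name_similarity("Phone_Number", "phoneNumber")
choice = best_match("Phone_Number", ["Price", "phoneNumber", "Color"])
```

Both spellings normalize to "phone number", so the score is 1.0. A purely string-based matcher of this kind cannot relate synonyms such as Cost and Price, which is one reason to combine it with value-based and data-frame matchers.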

Structure matchers use structural information both to suggest mappings and to confirm mappings suggested by object-set matchers. In our work, we consider not only the structural similarity of the two schemas but also reference structures, available to us as domain ontology snippets. The idea is to use an intermediate structure to help build mappings. It may be difficult to map a structure A directly to a structure B but easy to map both A and B to a structure C, which may help us decide that A corresponds to B.
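The intermediate-structure idea can be sketched as follows, assuming that each schema element has already been mapped to a concept in a reference structure C. The function name, the dictionaries, and the element names are hypothetical; the sketch only shows how correspondences between A and B can be derived through C.

```python
from collections import defaultdict


def compose_via_reference(a_to_c, b_to_c):
    """Derive A-to-B correspondences from mappings of A and B onto C.

    a_to_c and b_to_c map element names to concept names in C.
    Returns {concept: (sorted A elements, sorted B elements)} for every
    concept that elements from both schemas map to.
    """
    a_groups = defaultdict(set)
    b_groups = defaultdict(set)
    for elem, concept in a_to_c.items():
        a_groups[concept].add(elem)
    for elem, concept in b_to_c.items():
        b_groups[concept].add(elem)
    shared = a_groups.keys() & b_groups.keys()
    return {c: (sorted(a_groups[c]), sorted(b_groups[c])) for c in shared}


# Hypothetical example: schema A stores a full name, schema B splits it.
a_to_c = {"Name": "PersonName"}
b_to_c = {"First": "PersonName", "Last": "PersonName", "Phone": "PhoneNumber"}
derived = compose_via_reference(a_to_c, b_to_c)
```

In this example, Name in A and the pair First and Last in B both map to the PersonName concept, which suggests a 1:n correspondence between them. Phone is dropped because nothing in A maps to PhoneNumber.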

We have built a hybrid schema mapping system that combines all the matchers. Our experiments show that it achieves good precision and recall on complex schema mapping tasks in which as many as 50% of the mappings are indirect. Hardly any earlier automatic schema mapping study reports better performance. For more technical details and experimental results, see [3].
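For readers unfamiliar with the two measures, precision is the fraction of proposed mappings that are correct, and recall is the fraction of correct mappings that were proposed. A minimal Python sketch follows; the mapping pairs are made-up placeholders, not experimental data.

```python
def precision_recall(proposed, correct):
    """Compute (precision, recall) for a set of proposed mappings."""
    proposed, correct = set(proposed), set(correct)
    true_positives = len(proposed & correct)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


# Placeholder mappings: 3 proposed, 4 correct, 2 in common.
proposed = {("A", "X"), ("B", "Y"), ("C", "Z")}
correct = {("A", "X"), ("B", "Y"), ("D", "W"), ("E", "V")}
precision, recall = precision_recall(proposed, correct)
```

With two of three proposed mappings correct and two of four correct mappings found, precision is 2/3 and recall is 1/2.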

References:

[1] P. Spyns, R. Meersman, and M. Jarrar. Data modeling versus ontology engineering. SIGMOD Record, 31(4):12–17, December 2002.

[2] E. Rahm and P. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal, 10(4):334–350, December 2001.

[3] D. W. Embley, L. Xu, and Y. Ding. Automatic direct and indirect schema mapping: experiences and lessons learned. SIGMOD Record, 33(4):14–19, December 2004.

Collaborators



This project was the research for Li's PhD dissertation, and she led it.

Li Xu (Computer Science, U. of Arizona, South)
David W. Embley (Computer Science, BYU)
Stephen W. Liddle (Business School, BYU)
Deryle W. Lonsdale (Linguistics, BYU)
Dennis Ng (Computer Science, BYU)

Other Research Topics



Readers who would like to explore this topic further with me can send email to ding@cs.byu.edu. There are also other research projects in which I have been or am currently involved.
Last updated: Sep 16th, 2005