Domain-aware ontology matching
Quesada Real, Francisco José
In recent years, technological advances have created new means of communication, which have motivated governments, companies and institutions to digitalise their data in order to make it accessible and transferable to others. Despite the millions of digital resources currently available, their diversity and heterogeneous knowledge representation complicate the automatic exchange of information. At present, this heterogeneity is tackled by applying ontology matching techniques, which aim to find correspondences between the elements represented in different resources. These approaches work well in some cases, but in scenarios where resources come from many different areas of expertise (e.g. emergency response) or where the knowledge represented is highly specialised (e.g. the medical domain), their performance drops because matchers either fail to find correspondences or find incorrect ones. In our research, we have focused on tackling these problems by allowing matchers to take advantage of domain knowledge. Firstly, we present an innovative perspective for dealing with domain knowledge by considering three dimensions: specificity (the degree of specialisation), linguistic structure (the role of lexicon and grammar), and type of knowledge resource (regarding generation methodologies). Secondly, domain resources are classified according to the combination of these three dimensions. Finally, several approaches are proposed that exploit each dimension of domain knowledge to enhance matchers’ performance. The proposals have been evaluated by matching two of the most widely used classifications of diseases (ICD-10 and DSM-5), and the results show that matchers considerably improve their performance in terms of F-measure. The research detailed in this thesis can be used as a starting point to delve into the area of domain-knowledge matching. For this reason, we have also included several research lines that can be followed in the future to enhance the proposed approaches.