Edinburgh Research Archive

Learning, deducing and linking entities

dc.contributor.advisor
Cao, Yang
dc.contributor.advisor
Mai, Luo
dc.contributor.author
Tugay, Resul
dc.contributor.sponsor
Republic of Türkiye, Ministry of National Education
en
dc.date.accessioned
2023-10-25T10:15:59Z
dc.date.available
2023-10-25T10:15:59Z
dc.date.issued
2023-10-25
dc.description.abstract
Improving the quality of data is a critical issue in data management and machine learning, and finding the most representative and concise way to achieve this is a key challenge. Learning how to represent entities accurately is essential for various tasks in data science, such as generating better recommendations and more accurate question answering. Thus, the amount and quality of information available on an entity can greatly impact the quality of results in downstream tasks. This thesis focuses on two specific areas to improve data quality: (i) learning and deducing entities for data currency (i.e., how up-to-date information is), and (ii) linking entities across different data sources. The first technical contribution is GATE (Get the lATEst), a framework that combines deep learning and rule-based methods to find the most up-to-date information about an entity. GATE learns and deduces temporal orders on attribute values in a set of tuples that pertain to the same entity. It is based on a creator-critic framework: the creator trains a neural ranking model to learn temporal orders and rank attribute values based on correlations among the attributes. The critic then validates the temporal orders learned and deduces more ranked pairs by chasing the data with currency constraints; it also provides augmented training data as feedback for the creator to improve the ranking in the next round. The process proceeds until the temporal order obtained becomes stable. The second technical contribution is HER (Heterogeneous Entity Resolution), a framework that consists of a set of methods to link entities across relations and graphs. We propose a new notion, parametric simulation, to link entities across a relational database D and a graph G.
Taking functions and thresholds for measuring vertex closeness, path associations and important properties as parameters, parametric simulation identifies tuples t in D and vertices v in G that refer to the same real-world entity, based on topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. Rather than applying rule-based methods and machine learning algorithms separately to enhance data quality, we combined both approaches to address the challenges of data currency and entity linking. We combined rule-based methods with state-of-the-art machine learning methods to represent entities, then used the representations of these entities for further tasks. These enhanced models, which combine machine learning and logic rules, helped us represent entities better, both (i) to find the most up-to-date attribute values and (ii) to link them across relations and graphs.
en
dc.identifier.uri
https://hdl.handle.net/1842/41099
dc.identifier.uri
http://dx.doi.org/10.7488/era/3838
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Wenfei Fan, Resul Tugay, Yaoshu Wang, Min Xie, Muhammad Asif Ali. Learning and Deducing Temporal Orders. International Conference on Very Large Data Bases (VLDB), 2023
en
dc.relation.hasversion
Wenfei Fan, Liang Geng, Ruochun Jin, Ping Lu, Resul Tugay, Wenyuan Yu. Linking Entities across Relations and Graphs. International Conference on Data Engineering (ICDE), 2022
en
dc.subject
machine-learning
en
dc.subject
entity representation
en
dc.subject
natural language processing
en
dc.subject
big data
en
dc.title
Learning, deducing and linking entities
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Tugay2023.pdf
Size:
1.3 MB
Format:
Adobe Portable Document Format