Schema aware knowledge graph completions

Wang, Fangrong

dc.contributor.advisor

Pan, Jeff

dc.contributor.advisor

Bundy, Alan

dc.contributor.author

Wang, Fangrong

dc.contributor.sponsor

Huawei

en

dc.date.accessioned

2024-07-23T11:14:36Z

dc.date.available

2024-07-23T11:14:36Z

dc.date.issued

2024-07-23

dc.description.abstract

Knowledge Graph Completion (KGC) aims to complete the structure of knowledge graphs by predicting the missing entities or relationships in them and mining unknown facts over relational triples of the form ⟨h, r, t⟩, where h is the head entity, r the relation and t the tail entity. The recent success of knowledge graphs (KGs) has spurred widespread interest in methods for the problem of KGC. However, efforts to understand the quality of the candidate triples produced by these methods, in particular from the schema aspect, have been limited. In fact, most existing Knowledge Graph Completion methods do not guarantee that the expanded Knowledge Graphs are consistent with the schema of the initial Knowledge Graph. As a result, while existing KGC approaches perform well in completing the graph under rank-based evaluation metrics such as Hits@N and Mean Reciprocal Rank, they often struggle to ensure the consistency of newly generated triples. Therefore, an approach that tries to balance completeness and consistency is needed. Existing KGC research often uses the silver standard method [1] to measure the performance of Knowledge Graph Completion approaches, assuming that the KG itself is already of reasonable quality. In the silver standard method, some existing links in the data sub-graph (ABox) are removed to test whether triple producers can help to recover the missing links. Our experiments bring the silver standard method into question. They show that only 89.9% of triples are consistent with the original NELL-995 dataset [2] schema and satisfy the domain and range constraints. For the DBpedia Politics subset (DBped-P) [3], 99.6% of triples are consistent with the DBpedia-2016 schema and 57.8% of triples satisfy the property domain and range constraints.
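To make the domain and range figures above concrete, the following is a minimal sketch of how such a ratio could be computed. It is not the thesis's consistency checker: the function names, the dictionary-based schema, and the closed-world reading of missing types are all assumptions made for illustration, and entity types are assumed to be already expanded along the class hierarchy.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def satisfies_domain_range(
    triple: Triple,
    types: Dict[str, Set[str]],   # entity -> asserted classes (hypothetical layout)
    domain: Dict[str, str],       # relation -> required class of the head
    range_: Dict[str, str],       # relation -> required class of the tail
) -> bool:
    """Return True if the head and tail carry the classes the relation requires.

    Simplification: an entity with no matching asserted type counts as a
    violation, which is a closed-world reading of the constraints.
    """
    head, relation, tail = triple
    required_head = domain.get(relation)
    required_tail = range_.get(relation)
    head_ok = required_head is None or required_head in types.get(head, set())
    tail_ok = required_tail is None or required_tail in types.get(tail, set())
    return head_ok and tail_ok


def constraint_ratio(
    triples: Iterable[Triple],
    types: Dict[str, Set[str]],
    domain: Dict[str, str],
    range_: Dict[str, str],
) -> float:
    """Fraction of triples that satisfy their relation's domain and range."""
    triples = list(triples)
    if not triples:
        return 0.0
    passed = sum(satisfies_domain_range(t, types, domain, range_) for t in triples)
    return passed / len(triples)
```

A description logic reasoner would treat a missing type as unknown rather than wrong, so a check like this only approximates schema consistency.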
If a description logic reasoner is used to infer triples that logically follow from an existing knowledge graph, and the graph contains an inconsistent subset, this would prevent the reasoner from accurately deducing logical consequences or drawing reliable inferences. The main aim of this thesis is to address the fundamental problems of “completeness” and “consistency” in the field of Knowledge Graph Completion by exploring a combination of conventional Knowledge Graph techniques for completion and semantic reasoning in an iterative way. Based on this, we explore the construction and enrichment of a domain-specific knowledge graph in a real-world scenario. As expressed in the title of this thesis, “Schema Aware Knowledge Graph Completions”, there are two important notions that are investigated: (1) schema-aware, referring to the usage of a schema of a Knowledge Graph in determining the consistency of triples; (2) Knowledge Graph Completion (KGC), referring to our method of combining different Knowledge Graph Completion approaches with a semantic reasoning service, namely embedding-based KGC, literal-embedding-based KGC, rule learning, and a materialisation service from an existing reasoner.
This thesis addresses three research questions:
Q1: How can we increase the number of triples produced by a Knowledge Graph Completion approach that are consistent with respect to the schema of a Knowledge Graph? In addressing the first research question, we build a schema-aware KGC system, namely SICKLE. We employ various strategies at different levels. At the system level, we combine different types of triple producers and employ an approximated consistency checking service to produce schema-correct triples. Only schema-correct triples are added to the target KG and used in iterative training. 
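The system-level idea, in which several triple producers propose candidates and only schema-correct ones are added to the KG before the next round, can be sketched as a simple loop. This is an illustrative outline under stated assumptions, not the SICKLE implementation: `iterative_completion`, the `Producer` and `Checker` callables, and the stopping rule are hypothetical names and choices.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
Producer = Callable[[Set[Triple]], Iterable[Triple]]  # proposes candidates from the KG
Checker = Callable[[Triple, Set[Triple]], bool]       # approximate consistency check


def iterative_completion(
    kg: Iterable[Triple],
    producers: List[Producer],
    is_schema_correct: Checker,
    max_rounds: int = 5,
) -> Set[Triple]:
    """Grow the KG round by round, keeping only schema-correct candidates."""
    current = set(kg)
    for _ in range(max_rounds):
        candidates: Set[Triple] = set()
        for produce in producers:
            # Every producer sees the KG as enriched by earlier rounds.
            candidates.update(produce(current))
        accepted = {
            c for c in candidates - current if is_schema_correct(c, current)
        }
        if not accepted:
            break  # nothing new survived the check, so stop early
        current |= accepted
    return current
```

In this sketch the producers run in series within a round; running them in parallel would change only how `candidates` is gathered, not the filtering step.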
We flexibly assemble KGC pipelines from four types of triple producers, which learn new triples based on the existing KG and a semantic reasoning service in an iterative manner and run in parallel or in series. The different types of methods operate independently, can focus on their strengths, and benefit each other through new schema-correct triples that are fed back into the next learning procedure. At the functional level, we integrate literal-embedding-based methods. We found that knowledge from pre-trained language models enhances schema-awareness in KGC tasks and, in particular, benefits schema-related KGC tasks such as type prediction. At the algorithm level, we implement a schema-aware sampling strategy for the embedding-based methods. Schema-correct triples are sampled as positive examples and schema-inconsistent triples are sampled as negative examples. We observed that this sampling method not only has a positive effect on producing schema-correct triples, but also improves performance in a downstream type prediction task. Unlike existing approaches that focus only on the completeness of a Knowledge Graph, our combination of approaches successfully produces more new triples that are consistent with regard to the schema of a Knowledge Graph.
Q2: How can we encourage models to produce more consistent triples while maintaining accuracy in link prediction? To tackle the second question, we optimize the combined KGC pipeline that addresses the first research question from two different directions. Firstly, we experiment with multiple data fusion methods, calibrating scores from different models and aggregating them into a final confidence score. We then use this fused score to re-rank the prediction results, leading to superior performance in the same link prediction task. Secondly, we extend the schema-aware negative sampling to open-world informative negative sampling. 
In essence, we use the schema to identify consistent and inconsistent subsets, and leverage the currently popular large language models to rank those negative samples within the consistent set that appear more likely to be incorrect. Our negative sampling strategy outperforms two closed-world assumption (CWA) based sampling strategies in terms of link prediction results.
Q3: In a real scenario, given multiple complex data sources, how can we make a domain-specific KG evolve by combining data-driven methods and knowledge-based methods, with regard to the schema of the KG? We explore an approach to obtain and complete knowledge in a real scenario within a 5G network log system. This involves extracting and predicting knowledge from a segment of 5G network system logs and making the evolution of the knowledge graph an iterative process. We break down this application work into two sub-tasks: construction and completion. The construction process can be regarded as a triple producer based on log extraction. We combine language models and Horn rules from background knowledge to learn triples from arbitrary logs, facilitating knowledge representation and reasoning. We implement a local-to-global strategy for triple inference, which reduces the reasoning query space. Then, with the constructed log KG, the completion process is where the SICKLE system is applied. We employ negative Horn rules from background knowledge and the KG schema to assure the quality of learned triples. With the log extraction system, we build a log KG and demonstrate its capability in root cause analysis. With the SICKLE system, we enrich the log KG with more schema-consistent triples.
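For the score fusion step described under Q2, one simple possibility is to rescale each model's scores to a common range and average them before re-ranking. The abstract does not say which fusion or calibration methods were used, so the min-max rescaling, the weighted average, and the names `min_max` and `fuse_and_rank` below are assumptions chosen only to illustrate the idea.

```python
from typing import Dict, List, Optional, Sequence


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one model's scores to [0, 1] so that models become comparable."""
    if not scores:
        return {}
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # All candidates tie, so give each the same neutral score.
        return {candidate: 0.5 for candidate in scores}
    return {c: (s - lowest) / (highest - lowest) for c, s in scores.items()}


def fuse_and_rank(
    model_scores: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
) -> List[str]:
    """Combine per-model candidate scores and return candidates, best first.

    A candidate that a model did not score contributes 0 for that model.
    """
    if weights is None:
        weights = [1.0] * len(model_scores)
    total_weight = sum(weights)
    fused: Dict[str, float] = {}
    for weight, scores in zip(weights, model_scores):
        for candidate, score in min_max(scores).items():
            fused[candidate] = fused.get(candidate, 0.0) + weight * score / total_weight
    # Sort by fused score (descending); break ties alphabetically for determinism.
    return sorted(fused, key=lambda c: (-fused[c], c))
```

Here each dictionary maps candidate tail entities for one query ⟨h, r, ?⟩ to a model's raw score; learned calibration would replace `min_max` without changing the re-ranking step.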

en

dc.description.abstract

2025-07-23

en

dc.identifier.uri

https://hdl.handle.net/1842/42015

dc.identifier.uri

http://dx.doi.org/10.7488/era/4737

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Wang, F., Bundy, A., Li, X., Zhu, R., Nuamah, K., Xu, L., . . . Pan, J. Z. (2023), Schema-aware Iterative Completion for Knowledge Graphs Revisited, Accepted by World Wide Web Journal, 2023

en

dc.relation.hasversion

Wang, F., Bundy, A., Li, X., Zhu, R., Nuamah, K., Xu, L., . . . Pan, J. Z. (2021). LEKG: A System for Constructing Knowledge Graphs from Log Extraction. ACM International Conference Proceeding Series, (i), 181–185. https://doi.org/10.1145/3502223.3502250

en

dc.subject

knowledge graph

en

dc.subject

schema and consistency

en

dc.subject

Knowledge Graph Completion

en

dc.subject

KGC

en

dc.subject

KGC methods

en

dc.subject

schema-consistent triples

en

dc.title

Schema aware knowledge graph completions

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Wang2024.pdf
Size: 3.47 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection