Edinburgh Research Archive

Schema aware knowledge graph completions

dc.contributor.advisor
Pan, Jeff
dc.contributor.advisor
Bundy, Alan
dc.contributor.author
Wang, Fangrong
dc.contributor.sponsor
Huawei
en
dc.date.accessioned
2024-07-23T11:14:36Z
dc.date.available
2024-07-23T11:14:36Z
dc.date.issued
2024-07-23
dc.description.abstract
Knowledge Graph Completion (KGC) aims to complete the structure of knowledge graphs by predicting their missing entities or relationships and by mining unknown facts over relational triples of the form ⟨h, r, t⟩, where h is the head entity, r the relation and t the tail entity. The recent success of knowledge graphs (KGs) has spurred widespread interest in methods for KGC. However, efforts to understand the quality of the candidate triples produced by these methods, in particular from the schema perspective, have been limited. In fact, most existing Knowledge Graph Completion methods do not guarantee that the expanded Knowledge Graph is consistent with the schema of the initial Knowledge Graph. As a result, while existing KGC approaches perform well under rank-based evaluation metrics such as Hits@N and Mean Reciprocal Rank, they often struggle to ensure the consistency of newly generated triples. Therefore, an approach that balances completeness and consistency is needed. Existing KGC research often uses the silver standard method [1] to measure the performance of Knowledge Graph Completion approaches, assuming that the KG itself is already of reasonable quality. In the silver standard method, some existing links in the data sub-graph (ABox) are removed to test whether triple producers can recover the missing links. Our experiments bring the silver standard method into question. They show that only 89.9% of triples are consistent with the original NELL-995 dataset [2] schema and satisfy the domain and range constraints. For the DBpedia Politics subset (DBped-P) [3], 99.6% of triples are consistent with the DBpedia-2016 schema and 57.8% of triples satisfy the property domain and range constraints.
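The domain and range ratios reported above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy entities and the toy schema are all hypothetical, and the check is deliberately simplified (it treats domain and range as closed constraints over asserted types and ignores class hierarchies and disjointness axioms).

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def satisfies_domain_range(
    triple: Triple,
    entity_types: Dict[str, Set[str]],
    domain: Dict[str, str],
    range_: Dict[str, str],
) -> bool:
    """Return True if the head has the relation's domain class and the
    tail has its range class. Relations without a declared domain or
    range are treated as unconstrained (an assumption of this sketch)."""
    h, r, t = triple
    domain_ok = r not in domain or domain[r] in entity_types.get(h, set())
    range_ok = r not in range_ or range_[r] in entity_types.get(t, set())
    return domain_ok and range_ok


def consistency_ratio(
    triples: Iterable[Triple],
    entity_types: Dict[str, Set[str]],
    domain: Dict[str, str],
    range_: Dict[str, str],
) -> float:
    """Fraction of triples that pass the simplified domain/range check."""
    triples = list(triples)
    if not triples:
        return 0.0
    passed = sum(
        satisfies_domain_range(t, entity_types, domain, range_) for t in triples
    )
    return passed / len(triples)


# Hypothetical toy data, not taken from NELL-995 or DBpedia.
toy_types = {"alice": {"Person"}, "acme": {"Organisation"}, "paris": {"City"}}
toy_domain = {"worksFor": "Person"}
toy_range = {"worksFor": "Organisation"}
toy_triples = [
    ("alice", "worksFor", "acme"),   # passes both checks
    ("paris", "worksFor", "acme"),   # head is not a Person
    ("alice", "locatedIn", "paris"), # relation has no declared constraints
]
toy_ratio = consistency_ratio(toy_triples, toy_types, toy_domain, toy_range)
```

On the toy data, two of the three triples pass, so `toy_ratio` is two thirds. A real check against an OWL schema would normally go through a reasoner rather than a dictionary lookup.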
If a description logic reasoner is used to infer triples that logically follow from an existing knowledge graph, and the graph contains an inconsistent subset, the reasoner is prevented from accurately deducing logical consequences or drawing reliable inferences. The main aim of this thesis is to address the fundamental problems of “completeness” and “consistency” in the field of Knowledge Graph Completion by iteratively combining conventional Knowledge Graph completion techniques with semantic reasoning. Based on this, we explore the construction and enrichment of a domain-specific knowledge graph in a real-world scenario. As expressed in the title of this thesis, “Schema Aware Knowledge Graph Completions”, two important notions are investigated: (1) schema-aware, referring to the use of the schema of a Knowledge Graph in determining the consistency of triples; (2) Knowledge Graph Completion (KGC), referring to our method of combining different Knowledge Graph Completion approaches with a semantic reasoning service, namely embedding-based KGC, literal-embedding-based KGC, rule learning, and a materialisation service from an existing reasoner.
This thesis addresses three research questions:
Q1: How can we increase the number of triples produced by a Knowledge Graph Completion approach that are consistent with the schema of a Knowledge Graph? In addressing the first research question, we build a schema-aware KGC system, namely SICKLE. We employ various strategies at different levels. At the system level, we combine different types of triple producers and employ an approximate consistency checking service to produce schema-correct triples. Only schema-correct triples are added to the target KG and used in iterative training.
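The system-level loop described above (several triple producers, a consistency check, and only schema-correct triples fed back into the next round) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SICKLE code: `iterative_completion`, the producer functions and `toy_checker` are hypothetical names, and the checker stands in for the approximate consistency checking service.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Producer = Callable[[Set[Triple]], Iterable[Triple]]
Checker = Callable[[Triple, Set[Triple]], bool]


def iterative_completion(
    kg: Iterable[Triple],
    producers: List[Producer],
    is_schema_correct: Checker,
    max_iterations: int = 5,
) -> Set[Triple]:
    """Repeatedly ask every producer for candidates and keep only the new
    candidates accepted by the checker. Stops early when a round adds nothing."""
    graph = set(kg)
    for _ in range(max_iterations):
        candidates: Set[Triple] = set()
        for produce in producers:
            candidates.update(produce(graph))
        accepted = {t for t in candidates - graph if is_schema_correct(t, graph)}
        if not accepted:
            break
        graph |= accepted  # only schema-correct triples reach the next round
    return graph


# Hypothetical producers and checker for illustration only.
def symmetric_producer(graph: Set[Triple]) -> Set[Triple]:
    # Stand-in for a rule learner: marriedTo is treated as symmetric.
    return {(t, r, h) for (h, r, t) in graph if r == "marriedTo"}


def noisy_producer(graph: Set[Triple]) -> Set[Triple]:
    # Stand-in for an embedding model that proposes a schema-incorrect triple.
    return {("alice", "marriedTo", "paris")}


def toy_checker(triple: Triple, graph: Set[Triple]) -> bool:
    # Stand-in for the consistency service: a city cannot be a spouse.
    return triple[2] != "paris"


completed = iterative_completion(
    {("alice", "marriedTo", "bob")},
    [symmetric_producer, noisy_producer],
    toy_checker,
)
```

In the toy run the symmetric triple is accepted in the first round, the noisy candidate is rejected in every round, and the loop stops once no new schema-correct triple appears.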
We flexibly assemble KGC pipelines from four types of triple producers, which learn new triples from the existing KG and a semantic reasoning service in an iterative manner and run in parallel or in series. The different types of methods operate independently, so each can focus on its strengths, and they benefit each other because new schema-correct triples are fed back into the next learning round. At the functional level, we integrate a literal-embedding-based method. We found that knowledge from pre-trained language models enhances schema-awareness in KGC tasks and, in particular, benefits schema-related KGC tasks such as type prediction. At the algorithm level, we implement a schema-aware sampling strategy for the embedding-based methods. Schema-correct triples are sampled as positive examples and schema-inconsistent triples are sampled as negative examples. We observed that this sampling method not only has a positive effect on producing schema-correct triples, but also improves performance in a downstream type prediction task. Unlike existing approaches, which focus only on the completeness of a Knowledge Graph, our combination of approaches successfully produces more new triples that are consistent with the schema of a Knowledge Graph.
Q2: How can we encourage models to produce more consistent triples while maintaining accuracy in link prediction? To tackle the second question, we optimize the combined KGC pipeline that addresses the first research question in two different directions. Firstly, we experiment with multiple data fusion methods, calibrating scores from different models and aggregating them into a final confidence score. We then use this fused score to re-rank the prediction results, leading to superior performance in the same link prediction task. Secondly, we extend the schema-aware negative sampling to open-world informative negative sampling.
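The score fusion step for Q2 (calibrate each model's scores, aggregate them, then re-rank) can be sketched in a few lines. The abstract does not say which calibration or aggregation methods were used, so min-max scaling and a weighted mean are assumptions chosen for brevity, and all names and scores below are hypothetical.

```python
from typing import Dict, List, Optional


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one model's scores to [0, 1]. Assumes a non-empty dict;
    if all scores are equal, every candidate gets 0.5."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.5 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse_and_rank(
    model_scores: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> List[str]:
    """Calibrate each model's scores, take a weighted mean per candidate,
    and return candidates from highest to lowest fused score.
    A candidate missing from a model contributes 0 for that model."""
    if weights is None:
        weights = [1.0] * len(model_scores)
    calibrated = [min_max(s) for s in model_scores]
    candidates = set().union(*(s.keys() for s in calibrated))
    total = sum(weights)
    fused = {
        c: sum(w * s.get(c, 0.0) for w, s in zip(weights, calibrated)) / total
        for c in candidates
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(fused, key=lambda c: (-fused[c], c))


# Hypothetical tail-entity scores from two models on different scales.
embedding_scores = {"madrid": 1.0, "paris": 2.0, "rome": 5.0, "berlin": 8.0}
rule_scores = {"madrid": 0.1, "berlin": 0.2, "rome": 0.7, "paris": 0.9}
ranking = fuse_and_rank([embedding_scores, rule_scores])
```

In this toy case neither model ranks `rome` first on its own, but it has the highest fused score because both models rate it fairly highly, which is the kind of effect a fusion step is meant to capture.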
In essence, we use the schema to identify consistent and inconsistent subsets, and leverage large language models to rank those negative samples within the consistent set that appear more likely to be incorrect. Our negative sampling strategy outperforms two closed-world assumption (CWA) based sampling strategies in terms of link prediction results.
Q3: In a real scenario with multiple complex data sources, how can we make a domain-specific KG evolve by combining data-driven and knowledge-based methods with regard to the schema of the KG? We explore an approach to obtaining and completing knowledge in a real scenario within a 5G network log system. This involves extracting and predicting knowledge from a segment of 5G network system logs and making the evolution of the knowledge graph an iterative process. We break this application work down into two sub-tasks: construction and completion. The construction process can be regarded as a triple producer based on log extraction. We combine language models and Horn rules from background knowledge to learn triples from arbitrary logs, facilitating knowledge representation and reasoning. We implement a local-to-global strategy for triple inference, which reduces the reasoning query space. Then, with the constructed log KG, the SICKLE system is applied in the completion process. We employ negative Horn rules from background knowledge and the KG schema to assure the quality of the learned triples. With the log extraction system, we build a log KG and demonstrate its capability in root cause analysis. With the SICKLE system, we enrich the log KG with more schema-consistent triples.
en
dc.description.abstract
2025-07-23
en
dc.identifier.uri
https://hdl.handle.net/1842/42015
dc.identifier.uri
http://dx.doi.org/10.7488/era/4737
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Wang, F., Bundy, A., Li, X., Zhu, R., Nuamah, K., Xu, L., . . . Pan, J. Z. (2023). Schema-aware Iterative Completion for Knowledge Graphs Revisited. Accepted by World Wide Web Journal.
en
dc.relation.hasversion
Wang, F., Bundy, A., Li, X., Zhu, R., Nuamah, K., Xu, L., . . . Pan, J. Z. (2021). LEKG: A System for Constructing Knowledge Graphs from Log Extraction. ACM International Conference Proceeding Series, (i), 181–185. https://doi.org/10.1145/3502223.3502250
en
dc.subject
knowledge graph
en
dc.subject
schema and consistency
en
dc.subject
Knowledge Graph Completion
en
dc.subject
KGC
en
dc.subject
KGC methods
en
dc.subject
schema-consistent triples
en
dc.title
Schema aware knowledge graph completions
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Wang2024.pdf
Size:
3.47 MB
Format:
Adobe Portable Document Format