Entity Matching for Person Records in Authority Files
Hebeis, Maximilian (2026): Entity Matching for Person Records in Authority Files, Bamberg: Otto-Friedrich-Universität, doi: 10.20378/irb-112660.
Author:
Hebeis, Maximilian
Publisher Information:
Bamberg: Otto-Friedrich-Universität
Year of publication:
2026
Pages:
Supervisor:
Language:
English
Remark:
Master's thesis, Otto-Friedrich-Universität Bamberg, 2025
DOI:
10.20378/irb-112660
Abstract:
Authority control is of increasing importance in multiple research fields. To ensure reusability as promoted by the FAIR principles, authority files provide persistent identifiers for named entities such as persons, organisations, or geographical locations. Initially conceived as controlled vocabularies in the library context, authority files have in recent years evolved into broader knowledge bases that hold additional information on the entity each authority record refers to. Reconciling authority files that contain records referring to the same entity remains a challenge. This thesis presents a case study of applying learning-based entity matching to person records from two large authority databases: the German national Integrated Authority File (GND) and the crowd-sourced open knowledge base Wikidata. A workflow is built to extract the person records from the RDF/N-Triples data dumps of both authority files and store them in MongoDB collections. Person records that already carry a link to a record in the other database are then used as source data to train, validate, and test learning-based classifiers. These classifiers are compared on how accurately they classify person record pairs as matching (referring to the same person) or non-matching. Two approaches are compared: classifiers based on classical machine learning and classifiers based on deep learning, namely a multilingual variant of the language model BERT. The results show that learning-based entity-matching architectures achieve high F1 scores on GND and Wikidata authority data; despite their semantic capabilities, the BERT-based approaches do not outperform the classical ML classifiers on the data used in this study.
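The abstract describes reusing records that already link to each other as labelled training data and scoring pair classifiers with F1. The Python sketch below illustrates that idea under stated assumptions; it is not the thesis's implementation. The function names, the random negative-sampling strategy, the toy records and IDs, and the string-similarity threshold are all hypothetical, and the similarity rule merely stands in for the classical ML and multilingual BERT classifiers actually compared in the study.

```python
import random
from difflib import SequenceMatcher


def build_pairs(links, seed=0):
    """Turn known GND-Wikidata links into labelled record pairs (hypothetical helper).

    links: list of (gnd_id, wikidata_id) tuples assumed to be true matches.
    Returns (gnd_id, wikidata_id, label) tuples: label 1 for each known link,
    label 0 for one randomly sampled non-linked Wikidata record per link.
    """
    rng = random.Random(seed)
    positives = set(links)
    wikidata_ids = sorted({w for _, w in links})
    pairs = [(g, w, 1) for g, w in links]
    for g, _ in links:
        candidates = [w for w in wikidata_ids if (g, w) not in positives]
        if candidates:  # skip if every Wikidata record is linked to g
            pairs.append((g, rng.choice(candidates), 0))
    return pairs


def name_similarity(a, b):
    """Case-insensitive character similarity in [0, 1] (placeholder feature)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def f1_score(y_true, y_pred):
    """F1 for the positive (matching) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy records: hypothetical IDs and names, for illustration only.
gnd_names = {"g1": "Ada Lovelace", "g2": "Alan Turing", "g3": "Grace Hopper"}
wikidata_names = {"q1": "Ada Lovelace", "q2": "Alan M. Turing", "q3": "Grace Hopper"}
links = [("g1", "q1"), ("g2", "q2"), ("g3", "q3")]

pairs = build_pairs(links)
y_true = [label for _, _, label in pairs]
# Hypothetical rule: predict "matching" when name similarity reaches 0.8.
y_pred = [int(name_similarity(gnd_names[g], wikidata_names[w]) >= 0.8) for g, w, _ in pairs]
print(f"{len(pairs)} pairs, F1 = {f1_score(y_true, y_pred):.2f}")
```

Randomly sampled negatives are usually easy to tell apart; an evaluation closer to real reconciliation might instead sample non-matching pairs with similar names. That is a design consideration for the sketch, not a claim about how the thesis built its data.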
GND Keywords:
Person
Normdatei (authority file)
Maschinelles Lernen (machine learning)
Deep Learning
Keywords:
Entity Matching
Entity Resolution
Authority Control
Authority File
BERT
Deep Learning
Machine Learning
GND
Wikidata
DDC Classification:
RVK Classification:
Type:
Master's thesis
Activation date:
January 28, 2026
Permalink:
https://fis.uni-bamberg.de/handle/uniba/112660