BibDedupe : An Open-Source Python Library for Bibliographic Record Deduplication
Wagner, Gerit (2025): BibDedupe : An Open-Source Python Library for Bibliographic Record Deduplication, in: Bamberg: Otto-Friedrich-Universität, pp. 1–6.
Faculty/Chair:
Author:
Publisher Information:
Year of publication:
2025
Pages:
Source/Other editions:
The journal of open source software : a developer friendly journal for research software packages, The Open Journal, 2024, vol. 9, no. 97, 6318, pp. 1–6, ISSN: 2475-9066
Year of first publication:
2024
Language:
English
Abstract:
BibDedupe is a Python library developed for bibliographic record deduplication in meta-analysis and research synthesis. It is constructed with a focus on four requirements: (1) Zero false positives: The primary objective is to prevent incorrectly merging distinct entries. This focus on zero false positives is crucial to ensure trustworthiness and prevent biased conclusions in the analysis. (2) Reproducibility: BibDedupe implements fixed rules to produce consistent results, in line with the scientific standard of reproducibility. (3) Efficiency: The library is also tuned for low false-negative rates and rapid processing, to ensure scalability of the duplicate identification process. (4) Continuous evaluation and improvement: It is continuously evaluated on over 160,000 records from 10 datasets to ensure its effectiveness, especially in follow-up refinements. Unlike general-purpose deduplication tools, BibDedupe is specifically designed for the unique requirements of bibliographic data in meta-analysis and research synthesis. In this context, BibDedupe aims to provide a Python library that improves the effectiveness and efficiency of duplicate identification, potentially benefitting review papers across scientific disciplines.
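To illustrate how the "zero false positives" and "reproducibility" requirements can shape a rule-based approach, here is a minimal sketch in plain Python. It is not BibDedupe's API or its actual rule set: the function names (normalize, is_duplicate, deduplicate), the record fields (id, title, year, journal, doi), and the sample records are all hypothetical. The rule is deliberately strict: two records are treated as duplicates only when every compared field agrees after normalization, and the outcome depends only on the input order, so repeated runs give the same result.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def is_duplicate(a: dict, b: dict) -> bool:
    """Hypothetical conservative rule: all compared fields must agree."""
    doi_a = a.get("doi", "").strip().lower()
    doi_b = b.get("doi", "").strip().lower()
    # When both records carry a DOI, the DOI alone decides.
    if doi_a and doi_b:
        return doi_a == doi_b
    title_a = normalize(a.get("title", ""))
    # An empty title never matches, to avoid merging distinct entries.
    return (
        bool(title_a)
        and title_a == normalize(b.get("title", ""))
        and a.get("year", "") == b.get("year", "")
        and normalize(a.get("journal", "")) == normalize(b.get("journal", ""))
    )


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each record, in input order."""
    kept: list[dict] = []
    for record in records:
        if not any(is_duplicate(record, other) for other in kept):
            kept.append(record)
    return kept


# Made-up sample records, for illustration only.
sample = [
    {"id": "r1", "title": "Record Linkage: A Short Study", "year": "2024",
     "journal": "Journal of Examples", "doi": "10.1000/example.1"},
    {"id": "r2", "title": "record linkage - a short study.", "year": "2024",
     "journal": "Journal of examples", "doi": ""},
    {"id": "r3", "title": "Record Linkage: A Short Study", "year": "2025",
     "journal": "Journal of Examples", "doi": ""},
]

print([r["id"] for r in deduplicate(sample)])  # ['r1', 'r3']
```

In this sketch r2 is merged into r1 because title, year, and journal agree once case and punctuation are normalized, while r3 is kept because its year differs. Preferring a missed duplicate over a wrong merge is the trade-off the first requirement describes; the pairwise comparison is quadratic and only meant for small illustrative inputs.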
Keywords:
BibDedupe
Type:
Article
Activation date:
November 10, 2025
Permalink:
https://fis.uni-bamberg.de/handle/uniba/110330