BibDedupe: An Open-Source Python Library for Bibliographic Record Deduplication
Summary
BibDedupe is a Python library developed for bibliographic record deduplication in meta-analysis and research synthesis. It is built around four requirements:
(1) Zero false positives: The primary objective is to avoid incorrectly merging distinct records. This is crucial for trustworthiness, because false merges can bias the conclusions of an analysis.
(2) Reproducibility: BibDedupe applies fixed rules to produce consistent results, in line with the scientific standard of reproducibility.
(3) Efficiency: The library is also tuned for low false-negative rates and fast processing, so that the duplicate identification process scales.
(4) Continuous evaluation and improvement: It is continuously evaluated on more than 160,000 records from 10 datasets to ensure its effectiveness, especially as it is refined further.
Unlike general-purpose deduplication tools, BibDedupe is designed specifically for the requirements of bibliographic data in meta-analysis and research synthesis. It thereby aims to improve the effectiveness and efficiency of duplicate identification, potentially benefiting review papers across scientific disciplines.
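The first two requirements (never merge distinct records, and use fixed rules so results are reproducible) can be illustrated with a small, self-contained sketch. This is NOT BibDedupe's actual algorithm or API: the `Record` class, the `normalize`, `is_duplicate`, and `find_duplicate_sets` functions, and the specific matching rule (same DOI plus same title, or same title, year, and journal when DOIs are missing) are hypothetical choices made only for illustration.

```python
import re
import unicodedata
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Record:
    # Hypothetical minimal record type; real bibliographic records carry more fields.
    id: str
    title: str
    year: str
    doi: str = ""
    journal: str = ""


def normalize(text: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def is_duplicate(a: Record, b: Record) -> bool:
    """Hypothetical fixed, conservative rule: merge only on strong agreement."""
    if a.doi and b.doi:
        # Differing DOIs are treated as evidence of distinct records.
        return a.doi.lower() == b.doi.lower() and normalize(a.title) == normalize(b.title)
    # Without two DOIs, require exact agreement on normalized title, year, and journal.
    return (
        normalize(a.title) == normalize(b.title)
        and a.year == b.year
        and normalize(a.journal) == normalize(b.journal)
    )


def find_duplicate_sets(records: list[Record]) -> list[list[str]]:
    """Group matching records with union-find; the output order is deterministic."""
    parent = {r.id: r.id for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All pairs are compared for simplicity (quadratic in the number of records).
    for a, b in combinations(records, 2):
        if is_duplicate(a, b):
            ra, rb = find(a.id), find(b.id)
            if ra != rb:
                # Always attach the larger root to the smaller one, so results never
                # depend on hash ordering.
                parent[max(ra, rb)] = min(ra, rb)

    groups: dict[str, set[str]] = {}
    for r in records:
        groups.setdefault(find(r.id), set()).add(r.id)
    # Sorted lists (not sets) keep repeated runs byte-for-byte identical.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    records = [
        Record("r1", "BibDedupe: An Open-Source Python Library", "2024", doi="10.21105/joss.06318"),
        Record("r2", "BibDedupe: an open-source Python library.", "2024", doi="10.21105/JOSS.06318"),
        Record("r3", "A Different Paper", "2024"),
    ]
    print(find_duplicate_sets(records))  # → [['r1', 'r2']]
```

Comparing all pairs does not scale; a practical tool would first restrict comparisons to candidate pairs (blocking). Union-find also merges transitively, so a stricter variant could require every pair within a cluster to match, accepting more missed duplicates in exchange for fewer wrong merges.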
Additional resources
- Code / source: https://github.com/CoLRev-Environment/bib-dedupe
Citation (APA style)
Wagner, G. (2024). BibDedupe: An open-source Python library for bibliographic record deduplication. Journal of Open Source Software, 9(97), 6318. https://doi.org/10.21105/JOSS.06318
Citation: BibTeX
@article{Wagner2024,
doi = {10.21105/JOSS.06318},
author = {Wagner, Gerit},
journal = {Journal of Open Source Software},
title = {BibDedupe: An Open-Source Python Library for Bibliographic Record Deduplication},
year = {2024},
volume = {9},
number = {97},
pages = {6318},
url = {https://joss.theoj.org/papers/10.21105/joss.06318},
fulltext = {https://joss.theoj.org/papers/10.21105/joss.06318.pdf},
abstract = {BibDedupe is a Python library developed for bibliographic record deduplication in meta-analysis and research synthesis. It is constructed with a focus on four requirements: (1) Zero false positives: The primary objective is to prevent incorrectly merging distinct entries. This focus on zero false positives is crucial to ensure trustworthiness and prevent biased conclusions in the analysis. (2) Reproducibility: BibDedupe implements fixed rules to produce consistent results, in line with the scientific standard of reproducibility. (3) Efficiency: The library is also tuned for low false-negative rates and rapid processing, to ensure scalability of the duplicate identification process. (4) Continuous evaluation and improvement: It is continuously evaluated on over 160,000 records from 10 datasets to ensure its effectiveness, especially in follow-up refinements. Unlike general-purpose deduplication tools, BibDedupe is specifically designed for the unique requirements of bibliographic data in meta-analysis and research synthesis. In this context, BibDedupe aims to provide a Python library that improves the effectiveness and efficiency of duplicate identification, potentially benefitting review papers across scientific disciplines.}
}
Citation: RIS
TY  - JOUR
AU  - Wagner, Gerit
TI  - BibDedupe: An Open-Source Python Library for Bibliographic Record Deduplication
T2  - Journal of Open Source Software
PY  - 2024
VL  - 9
IS  - 97
SP  - 6318
DO  - 10.21105/JOSS.06318
UR  - https://joss.theoj.org/papers/10.21105/joss.06318
ER  - 