BibDedupe: An Open-Source Python Library for Bibliographic Record Deduplication

Author

Wagner, Gerit

Published

2024

DOI

10.21105/JOSS.06318

Keywords

literature-review

Summary

BibDedupe is a Python library developed for bibliographic record deduplication in meta-analysis and research synthesis. It is constructed with a focus on four requirements: (1) Zero false positives: The primary objective is to prevent incorrectly merging distinct entries. This focus on zero false positives is crucial to ensure trustworthiness and prevent biased conclusions in the analysis. (2) Reproducibility: BibDedupe implements fixed rules to produce consistent results, in line with the scientific standard of reproducibility. (3) Efficiency: The library is also tuned for low false-negative rates and rapid processing, to ensure scalability of the duplicate identification process. (4) Continuous evaluation and improvement: It is continuously evaluated on over 160,000 records from 10 datasets to ensure its effectiveness, especially in follow-up refinements. Unlike general-purpose deduplication tools, BibDedupe is specifically designed for the unique requirements of bibliographic data in meta-analysis and research synthesis. In this context, BibDedupe aims to provide a Python library that improves the effectiveness and efficiency of duplicate identification, potentially benefitting review papers across scientific disciplines.
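
The first two requirements (no false positives, fixed and reproducible rules) can be illustrated with a minimal sketch. This is not BibDedupe's API or its actual rule set: the helper names `normalize`, `is_duplicate`, and `deduplicate`, the field names, and the sample records are all hypothetical, and the rules are deliberately simplistic. The sketch only shows the general idea of a deterministic matcher that merges two records when the evidence is strong and otherwise keeps them apart.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def is_duplicate(a: dict, b: dict) -> bool:
    """Hypothetical fixed rule: merge only on strong agreement."""
    doi_a = a.get("doi", "").lower()
    doi_b = b.get("doi", "").lower()
    # If both records carry a DOI, the DOI alone decides.
    # Conflicting DOIs are treated as evidence of distinct records.
    if doi_a and doi_b:
        return doi_a == doi_b
    # Otherwise require a non-empty, identical normalized title
    # and an identical, known publication year.
    title_a = normalize(a.get("title", ""))
    title_b = normalize(b.get("title", ""))
    year_a = a.get("year")
    return (
        bool(title_a)
        and title_a == title_b
        and year_a is not None
        and year_a == b.get("year")
    )


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record of each duplicate group, in input order."""
    unique: list[dict] = []
    for record in records:
        if not any(is_duplicate(record, kept) for kept in unique):
            unique.append(record)
    return unique


# Illustrative records only (not taken from any evaluation dataset).
records = [
    {"id": "1", "title": "A Study of Things", "year": "2020", "doi": "10.1000/ABC"},
    {"id": "2", "title": "A study of things.", "year": "2020", "doi": "10.1000/abc"},
    {"id": "3", "title": "Editorial", "year": "2023"},
    {"id": "4", "title": "Editorial", "year": "2024"},
    {"id": "5", "title": "Editorial.", "year": "2023"},
]

print([r["id"] for r in deduplicate(records)])
```

Records 1 and 2 share a DOI and are merged; records 3 and 5 agree on normalized title and year and are merged; record 4 has the same generic title but a different year, so it is kept separate. Because the rules are fixed and involve no randomness or training step, the same input always yields the same output.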

Citation (APA style)

Wagner, G. (2024). BibDedupe: An Open-Source Python Library for Bibliographic Record Deduplication. Journal of Open Source Software, 9(97), 6318. https://doi.org/10.21105/JOSS.06318

Citation: BibTeX

@article{Wagner2024,
  doi        = {10.21105/JOSS.06318},
  author     = {Wagner, Gerit},
  journal    = {Journal of Open Source Software},
  title      = {BibDedupe: An Open-Source Python Library for Bibliographic Record Deduplication},
  year       = {2024},
  volume     = {9},
  number     = {97},
  pages      = {6318},
  url        = {https://joss.theoj.org/papers/10.21105/joss.06318},
  fulltext   = {https://joss.theoj.org/papers/10.21105/joss.06318.pdf},
  abstract   = {BibDedupe is a Python library developed for bibliographic record deduplication in meta-analysis and research synthesis. It is constructed with a focus on four requirements: (1) Zero false positives: The primary objective is to prevent incorrectly merging distinct entries. This focus on zero false positives is crucial to ensure trustworthiness and prevent biased conclusions in the analysis. (2) Reproducibility: BibDedupe implements fixed rules to produce consistent results, in line with the scientific standard of reproducibility. (3) Efficiency: The library is also tuned for low false-negative rates and rapid processing, to ensure scalability of the duplicate identification process. (4) Continuous evaluation and improvement: It is continuously evaluated on over 160,000 records from 10 datasets to ensure its effectiveness, especially in follow-up refinements. Unlike general-purpose deduplication tools, BibDedupe is specifically designed for the unique requirements of bibliographic data in meta-analysis and research synthesis. In this context, BibDedupe aims to provide a Python library that improves the effectiveness and efficiency of duplicate identification, potentially benefitting review papers across scientific disciplines.}
}

Citation: RIS

TY  - JOUR
AU  - Wagner, Gerit
TI  - BibDedupe: An Open-Source Python Library for Bibliographic Record Deduplication
T2  - Journal of Open Source Software
PY  - 2024
VL  - 9
IS  - 97
SP  - 6318
DO  - 10.21105/JOSS.06318
UR  - https://joss.theoj.org/papers/10.21105/joss.06318
ER  -