Assessing data quality – A probability-based metric for semantic consistency

Authors

Heinrich, Bernd

Klier, Mathias (ORCID: 0000-0001-7109-0339)

Schiller, Alexander

Wagner, Gerit (ORCID: 0000-0003-3926-7717)

Published

2018

Doi

10.1016/J.DSS.2018.03.011

Keywords

data-management, student-paper

Summary

We present a probability-based metric for semantic consistency using a set of uncertain rules. Unlike existing metrics for semantic consistency, our metric can take into account rules that are expected to be fulfilled with specific probabilities. The resulting metric values represent the probability that the assessed dataset is free of internal contradictions with regard to the uncertain rules and thus have a clear interpretation. The metric values are determined on the basis of statistical tests and the concept of the p-value, which allows each metric value to be interpreted as a probability. We demonstrate the practical applicability and effectiveness of the metric in a real-world setting by analyzing a customer dataset of an insurance company. There, the metric was applied to identify semantic consistency problems in the data and to support decision-making, for instance, when offering individual products to customers.
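
To make the idea of an uncertain rule more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single rule that is expected to hold with a given probability among the records it applies to, and it scores the observed number of fulfilled records with an exact two-sided binomial test. The summary only states that the metric builds on statistical tests and the p-value, so the choice of test, the function names (`binom_pmf`, `two_sided_binomial_p_value`, `rule_consistency`), the example rule, and the toy records are all assumptions made for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binomial_p_value(k: int, n: int, p: float) -> float:
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probabilities = [binom_pmf(i, n, p) for i in range(n + 1)]
    # A small relative tolerance guards against floating-point ties.
    threshold = probabilities[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probabilities if q <= threshold))


def rule_consistency(records, applies, fulfilled, expected_p: float) -> float:
    """Score one uncertain rule (hypothetical helper).

    applies(record)   -> True if the rule's condition holds for the record
    fulfilled(record) -> True if the rule's conclusion holds for the record
    expected_p        -> probability with which the rule is expected to hold
    """
    relevant = [r for r in records if applies(r)]
    k = sum(1 for r in relevant if fulfilled(r))
    return two_sided_binomial_p_value(k, len(relevant), expected_p)


if __name__ == "__main__":
    # Invented toy data: "customers under 18 are single", assumed to hold
    # with probability 0.95 among the records the rule applies to.
    customers = (
        [{"age": 16, "marital_status": "single"}] * 17
        + [{"age": 17, "marital_status": "married"}] * 3
        + [{"age": 45, "marital_status": "married"}] * 30
    )
    score = rule_consistency(
        customers,
        applies=lambda r: r["age"] < 18,
        fulfilled=lambda r: r["marital_status"] == "single",
        expected_p=0.95,
    )
    print(f"consistency score for the rule: {score:.4f}")
```

Under these assumptions, a score close to 1 indicates that the observed fulfillment rate is compatible with the expected probability, while a score close to 0 points to a possible consistency problem. How the published metric combines several rules or chooses its tests is not covered by this sketch.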

Article / DOI link: https://doi.org/10.1016/J.DSS.2018.03.011

Citation (APA style)

Heinrich, B., Klier, M., Schiller, A., & Wagner, G. (2018). Assessing data quality – A probability-based metric for semantic consistency. Decision Support Systems, 110, 95–106. https://doi.org/10.1016/J.DSS.2018.03.011

Citation: BibTeX

@article{HeinrichKlierSchillerEtAl2018,
  doi        = {10.1016/J.DSS.2018.03.011},
  author     = {Heinrich, Bernd and Klier, Mathias and Schiller, Alexander and Wagner, Gerit},
  journal    = {Decision Support Systems},
  title      = {Assessing data quality – A probability-based metric for semantic consistency},
  year       = {2018},
  volume     = {110},
  pages      = {95--106},
  url        = {https://www.sciencedirect.com/science/article/pii/S0167923618300599},
  abstract   = {We present a probability-based metric for semantic consistency using a set of uncertain rules. As opposed to existing metrics for semantic consistency, our metric allows to consider rules that are expected to be fulfilled with specific probabilities. The resulting metric values represent the probability that the assessed dataset is free of internal contradictions with regard to the uncertain rules and thus have a clear interpretation. The theoretical basis for determining the metric values are statistical tests and the concept of the p-value, allowing the interpretation of the metric value as a probability. We demonstrate the practical applicability and effectiveness of the metric in a real-world setting by analyzing a customer dataset of an insurance company. Here, the metric was applied to identify semantic consistency problems in the data and to support decision-making, for instance, when offering individual products to customers.},
  news_announced = {2026-02-22}
}

Citation: RIS

TY  - JOUR
AU  - Heinrich, Bernd
AU  - Klier, Mathias
AU  - Schiller, Alexander
AU  - Wagner, Gerit
TI  - Assessing data quality – A probability-based metric for semantic consistency
T2  - Decision Support Systems
PY  - 2018
VL  - 110
SP  - 95
EP  - 106
DO  - 10.1016/J.DSS.2018.03.011
UR  - https://www.sciencedirect.com/science/article/pii/S0167923618300599
ER  -
