Data Cleansing


Data cleansing (also called data scrubbing) comprises various methods for removing and correcting data errors in databases or other information systems. Such errors may consist, for example, of incorrect (wrong from the outset or outdated), redundant, inconsistent, or wrongly formatted data.
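As an illustration of correcting wrongly formatted data, here is a minimal sketch in Python. The list of date formats and the function names `normalize_whitespace` and `normalize_date` are assumptions made for this example; real data will usually need its own rules.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: these are the date notations that occur in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def normalize_whitespace(value: str) -> str:
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def normalize_date(value: str) -> Optional[str]:
    """Return the date in ISO 8601 form (YYYY-MM-DD), or None if no format matches."""
    cleaned = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None
```

Values that match none of the assumed formats come back as `None` rather than being guessed, so they can be reviewed by hand instead of being silently "corrected".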

Key steps in data cleansing are duplicate detection (identifying and merging identical records) and data fusion (combining and completing incomplete data).
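The two steps can be sketched together in Python. This is a simplified illustration, not a standard algorithm: the matching rule in `match_key` (same name and e-mail address, ignoring case and surrounding whitespace) and the field names `name` and `email` are assumptions, and practical duplicate detection often relies on fuzzier similarity measures.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple:
    """Assumed matching rule: records are duplicates if name and e-mail agree."""
    name = (record.get("name") or "").strip().lower()
    email = (record.get("email") or "").strip().lower()
    return (name, email)


def fuse(records: list) -> dict:
    """Data fusion: for each field, keep the first non-empty value found."""
    merged: dict = {}
    for record in records:
        for field, value in record.items():
            if merged.get(field) in (None, ""):
                merged[field] = value
    return merged


def deduplicate(records: list) -> list:
    """Duplicate detection: group records by match key, then fuse each group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [fuse(group) for group in groups.values()]
```

For example, two records for the same person, one with an empty phone field and one with a phone number, are merged into a single record that carries the phone number.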

Data cleansing contributes to improving information quality. However, information quality also covers many other properties of data sources (credibility, relevance, availability, cost, ...) that data cleansing cannot improve.

Wikimedia Foundation.


