Adrian
2 min read · Jan 19, 2021


I have five important comments related to your ‘data downtime’ KPI as a metric for measuring data quality. First, it appears to measure the performance of a process and can therefore hardly be considered a basis for measuring the intrinsic quality of the data. Consistency, completeness, accuracy, uniqueness, conformity and integrity are intrinsic data quality dimensions (aka variables in your post), while timeliness and accessibility are extrinsic dimensions, as they depend more on the infrastructure than on the data themselves. This delimitation is important because the two perspectives imply different types of actions and approaches. It makes sense to split them when considering KPIs.

Secondly, downtime refers to the time data are out of action or unavailable for use. Unless that is addressed by design, namely the defective data are not shown, the data remain available, with all the consequences deriving from this. Therefore, the metric can easily create confusion.

Thirdly, unless you are referring to a data product or system, I don’t think the KPI is a good measure because it is sensitive to data growth: if the data volume increases, the value will most likely increase considerably as well.
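
A rough sketch of that sensitivity, assuming the incident count scales with the monitored volume (the numbers and the per-million incident rate are made up purely for illustration):

```python
# Made-up numbers: if incidents scale roughly with the monitored volume,
# a count-times-duration metric grows with the data, not with its quality.

avg_incident_hours = 2.0
incidents_per_million_rows = 3   # assumed constant incident rate

for rows_million in (10, 50, 100):
    downtime = rows_million * incidents_per_million_rows * avg_incident_hours
    print(f"{rows_million}M rows -> {downtime:.0f} h of 'data downtime'")
# 60 h, 300 h, 600 h: the quality stayed the same, only the volume grew.
```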

Fourthly, multiplying the number of incidents by something that can take large values carries the risk that the number of incidents disappears into those large values. A few outliers are enough to impact your metric considerably, even if overall data quality is acceptable for the business.
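
To make this concrete, here is a rough Python sketch; the formula (summing detection-plus-resolution time over incidents) and the numbers are my own assumptions, not necessarily how you compute the KPI:

```python
# Rough sketch, assuming data downtime is the sum over incidents of
# (time to detect + time to resolve). Numbers are made up for illustration.

incidents_hours = [0.5, 0.5, 1.0, 0.5, 72.0]  # one outlier incident of 3 days

downtime = sum(incidents_hours)               # 74.5 hours in total
outlier_share = max(incidents_hours) / downtime

print(f"Total downtime: {downtime:.1f} h")
print(f"Share caused by a single outlier: {outlier_share:.0%}")
# ~97% of the metric comes from one incident, although the other four
# incidents were detected and resolved within the hour.
```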

This brings me to the fifth remark: data quality is best defined as “fitness for use”, and this is context dependent. The KPIs need to consider this aspect, otherwise the metric doesn’t have much meaning for the business.

I’m not sure what you mean by ‘traditional methods’, because there seems to be no generally accepted approach to measuring data quality, even if Six Sigma provides a good basis to build upon, as it considers the defects in relation to the opportunities. This approach handles data growth better and allows using all kinds of statistical and non-statistical tools that come with Six Sigma. It can be time consuming and prone to rule changes, though that’s the reality.
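
To sketch what I mean by relating defects to opportunities, here is a rough DPMO (defects per million opportunities) calculation in Python; the numbers and the choice of five checked fields per record are assumptions made for illustration:

```python
# Rough Six Sigma DPMO sketch, with made-up numbers purely for illustration.

records = 2_000_000          # number of records checked
opportunities_per_record = 5 # e.g. fields or rules checked per record (assumption)
defects = 1_500              # rule violations found

dpmo = defects / (records * opportunities_per_record) * 1_000_000
print(f"DPMO: {dpmo:.0f}")   # 150 defects per million opportunities

# Because it is a rate over opportunities, the figure stays comparable
# as the data volume grows, unlike a raw count or a time-based metric.
```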
