Technology Record - Issue 23: Winter 21/22

www.technologyrecord.com

How can I make sure that data actually supports the business objectives of our customers? As chief technology officer, I dedicate a lot of time to thinking about how to solve data health problems such as this. Data health involves tools, of course, but also processes and people. It concerns every employee who has contact with data, so the approach to data health must be pervasive.

Data quality is essential to data health. Traditionally, data originates from human entries or the integration of third-party data, both of which are prone to errors and extremely difficult to control. In addition, data that works beautifully for its intended applications can give rise to objective quality problems when extracted for another use – typically analytics. Outside of its intended environment, the data is removed from the business logic that puts it into context, and from the habits, workarounds and best practices of regular users, which often go undocumented. So even when a data format or content is not objectively a quality issue within its original silo, it will almost certainly become one when extracted and combined with others for an integration or an analytics project.

Academic research describes up to 10 data quality dimensions, but, in practice, there are five that are critical to most users: completeness, timeliness, accuracy, consistency and accessibility. Each of these dimensions corresponds to a challenge for an analytics group: if the data doesn’t provide a clear and accurate picture of reality, it will lead to poor decisions, missed opportunities, increased costs or compliance risks. This makes measuring data quality a complex, multidimensional problem.

Data quality assessment must be a continuous process, as more data flows into the organisation all the time. Traditionally, data quality assessment has been done on top of the applications, databases, data lakes or data warehouses where data lives.
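As a rough illustration of how some of these dimensions might be measured in practice, the sketch below computes three simple ratios over a handful of records. Everything in it is an assumption made for the example: the sample records, the field names, the country code list, the one-year freshness threshold and the function names are hypothetical, and none of it reflects how any particular product works.

```python
from datetime import datetime, timedelta

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "country": "GB", "updated": datetime(2022, 1, 10)},
    {"id": 2, "email": None, "country": "UK", "updated": datetime(2021, 6, 1)},
    {"id": 3, "email": "li@example.com", "country": "FR", "updated": datetime(2022, 1, 12)},
    {"id": 4, "email": "sam@example.com", "country": "", "updated": datetime(2020, 3, 5)},
]

# Assumed reference list of accepted country codes.
VALID_COUNTRIES = {"GB", "FR", "DE", "US"}


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def timeliness(rows, field, as_of, max_age):
    """Share of rows whose timestamp is no older than max_age at as_of."""
    fresh = sum(1 for r in rows if as_of - r[field] <= max_age)
    return fresh / len(rows)


def consistency(rows, field, allowed):
    """Share of rows whose value belongs to the agreed code list."""
    ok = sum(1 for r in rows if r.get(field) in allowed)
    return ok / len(rows)


as_of = datetime(2022, 1, 15)
metrics = {
    "completeness(email)": completeness(records, "email"),
    "timeliness(updated)": timeliness(records, "updated", as_of, timedelta(days=365)),
    "consistency(country)": consistency(records, "country", VALID_COUNTRIES),
}

for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Each metric is a share between 0 and 1, so the same checks can be rerun whenever new data arrives and compared over time. Accuracy and accessibility are left out because they usually need an external source of truth or knowledge of who can reach the data, which a self-contained snippet cannot represent.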
Many data quality products must collect data in their own system before they can run the assessment like an audit, as part of a data governance workflow. A more modern approach is pervasive data quality, integrated directly into the data supply chain. The more upstream the assessment is made, the earlier risks are identified, and the less costly the remediation will be.

The assessment of data quality typically starts by observing the data and computing the relevant data quality metrics. But companies should also be looking at quality metrics that can be aggregated across dimensions, such as the Talend Trust Score. Static or dynamic reports, dashboards and drill-down explorations that focus on data quality issues and how to resolve them (not to be confused with business intelligence) provide perspective on overall data quality. For more fine-grained insight, issues will be

Solving the data problem

Understanding how best to use data and how to allow employees to work with it takes a pervasive and continuous approach

KRISHNA TAMMANA: TALEND

VIEWPOINT

“Data quality improvement is a balance between tools, processes and people”