What became clear while writing my recent Journal article, “Data Rights: Single vs. Multiple Ownership?,” is that data ownership rights, data source and data quality are critical to any data analytics and visualization, at least to a reasonable person, because they form the basis on which important decisions, simple or complex, are made today. What remains open is a consensus on the degree to which data ownership rights, data source and data quality matter. In other words, could data with 75% accuracy give me a close enough approximation to data with 90-95% accuracy? As you may know, it is difficult to have 100% clean data all the time, even in a well-maintained system, because of lag times and failures or errors in interfaces, data ingestion, mapping and syndication. The aim of this blog post, therefore, is to share some thoughts on understanding data rights, data source and data quality in practical terms.
In my Journal article, I provided a practical guide to help researchers and practitioners make plausible determinations and assumptions about data ownership rights, and to ascertain how those rights relate to data quality and accurate data analytics.
Multiple data ownership characterizes most data today, and current affairs offer a ready example. Take the Johns Hopkins Coronavirus Resource Center COVID-19 map, whose data sources are listed as the World Health Organization (WHO), Centers for Disease Control and Prevention (CDC), European Centre for Disease Prevention and Control (ECDC), National Health Commission of the People’s Republic of China (NHC), ncov.dxy.cn (DXY), coronavirus.1point3acres.com (Global COVID-19 Tracker and Interactive Charts), Worldometers.info, and the COVID Tracking Project (testing and hospitalizations across the United States). In this case, the information and the visualization provided by the Coronavirus Resource Center can reasonably be said to come from multiple data owners. The issue is that any inaccuracy, inadvertent or intentional, in the data from any one of these sources could taint the quality of the overall data and the information being provided.
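To make that concern concrete, here is a minimal sketch of how one weaker source can pull down the accuracy of pooled data. It assumes that each source contributes a known number of records and that a known fraction of those records is correct; the source names, record counts and accuracy figures are hypothetical and are not measurements of any of the sources listed above.

```python
def combined_accuracy(sources):
    """Return the share of accurate records after pooling all sources.

    `sources` maps a hypothetical source name to a tuple of
    (record_count, accuracy), where accuracy is the fraction of that
    source's records assumed to be correct.
    """
    total_records = sum(count for count, _ in sources.values())
    if total_records == 0:
        raise ValueError("at least one source must contribute records")
    accurate_records = sum(count * accuracy for count, accuracy in sources.values())
    return accurate_records / total_records


# Illustrative figures only, not measurements of any real data source.
pooled = {
    "source_a": (1000, 0.95),
    "source_b": (1000, 0.95),
    "source_c": (1000, 0.75),
}

print(f"{combined_accuracy(pooled):.1%}")  # 88.3%
```

Under these assumptions, two sources at 95% accuracy and one at 75% yield pooled data that is about 88% accurate. Whether that is close enough depends on the decision the data is meant to support.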
On the other hand, most data with single data ownership seem to be of better quality because they are usually primary data collected for a specific purpose, and inaccuracies in such data may derail the purpose for which they are being collected. For example, the information that a user or an organization supplies for an online transaction needs to be correct for the transaction to be completed, from purchase to delivery. Likewise, medical information from a patient requires some level of accuracy; otherwise, the diagnosis, prognosis or treatment of the patient may be in jeopardy.
Editor’s note: For further insights on this topic, read Patrick Offor’s recent Journal article, “Data Rights: Single vs. Multiple Ownership?,” ISACA Journal, volume 3, 2020.