Since the creation of the first pictograms, humankind has struggled with how to govern the use of data.
The dawning of the computing era in the 1960s unlocked incredible capabilities for the creation and storage of data, and ever since, data professionals have wrestled with the challenge of classifying, verifying, and securing that data.
Billions of dollars have been spent on technologies for organising data – databases, warehouses, marts, lakes, and a pantheon of tools for data management and manipulation. Yet numerous surveys indicate that most organisations continue to suffer from data quality issues that affect their performance.
According to the advanced analytics and AI leader for the Southern Hemisphere at GHD Digital, Sarah Dods, good data governance means that you are informed and in control of your data systems.
"Without data governance and a system in place where somebody is the custodian of that data system and they know intimately where it has come from and what’s in it, the ability to really get caught up in unexpected situations is pretty high," Dods said.
According to Dr Ian Oppermann, former chief data scientist for NSW, data governance is best viewed in the context of the data lifecycle: collection, transmission, storage, preparation, processing, usage, archiving, and deletion. However, he said, organisations often stumble at the first step.
"Almost every data set arrives at your doorstep with a lack of information about how it got to you. Even when you put great care into capturing that information, there is a whole lot of metadata that is missing, and if you don’t have much metadata then it is a really risky prospect to use that data," Oppermann said.
Significant work is under way to improve understanding of the problems of data classification, and Oppermann has been involved in creating two new standards. The first, ISO/IEC 5207:2024, sets out terminology and use cases for data use, sharing, and exchange, while the second, ISO/IEC 5212:2024, provides high-level guidance to help organisations and individuals realise the benefits of data usage while managing the risks.
"We took a simplistic approach," Oppermann said.
"We asked what do we need to know about the data before we use it, and what do we need to understand about the data as we use it? And once a data product is created, what are the guidance, restrictions, or prohibitions that need to be put around those data products so that when they start their own data lifecycle, we know what needs to be carried with it as it travels on its journey."
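Oppermann's two questions, what to know before using data and what must travel with a derived data product, can be illustrated with a small Python sketch. It is not drawn from ISO/IEC 5207 or 5212; the `DataProduct` class, its fields, and the restriction labels are hypothetical. A product with no recorded provenance is treated as unfit to use, and every derived product inherits its parent's lineage and restrictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A data set bundled with the metadata that should travel with it (hypothetical)."""
    name: str
    rows: tuple
    provenance: tuple = ()                 # upstream sources, oldest first
    restrictions: frozenset = frozenset()  # conditions on use, e.g. "aggregate-only"

    def is_fit_to_use(self) -> bool:
        # Before use: with no record of where the data came from, treat it as too risky.
        return len(self.provenance) > 0

    def derive(self, name, transform, extra_restrictions=()):
        # As the derived product starts its own lifecycle, it inherits the
        # parent's lineage and every existing restriction, plus any new ones.
        return DataProduct(
            name=name,
            rows=tuple(transform(row) for row in self.rows),
            provenance=self.provenance + (self.name,),
            restrictions=self.restrictions | frozenset(extra_restrictions),
        )


raw = DataProduct(
    "survey_raw",
    rows=({"age": 34},),
    provenance=("council-survey-2023",),
    restrictions=frozenset({"no-reidentification"}),
)
bands = raw.derive("survey_bands", lambda r: {"age_band": r["age"] // 10 * 10}, ["aggregate-only"])
print(bands.provenance)            # ('council-survey-2023', 'survey_raw')
print(sorted(bands.restrictions))  # ['aggregate-only', 'no-reidentification']
```

Using an immutable record means a restriction cannot be quietly dropped partway down the chain; the only way to produce a new product is through `derive`, which carries everything forward.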
For data practitioners, questions of data governance are driving new behaviours as earlier processes and considerations are superseded.
For Paul Ryan, strategic data and insights manager at DuluxGroup, the most prominent of these relate to personally identifiable information.
"One catalyst is no doubt the elevated corporate risk profile embedded in changes to Australian privacy legislation due later this year, but there is something more fundamental happening," Ryan said.
"Businesses that once hoarded the most personal customer data like gold bullion now willingly shed it after mining and cloning underlying patterns."
Similarly, privacy concerns over third-party tracking cookies have driven them close to extinction, prompting professionals to look for alternative approaches, though governance concerns may mean some of those alternatives are short-lived.
"Adtech solutions such as data cleanrooms – conjured to ward off a marketing dark age – are starting to look shaky," Ryan said.
"Like cookies, they circumvent the central feature of data privacy: individual consent.
"Adoption of specialist consent management platforms is set to explode, as companies search for simple templates to centralise the management and auditability of the contact data they capture, store, use, move and remove."
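Centralising consent and keeping it auditable can be sketched as a single registry that every use of contact data must consult, with an append-only log of each change and each check. This is a minimal illustration assuming a hypothetical `ConsentRegistry` class and made-up purpose labels; it does not model any particular consent management platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRegistry:
    """Hypothetical central store of consent decisions with an append-only audit trail."""
    consents: dict = field(default_factory=dict)   # (person_id, purpose) -> granted?
    audit_log: list = field(default_factory=list)  # (timestamp, action, person_id, purpose, outcome)

    def _log(self, action, person_id, purpose, outcome):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, action, person_id, purpose, outcome))

    def record(self, person_id, purpose, granted):
        self.consents[(person_id, purpose)] = granted
        self._log("record", person_id, purpose, granted)

    def may_use(self, person_id, purpose):
        # No record means no consent: absence is treated as refusal.
        allowed = self.consents.get((person_id, purpose), False)
        self._log("check", person_id, purpose, allowed)
        return allowed

    def remove(self, person_id):
        # Forget every consent held for this person, logging each removal.
        for key in [k for k in self.consents if k[0] == person_id]:
            del self.consents[key]
            self._log("remove", person_id, key[1], None)


registry = ConsentRegistry()
registry.record("cust-42", "email-marketing", granted=True)
print(registry.may_use("cust-42", "email-marketing"))  # True
print(registry.may_use("cust-42", "ad-targeting"))     # False: never asked
registry.remove("cust-42")
print(registry.may_use("cust-42", "email-marketing"))  # False: consent removed
```

Treating a missing record as a refusal keeps individual consent at the centre; a production system would also need to reconcile the audit trail itself with deletion requests.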
Ryan believes that, as with the gold rushes of the 19th century, the big winners of this 21st-century data gold rush will be the secondary and tertiary waves of pragmatic service providers who 'follow the rainbow' at a slight distance.
"The artificial layers they generate from our behavioural patterns will create fortunes for some and reality for all," he said.
However data governance is framed conceptually, it must be applied in practice to have any impact in the real world, and that often comes down to questions of ownership.
At ANZ, chief technology officer Tim Hogarth said the bank's decision to adopt a data mesh approach is enabling it to bring together data from different sources, with the teams that own each data set taking responsibility for making it accessible to other divisions as data products.
"That's a big, big shift from the traditional approach where the person who was consuming the data just grabbed it and then it was their problem to try and manage it," Hogarth said.
"The person who owns the data has the responsibility of publishing, and then the person who consumes that has the responsibility of adhering to the rules.
"You move to a form of computational governance, so that you've got compliance that's baked in. It's almost impossible for someone to misuse the data, because it's just essentially part of the schema or part of the permission set."
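The idea of compliance being "part of the schema or part of the permission set" can be sketched as a published data product whose columns carry classifications, with every read filtered against the consumer's permissions. This is an illustrative sketch only, not a description of ANZ's platform; the class names and the classification labels are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    classification: str  # hypothetical labels: "public", "internal", "pii"


@dataclass(frozen=True)
class PublishedProduct:
    """A data product published by its owning team with a classified schema."""
    owner: str
    schema: tuple
    rows: tuple

    def read(self, permissions):
        # Governance is enforced at read time: columns the consumer is not
        # cleared for are never returned, so the rules travel with the data.
        allowed = [c.name for c in self.schema if c.classification in permissions]
        return [{name: row[name] for name in allowed} for row in self.rows]


accounts = PublishedProduct(
    owner="retail-banking",
    schema=(
        Column("account_id", "internal"),
        Column("balance", "internal"),
        Column("customer_name", "pii"),
    ),
    rows=({"account_id": "A1", "balance": 120.0, "customer_name": "J. Citizen"},),
)
print(accounts.read({"internal"}))         # [{'account_id': 'A1', 'balance': 120.0}]
print(accounts.read({"internal", "pii"}))  # all three columns
```

The owner publishes once and each consumer's access is decided by the permission set rather than by their own care. In this Python sketch `rows` is still reachable directly; a real platform would enforce the same check in the storage or query layer.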
A related idea is the data fabric: an architectural framework of data services that standardise and automate data management practices and enable data to be used for both operational and analytical purposes.
The need to apply governance to data has been given greater urgency by the vital role data plays in the creation of AI systems, and the importance of ensuring that AI models are trained appropriately using suitable data that minimises the likelihood of errors, bias, or other negative outcomes.
The importance of data governance has also been given additional weight by increasing regulatory requirements for organisations to better understand their data and how it can be used – especially data containing the personal details of individuals.
For organisations struggling to find the skills to create a comprehensive data governance framework, a new market of Data Governance-as-a-Service (DGaaS) providers has emerged to take on some of the heavy lifting, offering cloud-based services that promise the necessary tools, processes, and expertise.
Grand View Research estimates that the global data governance market was worth US$3.35 billion ($5.05 billion) in 2023, and projects it to grow at a compound annual growth rate of 21.7 percent through 2030.
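To show what a 21.7 percent compound annual growth rate implies, here is a back-of-the-envelope calculation, assuming the rate compounds annually on the 2023 base for seven years (n = 7). The resulting figure of roughly US$13.2 billion is derived from the two quoted numbers, not taken from the Grand View Research report.

```latex
V_{2030} = V_{2023}\,(1 + r)^{n} = 3.35 \times (1.217)^{7} \approx 3.35 \times 3.954 \approx 13.2
```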
Another significant trend is for data governance to be considered much earlier in the development of data analytics and AI programs, an approach often described as 'shift left' governance.
According to the data education provider Dataversity, the goal of this approach is to both protect sensitive information and improve the overall quality of data collected.
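A shift-left check might look like the following sketch, which validates and minimises a record at the moment it is collected rather than cleaning it later. Dataversity describes the goals, not an implementation; the field names, the email pattern, and the masking rule here are all hypothetical.

```python
import re

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED = ("customer_id", "email", "postcode")  # hypothetical capture form


def capture(record):
    """Validate and minimise a record at the point of collection.

    Returns (clean_record, []) on success or (None, errors) on failure.
    """
    errors = [f"missing {name}" for name in REQUIRED if not record.get(name)]
    if record.get("email") and not EMAIL.match(record["email"]):
        errors.append("malformed email")
    if errors:
        # Quality problems are rejected here, before they reach downstream systems.
        return None, errors
    # Data minimisation: keep only the fields the form is meant to collect.
    clean = {name: record[name] for name in REQUIRED}
    # Mask the email in this analytics copy so the raw address is not stored.
    user, domain = clean["email"].split("@", 1)
    clean["email"] = user[0] + "***@" + domain
    return clean, []


print(capture({"customer_id": "c1", "email": "sam@example.com", "postcode": "3000", "dob": "1990-01-01"}))
# ({'customer_id': 'c1', 'email': 's***@example.com', 'postcode': '3000'}, [])
print(capture({"customer_id": "c2", "email": "not-an-email", "postcode": ""}))
# (None, ['missing postcode', 'malformed email'])
```

Both goals are served at the same step: the unrequested date of birth never enters storage, and the malformed record is rejected before any downstream model or report can consume it.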
But while questions of data governance are often offloaded onto data and technology professionals, GHD Digital's Dods argues it is imperative that organisations take a far broader perspective.
"Data is a business function, and IT is a support function," Dods said.
"The drivers are different, the mindsets are different, but the systems overlap. What I am seeing now is a really interesting, shared responsibility conversation between IT and data and analytics."
We are proud to present the State of Data champions and to showcase the work they do.