Data Governance Checklist
I’ve heard a lot about data governance, but I still didn’t understand what it meant to implement it on a practical level. According to Google, data governance is defined as:
Everything you do to ensure data is secure, private, accurate, available, and usable. It includes the actions people must take, the processes they must follow, and the technology that supports them throughout the data life cycle. 
Yes, you read that correctly. That’s about as generic a statement as you can get.
I found that the best way to understand data governance is not to break it down into discrete concepts but to make sure we check off all the boxes of an effective data governance program.
Has an organizational structure with different levels of data governance been established? Have the roles, responsibilities, and accountability for data decision making, management, and security been clearly defined and communicated?
In order to successfully implement data governance, you need to assign roles to the members of your organization. Here’s a list of possible positions along with their responsibilities:
- Executive sponsor: A senior employee who is ultimately responsible for implementing the data governance program and keeping its processes running. They act as the go-between for the most senior stakeholders (e.g. executives, board of directors) and the data governance lead or council.
- Data governance lead: A person responsible for all aspects of defining and operating the data governance policies and supporting the multiple data domains. Traditionally, this role tended to be the responsibility of the CIO or even the CTO. However, now, depending on the size of the organization, it’s recommended to have a dedicated position (i.e. Chief Data Officer or CDO).
- Data governance council: A governing body responsible for the strategic guidance of the data governance program, prioritization of projects and initiatives, approval of organization-wide data policies and standards, and increasing awareness of the data governance program. In essence, this body sets the strategic direction for WHAT the data governance program needs to accomplish and WHEN it needs to accomplish it. In contrast, the data governance lead decides HOW those goals should be accomplished.
- Data Owner: A data owner is usually a senior manager. You are likely to have a number of data owners within your data governance program, each of whom is responsible for the data within their own area of the business.
- Data Steward: Data stewards should be people who have a working knowledge of the data and understand how it is used by the business on a day-to-day basis. A data steward is generally appointed by the data owner to work with them or act as their representative in meetings.
- Data Stakeholder: A stakeholder in any data governance program is an individual or group that could affect, or be affected by, data governance decisions, processes, policies, and standards. The obvious examples of stakeholders are business analysts and data architects.
- Data Custodian: Data custodians are typically technical. They’re responsible for building and maintaining the systems (e.g. database, data lake, data warehouse) in accordance with the business’s requirements.
Have standard policies and procedures about all aspects of data governance and the data management lifecycle, including collection, maintenance, usage and dissemination, been clearly defined and documented?
For a concrete example, see Boston University’s Data Lifecycle Management Policy.
Have data governance policies and procedures been documented and communicated in an open and accessible way to all stakeholders, including staff, data providers, and the public?
Does the organization have a clearly documented set of policy, operational, and research needs that justify the collection of specific data elements?
The General Data Protection Regulation (GDPR) is a regulation in EU law on data protection and privacy. GDPR requires you to have a legal justification (a lawful basis) for collecting personal data. This justification should be clearly documented.
Does the organization have a detailed, up-to-date inventory of all data elements that should be classified as sensitive, PII, or both? Have data records been classified according to the level of risk for disclosure of PII?
The data can be classified either manually or automatically (many tools support this nowadays). Once data has been classified, we can apply the appropriate column-level access control lists (ACLs).
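To make the classify-then-restrict idea concrete, here is a minimal sketch in Python. It assumes classification is driven by column names only, and the patterns, tag names, and helper functions are all hypothetical. Commercial classifiers typically also sample the data itself, so treat this as an illustration rather than a substitute for those tools.

```python
import re

# Hypothetical name-based rules; real tools usually inspect the values as well.
PII_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "national_id": re.compile(r"ssn|social[-_]?security|passport", re.IGNORECASE),
    "name": re.compile(r"(first|last|full)[-_]?name", re.IGNORECASE),
}


def classify_columns(columns):
    """Map each column name to the first matching PII tag, or None."""
    tags = {}
    for column in columns:
        tags[column] = next(
            (tag for tag, pattern in PII_PATTERNS.items() if pattern.search(column)),
            None,
        )
    return tags


def readable_columns(tags, role_allowed_tags):
    """Columns a role may read: untagged ones plus explicitly allowed tags."""
    return [c for c, tag in tags.items() if tag is None or tag in role_allowed_tags]


tags = classify_columns(["customer_email", "order_total", "phone_number", "last_name"])
analyst_view = readable_columns(tags, role_allowed_tags=set())
support_view = readable_columns(tags, role_allowed_tags={"email"})
```

Here an analyst with no PII tags allowed would only see `order_total`, while a support role allowed the `email` tag would also see `customer_email`.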
Have mechanisms been put in place to de-identify PII data whenever possible?
Even if you have a legitimate reason for holding on to Personally Identifiable Information (PII), more often than not, you’re better off anonymizing the data or avoiding collecting it altogether, as there can be severe repercussions in the case of a breach.
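One common de-identification mechanism is to replace direct identifiers with a keyed hash. The sketch below uses Python's standard `hmac` and `hashlib` modules; the field names, the placeholder key, and the helper functions are assumptions for illustration. Note that this is pseudonymization rather than full anonymization: anyone holding the key can re-link records, so the output should still be treated as personal data.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def deidentify(record, pii_fields, secret_key):
    """Return a copy of the record with the listed fields pseudonymized."""
    return {
        field: pseudonymize(value, secret_key) if field in pii_fields else value
        for field, value in record.items()
    }


# Placeholder key; in practice it would come from a secrets manager.
key = b"replace-with-a-managed-secret"
record = {"email": "Jane.Doe@example.com", "country": "CA", "order_total": 42.5}
safe = deidentify(record, pii_fields={"email"}, secret_key=key)
```

Because the same input always maps to the same digest under a given key, the pseudonymized column can still be used for joins and counts without exposing the raw value.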
Does the organization conduct regular data quality audits to ensure that its strategies for enforcing quality control are up-to-date and that any corrective measures undertaken in the past have been successful in improving data quality?
In practice, this might mean generating metrics around data quality and interviewing users.
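As a rough illustration of what such metrics could look like, the sketch below computes completeness per field and a duplicate-key rate over a list of rows. The metric names and the sample rows are assumptions, not an established standard; a real audit would track these numbers over time and compare them against agreed thresholds.

```python
def quality_metrics(rows, required_fields, key_field):
    """Compute simple completeness and uniqueness metrics over dict rows."""
    total = len(rows)
    if total == 0:
        return {"row_count": 0, "completeness": {}, "duplicate_key_rate": 0.0}
    # Share of rows where each required field is present and non-empty.
    completeness = {
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / total
        for field in required_fields
    }
    # Share of rows whose key repeats an earlier row's key.
    keys = [row.get(key_field) for row in rows]
    duplicate_key_rate = 1 - len(set(keys)) / total
    return {
        "row_count": total,
        "completeness": completeness,
        "duplicate_key_rate": duplicate_key_rate,
    }


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": None},
]
metrics = quality_metrics(rows, required_fields=["id", "email"], key_field="id")
```

For the sample rows, the `email` field is filled in for half of the records and one in four rows reuses an existing `id`.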
Are there policies and procedures in place to restrict and monitor staff data access, limiting what data can be accessed by whom?
We should assign differentiated levels of access based on job descriptions and responsibilities.
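A minimal sketch of role-based access, assuming hypothetical roles and dataset names. Access is denied unless a role is explicitly granted an action, and every decision is recorded so that access can be monitored later. In practice this logic usually lives in the database or an identity provider rather than in application code.

```python
# Hypothetical roles and datasets, for illustration only.
ROLE_PERMISSIONS = {
    "analyst": {"sales_aggregates": {"read"}},
    "data_steward": {"sales_aggregates": {"read"}, "customer_records": {"read"}},
    "data_custodian": {
        "sales_aggregates": {"read", "write"},
        "customer_records": {"read", "write"},
    },
}

access_log = []  # in-memory stand-in for an audit trail


def is_allowed(role, dataset, action):
    """Deny by default; allow only actions explicitly granted to the role."""
    allowed = action in ROLE_PERMISSIONS.get(role, {}).get(dataset, set())
    access_log.append((role, dataset, action, allowed))
    return allowed


analyst_reads_customers = is_allowed("analyst", "customer_records", "read")
steward_reads_customers = is_allowed("data_steward", "customer_records", "read")
unknown_role_writes = is_allowed("intern", "sales_aggregates", "write")
```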
Have internal procedural controls been established to manage user access to PII?
Prior to letting members of the organization access PII, to minimize the risk of a leak, we should take the appropriate precautions such as:
- Security screenings
- Confidentiality agreements
Has a comprehensive security framework been developed, including administrative, physical, and technical procedures for addressing data security issues?
Your organization should have policies in place that minimize cyber security threats. These might include:
- Minimum password requirements
- Antivirus software
- Regular software updates
- Staff training (e.g. phishing awareness, locking unattended computers)
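As one example from the list above, a minimum password requirement can be expressed as a small check. The length threshold, the tiny blocklist, and the function name are assumptions chosen for illustration; a real policy would use a much larger list of known-compromised passwords and whatever rules your security team has agreed on.

```python
MIN_LENGTH = 12  # assumed threshold
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein"}  # illustrative subset


def password_problems(password, username=""):
    """Return a list of policy violations; an empty list means the password passes."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if password.lower() in COMMON_PASSWORDS:
        problems.append("appears on the common-password list")
    if username and username.lower() in password.lower():
        problems.append("contains the username")
    return problems


weak = password_problems("qwerty")
strong = password_problems("correct horse battery staple")
personal = password_problems("alice-likes-long-walks", username="alice")
```

Returning the list of problems, rather than a plain yes or no, makes it easier to tell users what to fix.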
Does the organization regularly monitor or audit data security?
Every company should have a Security Operations Centre (SOC). The function of a SOC is to monitor, prevent, detect, investigate, and respond to cyber threats around the clock. In addition to monitoring using tools like Splunk, the SOC might conduct audits (e.g. penetration testing). These audits can be conducted automatically using software or manually by a certified professional.
Have policies and procedures been established to ensure the continuity of data services in an event of a data breach, loss, or other disaster?
Your organization should have an incident management playbook that outlines the steps to take in the event of a data breach. These might include:
- Fix vulnerabilities that may have led to the breach
- Stop additional data loss by immediately taking affected systems offline
- Notify the appropriate parties: under GDPR, a personal data breach must be reported to the relevant supervisory authority within 72 hours of becoming aware of it, unless the breach is unlikely to result in a risk to the individuals concerned
A designated incident response team should be responsible for taking the preceding actions. The team isn’t limited to the members of the SOC. It should include members from legal, human resources, and public relations, as well as external experts.
Data loss isn’t always caused by a malicious actor. It could also be the result of human error or a natural disaster. Backing up data on a regular basis is a reliable way to protect against data loss. When defining a backup policy, we need to consider the following:
- Recovery point objective: the maximum acceptable amount of data loss after an unplanned data-loss incident, expressed as an amount of time (e.g. the business can afford to lose the last hour’s worth of data).
- Recovery time objective: the maximum tolerable length of time that a system can be down after a failure or disaster.
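The two objectives above translate into simple checks on a backup plan. The sketch below assumes periodic backups, so the worst case is a failure just before the next backup, which loses one full interval of data. The function names and the sample durations are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Worst-case data loss equals one backup interval, so it must fit within the RPO."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time, rto):
    """The time needed to restore service must fit within the RTO."""
    return estimated_restore_time <= rto


rpo = timedelta(hours=1)  # assumed: at most one hour of data may be lost
rto = timedelta(hours=4)  # assumed: service must be back within four hours

half_hourly_ok = meets_rpo(timedelta(minutes=30), rpo)
nightly_ok = meets_rpo(timedelta(hours=24), rpo)
restore_ok = meets_rto(timedelta(hours=2), rto)
```

With a one-hour recovery point objective, backups every 30 minutes are sufficient while nightly backups are not. The restore estimate is only trustworthy if restores are actually rehearsed.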
Are policies in place to guide decisions about data exchanges and reporting, including sharing data with educational institutions, researchers, policymakers, parents, and third-party contractors?
For instance, I worked for an organization where all customer data had to be shared using a designated application and not via email.
Whenever someone comes in to work from outside the organization, we must ensure they sign the document(s) outlining how the data should be handled. We may also require them to take training.