Metadata lifecycle management guideline
Final | October 2019 | v1.0.0 | OFFICIAL - Public | QGCIO
Introduction
Purpose
A Queensland Government Enterprise Architecture (QGEA) guideline provides information for Queensland Government agencies on the recommended practices for a given topic area. Guidelines are generally for information only and agencies are not required to comply. They are intended to help agencies understand the appropriate approach to addressing a particular issue or doing a particular task.
This document provides guidance to Queensland Government agencies which rely on metadata to facilitate effective data management, discovery and use. The guideline is structured around the typical phases of the information lifecycle and provides a recommended approach for managing metadata throughout each phase. As well as defining what metadata is and why its effective management is important, this guideline describes details of typical activities, expected outcomes and relevant stakeholders associated with each lifecycle phase to assist agencies to better manage their metadata holdings.
The intent of this document is to establish a point of reference from which agencies can formally develop specific policies, standards and procedures which meet agency business requirements for managing metadata throughout the lifecycle.
Audience
This document is primarily intended for:
- Metadata creators
- Information asset custodians
- Information management specialists
- Record keepers and archivists
- Enterprise architects
- Information architects
- Business analysts
- System administrators
- Data analysts and researchers
- Website developers
- Metadata consumers
Scope
In scope
All metadata currently produced, collected, managed, stored, published or shared by Queensland Government departments, either by manual or automated processes.
Out of scope
The following are out of scope of the current guideline:
- specific guidance on the selection of appropriate metadata schemas for particular subject domains
- recommendations regarding the expected useful life of metadata.
Background
Along with other data and information, it is important that agencies actively manage metadata throughout its lifecycle in order to maintain relevance, accuracy and currency. As increasingly large amounts of data and information are produced and collected, effective metadata management helps to ensure the ongoing ability of agencies to understand, process, integrate, maintain and manage their data, systems and workflows. Metadata documents agency knowledge about its data and provides a consistent reference source to help users understand what data the agency holds, what that data represents and how it can be used.
Because metadata plays a vital role in relation to data and information discovery and use, it is important that agencies put strategies in place to ensure metadata is managed throughout its lifecycle. Without metadata management there can be no data management, which will negatively impact the ability of an agency to effectively use and reuse its data and information.
As Queensland Government agencies create, collect, use and store increasingly large amounts of data, the role of metadata and its effective management becomes increasingly important.
Metadata fundamentals
Metadata is often defined as data about data. However, in an increasingly complex data landscape, such a narrow definition risks reducing the perceived value of metadata to both business and technical users.
In addition to simply describing data, metadata also assists users (either human or machine) to understand business and technical processes, constraints on data usage, data quality, data security and data lineage. According to the Data Management Body of Knowledge (DMBoK), metadata describes:
- the data itself (e.g. data elements, data models)
- the concepts represented by the data (e.g. business processes, technological infrastructure)
- the connections between the concepts and the data (e.g. relationships)
Therefore, metadata has the potential to provide an agency not only with an understanding of its data, but also its systems and its workflows, which is essential for both effective data management and use.
Without reliable metadata, an agency will struggle to effectively manage its information. Metadata provides the means for an agency to identify what data it has, where the data originated and how it flows through systems. It helps to define data quality, determine access rights and provides the organisational context to enable data to be located from a variety of starting points.
A metadata schema defines a comprehensive set of metadata elements for a dataset including any required fields, field types, definitions and data structures. Schemas provide procedural rules which ensure a standardised approach to both metadata creation and use, which in turn facilitates discoverability, interoperability and access.
There are many well-established schemas which have been endorsed as standards for certain types of data and disciplines such as library science, education, archiving, e-commerce and the arts. Some examples of standards include DCAT for open data and ANZLIC for geospatial data.
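The role of a schema described above can be illustrated with a minimal sketch. The element names, types and definitions below are hypothetical and are not drawn from DCAT, ANZLIC or any other endorsed standard; an agency would substitute the elements of the schema it has adopted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ElementRule:
    """One metadata element in the schema (all names are hypothetical)."""
    name: str
    field_type: type
    required: bool
    definition: str


# Hypothetical dataset-level schema, for illustration only.
DATASET_SCHEMA = [
    ElementRule("title", str, True, "Name by which the dataset is known"),
    ElementRule("description", str, True, "Plain English summary of the content"),
    ElementRule("custodian", str, True, "Role responsible for the dataset"),
    ElementRule("last_updated", date, False, "Date the dataset was last refreshed"),
]


def validate(record: dict, schema: list[ElementRule]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    known = {rule.name for rule in schema}
    for rule in schema:
        if rule.name not in record:
            if rule.required:
                problems.append(f"missing required element: {rule.name}")
            continue
        if not isinstance(record[rule.name], rule.field_type):
            problems.append(f"{rule.name}: expected {rule.field_type.__name__}")
    # Elements outside the schema are reported so that records stay standardised.
    for name in record:
        if name not in known:
            problems.append(f"unknown element: {name}")
    return problems
```

Running a check of this kind at the point of metadata creation is one possible way to apply the schema's rules consistently across creators and systems.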
A data dictionary contains field level definitions of the data elements of a database or metadata schema. It provides both users and creators additional context about the information each field can and should contain and is useful to maintain both system integrity and consistency. A data dictionary may also contain information such as data relationships, origins, usage and formats.
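As a rough illustration of a data dictionary, the sketch below holds field-level definitions, formats and origins for two fields and uses them to check individual values. The fields, formats and allowed values are assumptions made for the example and do not describe any actual agency system.

```python
import re

# Hypothetical data dictionary: one entry per field, keyed by field name.
DATA_DICTIONARY = {
    "licence_number": {
        "definition": "Identifier issued to a licence holder",
        "format": r"\d{8}",  # exactly eight digits
        "origin": "Licensing system",
    },
    "status": {
        "definition": "Current state of the licence",
        "allowed_values": {"current", "expired", "suspended"},
        "origin": "Licensing system",
    },
}


def check_value(field: str, value: str) -> bool:
    """Check a value against the dictionary entry for its field."""
    entry = DATA_DICTIONARY[field]  # raises KeyError for undocumented fields
    if "allowed_values" in entry and value not in entry["allowed_values"]:
        return False
    if "format" in entry and re.fullmatch(entry["format"], value) is None:
        return False
    return True
```

For example, `check_value("status", "current")` returns `True`, while `check_value("licence_number", "1234")` returns `False` because the value does not match the documented format.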
Metadata can be applied at a number of different levels. For example, metadata may be applied to:
- Component parts of a data object (sub-item level) such as scenes in a movie, images on a webpage, chapters in a book or tables in a relational database.
- A data object (item level) such as a book, a spreadsheet, a record or a webpage.
- A group of data objects (collection level) such as a library or archival collection, a database or an information asset.
Agencies should determine the most appropriate level/s for the application of metadata, in line with their business requirements, user needs and technical capabilities. This should be performed with an understanding of agency responsibilities to share and publish metadata about their services, data and information assets.
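The three levels can be pictured as a nested structure in which each level carries its own metadata. The sketch below is illustrative only; the class, field names and sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Described:
    """Anything that can carry metadata: a collection, an item or a sub-item."""
    name: str
    level: str  # "collection", "item" or "sub-item"
    metadata: dict = field(default_factory=dict)
    parts: list["Described"] = field(default_factory=list)


def walk(node, depth=0):
    """Yield (depth, level, name) for a node and everything beneath it."""
    yield depth, node.level, node.name
    for part in node.parts:
        yield from walk(part, depth + 1)


# Hypothetical example: a library collection holding one book with one chapter.
library = Described(
    "Reference library", "collection", {"custodian": "Library services"},
    parts=[
        Described("Annual report", "item", {"author": "Agency"}, parts=[
            Described("Chapter 1", "sub-item", {"pages": 12}),
        ]),
    ],
)
```

An agency that only needs collection-level metadata would populate the top level alone, whereas richer discovery requirements may justify describing items and their component parts as well.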
Metadata is often categorised into various types which can help users understand the nature of information metadata contains and the functions it serves. Some types of metadata are categorised according to where the metadata originates (e.g. business, technical, operational), whereas some metadata is categorised according to how it is used (e.g. descriptive, structural, administrative).
Business metadata
Business metadata provides a business context to other data and therefore typically uses non-technical language to define data concepts, subjects, entities and attributes. It describes, in plain English, the content of a data asset and how to locate it, and tends to be less structured than technical metadata. This category of metadata may include information which is useful for business decision making (such as business requirements, process flows and business operations) framed in terms that are relevant to the business. Business metadata is of particular value to business users but may also be used by technical and operational staff. Business metadata may include:
- Business rules
- Data quality rules
- Business terms dictionary
- Data governance and data lineage
- Data stewards/owners
- Value constraints
- Security/privacy constraints
- Data usage notes
Technical metadata
Technical metadata provides context around the technical (or internal) details of data, including systems, processes and data movements. In essence, it describes the digital characteristics of a data asset or how systems function. This category of metadata provides additional information about data structure and storage as well as the applications and processes used to manipulate the data. Technical metadata is useful for digital object management and operability because it describes the form, size and structure of a data asset. Technical metadata may include:
- Database column and table names
- Column properties
- Access permissions
- File format schema definitions
- Data lineage (upstream and downstream) documentation
- Recovery and backup rules
- Data access rights, groups and roles
Operational metadata
Operational metadata contains information regarding the processing and accessing of data, including data lineage, quality and provenance. This type of metadata describes the processes and events that occur within operational systems and the data objects which are affected. It is useful for tracking access and use of the data and may also identify how often the data is updated or refreshed. Operational metadata may include:
- Details of batch programs or job logs
- Results of audits
- Error logs
- Patches and version maintenance
- Data archiving and retention rules
- Technical roles and responsibilities
- Data sharing rules and agreements
Descriptive metadata
Descriptive metadata describes an asset for the purpose of identification and retrieval and is therefore useful for discovery, assessment and identification. This type of metadata underpins the ability of users to browse, search, sort and filter information, and is typically produced by content creators using standardised attributes (e.g. title, abstract, author, keywords, unique identifiers) to describe assets. Descriptive metadata may include:
- Catalogue records
- Finding aids
- Differentiations between versions
- Specialised indexes
- Curatorial information
- Hyperlinked relationships between resources
- Annotations by creators and users
Structural metadata
Structural metadata describes how the content of an asset can be used, reused and combined to form new assets and is therefore useful to match content with the precise needs of users. It describes how objects are organised and the relationships within and among resources and their component parts (e.g. whether an asset is part of a single collection or multiple collections, the number of pages per chapter, the structure of database objects). Structural metadata facilitates navigation and display of digital objects and helps to describe the relationship between two objects. Examples of structural metadata include:
- Table of contents
- Chapters and parts
- Indexes
- Page numbers
- HTML tagging
Administrative metadata
Administrative metadata is used to help manage a resource throughout its lifecycle and may include technical information (e.g. file type, format, encryption keys, passwords), preservation information (e.g. data refresh details, documentation of physical condition) and rights information (e.g. related to intellectual property and licensing). Examples of administrative metadata may include:
- Creative Commons licence
- Permissions management
- Acquisition information
- Rights and reproduction tracking
- Documentation of legal access requirements
- Location information
- Selection criteria for digitisation
Metadata management
To be data driven, an organisation should manage its data with metadata, and that metadata itself should in turn be managed. Regardless of types and uses of metadata, metadata management decisions should be driven by business requirements. All agencies will have different business drivers for managing metadata, which may vary across the organisation and across data repositories. Some common business drivers for implementing a consistent and structured approach to metadata management may include:
Understanding and describing existing data holdings
As well as providing a rich source of information about the context, history and origin of a data asset, metadata can also be used as a tool to help identify, locate and catalogue existing agency data holdings. Metadata is also useful in helping people from different parts of the organisation to identify differences and similarities between data assets. Understanding what data your agency holds is crucial to being able to use it efficiently and is also the first step to implementing an effective data governance strategy.
Facilitating data use and re-use
In order to use data appropriately and effectively, it must first be understood. Because metadata should accurately and consistently represent the content of data, it provides users with a level of confidence and understanding regarding both what the data is, and what it can be used for. Metadata can also help identify and enable multiple uses for the same data, such as strategic information within an agency, or information sharing between agencies.
Effective data governance
Data governance is concerned with maximising the value of data by exercising authority and control over data management practices. Effective data governance is underpinned by a consistent approach to metadata which promotes efficiency as well as knowledge of where data is located, what it means and what protections it requires. Metadata plays a critical role in relation to data governance because it is the key to describing an organisation's data and business processes, as well as their relationship to each other.
Increased confidence in data quality
Data quality is highly dependent on data governance, which in turn depends on effective metadata management. Because metadata describes data elements in terms of a controlled vocabulary (or data dictionary), it provides structure and consistency to those creating metadata as well as confidence to consumers regarding how the data can be used and whether it is fit for the intended purpose.
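One way a controlled vocabulary can bring consistency to metadata creation is by mapping the keywords people enter to a small set of preferred terms. The sketch below assumes a hypothetical vocabulary of two terms; the terms and their variants are invented for illustration.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants.
VOCABULARY = {
    "transport": {"transport", "transportation", "transit"},
    "health": {"health", "healthcare", "health care"},
}

# Reverse lookup built once so each keyword check is a single dictionary access.
PREFERRED = {
    variant: term for term, variants in VOCABULARY.items() for variant in variants
}


def normalise_keywords(keywords):
    """Map keywords to preferred terms; collect any that are not in the vocabulary."""
    accepted, rejected = [], []
    for keyword in keywords:
        term = PREFERRED.get(keyword.strip().lower())
        if term is None:
            rejected.append(keyword)
        elif term not in accepted:
            accepted.append(term)
    return accepted, rejected
```

Keywords that fall outside the vocabulary are returned separately rather than silently dropped, so that a metadata creator or steward can decide whether the vocabulary needs to be extended.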
Enhanced discoverability
Metadata management can aid discoverability of data both within and between agencies by ensuring that it is described accurately, consistently and completely. This allows potential users, whether internal to the agency, external to the agency or members of the community, to discover, understand and request access to the data they require. If compiled into a data catalogue at the dataset level, metadata can act in a similar way to a library catalogue, allowing potential users to understand all relevant information (update frequency, security classification, licensing conditions etc.) required to access and use the desired information.
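A dataset-level catalogue of the kind described above could be sketched as a list of metadata records that users search by term and, optionally, filter by security classification. The entries, field names and classification values below are assumptions for illustration and do not represent an actual agency catalogue.

```python
# Hypothetical dataset-level catalogue entries; every field name is an assumption.
CATALOGUE = [
    {
        "title": "Road closures",
        "keywords": ["transport", "roads"],
        "update_frequency": "daily",
        "security_classification": "OFFICIAL",
        "licence": "CC BY 4.0",
    },
    {
        "title": "Bus timetables",
        "keywords": ["transport", "public transport"],
        "update_frequency": "monthly",
        "security_classification": "OFFICIAL",
        "licence": "CC BY 4.0",
    },
    {
        "title": "Staff incident reports",
        "keywords": ["safety", "transport"],
        "update_frequency": "weekly",
        "security_classification": "SENSITIVE",
        "licence": "Not licensed for reuse",
    },
]


def search(catalogue, term, classification=None):
    """Return titles whose title or keywords contain the term (case-insensitive)."""
    term = term.lower()
    results = []
    for entry in catalogue:
        haystack = [entry["title"].lower()] + [k.lower() for k in entry["keywords"]]
        if not any(term in text for text in haystack):
            continue
        if classification and entry["security_classification"] != classification:
            continue
        results.append(entry["title"])
    return results
```

A search is only as good as the metadata behind it: a dataset whose keywords were omitted or entered inconsistently would not be returned, which is why accurate, consistent and complete description matters for discovery.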
Supporting data analytics
Precise data analytics relies on data which is both accurate and appropriate for the task at hand. Reliable and complete metadata, including consistent definitions of data elements, provides a level of confidence to those undertaking data analytics activities that the data they are analysing is fit for the intended purpose. Metadata provides a level of assurance to data analysts that the data they are using is not incorrect, out of date or unreliable.
Enabling compliance
Reliable and well-managed metadata can help to ensure regulatory compliance in relation to agency-specific legislation as well as the Information Privacy Act and the Right to Information Act. Metadata can help to ensure that private data is adequately protected, and that information requested through the Right to Information (RTI) process can be readily located within the designated timeframes. Effective metadata management can also assist agencies to meet the requirements of a range of other QGEA policies such as Information access and use (IS33), Information security policy (IS18:2018), the Queensland Government Information Security Classification Framework (QGISCF) and the Records governance policy.
Improving operational efficiency
Ensuring the effective management of metadata has the potential to produce a range of operational efficiencies, such as streamlined workflows and improved communication, particularly between data consumers and IT professionals. In addition, metadata management may facilitate the identification of redundant data and processes, reduce the amount of money spent on data storage and support better data-driven decision making within agency business units.
As well as facilitating the realisation of the business benefits outlined above, effective metadata management can also help agencies avoid some of the risks associated with poor data management. These may include:
- Errors in judgement due to incorrect or incomplete knowledge about the data
- The inadvertent exposure of sensitive data or data misuse
- Loss of organisational knowledge about agency data due to lack of documentation
- Reliance on old or obsolete data
- Increased cost of data storage and management due to duplicated or redundant data
- Lack of consumer confidence in data due to incomplete or conflicting metadata
- Doubt about the reliability of data and/or metadata
- Poor decision making or increased time for decision making
As with other types of data and information, it is important that metadata is appropriately managed to ensure its ongoing relevancy and usefulness. Because it makes sense to manage metadata in association with the data and information it describes, the phases of the information asset lifecycle will be used as the basis for outlining metadata lifecycle management activities.
The objective of information asset lifecycle management is to optimise information asset acquisition, maximise the use of the information asset and reduce associated service and operational costs. Similarly, the objective of metadata lifecycle management is to optimise understanding of data and information, maximise its appropriate use and reuse and reduce agency time and effort in relation to managing, locating and understanding data and information holdings.
The lifecycle demonstrates typical activities and key business objectives of metadata management as it relates to the information lifecycle, from defining a metadata strategy through to archiving or disposal of metadata as required.
An overview of the metadata management activities conducted in each phase of the lifecycle is described in Table 1. These activities can be applied to all types of metadata. However, as stated in the Metadata management principles, not all metadata is of equal value, and agencies are therefore encouraged to take a value-based approach to metadata management. This allows effort and energy to be focused on managing the metadata associated with data and information which has the most business value to the agency, the Queensland Government and ultimately the people of Queensland.
Table 1 contains typical activities required to manage metadata throughout the information lifecycle. These activities are outlined in association with expected outcomes of the management activities and details of the potential stakeholders who may be involved. Agencies should develop formal processes, procedures and training to ensure that metadata lifecycle management activities can be effectively executed and controlled to meet business requirements.
Lifecycle phase | Activities | Outcomes | Stakeholders |
---|---|---|---|
Plan | | | |
Construct, create, acquire | | | |
Commission, organise, store | | | |
Access | | | |
Use | | | |
Assess | In conjunction with usage and in accordance with your data and information management strategy assess: | | |
Maintain | Based on the assessment phase, apply appropriate management strategies. These may include: | | |
Retire | | | |