What Does Good Data Look Like? | Butterfly Data

Data Quality

What does good data look like?

The quality of your data directly determines the quality of your decisions — and ultimately, your organisation's success. Discover the framework that separates trusted data from noise.


Data is everywhere.
But is yours good?

Every organisation holds vast amounts of data — used, with varying degrees of effectiveness, to draw insights and guide decisions. Yet the value of those insights is only as strong as the quality of the data behind them.

The key challenge lies in identifying which data is essential to your organisation's operations. This begins with understanding the decisions that need to be made and the specific data required to support them — then defining minimum quality standards for each dataset.

As Butterfly Data has observed through extensive work in this area, poor data inevitably leads to poor decision-making. But the process of assessing data quality is itself valuable: it reveals where gaps exist, which formats work best, and what users truly need.


Butterfly Data · What Does Good Data Look Like? 2025

Six dimensions of data quality

The Data Management Association (DAMA) framework, recommended as best practice by the UK government, defines six dimensions for assessing data quality. Together they provide a comprehensive picture of what good data looks like.

01
Completeness

No fields are illegitimately missing across a dataset. Missing data can appear as truly empty fields or as placeholders such as null, N/A, or 0. Some blank fields are legitimate — a conditionally mandatory field only needs a value when another field triggers it.

Example

If 'UK Born?' is 'N', then 'Country of Birth' must be populated. If 'UK Born?' is 'Y', a blank country field is legitimate — not a quality failure.
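A conditional-completeness rule like this can be expressed as a simple check. This is a minimal sketch assuming hypothetical field names (`uk_born`, `country_of_birth`) and an assumed set of placeholder tokens:

```python
# Placeholder tokens that count as "missing" (assumed; agree yours organisation-wide).
PLACEHOLDERS = {"", "null", "n/a", "na", "0"}

def country_of_birth_complete(record: dict) -> bool:
    """Country of Birth is mandatory only when UK Born? is 'N'."""
    country = str(record.get("country_of_birth", "")).strip().lower()
    missing = country in PLACEHOLDERS
    if record.get("uk_born") == "N":
        return not missing   # must be populated when not UK-born
    return True              # a blank field is legitimate when UK Born? is 'Y'
```

The point is that completeness is rule-driven: a blank field only fails the check when another field makes it mandatory.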

02
Validity

Values are reasonable and conform to defined rules for that field — covering length, format, permitted characters, and allowed values. Rules can be externally defined (e.g. UK postcodes) or set internally by your organisation (e.g. department codes).

Example

A UK phone number must be exactly 11 digits, start with 0, and contain only numeric characters. Multiple date formats (21/12/1995, May 12 2005, 12-4-01) in one column signal a validity failure.
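Both rules above are straightforward to encode as patterns. A sketch, assuming ISO 8601 (yyyy-mm-dd) as the agreed date format for the column:

```python
import re

# Exactly 11 digits, leading 0, numeric only.
UK_PHONE = re.compile(r"^0\d{10}$")
# One agreed date format for the column (ISO 8601 assumed here).
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def is_valid_uk_phone(value: str) -> bool:
    """Check a value against the UK phone-number validity rule."""
    return bool(UK_PHONE.match(value))

def column_format_consistent(values, pattern=ISO_DATE) -> bool:
    """A column passes only when every value matches the agreed format."""
    return all(pattern.match(v) for v in values)
```

Externally defined rules (postcodes, phone numbers) and internal ones (department codes) can all be captured the same way, as a pattern or permitted-value list per field.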

03
Consistency

Data aligns with other records within the same dataset or across different datasets. An address must correspond to its postcode area. A person's stated birthplace must not contradict their country of origin recorded elsewhere.

Example

A record showing 'UK Born?: Y' and 'Country of Birth: Germany' fails on consistency — Germany is not part of the UK, so the fields contradict each other.
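A cross-field consistency rule for this example might look like the following sketch. The field names and the set of UK country labels are assumptions for illustration:

```python
# Assumed labels for "part of the UK"; align with your reference data.
UK_COUNTRIES = {"United Kingdom", "England", "Scotland", "Wales", "Northern Ireland"}

def birth_fields_consistent(record: dict) -> bool:
    """'UK Born?' and 'Country of Birth' must not contradict each other."""
    country = record.get("country_of_birth")
    if record.get("uk_born") == "Y":
        return country in UK_COUNTRIES
    return country not in UK_COUNTRIES
```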

04
Accuracy

The data reflects reality — the most important and most difficult dimension to assess. Accuracy can sometimes be verified through common sense checks, or by comparing against authoritative external sources such as Companies House or banking records.

Example

An adult patient's weight recorded as 50g is clearly inaccurate. A company name that doesn't match Companies House records may cause issues in tax administration.
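A common-sense accuracy check for the weight example can be sketched as a range test. The plausible range (30 to 300 kg) is an assumption chosen for illustration, not a clinical standard:

```python
def plausible_adult_weight(value: float, unit: str = "kg") -> bool:
    """Sanity-check an adult weight against an assumed plausible range (30-300 kg)."""
    grams_per_unit = {"g": 1, "kg": 1000}
    kg = value * grams_per_unit[unit] / 1000
    return 30 <= kg <= 300
```

Checks like this catch clear unit errors (50 g instead of 50 kg); verifying against an authoritative source such as Companies House still requires an external lookup.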

05
Timeliness

Information is available when needed. Critical operational datasets may require real-time feeds; analytical datasets may only need annual refreshes. Data quality deteriorates as circumstances change — stale data can be as damaging as inaccurate data.

Example

A housing association allocating properties needs live data updates. Relying on ten-year-old census data to estimate current populations leads to poor planning decisions.

06
Uniqueness

No record is duplicated in a way that introduces conflicting information. Duplication goes beyond wasted storage — it creates unreliable, contradictory records. Unique identifiers are invaluable for detection; composite keys handle legitimate historical duplicates.

Example

A National Insurance number appearing twice with conflicting data likely means one record is outdated. An analyst must use capture date metadata to determine which entry is reliable.
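Resolving duplicates by capture date can be sketched as follows, assuming each record carries its unique identifier and a comparable capture timestamp (ISO date strings here):

```python
def latest_per_ni_number(records):
    """Keep only the most recently captured record per National Insurance number."""
    latest = {}
    for rec in records:
        ni = rec["ni_number"]
        # ISO 8601 date strings compare correctly as plain strings.
        if ni not in latest or rec["captured"] > latest[ni]["captured"]:
            latest[ni] = rec
    return list(latest.values())
```

Where no single identifier exists, the same approach works with a composite key (e.g. name plus date of birth) as the dictionary key.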

Everyone who touches the data

Just as everyone is responsible for data security, everyone is responsible for data quality. If you work with data, you are responsible for assessing whether it is fit for purpose — and there must be organisational processes to address issues when they arise.

When should you assess?

In an ideal world, data is validated at collection, on load, and monitored continuously. In practice, a tiered approach works best:

  • Critical or regulatory data → assess monthly or quarterly
  • Valuable but non-critical data → assess annually
  • New data entering systems → validate at point of ingestion
  • After major organisational changes → reassess affected datasets
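The tiered schedule above can be encoded as a simple due-date check. Tier names and cadences here are illustrative, matching the longest interval each tier allows:

```python
from datetime import date

# Hypothetical tier names mirroring the schedule above.
ASSESSMENT_CADENCE_DAYS = {
    "critical": 90,    # critical or regulatory data: monthly or quarterly
    "valuable": 365,   # valuable but non-critical data: annually
}

def assessment_due(tier: str, last_assessed: date, today: date) -> bool:
    """True when a dataset's last assessment is older than its tier allows."""
    return (today - last_assessed).days >= ASSESSMENT_CADENCE_DAYS[tier]
```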

Recognise the warning signs

Poor quality data
  • Missing fields that shouldn't be missing
  • Multiple formats in the same column
  • Duplicated records with conflicting values
  • Values that simply don't make sense
Good quality data
  • Accurate and reflects reality
  • Consistent across datasets
  • Up-to-date and regularly maintained
  • Meaningful values for every field

How to improve data quality at source

1

Use dropdown forms for addresses

Replace free-text address fields with standardised dropdown or postcode-lookup inputs to eliminate format inconsistencies at the point of entry.

2

Offer calendar pickers for dates

A calendar selector removes the ambiguity of free-text date fields entirely — no more confusion between dd/mm/yyyy and mm/dd/yyyy formats.

3

Enforce mandatory field completion

Disable form submission until all required fields are complete, and provide clear inline validation guidance when validity criteria aren't met.

4

Standardise null representations

Establish an organisation-wide standard for empty or unknown values — differentiating between numeric and text fields — so missing data is always obvious and consistent.
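A normalisation step like this can be sketched as below. The placeholder tokens and the numeric sentinel are assumptions to be replaced by your organisation-wide standard:

```python
# Assumed text placeholders and an assumed numeric sentinel; agree these organisation-wide.
NULL_TOKENS = {"", "null", "n/a", "na", "none"}
NUMERIC_SENTINELS = {"-999"}

def normalise_missing(value, numeric: bool = False):
    """Map ad-hoc missing-value placeholders to one standard marker (None)."""
    text = str(value).strip().lower()
    if text in NULL_TOKENS or (numeric and text in NUMERIC_SENTINELS):
        return None
    return value
```

Note that 0 is deliberately not treated as missing for numeric fields here, since zero is often a legitimate value; that distinction is exactly why numeric and text fields need separate standards.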

5

Automate assessment of existing datasets

Large datasets require automated scripts or COTS tools managed by data quality experts. Start by identifying poor-quality records, then apply comprehensive remediation to bring the data up to standard.
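The identification step can start with something as simple as a per-field completeness profile. A minimal sketch over a list-of-dicts dataset, with an assumed placeholder set:

```python
def profile_completeness(rows, placeholders=frozenset({"", "null", "n/a"})):
    """Return the completeness rate (0.0-1.0) for each field across a dataset."""
    counts, filled = {}, {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + 1
            if str(value).strip().lower() not in placeholders:
                filled[field] = filled.get(field, 0) + 1
    return {field: filled.get(field, 0) / counts[field] for field in counts}
```

Extending the same loop with validity patterns and consistency rules per field turns this into a basic automated assessment across all six dimensions.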

Start with what matters most

The MoSCoW method divides data needs into four categories: Must haves, Should haves, Could haves, and Won't haves. This allows critical datasets to be prioritised for assessment, while providing a hierarchy for less-essential data.

The most challenging aspect is agreeing on shared data standards. Best practice is to define organisation-wide baselines — covering formats for dates, addresses, country codes, and identifiers — while allowing teams to use more granular data where necessary.

The DAMA framework is recommended by the UK government as best practice for data quality assessment and management. Its guidance provides a flexible, comprehensive framework for evaluating data quality across different datasets and contexts.

Whether you choose Commercial Off-The-Shelf (COTS) tools or a custom-built solution, implementation varies by organisational size and in-house technical capabilities. COTS tools reduce development time; bespoke solutions offer greater flexibility. Butterfly Data can help you evaluate both options.

Transform your data quality — and your decisions

Book a free discovery call with our experts. We'll help you understand where your data stands today and build a practical roadmap for improvement.

Book your discovery call
Email
hello@butterflydata.co.uk
Phone
+44 29 212 00140
Web
butterflydata.co.uk