What Does Good Data Look Like? | Butterfly Data

Data Quality

What does good data look like?

The quality of your data directly determines the quality of your decisions — and ultimately, your organisation's success. Discover the framework that separates trusted data from noise.


Data is everywhere.
But is yours good?

Every organisation holds vast amounts of data — used, with varying degrees of effectiveness, to draw insights and guide decisions. Yet the value of those insights is only as strong as the quality of the data behind them.

The key challenge lies in identifying which data is essential to your organisation's operations. This begins with understanding the decisions that need to be made and the specific data required to support them — then defining minimum quality standards for each dataset.

As Butterfly Data has observed through extensive work in this area, poor data inevitably leads to poor decision-making. But the process of assessing data quality is itself valuable: it reveals where gaps exist, which formats work best, and what users truly need.


Butterfly Data · What Does Good Data Look Like? 2025

Six dimensions of data quality

The Data Management Association (DAMA) framework, recommended as best practice by the UK government, defines six dimensions for assessing data quality. Together they provide a comprehensive picture of what good data looks like.

01
Completeness

No fields are illegitimately missing across a dataset. Missing data can appear as truly empty fields or as placeholders such as null, N/A, or 0. Some blank fields are legitimate — a conditionally mandatory field only needs a value when another field triggers it.

Example

If 'UK Born?' is 'N', then 'Country of Birth' must be populated. If 'UK Born?' is 'Y', a blank country field is legitimate — not a quality failure.
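A conditional-completeness rule like this can be expressed as a simple check. This is a minimal sketch assuming hypothetical field names (`uk_born`, `country_of_birth`) and an assumed set of placeholder tokens:

```python
# Placeholder tokens that count as "missing" (assumed; agree yours organisation-wide).
PLACEHOLDERS = {"", "null", "n/a", "na", "0"}

def country_of_birth_complete(record: dict) -> bool:
    """Country of Birth is mandatory only when UK Born? is 'N'."""
    country = str(record.get("country_of_birth", "")).strip().lower()
    missing = country in PLACEHOLDERS
    if record.get("uk_born") == "N":
        return not missing   # must be populated when not UK-born
    return True              # a blank field is legitimate when UK Born? is 'Y'
```

The point is that completeness is rule-driven: a blank field only fails the check when another field makes it mandatory.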

02
Validity

Values are reasonable and conform to defined rules for that field — covering length, format, permitted characters, and allowed values. Rules can be externally defined (e.g. UK postcodes) or set internally by your organisation (e.g. department codes).

Example

A UK phone number must be exactly 11 digits, start with 0, and contain only numeric characters. Multiple date formats (21/12/1995, May 12 2005, 12-4-01) in one column signal a validity failure.
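Both rules above are straightforward to encode as patterns. A sketch, assuming ISO 8601 (yyyy-mm-dd) as the agreed date format for the column:

```python
import re

# Exactly 11 digits, leading 0, numeric only.
UK_PHONE = re.compile(r"^0\d{10}$")
# One agreed date format for the column (ISO 8601 assumed here).
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def is_valid_uk_phone(value: str) -> bool:
    """Check a value against the UK phone-number validity rule."""
    return bool(UK_PHONE.match(value))

def column_format_consistent(values, pattern=ISO_DATE) -> bool:
    """A column passes only when every value matches the agreed format."""
    return all(pattern.match(v) for v in values)
```

Externally defined rules (postcodes, phone numbers) and internal ones (department codes) can all be captured the same way, as a pattern or permitted-value list per field.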

03
Consistency

Data aligns with other records within the same dataset or across different datasets. An address must correspond to its postcode area. A person's stated birthplace must not contradict their country of origin recorded elsewhere.

Example

A record showing 'UK Born?: Y' and 'Country of Birth: Germany' fails on consistency — Germany is not part of the UK, so the fields contradict each other.
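A cross-field consistency rule for this example might look like the following sketch. The field names and the set of UK country labels are assumptions for illustration:

```python
# Assumed labels for "part of the UK"; align with your reference data.
UK_COUNTRIES = {"United Kingdom", "England", "Scotland", "Wales", "Northern Ireland"}

def birth_fields_consistent(record: dict) -> bool:
    """'UK Born?' and 'Country of Birth' must not contradict each other."""
    country = record.get("country_of_birth")
    if record.get("uk_born") == "Y":
        return country in UK_COUNTRIES
    return country not in UK_COUNTRIES
```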

04
Accuracy

The data reflects reality — the most important and most difficult dimension to assess. Accuracy can sometimes be verified through common sense checks, or by comparing against authoritative external sources such as Companies House or banking records.

Example

An adult patient's weight recorded as 50g is clearly inaccurate. A company name that doesn't match Companies House records may cause issues in tax administration.
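A common-sense accuracy check for the weight example can be sketched as a range test. The plausible range (30 to 300 kg) is an assumption chosen for illustration, not a clinical standard:

```python
def plausible_adult_weight(value: float, unit: str = "kg") -> bool:
    """Sanity-check an adult weight against an assumed plausible range (30-300 kg)."""
    grams_per_unit = {"g": 1, "kg": 1000}
    kg = value * grams_per_unit[unit] / 1000
    return 30 <= kg <= 300
```

Checks like this catch clear unit errors (50 g instead of 50 kg); verifying against an authoritative source such as Companies House still requires an external lookup.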

05
Timeliness

Information is available when needed. Critical operational datasets may require real-time feeds; analytical datasets may only need annual refreshes. Data quality deteriorates as circumstances change — stale data can be as damaging as inaccurate data.

Example

A housing association allocating properties needs live data updates. Relying on ten-year-old census data to estimate current populations leads to poor planning decisions.

06
Uniqueness

No record is duplicated in a way that introduces conflicting information. Duplication goes beyond wasted storage — it creates unreliable, contradictory records. Unique identifiers are invaluable for detection; composite keys handle legitimate historical duplicates.

Example

A National Insurance number appearing twice with conflicting data likely means one record is outdated. An analyst must use capture date metadata to determine which entry is reliable.
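Resolving duplicates by capture date can be sketched as follows, assuming each record carries its unique identifier and a comparable capture timestamp (ISO date strings here):

```python
def latest_per_ni_number(records):
    """Keep only the most recently captured record per National Insurance number."""
    latest = {}
    for rec in records:
        ni = rec["ni_number"]
        # ISO 8601 date strings compare correctly as plain strings.
        if ni not in latest or rec["captured"] > latest[ni]["captured"]:
            latest[ni] = rec
    return list(latest.values())
```

Where no single identifier exists, the same approach works with a composite key (e.g. name plus date of birth) as the dictionary key.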

Everyone who touches the data

Just as everyone is responsible for data security, everyone is responsible for data quality. If you work with data, you are responsible for assessing whether it is fit for purpose — and there must be organisational processes to address issues when they arise.

When should you assess?

In an ideal world, data is validated at collection, on load, and monitored continuously. In practice, a tiered approach works best:

  • Critical or regulatory data → assess monthly or quarterly
  • Valuable but non-critical data → assess annually
  • New data entering systems → validate at point of ingestion
  • After major organisational changes → reassess affected datasets
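The tiered schedule above can be encoded as a simple due-date check. Tier names and cadences here are illustrative, matching the longest interval each tier allows:

```python
from datetime import date

# Hypothetical tier names mirroring the schedule above.
ASSESSMENT_CADENCE_DAYS = {
    "critical": 90,    # critical or regulatory data: monthly or quarterly
    "valuable": 365,   # valuable but non-critical data: annually
}

def assessment_due(tier: str, last_assessed: date, today: date) -> bool:
    """True when a dataset's last assessment is older than its tier allows."""
    return (today - last_assessed).days >= ASSESSMENT_CADENCE_DAYS[tier]
```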

Recognise the warning signs

Poor quality data
  • Missing fields that shouldn't be missing
  • Multiple formats in the same column
  • Duplicated records with conflicting values
  • Values that simply don't make sense
Good quality data
  • Accurate and reflects reality
  • Consistent across datasets
  • Up-to-date and regularly maintained
  • Meaningful values for every field

How to improve data quality at source

1

Use dropdown forms for addresses

Replace free-text address fields with standardised dropdown or postcode-lookup inputs to eliminate format inconsistencies at the point of entry.

2

Offer calendar pickers for dates

A calendar selector removes the ambiguity of free-text date fields entirely — no more confusion between dd/mm/yyyy and mm/dd/yyyy formats.

3

Enforce mandatory field completion

Disable form submission until all required fields are complete, and provide clear inline validation guidance when validity criteria aren't met.

4

Standardise null representations

Establish an organisation-wide standard for empty or unknown values — differentiating between numeric and text fields — so missing data is always obvious and consistent.
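A normalisation step like this can be sketched as below. The placeholder tokens and the numeric sentinel are assumptions to be replaced by your organisation-wide standard:

```python
# Assumed text placeholders and an assumed numeric sentinel; agree these organisation-wide.
NULL_TOKENS = {"", "null", "n/a", "na", "none"}
NUMERIC_SENTINELS = {"-999"}

def normalise_missing(value, numeric: bool = False):
    """Map ad-hoc missing-value placeholders to one standard marker (None)."""
    text = str(value).strip().lower()
    if text in NULL_TOKENS or (numeric and text in NUMERIC_SENTINELS):
        return None
    return value
```

Note that 0 is deliberately not treated as missing for numeric fields here, since zero is often a legitimate value; that distinction is exactly why numeric and text fields need separate standards.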

5

Automate assessment of existing datasets

Large datasets require automated scripts or COTS tools managed by data quality experts. Start by identifying poor-quality records, then apply comprehensive remediation to bring the data up to standard.
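The identification step can start with something as simple as a per-field completeness profile. A minimal sketch over a list-of-dicts dataset, with an assumed placeholder set:

```python
def profile_completeness(rows, placeholders=frozenset({"", "null", "n/a"})):
    """Return the completeness rate (0.0-1.0) for each field across a dataset."""
    counts, filled = {}, {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + 1
            if str(value).strip().lower() not in placeholders:
                filled[field] = filled.get(field, 0) + 1
    return {field: filled.get(field, 0) / counts[field] for field in counts}
```

Extending the same loop with validity patterns and consistency rules per field turns this into a basic automated assessment across all six dimensions.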

Start with what matters most

The MoSCoW method divides data needs into four categories: Must haves, Should haves, Could haves, and Won't haves. This allows critical datasets to be prioritised for assessment, while providing a hierarchy for less-essential data.

The most challenging aspect is agreeing on shared data standards. Best practice is to define organisation-wide baselines — covering formats for dates, addresses, country codes, and identifiers — while allowing teams to use more granular data where necessary.

The DAMA framework is recommended by the UK government as best practice for data quality assessment and management. Its guidance provides a flexible, comprehensive framework for evaluating data quality across different datasets and contexts.

Whether you choose Commercial Off-The-Shelf (COTS) tools or a custom-built solution, implementation varies by organisational size and in-house technical capabilities. COTS tools reduce development time; bespoke solutions offer greater flexibility. Butterfly Data can help you evaluate both options.

Transform your data quality — and your decisions

Book a free discovery call with our experts. We'll help you understand where your data stands today and build a practical roadmap for improvement.

Book your discovery call
Email
hello@butterflydata.co.uk
Phone
+44 29 212 00140
Web
butterflydata.co.uk