Improving Data Quality for a UK Government Organisation Using SAS
Billions of records processed annually
Approximately 50 data feeds ingested and quality-assessed
An engagement spanning several years
Challenge
The government department ingests third-party data from multiple sources, including financial institutions, agencies and local authorities, alongside commercial and reference data. It is responsible for assessing and remediating data quality issues, then engaging with data suppliers to resolve them.
The requirement was to assess the quality of incoming submissions to prevent bad data from entering the pipeline and to enhance the quality of accepted submissions so that downstream consumers and systems would benefit from fewer ingestion issues, improved matching and analytics, and increased revenues from data exploitation.
Butterfly Data began by advising on platform and infrastructure, then set up the environments and the route-to-live and onboarded developers and users. We then built the solution and have supported and maintained a live service ever since, delivering significant incremental improvements over several years.
The organisation selected SAS as its data quality vendor and required the solution to integrate with an existing Data Landing Service (DLS) and custom cloud infrastructure.
Each submission needed to be prepared, assessed, remediated and enriched before delivery to the organisation's primary data stores. Data also passed through a complex matching process, pairing incoming records against existing individual or business records already held.
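To illustrate the idea of pairing incoming records against records already held, here is a minimal Python sketch of the simplest form of matching: an exact match on a normalised key. It is not the department's matching process, which the case study describes only as complex. The field names (name, postcode, id) and the functions match_key and match_records are hypothetical.

```python
import re


def match_key(name, postcode):
    """Build a normalised key so trivial formatting differences do not block a match."""
    # Assumption: name plus postcode is a usable key; real matching would use more evidence.
    normalised_name = re.sub(r"[^a-z]", "", name.lower())
    normalised_postcode = re.sub(r"\s+", "", postcode).upper()
    return (normalised_name, normalised_postcode)


def match_records(incoming, existing):
    """Pair each incoming record with the id of an existing record, or None if unmatched."""
    index = {match_key(r["name"], r["postcode"]): r["id"] for r in existing}
    return [(r, index.get(match_key(r["name"], r["postcode"]))) for r in incoming]


# Hypothetical example data.
existing = [{"id": 1, "name": "Jane O'Neil", "postcode": "SW1A 1AA"}]
incoming = [
    {"name": "JANE ONEIL", "postcode": "sw1a1aa"},
    {"name": "John Smith", "postcode": "M1 1AE"},
]
matched_ids = [existing_id for _, existing_id in match_records(incoming, existing)]
```

The sketch also shows why upstream data quality matters for match rates: a key can only match when both sides have been cleaned and standardised the same way.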
Quality varied greatly across approximately 50 data feeds, which spanned disparate file formats and structures, ranged from a handful of records to many millions, and came from thousands of suppliers.
Further complexity came from a platform and infrastructure that were often unreliable, end users with limited SAS experience and highly varied skill sets, and an environment where a slow pace of change frequently required workarounds to meet delivery timescales.
Solution
During discovery, it became clear that the foundations were not adequately established to support a strategic solution. We drafted and oversaw the implementation of a low-level design to address this.
This enabled the establishment of a dedicated tenancy on the new SAS 9.4 Grid, spanning production, test, and development environments. We also led on the design and configuration of users, roles, clients, tool deployment and data access.
On this foundation, we built a bespoke, DLS-integrated, metadata-driven data quality service, primarily in SAS, along with innovative web-based reports and dashboards.
Data quality business rules were captured, documented and mapped to incoming submissions to enable automated processing. Initial rules focused on verifying field attributes such as lengths, datatypes and optionality and checking for non-allowable characters, pattern compliance and values against allowable lists. Over time, more complex logic to validate and enrich names and addresses was introduced.
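The initial rule types described above can be sketched as metadata plus a generic checker. The service itself was built primarily in SAS, so this Python version is only an illustration of the metadata-driven idea. The FieldRule structure, the field names and the example rules are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    """Hypothetical rule metadata for one field of a submission."""
    name: str
    required: bool = True                 # optionality
    max_length: Optional[int] = None      # length
    datatype: str = "str"                 # "str" or "int" in this sketch
    pattern: Optional[str] = None         # pattern compliance (regular expression)
    allowed: Optional[frozenset] = None   # allowable list
    forbidden_chars: str = ""             # non-allowable characters


def check_field(rule, value):
    """Return a list of issue descriptions for one value (empty list means it passed)."""
    if value is None or value == "":
        return [f"{rule.name}: missing mandatory value"] if rule.required else []
    issues = []
    if rule.max_length is not None and len(value) > rule.max_length:
        issues.append(f"{rule.name}: longer than {rule.max_length} characters")
    if rule.datatype == "int" and not re.fullmatch(r"[+-]?\d+", value):
        issues.append(f"{rule.name}: not an integer")
    if set(value) & set(rule.forbidden_chars):
        issues.append(f"{rule.name}: contains non-allowable characters")
    if rule.pattern is not None and not re.fullmatch(rule.pattern, value):
        issues.append(f"{rule.name}: does not match the expected pattern")
    if rule.allowed is not None and value not in rule.allowed:
        issues.append(f"{rule.name}: value not in the allowable list")
    return issues


def check_record(rules, record):
    """Apply every rule to one record, where the record maps field names to strings."""
    issues = []
    for rule in rules:
        issues.extend(check_field(rule, record.get(rule.name)))
    return issues


# Hypothetical rules for one feed; the postcode pattern is deliberately simplified.
RULES = [
    FieldRule("postcode", max_length=8, pattern=r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
    FieldRule("account_type", allowed=frozenset({"CURRENT", "SAVINGS"})),
    FieldRule("balance_pence", datatype="int"),
    FieldRule("middle_name", required=False, max_length=35, forbidden_chars="|;"),
]
```

Because the rules are data rather than code, onboarding a new feed in a design like this means adding rule metadata rather than writing new checks.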
Further enhancements introduced automatic remediation of common issues, alongside standardisation, verification and enrichment using reliable reference data, resulting in a comprehensive data cleansing solution. Complementary dashboards and reports allowed non-technical users to view data quality issues and actions for each submission and make informed decisions about whether to ingest or reject.
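As a rough illustration of automatic remediation, standardisation and a per-submission summary, here is a minimal Python sketch. The specific fixes, the function names and the 5% failure threshold are assumptions made for the example, not details taken from the service. In the service described above, the ingest-or-reject decision rests with users, so the summary produces only a recommendation.

```python
import re


def remediate(value):
    """Fix common whitespace issues and report which actions were taken."""
    actions = []
    cleaned = value.strip()
    if cleaned != value:
        actions.append("trimmed whitespace")
    collapsed = re.sub(r"\s+", " ", cleaned)
    if collapsed != cleaned:
        actions.append("collapsed repeated whitespace")
    return collapsed, actions


def standardise_postcode(value):
    """Upper-case a UK-style postcode and put a single space before the last three characters."""
    compact = re.sub(r"\s+", "", value).upper()
    if len(compact) < 5:
        return value  # too short to reformat safely; leave for the validation rules to flag
    return f"{compact[:-3]} {compact[-3:]}"


def summarise(total, failed, threshold=0.05):
    """Summarise one submission; the threshold is a hypothetical cut-off for this sketch."""
    rate = failed / total if total else 1.0
    return {
        "total": total,
        "failed": failed,
        "failure_rate": rate,
        "recommendation": "reject" if rate > threshold else "ingest",
    }
```

Returning the list of actions alongside the cleaned value is what lets a report show users both the issue found and the fix applied.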
Impact
Billions of records are now processed annually through the Data Quality Service, giving users visibility into data sources that were previously inaccessible with basic tooling.
Match rates and revenues have improved as a result of both the rejection of poor-quality data and the enhancement of accepted records, as well as the incorporation of new data sources.
Many unexpected issues requiring remediation were surfaced, prompting detailed discussions with data suppliers and improvements to the department's guidance documentation.
Efficiencies were gained through reduced manual handling, fewer human errors and fewer downstream pipeline issues.
The organisation is now engaging Butterfly Data for the next phase, which involves migrating to a modern technology stack and decoupling from the parent service to enable use across multiple departments and domains.



