Closing the Tax Gap for HMRC using a Data Quality Framework
Challenge
In the 2020–2021 tax year, the UK government faced a staggering £32 billion tax gap: the difference between taxes owed and taxes actually collected. A significant portion of this gap was attributed to poor data quality, which limited HMRC’s ability to detect anomalies, enforce compliance, and accurately forecast revenue.
The project with Butterfly Data was initiated to reduce this tax gap by addressing critical data quality issues across several major systems. However, the team encountered multiple challenges:
- Minimal visibility into existing data quality issues
- Undefined metrics for assessing or reporting data quality
- A poorly designed and inconsistently applied Enterprise Data Model (EDM)
- A requirement to use Data Management Studio (DMS), an outdated and under-documented tool with limited capabilities
- Limited access to live data, delaying issue detection and remediation efforts
Solution
The team devised a scalable and efficient solution by leveraging a combination of SAS 9.4, Data Management Studio (DMS), and SAS Viya 3.5. To begin with, raw data was accessed through SAS 9.4 and strategically split into smaller, manageable segments. This approach helped to reduce the risk of system overload and ensured more efficient processing.
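The segmentation step can be illustrated with a minimal sketch. The project itself used SAS 9.4; the Python below is only an illustration of the general idea, and the function name `split_into_segments` and the segment size are assumptions rather than details from the project.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_into_segments(rows: Iterable[T], segment_size: int) -> Iterator[List[T]]:
    """Yield consecutive segments of at most `segment_size` rows.

    Only one segment is held in memory at a time, which is the point
    of segmenting: no single step has to load the full dataset.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    iterator = iter(rows)
    while True:
        segment = list(islice(iterator, segment_size))
        if not segment:
            return
        yield segment


# Hypothetical usage: ten dummy records processed in segments of four.
segments = list(split_into_segments(range(10), 4))
```

Each segment can then be cleansed and scored independently, so a failure or slowdown affects one segment rather than the whole run.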
For data cleansing, the team developed custom and reusable jobs within DMS to apply data quality rules across multiple datasets. These rules were informed by previous work on a data quality dashboard project for the Metropolitan Police, allowing the team to quickly implement proven strategies in a new context.
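As a rough illustration of what reusable data quality rules can look like, the sketch below defines small rule factories that can be pointed at any dataset. The actual jobs were built in DMS; the field name `taxpayer_ref`, the ten-digit pattern, and all function names here are hypothetical and chosen only for the example.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]
Rule = Callable[[Record], bool]


def not_blank(field: str) -> Rule:
    """Build a rule that passes when `field` is present and non-empty."""
    def check(record: Record) -> bool:
        value = record.get(field)
        return value is not None and value.strip() != ""
    return check


def matches(field: str, pattern: str) -> Rule:
    """Build a rule that passes when `field` fully matches `pattern`."""
    compiled = re.compile(pattern)

    def check(record: Record) -> bool:
        value = record.get(field)
        return value is not None and compiled.fullmatch(value) is not None
    return check


def apply_rules(records: Iterable[Record], rules: Dict[str, Rule]) -> List[Dict[str, bool]]:
    """Return, for each record, a mapping of rule name to pass/fail."""
    return [{name: rule(record) for name, rule in rules.items()} for record in records]


# Hypothetical rule set and dummy data (not real HMRC fields or values).
rules = {
    "ref_present": not_blank("taxpayer_ref"),
    "ref_format": matches("taxpayer_ref", r"\d{10}"),
}
records = [{"taxpayer_ref": "1234567890"}, {"taxpayer_ref": " "}, {"taxpayer_ref": "12AB"}]
results = apply_rules(records, rules)
```

Because each rule is parameterised by field name, the same rule set can be reapplied to a different dataset by changing only the configuration.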
Recognising the limitations of the existing Enterprise Data Model (EDM), the team designed a new multi-layered model structure. This included conceptual, logical, and physical data models (the physical model being assumed), which provided a clearer, more structured foundation for managing and analysing data quality.
To make quality more measurable, a scoring mechanism was introduced. This allowed both individual-level data issues and broader, system-level data quality scores to be captured and visualised, offering a clearer picture of where improvements were most needed.
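The source does not describe how the scores were calculated. One simple possibility, shown here purely as an assumption, is to list the failed checks for each record and to report the percentage of all checks passed as the dataset-level score. The function names and rule names are hypothetical.

```python
from typing import Dict, List


def record_issues(result: Dict[str, bool]) -> List[str]:
    """Return the names of the rules a single record failed."""
    return [name for name, passed in result.items() if not passed]


def dataset_score(results: List[Dict[str, bool]]) -> float:
    """Return the percentage of all rule checks that passed (0 to 100)."""
    total_checks = sum(len(result) for result in results)
    if total_checks == 0:
        raise ValueError("no checks to score")
    # True counts as 1 and False as 0 when summed.
    passed_checks = sum(sum(result.values()) for result in results)
    return 100.0 * passed_checks / total_checks


# Dummy rule outcomes for three records checked against two rules.
outcomes = [
    {"ref_present": True, "ref_format": True},
    {"ref_present": False, "ref_format": False},
    {"ref_present": True, "ref_format": False},
]
issues = [record_issues(outcome) for outcome in outcomes]
score = dataset_score(outcomes)
```

Keeping the record-level issue list alongside the aggregate score lets a dashboard show both the headline figure and the individual records that need attention.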
Once the data had been cleaned and scored, it was uploaded to SAS Visual Analytics via the CAS (Cloud Analytic Services) engine. CAS enabled high-speed, in-memory computing, making it possible to visualise large volumes of data efficiently and responsively.
The final dashboard was designed around real user stories, ensuring that it would resonate with stakeholders and support actionable insights. This user-centric approach played a key role in driving engagement and building trust in the system.
To support future development, the team maintained an agile backlog, complete with effort estimates and thorough documentation. This ensured a clear path forward for ongoing improvement and facilitated a smooth handover to the next team.
Impact
Despite numerous constraints, the project successfully laid the foundation for a scalable, repeatable data quality framework:
- Developed a generic DMS job library that could be reused across datasets, increasing efficiency and reducing redundancy
- Created a processing framework that minimised server load by segmenting large datasets, ensuring consistent performance
- Delivered a visual analytics dashboard that made complex data quality insights accessible to non-technical stakeholders
- Enabled data-driven conversations with stakeholders, leading to better identification and prioritisation of data issues
- Established an agile backlog and documentation, setting up future teams for continued improvement and delivery
By using innovative workarounds such as dummy data for initial development, SAS Partner training, and internal upskilling, the team overcame access and tooling limitations, contributing to HMRC’s long-term goal of closing the tax gap through improved data integrity.
