Emergency Services
Data Science and AI

Generating Synthetic Face Data for Policing using Generative AI

Butterfly Data helped a UK police authority develop a synthetic face data generation tool using generative AI.
- Synthetic data generation using generative AI
- Custom prompt engine in Python
- Systematic testing for bias


Challenge

The client, a UK police authority, needed a way to test facial recognition (FR) systems for bias and performance without relying on real-world images. Their goal was to create a solution that could generate synthetic facial data representing diverse demographics, enabling controlled and repeatable testing across a range of conditions.

Synthetic data was preferred as it eliminates personal data concerns, allows fine-tuned control over variables like age, ethnicity, and lighting, and supports the simulation of specific testing scenarios. This would enable a more rigorous and ethical evaluation of FR systems used in law enforcement settings.

Solution

To meet the client’s requirements, we built a synthetic face generation tool using generative AI and advanced computer vision techniques. We used an open-source model as the foundation and built a custom prompt generation engine in Python to guide the creation of realistic, diverse mugshots.

The tool was designed to incorporate specific demographic characteristics and generate lifelike images, while reducing bias by avoiding stereotypes related to ethnicity, clothing styles, hairstyles, hair colour, and eye features. Using the prompt generator, we applied guided prompts to control demographic attributes and introduce variation in facial features and expressions, enabling the generation of faces tailored to the client’s needs and specific testing scenarios.
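The prompt engine itself is not public, but the approach described above can be sketched in Python. In this illustrative sketch, demographic attributes are fixed per prompt while incidental features (lighting, expression) are randomised independently of them, so those features do not end up correlated with any demographic group. All attribute pools, function names, and prompt wording here are assumptions for illustration, not the client's actual implementation:

```python
import itertools
import random

# Hypothetical attribute pools; the real engine's categories are not public.
AGE_BANDS = ["18-25", "26-40", "41-60", "61+"]
GENDERS = ["male", "female"]
SKIN_TONES = ["I", "II", "III", "IV", "V", "VI"]  # Fitzpatrick scale
LIGHTING = ["even studio lighting", "dim lighting", "harsh side lighting"]
EXPRESSIONS = ["neutral expression", "slight smile"]

def build_prompt(age, gender, skin_tone, rng):
    """Compose one guided prompt for a synthetic mugshot.

    Demographic attributes are fixed by the caller; incidental features
    are drawn independently, so they cannot introduce stereotyped
    couplings with ethnicity, age, or gender.
    """
    lighting = rng.choice(LIGHTING)
    expression = rng.choice(EXPRESSIONS)
    return (
        f"frontal mugshot photo, {gender}, aged {age}, "
        f"Fitzpatrick skin type {skin_tone}, {expression}, "
        f"{lighting}, plain grey background"
    )

def generate_prompts(n_per_cell, seed=0):
    """Yield prompts covering every demographic cell equally."""
    rng = random.Random(seed)
    for age, gender, tone in itertools.product(AGE_BANDS, GENDERS, SKIN_TONES):
        for _ in range(n_per_cell):
            yield build_prompt(age, gender, tone, rng)

prompts = list(generate_prompts(n_per_cell=2))
```

Enumerating every demographic cell (here 4 × 2 × 6 = 48 combinations) before sampling variation within each one is what makes the resulting dataset balanced by construction, rather than inheriting whatever distribution the generative model defaults to.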

Impact

The project delivered a fully functional tool that enables the client to create large-scale, diverse synthetic facial datasets tailored to specific testing requirements.

The generated data enabled systematic testing of FR systems for potential bias, highlighting both the capabilities and current limitations of generative image technology.
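One common way to run this kind of systematic test is to compare an FR system's genuine-pair match rate across demographic groups; a large gap between groups is a signal of bias. The sketch below assumes the FR system returns a similarity score per image pair, and all names and thresholds are illustrative, not the client's actual evaluation pipeline:

```python
from collections import defaultdict

def match_rates_by_group(results, threshold=0.6):
    """Compute the true-match rate per demographic group.

    `results` is a list of (group, score, is_same_person) tuples, where
    `score` is the FR system's similarity score for one image pair.
    Only genuine (same-person) pairs count toward the match rate.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, score, same in results:
        if same:
            totals[group] += 1
            if score >= threshold:
                hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}

# Toy similarity scores for two groups (illustrative only).
results = [
    ("group_a", 0.91, True), ("group_a", 0.72, True), ("group_a", 0.40, True),
    ("group_b", 0.95, True), ("group_b", 0.88, True), ("group_b", 0.81, True),
]
rates = match_rates_by_group(results)
```

Because the synthetic dataset controls group sizes and imaging conditions, a rate gap like the one in this toy example can be attributed to the FR model rather than to an imbalanced test set.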

We also recommended future enhancements to improve facial realism and demographic accuracy, and laid the foundation for ongoing research into ethical, AI-driven data generation in policing.

With this capability, the client can now iterate quickly, evaluate fairness in facial recognition models more effectively, and improve system reliability and public trust in FR technologies.

Ready to transform your data?

Book your free discovery call and find out how our bespoke data services and solutions could help you uncover untapped potential and maximise ROI.