What It Really Takes to Build An AI System: It’s more complicated than many think

Private AI
Feb 17, 2021

Today we live in a world of unprecedented open-source code, where companies such as Google and Facebook have released their internal AI solutions to the public, something that was previously unheard of. As a result, there are now plenty of resources on how to quickly and easily build an AI system.

There’s a saying that ‘the last 20% of the work takes 80% of the time’, and nowhere is that more true than when building an AI system.

A massive amount of work to develop real-world AI applications still remains. This is because the level of quality and reliability required for production deployments, and the amount of work needed to get there, is frequently underestimated...even by experienced developers and managers.

I once worked on a Traffic Sign Recognition (TSR) system for an automaker, a problem that also comes up frequently in the classic ‘Build vs. Buy’ debate. I’ll refer to this experience throughout, as it illustrates the key reasons why building an AI system takes so much effort and why purchasing an existing solution is often the better choice.

Let’s dive in!

Edge cases...edge cases are everywhere

Data is the number one consumer of time and money. This stems from the consistent underestimation of the complexity of the real world and of how many edge cases exist for even the simplest tasks.

During my TSR project in Europe, we encountered all sorts of surprises, such as LED highway signs. These signs, in addition to looking completely different from normal signs, are difficult to capture with a camera (try filming a computer screen).

Even when signs are perfectly visible in good conditions, they can be tricky to identify amongst all the noise. For example, trucks in Europe have speed limit stickers on the back that look identical to roadside signs but indicate how fast the truck itself is allowed to drive. Things also get tricky at highway intersections, as exit speed limit signs can be perfectly visible from the highway itself. And what if the sign is covered in snow, which just so happens to be the same colour as most traffic signs?

The complexity of the real world isn’t limited to Computer Vision; I recently wrote a complementary article on regexes in the real world.

A good dataset is hard to find

Lots of models are published and open-sourced, but the datasets they are trained on for production applications are usually kept under lock & key. Some data (like credit card numbers) are especially hard to obtain. In fact, a ‘data moat’ is the main competitive advantage of many AI companies.

But what about all of those juicy datasets researchers use, you might wonder? Unfortunately, production applications don’t match up neatly with research tasks. And even if they did, research datasets usually don’t allow for commercial use (e.g. ImageNet). It’s also common to have a lot of labelling errors in research datasets, preventing the development of high quality models. A good example is Google’s OpenImages object detection dataset. Consisting of 1.7 million images with 600 different classes labelled, it could be useful for training object detection models. Unfortunately, the training split has less than half the labels per image that the validation split does, which would imply that a significant number of examples aren’t labelled.
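
If you can get hold of a dataset’s raw annotation files, this kind of gap is easy to check for yourself. Below is a minimal sketch of a label-density audit, assuming annotations are stored as one (image_id, label) row per line of a CSV; the file names and column layout are hypothetical.

```python
# Sketch: compare label density across dataset splits to spot under-labelling.
# Assumes each split is a CSV with one (image_id, label) row per annotation;
# the file names and column names here are hypothetical.
import csv
from collections import Counter

def labels_per_image(path: str) -> float:
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["image_id"]] += 1
    return sum(counts.values()) / len(counts)  # mean annotations per labelled image

train_density = labels_per_image("train_annotations.csv")
val_density = labels_per_image("val_annotations.csv")
print(f"train: {train_density:.2f} labels/image, val: {val_density:.2f}")
if train_density < 0.5 * val_density:
    print("Warning: training split appears significantly under-labelled.")
```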

Datasets for TSR also fall prey to these issues. Freely available TSR datasets don’t allow for commercial use, contain too few examples to be of any real use, and are marred by significant labelling errors. Additionally, they only use examples captured in good lighting conditions in one country. And cars have a pesky habit of travelling into new jurisdictions with different traffic laws and different traffic sign designs.

Creating a custom dataset for an AI system is expensive and time-consuming

Why not create your own dataset for your AI system, you say? Well, let’s have a look at that. The first step is to decide on labels/outputs and collect data, making sure every single edge case is captured. Then it’s important to make sure you have good validation and test sets that provide a reliable, balanced snapshot of your performance.
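
As a rough illustration of that last point, here’s a minimal sketch of carving out validation and test sets with scikit-learn. Stratifying on the label keeps rare classes (think uncommon sign types) represented in every split; the data below is a stand-in for your own.

```python
# Sketch: build stratified validation and test sets with scikit-learn so that
# rare classes stay represented in every split. The data is a stand-in.
from sklearn.model_selection import train_test_split

samples = [f"img_{i}.jpg" for i in range(1000)]  # hypothetical image paths
labels = [i % 10 for i in range(1000)]           # ten hypothetical classes

# Carve off 20% for evaluation, then split it evenly into validation and test.
train_x, hold_x, train_y, hold_y = train_test_split(
    samples, labels, test_size=0.2, stratify=labels, random_state=42)
val_x, test_x, val_y, test_y = train_test_split(
    hold_x, hold_y, test_size=0.5, stratify=hold_y, random_state=42)

print(len(train_x), len(val_x), len(test_x))  # 800 100 100
```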

Next comes data hygiene and formatting, which can take a lot of time. It’s very important to get this step right. Transformer models, for example, suffer a surprisingly large drop in performance when this step isn’t done correctly.
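
What ‘hygiene’ means depends entirely on your data, but for text it often starts with something like the sketch below: normalising Unicode forms and whitespace before tokenisation. The steps shown are illustrative, not a recipe.

```python
# Sketch: basic text hygiene before tokenisation. Mixed Unicode forms and
# stray whitespace are a common silent source of degraded model performance;
# the exact cleaning your data needs will differ.
import re
import unicodedata

def clean_text(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # unify equivalent Unicode forms
    text = re.sub(r"\s+", " ", text)           # collapse all whitespace runs
    return text.strip()

print(clean_text("Speed\u00a0limit:\n  50 km/h"))  # -> "Speed limit: 50 km/h"
```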

For most tasks, the data then needs to be labelled. For the projects I’ve worked on, we’ve always built our own labelling tool or modified open source tools, as existing out-of-the-box tools never quite suit the task at hand. You’ll also need data infrastructure to manage, version and serve your new dataset.
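
To give a taste of what that infrastructure involves, here’s a minimal sketch of fingerprinting a dataset directory so each training run can record exactly which data version it used. Dedicated tools (DVC, for example) go much further; the directory path below is hypothetical.

```python
# Sketch: fingerprint a dataset directory so every training run can record
# exactly which version of the data it saw. The directory path is hypothetical.
import hashlib
from pathlib import Path

def dataset_version(root: str) -> str:
    digest = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest.update(str(path.relative_to(root)).encode())  # stable naming
            digest.update(path.read_bytes())                     # file contents
    return digest.hexdigest()[:12]  # short, reproducible version identifier

print(dataset_version("data/tsr_v1"))
```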

Next, you’ll need to involve some humans to annotate your dataset

If you’re lucky and can share the data outside your organization, and your task doesn’t require too much domain knowledge, you might outsource annotation. If not, it takes a ton of work to hire and manage your new team of annotators. In either case, annotator training can also be significant work, as most tasks require some domain knowledge and are typically more complicated than clicking on objects in an image. And since turnover in this type of role is high, you can expect to find yourself on that hamster wheel more often than you’d like. One of the best ways to support your annotators is with an annotation guide they can read before you jump into the annotation and feedback training cycle. Creating the annotation guide is itself a lot of work: many labels are ambiguous unless defined carefully, an exhaustive list of examples often needs to be included, and a living FAQ section has to grow as you discover just how many clarifications are needed to account for the variety of ways humans can understand a single concept.

Finally, it’s important to verify your process to ensure it maintains a high quality of output

Annotators also need to label edge cases consistently for the model to work well. At Private AI, for instance, we’re frequently confronted with thousands of tiny questions on what constitutes sensitive information: “I like Game of Thrones” probably isn’t going to identify someone, but “I like David Lynch’s 1984 rendition of Dune” narrows things down a bit.
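
One standard check here is inter-annotator agreement: have two annotators label the same items and measure how often they agree beyond chance. Below is a minimal sketch using Cohen’s kappa from scikit-learn; the labels are made up.

```python
# Sketch: measure inter-annotator agreement with Cohen's kappa, a standard
# chance-corrected agreement statistic. The example labels are made up.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["PERSON", "NONE", "LOCATION", "PERSON", "NONE", "PERSON"]
annotator_b = ["PERSON", "NONE", "LOCATION", "NONE",   "NONE", "PERSON"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # low agreement suggests the guide needs work
```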

In summary, whilst data annotators can be found quite cheaply, a large amount of valuable dev/management time is required to construct a dataset. As an alternative, you can use services like Amazon’s Mechanical Turk to outsource part of the process. In my experience, however, these services are quite expensive and don’t deliver high-quality labels. On top of this, in real projects the requirements/specifications usually change, which means going over the data multiple times as internal and external requirements (like data protection regulations) evolve.

The process of building a dataset for your AI system has also gotten harder over the last 5 years. The TSR project I worked on was pre-GDPR, and nowadays privacy is a must when collecting data.

Model Stuff

You’ve got your data. Now what?

Now we’ve arrived at the most visible part of the process: building the model. We can use the plethora of open-source solutions out there, but there’s typically a lot of work to be done fixing small bugs that impact accuracy, accounting for the large variety of possible real-world input types, ensuring the code works as well as it can given the new data and labels you’ve added, etc. A while back I wrote my own MobileNet V3 implementation, as none of the implementations I could find matched the paper — not even the keras-applications implementation. Similarly at Private AI, getting state-of-the-art models to run at 100% of their capacity has been a lot of work. You also need to make sure that the code allows for commercial use — this typically knocks out a lot of research paper implementations.

A production system frequently relies on a combination of domain-specific techniques to improve performance, which requires integrating a bunch of different codebases. Finally, everything should be tested, something that open-source code is usually light on.

After all, who likes writing tests?
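
To make that concrete, here’s a minimal sketch of the kind of smoke test worth wrapping around third-party model code; cheap shape and sanity checks on dummy input catch a surprising number of integration bugs. The model choice is just an example.

```python
# Sketch: a smoke test around third-party model code. Shape and sanity checks
# on dummy input catch many integration bugs early; the model is an example.
import torch
import torchvision.models as models

def test_classifier_output():
    model = models.mobilenet_v3_small(weights=None)  # untrained is fine here
    model.eval()
    with torch.no_grad():
        out = model(torch.randn(2, 3, 224, 224))  # batch of two dummy images
    assert out.shape == (2, 1000), "unexpected output shape"
    assert torch.isfinite(out).all(), "non-finite logits"

test_classifier_output()
```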

Deploying your AI system

So you’ve gotten the data and you’ve built your model — now it’s time to put it into production. This is another area where open-source code is usually light, even though things have gotten significantly better in the past few years. If your application is to run in the cloud, this can be quite simple (just put your PyTorch model into a Docker container), but that comes with a caveat: running ML in the cloud can get really expensive. Just a few GPU-equipped instances can easily cost tens of thousands of dollars per year to run. And you’ll typically run in a few different zones to reduce latency.
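
For the simple cloud case, here’s a minimal sketch of what ‘put your PyTorch model into a Docker container’ usually boils down to: a small HTTP wrapper around the model. Flask is used for brevity, and the model file and endpoint are placeholders.

```python
# Sketch: the small HTTP wrapper that typically sits in front of a PyTorch
# model inside a cloud container. The model file and endpoint are placeholders.
import torch
from flask import Flask, jsonify, request

app = Flask(__name__)
model = torch.jit.load("model.pt")  # hypothetical TorchScript export
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    features = torch.tensor(request.json["inputs"])  # expects a nested list
    with torch.no_grad():
        scores = model(features)
    return jsonify({"scores": scores.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```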

Things get significantly more complicated when integrating into mobile apps or embedded systems. In these situations you’re usually forced to run on CPU due to hardware fragmentation (I’m looking at you, Android) or compatibility issues. That TSR project I worked on required all code to be written according to a 30-year-old C standard and had to fit in just a few megabytes! The use of external libraries was also precluded due to issues surrounding safety certification.

In any case, model optimization is usually necessary. The trouble is that Deep Learning inference packages are at a much lower state of readiness, and are much harder to use, than training frameworks such as TensorFlow or PyTorch. I recently converted a transformer model to Intel’s OpenVINO package, only to find that Intel’s demo example no longer worked with the latest version of PyTorch, so I had to go into OpenVINO’s source code and make some fixes myself.
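
For context, the usual first step when targeting an inference runtime like OpenVINO is exporting the model to an exchange format such as ONNX, which OpenVINO can consume directly. Here’s a minimal sketch of that export from PyTorch, with a stand-in model.

```python
# Sketch: export a PyTorch model to ONNX, a common first step before handing
# it to inference runtimes such as OpenVINO or ONNX Runtime. The model is a
# stand-in for whatever you actually trained.
import torch
import torchvision.models as models

model = models.mobilenet_v3_small(weights=None)
model.eval()

dummy = torch.randn(1, 3, 224, 224)  # example input fixes the graph's shapes
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["image"], output_names=["logits"],
    dynamic_axes={"image": {0: "batch"}},  # allow variable batch size
    opset_version=13,
)
```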

Real-world applications also involve more than just running an AI model. There’s normally a lot of pre- and post-processing required, all of which also needs to be productionized. In particular, integration into an application may require porting to the application’s language (like C++ or Java). On that TSR project, a large amount of code was required to match the detected signs with the navigation map.

Finally it’s worth noting that people with expertise in this area are REALLY hard to find.

Ongoing Tasks

So, we’re at the finish line! Your application is now in production, doing its thing sorting/identifying/talking with widgets. Now comes the ongoing maintenance.

Like any piece of software, there will be bugs and model prediction failures. In particular (and despite your best efforts), there will be plenty of work to do in collecting the data needed to fill in the edge cases that were missed during the initial data collection phase. The world we live in isn’t static, so data needs to be continually collected and put through the system. A good example is Covid-19. Try asking any pre-2019 chatbot what that is.
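
One common way to close that loop is to flag low-confidence predictions in production for human review, so missed edge cases flow back into the dataset. Here’s a minimal sketch; the threshold and storage format are placeholders.

```python
# Sketch: a production feedback loop that flags low-confidence predictions for
# human review, feeding missed edge cases back into the training set. The
# threshold and storage path are placeholders; assumes single-example batches.
import json
import torch

REVIEW_THRESHOLD = 0.7

def predict_and_flag(model, features, log_path="review_queue.jsonl"):
    with torch.no_grad():
        probs = torch.softmax(model(features), dim=-1)
    confidence, label = probs.max(dim=-1)
    if confidence.item() < REVIEW_THRESHOLD:
        with open(log_path, "a") as f:
            f.write(json.dumps({"input": features.tolist(),
                                "label": int(label),
                                "confidence": float(confidence)}) + "\n")
    return int(label)
```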

Finally, whilst not strictly necessary, it’s good practice to periodically evaluate and integrate the latest research advancements.

So, that’s what it really takes to build an AI system

As you can see, it typically takes a team with diverse specialities, spanning data science, model deployment, and application domain expertise, to build a complete system. There remains enormous demand for these skills in 2021, meaning that building up a team can be a very costly exercise. Complicating the matter further is staff turnover, which could mean that the system your company just spent a large amount of time and money building is suddenly unmaintainable, presenting a very real business risk.

So hopefully this helps you approach your ‘buy vs. build’ decision armed with more info. It’s considerably more complicated than ‘oh, let’s get model X and switch it on’. I’ve seen firsthand, and heard many accounts of, companies not batting an eyelid at paying Amazon/Microsoft/Google hundreds of thousands per year for cloud computing, despite third-party solutions offering a fraction of the total cost of ownership. If you decide to build yourself, make sure you have a lot of contingency! And consider all the costs, like cloud compute, hiring, and management.

And that TSR application? I can say I was quite proud of how well our system worked, but it required many, many decades of combined developer time to achieve.

Join Private AI for more discussion on building vs. buying an AI system on LinkedIn, Twitter, and YouTube.
