2025-02-28 | Quality Control

AI Inspection with 86% Cost Reduction Using On-Device AI


Challenge

  • Defects caused by external contaminants are unpredictable in shape, making rule-based algorithms ineffective—necessitating a deep learning model based on unsupervised learning.
  • Existing facility PCs lacked the GPU capability required to run deep learning models.
  • Client needed a real-time AI inspection solution that minimized both hardware modifications and investment costs.

Approach

  • Implemented an on-device AI solution.
  • Selected an NPU (Neural Processing Unit)—a cost-effective alternative to GPUs that could be installed in existing PCs.
  • Optimized the deep learning model for NPU execution using quantization, ensuring high efficiency and real-time performance.

Result

  • AI inspection was deployed at 86% lower investment costs compared to replacing the existing PCs.

Full Story

The Challenge of Detecting Unpredictable Contaminants
Manufacturing facilities employ protective measures to prevent external contaminants from entering production lines. However, complete prevention is impossible. Defects caused by foreign particles are particularly challenging, as their shape and location are highly unpredictable. Traditional rule-based inspection systems—which operate on predefined conditions—struggle to detect these anomalies effectively.

To address this, deep learning-based inspection was necessary—one capable of identifying a wide range of irregularities beyond fixed-rule detection. Among the candidate AI models, an Anomaly Detection model proved ideal, as it flags deviations from learned normal patterns.

Why Anomaly Detection?

  • Unsupervised learning model, requiring only normal data for training.
  • 75% faster deployment compared to other AI models.
  • Capable of identifying previously unseen defect types, quickly reducing undetected failures.
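The core idea can be sketched with a minimal example: learn only what normal samples look like, then score new samples by how far they deviate from that model. The sketch below uses PCA reconstruction error on synthetic feature vectors—all data, dimensions, and thresholds are hypothetical, not AHHA Labs' actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Train on NORMAL samples only: learn a low-dimensional subspace
# that captures "normal" variation (hypothetical feature vectors).
normal = rng.normal(0.0, 1.0, size=(500, 64))
mean = normal.mean(axis=0)
centered = normal - mean
# Top-k principal components span the "normal" subspace.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:8]  # k = 8 components

def anomaly_score(x):
    """Reconstruction error: distance from the learned normal subspace."""
    c = x - mean
    recon = (c @ components.T) @ components
    return float(np.linalg.norm(c - recon))

# Threshold calibrated from normal data alone (e.g. 99th percentile),
# so no defect samples are ever needed for training.
scores = np.array([anomaly_score(x) for x in normal])
threshold = np.percentile(scores, 99)

# A sample far from the normal distribution scores above the threshold.
defect = rng.normal(5.0, 1.0, size=64)
print(anomaly_score(defect), threshold)
```

Because the threshold is derived from normal data alone, the model can flag defect shapes it has never seen—the property that makes this approach suitable for unpredictable contaminants.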

The Hardware Bottleneck: GPUs Were Not an Option

Running deep learning models typically requires GPU acceleration, as GPUs perform AI computations significantly faster than CPUs.

However, a major obstacle for our client was that the facility PCs lacked the necessary GPU capability. High-performance GPUs typically require a PCIe x8 or x16 slot, but the existing PCs had no free slots to accommodate them. Implementing a GPU-based solution would therefore have meant replacing nearly 200 PCs, an astronomical investment cost.


On-Device AI: A Cost-Effective Alternative

To overcome this challenge, AHHA Labs proposed an on-device AI solution, enabling AI-powered inspection without the need for costly GPU upgrades.

1. NPU-Based AI Acceleration

We selected an NPU (Neural Processing Unit) that could fit into the PCIe x4 slots available in the existing PCs. Unlike GPUs, NPUs are designed specifically for AI computations, offering:

  • Cost efficiency – significantly cheaper than GPUs.
  • Energy efficiency – lower power consumption.
  • Seamless integration – compatible with existing hardware.

2. AI Model Optimization with Quantization

Since NPUs are generally slower than GPUs, running AI inspections in real time required model optimization and lightweighting. AHHA Labs applied its proprietary quantization technique to maintain model performance while ensuring real-time execution.

What is Quantization?

Quantization reduces numerical precision to accelerate computation while minimizing memory and power consumption.

  • Standard deep learning models store weights as 32-bit floating points.
  • Quantization converts them to 8-bit integers, increasing speed while reducing memory usage.
  • However, fine-tuning is crucial to prevent accuracy loss—an area where AHHA Labs’ expertise comes into play.
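The float32-to-int8 conversion described above can be sketched as follows. This shows symmetric int8 quantization, a common scheme for weights—a simplified illustration, not AHHA Labs' proprietary technique:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric int8 quantization: map float32 weights to [-127, 127]."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale

# Hypothetical weight tensor; real models hold millions of such values.
w = np.random.default_rng(0).normal(0, 0.1, size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error is at most half a quantization step (scale / 2).
print(np.abs(w - w_hat).max(), scale / 2)
```

Storing int8 instead of float32 cuts weight memory by 4x, and integer arithmetic is what NPU hardware accelerates—which is why the rounding error this introduces must then be controlled through calibration and fine-tuning.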

By integrating an NPU-based accelerator and applying advanced quantization, we successfully developed a highly efficient on-device AI inspection system.


86% Cost Savings While Doubling Performance

As a result, the AI inspection system exceeded the required performance benchmarks, achieving defect detection in 100ms—twice as fast as the 200ms takt time requirement.
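The latency figures reduce to a simple budget check, assuming the 100 ms covers the full capture-to-verdict pipeline:

```python
TAKT_TIME_MS = 200    # one unit leaves the line every 200 ms
DETECTION_MS = 100    # measured capture-to-verdict latency

def fits_takt(latency_ms: float, takt_ms: float) -> bool:
    """Real-time inspection requires a verdict before the next unit arrives."""
    return latency_ms <= takt_ms

headroom_ms = TAKT_TIME_MS - DETECTION_MS
print(fits_takt(DETECTION_MS, TAKT_TIME_MS), headroom_ms)  # True 100
```

The 100 ms of headroom also leaves margin for image capture and I/O overhead on top of model inference.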

By leveraging on-device AI, the client was able to implement real-time AI inspection with 86% lower costs compared to a full-scale PC replacement. This success was made possible by a strategic combination of cost-effective AI acceleration (NPU) and lightweight deep learning models, proving that high-performance AI can be achieved without excessive investment.