
Xailient Outperforms YOLO - Case Study

Xailient Outperforms YOLOv3 in Object Detection by 98.7%

Synopsis

Xailient’s Detectum performs both Localization and Classification of objects in images and video and has been demonstrated to outperform the industry-leading YOLOv3 by 98.7%.

YOLOv3 is the leading AI architecture for computer vision. Traditionally, YOLO and object detection algorithms in general are run on Cloud infrastructure. Cloud costs, bandwidth costs, and network latency are driving the industry to find ways to process Computer Vision at the Edge.

 

Key Outcomes

1. Xailient has proven the Detectum software performs Computer Vision 98.7% more efficiently without losing accuracy.

2. Customers can train the Detectum software with their training data, creating a unique solution.

3. They can deploy it at the Edge, in the Cloud, or in a hybrid. This provides cost advantages and technical flexibility never before available to Computer Vision innovators.

Problem Statement

Computer Vision is rapidly becoming the primary format for computer input. Amazon Go is replacing checkout systems, autonomous cars are replacing drivers, access control is using face recognition, and security surveillance is increasingly automated and real-time. These innovations are held back by network latency, data transmission costs, and the cost of Cloud computation.

Traditional AI models, like YOLOv3, are computationally intensive and require powerful servers that are too large for Edge deployments. Industry efforts to produce compressed versions of AI that can run at the Edge have required significant accuracy trade-offs. Tiny-YOLO is a community-powered open-source project in that effort. The Detectum is Xailient’s answer to the challenge.

Activity

01

Using an open-source training dataset, Xailient trained both a traditional YOLOv3 object detection neural network and a Tiny-YOLO network using standard methods. These provided state-of-the-art baselines for both Cloud and Edge-optimized AI and measured the accuracy compromise required to shrink AI to fit at the Edge. The networks were trained to detect cars, trucks, and pedestrians (the objects of interest).

02

Xailient compared the two YOLO neural nets, the standard Cloud-based YOLOv3 and the Edge-ready Tiny-YOLO, against a Detectum neural net. All neural nets used the same training data and test data.

03

The images used in the test data were all different, with no data in common. The test data was not sequential frames from a related video.

04

The hardware used was the same in all tests. All neural nets were run on Google Cloud Platform and fed pre-annotated test data as input. The neural nets were measured for accuracy (mean average precision, mAP) and performance (inference time). The YOLOv3 Cloud Baseline had an accuracy of 66.8% mAP, and the Tiny-YOLO Edge Baseline had an accuracy of 40.6% mAP.
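The case study does not publish its benchmarking code. As a rough illustration of how per-image inference time could be measured on identical hardware, here is a minimal Python sketch. The `detect` callable and the `mean_inference_time` helper are hypothetical names introduced for this example and are not part of any Xailient or YOLO API.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def mean_inference_time(detect: Callable[[object], object],
                        images: Iterable[object],
                        warmup: int = 1) -> float:
    """Return the mean wall-clock seconds per image spent in `detect`.

    `detect` is a hypothetical stand-in for any model's inference call.
    """
    image_list = list(images)
    if not image_list:
        raise ValueError("need at least one image")

    # Warm-up calls are not timed, so one-time setup cost does not skew the mean.
    for image in image_list[:warmup]:
        detect(image)

    timings: List[float] = []
    for image in image_list:
        start = time.perf_counter()
        detect(image)
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

Running the same helper over the same test images for each network, on the same machine, gives timings that can be compared directly.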

Results

The results were unprecedented: Xailient achieved the same accuracy as the Cloud Baseline 76x faster, and was 8x faster than the Edge Baseline without the accuracy penalty.

Xailient has proven a new way for Edge AI that does not require compromises in accuracy.
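The case study does not state how the 98.7% headline figure was derived. One reading consistent with the reported numbers is that a 76x speedup corresponds to a 98.7% reduction in inference time. The short Python sketch below works through that arithmetic; the function name is introduced here for illustration only.

```python
def time_reduction_percent(speedup: float) -> float:
    """Percentage of inference time saved for a given speedup factor.

    A speedup of N means the new time is 1/N of the old time,
    so the fraction saved is 1 - 1/N.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


print(round(time_reduction_percent(76), 1))  # 98.7 (vs. Cloud Baseline)
print(round(time_reduction_percent(8), 1))   # 87.5 (vs. Edge Baseline)
```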

Next Steps

Until today, Computer Vision faced a fundamental barrier: more accuracy required exponentially more computation. Breaking through this frontier, the “AI Sound Barrier,” will unlock an explosion in the adoption of Computer Vision by improving the profitability of everyday and innovative use cases.

Xailient has broken this barrier.

The Computer Vision market is expected to reach US$35 Billion by 2023, an analyst projection based on the old assumptions about cost structures. A 100:1 improvement in costs will liberate many business cases that were formerly uneconomical, catalyzing an unimagined explosion in the adoption and application of Computer Vision.

The Detectum Neural Networks work best when trained to the specific purposes for which they are deployed. Training data and configuration parameters can all be adjusted to fit to purpose. The results obtained in this pilot project, while dramatic, are just a starting place in how Xailient can help Computer Vision innovators.

Discussion

Significance – Former State-of-the-Art – The AI “Sound Barrier.”

In the highly-cited 2017 CVPR paper “Speed/accuracy trade-offs for modern convolutional object detectors,” the authors outlined the relationship between accuracy and inference time. While the nominal time required can be improved by adding computation hardware, it was thought, until now, that the cost of accuracy was fundamental: more accuracy needed more time.

Xailient has Broken this Barrier.

Xailient has broken the Accuracy-Computation barrier for the first time, delivering greater accuracy at faster speeds.

Commercial Impacts for Customers.

Xailient enables customers to deploy Computer Vision systems that are so efficient they can run at the Edge, even on existing hardware. Customers transmit only results and can build a Cloud focused on their business logic.

The cost-savings impact of Detectum goes beyond a 76x improvement and can approach 100% reduction. For example, the Detectum can skip the transmission and computation cost of empty images. Customers eliminate waste and optimize accuracy, cost, and speed by focusing only on what matters, both across the view and across time.
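The idea of skipping empty images can be sketched in Python as follows. This is an illustrative assumption about how an Edge device might decide what to send, not Xailient's implementation. The `detect` callable, the `(label, confidence)` detection format, and the `min_confidence` threshold are all hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical detection format: (label, confidence in [0, 1]).
Detection = Tuple[str, float]


def results_to_transmit(frames: Iterable[object],
                        detect: Callable[[object], Sequence[Detection]],
                        min_confidence: float = 0.5
                        ) -> List[Tuple[int, List[Detection]]]:
    """Return (frame_index, detections) only for frames with something to report.

    Frames with no detection at or above `min_confidence` produce no payload,
    so nothing is transmitted or processed downstream for them.
    """
    payloads: List[Tuple[int, List[Detection]]] = []
    for index, frame in enumerate(frames):
        kept = [d for d in detect(frame) if d[1] >= min_confidence]
        if kept:
            payloads.append((index, kept))
    return payloads
```

In this sketch only the detection results leave the device, never the image itself, which matches the point that customers transmit results rather than raw frames.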

When deploying new hardware, the Detectum software allows for smaller chips and can reduce capital expense by 90% or more. For customers with existing IP cameras, the Detectum is delivered by software update, with no hardware to purchase or deploy.

Every Xailient customer is different, and Xailient provides the ability to train a Detectum on custom datasets. Each resulting Detectum is unique; customers retain ownership of their training data, and Xailient does not distribute a custom Detectum to anyone else.
