Building AI vision from the ground up

Author: Alex Virgona

With 100,000 hours of AI vision development time under its belt, Presien’s hindsight is 20/20. Head of Product, Alex Virgona, shares how the team has built an award-winning system from scratch. 

The Artificial Intelligence (AI) vision market is growing rapidly, driven by the increasing demand for automation, enhanced edge-to-cloud capabilities, advances in AI research, and more.

While the promise of AI vision is enticing (think enhanced machine functionality and recurring revenue opportunities), the technology is complex. Despite exciting advances in AI research over recent years, building and deploying a system that works robustly in the real world is still a significant challenge.

Over the past eight years, the AI team at Presien has channelled its experience in field robotics, sensing and perception, computer vision and machine learning to build a platform that’s winning industry awards and investment.

So what have we learned along the way? Put simply, good AI vision performance is no easy feat. But with the right team behind you, the promise of object detection is possible. 

Start with defining the problem 

The first step in creating AI vision for heavy industry is to define the problem that needs to be solved. For us, it started with improving safety. Worldwide, there are around 374 million occupational accidents each year, and heavy industries disproportionately contribute to these statistics. To address this problem, we set out to develop a system that detects potentially dangerous situations and alerts workers.

From there, it was essential to understand the constraints of heavy industry. That's why our technology was built on real sites, using data collected over more than half a decade, to ensure real-world AI performance.  

Since embarking on our journey, performance has been central to all our decision-making. AI vision is one of the most cost-efficient and flexible technologies available to address the issue of safety in heavy industry. This combination makes it ideal for deployment at scale. In fact, as we move into custom solutions we're seeing even more examples of how AI vision will revolutionise heavy industry. 

Designing the perception pipeline

Next, we needed to ensure the perception pipeline was engineered to work on our chosen edge device. We’ve gone through this process multiple times and have gained significant experience integrating perception pipelines into embedded systems.

Both product requirements and hardware constraints need to be considered here. There are several different ways to predict what is happening in an image, including classification, object detection and segmentation. At Presien, we’ve found that object detection satisfies our most important requirements while also running quickly on edge hardware. How quick? Less than 200 ms from image to in-cab operator alert, to be precise!
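
To make that budget concrete, here is a minimal sketch of how an end-to-end latency check might look. It assumes a generic `run_pipeline` callable and a set of test frames; the helper names and thresholds are illustrative, not Presien’s actual tooling.

```python
import time
from statistics import median

LATENCY_BUDGET_MS = 200  # image-to-alert budget described above


def measure_latency_ms(run_pipeline, frames, warmup=10):
    """Time an end-to-end pipeline callable over a batch of frames.

    `run_pipeline` is any function that takes a frame and returns
    detections; the first `warmup` calls are discarded so one-off costs
    (model load, cache warm-up) don't skew the numbers.
    """
    timings = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        run_pipeline(frame)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:
            timings.append(elapsed_ms)
    return {
        "median_ms": median(timings),
        "p95_ms": sorted(timings)[min(len(timings) - 1, int(0.95 * len(timings)))],
        "within_budget": max(timings) < LATENCY_BUDGET_MS,
    }
```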

The detection pipeline includes several steps, including preprocessing and tracking, but at its heart is the DCNN-based (deep convolutional neural network) object detector, which detects the type, location and size of objects of interest. From the several architectures available for object detection, we adapted one that gives us excellent accuracy at an acceptable frame rate and latency while running on embedded hardware. While our system is world-leading, we are constantly reviewing the latest in object detection in our efforts to deliver the best possible solutions.
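
As an illustration of that structure (not Presien’s actual code), a single pass of a detect-and-alert loop might be organised like the sketch below; `preprocess`, `detector`, `tracker` and `alert_fn` are hypothetical stand-ins for the real components.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str          # e.g. "person", "vehicle"
    confidence: float   # detector score in [0, 1]
    box: tuple          # (x1, y1, x2, y2) in pixel coordinates


def process_frame(frame, preprocess, detector, tracker, alert_fn,
                  confidence_threshold=0.5):
    """One pass through the kind of pipeline described above."""
    tensor = preprocess(frame)                    # resize / normalise the image
    detections = [d for d in detector(tensor)     # DCNN object detector
                  if d.confidence >= confidence_threshold]
    tracks = tracker.update(detections)           # associate detections across frames
    for track in tracks:
        if track.is_hazard():                     # e.g. person inside an exclusion zone
            alert_fn(track)                       # trigger the in-cab operator alert
    return tracks
```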

Collect data and keep it coming 

Data is the fuel that drives AI vision, and it’s crucial to collect high-quality, varied and accurately annotated data that is relevant to the problem being solved. For many companies, access to quality data is the first barrier to entry in building AI vision. Entry to dynamic environments, construction sites for example, comes with a lot of red tape.

Presien was spun out of multinational construction and engineering company Laing O'Rourke, giving us access to hundreds of worksites, in addition to data retrieved from customer devices. From a data pool of 50 million frames, tens of thousands of images were carefully selected and manually annotated with bounding boxes and class labels (from approx. 50 classes). This training dataset allows the technology to learn what people, machines, vehicles and objects look like. It’s then able to detect them with a high level of accuracy, minimising false alarms and avoiding alert fatigue. 
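
For readers unfamiliar with the format, an annotated frame in a COCO-style layout looks roughly like the sketch below. The file path, class names and box values are invented for illustration; the article only states that there are roughly 50 classes.

```python
# One annotated frame; bounding boxes are [x, y, width, height] in pixels.
annotation_record = {
    "image": {
        "file_name": "site_042/cam_front/frame_000123.jpg",
        "width": 1920,
        "height": 1080,
    },
    "annotations": [
        {"category": "person",    "bbox": [412, 305, 88, 210]},
        {"category": "excavator", "bbox": [980, 210, 640, 480]},
    ],
}


def bbox_is_valid(bbox, width, height):
    """Reject degenerate or out-of-frame boxes before they reach training."""
    x, y, w, h = bbox
    return w > 0 and h > 0 and x >= 0 and y >= 0 and x + w <= width and y + h <= height
```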

Fast forward to 2023 and we now have hundreds of devices operating in the field. This is where our active learning pipeline comes in. Our technology reviews images provided by in-field devices and filters them for ones that cause uncertain or inaccurate detections. This data may be particularly useful for training. With tens of thousands of real-world images being processed by our Active Learning AI system every month, our model is continuously improving and adapting. Put bluntly, AI without high-quality training data is useless. 
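
A common way to implement this kind of filtering is an uncertainty-based selection heuristic like the sketch below. It assumes each frame carries its detections with confidence scores; the thresholds and budget are placeholders rather than Presien’s values.

```python
def select_for_labelling(frames_with_detections,
                         low_conf=0.3, high_conf=0.6, budget=500):
    """Pick the frames most likely to improve the model if annotated.

    Frames whose strongest detection falls in a mid-confidence band are
    the ones the model is least sure about, so they are prioritised for
    human annotation and the next training round.
    """
    scored = []
    for frame_id, detections in frames_with_detections:
        if not detections:
            continue
        top = max(d.confidence for d in detections)
        if low_conf <= top <= high_conf:
            # Closer to the middle of the band means more uncertain.
            uncertainty = 1.0 - abs(top - (low_conf + high_conf) / 2)
            scored.append((uncertainty, frame_id))
    scored.sort(reverse=True)
    return [frame_id for _, frame_id in scored[:budget]]
```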

Prepare, train, test. Repeat.

Data preparation is a key step to ensure the images and annotations are ready for training. This includes combining sub-datasets, grouping categories, filtering annotations and checking for invalid or corrupt data. We also employ a purpose-built blending process to increase the number of instances of specific objects that have caused false predictions in the past. 
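
A simplified version of that preparation step, using the annotation layout sketched earlier, might look like this. The category grouping and validity checks are illustrative assumptions, not Presien’s real taxonomy.

```python
CATEGORY_GROUPS = {
    # Illustrative grouping only; the real class taxonomy isn't public.
    "worker": "person", "pedestrian": "person",
    "truck": "vehicle", "ute": "vehicle",
}


def bbox_is_valid(bbox, width, height):
    x, y, w, h = bbox
    return w > 0 and h > 0 and x >= 0 and y >= 0 and x + w <= width and y + h <= height


def prepare_dataset(sub_datasets):
    """Merge sub-datasets, remap grouped categories and drop invalid records."""
    merged = []
    for dataset in sub_datasets:
        for record in dataset:
            width = record["image"]["width"]
            height = record["image"]["height"]
            cleaned = []
            for ann in record["annotations"]:
                label = CATEGORY_GROUPS.get(ann["category"], ann["category"])
                if bbox_is_valid(ann["bbox"], width, height):
                    cleaned.append({**ann, "category": label})
            if cleaned:  # skip frames left with no usable labels
                merged.append({**record, "annotations": cleaned})
    return merged
```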

Our model can then be trained on the prepared dataset: training is the process of teaching the AI model to recognise patterns in the data and make predictions based on those patterns. The resultant model goes through several stages of evaluation and testing: evaluation to compute metrics, regression testing, and QA testing. We have separate datasets for each of these.
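
One way to wire those three stages together is a simple release gate like the sketch below; `evaluate` is assumed to return a dict of metrics (for example mAP), and the dataset names and thresholds are placeholders.

```python
def gate_model_release(model, evaluate, datasets, thresholds):
    """Run a candidate model through evaluation, regression and QA stages."""
    results = {}
    for stage in ("evaluation", "regression", "qa"):
        metrics = evaluate(model, datasets[stage])   # separate dataset per stage
        results[stage] = metrics
        for metric_name, floor in thresholds.get(stage, {}).items():
            if metrics.get(metric_name, 0.0) < floor:
                raise ValueError(
                    f"{stage} check failed: {metric_name}="
                    f"{metrics.get(metric_name)} is below {floor}")
    return results


# Example: block a release if mAP regresses on the held-out regression set.
# gate_model_release(candidate_model, evaluate_fn,
#                    datasets={"evaluation": eval_ds, "regression": reg_ds, "qa": qa_ds},
#                    thresholds={"regression": {"mAP": 0.70}})
```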

We have invested a lot of effort into our world-class MLOps infrastructure. This platform automates all steps from preprocessing data and training to evaluating and testing. We use a combination of tools to structure, automate, track and version the entire process in order to minimise manual intervention and maximise accessibility and reproducibility.
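
The article doesn’t name the tools, but the general pattern is the one sketched below: each stage is a function plus a config, and the run records enough metadata (a fingerprint of each stage’s configuration) to reproduce it later. The step names and configs are hypothetical.

```python
import hashlib
import json


def run_pipeline(steps, initial_inputs, registry):
    """Chain pipeline stages and record a fingerprint of each stage's config.

    `steps` is a list of (name, function, config) tuples; the output of
    one stage is passed as the input to the next.
    """
    artefact = initial_inputs
    for name, fn, config in steps:
        artefact = fn(artefact, **config)
        registry[name] = {
            "config": config,
            "config_hash": hashlib.sha256(
                json.dumps(config, sort_keys=True).encode()).hexdigest(),
        }
    return artefact, registry


# steps = [("prepare",  prepare_step,  {"min_box_px": 8}),
#          ("train",    train_step,    {"epochs": 50}),
#          ("evaluate", evaluate_step, {"dataset": "eval_v3"})]
```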

Deploy the model

After the model has been trained and tested, it is ready to be deployed. Deployment involves updating the model in the existing system and sending it to customer units via an over-the-air (OTA) update.
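
A minimal sketch of what packaging a model for an OTA rollout can involve is shown below: an archive plus a manifest carrying a version and checksum so each device can verify the download before swapping models. The file layout and names are assumptions, not Presien’s actual update mechanism.

```python
import hashlib
import json
import pathlib
import tarfile


def package_model_update(model_path, version, out_dir="ota_packages"):
    """Bundle a trained model with a version and checksum for OTA delivery."""
    model_path = pathlib.Path(model_path)
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    archive = out_dir / f"model_{version}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(model_path, arcname=model_path.name)

    manifest = {
        "version": version,
        "artifact": archive.name,
        "sha256": hashlib.sha256(archive.read_bytes()).hexdigest(),
    }
    (out_dir / f"model_{version}.manifest.json").write_text(
        json.dumps(manifest, indent=2))
    return manifest
```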

Once deployed, the performance of the model is monitored to ensure that it is functioning as expected. This stage includes an initial testing process on select field devices.

Eye on the prize 

The right AI vision system can enhance OEMs’ technology leadership, machine functionality and recurring revenue opportunities. However, as outlined, AI vision is a complex field.

There are many considerations at play: obtaining the all-important training data, preparing it carefully, and training a detection model that is refined over many deployment cycles. A system that is accurate, reliable and scalable requires expertise, plus the right infrastructure to produce models in an efficient and repeatable way.

If you want to turn vision into real-world intelligence with Presien, we’re ready to partner and share our experience.  

Put progress in plain view

Empower your team with smart insights that lift the standards of heavy industry. From safety to productivity and quality, we can build insights that change lives.