With theaters and other public venues slowly reopening around the world, it is crucial that everyone adhere to mask-wearing to avoid further lockdowns. But how could we verify whether people really are following this rule? We could employ ushers or security personnel to act as enforcers, or we could build a computer vision application that checks whether everyone is wearing a mask at all times and alerts us whenever someone isn’t.
This scenario is just one of many uses of computer vision systems. In this article, we will show you how to use APIs to embed computer vision in your own applications.
What is Computer Vision?
Computer vision is a field of artificial intelligence that aims to teach computers to “see” and, crucially, to make sense of what they’re seeing. Of course, computers aren’t capable of human vision, but thanks to intricate algorithms, large datasets, and the computational power to process them, they can do a pretty good job of mimicking it.
How Does Computer Vision Work?
Virtually all contemporary computer vision applications are based on deep neural networks. Convolutional neural networks (CNNs), a popular deep learning architecture, were designed specifically to process grid-like input with two or more dimensions, which is how images are represented digitally.
In fact, many students new to deep learning start by building a network that recognizes low-resolution grayscale images of handwritten digits from the famous MNIST dataset.
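To make the idea of processing a grid concrete, here is a minimal sketch of the sliding-window operation at the heart of a convolutional layer, written in plain Python. The tiny image and the edge-detecting kernel are made-up illustrations; in a real network the kernel values are learned during training, and an optimized library does the arithmetic.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    grid of weighted sums. As in most deep learning libraries, the kernel
    is not flipped, so strictly speaking this is cross-correlation."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A tiny 4x4 grayscale "image": dark (0) on the left, bright (255) on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row prints [0, 510, 0]
```

The output is large only in the middle column, exactly where the image switches from dark to bright. A CNN stacks many such filters so that later layers can respond to increasingly complex patterns.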
What Are Some Applications of Computer Vision?
Optical Character Recognition (OCR)
Imagine you have hard copies of books you’d like to examine with computational methods. How do you make these texts processable? A technique that might save you hours of typing is optical character recognition (OCR). It converts an image (e.g. a scan from a book) to written text.
OCR is easily one of the oldest computer vision technologies; the first applications were developed in the early 20th century. Today, OCR is also available for handwriting; the MNIST task is just one such example.
Face Detection and Recognition
The progress of face detection and recognition is apparent in how Facebook went from asking us to tag our friends in pictures to completing the task on its own. A face-detection algorithm uses cues such as the presence of eyes, a nose, and a mouth to determine whether it’s looking at a face.
While robust face-detection algorithms have been around for a while (the Viola-Jones algorithm being the best known), the task of facial recognition, that is, working out whose face it is, is much tougher and presents considerable risk.
The current limitations of facial recognition software are illustrated by cases in which law enforcement’s over-reliance on its accuracy has resulted in false charges against innocent citizens. Because training datasets are often imbalanced, this problem has been reported to disproportionately put people of color at risk.
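For the curious, the Viola-Jones detector mentioned above relies on simple “Haar-like” features: differences between the pixel sums of adjacent rectangles, computed quickly with a lookup table called an integral image. The sketch below shows only that building block, under the assumption of a made-up 4x4 patch; it is not a full detector, which also needs a trained cascade of classifiers.

```python
def integral_image(image):
    """Build a table where ii[r][c] is the sum of all pixels above and to
    the left of (r, c). The extra first row and column of zeros keep the
    lookups below free of special cases."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = (
                image[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
            )
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using only four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Haar-like feature: lower half of the window minus upper half."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return lower - upper


# Hypothetical patch: a darker band (eyes) above a brighter band (cheeks).
patch = [
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]

ii = integral_image(patch)
print(two_rect_feature(ii, 0, 0, 4, 4))  # 1200
```

A strongly positive value means “dark above, bright below,” which is the kind of cue a detector can combine with many others to decide whether a window contains a face.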
Object Detection
Remember how those ubiquitous CAPTCHAs went from scanned words to photos of fire hydrants and airplanes? Their purpose has shifted over the years, and the answers to the newer reCAPTCHA tasks are now used to collect data for automated driving and other autonomous systems. Correctly identifying objects is clearly a key skill for a machine to master before it can start roaming the streets.
What Are Computer Vision APIs?
API is short for application programming interface. It describes a point of interaction between programs. Nowadays, the term is often used to describe the entire product behind an API, but an API is really just the checkpoint through which two applications can talk to each other.
Why Use a Computer Vision API?
A successful machine learning application has two main components that can make or break it: code and data. In both cases, you don’t need to write or collect everything from scratch. Instead, you can use APIs. Especially in the case of data, large companies like Google or Facebook will most likely have more resources than you. So why not make use of them?
What are the Most Popular Computer Vision APIs?
With Google’s cloud-based API for computer vision, you can use Google’s extensively trained models for your own purposes. It detects objects and faces out of the box and also offers OCR functionality to find written text in images (such as street signs). The API follows REST conventions, which makes it straightforward to integrate into your product.
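Because the API is REST-based, a request is just JSON sent over HTTPS. The following sketch builds a face-detection request body for the `images:annotate` endpoint and reads bounding boxes out of a response, using only the Python standard library. The helper names (`build_face_request`, `face_boxes`) and the sample response are our own illustrations. Actually sending the request also requires an API key or OAuth token, which is not shown here.

```python
import base64
import json

# Endpoint of the Vision API's annotate method.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_face_request(image_bytes, max_results=10):
    """Return the JSON-serializable body for a face-detection request."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": "FACE_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }


def face_boxes(response_json):
    """Return one list of (x, y) corner tuples per detected face."""
    boxes = []
    for result in response_json.get("responses", []):
        for face in result.get("faceAnnotations", []):
            vertices = face["boundingPoly"]["vertices"]
            # Zero-valued coordinates may be omitted from the JSON,
            # so fall back to 0 when a key is missing.
            boxes.append([(v.get("x", 0), v.get("y", 0)) for v in vertices])
    return boxes


# Placeholder bytes stand in for a real image file here.
payload = build_face_request(b"placeholder image bytes", max_results=4)
print(json.dumps(payload, indent=2))

# Hand-written example of the response shape, not real API output.
sample_response = {
    "responses": [
        {
            "faceAnnotations": [
                {
                    "boundingPoly": {
                        "vertices": [
                            {"y": 10},
                            {"x": 50, "y": 10},
                            {"x": 50, "y": 60},
                            {"y": 60},
                        ]
                    }
                }
            ]
        }
    ]
}
print(face_boxes(sample_response))
```

In practice you would POST `payload` to `ANNOTATE_URL` with your credentials and pass the parsed JSON reply to `face_boxes`.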
Kairos is another example of a cloud-based API offered “as a service,” which allows you to harness the power of large pre-trained models in your own applications. Unlike Google’s general-purpose offering, however, it specializes in facial recognition. Not only can it detect people in images and videos, but it also classifies them by gender, age, ethnicity, and emotional state.
In addition to using Kairos as a service, you can host it on your own server and retain control over your data. Notably, the company has incorporated its own value-based approach to facial recognition into its identity. Its founder Brian Brackeen refuses to collaborate with government agencies, and openly discusses racial bias in computer vision datasets.
OpenCV is the go-to open-source library for computer vision tasks, with APIs in Python and C++, among others. Chances are that you’d use this API, and not one of the cloud-based services we mentioned earlier, in your initial personal projects. You can combine OpenCV with deep-learning libraries, available datasets or pre-trained computer vision models like YOLOv3 to build your own classifiers for object detection and other tasks.
How to Use a Computer Vision API
We’ll now use the Google Cloud Vision AI API to detect faces in a photograph. The code is from this tutorial on the service’s website. To use it, you need to set up a Google account, activate the API, and add your billing information. This is the image we’re going to examine:
And here’s the code:
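What follows is a minimal sketch of such a script rather than the tutorial’s listing verbatim. It assumes the `google-cloud-vision` and `Pillow` packages are installed and that your Google Cloud credentials are configured in the environment. The function names and the file names `input.jpg` and `output.jpg` are placeholders of our own.

```python
def detect_faces(image_path, max_results=4):
    """Send the image to the Vision API and return its face annotations."""
    # Imported lazily so the helpers below can be used without the SDK.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.face_detection(image=image, max_results=max_results)
    return response.face_annotations


def face_polygon(face):
    """Convert a face annotation's bounding polygon to (x, y) tuples."""
    return [(vertex.x, vertex.y) for vertex in face.bounding_poly.vertices]


def highlight_faces(image_path, faces, output_path):
    """Save a copy of the image with a green outline around each face."""
    from PIL import Image, ImageDraw

    im = Image.open(image_path)
    draw = ImageDraw.Draw(im)
    for face in faces:
        box = face_polygon(face)
        # Repeat the first corner at the end to close the outline.
        draw.line(box + [box[0]], width=5, fill="#00ff00")
    im.save(output_path)


def main(input_path, output_path):
    faces = detect_faces(input_path)
    print(f"Found {len(faces)} face(s)")
    highlight_faces(input_path, faces, output_path)
```

Calling `main("input.jpg", "output.jpg")` sends the photo to the API, prints how many faces were found, and writes the annotated copy next to the original.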
To create a copy of our image with the annotated faces, we’ll run the script from the Python IDE and get the following result:
Looks like we got exactly what we wanted!
In this article, we covered the use of APIs in computer vision, going over the technology’s applications, benefits, and pitfalls. We discussed some of the most popular computer vision APIs and went through a practical example together, using the Google Cloud Vision AI API to detect faces in a photograph.
If this type of work interests you, consider enrolling now in our Nanodegree program to gain new skills in computer vision!