Eye, Robot: A Guide to AI for Image Recognition

A robot hand points at a projection, next to a robot with a dot-matrix face.

What if I told you that, today, there are machines that can view the outside world in greater detail than you, a human? It’s true: as artificial intelligence has expanded in scope and capability over the last century, it has brought us to a stage where machines can read images and the world around them just as well as, if not better than, we can.

AI in image recognition: definitions

First of all, what is AI for image recognition? Using cameras, algorithms, and machine learning programs, computers can “read” and recognize visual data, e.g. images and objects. The ultimate goal is to give machines the ability to see the world as humans do, or in even greater detail. Image recognition is a subfield of a much larger area of artificial intelligence called computer vision, which includes using computers to process, classify, and reconstruct images, among other interlinked tasks. It is arguably one of the most important parts of computer vision, as it is the basis on which most other elements are built. Some examples you might have come across include:

  • identifying number plates,
  • telling copies of images from the real thing,
  • diagnosing illnesses.[1]

All of these, and more, make image recognition an important part of AI development. So, let’s dive into how it has evolved, and what its significance is today.
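Before the history, here is a miniature illustration of the “read and recognize” idea described above. It is only a toy sketch, not how any production system works: the 3x3 “images”, the labels, and the function names `train` and `recognize` are all invented for this example. The program averages the example images for each label, then labels a new image by finding the closest average.

```python
# Toy "image recognition": classify tiny 3x3 grayscale images (pixel values 0-1)
# by comparing them with the average image learned for each label.
# All data and names here are hypothetical, chosen purely for illustration.

def flatten(image):
    """Turn a grid of pixel rows into one flat list of pixel values."""
    return [pixel for row in image for pixel in row]

def train(examples):
    """Given (image, label) pairs, return {label: average pixel vector}."""
    sums, counts = {}, {}
    for image, label in examples:
        pixels = flatten(image)
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def recognize(model, image):
    """Return the label whose average image is closest to the new image."""
    pixels = flatten(image)

    def distance(average):
        # Sum of squared pixel differences between the image and an average.
        return sum((p - a) ** 2 for p, a in zip(pixels, average))

    return min(model, key=lambda label: distance(model[label]))

TRAINING = [
    ([[0, 1, 0], [0, 1, 0], [0, 1, 0]], "vertical"),
    ([[0, 0.9, 0.1], [0, 1, 0], [0.1, 0.9, 0]], "vertical"),
    ([[0, 0, 0], [1, 1, 1], [0, 0, 0]], "horizontal"),
    ([[0.1, 0, 0], [0.9, 1, 0.9], [0, 0, 0.1]], "horizontal"),
]

model = train(TRAINING)
new_image = [[0, 0.8, 0], [0.1, 1, 0], [0, 0.7, 0.2]]
print(recognize(model, new_image))  # prints "vertical"
```

Real systems learn from millions of labeled photographs rather than four hand-drawn grids, but the basic loop is the same: learn from examples, then compare new images against what was learned.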

AI in image recognition: early days

The early 2000s saw the rise of what Oren Etzioni, Michele Banko, and Michael Cafarella dubbed “machine reading”. In 2006, they defined this idea of unsupervised text comprehension, which would ultimately expand into machines “reading” objects and images. 

What we now recognize as image recognition arguably began that same year, when Fei-Fei Li began creating ImageNet, a huge database of categorized images that would, in theory, allow machines to learn the relationships between multiple objects. By 2010, ImageNet held over 3 million images, and that year saw the first ImageNet Large Scale Visual Recognition Challenge, in which teams of AI experts competed to see whose work could make the best use of the database.

Graph comparing AlexNet’s 15.3% error rate in 2012 with the 25% average in 2012.

In 2012, the event was won by AlexNet, created by a team from the University of Toronto, whose error rate was 15.3%, compared to over 25% for the rest of the participants.[2] This remarkable achievement arguably paved the way for other advances, which we’ll explore below.

In 2012, Andrew Ng and Jeff Dean at Google developed a neural network that could detect cat images without background context. The following year, Carnegie Mellon University created NEIL (Never Ending Image Learner), which bills itself as “[a] computer program that runs 24 hours per day and 7 days per week to automatically extract visual knowledge from Internet data.”[3][4]

This sort of technology has now found homes all over the world, everywhere from renewable energy groups to passport photo checking tools (Passport Photo Online makes use of such technology to check your passport photos), making it a strong addition to the development of AI. 

In addition to checking passport photos, the ability of AIs to read images has allowed social media channels like Facebook, YouTube, and Instagram to censor pictures and videos that are inappropriate for general audiences, e.g. gory or explicit content.

Graph showing that YouTube’s AI removed 70% of inappropriate videos before they got one view.

In the first quarter of 2019, YouTube’s AI alone succeeded in identifying and removing over 6 million videos. 70% of these were removed before receiving even a single view.[5]

In the second half of the 2010s, machine reading took on greater roles across all social media channels. Since 2015, Facebook has used AI to flag posts related to suicide or self-harm so that help can be offered and, in 2017, YouTube began using AI to flag terrorism-related videos and block them from even being uploaded.

Graph showing that YouTube’s AI blocked 83% of extremist videos before the human team saw them.

In September of that year, YouTube’s AI blocked 83% of violent extremist videos before the human team even saw them.[6] The same month YouTube announced its initiative, Instagram began censoring the most explicitly hateful comments through AI, later expanding that to ask users if they were sure before submitting a potentially abusive comment.[7]

In addition, medical diagnostics have benefited from advances in this area. 

Graph showing that Etemadi’s AI correctly identified early-stage lung cancer 94% of the time.

An AI system created by Mozziyar Etemadi in 2019 proved itself by correctly identifying early-stage lung cancer 94% of the time, a higher score than that of six radiologists, all long-standing members of the field. Elizabeth Svoboda notes the importance of this advance, citing the fact that 70% of lung cancers are diagnosed too late for treatment. She declares that: “Using AI to find tumours early can effectively double the amount of time oncologists have to treat a patient.”[8]

AI in image recognition: a rapid rise

These image reading systems have been gradually developing over the first two decades of the 21st century. 

Graph showing the increase in object identification accuracy from 50% to 99%.

The world has seen a particularly rapid period of growth, with the accuracy of object identification increasing from 50% to 99% in less than a decade.[9] New AIs are benefiting from the image reading capabilities of existing products. NEIL was explicitly designed to be a continually growing resource for computer scientists to use to develop their own AI image recognition examples.

Graphic showing that NEIL studied 3 million images and identified 3000 relationships in four months.

Between its launch in July of 2013 and November of that same year, NEIL was able to study 3 million images and, through analyzing the content, learned 3000 relationships (e.g., “zebra can be found in Savanna”).[10]

Graph showing the relative accuracy of Google Vision, Amazon Rekognition and Microsoft Azure.

A study by Perficient Digital into four leading image recognition AIs tested their ability to recognize image tags, revealing that, in 2019, many AIs were coming closer to human levels of skill. Google Vision, for example, scored 81.7% accuracy on its test, which was only 6% behind the human control. Vision is no mere outlier, either, with Amazon Rekognition and Microsoft Azure also scoring highly (77.7% and 75.8%, respectively).[11]

Image recognition in AI: the future

Even with all these advances, we’re still only scratching the surface of what AI image recognition technology will be able to do. 

Graph showing computer vision market’s potential growth from $8.4 billion to $150.6 billion.

According to research by Market Watch, the global market for computer vision AI was valued at $8.4 billion in 2021, and they predict that it will reach $150.6 billion by 2030. That is an increase of roughly 1,693% (almost 18 times the 2021 value), a truly enormous rise.[12] Verified Market Research gets a similar result, predicting that the global computer vision market will expand from $7.04 billion in 2020 to $144.46 billion in 2028.[13] Allied Market Research is even more optimistic, predicting that the market size will increase to $207.09 billion by 2030.[14]
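For readers who want to check the arithmetic behind forecasts like these, a percentage increase is the change divided by the starting value. The short sketch below applies that to the Market Watch figures quoted above; the function names `percent_increase` and `growth_multiple` are our own, chosen for illustration. Growing from $8.4 billion to $150.6 billion works out to an increase of about 1,693%, or roughly 17.9 times the starting value.

```python
# Checking the growth arithmetic for the Market Watch forecast quoted above.
# Function names are illustrative; the dollar figures (in billions) come from the article.

def percent_increase(start, end):
    """Percentage change from start to end, relative to start."""
    return (end - start) / start * 100

def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start

value_2021 = 8.4    # $ billions, 2021 valuation
value_2030 = 150.6  # $ billions, 2030 forecast

print(f"{percent_increase(value_2021, value_2030):.2f}%")  # prints "1692.86%"
print(f"{growth_multiple(value_2021, value_2030):.2f}x")   # prints "17.93x"
```

Note the difference between the two numbers: the market ends up almost 18 times its starting size, but the increase itself is one multiple less, because the original $8.4 billion is not part of the growth.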

There are several potential reasons for this increase. Writing for Forbes, Naveen Joshi, Founder and CEO of the engineering company Allerin, says that: “Not only will computer vision technologies be easier to train but also be able to discern more from images than they do now.”[15] Facciolo argues that future innovation “relies on deep learning [algorithms]… [which] work…by building deep neural networks that simulate the mechanism of the human brain and then interpreting and analyzing data, such as image, video, and text.”[16] Sathish B, meanwhile, predicts that it is the widespread adoption of facial recognition tools that will bring it new success in the future, driven by “increasing usage of mobile devices and demand for strong fraud detection and prevention”.[17]

Having seen the rate at which NEIL has developed its knowledge, it’s logical to expect it (and similar databases) to help increase the rate of AI’s advancement. The original engineers and computer scientists who began to make image recognition AI had to start from nothing, but designers today have a wealth of prior knowledge to draw on when making their own AIs. After all, we’ve already seen that NEIL was originally designed to be used as a resource in this way. 

With AI in general likely to expand in scope over the next few years, image recognition AIs should be able to benefit. Steyn cites a survey from Forrester which states that over 80% of organizations expect their AI use cases to increase, which he argues will in no small part include computer vision.[18] There is also potential for development in combining computer vision AI with other forms of AI. Outside of the world of business, Joshi suggests that image recognition software could be used alongside natural language processing software to help visually impaired people by interpreting their surroundings for them.[19]

When asked for his opinion on the future of image recognition AI, Leszek Dudkiewicz, Head of SEO at Passport Photo Online, said: “AI’s ability to recognise objects in a photo opens up the potential for a wide range of different applications. In particular, the medical arena will benefit hugely from AI integration, for example, the creation of medical applications that can recognise skin changes, e.g. to spot melanoma in the early stages.”

On the subject of how AI image recognition would affect Passport Photo Online, he added: “At the moment, we use AI when we’re analysing the uploaded photos, but ultimately we would like to provide a more interactive experience where we can give hints at the stage of taking the photo, i.e. AI would have the ability to analyse, in real time, the way the user is positioning themselves in the photo.”

AI for image recognition: conclusion

Twenty-two years is a relatively short space of time, but we’ve seen huge leaps in image recognition technology during those two decades. With the aid of databases like NEIL and ImageNet, computer scientists have created a base from which every future image recognition AI system can be built and developed. The foremost thinkers in AI have gone from simple AIs that can identify objects, and the relationships between them, to more complex tools that can identify video content that should be blocked.

Who knows where image recognition will go in the future? There are a number of possibilities, but really, the sky’s the limit. At Passport Photo Online, of course, we’re most grateful for our AI photo checkers – that’s what allows us to give you the best chance of getting your applications approved.


[1] H. Bhardwaj et al., ‘Principles and Foundations of Artificial Intelligence and Internet of Things Technology’, in G. Kaur et al. (eds.), ‘Artificial Intelligence to Solve Pervasive Internet of Things Issues’ (2021), pp. 377-392.

[2] ‘Image recognition: from the early days of technology to endless business applications today.’, Trendskout, https://trendskout.com/en/solutions-en/image-recognition-technology/ (Accessed: 26 April 2022).

[3] M. Rangaiah, ‘History of Artificial Intelligence’, Analytics Steps (2021), https://www.analyticssteps.com/blogs/history-artificial-intelligence-ai (Accessed: 21 April 2022).

[4] D. Ardila et al., ‘End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography’, Nature Medicine (2019), 25, pp. 954–961.

[5] K. Kaur, ‘The Politics of YouTube’s AI’, Towards Data Science (2019), https://towardsdatascience.com/the-politics-of-youtubes-ai-289148c14e38 (Accessed: 28 April 2022). 

[6] R. Bharadwaj, ‘AI for Social Media Censorship – How it Works at Facebook, YouTube, and Twitter’, Emerj (2019), https://emerj.com/ai-sector-overviews/ai-social-media-censorship-works-facebook-youtube-twitter/ (Accessed: 26 April 2022).

[7] K. Hao, ‘Instagram is using AI to stop people from posting abusive comments’, MIT Technology Review (2019), https://www.technologyreview.com/2019/07/09/65590/instagram-is-using-ai-to-stop-people-posting-abusive-comments/ (Accessed: 26 April 2022).

[8] E. Svoboda, ‘Artificial intelligence is improving the detection of lung cancer’, Nature.com (2020), https://www.nature.com/articles/d41586-020-03157-9 (Accessed: 25 April 2022).

[9] ‘Computer Vision: What it is and why it matters’, SAS (2022), https://www.sas.com/en_in/insights/analytics/computer-vision.html (Accessed: 25 April 2022).

[10] Huffington Post UK Writers, ‘NEIL Never Ending Image Learner Computer Is Learning Common Sense’, Huffington Post (2013), https://www.huffingtonpost.co.uk/2013/11/26/neil-never-ending-image-learner-computer_n_4342688.html (Accessed: 26 April 2022).

[11] E. Enge, ‘Image Recognition Accuracy Study’, Perficient (2019), https://www.perficient.com/insights/research-hub/image-recognition-accuracy-study (Accessed: 25 April 2022).

[12] ‘AI in Computer Vision Market Revenue, Price, Growth Rate, Forecast To 2030’, Market Watch (2022), https://www.marketwatch.com/press-release/ai-in-computer-vision-market-revenue-price-growth-rate-forecast-to-2030-2022-04-13?mod=search_headline (Accessed: 26 April 2022).

[13] Verified Market Research, ‘AI in Computer Vision Market size worth $ 144.46 Billion, Globally, by 2028 at 45.64% CAGR: Verified Market Research®’, Verified Market Research (2021), https://www.globenewswire.com/news-release/2021/08/19/2283644/0/en/AI-in-Computer-Vision-Market-size-worth-144-46-Billion-Globally-by-2028-at-45-64-CAGR-Verified-Market-Research.html (Accessed: 27 April 2022).

[14] A. Savekar and V. Kumar, ‘AI in Computer Vision Market By Component (Hardware and Software), Function (Training and Interference), and Application (Industrial and Non-industrial), and End Use (Automotive, Consumer Electronics, Healthcare, Agriculture, Transportation & Logistics, Retail, Security & Surveillance, Manufacturing, and Others): Global Opportunity Analysis and Industry Forecast, 2021–2030’, Allied Market Research (2021), https://www.alliedmarketresearch.com/ai-in-computer-vision-market-A13113 (Accessed: 27 April 2022).

[15] N. Joshi, ‘The Present and Future of Computer Vision’, Forbes (2019), https://www.forbes.com/sites/cognitiveworld/2019/06/26/the-present-and-future-of-computer-vision/?sh=5813a00e517d (Accessed: 26 April 2022).

[16] C. Facciolo, ‘The future of image recognition technology is deep learning’, Technical.ly (2019), https://technical.ly/software-development/image-recognition-technology-artificial-intelligence/ (Accessed: 27 April 2022).

[17] S. B, ‘Future Impacts of AI on Image Recognition’, Tech Affinity (2021), https://techaffinity.com/blog/impact-of-ai-on-image-recognition/ (Accessed: 27 April 2022).

[18] N. Steyn, ‘The Future Is Computer Vision – Real-Time Situational Awareness, Better Quality and Faster Insights’, CIO (2022), https://www.cio.com/article/305671/the-future-is-computer-vision-real-time-situational-awareness-better-quality-and-faster-insights.html (Accessed: 26 April 2022).

[19] N. Joshi, ‘The Present and Future of Computer Vision’, Forbes (2019), https://www.forbes.com/sites/cognitiveworld/2019/06/26/the-present-and-future-of-computer-vision/?sh=5813a00e517d (Accessed: 26 April 2022).