We decided to experiment with Google’s exciting new Cloud Vision tool by building a simple treasure hunt app. The results were both exciting and hilarious.
Just recently, Google released the Cloud Vision API in beta. This fancy tool enables developers to send image data to Google’s servers and then get an analysis of the image’s content in return.
What exactly makes this cool? With Cloud Vision, a piece of software is able to recognise faces and expressions, people, animals, text, objects, landmarks and such. The tool is accurate to the point where, if you’re the kind of paranoid soul that worries about a science-fiction-esque uprising of the machines, it is almost a bit scary. Here’s a demo of the API used with a Raspberry Pi robot:
Inspired by this new service, our very own Joni Juup and Ossi Väre set out to perform an experiment with the API by building a simple treasure hunt application.
Treasure Hunting with images
Joni Juup: Cloud Vision seemed so awesome that we knew we wanted to try it out straight away by developing a small experimental project with it. The challenge was to design a concept that would be quite simple and fast to implement – yet it had to be something that could not be done without the image analysis technology.
I talked with Ossi and finally we came up with a concept for a simple treasure hunt game. The way it works is as follows: the player is presented with a set of random objects. The goal is to find those objects in one’s surroundings and take a photo of them. Cloud Vision then analyses the image and tells the player whether it found any of the objects in it or not. If the player has not found all of the objects within six photos, the game is over.
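The rules above can be sketched in a few lines of Python. This is only an illustration of the game logic as described, not the code of the actual app (which was written in Swift); the class name TreasureHunt, its methods and the case-insensitive exact matching of labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field

MAX_TRIES = 6  # the six-photo limit described in the rules


@dataclass
class TreasureHunt:
    """Hypothetical model of one game round."""
    targets: set                      # objects the player has to find
    tries_left: int = MAX_TRIES
    found: set = field(default_factory=set)

    def __post_init__(self):
        # Compare names case-insensitively (an assumption, not from the post).
        self.targets = {t.strip().lower() for t in self.targets}

    def submit_photo(self, labels):
        """Use up one try and return the targets newly found in this photo."""
        if self.is_over():
            raise RuntimeError("game already finished")
        self.tries_left -= 1
        normalized = {label.strip().lower() for label in labels}
        newly_found = (self.targets - self.found) & normalized
        self.found |= newly_found
        return newly_found

    def is_won(self):
        return self.found == self.targets

    def is_over(self):
        return self.is_won() or self.tries_left == 0
```

Each call to submit_photo would receive the labels returned by the image analysis for one photo; the round ends when every target has been found or the six tries are used up.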
Ossi Väre: I decided to develop the application for iOS and with Swift, since I’ve been mostly developing for Android and haven’t yet got a chance to try Swift in a real project.
Developing with Cloud Vision API
OV: Since this was my first proper iOS project, I actually had more problems on that front than I did with the Cloud Vision API itself.
Once I got to the point where I could take photos with the iPhone’s camera, it didn’t take long to set up the integration with the Vision API. I set up a Django-based server that relays annotation requests from the phone to the API, since Google has provided a Python library for easy authentication with their cloud services.
After that, the API itself was very easy to use. I just send in a JSON request with the image as a Base64-encoded string and the options for which annotations I want. In our case, we ask for label annotations, i.e. we want the API to attempt to name the objects shown in the picture.
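As a rough sketch of what such a request looks like, the Python below builds a label-annotation request body and pulls the label names out of a response. The field names follow the publicly documented images:annotate JSON format; the beta version used for this project may have differed in details. The helper names build_label_request and extract_labels are hypothetical, and authentication and the actual HTTP call are left out.

```python
import base64


def build_label_request(image_bytes, max_results=10):
    """Build the JSON body for a label-annotation request (hypothetical helper)."""
    # The image travels inside the JSON as a Base64-encoded string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                # LABEL_DETECTION asks the API to name what it sees in the picture.
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def extract_labels(response_body):
    """Return (description, score) pairs from a decoded JSON response."""
    labels = []
    for response in response_body.get("responses", []):
        for annotation in response.get("labelAnnotations", []):
            labels.append((annotation["description"], annotation.get("score", 0.0)))
    return labels
```

A relay server like the one described would serialise the dictionary from build_label_request with json.dumps, post it to the API with its credentials, and pass the decoded reply to extract_labels before answering the phone.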
The API’s biggest weakness is, understandably, its occasional lack of accuracy. It works well enough that you can easily finish a treasure hunt in our application, but it does have problems recognizing certain objects. For example, we have yet to manage to have it successfully recognize a book, although it did call one a “display device”, which I guess is in a way technically correct. Taking photos around the office, many times I got back results saying we were either in a luxury yacht or in a Boeing 767 airliner. People working at their computers often got recognized as “control towers”, or – in one case – a “natural phenomenon”.
Potential usage of image analysis
JJ: Although this application was just a quick proof of concept we came up with in a couple of days, it’s really fun and kind of magical to use (at least when the image analysis is spot on). With a little more development and a point system, we could have in our hands a fun tool for children and adults studying English vocabulary, for example.
I’m really excited to see what kind of opportunities services like this will open up in the near future, for both leisure and business applications.
Ladies and gentlemen, without further ado, here is our Treasure Hunt app in action: