Follow along: building an app that recognises beers 🍺
Follow me along the journey of building a mobile app that recognises different bottles of beers. This blogpost will be updated every time there is a breakthrough.
Introduction
So a year ago I created "beerificial intelligence". At the time I was really intrigued by the concept of neural networks and wanted to see what I could make with them. The TensorFlow for Poets tutorial explained how you could train a classifier to classify images of flowers. I went through the tutorial and after a few days I had created the classifier, but with beers instead of flowers. I put it on my website (it's not available anymore) and people could upload pictures of bottles of beer, and the algorithm would guess which beer it was (it was trained on 8 types of beers). Pretty cool stuff!
This is how it worked back then:
The idea for this side-project is to create a mobile app that can classify beers as well. However, there are a few requirements:
- The app has to be native, so no Cordova, React Native, etc. This will make the development much harder and longer as I've got almost no experience with Swift/Xcode and Kotlin/Android Studio. From a "business perspective" not the best choice, but it's not business, so... 😀
- The machine learning has to be performed on the device. No APIs, no whatever. The machine learning model will be placed on the device and every classification will be made locally on the device itself. This way I don't need a server, the app works offline, and there are no problems with privacy, uploading images to a server and stuff.
- The neural network has to know the difference between a Carolus Classic and Carolus Tripel 🍺
This project has three big components: The neural network, the iOS app and the Android app.
The neural network
What needs to be done:
- gather training images of the beers
- pick a neural network architecture
- train the neural network
- make sure it performs well (if it doesn't, go back to point 1)
I've gathered over 70000 images of 55 brands of beer, which is around 1300 images per brand. I've got no idea if this is nearly enough, but in the worst case I'll just start with only 10 brands, and obtain some more images for each of the 10 brands.
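The fallback of starting with only 10 brands could be sketched like this. It is a minimal illustration, not the actual pipeline: the function name `pick_brands`, the `min_images` threshold and the brand names and counts are all made up for the example.

```python
def pick_brands(image_counts, max_brands=10, min_images=1000):
    """Return up to max_brands brands, those with the most images first.

    Brands with fewer than min_images images are skipped, because they
    are the ones most likely to need extra pictures before training.
    """
    eligible = {brand: n for brand, n in image_counts.items() if n >= min_images}
    ranked = sorted(eligible, key=eligible.get, reverse=True)
    return ranked[:max_brands]


# Rough average from the numbers in the post: 70000 images over 55 brands.
average_per_brand = 70000 / 55  # about 1273 images per brand

# Hypothetical counts, for illustration only.
counts = {"brand_a": 1500, "brand_b": 900, "brand_c": 1300}
selected = pick_brands(counts, max_brands=2)
print(selected)
```

In a real run the counts would come from counting the files in each brand's image folder.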
One of the most popular image classification architectures is Inception v4, so I'll start with this one and see if the performance on mobile is okay. It's quite a large network with a lot of parameters, so it might be too slow on the Android phone of the average Joe.
The model (with Inception v4) is trained on a DigitalOcean machine learning droplet. It's CPU only, which is okay for Inception v4, as we'll only retrain the last layer. If the performance is not okay, I'll probably go with AWS and use a GPU EC2 instance.
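To show why retraining only the last layer is cheap enough for a CPU, here is a toy sketch in plain Python. It assumes the pretrained network has already turned every image into a fixed feature vector, and it trains just a softmax layer on top of those vectors. The feature values, the learning rate and the function names are invented for the example. The real retraining is done with TensorFlow, not with this code.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, biases, features):
    """Apply the single trainable layer to one feature vector."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train_last_layer(features, labels, num_classes, epochs=200, lr=0.5, seed=0):
    """Fit the final layer with plain stochastic gradient descent."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = predict(weights, biases, x)
            for c in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights, biases


# Toy stand-ins for the feature vectors of two beers (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
weights, biases = train_last_layer(features, labels, num_classes=2)
probs = predict(weights, biases, [0.95, 0.05])
predicted = probs.index(max(probs))
print(predicted)
```

Only `weights` and `biases` are updated here. The expensive part of the network stays frozen, which is what keeps the training time manageable without a GPU.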
The iOS app
What needs to be done:
- create an app in Xcode
- when the app opens, it should show the camera as the first screen
- user can take a picture
- when a picture is taken, we use the classifier to see which beer it is
- display the brand of the beer
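The last two steps boil down to turning the classifier's output into one brand name. Below is a small language-agnostic sketch of that logic, written in Python for readability (in the app itself this would be Swift). The `min_confidence` threshold is my own assumption and the probabilities are made up.

```python
def top_prediction(probabilities, labels, min_confidence=0.6):
    """Return the most likely label, or None if the model is not sure enough."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] < min_confidence:
        return None
    return labels[best]


labels = ["Carolus Classic", "Carolus Tripel"]

# Hypothetical classifier outputs for two pictures.
confident = top_prediction([0.3, 0.7], labels)
unsure = top_prediction([0.55, 0.45], labels)
print(confident, unsure)
```

Returning nothing for a low-confidence picture lets the app ask for another photo instead of showing a wrong brand.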
Luckily, I have a MacBook Pro, so I can develop for iOS! Xcode is installed, and the app is made.
To make it somewhat custom, I decided to go with SwiftyCam to create a custom camera.
The Android app
I'll start this after the iOS app is ready.
Update April 28, 2018
After brainstorming about it for a while, and with the release of TensorFlow.js, I decided to take a different route.
I bought a desktop with a GTX 1080 Ti GPU, so I can play around more with machine learning in my spare time. The only problem that's left now is finding good data.
As I've been thinking a lot about beerificial, I don't think there's enough value in just an app that allows you to classify around 15 types of beers. It's maybe fun for a minute, but that's it.
In order to make it more sticky, I decided to make it a game, and it'll work like this: When you open the web app (more about this later), you'll see 9 blurry beer bottles. Which beers are they? That's for you to find out by using the camera to scan beer bottles in your vicinity. To make it less random, there are hints you can get.
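The core of the game can be sketched as a bit of state: a board of hidden beers that get revealed when a scan matches. This is only a rough model of the idea in Python. The class and method names are made up, the hints are left out, and apart from the two Carolus beers the brand names are placeholders.

```python
class BeerGame:
    """Keeps track of which hidden beers the player has already found."""

    def __init__(self, hidden_beers):
        self.hidden = list(hidden_beers)
        self.revealed = set()

    def scan(self, recognised_brand):
        """Register a scanned bottle; return True if it reveals a new beer."""
        if recognised_brand in self.hidden and recognised_brand not in self.revealed:
            self.revealed.add(recognised_brand)
            return True
        return False

    def board(self):
        """Show found beers by name and the still-blurry ones as '?'."""
        return [b if b in self.revealed else "?" for b in self.hidden]

    def is_finished(self):
        return len(self.revealed) == len(set(self.hidden))


# Nine beers on the board; the "Beer N" names are placeholders.
beers = ["Carolus Classic", "Carolus Tripel"] + [f"Beer {i}" for i in range(3, 10)]
game = BeerGame(beers)
first_scan = game.scan("Carolus Tripel")   # a new beer is revealed
repeat_scan = game.scan("Carolus Tripel")  # already found, nothing changes
print(game.board())
```

The brand passed to `scan` would come from the classifier running on the camera image.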
So, let's talk about this web app thing. Google recently launched TensorFlow.js, which allows developers to train models and use them with JavaScript. And where's JavaScript primarily used? In the browser! Do you see where it's going?
So instead of spending a lot of time coding the two apps, I can just make it as a web app in JavaScript. This means people won't have to download and install it, which makes the whole process smoother.