Deep learning has become the technology du jour, and few companies have advanced the field across as many areas, or integrated the technology as completely into their operations, as Google and its Alphabet affiliates. In keeping with Google’s push to externalize its innovations, the company’s Next ’17 cloud conference featured a number of AI-related announcements and a general theme of democratizing access to the world’s most powerful deep learning systems.
In recent years Google and its sister companies have become synonymous with advancing the AI revolution at a frenzied pace and infusing deep learning across the company’s services. Perhaps most famously, last year DeepMind’s AlphaGo became the first machine to beat a top Go player, while Waymo’s driverless cars have become symbols of the autonomous driving revolution. But it has been the quiet AI revolution, shaping everything from Google Translate to Google Search, that has had the greatest impact on Google itself, bringing the power of automated reasoning to bear on almost everything the company does. As it has built up the massive infrastructure to train and run these AI systems, Google has begun bringing these same tools to the masses.
Some companies have built their own AI research units and need to build highly customized models for specific applications. Yet in doing so they quickly run up against the immense hardware requirements of building large deep learning models, which often demand entire accelerator farms for rapid iteration. For these companies, Google offers a hosted deep learning platform called Cloud Machine Learning Engine that takes care of the hardware needs of deep learning development, allowing them to focus on building their models and offload the computing requirements to Google. After all, few companies have invested so much in AI that they have built their own custom accelerator hardware, as Google did with its Tensor Processing Units (TPUs).
Of course, while algorithmic and hardware advances play a significant role in the AI revolution, it is difficult to make true progress in the field without data. Current AI systems require vast volumes of data to learn a new concept. Whereas a human can see a single picture of a new object and instantly recognize it from there forward, a similar AI system requires a tremendous corpus of images depicting the object from many angles to properly build a robust internal representation of it. This means that companies like Google have a significant advantage in being able to muster hundreds of millions of images to build a visual representation of the planet for applications like geolocation.
In short, the deep learning revolution is powered by data and few companies have as much data as Google. This means that when it comes to deep learning systems, it is easy to find tools but hard to find pretrained models that you can actually use. I experienced this personally in my search for a system robust enough to catalog global news imagery. After trying countless systems over the last several years, I found many that offered some really incredible technology, but none that offered rich prebuilt cataloging with tens of thousands of labels and that worked well on imagery sourced from the non-Western world, until I came across Google’s Cloud Vision system.
In fact, this is a common need of many companies: they are interested in building services for their customers, not conducting AI research. Following its externalization trend, Google has risen to this challenge by releasing many of its internal AI systems as public cloud APIs. Cloud Vision accepts any arbitrary image and catalogs objects and activities, OCRs text, recognizes the location depicted, estimates the emotion of human faces and even flags whether the image depicts violence, all with a single API call, with results returned in just a few seconds and with the ability to scale to very large volumes. Cloud Speech performs live speech-to-text in over 80 languages and, unlike legacy speech transcription systems, requires no training and is incredibly robust to noise. Cloud Natural Language accepts arbitrary text in English, Spanish and Japanese and outputs a robust dependency parse tree, recognizes key entities and even performs sentiment analysis. At Next ’17, Google expanded this lineup with its latest tool, Cloud Video Intelligence, which takes a video, segments it into scenes and identifies the major topics and activities in each scene, allowing one to take a massive video archive and instantly index it to make it topically searchable.
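To make the "single API call" concrete, here is a minimal Python sketch of the request body such a call might carry. It assumes the v1 REST interface of Cloud Vision (the `images:annotate` endpoint and the feature type names below), and it only builds the JSON payload: authentication and the actual HTTP request are left out. The function name `build_vision_request` and the `max_results` default are illustrative choices, not anything taken from Google’s announcements.

```python
import base64
import json

# Assumed public REST endpoint for Cloud Vision image annotation (v1).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

# Feature types corresponding to the capabilities described above.
FEATURES = [
    "LABEL_DETECTION",        # objects and activities
    "TEXT_DETECTION",         # OCR
    "LANDMARK_DETECTION",     # recognizable locations
    "FACE_DETECTION",         # faces, with emotion likelihoods
    "SAFE_SEARCH_DETECTION",  # violence and other sensitive content
]


def build_vision_request(image_bytes, max_results=10):
    """Return a JSON-serializable body asking for every feature at once.

    Hypothetical helper for illustration: it does not send anything.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": name, "maxResults": max_results}
                    for name in FEATURES
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file's contents.
    body = build_vision_request(b"placeholder image bytes")
    print("POST", VISION_ENDPOINT)
    print(json.dumps(body, indent=2))
```

Requesting all five features in one body is what lets a single round trip return labels, text, landmarks, faces and safety flags together.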
What makes these services so powerful is that each is exposed as a simple API and really does “just work” right out of the box. You simply make an API call with your data and after a few seconds get back the results of pretrained algorithms built by some of the top AI researchers in the world. The massive complexity of deep learning is all hidden behind a simple API call, and you can even string API calls together to build remarkably complex workflows with just a few lines of code.
Teowaki’s Javier Ramirez offers a glimpse at just how easy it is to rapidly build an entire workflow (and an immensely powerful one at that) from these APIs with just a few minutes of time and a few lines of code. In his tutorial, he takes a YouTube video of British Prime Minister Theresa May’s inaugural speech and feeds it through the Cloud Speech API to generate a high-quality textual transcript. He then feeds that transcript through Cloud Natural Language to extract the primary mentioned entities (along with links to their Wikipedia pages for more information) and calculate a general sentiment of the speech. It took just a few lines of code making a few API calls to take a YouTube video, transcribe it, extract key entities and sentiment-code it. Even more amazingly, the entire workflow could be scaled up to run across millions of videos without a single change. This is the power of the cloud.
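The shape of that workflow can be sketched in a few lines of Python. This is not Ramirez’s code: the three steps are passed in as plain functions, and the `fake_*` stand-ins below are hypothetical placeholders for real calls to Cloud Speech and Cloud Natural Language, so the sketch runs without credentials. It is only meant to show how the output of one call feeds the next and how the same function applies unchanged to a whole archive.

```python
from typing import Callable, Dict, Iterable, List


def analyze_speech(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    extract_entities: Callable[[str], List[str]],
    score_sentiment: Callable[[str], float],
) -> Dict[str, object]:
    """Chain the steps: audio -> transcript -> entities and sentiment."""
    transcript = transcribe(audio)
    return {
        "transcript": transcript,
        "entities": extract_entities(transcript),
        "sentiment": score_sentiment(transcript),
    }


def analyze_archive(clips: Iterable[bytes], **steps) -> List[Dict[str, object]]:
    """Apply the identical workflow to every clip in a collection."""
    return [analyze_speech(clip, **steps) for clip in clips]


# Hypothetical stand-ins for the hosted services, for illustration only.
def fake_transcribe(audio: bytes) -> str:
    # Pretends the audio bytes are already the spoken text.
    return audio.decode("utf-8")


def fake_entities(text: str) -> List[str]:
    # Crude placeholder: treats capitalized words as entities.
    return [w.strip(".,") for w in text.split() if w[:1].isupper()]


def fake_sentiment(text: str) -> float:
    # Placeholder: always reports a neutral score.
    return 0.0


if __name__ == "__main__":
    steps = dict(
        transcribe=fake_transcribe,
        extract_entities=fake_entities,
        score_sentiment=fake_sentiment,
    )
    for result in analyze_archive([b"Theresa May spoke in London."], **steps):
        print(result)
```

Swapping the stand-ins for functions that call the real services would leave `analyze_speech` and `analyze_archive` untouched, which is the sense in which the workflow scales without a code change.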
Putting this all together, just as Google is externalizing its services and security models, it has also been opening the doors to its incredible AI advances, offering both a hosted AI environment for companies looking to build their own models and an ever-growing array of pretrained models that “just work” out of the box and allow companies to build complex applications with just a few API calls. It was clear from the number of sessions at Next that involved AI and its heavy presence in the keynotes that Google is betting big on rolling AI into the enterprise. In the end, Google has effectively democratized access to some of the world’s most advanced AI algorithms by making them so easy to use (literally just an API call away) that even the smallest businesses can now leverage the full power of deep learning to revolutionize how they do business.