Source – ibc.org
AI is a technology on the cusp. The underlying techniques are not new in themselves, but a recent convergence of increased computational power, a mushrooming of large datasets and a refined understanding of those techniques has made AI an important differentiator in the industry.
Following a breakthrough year in 2015 when its use, particularly by Google for image processing and translation, jumped markedly, 2017 has seen it rapidly spread into many different niches.
Of all the zeitgeist-friendly products on the show floor at IBC2017, TVU Networks would have won a buzzword bingo competition with its Smart Captions subtitling software.
Since renamed TVU Transcriber, it marries AI routines with voice recognition — another fecund technology segment in Amsterdam — to provide an audio-to-text transcription service.
R&D dollars
As the existence of Alexa, Google Assistant and Siri attests, a lot of R&D dollars have been thrown at this field over recent years, so it is little surprise to see it being rolled out successfully in the broadcast landscape.
“The system uses an AI-based voice recognition system that is context-based,” explains TVU Chief Executive Paul Shen.
“Transcriber is also capable of recognising multiple languages. We are offering it as an optional service on tens of thousands of deployed TVU Transceivers in the market.”
It is being pitched at several different industry sectors, both live broadcast and streaming, and it is also an increasingly popular choice for generating metadata. Indeed, Shen says that using AI routines has resulted in a more accurate system than one using skilled human operators. “We have seen fewer errors with this system than we have seen in lots of live TV programmes,” he says.
“AI is here to stay and will play an increasingly important role in the TV production workflow” – Paul Shen
The compute resources are in the cloud, which introduces at least the potential for lag to affect the workflow. To date, though, this does not seem to have been a problem.
“The recognition is very fast,” says Shen. “In fact, the output of closed-caption and video is perfectly synchronised.”
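TVU has not published the internals of its engine, but the general shape of such a pipeline — a speech recognition model producing time-stamped segments that can be emitted as captions — can be sketched with an open-source model. The snippet below uses OpenAI’s Whisper purely as an illustrative stand-in; it is an assumption for the sake of example, not TVU Transcriber’s actual system.

```python
# Illustrative sketch only: audio-to-timed-captions with an open-source ASR
# model (openai-whisper). This is an assumed stand-in, not TVU's engine.
import whisper

def transcribe_to_captions(audio_path: str):
    model = whisper.load_model("base")        # small general-purpose model
    result = model.transcribe(audio_path)     # full text plus timed segments
    return [
        {
            "start": seg["start"],            # seconds from the top of the audio
            "end": seg["end"],
            "text": seg["text"].strip(),
        }
        for seg in result["segments"]
    ]

if __name__ == "__main__":
    for cap in transcribe_to_captions("news_bulletin.wav"):
        print(f"{cap['start']:7.2f}-{cap['end']:7.2f}  {cap['text']}")
```

Timed segments of this kind are what make it possible to keep captions in step with the pictures — the synchronisation Shen refers to.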
Shen says that the company has a rich pipeline of forthcoming products and services powered by AI engines.
“We feel that AI is here to stay and will play an increasingly important role in the TV production workflow,” he says.
Amagi, a cloud specialist that is increasingly an AI specialist too, was meanwhile demonstrating machine learning in two different contexts at IBC.
“The first use case we have for machine learning is logical segmentation, which is done by training the ML systems with existing video segments to determine the best spots to take an ad break,” explains company co-founder K.A. Srinivasan.
“This training allows the system to determine logical break points based on multiple factors, such as heightened background audio levels representing a crucial moment in the programme. The system populates these ad-break point suggestions, which are then sent to a human expert for validation.
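Amagi has not detailed the model behind these suggestions, but the audio-level cue Srinivasan describes can be illustrated with a toy heuristic: score each second of programme audio by its short-term energy and surface the loudest moments as candidate break points for an operator to confirm. The sketch below is an assumption for illustration only, not Amagi’s trained system.

```python
# Toy heuristic for the audio-level cue described above: rank one-second
# windows of programme audio by energy and propose the loudest as candidate
# ad-break points for human validation. Not Amagi's actual ML system.
import numpy as np

def suggest_break_points(samples: np.ndarray, sample_rate: int, top_n: int = 5):
    window = sample_rate                        # one-second analysis windows
    n_windows = len(samples) // window
    energy = np.array([
        np.mean(samples[i * window:(i + 1) * window].astype(float) ** 2)
        for i in range(n_windows)
    ])
    loudest = np.argsort(energy)[::-1][:top_n]  # e.g. crowd noise after a goal
    return sorted(int(sec) for sec in loudest)  # candidate timestamps in seconds
```

Each suggested timestamp would then go to an operator for sign-off, mirroring the human validation step Srinivasan describes.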
Spotting ads
“The second use case is detection of advertisements themselves in a linear feed. This can be achieved by continuously scanning multiple linear feeds for detection of advertisements, and then storing their signatures for other systems.
“This may allow TV networks to seamlessly detect ads and replace them programmatically when they convert their linear feeds to OTT. Other uses could be making an ad-free clean feed for pay-TV, to adhere to regulations, or to regionalise ads without having to manually detect them.”
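Amagi’s fingerprinting approach is likewise not public, but the idea of storing signatures and matching against them can be sketched simply: reduce each frame to a small perceptual hash and flag near-matches with hashes already recorded for known ads. The helper names below are hypothetical and the hash is a deliberately basic one.

```python
# Toy sketch of signature-based ad detection: reduce each frame to a 64-bit
# difference hash and compare it with hashes stored for known ad frames.
# The real fingerprinting approach is not public; this only shows the idea.
import numpy as np

def frame_signature(gray_frame: np.ndarray, size: int = 8) -> int:
    """Downsample a greyscale frame and encode left-to-right brightness gradients."""
    h, w = gray_frame.shape
    ys = np.linspace(0, h - 1, size).astype(int)
    xs = np.linspace(0, w - 1, size + 1).astype(int)
    small = gray_frame[np.ix_(ys, xs)].astype(float)
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def matches_known_ad(gray_frame: np.ndarray, ad_signatures: set, max_distance: int = 5) -> bool:
    sig = frame_signature(gray_frame)
    # A small Hamming distance between hashes means a near-identical frame.
    return any(bin(sig ^ known).count("1") <= max_distance for known in ad_signatures)
```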
Multiple trials have been undertaken and the company is hoping to offer the service under a new Amagi Tornado banner, with the first deployments rolling out in 2018. Indeed, the company sees it as a larger strategic direction for the business, and one that is likely to become a service layer that cuts across all its products and offerings in future.
Srinivasan says that not only can other computational methods be too precise for the unpredictability of video processing, but that, unlike machine learning, traditional computing systems need to be programmed over and over again to make adjustments for changing trends.
“AI or machine learning has to be seen in the context of other developments in the industry,” he says. “We now have more content, and more ways to consume it.
“In future the two key challenges would be to process large volumes of content, and to deliver it in hundreds of different ways, making it more modular. These are real issues that need to be addressed, and the use of AI could be of significance in this context.”
While it is tempting to think of AI as being wholly cloud-based due to the sheer amount of compute power required for its operation, there are many pressing reasons for manufacturers to be driving it into a self-contained, on-device format.
Google Clips, the palm-sized AI-powered camera that Google introduced almost as an afterthought during the Pixel 2 smartphone launch, exemplifies this and also hints at how AI can become useful at the production end of the industry.
The camera is a set-and-forget device. You turn it on and it waits until it sees a face it recognises — this can be human or animal — and then it takes a picture.
It trains itself to recognise faces it sees often and to ignore ones it doesn’t. Initial prototypes didn’t even have a shutter button for manual override, but this tested poorly with users outside of Google’s increasingly AI-centric environment and so one was added.
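Clips’ on-device model is proprietary, but the basic trigger loop — watch the camera and only save a frame when a face is in view — can be sketched with OpenCV’s stock face detector, used here purely as a stand-in for Google’s recognition model: it detects any face rather than learning familiar ones, which is where the real product’s self-training comes in.

```python
# Minimal sketch of a face-triggered capture loop, with OpenCV's bundled Haar
# cascade standing in for Clips' on-device recognition model. Unlike the real
# product it detects any face rather than learning familiar ones.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
camera = cv2.VideoCapture(0)

saved = 0
while saved < 10:                                  # stop after a handful of shots
    ok, frame = camera.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:                             # a face is in view: take the shot
        cv2.imwrite(f"clip_{saved:03d}.jpg", frame)
        saved += 1

camera.release()
```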
Crucially, in a bid to offset arguments about privacy, it does all this on-device in an encrypted manner with no automatic syncing to the cloud or even a paired phone.
This, of course, will limit its use in broadcast environments almost as much as its ability to only take 15fps bursts of video, but it’s not hard to see the immediate attraction of such devices to reality shows at the very least.
One of the important things to note about AI is that development in the sector is liable to progress faster than the already dizzying ambient speed of the rest of the computer industry.
Google’s Tensor Processing Unit, an application-specific integrated circuit designed especially for machine learning, was said by the company to deliver 15-30x higher performance than the GPUs that had dominated the field before its development.
A second generation is in the wings. Intel, meanwhile, recently unveiled its Nervana Neural Network Processor, which it says is a step along the road to its aim of delivering a 100x reduction in the time taken to train a deep learning model by 2020.
Progress is necessary too. A recent study on the use of voice recognition for medical note-taking concluded that the best systems only managed to achieve 18.3% accuracy, so there is work to be done.
There is a potential wrinkle ahead though, in that the forthcoming EU General Data Protection Regulation (GDPR) contains provisions with specific implications for AI.
Article 22 of the GDPR states that “The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”
This could be especially problematic when it comes to the growing field of using AI to analyse viewer data.
Unpacking the consequences of the GDPR legislation is turning into a mini industry all of its own, but it seems that, as well as broadcasters needing to obtain consent from customers to collect and process their personal data, they will also have to share information with them regarding the logic involved and the significance and envisaged consequences for the individual.
All of which could get very tricky once Deep Learning routines start processing new data in new ways without any human agency involved.
When Google launched Google Clips at its October 4th hardware event, it spent much of the first part of a ritzy presentation banging the drum for AI, even going so far as to state that Moore’s law was dead and that the future resides at the intersection of ‘AI + software + hardware’.
Humans, it seems, still want to be able to write the rules that govern that intersection too.