After an artificial neural network has been trained on enormous amounts of data, but before it can be installed in an autonomous car to navigate a stretch of highway it may have never taken before, companies need to translate it: the network must be converted to a new file format so that the car’s computer can make use of what it has been taught.
Software engineers train these algorithms using tools like Amazon’s MXNet and Google’s TensorFlow. Every software framework exports data in a different format, forcing companies like Qualcomm and Intel to build software that imports the trained algorithm into their processors. That is a huge task, since every processor needs a separate importer for every software tool it supports.
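The importer problem can be sketched with a toy example: each framework saves weights under its own naming scheme, so a chip vendor must write one importer per framework to recover the same neutral representation. The export layouts and names below are hypothetical, purely for illustration; they are not MXNet’s or TensorFlow’s actual file formats.

```python
# Toy sketch of the N-frameworks-times-M-chips importer problem.
# The export layouts below are hypothetical, not real framework formats.
mxnet_export = {"arg:fc1_weight": [[0.1, 0.2], [0.3, 0.4]]}
tensorflow_export = {"fc1/kernel:0": [[0.1, 0.2], [0.3, 0.4]]}

def import_mxnet(blob):
    # Strip the "arg:" prefix into a neutral name -> weights map.
    return {k.split(":", 1)[1]: v for k, v in blob.items()}

def import_tensorflow(blob):
    # Drop the ":0" tensor suffix and rename "/kernel" scoping.
    return {k.split(":")[0].replace("/kernel", "_weight"): v
            for k, v in blob.items()}

# Both importers must converge on the same neutral representation --
# the duplicated work a common exchange format is meant to eliminate.
assert import_mxnet(mxnet_export) == import_tensorflow(tensorflow_export)
```

Every new framework multiplies this effort across every chip, which is exactly the combinatorial cost a single agreed-upon file format avoids.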
But that could change with a new standard that represents the data used by neural networks in a common format. The Khronos Group recently released such a standard in an effort to cut through the growing thicket of software tools and computer chips for deep learning. Once it is officially finished, the standard could give engineers more freedom to choose suppliers.
“Any successful standard enables two different communities to communicate effectively,” said Neil Trevett, president of the Khronos Group, the industry group that released a provisional version of the Neural Network Exchange Format (NNEF) last month. “This standard allows neural network authors to talk to inferencing hardware through a single, agreed upon file format.”
The standard was designed to resolve the fragmentation in the market for deep learning. Over the last two years, internet companies have introduced a wide range of new software frameworks, making it necessary to build many proprietary exporters and importers for the growing number of custom chips built for inferencing tasks.
Nvidia leads the market for chips used in the training phase of deep learning. The firm’s graphics chips contain thousands of cores that divide and conquer these workloads faster than traditional processors can. The company has reinforced its market position by continuing to support all the major software frameworks for deep learning. That way, engineers can keep using Nvidia’s silicon even as individual frameworks fade out of use.
There is another battle brewing over custom chips for inferencing, which include FPGAs installed in data centers to improve internet searches, GPUs embedded in vehicles to avoid obstacles, SoCs squeezed inside smartphones to enable voice recognition, and extremely efficient architectures from start-ups like Mythic and Cambricon.
Khronos began talking about the possibility of a standard to reduce the threat of fragmentation about three months before it was officially announced in October 2016. The concept came from Khronos member AImotive, an automotive start-up trying to sell an entire software stack for autonomous driving as well as the custom chips to run it.
“As the number of development frameworks expands, and the range of execution platforms grows and diversifies, the ability to freely move network topologies and weights from one environment to another is essential for innovation and freedom of supplier choice,” Marton Feher, AImotive’s head of hardware engineering, said in a statement.
Trevett said that most companies can only afford to import the most popular software frameworks. But semiconductor start-ups with custom architectures and tight budgets could be forced to sell chips with support for only a single format, running the risk of losing customers that shift to other frameworks with different capabilities.
Khronos faces competition from another format called the Open Neural Network Exchange (ONNX), which makes it easier for software engineers to transfer algorithms between PyTorch and Caffe. In September, Facebook and Microsoft partnered with Amazon to release the open source standard, which is also supported by companies like Qualcomm and Intel, among others.
Facebook’s format differs from the Khronos Group’s standard, which is exclusively focused on porting algorithms to computer chips, not the interchange between training frameworks. Both formats work roughly the same way, converting the interconnected nodes and weights stored inside a neural network into a computational graph.
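The graph conversion both formats perform can be illustrated with a minimal sketch: a tiny network’s nodes and weights become an explicit list of operations that an inference engine can walk. The structure below is illustrative only, not the actual NNEF or ONNX schema.

```python
# Minimal sketch of a neural network serialized as a computational graph.
# The op set and layout are illustrative, not the real NNEF/ONNX schema.
graph = {
    "inputs": ["x"],
    "weights": {"w": 2.0, "b": 1.0},
    "ops": [
        {"op": "mul", "in": ["x", "w"], "out": "t"},   # t = x * w
        {"op": "add", "in": ["t", "b"], "out": "y"},   # y = t + b
        {"op": "relu", "in": ["y"], "out": "out"},     # out = max(y, 0)
    ],
    "outputs": ["out"],
}

def run(graph, feed):
    # Walk the graph in order, resolving each tensor by name.
    env = dict(graph["weights"], **feed)
    for node in graph["ops"]:
        args = [env[name] for name in node["in"]]
        if node["op"] == "mul":
            env[node["out"]] = args[0] * args[1]
        elif node["op"] == "add":
            env[node["out"]] = args[0] + args[1]
        elif node["op"] == "relu":
            env[node["out"]] = max(args[0], 0.0)
    return [env[name] for name in graph["outputs"]]
```

Because the graph is just named nodes and weights, any chip that can parse the format can execute the network, regardless of which framework trained it.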
Unlike ONNX, the NNEF format supports compound operations, which can be used to improve the efficiency of chips running inference tasks. Khronos built its standard more for hardware companies, which is why it’s strange that Nvidia extended its proprietary inference engine to support ONNX – but not NNEF.
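A compound (fused) operation merges several primitive operations into one, so an inference chip can compute the result in a single pass instead of materializing intermediate tensors. The sketch below illustrates the idea with a made-up scale-shift-ReLU fusion; it is not NNEF’s actual operator definitions.

```python
# Separate primitives: each step materializes an intermediate list.
def scale(xs, s):
    return [x * s for x in xs]

def shift(xs, b):
    return [x + b for x in xs]

def relu(xs):
    return [max(x, 0.0) for x in xs]

def unfused(xs):
    # Three passes over the data, two temporary buffers.
    return relu(shift(scale(xs, 2.0), 1.0))

def fused_scale_shift_relu(xs, s=2.0, b=1.0):
    # One pass, no intermediates -- the kind of compound operation a
    # hardware-oriented format can express directly.
    return [max(x * s + b, 0.0) for x in xs]

assert unfused([1.0, -1.0]) == fused_scale_shift_relu([1.0, -1.0])
```

On real silicon the savings come from avoiding round trips to memory between primitives, which is why a format that can describe the fused form matters to chip makers.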
Trevett, who also works as Nvidia’s mobile ecosystem vice president, said in an interview that the graphics chipmaker has been trying to create something like NNEF to offer its automotive customers. But he believes that Nvidia will support NNEF in the future: “We haven’t made a decision either way but personally I think it will happen.”
Khronos plans to continue working with its members to improve the standard, which could be finished in three to six months. Trevett said that Khronos wants to work with other firms to extend the standard to other software frameworks like Chainer and Torch. “The value is not the file format, the value is in agreeing to all use the same one,” he said.
Until other companies support it, the standard works with OpenVX, an inference engine that Khronos originally built for accelerating computer vision algorithms. The industry group recently released a syntax parser and validator as well as example exporters for TensorFlow and Caffe, two of the most popular software frameworks for deep learning.