Machine learning, also known as cognitive computing (CC), describes technology platforms that, broadly speaking, are based on the scientific disciplines of artificial intelligence and signal processing. These platforms encompass machine learning, reasoning, natural language processing, speech recognition, vision (object recognition), human–computer interaction, and dialog and narrative generation, among other technologies.
There are many deep learning libraries, each with its own strengths and weaknesses.
- Torch: A Lua-based API, used by Facebook, Twitter, NYU and, until recently, DeepMind
- Theano: The main Python library before TensorFlow. Still superior in some ways
- TensorFlow: An open-source software library for machine learning from Google
- Caffe: A popular image-processing library with a Python API
- Keras: An intuitive Python wrapper for Theano and TensorFlow
- Deeplearning4j: The main deep learning library for Java and Scala
- Weka: A collection of machine learning algorithms for data mining tasks
1 Lua Frameworks
Lua is a powerful, efficient, lightweight, embeddable scripting language.
It supports procedural programming, object-oriented programming,
functional programming, data-driven programming, and data description.
1.1 Torch and PyTorch
Torch is a computational framework with an API written in Lua
that supports machine-learning algorithms.
Some version of it is used by large tech companies such as Facebook and Twitter,
which devote in-house teams to customizing their deep learning platforms.
Lua is a multi-paradigm scripting language that was developed in Brazil in the early 1990s. Torch7, while powerful, was not designed to be widely accessible to the Python-based academic community, nor to corporate software engineers, whose lingua franca is Java. Deeplearning4j was written in Java
to reflect our focus on industry and ease of use. We believe usability is the limiting parameter
that inhibits more widespread deep-learning implementations. We believe scalability ought to be
automated with open-source distributed run-times like Hadoop and Spark. And we believe that a
commercially supported open-source framework is the appropriate solution to ensure working tools and build a community.
A Python API for Torch, known as PyTorch, was open-sourced by Facebook in January 2017.
PyTorch offers dynamic computation graphs, which let you process variable-length inputs and outputs; this is useful when working with RNNs, for example.
Other frameworks that support dynamic computation graphs are CMU’s DyNet and PFN’s Chainer.
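To make the idea concrete (a pure-Python sketch, not the actual PyTorch, DyNet or Chainer API): a dynamic graph simply means the chain of operations is built by ordinary control flow at run time, so a sequence of length three unrolls into a different graph than a sequence of length one.

```python
# Toy illustration of a "dynamic computation graph": the operations executed
# are just ordinary Python control flow, so the graph's shape depends on
# each individual input. (Conceptual sketch only -- no framework API.)

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One recurrent step: new hidden state from old state and input."""
    return w_h * h + w_x * x

def run_sequence(xs, h0=0.0):
    """Unrolls one step per element -- a different graph per sequence."""
    h = h0
    for x in xs:          # loop length = sequence length
        h = rnn_step(h, x)
    return h

print(run_sequence([1.0]))            # one step
print(run_sequence([1.0, 2.0, 3.0]))  # three steps, no padding needed
```

A static-graph framework would instead require padding all sequences to a fixed length before defining the graph.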
1.1.1 Pros and Cons
- (+) Lots of modular pieces that are easy to combine
- (+) Easy to write your own layer types and run on GPU
- (+) Lua ;) (most of the library code is in Lua, easy to read)
- (+) Lots of pretrained models
- (+) PyTorch
- (-) Lua
- (-) You usually write your own training code (Less plug and play)
- (-) Spotty documentation
2 Python Frameworks
Much of the deep-learning community is focused on Python. Python has great syntactic elements that allow you to add matrices together without creating explicit classes, as Java requires you to do. Likewise, Python has an extensive scientific computing environment with native extensions like Theano and Numpy.
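As a small illustration of that syntactic convenience (using Numpy, which the paragraph mentions, and assuming it is installed), arrays support arithmetic operators directly, with no matrix class or loop boilerplate of your own:

```python
import numpy as np

# Element-wise addition and matrix multiplication via operator
# overloading -- no explicit Matrix class needed, unlike Java.
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

c = a + b   # element-wise sum
d = a @ b   # matrix product

print(c)    # [[11 22] [33 44]]
print(d)    # [[ 70 100] [150 220]]
```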
2.1 Theano and Ecosystem
Many academic researchers in the field of deep learning rely on Theano, the grand-daddy of deep-learning frameworks, which is written in Python. Theano is a library that handles multidimensional arrays, like Numpy. Used with other libs, it is well suited to data exploration and intended for research.
Numerous open-source deep-learning libraries have been built on top of Theano, including Keras, Lasagne and Blocks. These libs attempt to layer an easier-to-use API on top of Theano’s occasionally non-intuitive interface. (As of March 2016, another Theano-related library, Pylearn2, appears to be dead.)
2.1.1 Pros and Cons
- (+) Python + Numpy
- (+) Computational graph is nice abstraction
- (+) RNNs fit nicely in computational graph
- (-) Raw Theano is somewhat low-level
- (+) High level wrappers (Keras, Lasagne) ease the pain
- (-) Error messages can be unhelpful
- (-) Large models can have long compile times
- (-) Much “fatter” than Torch
- (-) Patchy support for pretrained models
- (-) Buggy on AWS
- (-) Single GPU
2.2 TensorFlow
- Google created TensorFlow to replace Theano. The two libraries are in fact quite similar. Some of the creators of Theano, such as Ian Goodfellow, went on to create TensorFlow at Google before leaving for OpenAI.
- For the moment, TensorFlow does not support so-called “inline” matrix operations, but forces you to copy a matrix in order to perform an operation on it. Copying very large matrices is costly in every sense. TF takes four times as long as state-of-the-art deep-learning tools. Google says it’s working on the problem.
- Like most deep-learning frameworks, TensorFlow is written with a Python API over a C/C++ engine that makes it run faster. Although there is experimental support for a Java API, it is not currently considered stable, and we do not consider this a solution for the Java and Scala communities.
- TensorFlow runs dramatically slower than other frameworks such as CNTK and MxNet.
- TensorFlow is about more than deep learning. TensorFlow actually has tools to support reinforcement learning and other algos.
- Google’s acknowledged goal with TensorFlow seems to be recruiting, making their researchers’ code shareable, standardizing how software engineers approach deep learning, and creating an additional draw to Google Cloud services, on which TensorFlow is optimized.
- TensorFlow is not commercially supported, and it’s unlikely that Google will go into the business of supporting open-source enterprise software. It’s giving a new tool to researchers.
- Like Theano, TensorFlow generates a computational graph (e.g. a series of matrix operations such as z = sigmoid(x), where x and z are matrices) and performs automatic differentiation. Automatic differentiation is important because you don’t want to have to hand-code a new variation of backpropagation every time you’re experimenting with a new arrangement of neural networks. In Google’s ecosystem, the computational graph is then used by Google Brain for the heavy lifting, but Google hasn’t open-sourced those tools yet. TensorFlow is one half of Google’s in-house DL solution.
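A minimal sketch of what automatic differentiation does (pure Python, not TensorFlow’s or Theano’s actual API): record, alongside the forward computation of z = sigmoid(x), a closure that applies the chain rule backward, so the gradient is derived rather than hand-coded per architecture.

```python
import math

# Minimal reverse-mode automatic differentiation for a scalar sigmoid.
# Each op returns a result plus a backward closure that pushes the
# incoming gradient to its input via the chain rule.

class Value:
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

def sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x.data))   # forward pass
    out = Value(s)
    def backward(grad=1.0):
        # d sigmoid(x) / dx = s * (1 - s), chained with upstream grad
        x.grad += grad * s * (1.0 - s)
    out.backward = backward
    return out

x = Value(0.0)
z = sigmoid(x)          # z.data == 0.5
z.backward()            # backward pass fills in x.grad == 0.25
print(z.data, x.grad)
```

Real frameworks do the same thing over whole graphs of matrix operations, which is why you never write backpropagation by hand.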
- From an enterprise perspective, the question some companies will need to answer is whether they want to depend upon Google for these tools.
- Caveat: Not all operations in TensorFlow work as they do in Numpy.
2.2.1 Pros and Cons
- (+) Python + Numpy
- (+) Computational graph abstraction, like Theano
- (+) Faster compile times than Theano
- (+) TensorBoard for visualization
- (+) Data and model parallelism
- (-) Slower than other frameworks
- (-) Much “fatter” than Torch; more magic
- (-) Not many pretrained models
- (-) Computational graph is pure Python, therefore slow
- (-) No commercial support
- (-) Drops out to Python to load each new training batch
- (-) Not very toolable
- (-) Dynamic typing is error-prone on large software projects
2.3 Caffe
Caffe is a well-known and widely used machine-vision library that ported Matlab’s implementation of fast convolutional nets to C and C++ (see Steve Yegge’s rant about porting C++ from chip to chip if you want to consider the tradeoffs between speed and this particular form of technical debt). Caffe is not intended for other deep-learning applications such as text, sound or time series data. Like other frameworks mentioned here, Caffe has chosen Python for its API.
Both Deeplearning4j and Caffe perform image classification with convolutional nets, which represent the state of the art. In contrast to Caffe, Deeplearning4j offers parallel GPU support for an arbitrary number of chips, as well as many, seemingly trivial, features that make deep learning run more smoothly on multiple GPU clusters in parallel. While it is widely cited in papers, Caffe is chiefly used as a source of pre-trained models hosted on its Model Zoo site.
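For intuition about the core operation these convolutional nets are built on, here is a plain-Python “valid” 2D cross-correlation (what deep-learning libraries call convolution), independent of Caffe’s or Deeplearning4j’s actual implementations:

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image and
    sum the element-wise products at each position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A tiny "image" with a vertical edge, and a kernel that responds to it.
image = [[0, 0, 1],
         [0, 0, 1],
         [0, 0, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d_valid(image, kernel))  # strong response along the edge
```

A convolutional layer applies many such kernels (with learned weights) in parallel on the GPU; the frameworks differ in how well they parallelize exactly this across multiple chips.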
2.3.1 Pros and Cons:
- (+) Good for feedforward networks and image processing
- (+) Good for finetuning existing networks
- (+) Train models without writing any code
- (+) Python interface is pretty useful
- (-) Need to write C++ / CUDA for new GPU layers
- (-) Not good for recurrent networks
- (-) Cumbersome for big networks (GoogLeNet, ResNet)
- (-) Not extensible, bit of a hairball
- (-) No commercial support
- (-) Probably dying; slow development
2.4 Caffe2
Caffe2 is the long-awaited successor to the original Caffe, whose creator now works at Facebook. Caffe2 is the second deep-learning framework to be backed by Facebook, after Torch/PyTorch. The main difference seems to be the claim that Caffe2 is more scalable and light-weight. Like Caffe and PyTorch, Caffe2 offers a Python API running on a C++ engine.
2.4.1 Pros and Cons:
- (+) BSD License
- (-) No commercial support
2.5 Chainer
Chainer is an open-source neural network framework with a Python API, whose core team of developers works at Preferred Networks, a machine-learning startup based in Tokyo that draws its engineers largely from the University of Tokyo. Until the advent of DyNet at CMU and PyTorch at Facebook, Chainer was the leading neural network framework for dynamic computation graphs, or nets that allow for input of varying length, a popular feature for NLP tasks. By its own benchmarks, Chainer is notably faster than other Python-oriented frameworks, with TensorFlow the slowest of a test group that includes MxNet and CNTK.
2.6 DSSTNE
Amazon’s Deep Scalable Sparse Tensor Network Engine, or DSSTNE, is a library for building models for machine and deep learning. It is one of the more recent of many open-source deep-learning libraries, released after TensorFlow and CNTK, and Amazon has since backed MxNet with AWS, so its future is not clear. Written largely in C++, DSSTNE appears to be fast, although it has not attracted as large a community as the other libraries.
- (+) Handles sparse encoding
- (-) Amazon may not be sharing all the information necessary to obtain the best results with its examples
- (-) Amazon has chosen another framework for use on AWS
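To make “sparse encoding” concrete (a generic illustration, not DSSTNE’s actual API): mostly-zero vectors, such as a user’s purchases out of a huge catalog, can be stored as their nonzero entries only, so storage and dot products cost time proportional to the nonzeros rather than the full dimension.

```python
def to_sparse(dense):
    """Encode a mostly-zero vector as {index: value} for its nonzeros."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(a, b):
    """Dot product that only touches indices nonzero in both vectors."""
    if len(b) < len(a):
        a, b = b, a           # iterate over the smaller operand
    return sum(v * b[i] for i, v in a.items() if i in b)

u = to_sparse([0, 0, 3, 0, 2, 0, 0, 0])   # -> {2: 3, 4: 2}
v = to_sparse([1, 0, 4, 0, 0, 0, 0, 5])   # -> {0: 1, 2: 4, 7: 5}
print(sparse_dot(u, v))                   # only index 2 overlaps: 3 * 4
```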
2.7 Keras
Keras is a deep-learning library that sits atop Theano and TensorFlow, providing an intuitive API inspired by Torch. It is perhaps the best Python API in existence. Deeplearning4j relies on Keras as its Python API and imports models from Keras and, through Keras, from Theano and TensorFlow. It was created by Francois Chollet, a software engineer at Google.
- (+) Intuitive API inspired by Torch
- (+) Works with Theano, TensorFlow and Deeplearning4j backends (CNTK backend to come)
- (+) Fast growing framework
- (+) Likely to become the standard Python API for NNs
3 Java Frameworks
Even though most of the deep-learning community is focused on Python, Java and Scala have several advantages.
- Java remains the most widely used language in enterprise. It is the language of Hadoop, ElasticSearch, Hive, Lucene and Pig, which happen to be useful for machine learning problems. Spark and Kafka are written in Scala, another JVM language.
- Java and Scala are inherently faster than Python. Anything written in Python by itself, disregarding its reliance on Cython, will be slower.
- Java’s lack of robust scientific computing libraries has been solved with ND4J, which runs on distributed GPUs or CPUs and can be interfaced via a Java or Scala API.
- Java is a secure, network language that inherently works cross-platform on Linux servers, Windows and OSX desktops, Android phones and in the low-memory sensors of the Internet of Things via embedded Java. While Torch and Pylearn2 optimize via C++, which presents difficulties for those who try to optimize and maintain it, Java is a “write once, run anywhere” language suitable for companies who need to use deep learning on many platforms.
3.1 Deeplearning4j
Deeplearning4j is distinguished from other frameworks in its API languages, intent and integrations. DL4J is a JVM-based, industry-focused, commercially supported, distributed deep-learning framework that solves problems involving massive amounts of data in a reasonable amount of time. It integrates with Kafka, Hadoop and Spark using an arbitrary number of GPUs or CPUs, and it has a number you can call if anything breaks.
DL4J is portable and platform neutral, rather than being optimized on a specific cloud
service such as AWS, Azure or Google Cloud. In speed, its performance is equal to Caffe on non-trivial image-processing tasks on multiple GPUs, and better than TensorFlow or Torch. Deeplearning4j has Java, Scala and Python APIs, the latter using Keras.
4 Licensing
Licensing is another distinction among these open-source projects: Theano, Torch and Caffe employ a BSD License, which does not address patents or patent disputes. Deeplearning4j and ND4J are distributed under an Apache 2.0 License, which contains both a patent grant and a litigation retaliation clause. That is, anyone is free to make and patent derivative works based on Apache 2.0-licensed code, but if they sue someone else over patent claims regarding the original code (DL4J in this case), they immediately lose all patent claim to it. (In other words, you are given resources to defend yourself in litigation, and discouraged from attacking others.) BSD doesn’t typically address this issue.