The Rapid Rise of Neural Networks for Defense: A Cautionary Tale

By Dr. David A. Johannsen, Dr. Jeffrey L. Solka, and Dr. John T. Rigsby


Fueled by rapid increases in computer storage capacity and processing power (principally through the use of graphics processing units) and by the widespread availability of powerful software for designing and implementing neural networks (such as Google’s TensorFlow), the application of artificial neural networks to significant real-world problems has grown tremendously over the past several years. During this time, neural networks have succeeded on problems as varied as automatic recognition of handwritten digits and automated image captioning and indexing, and they have even beaten human masters at the game of Go. In fact, the field has reached such a state of maturity that a person with only casual knowledge of computer programming and a modest PC can implement a neural network for whatever problem he or she might have at hand.

Given this climate of success, there is growing interest in fielding neural networks in Department of Defense (DoD) systems. In this brief note, we will discuss the nature of neural networks in language that can be broadly understood, especially in the context of the unique environment within the DoD, so that Navy leaders can be better informed about the strengths and limitations of this technology as it impinges ever more frequently on the DoD. We will first attempt to explain to nonspecialists what an artificial neural network is. We will then discuss some of the inherent limitations of this class of machine learning tools and some of the ways that we and other members of the DoD are studying these limitations. Finally, we will discuss the consequences of these limitations.

What Is a Neural Network?

Rather than giving a precise mathematical definition of a neural network, we will begin with a functionally oriented one. We describe a neural network as a nonlinear function from a space of inputs to a space of outputs. The particular function is chosen from a broad class of nonlinear functions through a process known as training. In current practice, the choice of function is often underdetermined; that is, the function contains more parameters to be learned than there are observations available for training.
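The sketch below makes “underdetermined” concrete. It counts the weights and biases of a fully connected network and compares that total with the number of training observations. The layer sizes and the training-set size are illustrative assumptions, not figures from any particular system:

- a hypothetical 28×28-pixel image input;
- two hidden layers of 512 nodes;
- 10 output classes;
- 1,000 training examples.

```python
def count_parameters(layer_sizes):
    """Count weights plus biases in a fully connected feedforward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # One weight per edge between adjacent layers, one bias per receiving node.
        total += n_in * n_out + n_out
    return total


layer_sizes = [784, 512, 512, 10]  # hypothetical: 28x28 pixels, two hidden layers, 10 classes
n_train = 1_000                    # hypothetical number of labeled training images
n_params = count_parameters(layer_sizes)

print(n_params)            # 669706
print(n_params > n_train)  # True: far more unknowns than observations
```

Even this modest network has roughly 670 times as many adjustable parameters as it has training examples. Many different settings of those parameters can therefore fit the same training data.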

For example, in image captioning, the space of inputs is the collection of all possible digital images, and the output space is the collection of all meaningful captions. The neural network accepts a digital image as input and produces a caption. Along the way, hidden from the end user, the computer treats the image as a mathematical object, performs mathematical operations on it, and produces a numerical output (often a vector of probabilities of membership in the various classes). This vector of probabilities is then converted into the caption presented to the user.

Figure 1: An example neural network with four layers.

We will now be a bit more precise in defining a neural network. The formulation of neural networks as a machine learning method was motivated by analogy with the functioning of neurons in the human brain. A neural network consists of a set of nodes (neurons) arranged in layers and connected by weighted edges. The output of each node is multiplied by the weight on each outgoing edge and fed forward to the nodes in the next layer. Each receiving node combines its weighted inputs and passes the result through a threshold (activation) function, and the result is propagated onward through the network. Figure 1 shows a four-layer neural network with an input layer, two hidden layers, and an output layer. Following the discussion above, one might imagine that each node in the output layer gives the probability that the input belongs to one of three classes.
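Here is a minimal, untrained sketch of the feedforward computation just described, in plain Python. The text does not give Figure 1's exact node counts, so the layer sizes are assumptions: 4 inputs, two hidden layers of 5 nodes, and 3 outputs. ReLU stands in for the threshold function. A softmax turns the three output scores into class probabilities. The weights are random, so the probabilities mean nothing until training adjusts them.

```python
import math
import random


def relu(z):
    # Modern stand-in for the classical threshold (step) function.
    return max(0.0, z)


def identity(z):
    return z


def softmax(zs):
    # Convert raw output scores into probabilities that sum to 1.
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    # One weight per edge (n_out rows of n_in weights) and one bias per node.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation):
    # Each node sums its weighted inputs, adds its bias, and applies the activation.
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    h = x
    for weights, biases in layers[:-1]:  # hidden layers
        h = dense(h, weights, biases, relu)
    weights, biases = layers[-1]         # output layer
    return softmax(dense(h, weights, biases, identity))


rng = random.Random(0)  # untrained, randomly initialized weights
sizes = [4, 5, 5, 3]    # assumed: 4 inputs, two hidden layers of 5 nodes, 3 classes
layers = [init_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
probs = forward([0.2, -1.0, 0.5, 0.9], layers)
print([round(p, 3) for p in probs])  # three class probabilities summing to 1
```

Training, which this sketch omits, would adjust the numbers in `weights` and `biases` so that the largest probability lands on the correct class for the training examples.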

A Virtue Is a Vice

Statistical pattern recognition has historically followed a fairly standard pipeline, illustrated in Figure 2. The first three steps of this process (data collection, data processing/cleaning, and feature extraction) are often time-consuming. Where possible, a practicing statistician is well served to spend his or her time on the steps contained in the dotted box. The “extract features” and “dimension reduction” steps can be particularly daunting. Feature extraction and selection are usually the domain of experts. They often require significant time and experimentation to determine which features should be selected and used for the task at hand. Neural networks promise to revolutionize this pipeline by folding these two steps directly into the model-building step, without the need for experts. The virtue of artificial neural networks is that one can train a network to perform complex machine learning tasks such as interpolation, classification, and regression without going through the process of feature generation and dimensionality reduction. In the discussion below we will focus on the consequences of automating these two steps of the pattern recognition process.

Before going further, we should note that in some settings it may be impossible to generate features tailored to each of the possible classes. In automatic image captioning, for example, the image may be of anything in the world, so there is a virtually limitless number of classes. It is therefore not possible to specify optimal features to extract for each classification task. In such settings it is indeed a benefit of neural networks that they free scientists and engineers from crafting specific features for each possible class. In many problem domains, however, failing to understand the processes that gave rise to the training data, and then not participating actively in feature selection, yields a model that we do not understand. In the following paragraphs we will briefly describe some of the implications of this aspect of neural networks.

Figure 2: Pattern Recognition Pipeline.

The “Black Box”

If one reads the scientific literature on neural networks, one will quickly see that they are often described as “black boxes.” What is meant by this? The complexity of a network trained to tackle nontrivial “real-world problems” is very high. That is, the internals of the network hide an incredible mathematical complexity. In fact, the complexity is so great that one cannot interpret how the input features provide a basis for the output. This is known in the statistics field as a nonattributable model. In the context of an application such as image captioning, this is of some concern. For example, if the neural network errs and gives the label “dog” to an image of a cat, the designer is troubled by not knowing exactly what features in the image caused the misclassification and therefore is unable to alter the algorithm to prevent future misclassifications.

This lack of insight into the relation between input and output is much more troubling for DoD applications, where the consequences of misclassification are often much more serious. For example, if one is designing an autonomous vehicle, one would like to be able to predict how the vehicle will react to a given input from its sensors. With a neural network, it is generally impossible to know the output of the network prior to presenting the input to the system and observing the output.

Figure 3: Simple linear model with white noise.


Generalizability is the ability of a model to produce reasonable output when presented with an input that differs from the data used to train it. Generalizability is a central concern in DoD applications: how will a fielded system perform when the environment changes subtly? We know little about the processes that gave rise to the training data, and little about what happens inside the “black box” network. It is therefore impossible to predict the output. There are no guarantees on the behavior of a neural network, even for inputs that do not differ substantially from the training data. In fact, there is currently no body of theory that governs the behavior of neural network outputs.

The class of functions from which a neural network model is chosen has tremendous richness and great power to approximate highly nonlinear behavior. This expressive power helps a model fit its training data. The same richness, however, becomes a potential problem when one wants to predict or classify a new observation. Figure 3 illustrates this overfitting. A model with far too much complexity (green line) follows the data points (black circles) but fails to recover the simple linear model (black line) that generated them.
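The same effect can be reproduced numerically under stated assumptions; this does not use the authors' Figure 3 data. In the sketch, the “true” model is the line y = 2x + 1, sampled at x = 0, 1, …, 9. A fixed ±0.5 alternation stands in for white noise so that the output is reproducible. The over-complex model is the degree-9 polynomial that passes exactly through all ten points (Lagrange interpolation). It is compared with an ordinary least-squares line.

```python
def true_line(x):
    return 2.0 * x + 1.0  # the simple linear model that generates the data


def lagrange_eval(xs, ys, x):
    # Evaluate the degree len(xs)-1 polynomial passing exactly through every point.
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


xs = [float(i) for i in range(10)]
# Deterministic +/-0.5 alternation stands in for white noise (an assumption for reproducibility).
ys = [true_line(x) + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

train_err_poly = max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))
slope, intercept = fit_line(xs, ys)
train_err_line = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

x_new = 10.0  # a new observation just outside the training range
print(train_err_poly, train_err_line)  # 0.0 versus about 0.61
print(lagrange_eval(xs, ys, x_new))    # about -490.5
print(slope * x_new + intercept)       # about 20.83
print(true_line(x_new))                # 21.0
```

The complex model fits the training points perfectly, and the line does not. At the next point, however, the line's prediction is off by about 0.17. The complex model's prediction is off by more than 500, because it has learned the noise rather than the underlying relationship.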

Sufficient Training for Realistic Tasks?

Because neural network models can be highly nonlinear, their predictive power can degrade quickly when they are presented with an input that differs significantly from the training data. Indeed, because network outputs depend so nonlinearly on the inputs, significant degradation can sometimes occur even for an input that is nearly indistinguishable from one used in training. This problem matters greatly for military applications, where a system may be required to perform highly consequential tasks in an environment located anywhere in the world. There is no way to accumulate enough training data to guarantee that the system will never be presented with an input that differs from that data.
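The sketch below shows how a nearly indistinguishable input can change a decision. It deliberately uses a simple linear scorer rather than a trained network, so every weight and input value is hypothetical. It shows only that, in high dimensions, many individually tiny changes to the input can add up to a flipped decision; real networks add nonlinearity on top of this. The perturbation follows the idea of the fast gradient sign method from the adversarial-examples literature. For a linear score, the gradient with respect to the input is simply the weight vector.

```python
def score(weights, x):
    # Linear decision score: positive is read as class A, negative as class B.
    return sum(w * xi for w, xi in zip(weights, x))


def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0


def fgsm_step(weights, x, eps):
    # Fast-gradient-sign-style step: the gradient of a linear score with respect to x
    # is `weights`, so move every component by eps against the sign of its weight.
    return [xi - eps * sign(w) for w, xi in zip(weights, x)]


d = 1000  # hypothetical input dimension (e.g., number of pixels)
weights = [0.01 if i % 2 == 0 else -0.01 for i in range(d)]  # hypothetical weights
x = [0.5 + 0.005 * sign(w) for w in weights]                  # hypothetical input near 0.5
x_adv = fgsm_step(weights, x, eps=0.01)

print(round(score(weights, x), 6))      # 0.05: class A
print(round(score(weights, x_adv), 6))  # -0.05: flipped to class B
print(max(abs(a - b) for a, b in zip(x, x_adv)))  # each component moved by only about 0.01
```

Each component of the input changes by about 1 percent of its range, yet the decision reverses. That is the kind of fragility that the adversarial-example research mentioned below seeks to understand.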

In fact, in any real-world application, the complexity of the environment guarantees that the sensors will encounter inputs that differ significantly from the data used to train the network. As evidence, consider the difficulties encountered with self-driving cars. Staying on a reasonably well-delineated road and interacting with objects that usually obey a fairly rigid set of rules still involves enough “surprises” that we have recently witnessed several catastrophic failures. How much more difficult is the problem of deciding friend or foe in a jungle at night and in the rain?

Mitigating These Problems 

Numerous organizations have begun programs to better understand these limitations. The Defense Advanced Research Projects Agency (DARPA) has started the Explainable Artificial Intelligence (XAI) program, which seeks to develop methods to better understand the decisions made by AI systems. DARPA also has begun the Lifelong Learning Machines (L2M) program, which seeks to develop machine-learning-based systems that provide the capability to train themselves in the field in the face of new environmental or mission-based conditions.

Our own organization, the Naval Surface Warfare Center Dahlgren Division (NSWCDD), has also begun efforts to better understand these shortcomings. Our ongoing effort, “Neural Networks for Manifold Discovery,” seeks to apply advanced mathematical methodologies to better characterize the fragility of neural networks and other machine-learning methodologies. Our new-start effort, “Adversarial Learning for Robust AI,” seeks to use recent research in “adversarial examples” to better understand how we can make neural-network-based systems more robust to environmental or enemy-precipitated changes to operational environments. Both of these efforts were funded under the Naval Innovative Science and Engineering program, which is designed to serve as a major innovation catalyst for the naval surface warfare centers.

Final Comments

We hope that we have presented a fairly objective overview of artificial neural networks. We have tried to describe both the strengths of this class of machine learning algorithms and some of their current limitations, especially in the unique environment of the DoD. We acknowledge the demonstrated successes of neural networks and believe that there are settings where the technology works very well; for example, developing AI for wargaming, planning, or training seems a very good use of the technology. In complex environments, where system performance errors could incur tremendous fiscal cost and endanger lives, we need to be very cautious. We remain optimistic that programs at DARPA, NSWCDD, and other organizations can help us better understand and ultimately mitigate these shortcomings. Until theory catches up with practice, is a system whose outputs we can neither predict nor explain really all that desirable?

About the authors:

Dr. Solka is the chief scientist of Naval Surface Warfare Center Dahlgren Division. Dr. Johannsen and Dr. Rigsby are members of the applied mathematics and data analytics group at the Naval Surface Warfare Center Dahlgren Division.

About Future Force Staff