Data Science to the Rescue

By Robert Beaton


The “fog of war” is an often-used metaphor for the uncertainty, ambiguity, and lack of information that impedes military decision making on the battlefield. Reducing that fog is one of the most significant advantages we can give to warfighters. Over the years, the United States has made huge investments in strategic and tactical systems to overcome the fog of war, but the capabilities of those systems are not adequate to deal with the quantity, quality, and detail of information that will become available in future military conflicts. This is resulting in the rise of “data science,” a new way of conceptualizing how we manage information that combines how data are represented, organized, processed, shared, and interpreted under relevant context and with necessary assurance.

THE INFORMATION EXPLOSION AND BIG DATA 

Often described as “the information explosion,” the dramatic growth in data available to our forces is truly without parallel. By 2020, the average Navy ship should be able to deploy with the capability to store more than 1,000 terabytes of data. That capacity will quickly be filled by data collected from the ship’s combat and information systems, from a wide range of supporting unmanned vehicles and sensors, and from huge libraries of information brought with the ship when it deploys. All of that will be augmented by vast amounts of data collected and provided by national sources.

Over the past decade, key data technologies have revolutionized access to knowledge and information. These data technologies go by the names of “big data” and “semantic Web.” Big data technologies, first popularized by Google’s search engine, make it possible to search for information across geographically distributed databases, with thousands of computers simultaneously analyzing the data and delivering search results. Through this technology, as most of us commonly experience, Google delivers millions of responses in less than a second.
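
The sketch below illustrates this idea at toy scale: a query is fanned out over several data partitions in parallel and the partial results are merged. It is a minimal illustration of the scatter-gather pattern behind distributed search, not a depiction of Google’s actual implementation, and the partitions and document names in it are invented.

```python
# Minimal scatter-gather sketch: query many partitions in parallel, merge results.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical document partitions, as they might be spread across many nodes.
PARTITIONS = [
    {"doc1": "carrier strike group deployment schedule"},
    {"doc2": "unmanned sensor feed archive"},
    {"doc3": "strike planning and fire support data"},
]

def search_partition(partition, term):
    """Scan one partition and return the IDs of matching documents."""
    return [doc_id for doc_id, text in partition.items() if term in text]

def distributed_search(term):
    """Fan the query out to every partition in parallel, then merge the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: search_partition(p, term), PARTITIONS))
    return [doc_id for partial in partials for doc_id in partial]

print(distributed_search("strike"))  # -> ['doc1', 'doc3']
```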

Recently, the World Wide Web Consortium led the development and standardization of semantic Web technologies to provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Historically, data have been locked inside the applications that control them, so sharing and analyzing them across systems remain difficult. Anyone with Department of Defense experience will understand the ongoing challenge of correlating data across military departments or agencies.

These key data technologies are leading to a major paradigm shift when it comes to data. The old paradigm was to figure out how to manage and use internal organizational data to solve problems. The new paradigm involves solving problems by augmenting internal data with the massive amount of data being created by communities throughout the world. In the old paradigm, each community developed its own closed solution and did not put much weight on integrating its data with the wider world. In the new paradigm, communities make their data available so it can become part of a much larger “data ecosystem.” 

DATA ECOSYSTEMS

A data ecosystem is a collection of distributed data sets, contributed by many diverse communities, that are interconnected and aligned to extend the collective knowledge of all. It includes all the infrastructure, support tools, and processes needed to add data to the ecosystem, align and interconnect those data, and support end users’ use of them. It is important to understand that a data ecosystem has characteristics of both big data (large volumes of heterogeneous data) and the semantic Web (a large variety of data and communities).

In 2010, the Department of the Army initiated the Unified Cloud Data (UCD) model technology architecture to establish the pilot Army Intelligence Big Data strategy, which, in turn, would inform Army programs of record and the data ecosystem across the service. UCD converges semantic Web and big data technologies to radically improve intelligence and analytics and, by extension, cross-service warfighting capability. Through the UCD ecosystem, the Army is correlating all appropriate command-and-control and fire-support data sources to deliver relevant, applied information to battle commanders in near real time. The Office of Naval Research’s Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance Department is experimenting with the UCD ecosystem data technologies to develop a Naval Tactical Cloud (NTC) ecosystem to support integrated fires.

NTC incorporates software and tools to organize disparate data from many different communities into a single big-data environment, so that the data are fully integrated, accessible, and useful to users. A single big-data environment results in information that is interoperable across organizations, enabling the sharing of data and analytic tools. As a data ecosystem, NTC consists of a set of representation and semantic tables, plus the software tools, processes, and best practices that have been developed to bring disparate community data into the NTC ecosystem, run analytics on the data to generate extracted knowledge, and provide tools for end users to search, access, and use the data and extracted knowledge.

NTC combines semantic Web technologies with big data technologies while applying data science to ensure operational effectiveness for Army Distributed Common Ground System users. One of the most important concepts in the NTC ecosystem is using semantic Web technology to represent data as graphs of triple statements (subject-predicate-object) to support cross-community data analytics. Representing data in this way creates the flexibility and adaptability required for interconnecting and aligning data sets from widely disparate communities. This is why the NTC ecosystem could help drive the migration from a world of stove-piped data systems to a world of interconnected data ecosystems.
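
As a rough illustration of the triple idea, the sketch below represents records from two hypothetical communities as subject-predicate-object triples and then queries the merged graph. The track identifiers, predicates, and unit names are invented for the example and do not come from NTC.

```python
# Minimal sketch of triple-based data: records from different communities are
# interconnected simply because they share identifiers. All values are invented.
surveillance_triples = [
    ("track:4711", "hasType", "destroyer"),
    ("track:4711", "observedAt", "2015-06-01T14:02Z"),
    ("track:4711", "locatedNear", "grid:38SMB45"),
]
fires_triples = [
    ("track:4711", "assignedTo", "unit:DDG-101"),
    ("track:9003", "assignedTo", "unit:DDG-102"),
]

# Merging the two sets yields a single graph that either community can query.
graph = surveillance_triples + fires_triples

def query(graph, subject=None, predicate=None):
    """Return every triple matching the given subject and/or predicate."""
    return [t for t in graph
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)]

# Cross-community question: what do we know about track 4711?
for s, p, o in query(graph, subject="track:4711"):
    print(s, p, o)
```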

NAVAL DATA SCIENCE CHALLENGES

Big data and semantic Web technologies have now matured to the point where they are no longer considered major science and technology challenges. From a science and technology perspective, the time has come to move beyond this foundation and begin concentrating on the data science challenges that lie ahead. First and foremost, we must develop the underlying data science constructs that will enable the naval community to pull data out of today’s stove-piped systems and integrate them into an ecosystem that supports cross-warfare-area data sharing and analytics. Additional challenges must also be addressed:

Distribution of Data over a Tactical Force: In tactical situations, it is generally not possible to move data to a central site for processing. In a naval task force, most of the data generated and collected by each ship will have to be kept on-site during the ship’s deployment. Only when the task force returns to port will there be sufficient network capacity to offload all of the data. Such tactical situations will require data scientists to determine how best to distribute data within a force.

Prioritizing Data Movement in Constrained Network Conditions: Data generated or collected by a tactical unit generally will reside at the tactical unit. There always will be high-value information, however, that is needed by other units in the force. In such cases, it is important that such information be replicated to other tactical units, giving them more direct access to the data and ensuring it remains available when the originating unit is disconnected from the network.
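
One simple way such prioritization might work is sketched below: given a limited transfer budget, the highest-value items are replicated first. The item names, sizes, and priority scores are hypothetical stand-ins for operationally derived values.

```python
# Minimal sketch of priority-driven replication under a limited network budget.
from dataclasses import dataclass

@dataclass
class DataItem:
    name: str
    size_mb: float
    priority: int   # higher = more operationally valuable (invented scale)

def plan_replication(items, budget_mb):
    """Greedily pick the highest-priority items that fit the network budget."""
    plan, used = [], 0.0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        if used + item.size_mb <= budget_mb:
            plan.append(item.name)
            used += item.size_mb
    return plan

items = [
    DataItem("enemy-track-update", 2, priority=9),
    DataItem("full-sensor-log", 800, priority=3),
    DataItem("strike-imagery-chip", 40, priority=7),
]
print(plan_replication(items, budget_mb=100))
# -> ['enemy-track-update', 'strike-imagery-chip']
```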

Representation of Data for Efficient Movement across Tactical Networks: In many cases, the information content of data is not an all-or-nothing package. Consider a three-minute video of an enemy destroyer that needs to be sent from a collecting unit to an attacking unit. In the best case it would be desirable to send the full video clip. When that is not possible, there are less-costly alternatives, such as selecting the most useful 30-second portion or sending one screen capture taken from the video. An even smaller data set would be a chip cropped from the full image. The smallest would be only the geospatial coordinates and current heading of the enemy destroyer. Data scientists need to develop data representations that support variable data resolution to account for the variation in network capacity that will occur in tactical environments.
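
A minimal sketch of that kind of variable-resolution selection might look like the following: the sender picks the richest representation of the contact report that fits the capacity of the moment. The representation names and sizes are purely notional.

```python
# Minimal sketch of variable-resolution data movement: same contact, several
# fidelities, richest one that fits the link wins. Sizes are notional.
REPRESENTATIONS = [          # ordered from richest to leanest
    ("full_video",        180.0),   # MB, 3-minute clip
    ("best_30s_segment",   30.0),
    ("single_frame",        1.5),
    ("image_chip",          0.2),
    ("coords_and_heading",  0.001),
]

def select_representation(available_mb):
    """Return the richest representation that fits the available capacity."""
    for name, size_mb in REPRESENTATIONS:
        if size_mb <= available_mb:
            return name
    return None  # nothing fits; hold the data until the link improves

print(select_representation(available_mb=2.0))    # -> 'single_frame'
print(select_representation(available_mb=0.05))   # -> 'coords_and_heading'
```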

Prioritizing Data Retention in Constrained Storage Conditions: One of the driving assumptions behind traditional big data environments is that storage is infinitely inexpensive and elastic. This assumption is not valid for tactical units operating under constrained space and power conditions. In such situations, there is an upper limit on available storage, and although huge improvements in storage densities and power consumption make that limit extremely high, storage is ultimately a finite resource aboard ships and Marine combat operations centers. Data scientists must now prioritize data retention in constrained conditions. Determining which data to retain and which to discard needs to be driven by operational priorities, but data scientists must organize and structure the data to provide the hooks for making retention decisions.
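
One simple way to express such a retention policy, sketched below with invented data holdings and priority scores, is to evict the lowest-priority data first whenever the store exceeds its ceiling.

```python
# Minimal sketch of retention under a fixed storage ceiling: evict the
# lowest-priority holdings until everything fits. All values are invented.
def enforce_retention(holdings, capacity_gb):
    """holdings: list of (name, size_gb, priority) tuples. Evict the
    lowest-priority data until the total fits the storage ceiling."""
    kept = sorted(holdings, key=lambda h: h[2])   # lowest priority first
    used = sum(size for _, size, _ in kept)
    while kept and used > capacity_gb:
        _, size, _ = kept.pop(0)                  # discard the lowest-priority item
        used -= size
    return [name for name, _, _ in kept]

holdings = [
    ("current-op-area-imagery", 400, 9),
    ("last-month-raw-sonar",    900, 2),
    ("threat-library",          150, 8),
]
print(enforce_retention(holdings, capacity_gb=1000))
# -> ['threat-library', 'current-op-area-imagery']
```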

Fully realizing the Navy’s information dominance vision requires significant strides in developing a naval data science foundation that enables integrated cross-warfare-area operations. Significant challenges lie ahead, but with hard work and creative thinking, we may be able to help military decision makers disperse some of the fog of war that makes their jobs so difficult.

About the Author:

Robert Beaton is a contractor supporting the Office of Naval Research’s Naval Tactical Cloud project and has spent the past 30 years working on information, command-and-control, and intelligence, surveillance, and reconnaissance systems for the Department of Defense.