Investing in Social Cybersecurity

By Lt. Col. David M. Beskow, USA, and Dr. Kathleen M. Carley

THE SOCIAL MEDIA ENVIRONMENT CAN BE EVERYTHING FROM SILLY TO VICIOUS—AND OFTEN IT CAN BE HARD TO TELL THE DIFFERENCE. MACHINE LEARNING TOOLS ARE NOW HELPING TO FIND THE BOTS AND MEMES THAT HAVE MALICIOUS INTENT, AS WELL AS WHO IS USING THEM.

The modern information environment has created an entirely new warfare domain: cyberspace. Much has been written about traditional cybersecurity, which focuses on humans using information systems to hack other information systems, but far less about the capabilities required for social cybersecurity, which focuses on humans using those same information systems to hack other humans. While information operations have existed since antiquity, the modern age enables them at a scale, complexity, distance, and impact unheard of even 50 years ago. Social cybersecurity has emerged in response to this threat; its aim is to allow a democratic society to continue to exist while retaining its core values. The National Research Council consequently has recognized it as a key computational social science area of relevance to the intelligence community.1 To accomplish this, social cybersecurity professionals need multidisciplinary science and appropriate technology to quickly identify and neutralize modern disinformation threats aimed at the core tenets of society.

Social media are the main weapons of disinformation operations. State and non-state actors execute these operations across multiple social media platforms, hoping their effects will spill over into traditional media and grassroots movements. Within social media, actors attempt to manipulate both the narrative and the network. Together, this manipulation forms an information campaign, in which sophisticated actors develop multiple lines of effort that combine to support strategic goals. These campaigns are deployed in social media through curated actors (bots, trolls, cyborgs, sock-puppets, etc.) and creative content (memes, videos, written propaganda, etc.). Research and acquisition efforts that support social cybersecurity must aid in identifying threat actors and content at the lowest level, and then aggregate these findings into a common operating picture of the threat campaign's lines of effort and strategic intent. Below, we discuss some of our team's efforts to chip away at this important national security science and technology requirement.

These are examples from the Bot Field Guide, which gives detailed information about a variety of well-known bots.

Building a Framework

National security leaders require a framework within which to understand information warfare "forms of maneuver." We have developed such a framework, known as BEND. BEND defines information forms of maneuver analogous to those used to classify offensive ground combat operations. The framework is discussed in detail in a March-April 2019 Military Review article as well as in a separate article in this issue of Future Force (see page 22). The forms of maneuver encapsulated in BEND are an essential contribution to the science of social cybersecurity and a starting place for any national security leader trying to understand this emerging threat. In addition to building the framework, we are developing metrics that assist in detecting these forms of maneuver in social media streams. These metrics are available in ORA-PRO, a network analysis and visualization tool from Netanomics, and will also be available in a future web version of ORA-PRO.
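As a rough illustration of how such a framework can be organized in code, the sketch below arranges the 16 BEND maneuvers (as named in the Military Review article cited above) into a simple Python data structure with a placeholder scoring stub. The stub is a hypothetical stand-in; the actual metrics live in ORA-PRO and are far more sophisticated.

```python
# A minimal sketch of the BEND maneuver taxonomy as a plain data structure.
# The 16 maneuver names follow the published framework; the scoring stub is
# hypothetical and stands in for the metrics implemented in ORA-PRO.
from dataclasses import dataclass

BEND_MANEUVERS = {
    ("narrative", "positive"): ["engage", "explain", "excite", "enhance"],
    ("narrative", "negative"): ["dismiss", "distort", "dismay", "distract"],
    ("network", "positive"): ["back", "build", "bridge", "boost"],
    ("network", "negative"): ["neutralize", "negate", "narrow", "neglect"],
}

@dataclass
class ManeuverScore:
    maneuver: str
    score: float  # evidence (0.0-1.0) that a message stream exhibits this maneuver

def score_stream(messages: list[str]) -> list[ManeuverScore]:
    """Placeholder: real detection combines linguistic cues with network
    position; here every maneuver simply receives a zero score."""
    return [ManeuverScore(m, 0.0)
            for group in BEND_MANEUVERS.values() for m in group]
```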

The national security establishment also must have the technology to outline a threat operation, its associated narratives, its targeted networks, and measures of its impact. Such a tool must digest social media streams connected to emerging events and extract a situational template of the threat, identifying the actors and content, the target audiences and networks, and the likely desired end states. We are pioneering novel social cybersecurity techniques that extract information campaign elements and measures of impact from curated social media streams. These techniques blend artificial intelligence with dynamic network analysis to address social cybersecurity concerns ranging from bot detection to the spread of disinformation.
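To make the shape of such a pipeline concrete, here is a hypothetical skeleton in Python. Every field name, threshold, and helper function here is an illustrative assumption, not a description of our actual tooling; it shows only how actor classification (AI) and network extraction (dynamic network analysis) combine into one triage pass.

```python
# Hypothetical skeleton of the kind of triage pipeline described above.
# All field names, thresholds, and functions are illustrative assumptions.
import networkx as nx

def stub_bot_score(user: dict) -> float:
    """Stand-in for a trained classifier such as BotHunter."""
    return 0.9 if user.get("statuses_count", 0) > 50_000 else 0.1

def triage(tweets: list[dict]) -> dict:
    # 1. Actor classification (AI): score every unique account once.
    users = {t["user"]["id"]: t["user"] for t in tweets}
    bot_scores = {uid: stub_bot_score(u) for uid, u in users.items()}

    # 2. Network extraction (dynamic network analysis): who mentions whom.
    graph = nx.DiGraph()
    for t in tweets:
        for mentioned in t.get("mentions", []):
            graph.add_edge(t["user"]["id"], mentioned)

    # 3. Target audiences: connected groups in the mention network.
    audiences = list(nx.connected_components(graph.to_undirected()))
    return {"bot_scores": bot_scores, "audiences": audiences}
```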

Types of Actors

A bot is any social media account that allows a computer to execute basic social media activities (tweet, retweet, friend, follow, like, reply, etc.). A savvy programmer can automate most of these activities with only a few lines of code. Researchers often try to classify accounts as either bots or humans, but many accounts are hybrids that combine the activities of both. These cyborg accounts often have a human conducting nuanced two-way dialogue while a computer conducts activities at scale in the background. Bots can be positive, neutral, or malicious. Positive bots include personal assistants and accounts that warn people of impending natural disasters. Neutral bots generally focus on spam, proliferating content that ranges from commercial advertising to adult content. Malicious bots engage in intimidation, propaganda, slander, and similar activities. Troll accounts have human operators who specialize in aggravation as an end in itself, initiating divisive actions for the sole purpose of building or widening fissures in a society to make it less cohesive. Sock-puppets are the false identities attached to troll, bot, and cyborg accounts to make them fit in with their target audience or network. The artificially intelligent assistants discussed below can help analysts differentiate these types of actors.
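The taxonomy above can be encoded compactly, as in the toy sketch below. The thresholds are illustrative assumptions only; real classification (e.g., BotHunter) is learned from data, and trolls in particular cannot be separated from ordinary humans by automation level alone.

```python
# A toy encoding of the actor taxonomy above. Thresholds are illustrative
# assumptions; real classification is learned, not rule-based.
from enum import Enum, auto

class ActorType(Enum):
    HUMAN = auto()
    BOT = auto()      # fully automated activity
    CYBORG = auto()   # human dialogue plus automation at scale
    TROLL = auto()    # human operator; aggravation as an end in itself

def rough_label(automated_share: float, has_human_dialogue: bool) -> ActorType:
    """automated_share = assumed fraction of the account's activity produced
    by software. This toy rule never returns TROLL, since trolls behave like
    humans in automation terms; content analysis would be needed for that."""
    if automated_share > 0.9:
        return ActorType.BOT
    if automated_share > 0.3 and has_human_dialogue:
        return ActorType.CYBORG
    return ActorType.HUMAN
```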

BotHunter

Researchers have developed sophisticated machine learning algorithms to detect bots. This has resulted in a cat-and-mouse cycle in which bot puppet masters develop increasingly sophisticated bots to stay ahead of increasingly sophisticated anti-bot algorithms. We have developed a machine learning tool known as BotHunter to assist in finding malicious bots.2 BotHunter is a supervised machine learning tool, trained on multiple bot-labeled datasets, that can detect bots at various data granularities. It differs from other detection algorithms in that it is designed to scale while conducting prediction on existing data. Its unique focus is the ability to render a quality prediction on researchers' own data at a scale that is not feasible with other bot-detection approaches. In the past, social cybersecurity researchers were required to sample their data for bot detection because existing models did not scale. With BotHunter, researchers can conduct bot detection on all their data without sampling. For example, we were able to run BotHunter on a single computer processor against 60 million tweets associated with a large world election event and received results within 24 hours (the tool processes approximately 4.5 million tweets per hour per thread). The same prediction would have taken months with other algorithms. In addition, BotHunter can run on data already collected.

Researchers often collect data associated with a world event and only later think about executing bot detection. Current bot-detection algorithms often must "re-scrape" the data, which is time consuming, may no longer reflect the accounts' behavior at the time of the event, and cannot recover accounts that have since been suspended or otherwise shut down (often the most interesting accounts). By running on existing data, BotHunter overcomes these limitations. The primary production BotHunter model is trained on approximately 20,000 accounts that attacked NATO and the Atlantic Council's Digital Forensic Research Lab, ensuring that its predictions are relevant to national security analysts. BotHunter, in conjunction with MemeHunter (discussed below), gives analysts state-of-the-art machine learning algorithms for sifting through large social media streams.
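To give a flavor of what supervised bot detection looks like in code, the sketch below extracts a handful of account-level features and trains a random forest on a tiny placeholder sample. The features, labels, and model settings are illustrative assumptions and far simpler than the published BotHunter pipeline, which works across several tiers of data granularity.

```python
# Toy supervised bot detection in the spirit of BotHunter's approach:
# features from account metadata alone, fed to a random forest. The
# features and labels below are placeholders, not the published pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def account_features(user: dict) -> list[float]:
    followers = user.get("followers_count", 0)
    friends = user.get("friends_count", 0)
    return [
        user.get("statuses_count", 0),                    # total volume
        followers,
        friends,
        friends / (followers + 1),                        # friend/follower ratio
        len(user.get("screen_name", "")),                 # random-string names run long
        float(user.get("default_profile_image", False)),  # anonymity signal
    ]

# Placeholder labeled sample; a production model trains on thousands of accounts.
labeled = [
    ({"statuses_count": 120_000, "followers_count": 40, "friends_count": 5_000,
      "screen_name": "ax93kd72hq4z", "default_profile_image": True}, 1),   # bot
    ({"statuses_count": 3_200, "followers_count": 410, "friends_count": 350,
      "screen_name": "jane_doe", "default_profile_image": False}, 0),      # human
]
X = np.array([account_features(u) for u, _ in labeled])
y = np.array([label for _, label in labeled])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
bot_probability = model.predict_proba(X)[:, 1]  # per-account bot score
```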

MemeHunter

Internet memes, often thought of as humorous and harmless artifacts of the digital age, are increasingly used in information warfare. Because they are almost always anonymous, increasingly political, and require sophisticated multimedia and multimodal machine learning to dissect, memes are becoming a mainstay of propaganda and disinformation operations. A meme combines an image with witty text to connect a propaganda message to a target audience, often by appealing to existing biases. Memes also propagate differently from normal viral content. As originally envisioned by evolutionary biologist Richard Dawkins in his 1976 book The Selfish Gene, memes propagate through mutation and evolution. This means they can be introduced on anonymous platforms such as 4chan and Reddit, hop into mainstream social media outlets such as Facebook and Twitter, and then move quickly to other places on the internet.

We have developed a multimodal meme detection algorithm that considers an image's visual content, its embedded text, and any faces it contains to determine whether it is a meme. To make this possible, we also developed a meme-specific optical character recognition (OCR) process. Traditional OCR tools often fail when applied to memes; our process preprocesses meme images so that traditional OCR algorithms can extract the text. We also have developed graph learning techniques that cluster meme embeddings to discover the evolutionary tree mapping how memes mutate. MemeHunter is a deep learning algorithm that can classify roughly 7,000 images per hour per thread. It automatically detects and uses available cores, and on a medium-sized server (38 cores) it can process approximately 250,000 images per hour.
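A minimal sketch of meme-style OCR preprocessing follows, assuming the classic meme format of bright block lettering over a busy photograph. It illustrates the general idea of cleaning an image before handing it to a standard OCR engine; it is not our actual pipeline, and the threshold value is an assumption.

```python
# Sketch: isolate bright caption text, then run standard OCR on the result.
# The 220 threshold assumes near-white meme lettering; tune per dataset.
import cv2
import pytesseract

def meme_text(image_path: str) -> str:
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Keep only near-white pixels, separating the caption from the background.
    _, caption = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)
    # Invert to dark-text-on-light, which standard OCR engines expect.
    return pytesseract.image_to_string(cv2.bitwise_not(caption))
```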

This visualization shows how 56,000 probable bots (red dots) were identified out of a dataset of more than 330,000 unique Twitter accounts.

Describing Bots—A Field Guide

Even though bots are extremely prolific, most humans struggle to identify them. To make identification easier, we have developed a bot field guide that, like an animal field guide, provides examples and descriptions of the various malicious bots we have found. For each account type, the guide gives a brief description and screen capture, descriptive visualizations of the account's behavior, and metrics showing how the account can be identified and what types of messages it sends or amplifies. The draft field guide has 11 sections:

  • Normal users (personal, commercial, and government accounts)
  • Amplifier bots
  • Cyborg bots
  • Chaos bots
  • Coordinated bots
  • Social influence bots
  • News bots
  • Overt bots
  • Intimidation bots
  • Russian and Iranian bots
  • Random string bots

By walking through the field guide, analysts, journalists, and others can learn to recognize these accounts. They learn to look for high volumes, high retweet counts, odd friend/follower ratios, anonymity, and other distinguishing characteristics. They also begin to understand how these accounts are used, who their targets and beneficiaries are, and in which conversations they participate. They can even identify the telltale signs of a bot puppet master leveraging a single account across multiple conversations (for example, using an account for the US election cycle and then pivoting it to an anti-EU campaign in Italy).
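Those red flags translate naturally into a simple checklist, sketched below. Every numeric threshold here is an illustrative assumption rather than a published cutoff; the field guide teaches judgment, and a checklist like this is only a first-pass triage aid.

```python
# A toy checklist mirroring the field guide's red flags. All thresholds
# are illustrative assumptions, not published cutoffs.
def red_flags(user: dict, tweets_per_day: float, retweet_share: float) -> list[str]:
    flags = []
    if tweets_per_day > 72:          # sustained, inhuman posting volume
        flags.append("high volume")
    if retweet_share > 0.9:          # pure amplification, little original content
        flags.append("mostly retweets")
    followers = user.get("followers_count", 0)
    if user.get("friends_count", 0) > 20 * (followers + 1):
        flags.append("odd friend/follower ratio")
    if user.get("default_profile_image", False):
        flags.append("anonymous profile")
    return flags
```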

Use in Intelligence and Public Affairs

The science and technology capabilities discussed above can provide advantages to specialists across the Department of Defense, including intelligence and public affairs professionals. As information operations are increasingly used as ends in themselves or as shaping operations, intelligence analysts will increasingly be required to detect, monitor, map, and analyze these campaigns. Without science and technology investments in social cybersecurity, these analysts will spend most of their time looking for needles in haystacks. With machine learning, they can spend more of their time making sense of the patterns and preparing their analyses.

Public affairs offices also need basic social cybersecurity techniques to see how many of their followers and retweeters are bots, cyborgs, or simply dormant accounts. They need to understand what an intimidation campaign looks like and when action should be taken to counter these subversive attacks. These social cybersecurity tools will help public affairs personnel monitor the threat narrative and its strategic aims, ensuring that their own message creates an appropriate counter-narrative and is not being manipulated in social media.

For analysts, public affairs officers, and many others to be successful, defense leaders must set appropriate policy that enables access to the right data by the right people. Application programming interfaces are the access point for both offensive and defensive social cybersecurity, and specially trained individuals in the intelligence and public affairs disciplines must have the authorities required to access data and conduct analysis. In this context, "pull" refers to collecting information from the environment, while "push" refers to disseminating information to a desired audience. Intelligence and public affairs analysts arguably require "pull" authorities, whereas information operations analysts arguably require both "pull" and "push" authorities. The public affairs community's primary mandate, by contrast, is to push information to stakeholders.

Currently, much of the access to these data streams is provided by commercial tools that serve finished products to analysts. While these tools undoubtedly provide value, they do not provide all of the necessary data and analysis. In addition, the underlying data remain owned and maintained by commercial entities, leaving the government with no data of its own to incorporate into its workflows and tools.

What Can These Tools Tell Us?

We examined influence campaigns on Twitter by looking at 1.6 million tweets from 330,000 unique accounts, each of which mentioned, replied to, or retweeted overt Russian propaganda outlets such as Russia Today and Sputnik. We ran BotHunter on the entire dataset and found that 56,000 accounts had a bot probability greater than 65 percent. These bots are visualized in the Russian propaganda conversational network below.
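Once the model has scored every account, the triage step itself is simple. A minimal sketch of that filtering, using made-up scores rather than our actual results, looks like this:

```python
# Thresholding hypothetical BotHunter output: a table of accounts with
# predicted bot probabilities (scores below are fabricated examples).
import pandas as pd

scores = pd.DataFrame({
    "account": ["user_a", "user_b", "user_c"],
    "bot_probability": [0.91, 0.40, 0.72],
})
probable_bots = scores[scores["bot_probability"] > 0.65]  # the 65 percent cutoff
print(f"{len(probable_bots)} of {len(scores)} accounts flagged for review")
```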

We then ran MemeHunter and extracted 1,616 unique memes. Visual analysis of these memes revealed a worldwide campaign to discredit Western powers, with a focus on the United States, France, and the United Kingdom. Below we have sampled some of the memes taking aim at the United States in particular. Notice that some of them attempt to sow doubt in the minds of young military service members especially.

Putting the Tools into Action

Our team has tested BotHunter and MemeHunter in multiple case studies and research initiatives. This includes monitoring and identifying external manipulation in several election events, including the 2018 elections in Sweden and the 2019 elections in the Philippines and Canada. Our team also has used these techniques to monitor anti-NATO actors and actions surrounding the 2017 and 2018 Trident Juncture exercises in Europe. We continue to monitor multiple actors manipulating information in the Middle East, often pitting pro-Saudi against pro-Iranian information operations. We have monitored several intimidation attacks, such as a 2017 attack against NATO and the Atlantic Council's Digital Forensic Research Lab as well as a 2017 intimidation attack against journalists in Yemen. We have monitored ongoing manipulation in Ukraine as well as global efforts by Russian and pro-Russian proxies. Finally, we have used these tools to help defense and joint public affairs officers better understand their audiences and followers by highlighting the presence of bot, cyborg, and dormant accounts. In all cases, the tools discussed here allowed rapid triage of large and messy information streams to identify malicious actors and content.

National security in the 21st century will require investment in social cybersecurity. This means basic research into the interaction between technology and social behavior and beliefs, as well as increasing investment in tools for identifying and neutralizing external manipulation of open and free societies. It also requires policy changes that reflect the technical complexity of the modern information environment while remaining true to our national values. In the end, the right research investments, coupled with wise policy and a whole-of-government approach, will ensure that our nation, our society, and our democratic institutions endure in their essential forms.

References

1 National Academies of Sciences, Engineering, and Medicine. A Decadal Survey of the Social and Behavioral Sciences: A Research Agenda for Advancing Intelligence Analysis (Washington, DC: The National Academies Press, 2019).

2 David M. Beskow and Kathleen M. Carley, "Introducing BotHunter: A Tiered Approach to Detecting and Characterizing Automated Activity on Twitter," in H. Bisgin, A. Hyder, C. Dancy, and R. Thomson (eds.), International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation (2018); David M. Beskow and Kathleen M. Carley, "Bot Conversations Are Different: Leveraging Network Metrics for Bot Detection in Twitter," International Conference on Advances in Social Networks Analysis and Mining (2018): 176-183; David M. Beskow and Kathleen M. Carley, "It's All in a Name: Detecting and Labeling Bots by Their Name," Computational and Mathematical Organization Theory 25 (2019): 24-35.

About the authors:

Lt. Col. Beskow is a doctoral candidate in the School of Computer Science at Carnegie Mellon University.

Dr. Carley is a professor of societal computing in the School of Computer Science at Carnegie Mellon University.
