A research team led by Princeton University has developed a technique for tracking online foreign misinformation campaigns in real time, which could help mitigate outside interference in the 2020 American election.
The researchers developed a method that uses machine learning to identify malicious internet accounts, or trolls, based on their past behavior. In a study featured in the journal Science Advances, they examined past misinformation campaigns from China, Russia, and Venezuela that were waged against the United States before and after the 2016 election.
The team identified the patterns these campaigns followed by analyzing posts to Twitter and Reddit and the hyperlinks or URLs they included. After running a series of tests, they found their model was effective in identifying posts and accounts that were part of a foreign influence campaign, including those by accounts that had never been used before.
They hope that software engineers will be able to build on their work to create a real-time monitoring system for exposing foreign influence in American politics.
Their findings show that content-based features such as a post’s word count, webpage links, and posting time can act like a digital fingerprint for such influence campaigns, which could help social media companies, users, or investigators prevent the spread of misinformation and election interference. Previous attempts to detect coordinated disinformation efforts have focused on simpler approaches, such as detecting bots or comparing the follower/friendship networks of posters. However, these approaches are often foiled by posts from human agents or from new accounts, and are often platform-specific.
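To make the idea of a content-based "fingerprint" concrete, a minimal sketch in Python of such features is shown below; the post structure, field names, and exact feature set here are illustrative assumptions, not the authors' published pipeline.

```python
from datetime import datetime
from urllib.parse import urlparse

def extract_features(post):
    """Turn one post into a small content-based feature vector.

    `post` is a hypothetical dict with 'text', 'urls', and 'timestamp'
    keys; the published study drew on richer features from Twitter and
    Reddit posts than this sketch does.
    """
    text = post["text"]
    posted_at = datetime.fromisoformat(post["timestamp"])
    domains = {urlparse(url).netloc for url in post["urls"]}
    return {
        "word_count": len(text.split()),
        "num_links": len(post["urls"]),
        "num_unique_domains": len(domains),
        "hour_of_day": posted_at.hour,      # posting time
        "day_of_week": posted_at.weekday(),
    }

example = {
    "text": "Breaking: read the full story here",
    "urls": ["http://example.com/story"],
    "timestamp": "2016-10-03T14:27:00",
}
print(extract_features(example))
```

Feature vectors like these, aggregated across many posts, are the kind of signal a monitoring system could compare against known campaign activity.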
The researchers hypothesized that large online political influence operations use a relatively small number of human agents to post large amounts of content quickly, which would tend to make these posts similar in topic, word count, linked articles, and other features. To test this, the researchers created a machine learning system trained on datasets of early activity from Russian, Chinese, and Venezuelan influence campaigns on Twitter and Reddit. They found the system could reliably identify those campaigns' subsequent posts and distinguish them from regular posts by normal users. The system was less reliable when it was trained on older data and when the campaign in question was more sophisticated, indicating that such a system would not be a comprehensive solution. The authors suggest that, while widespread use of such machine learning systems could drive bad actors to change their approach to avoid detection, doing so could also force them to adopt tactics that are more costly or less influential.
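A rough sketch of that train-on-early, test-on-later workflow, using synthetic data and a generic off-the-shelf classifier rather than the authors' actual model or datasets, might look like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def make_posts(n, campaign):
    # Synthetic stand-in features (word count, link count, posting hour).
    # For illustration, campaign posts are assumed to cluster more tightly
    # around similar values than posts from ordinary users do.
    center = [12, 2, 14] if campaign else [25, 1, 20]
    spread = [3, 0.8, 2] if campaign else [10, 1.5, 6]
    return rng.normal(loc=center, scale=spread, size=(n, 3))

# "Early" campaign and ordinary-user activity is used for training;
# "later" activity is held out to stand in for subsequent posts.
X_train = np.vstack([make_posts(500, True), make_posts(500, False)])
y_train = np.concatenate([np.ones(500), np.zeros(500)])
X_later = np.vstack([make_posts(200, True), make_posts(200, False)])
y_later = np.concatenate([np.ones(200), np.zeros(200)])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("F1 on later posts:", f1_score(y_later, clf.predict(X_later)))
```

The time-based split mirrors the finding above: performance degrades as the gap between training data and new activity grows, or as campaigns vary their behavior.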
News Source: EurekAlert!