Somewhat unexpectedly, Twitter announced today that a dataset of tweets, accounts, and media associated with state-sponsored efforts on its platform would be made public. Unfortunately, the vast majority of the data is available only to researchers.
The data includes both tweets from the more recent Iran-linked accounts believed to be meddling in the U.S. midterms and tweets from the Russia-backed Internet Research Agency (commonly referred to as a “troll farm”) meant to sway the 2016 presidential election. In total, it comprises the work of just over 4,600 accounts, with tweets ranging from obvious political bait (shares from Breitbart and retweets of Donald Trump Jr.) to the baffling and mundane (“#AllWentWrongWhen Nirvana stop.”).
Twitter, however, took the precaution of hashing the data associated with accounts that had accrued fewer than 5,000 followers, which makes those records fairly worthless as public disclosures go. The cutoff seems arbitrary, so we’ve requested justification from Twitter and will update when we receive a response. Researchers can request unhashed versions, and according to the company, the full data has already been shared with “a small group of researchers with specific expertise in these issues.”
More than anything, Twitter’s electioneering dataset points to the total lack of standardization when it comes to disclosures of this kind. Most social platforms are believed to have been manipulated in some way by foreign powers, yet each has handled disclosure differently: Tumblr merely made public the user handles of associated accounts, while Reddit suspended and archived its state-sponsored accounts (albeit promising only to archive them temporarily, without specifying a timeframe). Facebook built a tool to search for political ads on its platform, though the ads believed to have been purchased by Russian operatives were released not by Facebook but by the Senate Democrats.
Download the available data or request the unhashed version here.