VoterFraud2020: a Multi-modal Dataset of Election Fraud Claims on Twitter
The wide spread of unfounded election fraud claims surrounding the 2020 election undermined trust in the election, culminating in violence inside the U.S. Capitol.
Under these circumstances, it is critical to understand the discussions surrounding these claims on Twitter, a major platform where the claims spread.
To this end, we collected and are releasing the VoterFraud2020 dataset, a multi-modal dataset with 7.6 million tweets and 25.6 million retweets from 2.6 million users related to voter fraud claims.
To make this data immediately useful to a wide range of researchers, we further enhanced it with cluster labels computed from the retweet graph, user suspension status, and perceptual hashes of tweeted images.
The dataset also includes aggregated information for all external links and YouTube videos that appear in the tweets.
Preliminary analyses of the data show that Twitter's ban actions mostly affected a specific community of voter fraud claim promoters, and expose the most common URLs, images, and YouTube videos shared in the data.
Anton Abilov, Yiqing Hua, Hana Matatov, Ofra Amir, Mor Naaman