Releases: ucinlp/covid19-data

COVIDLies v1.0

01 Mar 08:32
125e68a

To facilitate research in automatic COVID-19 misinformation detection, we introduce the COVIDLies dataset for misconception detection on Twitter. We have collected a dataset of 62 common misconceptions about the disease along with related tweets, identified and annotated by researchers from the UCI School of Medicine. Given a tweet, our data identifies whether any of the known misconceptions are expressed by the tweet, and if so, whether the tweet propagates the misconception (agree/pos), is informative by contradicting it (disagree/neg), or is neither misinformative nor informative (no stance/na).

COVIDLies v1.0 consists of 6591 misconception-tweet pairs with expert-annotated stance labels. This is an evolving dataset, as annotation is ongoing.
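As a rough illustration of the label scheme described above, the sketch below tallies stance labels over misconception-tweet pairs. The column names (`misconception_id`, `tweet_id`, `label`) and the sample rows are assumptions made for this example only; the released file may be laid out differently, and the tweet IDs shown are placeholders, not real tweets.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: column names and rows are placeholders for illustration,
# not taken from the released data.
SAMPLE = """misconception_id,tweet_id,label
1,1000000000000000001,pos
1,1000000000000000002,na
2,1000000000000000003,neg
2,1000000000000000004,na
"""

# Short label codes from the release notes mapped to their long names.
STANCE_NAMES = {"pos": "agree", "neg": "disagree", "na": "no stance"}


def count_stances(handle):
    """Count misconception-tweet pairs per stance label."""
    counts = Counter()
    for row in csv.DictReader(handle):
        label = row["label"].strip().lower()
        if label not in STANCE_NAMES:
            raise ValueError(f"unexpected stance label: {label!r}")
        counts[STANCE_NAMES[label]] += 1
    return counts


counts = count_stances(io.StringIO(SAMPLE))
print(dict(counts))
```

To run this against a real file, the `io.StringIO(SAMPLE)` handle could be swapped for an opened file object, with the column names adjusted to match the actual header.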

Note that the following changes to the misconceptions have been made in this release, with re-annotations performed where needed.

  • Removal:
    • Political: Misconceptions pertaining to the actions of particular political parties, governments, religious groups, or ethnicities were removed. E.g., 'Trump is fulfilling his promise to hit Iranian cultural sites, if Iranians took revenge for the US airstrike that killed of Quds Force Commander Qasem Soleimani.'
    • Multi-modal: Misconceptions about non-textual modalities, such as images and videos, were removed. E.g., 'Coronavirus is a state-supported "a bioweapon that went rogue" and also fake videos alleging that Chinese authorities are killing citizens to prevent its spread.'
    • Duplicates: De-duplication of misconceptions was performed. E.g., 'Holy communion cannot be the cause of the spread of coronavirus' was removed while 'Coronavirus cannot be spread by practicing holy communion.' was kept.
  • Compound to atomic: Compound misconceptions were split into atomic misconceptions. E.g., 'Avocado and mint tea, hot whiskey and honey, essential oils, vitamins c and d, fennel tea and cocaine cure coronavirus.' --> 'Avocado and mint tea cures coronavirus.', 'Essential oils cure coronavirus.', 'Vitamin C cures coronavirus.', 'Vitamin D cures coronavirus.', 'Fennel tea cures coronavirus.', and 'Cocaine cures coronavirus.'
  • Corrections: E.g., 'There were more than 50000 cremations in Wuhan for 4th Quarter, 2020.' --> 'There were more than 50000 cremations in Wuhan for 4th Quarter, 2019.'
  • Edits: E.g., 'Chloroquine was used to cure over 12,000 covid-19 patients.' --> 'Chloroquine can cure coronavirus.'

To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

COVIDLies v0.2

13 Jan 02:38
56b226c

To facilitate research in automatic COVID-19 misinformation detection, we introduce the COVIDLies dataset for misconception detection on Twitter. We have collected a dataset of 86 common misconceptions about the disease along with related tweets, identified and annotated by researchers from the UCI School of Medicine. Given a tweet, our data identifies whether any of the known misconceptions are expressed by the tweet, and if so, whether the tweet propagates the misconception (agree/pos), is informative by contradicting it (disagree/neg), or is neither misinformative nor informative (no stance/na).

COVIDLies v0.2 consists of 8937 misconception-tweet pairs with expert-annotated stance labels. This is an evolving dataset, as annotation is ongoing.

To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

COVIDLies v0.1

23 Nov 22:15
d45f928

To facilitate research in automatic COVID-19 misinformation detection, we introduce the COVIDLies dataset for misconception detection on Twitter. We have collected a dataset of 86 common misconceptions about the disease along with related tweets, identified and annotated by researchers from the UCI School of Medicine. Given a tweet, our data identifies whether any of the known misconceptions are expressed by the tweet, and if so, whether the tweet propagates the misconception (agree/pos), is informative by contradicting it (disagree/neg), or is neither misinformative nor informative (no stance/na).

COVIDLies v0.1 consists of 6761 misconception-tweet pairs with expert-annotated stance labels. This is an evolving dataset, as annotation is ongoing.

This version was used for our paper at the EMNLP 2020 Workshop on NLP for COVID-19 (Best Paper Award). For a full description of the data and associated models, please see our paper:

  • Detecting COVID-19 Misinformation on Social Media
    Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, Sameer Singh
    (PDF, Discussion Forum)

To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.