Despite many efforts to automatically identify toxic comments online (including sexual harassment, threats, and identity attacks), modern systems fail to generalize to the diverse concerns of Internet users. This dataset consists of 107,620 social media comments annotated by 17,280 unique participants, and was collected to understand how user expectations for what constitutes toxic content differ across demographics, beliefs, and personal experiences. The dataset is encrypted – please contact Deepak Kumar for the password.
Designing Toxic Content Classification for a Diversity of Perspectives
USENIX Symposium on Usable Privacy and Security (SOUPS) 2021
- Deepak Kumar, Patrick Gage Kelley, Sunny Consolvo, Joshua Mason, Elie Bursztein, Zakir Durumeric, Kurt Thomas, Michael Bailey
- Deepak Kumar
107,620 social media comments, each labeled by five annotators.
|File Name||MetaData||SHA-1 Fingerprint||Size||Updated At|