UMass SLANG Lab: Police Killings Extraction

Summary

In this project, we propose a new, socially impactful task for natural language processing: from a news corpus, extract the names of persons who have been killed by police. We provide a newly collected police fatality corpus and have developed an EM-based, distantly supervised model for the problem, which combines web news text with historical data from the excellent Fatal Encounters crowdsourced project. Systems can be evaluated on this corpus to aid further development of automated fatality extraction methods.
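To give a rough feel for how distant supervision and EM can fit together at the mention level, here is a minimal Python sketch. It is an illustration under stated assumptions, not the released code. Every function name is hypothetical. It assumes that victim names come from an external database such as Fatal Encounters, that a name's sentence mentions are combined with a noisy-or (the name counts as a victim if at least one mention asserts a police killing), and that the E-step computes each mention's posterior under that assumption.

```python
import math
from typing import Iterable, List, Set, Tuple

# Hypothetical mention record: (person name as extracted, sentence text).
Mention = Tuple[str, str]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so name matching is less brittle."""
    return " ".join(name.lower().split())


def distant_labels(mentions: Iterable[Mention], known_victims: Set[str]) -> List[int]:
    """Hard distant label: 1 if the mention's name is in the external database.

    Every sentence naming a known victim becomes positive, even sentences that
    do not describe the killing; this is the label noise EM is meant to soften.
    """
    known = {normalize(n) for n in known_victims}
    return [1 if normalize(name) in known else 0 for name, _ in mentions]


def noisy_or(probs: List[float]) -> float:
    """Assumed aggregation: P(name is a victim) = 1 - prod_i (1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def e_step(probs: List[float], entity_label: int) -> List[float]:
    """Posterior P(z_i = 1 | entity label) for each mention under noisy-or.

    If the entity is negative, no mention can be positive. If it is positive,
    z_i = 1 already implies the label, so the posterior is p_i / P(label = 1).
    """
    if not probs:
        return []
    if entity_label == 0:
        return [0.0 for _ in probs]
    p_any = noisy_or(probs)
    if p_any == 0.0:
        # Degenerate case: spread credit uniformly rather than divide by zero.
        return [1.0 / len(probs) for _ in probs]
    return [p / p_any for p in probs]
```

For example, `noisy_or([0.5, 0.5])` returns `0.75`, and `e_step([0.5, 0.5], 1)` gives each mention a posterior of about 0.667. An M-step, not shown here, would refit a mention classifier (for example, logistic regression) on these soft labels, and the two steps would alternate until convergence.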

More details are in the paper:

Identifying civilians killed by police with distantly supervised entity-event extraction. Katherine A. Keith, Abram Handler, Michael Pinkham, Cara Magliozzi, Joshua McDuffie, and Brendan O'Connor. Proceedings of EMNLP 2017. [pdf]

Presentations

EMNLP, August 2017 [slides].
KDD Data Science and Journalism workshop, August 2017 [slides].
Bloomberg Data for Good Exchange, September 2017.

Press coverage

This AI reads the news to keep tabs on US police shootings. New Scientist, September 22, 2017.

Code and Datasets

We've released two versions of the data. If you use them in research, please cite the paper. Thanks!

PoliceKillingsExtraction-ments-v1.zip (23.1 MB): sentence-segmented, mention-level, distantly labeled data used in the experiments.
PoliceKillingsExtraction-html-v1.zip (48.3 GB): all HTML documents scraped in 2016.

Supporting code, including the evaluation script, is available here.