Police Killings Extraction

[Figure: model diagram]


In this project, we propose a new, socially impactful task for natural language processing: given a news corpus, extract the names of people who have been killed by police. We provide a newly collected police fatality corpus and an EM-based, distantly supervised model for the problem, which combines web news text with historical data from the excellent Fatal Encounters crowdsourced project. Systems can be evaluated on this corpus to aid further development of automated fatality extraction methods.
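To make the distant supervision idea concrete, here is a minimal sketch of the labeling step only. It is not the model from the paper: the function name `distant_labels`, the exact-match rule, and the example names and sentences are all hypothetical, chosen just for illustration. It assumes candidate names have already been extracted from news sentences and that a list of known victim names (such as one derived from a database like Fatal Encounters) is available.

```python
from collections import defaultdict


def distant_labels(mentions, known_names):
    """Group sentences by candidate name and attach a noisy label.

    mentions:    list of (name, sentence) pairs (hypothetical input format).
    known_names: names from a historical database of police fatalities.
    A sentence is labeled 1 if its candidate name appears in the database,
    else 0. Matching is a simple case-insensitive exact match (an assumption).
    """
    known = {name.lower() for name in known_names}
    labeled = defaultdict(list)
    for name, sentence in mentions:
        label = 1 if name.lower() in known else 0
        labeled[name].append((sentence, label))
    return dict(labeled)


# Fictional placeholder data, not drawn from any real corpus.
mentions = [
    ("Jane Roe", "Jane Roe was shot by an officer on Tuesday."),
    ("Jane Roe", "Jane Roe's family held a vigil."),
    ("John Doe", "Officer John Doe was placed on leave."),
]
known_names = ["jane roe"]

labels = distant_labels(mentions, known_names)
for name, examples in labels.items():
    for sentence, label in examples:
        print(label, name, "|", sentence)
```

Note that the second "Jane Roe" sentence receives a positive label even though it does not describe the killing. Labels inherited from a database are noisy at the sentence level in this way, which is one common motivation for treating sentence-level labels as latent variables and fitting them with EM rather than trusting them directly.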

More details are in the paper:


  • EMNLP, August 2017 [slides].
  • KDD Data Science and Journalism workshop, August 2017 [slides].
  • Bloomberg Data for Good Exchange, September 2017.

Press coverage

Code and Datasets

We've released two versions of the data. If you use them in research, please cite the paper. Thanks! Supporting code, including the evaluation script, is available here.