Chapter 2 Proposal

2.1 Research topic

Behind the beautiful skyline and bustling streets, crimes may occur on any corner at any moment. The series of murders this summer and the shooting in the East Village shortly after the start of the fall semester caused panic among residents. Public safety has always been a concern, especially for those who have just moved to New York City. Although these incidents may seem random at first glance, data analysis and visualization can help reveal hidden correlations and provide insight into public safety concerns.

The New York Police Department (NYPD) records reported crimes and releases datasets related to police enforcement and criminal activity, aiming to increase transparency and foster collaboration. Using the data provided by the NYPD, we are interested in citywide crime statistics, specifically the shooting incidents of 2022. Studying these recent incidents is worthwhile for anyone living in NYC, as it gives a clearer picture of the crimes around us and helps us avoid possible dangers. In particular, we are interested in questions about the characteristics of victims and suspects, and about possible correlations between these characteristics and the geography or time of occurrence of each incident.

2.2 Data availability

The data for this project is published and maintained by the New York Police Department. We considered many other data sources when deciding to analyze crime-related information about New York City; however, the New York Police Department is the most reliable source, as its data is based on the reported crimes it records. Therefore, we chose to use the citywide crime data released directly by the New York Police Department.

The NYPD provides quarterly-updated crime incident datasets, such as arrest and complaint data. Among these, we are most interested in the shooting incident data and have therefore decided to analyze this year's shooting incidents. We may also refer to statistics on other crimes, or on shootings in previous years, for comparison.

The New York Police Department provides the shooting incident data in multiple formats on its official website. Among these formats, we found CSV to be the most convenient for analysis in R. The next steps are to import the data from the NYPD website and to preprocess it, for example by combining and cleaning the datasets, using either Python or R.
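The import and preprocessing steps can be sketched in Python. This is a minimal sketch that uses only the standard library (pandas would be a natural alternative), under several assumptions: every export shares a header row that includes INCIDENT_KEY and OCCUR_DATE columns, dates are written as month/day/year, and a single incident may span several rows (assumed one row per victim), so only exact duplicate rows are dropped when datasets are combined. These column names, the date format, and the function names `load_incidents` and `records_per_year` are assumptions to be checked against the actual NYPD files.

```python
import csv
from collections import Counter
from datetime import datetime
from typing import Iterable, TextIO

# Assumption: OCCUR_DATE is stored as month/day/year, e.g. "07/04/2022".
DATE_FORMAT = "%m/%d/%Y"


def load_incidents(sources: Iterable[TextIO]) -> list[dict]:
    """Combine several CSV exports into one list of cleaned rows (hypothetical helper)."""
    combined: list[dict] = []
    seen: set[tuple] = set()
    for source in sources:
        for raw in csv.DictReader(source):
            # Strip stray whitespace; treat absent fields as empty strings.
            row = {name: (value or "").strip() for name, value in raw.items()}
            # Drop exact duplicates only: INCIDENT_KEY is assumed to repeat
            # when one incident has several victims, so it is not a unique id.
            signature = tuple(sorted(row.items()))
            if signature in seen:
                continue
            seen.add(signature)
            # Parse the date so records can be grouped by year.
            row["OCCUR_DATE"] = datetime.strptime(row["OCCUR_DATE"], DATE_FORMAT).date()
            combined.append(row)
    return combined


def records_per_year(rows: list[dict]) -> Counter:
    """Count records per calendar year, e.g. to compare 2022 with earlier years."""
    return Counter(row["OCCUR_DATE"].year for row in rows)
```

For example, a historic export and the 2022 year-to-date export (placeholder file names) could each be opened with `open(path, newline="")` and passed together to `load_incidents`; `records_per_year` then gives per-year counts for comparing 2022 with earlier years.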

Overall, the data is reliable and of good quality. However, certain columns contain null values, for example LOCATION_DESC and perpetrator information such as PERP_AGE_GROUP and PERP_SEX. Missing perpetrator information is understandable, since the police may not be able to arrest the perpetrator at the crime scene; nevertheless, we will examine these missing values in a dedicated missing value analysis. We will handle missing values differently depending on the situation: for some tasks we may exclude them, as the information is not available to us, while for some perpetrator-related tasks we may treat them as a separate group and analyze that group.
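The missing value analysis and the two treatments described above (excluding missing values, or keeping them as a separate group) could look roughly like the following Python sketch, which works on rows loaded as dictionaries, for example with `csv.DictReader`. The set of missing-value markers, the `MISSING` label, and the function names are assumptions for illustration; the markers actually used in the NYPD files should be verified first.

```python
from collections import Counter
from typing import Optional

# Assumption: besides empty cells, a placeholder such as "(null)" may mark
# missing values in some exports; extend this set after inspecting the data.
MISSING_MARKERS = {"", "(null)"}


def is_missing(value: Optional[str]) -> bool:
    """True if a cell is absent, empty, or one of the assumed placeholders."""
    return value is None or value.strip() in MISSING_MARKERS


def missing_summary(rows: list[dict], columns: list[str]) -> dict[str, float]:
    """Share of rows with a missing value, per column."""
    total = len(rows)
    if total == 0:
        return {column: 0.0 for column in columns}
    return {
        column: sum(is_missing(row.get(column)) for row in rows) / total
        for column in columns
    }


def group_with_missing(rows: list[dict], column: str, label: str = "MISSING") -> Counter:
    """Count values of a column, keeping missing entries as their own group."""
    counts: Counter = Counter()
    for row in rows:
        value = row.get(column)
        counts[label if is_missing(value) else value.strip()] += 1
    return counts


def drop_missing(rows: list[dict], column: str) -> list[dict]:
    """Keep only rows where the column has a value (for tasks that ignore missing data)."""
    return [row for row in rows if not is_missing(row.get(column))]
```

For instance, `missing_summary(rows, ["LOCATION_DESC", "PERP_AGE_GROUP", "PERP_SEX"])` would report the share of missing entries in each of those columns, while `group_with_missing(rows, "PERP_SEX")` keeps incidents with no recorded perpetrator in their own group instead of discarding them.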