Overview

 

Beginning my examination of the ‘fatal-police-shootings-data’ dataset in Python, I’ve initiated the process of loading the data to inspect its various variables and their respective distributions. Notably, one variable that stands out is ‘age,’ which is a numerical column providing insights into the ages of individuals who tragically lost their lives in police shootings. Furthermore, the dataset includes latitude and longitude values, enabling us to pinpoint the precise geographical locations of these incidents.

 

During this initial evaluation, I’ve come across an ‘id’ column, which seems to have limited relevance to our analysis. As a result, I’m contemplating its exclusion from our further investigation. Delving deeper, I’ve conducted a scan of the dataset for missing values, revealing that several variables contain null or missing data, including ‘name,’ ‘armed,’ ‘age,’ ‘gender,’ ‘race,’ ‘flee,’ ‘longitude,’ and ‘latitude.’ Additionally, I’ve examined the dataset for potential duplicate records, uncovering only a single duplicate entry, noteworthy for its lack of a ‘name’ value. As we progress to the next phase of this analysis, our focus will shift towards exploring the distribution of the ‘age’ variable, a pivotal step in extracting insights from this dataset.

 

In today’s classroom session, we acquired essential knowledge on computing geospatial distances using location information. This newfound expertise equips us to create GeoHistograms, a valuable tool for visualizing and analyzing geographical data. GeoHistograms serve as a powerful instrument for identifying spatial patterns, locating hotspots, and discovering clusters within datasets associated with geographic locations. Consequently, our understanding of the underlying phenomena embedded within the data is greatly enhanced.

Leave a Reply

Your email address will not be published. Required fields are marked *