RStudio

Welcome to my second blog post. This time we take a quick look at R.
R is a powerful programming language and environment for statistical computing and graphics. RStudio is an integrated development environment (IDE) for R: it is not a different version of R, but a more user-friendly interface for writing and running R code. Both are free, open-source downloads, so anyone can experiment, write code and produce graphical presentations of statistics. R itself can be downloaded from CRAN (cran.r-project.org), and RStudio from www.rstudio.com.

Before going any further, though, you need to become familiar with the basic commands of R, and one of the best places to start is the short eight-chapter Try R course at tryr.codeschool.com. Code School lets you write your own code, corrects your code when you make mistakes and explains the terminology along the way. That's where I started too.
[Image: completed Try R course]

After finishing the Try R exercises, it was time to find a dataset to use with R. After a long search on the internet I decided to use the Washington Post's USA police fatal shootings CSV file, found on kaggle.com. It is a database of every fatal shooting in the United States by a police officer in the line of duty since January 1, 2015. I used the following command to import the data frame into RStudio:

>usaFatalPoliceShootings = read.csv("usaFatalPoliceShootings.csv", header = TRUE)

# I used the command below to view my data frame.

>usaFatalPoliceShootings
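Printing the whole data frame floods the console with thousands of rows. A minimal sketch of quicker checks, assuming the same read.csv() call: so that the example runs on its own, it writes a tiny hypothetical CSV (made-up rows, not the Washington Post data) to a temporary file and reads it back. With the real data you would pass "usaFatalPoliceShootings.csv" instead.

```r
# A tiny stand-in for the real CSV (hypothetical rows, not the real data)
csv.path <- tempfile(fileext = ".csv")
writeLines(c("id,gender,state",
             "1,M,TX",
             "2,F,CA",
             "3,M,AZ"), csv.path)

# Same import call as above; stringsAsFactors = FALSE keeps text columns as character
shootings <- read.csv(csv.path, header = TRUE, stringsAsFactors = FALSE)

# Quick checks instead of printing every row
print(dim(shootings))   # number of rows and columns
str(shootings)          # column names and types
print(head(shootings))  # first rows (up to 6)
```

In RStudio, View(usaFatalPoliceShootings) also opens the data in a spreadsheet-style tab.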

I then became interested in comparing the number of males shot by police with the number of females, to see whether there was a large difference. I used the following commands to count how many of the victims were male and how many were female.

>usaFatalPoliceShootings$gender
>gender = usaFatalPoliceShootings$gender
>gender.freq = table(gender)
>gender.freq

This gave the following result:

gender
   F    M
  79 1772
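The bar graph shows the gap visually; to put a number on whether it is "significant", one option (my own addition, not part of the original analysis) is a chi-squared goodness-of-fit test against an even 50/50 split. That baseline is an assumption: it only asks whether men and women are shot equally often, ignoring any other factors. The counts are copied from the table above, and the same calls also work directly on the table object returned by table(gender).

```r
# Victim counts by gender, copied from the table above
gender.freq <- c(F = 79, M = 1772)

# Share of the 1851 victims in each group (roughly 4% female, 96% male)
gender.prop <- prop.table(gender.freq)
print(round(gender.prop, 3))

# Chi-squared goodness-of-fit test against an ASSUMED 50/50 baseline
gender.test <- chisq.test(gender.freq, p = c(0.5, 0.5))
print(gender.test$p.value)
```

A p-value this close to zero means an even split is very unlikely to produce counts this lopsided.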

To display the results as a bar graph and make the difference easy to see, I used the command:

>barplot(gender.freq)

[Figure: police fatal shootings by gender, without colors]

I was happy with the graph, but it wasn't very appealing to look at, so I added some colors and descriptions.

>colors = c("pink", "blue", "violet")
# this selects the colors for my graph (barplot uses one color per bar, so only pink and blue appear)

I then ran barplot again, passing these colors to its col argument (for example >barplot(gender.freq, col = colors)), and this is what we have:

[Figure: police fatal shootings by gender, with colors]
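The post does not show the exact command used to add the descriptions, so here is a hedged sketch of one way to do it. The bar labels, title and axis label are my own choices, and the counts are the ones from the gender table. It also writes each count above its bar, using the bar midpoints that barplot() returns.

```r
# Counts from the gender table; in the post this is the output of table(gender)
gender.freq <- c(F = 79, M = 1772)

# One color per bar
bar.colors <- c("pink", "blue")

# barplot() invisibly returns the x positions of the bar centers
mids <- barplot(gender.freq,
                col = bar.colors,
                names.arg = c("Female", "Male"),          # readable bar labels
                main = "Police Fatal Shootings by Gender",
                ylab = "Number of Shootings",
                ylim = c(0, max(gender.freq) * 1.15))     # headroom for the counts

# Write each count just above its bar
text(mids, gender.freq, labels = gender.freq, pos = 3)
```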

Straight away we have a graph that is easier to read, and we can see that American males are far more likely than their female counterparts to be victims of fatal police shootings (1772 versus 79 in this dataset). But is it the whole of America that has gun-crazy cops? To find out, I asked R to plot shootings by state, using rainbow colors for easier visualization.

>barplot(table(usaFatalPoliceShootings$state), xlab = "", ylab = "Number of Shootings", main = "Police Fatal Shootings by State", col = rainbow(50))

[Figure: police fatal shootings by state, abbreviated state names]

Nice, but I didn't like the abbreviated state names, so I went back to my .csv file and changed them to full names. After re-importing the data I had full state names, which are easier for the average reader to understand.

However, I also noticed that not every state name was showing on the graph (R leaves out axis labels that would overlap), which made it difficult to tell which states the remaining bars related to. When I searched for a fix, I found that because the state names are long, they needed to be displayed vertically instead of in the default horizontal orientation. I achieved this by adding las=2 to the end of my barplot command, as shown below.

>barplot(table(usaFatalPoliceShootings$state), xlab = "", ylab = "Number of Shootings", main = "Police Fatal Shootings by State", col = rainbow(50), las = 2)

# the mtext() command below labels the x-axis "States" in blue; line = 9 pushes the label far enough down that it does not overlap the rotated state names

>mtext(side = 1, "States", line = 9, col = "blue")

The result of all this was a much better-looking graph.

[Figure: police fatal shootings by state, full state names]
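Two follow-ups to the steps above, sketched under my own assumptions and not taken from the original analysis. First, the abbreviation-to-name conversion can be done inside R instead of by editing the CSV, using the state.abb and state.name vectors that ship with base R (50 states, in matching order; Washington, D.C. is not among them). Second, las = 2 with long names needs more room than R's default bottom margin of 5.1 lines, especially with a label at line = 9, so par(mar = ...) widens it. Sorting the table puts the busiest states first. The abbreviations below are hypothetical, and to_full_name() is a helper name I made up.

```r
# Hypothetical abbreviations standing in for the 'state' column (not the real data)
abbrev <- c("TX", "TX", "CA", "CA", "CA", "AZ", "FL", "FL", "DC")

# Hypothetical helper: map two-letter codes to full names with base R's
# state.abb / state.name; DC is not in those vectors, so it is filled in by hand
to_full_name <- function(abbrev) {
  abbrev <- as.character(abbrev)
  full <- state.name[match(abbrev, state.abb)]
  full[abbrev %in% "DC"] <- "District of Columbia"
  full
}
full <- to_full_name(abbrev)

# Count shootings per state and sort so the busiest states come first
state.counts <- sort(table(full), decreasing = TRUE)

# Widen the bottom margin (in lines of text) so rotated names and the label fit
old.par <- par(mar = c(10, 4, 4, 2) + 0.1)
barplot(state.counts,
        xlab = "", ylab = "Number of Shootings",
        main = "Police Fatal Shootings by State",
        col = rainbow(length(state.counts)),  # one color per bar
        las = 2)                              # labels perpendicular to the axis
mtext("States", side = 1, line = 9, col = "blue")
par(old.par)  # restore the previous margins
```

With the real data, the conversion would be usaFatalPoliceShootings$state <- to_full_name(usaFatalPoliceShootings$state), assuming the column holds two-letter codes.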

Not every state has a high number of police shootings, so I began to wonder whether there really is a pattern in the shootings, and whether there could be a correlation with gender and/or race. My data showed that although there were more white victims than black victims, once the relative sizes of the white and black populations are taken into account, black males were at a higher risk of being fatally shot by an officer.

[Figure: US police fatal shootings by race]
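To look at gender and race together, a cross-tabulation counts victims for every combination. A minimal sketch, assuming the real file has race and gender columns coded with letters (such as W for White and B for Black); the rows below are made up so the example runs on its own. Raw counts alone cannot show higher risk: that comparison needs each group's population, which is not in this dataset, so rate_per_million() is a hypothetical helper that expects you to supply those figures from another source.

```r
# Made-up rows standing in for the real data; the real file is ASSUMED to have
# 'race' and 'gender' columns coded like these (W = White, B = Black, H = Hispanic)
shootings <- data.frame(
  race   = c("W", "W", "B", "B", "W", "H"),
  gender = c("M", "M", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Number of victims for every race/gender combination
race.by.gender <- table(race = shootings$race, gender = shootings$gender)
print(race.by.gender)

# Hypothetical helper: shootings per million residents of each group.
# 'population' must be a named vector from an outside source (e.g. census figures)
rate_per_million <- function(counts, population) {
  stopifnot(identical(names(counts), names(population)))
  counts / population * 1e6
}
```

For example, rate_per_million(race.by.gender[, "M"], population) with a population vector named B, H, W (same order) would give per-capita rates for male victims.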

My experience with R was getting a lot more interesting and I wanted to do much more analysis of the data, but my time was up and Darren was waiting for the assignment, so I had to stop and submit it. Maybe in another blog post I will examine the data further, identify the most dangerous state for each race and perhaps find some data to explain the reasons. For now, I think it's safe to say that, until my research is complete, I will not be holidaying in Arizona, California, Florida or Texas. Thank you for reading, and look out for my next blog post.

Sources

Quantitative Data – http://www.r-tutor.com
ggplot2 barplots : Quick start guide – http://www.sthda.com
