WBS- Part3 - lappy to make Individual file of Each Topic

WBI is large set of data. In last two blog we discussed how to do cleaning and manipulation along with make beautiful visualization.
But some time, we don't have enough resources or its gets boring to run same code time again and again. What if we had data that was cleaned (ready for any analysis) and arrange topic wise so that we can easily access when we need, without having to do all dirty work in R. May be you don't have R in other computer and want to run some analysis on tableau or excel on any particular topic. Its always good to have cleaned data.
We will take big chunk of data and do all cleaning and manipulation than produce csv for each topic and save it for further access.

Lets get started
First part is similar  to old tutorial so I will just paste the code there.

###download world bank data "http://data.worldbank.org/products/wdi" 
#>> "Data catalog downloads (Excel | CSV)">> "CSV"
##unzip and keep in directory of your choice my is "M:/R_scripts/Combine"
#################load required package

##if (!require("dplyr")) install.packages('dplyr') 
# if you are not sure if package is installed
suppressPackageStartupMessages(require("dplyr"))
suppressPackageStartupMessages(require("tidyr"))
suppressPackageStartupMessages(require("reshape2"))
suppressPackageStartupMessages(require("readr"))
suppressPackageStartupMessages(require("googleVis"))
currentDate = Sys.Date()

#########Set the file directory
setwd("M:/")
filepath=getwd()
setwd(paste(filepath, "R_Script/Combine", sep="/"))

#####readfile from your directory
wdi = read_csv("WDI_Data.csv")
country = read_csv("WDI_Country.csv")
i_name= read_csv("WDI_Series.csv")

#### create subset of above data, select only required row
## required col from wdi
wdi_sub = wdi[ , c(1,3,5:60)]

##lets run anysis on country name only; 
#country name in wdi file has other names like summary of region
country_sub = subset(country, country$`Currency Unit`!="" ,
select = c("Table Name", "Region")) # if currency unit is blank its not country
colnames(country_sub) <- c("Country Name", "Region")

Now lets make list of all Topic from i_name

# lets make list of topic
i_name_sub = as.data.frame(table(i_name$Topic))
i_name_sub = as.character(i_name_sub[,1])
Now we are all set. Let run loop to get subset of each topic's indicator name than we will left join with required data frame.

###let used lappy on each topic lapply(i_name_sub, function(x){
## take each list as temp and get Indicator Name related to it
temp = as.character(x)
temp = subset(i_name, i_name$Topic==temp, select="Indicator Name")
##left join to get only those Indicator data and country
wdi_sub_temp = left_join(country_sub, wdi_sub)
wdi_sub_temp = left_join(temp, wdi_sub_temp)
##gather date and expand Indicator Name
wdi_sub_temp = gather(wdi_sub_temp, "years", "sample", 4:59)
colnames(wdi_sub_temp) <- c("Indicator.Name", "Country.Name","Region" ,"years", "Value")
wdi_sub_temp = dcast(wdi_sub_temp, Country.Name+years+Region~Indicator.Name, value.var = "Value", na.rm = T )
##make years as date
wdi_sub_temp$years = paste(wdi_sub_temp$years,"-01-01", sep="")
wdi_sub_temp$years=as.Date(wdi_sub_temp$years, "%Y-%m-%d")
##let make unique ID in each dataset if we want to join later on for any analysis
wdi_sub_temp$ID_for_join = paste(wdi_sub_temp$Country.Name, wdi_sub_temp$years, sep="-")
##save file
setwd(paste(filepath, "R_script/Output", sep="/"))
csvname = paste(gsub(":",",",x),".csv",paste=" ") #file name cant have ":"
write.csv(wdi_sub_temp, file=csvname, row.names = F)
setwd(filepath)
})
###total of 91 file will be produced
###You can find all 91 file #here https://www.dropbox.com/sh/sk7f7uoz9t7mb38/AACxA8gGTXZJV90CycB4uT_Ka?dl=0
##download anyfile you need and play around.
#happy coding
Advice: Don't used 'for', 'while' loop, try to avoid them as much as possible, (for if used ifelse). I know you are used to with for loop but it too slow in R. Always used apply family as far as possible no matter how small loop is. If you want to be good at R, You will have to know apply. Don't try to find other option. ( I used 'for' loop in R for long time but I had go to basic of lappy and learn it, it inevitable) 

We have all the file ready for analysis, All 91 file are available at Dropbox_WBS, You can download any file and play around.

WBS - Part 2 - Data Visualization(Public Sector: Defense & arms trade)

In part 1 we did some cleaning and data manipulation , now let make some good used of that data.
In last tutorial we had 1477 row of data of Public Sector: Defense & arms trade--(topic).
Lets built some visualization that is interesting and lucid.
We will make google motion chart (like Hans) and dashboard in Tableau.

This data.frame is still not ready as input for visualization. Lets make it.
Things to remember
1. Always make date in narrow format, it will make many thing easy for all type of visualization(pivot table, tableau , motion chart), as date are mostly used as filter, its a good practices.
2.  Expand those factors which can be made into individual column and we can analysis each column separately (like indicator name here, we can make column of each indicator so that we can analysis only one indicator at a time). This will give as more flexibility for analysis. This is good practices in any type of analysis (pivot, tableau, others)



####lets gather all data into narrow form wdi_sub = gather(wdi_sub, "years", "sample", 4:59)
#### Rename for easy handling
colnames(wdi_sub) <- c("Indicator.Name", "Country.Name","Region" ,"years", "Value")
#### lets put indicator name in column name
wdi_sub = dcast(wdi_sub, Country.Name+years+Region~Indicator.Name, value.var = "Value", na.rm = T )

One important requirement of google motion chart is it need one variable as date, so lets make years to data variable in r
###add month and day
wdi_sub
$years = paste(wdi_sub$years,"-01-01", sep="")
wdi_sub$years=as.Date(wdi_sub$years, "%Y-%m-%d")

Now we are ready
Few function input has to be define other can be change from chart itself gvisMotionChart -- parameters
###google motion chart
M
= gvisMotionChart(wdi_sub, idvar = "Country.Name", timevar = "years", colorvar = "Region",
options = list(width = 700, height = 600), chartid= "country_gdp")
plot(M)
###save file
setwd(filepath)
dir.create("Output")
setwd(paste(filepath, "Output", sep="/"))
write.csv(wdi_sub, file="Public Sector: Defense & arms trade.csv")
setwd(filepath)
rm(list=ls())

This is the plot you will get as output: Public Sector: Defense & arms trade.html

Play around with it, change axis, color, size var as you like.

This is format of data used for google motion chart (same file will be used to make tableau dashboard, and you can even use it to make pivot table)


Tableau
Import the saved file to Tableau and change the format of all numerical filled to numeric ( as data has NA these field will be marked as text by tableau). You can download the workbook from Military_Expenditure(Defense arms trade) and follow along.





What I like in twitter

Contact Form

Name

Email *

Message *