Alcohol and radar plots in R with ggplot2

Radar plots may be an unusual way to represent data, but under the right circumstances they can provide meaningful visualizations. In this post I will show how to create radar plots in the R programming language and how to customize some of their basic parameters.

Make sure you have the following packages installed, then load them:

library(ggplot2)
library(tibble)

The easiest way to generate good-looking yet simple data visualizations is the Grammar of Graphics syntax, cleverly implemented by Hadley Wickham in his ggplot2 package. The tibble package, created by the same author, provides an alternative to base R data frames. In this post we'll give it a try.

The data used in this example come from the World Health Organization (WHO) Global Health Observatory.

Radar plots are generally used to represent higher-dimensional data in two dimensions. They do so by plotting each variable on a separate axis, in a layout resembling polar coordinates. The axes radiate from the center, separated by equal angles. Each observation is then plotted according to its value on each axis, and the points are usually joined by a line forming a polygon.
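The geometry just described can be sketched in a few lines of base R. This is a minimal illustration, not the code behind the plots in this post: the function name radar_vertices(), the choice of starting the first axis at 12 o'clock and running clockwise, and the made-up values are all assumptions. With p variables, axis k sits at angle 2*pi*(k - 1)/p, and each value becomes a distance from the center along its axis before being converted to x/y coordinates.

```r
# Sketch: place one observation's values on equi-angular radar axes.
# Assumptions: axis 1 points straight up (12 o'clock), axes run clockwise,
# and all values share one maximum so every radius lies in [0, 1].
radar_vertices <- function(values, max_value = max(values)) {
  p <- length(values)                       # one axis per variable
  theta <- 2 * pi * (seq_len(p) - 1) / p    # equal angle between neighbouring axes
  r <- unname(values) / max_value           # distance from the center on each axis
  data.frame(
    axis = if (is.null(names(values))) seq_len(p) else names(values),
    x = r * sin(theta),                     # sin for x, cos for y: axis 1 is vertical
    y = r * cos(theta),
    stringsAsFactors = FALSE
  )
}

# Hypothetical consumption values for one country (made-up numbers)
v <- c(Beer = 4, Wine = 2, Spirits = 3, Other = 1)
radar_vertices(v, max_value = 4)
```

Joining the rows in order, and finally back to the first row, draws the polygon for that observation; repeating this for every observation gives the full radar plot.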

Nevertheless, this way of representing data can be misleading. For instance, using many axes can clutter the visualization. Furthermore, using different units (or different unit spacings) across axes is regarded as inappropriate. A general critique of radar plots is listed at the end of this post.

Taking into account these possible downsides of radar plots, we proceed to obtain and clean the data with the following code:

Cleaning is also an important part of data exploration. In this case we drop some variables we do not need and add rows containing the average values.
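Since the cleaning code itself is not reproduced here, the following is only a sketch of the two steps just described, run on a tiny made-up data frame (the column names and numbers are hypothetical, not WHO figures). It uses a base R data frame; with a tibble, tibble::add_row() can play the same role as the rbind() call.

```r
# Hypothetical, made-up values (not WHO figures); one row per country
alcohol <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  year    = c(2010, 2010, 2010),   # example of a column the plot does not need
  Beer    = c(4.0, 2.0, 3.0),
  Wine    = c(1.0, 3.0, 2.0),
  Spirits = c(2.0, 1.0, 3.0),
  stringsAsFactors = FALSE
)

# Drop a variable that will not be plotted
alcohol$year <- NULL

# Append one row holding the column-wise average of the numeric variables
num_cols <- c("Beer", "Wine", "Spirits")
avg_row <- data.frame(country = "Average", as.list(colMeans(alcohol[num_cols])),
                      stringsAsFactors = FALSE)
alcohol <- rbind(alcohol, avg_row)
alcohol
```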

Now we continue with the visualization. First we plot all the observations we have. Then we select specific cases to show how radar plots can contribute to a better understanding of the data.

As a result, the following plots are generated:

[Figure: radar plot of all 53 countries, with the average highlighted]

This plot shows the 53 countries as radar polygons. A radar plot is a poor choice for graphing many observations: the shapes generated for each observation are indistinguishable from each other. For this reason, all countries share the same color and only the average is drawn differently. In the end, this plot is merely illustrative.
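A single color for every country plus a highlighted average can be produced in ggplot2 with a manual color scale. The sketch below is one possible approach, not necessarily the one used for the plots in this post: it computes Cartesian vertex positions directly and draws them with geom_polygon() and coord_equal(), which keeps the edges between neighbouring axes straight. The data, the column names and the "highlight" grouping are hypothetical.

```r
library(ggplot2)

# Hypothetical, made-up values (not WHO figures); one row per country plus the average
wide <- data.frame(
  country = c("Country A", "Country B", "Country C", "Average"),
  Beer    = c(4.0, 2.0, 3.0, 3.0),
  Wine    = c(1.0, 3.0, 2.0, 2.0),
  Spirits = c(2.0, 1.0, 3.0, 2.0),
  Other   = c(0.5, 1.0, 0.5, 0.7),
  stringsAsFactors = FALSE
)
vars <- c("Beer", "Wine", "Spirits", "Other")

# Equi-angular axes, first axis pointing up; one common scale for all axes
theta <- 2 * pi * (seq_along(vars) - 1) / length(vars)
r_max <- max(wide[vars])

# Long format: one row per country and variable, holding the vertex position
long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  r <- unname(unlist(wide[i, vars])) / r_max
  data.frame(country = wide$country[i], variable = vars,
             x = r * sin(theta), y = r * cos(theta), stringsAsFactors = FALSE)
}))
# Hypothetical grouping: all countries share one color, the average gets another
long$highlight <- ifelse(long$country == "Average", "Average", "Country")

# Axis spokes and labels (one row per variable)
axes <- data.frame(variable = vars, x = sin(theta), y = cos(theta))

plot_radar <- ggplot(long, aes(x, y, group = country, colour = highlight)) +
  geom_segment(data = axes, aes(x = 0, y = 0, xend = x, yend = y),
               inherit.aes = FALSE, colour = "grey80") +
  geom_text(data = axes, aes(x = 1.15 * x, y = 1.15 * y, label = variable),
            inherit.aes = FALSE) +
  geom_polygon(fill = NA) +               # ggplot2 closes each polygon automatically
  scale_colour_manual(values = c(Country = "grey60", Average = "red")) +
  coord_equal() +
  theme_void()

print(plot_radar)
```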

[Figure: radar plot of 5 randomly selected countries]

The previous graph made the need to reduce the number of observations obvious. In this case, 5 countries were selected at random. Now we are able to see each observation and its consumption pattern.
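Picking a random subset takes one call to sample(). In this sketch the country names are placeholders standing in for the 53 countries, and set.seed() with an arbitrary seed is added so that the same 5 countries come out on every run; the post does not say whether a seed was used.

```r
# Placeholder names standing in for the 53 countries in the real data
countries <- c("Country A", "Country B", "Country C", "Country D", "Country E",
               "Country F", "Country G", "Country H", "Country I", "Country J")

set.seed(2017)                          # arbitrary seed: same selection on every run
chosen <- sample(countries, size = 5)   # 5 distinct countries (sampling without replacement)
chosen
```

The chosen names can then be used to filter the data, for example with data[data$country %in% chosen, ].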

[Figure: radar plot of selected countries, including France, Germany and Russia]

To understand what these radar plots show, it is sometimes useful to work with data you already know. For this reason we selected some countries whose alcohol consumption preferences are well known. As stereotypes dictate, France shows a strong preference toward wine, Germany toward beer and Russia toward spirits. We can clearly see and compare the consumption patterns of each country.

[Figure: individual, non-overlapping radar plots of the selected countries]

At a glance, presenting the data in this format is really useful. Each country's polygon points toward the beverage category with the highest consumption. We can think of the shape of these plots as a “preference polygon” for each country. In this view the plots don’t overlap, so the visualization stays clean and gives an overall picture of each country's alcohol consumption pattern.
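Separate, non-overlapping panels like these can be obtained with facet_wrap(). The sketch below assumes placeholder countries with made-up values, selects three of them by name with %in% (the way one would pick France, Germany and Russia from the real data), and draws one radar per panel. It is an illustration, not the code used for the figure above.

```r
library(ggplot2)

# Hypothetical, made-up values (not WHO figures) for placeholder countries
all_countries <- data.frame(
  country = c("Country A", "Country B", "Country C", "Country D"),
  Beer    = c(2, 6, 3, 4),
  Wine    = c(7, 2, 1, 3),
  Spirits = c(2, 2, 6, 3),
  stringsAsFactors = FALSE
)
vars <- c("Beer", "Wine", "Spirits")

# Select countries by name, e.g. the ones whose preferences you already know
wide <- all_countries[all_countries$country %in% c("Country A", "Country B", "Country C"), ]

theta <- 2 * pi * (seq_along(vars) - 1) / length(vars)
r_max <- max(wide[vars])
long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  r <- unname(unlist(wide[i, vars])) / r_max
  data.frame(country = wide$country[i], variable = vars,
             x = r * sin(theta), y = r * cos(theta), stringsAsFactors = FALSE)
}))
axes <- data.frame(variable = vars, x = sin(theta), y = cos(theta))

plot_facets <- ggplot(long, aes(x, y)) +
  # 'axes' has no country column, so these two layers are drawn in every panel
  geom_segment(data = axes, aes(x = 0, y = 0, xend = x, yend = y),
               inherit.aes = FALSE, colour = "grey80") +
  geom_text(data = axes, aes(x = 1.2 * x, y = 1.2 * y, label = variable),
            inherit.aes = FALSE, size = 3) +
  geom_polygon(aes(fill = country), alpha = 0.4, colour = "grey30") +
  facet_wrap(~ country) +               # one panel per country, so nothing overlaps
  coord_equal() +
  theme_void() +
  theme(legend.position = "none")

print(plot_facets)
```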


In the end, using radar plots is a tricky task. Not all data fit the format, the number of dimensions or the number of observations that make a radar plot worthwhile. As a rule of thumb, I suggest using radar plots only when you are confident that they convey an accurate meaning. Otherwise, my advice is to avoid them altogether.


For further information:

This post was inspired by the publication “From Parallel Plot to Radar Plot” by Erwan Le Pennec.

A useful perspective is offered by Graham Odds in “A Critique of Radar Charts”.


Download and plot financial data in R

This post covers some basic procedures for downloading and plotting financial data in the R programming language.


Before starting

  • Working in the RStudio IDE for R is highly recommended.
  • For aesthetic reasons, the package ‘ggplot2’ will be used for data visualization. The package ‘tidyr’ will also be required. To install them, run the following code in the RStudio console:
    install.packages('ggplot2')
    install.packages('tidyr')
  • You need to locate the financial data you are interested in within an online database that provides an API for downloading the data set. In this post, the USD/EUR, SGD/EUR and CHF/EUR exchange rates from the European Central Bank (ECB) database in Quandl are used as an example. Other databases may require some adjustments to the code presented in this post.

Code…

Defining the time interval

The first step is to define the time interval for the financial data, stored as a vector named inter.t. The function Sys.Date() returns the current date according to your system. As can be seen, in this example the time interval is one year long (365 days).

# libraries
library(ggplot2)
library(tidyr)

# time interval: from 365 days ago until today
inter.t <- c(Sys.Date() - 365, Sys.Date())

Downloading, reading and manipulating the data

Next, we define the URLs from which the data will be downloaded. Note that Quandl’s R package can also be used for this step.

  • url: vector containing the URLs for the download.
  • nam: vector with the file names under which the downloaded data are saved as .csv.
  • variables: vector containing the names of the financial series.
# URLs for downloading data:
url <- character(3); nam <- character(3)
variables <- c("EURUSD", "EURSGD", "EURCHF")

# paste0() is used because paste() would insert spaces into the URL

# exchange rate USD per EUR
url[1] <- paste0("https://www.quandl.com/api/v3/datasets/ECB/EURUSD.csv?start_date=",
                 inter.t[1], "&end_date=", inter.t[2]); nam[1] <- "eurousd.csv"

# exchange rate SGD per EUR
url[2] <- paste0("https://www.quandl.com/api/v3/datasets/ECB/EURSGD.csv?start_date=",
                 inter.t[1], "&end_date=", inter.t[2]); nam[2] <- "eursgd.csv"

# exchange rate CHF per EUR
url[3] <- paste0("https://www.quandl.com/api/v3/datasets/ECB/EURCHF.csv?start_date=",
                 inter.t[1], "&end_date=", inter.t[2]); nam[3] <- "eurchf.csv"
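The three URL assignments above follow the same pattern, so they can also be built in one vectorised step. In this sketch build_urls() is a hypothetical helper, the dates are fixed only to make the result predictable, and the URL pattern is copied from the code above (whether the endpoint still responds is not checked). Its file names are simply the lower-cased codes, so the first one is "eurusd.csv" rather than "eurousd.csv".

```r
# Hypothetical helper: one row per series with its download URL and file name.
# The URL pattern is taken from the code above; paste0() adds no separators.
build_urls <- function(codes, start, end) {
  base <- "https://www.quandl.com/api/v3/datasets/ECB/"
  data.frame(
    code = codes,
    url  = paste0(base, codes, ".csv?start_date=", format(start, "%Y-%m-%d"),
                  "&end_date=", format(end, "%Y-%m-%d")),
    file = paste0(tolower(codes), ".csv"),
    stringsAsFactors = FALSE
  )
}

targets <- build_urls(c("EURUSD", "EURSGD", "EURCHF"),
                      start = as.Date("2017-01-01"), end = as.Date("2017-12-31"))
targets$url[1]
# "https://www.quandl.com/api/v3/datasets/ECB/EURUSD.csv?start_date=2017-01-01&end_date=2017-12-31"
```

With the real interval, build_urls(variables, inter.t[1], inter.t[2]) would produce the same URLs as the manual assignments.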

Now the data are actually downloaded into your working directory as .csv files with the function download.file(). Each file is then read back into the environment with read.csv() inside a for loop, so the instructions do not have to be written out for each variable. Within the same loop, as.Date() converts the first column to class Date and order() sorts the rows chronologically, so that all series share the same layout.

  • le: number of financial variables.
  • fin.data: list of data frames, one per financial series.
  • number.data: number of observations (days) in each series.
le <- length(url); fin.data <- list()

for (i in seq_len(le)) {
  # save the .csv file in the working directory, then read it back
  download.file(url[i], nam[i])
  fin.data[[i]] <- read.csv(file = nam[i], na.strings = c("NA", ""))  # empty cells become NA
  # convert the first (date) column first, so that sorting uses real dates
  fin.data[[i]][, 1] <- as.Date(fin.data[[i]][, 1])
  fin.data[[i]] <- fin.data[[i]][order(fin.data[[i]][, 1]), ]          # oldest date first
  colnames(fin.data[[i]]) <- c("Date", "Value", rep("Value", ncol(fin.data[[i]]) - 2))
}
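To try the reading, conversion and sorting steps without downloading anything, the same calls can be run on a small CSV held in a string. The text below is made up: it only assumes that each downloaded file has a date column followed by a value column, possibly with the newest date first. Converting to Date before calling order() makes the sort operate on dates rather than on text.

```r
# Made-up CSV text (not real rates) shaped like one downloaded file
csv_text <- "Date,Value
2017-01-04,1.3
2017-01-03,1.2
2017-01-02,1.1"

d <- read.csv(text = csv_text, stringsAsFactors = FALSE)
d[, 1] <- as.Date(d[, 1])      # first column: text -> class Date
d <- d[order(d[, 1]), ]        # oldest observation first
rownames(d) <- NULL            # renumber rows after sorting
str(d)
```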
names(fin.data) <- variables

# general information about the database
number.data <- lapply(fin.data, nrow)
number.data

The list number.data holds the number of observations (daily rates) available for each exchange rate. The head() function gives a quick look at the first rows of each series.

lapply(fin.data, head)

So now we have the financial data in a list named fin.data. Inside this list there are three data frames, each one containing the historical information (date and value) of an exchange rate. Each data frame has two columns: Date (class Date) and Value (class numeric).
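A quick way to confirm that every element of the list has this shape is a small check function. check_series() and the demo list are hypothetical stand-ins; with the real data, fin.data would be passed instead. The last line also shows vapply(), which returns the row counts as a named integer vector rather than a list.

```r
# Hypothetical check: TRUE for each series whose first two columns are
# Date (class Date) and Value (numeric)
check_series <- function(series) {
  vapply(series, function(d) {
    identical(names(d)[1:2], c("Date", "Value")) &&
      inherits(d$Date, "Date") && is.numeric(d$Value)
  }, logical(1))
}

# Made-up stand-in for fin.data
demo <- list(
  EURUSD = data.frame(Date = as.Date(c("2017-01-02", "2017-01-03")), Value = c(1.1, 1.2)),
  EURSGD = data.frame(Date = as.Date("2017-01-02"), Value = 1.5)
)
check_series(demo)
vapply(demo, nrow, integer(1))   # named integer vector of row counts
```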

This particular arrangement of the data is quite intuitive and handy when manipulating the data for calculations or analysis. Nevertheless, in the following step we create a single general data frame, in which each row holds one date, exchange rate and value; this long format is another useful way to store this kind of data, and it is the layout ggplot2 works with.

  • gen.findat: data frame containing all the financial data.
  • dff: an auxiliary variable.
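gen.findat <- data.frame(Date = character(0), Exchange.rate = character(0), Value = numeric(0))
for (i in seq_len(le)) {
  # reshape to long format: Date, Exchange.rate, Value
  dff <- gather(fin.data[[i]], Exchange.rate, Value, -Date)
  # gather() fills Exchange.rate with the column name ("Value"); use the series name instead
  dff$Exchange.rate <- variables[i]
  # append this series to the general data frame
  gen.findat <- rbind(gen.findat, dff)
}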
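Growing gen.findat with rbind() inside a loop works, but the same long data frame can also be assembled in one step with base R, without the empty starting data frame. In this sketch to_long() and the demo list series are hypothetical stand-ins shaped like fin.data. Note also that in current versions of tidyr, gather() is superseded by pivot_longer().

```r
# Made-up stand-in for fin.data: a named list of Date/Value data frames
series <- list(
  EURUSD = data.frame(Date = as.Date(c("2017-01-02", "2017-01-03")), Value = c(1.1, 1.2)),
  EURCHF = data.frame(Date = as.Date(c("2017-01-02", "2017-01-03")), Value = c(1.0, 1.05))
)

# Hypothetical helper: one long data frame with Date, Exchange.rate, Value
to_long <- function(series) {
  parts <- Map(function(d, name) {
    data.frame(Date = d$Date, Exchange.rate = name, Value = d$Value,
               stringsAsFactors = FALSE)
  }, series, names(series))
  out <- do.call(rbind, parts)   # bind all pieces at once
  rownames(out) <- NULL          # drop row names such as "EURUSD.1"
  out
}

long_rates <- to_long(series)
long_rates
```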

Plotting

With the data in long format, plotting the exchange rates takes a single ggplot() call.

ggplot(gen.findat, aes(x = Date, y = Value, group = Exchange.rate)) + 
  # line thickness: 'linewidth' replaced 'size' for lines in ggplot2 3.4.0
  geom_line(aes(colour = Exchange.rate), linewidth = 0.8) + 
  ggtitle("Foreign currencies per EUR") + 
  ylab("Monetary units") + 
  theme_light()

Result

At the end we are left with a cluttered environment and a few useful variables. As mentioned before, the variables of interest are the ones that hold the financial data, in this case fin.data and gen.findat. Feel free to clean the environment and keep only the variables you want.
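One way to clean up is to remove every object except the ones you name. keep_only() below is a hypothetical helper; it works on any environment, and only the commented usage line would touch the global environment.

```r
# Hypothetical helper: remove all objects in 'envir' except those named in 'keep'.
# Objects whose names start with a dot are not listed by ls() and are left alone.
keep_only <- function(keep, envir = globalenv()) {
  drop <- setdiff(ls(envir = envir), keep)
  rm(list = drop, envir = envir)
  invisible(drop)                 # names of the removed objects
}

# Usage on the script's workspace (commented out so it does not run by accident):
# keep_only(c("fin.data", "gen.findat"))
```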

[Figure: USD, SGD and CHF per EUR over the selected time interval]

The ggplot2 package comes with many customization options. I encourage you to explore them and give the plots a more elegant presentation.

If you think there is a better way of doing any step shown in this post, feel free to mention it in the comments. I will be glad to read your recommendations; it’s always good to learn from other perspectives.


You can download this code and others on GitHub.

For writing readable code, check Google’s R Style Guide.