A short introduction to R for fisheries scientists

Laura Uusitalo, January 2006
laura.uusitalo@iki.fi

Basic knowledge of the R language is required in order to use FLR. This one-day course aims to give that knowledge, as well as references to study material on the web.

How to get started

Install R (Windows, all versions). You may also want to install a handy editor, Tinn-R; see FLR project's tips on how to install it. However, this is optional. Tinn-R is especially handy when working with large files.

In these pages, I present example code but not the results; you should run the code in R and see what happens.

Basics of R

Syntax

Launch R. You'll get HTML help pages by typing:

help.start()

You can create a variable by assigning it a value:

x<-6

Note that R is case sensitive! That is, e.g. x and X are different variables. You can display the value of a variable by typing its name, or list the variables in your workspace by typing

ls()

Experiment with:

x 	#after the hash sign (#) you can write your comments; 
	#they will not be interpreted by R
x<-7 	#assigning a new value; the old one will be erased
x<-x+2 	#add to the previous value
6->z	#you can also assign values this way 
a<-1;b<-2;c<-3	#several commands on one line can be separated by a semicolon (;)
a
a+2
b;c

Note that if you write an arithmetic operation without assigning it to any variable (e.g. a+2 instead of d<-a+2), the result will be displayed but not saved.

Booleans and strings

In addition to numeric values, R objects can include Boolean values TRUE and FALSE, and text strings. Strings must be in quotes, e.g. vari<-"punainen".

#Boolean:
totta<-T
valetta<-F
 
# ! is the negation sign;
!totta

#Character string:
color<-"red" 

Vectors

The basic data structures in R are vectors, matrices, lists, and data frames. Vectors can be created with the c() function:

y<-c(1,3,5,7,9,11)	#create a vector
y[3]    		#Third element of y, as counting starts from 1
y[3]<-55
y

Arithmetic operations on vectors are performed element-wise; for example:

y+2
x<-c(1,2,3,4,5)
z<-x^2
z
x+z

##Try also:
a<-c(1,2)
b<-c(1,2,3,4)
c<-c(1,2,3,4,5)
a+b	#what happens?
a+c	#what happens now?

You'll notice that R is very flexible and performs calculations some other software products would perhaps refuse to perform. Note also that scalar values are technically vectors with just one element.
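
The behaviour you see with a+b and a+c is usually called recycling. Here is a small sketch of it; the vector names short, long and odd are made up for this example.

```r
# Recycling: the shorter vector is repeated until it matches the longer one
short <- c(1, 2)
long  <- c(10, 20, 30, 40)
short + long                                    # same as c(1, 2, 1, 2) + long
rep(short, length.out = length(long)) + long    # the recycling written out explicitly

# When the longer length is not a multiple of the shorter one,
# R still returns a result but also issues a warning
odd <- c(10, 20, 30, 40, 50)
short + odd
```

The warning is worth taking seriously: in real data it often means that two vectors you expected to be of equal length are not.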

Vectors can be concatenated easily:

d<-c(a,b)

You can refer to a part of a vector's elements:

a[1]
d[2:4]
d[-(4:5)]
d[d>3] 
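
The last line above, d[d>3], works because the comparison itself produces a vector of TRUE and FALSE values that is then used as an index. A minimal sketch, with d defined again so that the example runs on its own:

```r
d <- c(1, 2, 1, 2, 3, 4)
d > 2          # a logical vector: FALSE FALSE FALSE FALSE TRUE TRUE
d[d > 2]       # keeps the elements where the condition is TRUE
which(d > 2)   # positions of those elements
d[-1]          # a negative index drops the first element
```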

R functions include (but are not restricted to):

log(x)		#natural logarithm
log10(x)	#base 10 logarithm
logb(x, n)	#base n logarithm
exp(x)		#exponential function, e^x
x^2		#raising to a power (of two in this case)
sqrt(x)		#square root
sin(x)		#sine (x in radians)
max(x)		#maximum value in vector x
min(x)		#minimum value in vector x
range(x)	#min and max
length(x)	#length of x
mean(x)		#mean
var(x)		#variance
median(x)	#median
sd(x)		#standard deviation

Sequences can be created easily:

a<-1:100
a

b<-seq(0,30,by=0.3)
b

Vectors can contain only one type of element, i.e. only numbers, only text, etc., and not a combination of those. If you mix types, R converts all elements to a common type. For example,

a<-c(1,5,7,9,11)
is.numeric(a[1])
a<-c(1,"five",7,9,11)
a
is.numeric(a[1])
is.character(a[1])

#Note that if you write text without quotation marks (""),
#it will be interpreted as a variable:
five<-500
a<-c(1,five,7,9,11) 
a
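
If you are unsure what type a vector ended up with, class() tells you. The following is a small sketch with made-up variable names (mixed, numbers, back); it also shows what happens when you try to convert text back to numbers.

```r
mixed <- c(1, "five", 7)
class(mixed)                 # "character": the numbers were converted to text
numbers <- c(1, 5, 7)
class(numbers)               # "numeric"

# Converting back: text that does not look like a number becomes NA.
# R warns about this; the warning is silenced here to keep the output short.
back <- suppressWarnings(as.numeric(mixed))
back
```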

The names() function can be useful with vectors.

ihmisia<-c(12,4,3)	#numbers of people (variable and labels are in Finnish)
names(ihmisia)<-c("aikuisia","lapsia","vanhuksia")	#adults, children, elderly
ihmisia
ihmisia["aikuisia"]

Matrices

Matrices can be created by reshaping or combining vectors. Let's look at reshaping first.

y<-c(1.2,3,5.6,4,8,7.2,8.8,2.1,5.6,0,7.1,5)
is.vector(y)
M1<-y
#set the dimensions: the vector is reshaped into a matrix
##Note that the order of the dimensions is rows, columns
dim(M1)<-c(2,6)
#Note that the vector is "read" into a matrix by column 
M1
is.matrix(M1)

#Or using array()
M2<-array(y,c(3,4))
M2
is.matrix(M2)

#Or using matrix()
M3<-matrix(y,ncol=6)
M4<-matrix(y,ncol=6,byrow=T)
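
The difference between M3 and M4 is easier to see with a small vector. This sketch uses the made-up names v, by_col and by_row:

```r
v <- 1:6
by_col <- matrix(v, ncol = 3)                 # filled column by column (the default)
by_row <- matrix(v, ncol = 3, byrow = TRUE)   # filled row by row
by_col
by_row
by_col[2, 1]   # row 2, column 1
t(by_col)      # transpose: rows and columns swapped
```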

What happens if you try to reshape a vector into a matrix but the vector has the wrong number of elements? You'll find that the same thing happens as above with vectors: R fills in the rest of the matrix by starting from the beginning of the vector again, and issues a warning. Try:

#create vector z, which is the same as y but with one more element, 5.
z<-c(y,5)
M4<-matrix(z,ncol=6)

Matrices can be generalised to arrays with more than two dimensions:

p<-c(1,2,3,4,5,6,7,8,9)
p2<-2*p
p3<-3*p
pp<-c(p,p2,p3)
P<-array(pp,c(3,3,3))
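
Elements of such an array are picked with one index per dimension. The sketch below builds an array like P under the made-up name cube, so that it runs on its own:

```r
# A 3 x 3 x 3 array built the same way as P above
cube <- array(c(1:9, 2 * (1:9), 3 * (1:9)), c(3, 3, 3))
dim(cube)        # the three dimensions
cube[, , 2]      # the second "layer": a 3 x 3 matrix holding 2*(1:9)
cube[2, 3, 1]    # row 2, column 3 of the first layer
```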

Vectors can also be combined to form matrices:

a<-c(1,2,3,4,5)
b<-c(6,7,8,9,0)
A<-rbind(a,b)	#vectors form rows in the new matrix
B<-cbind(a,b)	#vectors form columns in the new matrix
C<-cbind(A,c(1,2))	#What happens here?

Lists

A list is an ordered collection of objects, known as its components. The components don't have to be of the same type.

fishery<-list(place="Baltic",species="Herring",TAC=c(570,490,372,260,203.349,204.549),
years=c(1999,2000,2001,2002,2003,2004))
fishery
fishery$species
fishery$years[3]

List components are also always numbered and can be referred to by their number:

fishery[[2]]	#second component
fishery[[4]][1]	#fourth component, first element
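
Lists can also grow after they have been created. A minimal sketch, using a made-up list called catch rather than the fishery list above:

```r
catch <- list(place = "Baltic", years = c(1999, 2000, 2001))
names(catch)                    # names of the components
catch$species <- "Herring"      # assigning to a new name adds a component
length(catch)                   # now three components
catch[["years"]][2]             # double brackets also accept a name
str(catch)                      # compact overview of the whole list
```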

Data frames

Data frames consist of columns (vectors) of the same length. They differ from matrices in that the columns can be of different types, e.g. numbers, characters and Boolean values. Data frames are a common way to store data.

kids<-data.frame(name=c("Ann","John","Max","Susan"),age=c(8,7,6,7),girl=c(T,F,F,T))
kids
#Also:
kids.names<-c("Ann","Grace","John","Max","Philip","Susan","Theo")
kids.ages<-c(8,5,7,6,9,7,6)
kids.girls<-c(T,T,F,F,F,T,F)
kids.data<-data.frame(name=kids.names,age=kids.ages,girl=kids.girls)

#Logical subsetting:
?subset
subset(kids.data,girl==T)

#Also:
kids.data[1,]	#What happens?
kids.data[,2]	#What happens now?
#Make sure that you understand what happens here:
##(The previous two rows help)
kids<-kids.data[(kids.data[,3]==T),]
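
Columns can also be referred to by name, which is often easier to read than column numbers. A small sketch with a made-up data frame called pupils:

```r
pupils <- data.frame(name = c("Ann", "John", "Max", "Susan"),
                     age  = c(8, 7, 6, 7),
                     girl = c(TRUE, FALSE, FALSE, TRUE))
pupils$age                       # one column, as a vector
pupils[pupils$girl, ]            # rows where girl is TRUE
pupils[pupils$age > 6, "name"]   # names of the children older than 6
nrow(pupils[pupils$girl, ])      # number of rows selected
```

Here pupils[pupils$girl, ] does the same job as the comparison with T in the last line of the exercise above, because the girl column is already Boolean.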

Loops and conditional structures

Loops and conditional execution of commands are crucial in all programming. Conditional execution is available through the if() command:

if (expr_1) expr_2 else expr_3

where expr_1 must evaluate to a Boolean value; if it is true then expr_2 is executed, if false, then expr_3. For example:

totta<-T
if (totta) print(ihmisia) else print(a)

#Or in several lines (practical if expressions 2 and 3 are long):
if(totta){
print(ihmisia)
}else print(a)

#Or:
if(totta){
print(ihmisia)
}else{ 
print(a)}

#Boolean value can also be the result of a comparison:
a<-1
if(a==2) print("The value of 'a' is two.") else print("The value of 'a' is not two!")

Loops can be created using the for(), while() and repeat commands; a repeat loop runs until it is stopped with break.

for (years in 1997:2004){
print(1997:years)}

year<-1
while (year<5){
print(year)
year<-year+1}

#Loop inside a loop
##every year, we go through all the years so far
for (years in 2000:2004){
for(i in 2000:years){
cat("Now it is year ",years,", handling data from year ",i,"\n",sep="" )}}
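
The examples above use for() and while(). For completeness, here is a minimal sketch of a repeat loop; it has no condition of its own, so it must be left with break.

```r
year <- 2000
repeat {
  print(year)
  year <- year + 1
  if (year > 2004) break   # leave the loop once the last year has been printed
}
```

If you forget the break, the loop never ends and you have to interrupt R by hand.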

Reading in data files

Save copies of the data files tigerprawn.txt, tigerprawn2.csv and lengths.txt on your computer. Take a brief look at these files.

There are several functions for reading in data from files in R. We'll take a look at read.table() and scan() functions.

If the data in the original file is in a table format, read.table() function is the most convenient:

prawn<-read.table("YOURPATH/tigerprawn.txt",header=T,skip=3)
#header=T means that there are headers in the file
#skip tells how many rows will be skipped from the beginning
is.data.frame(prawn)

prawn2<-read.table("YOURPATH/tigerprawn2.csv",header=T,skip=3,sep=";")
#sep specifies the separation symbol

If the data is not in table format, read.table() cannot be used. scan() is a very flexible function in which a wide variety of parameters can be defined. In its simplest form, it can be used like this:

lengths<-scan("YOURPATH/lengths.txt",skip=7) 
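
If you don't have the data files at hand, you can still try both functions: they accept a text argument that is read as if it were the contents of a file. In the sketch below the "file contents" are invented for illustration; only the column names spawn and recruit are taken from the plotting examples later on this page.

```r
# Made-up file contents: three lines to skip, a header row, then the data
file_lines <- c("Example stock-recruitment data",
                "values invented for illustration",
                "units not specified",
                "year spawn recruit",
                "1990 12.5 30.1",
                "1991 10.2 25.4",
                "1992 14.8 33.0")
stock <- read.table(text = file_lines, header = TRUE, skip = 3)
stock
is.data.frame(stock)

# scan() reads plain numbers; skip = 1 jumps over the first line
sizes <- scan(text = c("line to skip", "4.1 5.3 6.0 7.2"), skip = 1, quiet = TRUE)
sizes
```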

The symbol for missing data is NA. The is.na() function identifies missing values.

a<-c(1,2,NA,4,5)
is.na(a)
#the mean of a vector with missing values is NA:
mean(a)
#missing values can be filtered away before calculation:
mean(a[!is.na(a)])
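
Many summary functions, mean() among them, can also drop the missing values themselves through the na.rm argument. A short sketch with a made-up vector name:

```r
lengths_cm <- c(1, 2, NA, 4, 5)
mean(lengths_cm, na.rm = TRUE)   # missing values are dropped before averaging
sum(is.na(lengths_cm))           # how many values are missing
```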

Data can be written to a file using the write.table() function.
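
A minimal sketch of writing a data frame and reading it back. The data frame catches and its values are only an illustration, and the file goes to a temporary location; in practice you would give your own path.

```r
catches <- data.frame(year = c(2001, 2002, 2003), catch = c(372, 260, 203.349))
outfile <- tempfile(fileext = ".txt")      # temporary file; use your own path instead
write.table(catches, outfile, sep = ";", row.names = FALSE, quote = FALSE)

# Read the file back to check that it contains what we expect
readback <- read.table(outfile, header = TRUE, sep = ";")
readback
```

Setting row.names=FALSE leaves out the row numbers, which makes the file easier to read back with header=TRUE.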

Generating data

Data can be generated using R's built-in distributions etc. This is useful in simulation models and when setting up model structures even if real data will be used later on.

rnorm(10,5,2)	
#generates 10 random numbers from normal distribution with mean 5 and standard deviation 2
rnorm(1)	#default mean=0,sd=1
rlnorm(1)	#lognormal; default meanlog=0,sdlog=1
runif(1,4,5)	#uniform distribution between 4 and 5; default min=0,max=1

random<-rnorm(5,10,2)
#->a vector of five random numbers from a normal distribution with mean 10 and standard deviation 2
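
Two things are easy to get wrong with rnorm(): its third argument is the standard deviation rather than the variance, and every call gives different numbers unless the random number generator is seeded. The sketch below uses set.seed(), which is not covered elsewhere on this page, to make a simulation repeatable; the names sample_a and sample_b are made up.

```r
set.seed(1)                      # fixes the random number stream
sample_a <- rnorm(1000, 5, 2)    # third argument is the standard deviation
sd(sample_a)                     # close to 2
var(sample_a)                    # close to 4, i.e. 2^2

set.seed(1)
sample_b <- rnorm(1000, 5, 2)
identical(sample_a, sample_b)    # same seed, same numbers
```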

See also ?rbeta, ?rgamma, etc.

Graphics

R is popular partly because it has very good and flexible graphics options. We'll take a brief look at the most important plotting functions.

The plot() function draws scatter plots and lines. boxplot() draws boxplots, hist() draws histograms, and pie() draws pie charts. Type demo(graphics) or take a look at Paul Murrell's Introduction to R Graphics for further information on graphics.

plot(lengths)
plot(lengths,type="l")
boxplot(lengths)
hist(lengths)

plot(prawn$spawn,prawn$recruit)
plot(prawn$spawn,prawn$recruit,xlab="SSB",ylab="R")

#Following data from FGFRI's "taskutilasto", p. 17
frozenfish<-c(13141,2458,971,460)
names(frozenfish)<-c("Herring","Rainbow trout","Salmon","Other")
pie(frozenfish)

Writing your own functions

You can write your own functions in R to automate routine tasks. We'll take a quick look at that possibility by defining a couple of small functions.

#This function simply returns the larger of two input values:
larger<-function(a,b){
    if(a>b) return(a)
    return(b)
}

#The function is called like this:
larger(5,9)

The input variables a and b only exist inside the function; they cannot be accessed outside of it, and they don't clash with variables of the same names outside of the function.

Variables that are defined within a function can be made visible outside of the function, but this is not recommended.
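
A small sketch of what this means in practice. The function double_it is hypothetical; the point is that the a inside it has nothing to do with the a in the workspace.

```r
a <- 100                       # a variable in the workspace
double_it <- function(a) {     # this 'a' exists only inside the function
  a <- a * 2
  a                            # the value of the last expression is returned
}
double_it(5)                   # 10
a                              # still 100
```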

The edit() command can be used to open a separate edit window.

larger<-edit(larger)

We can define default values that will be used if no input values are given:

larger<-function(a=1,b=0){
    if(a>b) return(a)
    return(b)
}

#This can be called by:
larger()	#defaults will be used
larger(2,3)	#input values used
larger(b=7)	#input value used in b, default in a

Functions can also be called recursively:

#Compute the nth Fibonacci number recursively
fibo<-function(n){
    if(n<1) return("Give a positive integer!")
    nn<-as.integer(n)
    if(nn==1) return(1)
    if(nn==2) return(1)
    return(fibo(nn-2)+fibo(nn-1))
}
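
The recursive version is short, but it computes the same values many times over and gets slow for large n. For comparison, here is a sketch of the same calculation with a loop. The name fibo_loop is made up, and unlike fibo() above it stops with an error instead of returning a message.

```r
fibo_loop <- function(n) {
  n <- as.integer(n)
  if (is.na(n) || n < 1) stop("Give a positive integer!")
  previous <- 0
  current <- 1
  if (n > 1) {
    for (i in 2:n) {
      nxt <- previous + current   # next number is the sum of the last two
      previous <- current
      current <- nxt
    }
  }
  current
}

# The first ten Fibonacci numbers
sapply(1:10, fibo_loop)
```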

See also

Functions to look at if there's time:

?dump
?source
?history