Basic knowledge of the R language is required in order to use FLR. This one-day course aims to provide that knowledge, along with references to study material on the web.
Contents:
- How to get started
- Basics of R: syntax, data types etc.
- Loops and conditional structures
- Reading in data files
- Generating data using built-in distributions and functions
- Graphics
- Writing your own functions
- See also
How to get started
Install R (Windows, all versions). You may also want to install a handy editor, Tinn-R; see the FLR project's tips on how to install it. This is optional, but Tinn-R is especially handy when working with large files.
In these pages, I present example code but not the results; you should run the code in R and see what happens.
Basics of R
- freely distributed
- syntax highly similar to S(-plus)
- interpreted, not compiled
- large number of functions available in contributed packages; users can write more in FORTRAN or C/C++
- provides a large number of statistical & graphical tools and is extensible
- object oriented
Syntax
Launch R. You'll get HTML help pages by typing help.start().
You can create a variable by assigning it a value with <-, for example x<-5.
Note that R is case sensitive! That is, e.g. x and X are different variables. You can inspect the value of a variable by writing its name, or request a list of variables by writing ls().
Experiment with:
x #after the hash sign (#) you can write your comments;
  #they will not be interpreted by R
x<-7 #assigning a new value; the old one will be erased
x<-x+2 #add to the previous value
6->z #you can also assign values this way
a<-1;b<-2;c<-3 #several commands on one line can be separated by a semicolon (;)
a
a+2
b;c
Note that if you write an arithmetic operation without assigning it to any variable (e.g. a+2 instead of d<-a+2), the result will be displayed but not saved.
Booleans and strings
In addition to numeric values, R objects can include the Boolean values TRUE and FALSE, and text strings. Strings must be in quotes, e.g. vari<-"punainen" (Finnish for colour and "red").
#Boolean:
totta<-T
valetta<-F
# ! is the negation sign;
!totta
#Character string:
color<-"red"
Vectors
The basic data types in R are vectors, matrices, lists, and data frames. Vectors can be created with the command c():
y<-c(1,3,5,7,9,11) #create a vector
y[3] #Third element of y, as counting starts from 1
y[3]<-55
y
All operations on vectors will be performed element-wise; for example:
y+2
x<-c(1,2,3,4,5)
z<-x^2
z
x+z
##Try also:
a<-c(1,2)
b<-c(1,2,3,4)
c<-c(1,2,3,4,5)
a+b #what happens?
a+c #what happens now?
You'll notice that R is very flexible and performs calculations some other software products would perhaps refuse to perform. Note also that scalar values are technically vectors with just one element.
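The behaviour seen with a+b and a+c is usually called recycling: the shorter vector is repeated until it matches the length of the longer one, and R warns when the lengths are not multiples of each other. A minimal sketch that uses rep() to spell out what R does implicitly (the names short, long and recycled are only illustrative):

```r
short <- c(1, 2)
long <- c(1, 2, 3, 4)
#R repeats the shorter vector to the length of the longer one
recycled <- rep(short, length.out = length(long))
recycled
short + long
#the implicit and the explicit version give the same result
identical(short + long, recycled + long)
```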
Vectors can be concatenated easily:
d<-c(a,b)
You can refer to a part of a vector's elements:
a[1]
d[2:4]
d[-(4:5)]
d[d>3]
R functions include (but are not restricted to):
log(x) #natural logarithm
log10(x) #base 10 logarithm
logb(x, n) #base n logarithm
exp(x) #exponential function, e^x
x^2 #raising to a power (of two in this case)
sqrt(x) #square root
sin(x) #sine (trigonometric function)
max(x) #maximum value in vector x
min(x) #minimum value in vector x
range(x) #min and max
length(x) #length of x
mean(x) #mean
var(x) #variance
median(x) #median
sd(x) #standard deviation
Sequences can be created easily:
a<-1:100
a
b<-seq(0,30,by=0.3)
b
Vectors can contain only one type of object, i.e. only numbers, only text, etc., and not a combination of those. For example,
a<-c(1,5,7,9,11)
is.numeric(a[1])
a<-c(1,"five",7,9,11)
a
is.numeric(a[1])
is.character(a[1])
#Note that if you write text without quotation marks (""),
#it will be interpreted as a variable:
five<-500
a<-c(1,five,7,9,11)
a
The names() function can be useful with vectors.
#ihmisia = people; aikuisia, lapsia, vanhuksia = adults, children, elderly (Finnish)
ihmisia<-c(12,4,3)
names(ihmisia)<-c("aikuisia","lapsia","vanhuksia")
ihmisia
ihmisia["aikuisia"]
Matrices
Matrices can be created by reshaping or combining vectors. Let's look at reshaping first.
y<-c(1.2,3,5.6,4,8,7.2,8.8,2.1,5.6,0,7.1,5)
is.vector(y)
M1<-y
#set the dimensions: the vector is reshaped into a matrix
##Note that the order of the dimensions is rows, columns
dim(M1)<-c(2,6)
#Note that the vector is "read" into a matrix by column
M1
is.matrix(M1)
#Or using array()
M2<-array(y,c(3,4))
M2
is.matrix(M2)
#Or using matrix()
M3<-matrix(y,ncol=6)
M4<-matrix(y,ncol=6,byrow=T)
What happens if you try to reshape a vector into a matrix but the vector has the wrong number of elements? You'll find that the same thing happens as above with vectors: R fills in the rest of the matrix by starting again from the beginning of the vector, and issues a warning. Try:
#create vector z, which is the same as y but with one more element, 5.
z<-c(y,5)
M4<-matrix(z,ncol=6)
Matrices with more than two dimensions (arrays) are also possible:
p<-c(1,2,3,4,5,6,7,8,9)
p2<-2*p
p3<-3*p
pp<-c(p,p2,p3)
P<-array(pp,c(3,3,3))
Vectors can also be combined to form matrices:
a<-c(1,2,3,4,5)
b<-c(6,7,8,9,0)
A<-rbind(a,b) #vectors form rows in the new matrix
B<-cbind(a,b) #vectors form columns in the new matrix
C<-cbind(A,c(1,2)) #What happens here?
Lists
A list is an ordered collection of objects, known as its components. The components don't have to be of the same type.
fishery<-list(place="Baltic",species="Herring",TAC=c(570,490,372,260,203.349,204.549),
  years=c(1999,2000,2001,2002,2003,2004))
fishery
fishery$species
fishery$years[3]
List components are also always numbered and can be referred to by number:
fishery[[2]] #second component
fishery[[4]][1] #fourth component, first element
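Single brackets behave differently from double brackets on a list: [ ] returns a shorter list, whereas [[ ]] and $ return the component itself. A small sketch with a hypothetical list named catch (the names part and comp are illustrative too):

```r
catch <- list(species = "Herring", tons = c(570, 490, 372))
part <- catch[2]      #still a list, with one component
comp <- catch[[2]]    #the numeric vector itself
is.list(part)
is.list(comp)
comp[1]               #first element of the component
catch[["tons"]][1]    #components can also be picked by name
```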
Data frames
Data frames consist of vectors of the same length. They differ from matrices in that they can contain elements of different types, i.e. numbers and characters and Boolean values. Data frames are a common way to present data.
kids<-data.frame(name=c("Ann","John","Max","Susan"),age=c(8,7,6,7),girl=c(T,F,F,T))
kids
#Also:
kids.names<-c("Ann","Grace","John","Max","Philip","Susan","Theo")
kids.ages<-c(8,5,7,6,9,7,6)
kids.girls<-c(T,T,F,F,F,T,F)
kids.data<-data.frame(name=kids.names,age=kids.ages,girl=kids.girls)
#Logical subsetting:
?subset
subset(kids.data,girl==T)
#Also:
kids.data[1,] #What happens?
kids.data[,2] #What happens now?
#Make sure that you understand what happens here:
##(The previous two rows help)
kids<-kids.data[(kids.data[,3]==T),]
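Row selection of this kind works because the comparison produces a logical vector with one value per row, and that vector is then used as the row index. A sketch that takes the step apart, using a small hypothetical data frame named pets:

```r
pets <- data.frame(name = c("Rex", "Tom", "Molly"),
                   age = c(3, 5, 2),
                   is_cat = c(FALSE, TRUE, TRUE))
keep <- pets[, 3] == TRUE   #one TRUE or FALSE per row
keep
cats <- pets[keep, ]        #rows where keep is TRUE, all columns
cats
#the same selection with subset()
subset(pets, is_cat)
```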
Loops and conditional structures
Loops and conditional execution of commands are crucial in any programming. Conditional execution is available through the if() command:
if (expr_1) expr_2 else expr_3
where expr_1 must evaluate to a Boolean value; if it is true, expr_2 is executed, and if it is false, expr_3 is executed. For example:
totta<-T
if (totta) print(ihmisia) else print(a)
#Or in several lines (practical if expressions 2 and 3 are long):
if(totta){
  print(ihmisia)
}else print(a)
#Or:
if(totta){
  print(ihmisia)
}else{
  print(a)}
#Boolean value can also be the result of a comparison:
a<-1
if(a==2) print("The value of 'a' is two.") else print("The value of 'a' is not two!")
Loops are available through the for(), repeat and while() commands.
for (years in 1997:2004){
  print(1997:years)}
year<-1
while (year<5){
  print(year)
  year<-year+1}
#Loop inside a loop
##every year, we go through all the years so far
for (years in 2000:2004){
  for(i in 2000:years){
    cat("Now it is year ",years,", handling data from year ",i,"\n",sep="" )}}
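The examples here cover for and while but not repeat. A repeat loop has no condition of its own, so it must be left with break. A minimal sketch (the variable names are illustrative):

```r
year <- 1
total <- 0
repeat {
  total <- total + year
  year <- year + 1
  if (year > 4) break   #without break the loop would never end
}
total
```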
Reading in data files
Save copies of the data files tigerprawn.txt, tigerprawn2.csv and lengths.txt on your computer. Take a brief look at these files.
There are several functions for reading in data from files in R. We'll take a look at the read.table() and scan() functions.
If the data in the original file is in a table format, the read.table() function is the most convenient:
prawn<-read.table("YOURPATH/tigerprawn.txt",header=T,skip=3)
#header=T means that there are headers in the file
#skip tells how many rows will be skipped from the beginning
is.data.frame(prawn)
prawn2<-read.table("YOURPATH/tigerprawn2.csv",header=T,skip=3,sep=";")
#sep specifies the separation symbol
If the data is not in table format, read.table() cannot be used. scan() is a very flexible function in which a wide variety of parameters can be defined. In its simplest form, it can be used like this:
lengths<-scan("YOURPATH/lengths.txt",skip=7)
The symbol for missing data is NA. The is.na() function recognizes missing values.
a<-c(1,2,NA,4,5)
is.na(a)
#mean cannot be computed while there are missing values:
mean(a)
#missing values can be filtered away before calculation:
mean(a[!is.na(a)])
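Many summary functions, including mean(), also accept the argument na.rm=TRUE, which removes the missing values before the calculation. A short sketch with a hypothetical vector named obs:

```r
obs <- c(1, 2, NA, 4, 5)
mean(obs)                  #NA, because one value is missing
mean(obs, na.rm = TRUE)    #same as mean(obs[!is.na(obs)])
sum(is.na(obs))            #number of missing values
```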
Data can be written to a file using the write.table() function.
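As a minimal sketch of writing data out, the example below saves a small hypothetical data frame to a temporary file and reads it back in to check the result. The data frame catch and the use of tempfile() are assumptions for illustration; in practice you would give your own path.

```r
catch <- data.frame(year = c(2001, 2002, 2003),
                    tons = c(372, 260, 203.349))
outfile <- tempfile(fileext = ".txt")   #temporary file; replace with your own path
#row.names=FALSE leaves out the row numbers, quote=FALSE leaves out quotation marks
write.table(catch, outfile, row.names = FALSE, quote = FALSE)
#read the file back in to check the result
check <- read.table(outfile, header = TRUE)
check
```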
Generating data
Data can be generated using R's built-in distributions etc. This is useful in simulation models and when setting up model structures even if real data will be used later on.
rnorm(10,5,2) #generates 10 random numbers from a normal distribution with mean 5 and standard deviation 2
rnorm(1) #default mean=0,sd=1
rlnorm(1) #lognormal; default meanlog=0,sdlog=1
runif(1,4,5) #uniform distribution between 4 and 5; default min=0,max=1
random<-rnorm(5,10,2) #->a vector of five random numbers from a normal distribution with mean 10 and sd 2
See also ?rbeta, ?rgamma, etc.
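Random numbers differ from run to run. If a simulation needs to be repeatable, the random number generator can be initialised with set.seed(). A minimal sketch (the seed value 123 and the variable names are arbitrary):

```r
set.seed(123)
first <- rnorm(5, 10, 2)    #mean 10, standard deviation 2
set.seed(123)
second <- rnorm(5, 10, 2)
identical(first, second)    #same seed, same numbers
#with a large sample, the sample mean and sd come close to 10 and 2
big <- rnorm(10000, 10, 2)
mean(big)
sd(big)
```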
Graphics
R is popular partly because it has very good and flexible graphics options. We'll take a brief look at the most important plotting functions.
The plot() function draws scatter plots and lines. boxplot() draws boxplots, hist() draws histograms, and pie() draws pie charts.
Type demo(graphics) or take a look at Paul Murrell's Introduction to R Graphics for further information on graphics.
plot(lengths)
plot(lengths,type="l")
boxplot(lengths)
hist(lengths)
plot(prawn$spawn,prawn$recruit)
plot(prawn$spawn,prawn$recruit,xlab="SSB",ylab="R")
#Following data from FGFRI's "taskutilasto", p. 17
frozenfish<-c(13141,2458,971,460)
names(frozenfish)<-c("Herring","Rainbow trout","Salmon","Other")
pie(frozenfish)
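If the data files are not at hand, the plotting functions can be tried on generated data instead. The sketch below uses simulated values (the vector len and its parameters are made up for illustration) and adds a title, an axis label and a reference line to a histogram:

```r
set.seed(1)
len <- rnorm(200, 30, 5)          #simulated lengths: mean 30, sd 5
h <- hist(len, main = "Simulated lengths", xlab = "Length")
abline(v = mean(len), lty = 2)    #dashed vertical line at the sample mean
sum(h$counts)                     #every observation falls into some bin
```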
Writing your own functions
You can write your own functions in R to automate routine tasks. We'll take a quick look at that possibility by defining a couple of small functions.
#This function simply returns the larger of two input values:
larger=function(a,b){
  if(a>b) return(a)
  return(b)
}
#The function is called like this:
larger(5,9)
The input variables a and b only exist inside the function; they cannot be accessed outside of it, and they don't clash with variables of the same names outside of the function.
Variables that are defined within a function can be made visible outside of the function, but this is not recommended.
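A small sketch of this scoping rule, using a hypothetical function named double_it: the a inside the function is a separate, local variable, so the a in the workspace is left untouched.

```r
a <- 100
double_it <- function(a) {
  a <- a * 2      #changes only the local copy
  return(a)
}
double_it(5)
a                 #still 100
```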
The edit() command can be used to open a separate edit window.
larger<-edit(larger)
We can define default values that will be used if no input values are given:
larger=function(a=1,b=0){
  if(a>b) return(a)
  return(b)
}
#This can be called by:
larger() #defaults will be used
larger(2,3) #input values used
larger(b=7) #input value used in b, default in a
Functions can also be called recursively:
#Compute the nth Fibonacci number recursively
fibo<-function(n){
  if(n<1) return("Give a positive integer!");
  nn<-as.integer(n)
  if(nn==1) return(1);
  if(nn==2) return(1);
  return(fibo(nn-2)+fibo(nn-1));
}
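The recursive version recomputes the same values many times, so it becomes slow for large n. As a sketch of one possible alternative, the same numbers can be computed with a loop. The name fibo_loop and its internal variables are hypothetical and chosen only for this example.

```r
fibo_loop <- function(n) {
  if (n < 1) return("Give a positive integer!")
  nn <- as.integer(n)
  prev <- 0
  curr <- 1
  if (nn > 1) {
    for (i in 2:nn) {
      nxt <- prev + curr   #next number is the sum of the two before it
      prev <- curr
      curr <- nxt
    }
  }
  return(curr)
}
#the first ten Fibonacci numbers
sapply(1:10, fibo_loop)
```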
See also
Functions to look at if there's time:
?dump
?source
?history
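As a rough sketch of how dump() and source() fit together: dump() writes the definition of an object to a text file, and source() runs a file of R commands. The temporary file used here is an assumption for illustration; in practice you would give your own path.

```r
larger <- function(a, b) {
  if (a > b) return(a)
  return(b)
}
funfile <- tempfile(fileext = ".R")   #temporary file; replace with your own path
dump("larger", file = funfile)        #write the definition to a text file
rm(larger)                            #remove the function from the workspace
source(funfile)                       #run the file, which defines larger again
larger(5, 9)
```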