Faculty of IT

Monash University

0. How to Use Markdown

Here we will practise using Markdown to create well-formatted text. Double-click any text cell to open it and see the Markdown source behind it.

To use Markdown in a cell, you need to change the cell type to Markdown; by default, cells are code cells. Let's start with headings. Adding one or more hash symbols (#) followed by a space to the beginning of a line turns it into a heading.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5

Heading 6

No more!

You can use up to six hash symbols, which creates a level 6 heading. When you finish typing, press Shift and Enter together to run the cell. To modify an existing Markdown cell, double-click it.

0.1 Paragraphs

To create a new paragraph, leave an empty line after the previous text.

Here I made a new paragraph.

0.2 Bulleted List

Start each line with a star character (*) and a space to create a bulleted list. You can create nested bullet points by indenting the next line with a few spaces.

One
Two
Three
Four
- Five
- Six
Seven
Eight
Ten

You can do the same thing with a hyphen:

A
B
C
D
E
F
G

0.3 Numbered List

You can create a numbered list by starting a line with a number, a dot, and a space:

One
Two
Three

Practice:

Two
Three

Four
Five
Six
Seven

0.4 Bold and Italic

To make text bold, add two stars to the beginning and the end of a piece of text. For example, the text here is bold.

To make a piece of text italic, add one star to the beginning and the end. For example, the text here is italic.

You can also make text italic or bold by putting one or two underscores, respectively, before and after it. For example, I am making this text italic.

0.5 Creating Links

To create a link pointing somewhere else, put the link text inside square brackets, immediately followed by the URL inside parentheses: [text](link).

For example, click here to go to the R repository.

You can also add a hint about the link destination, which is shown when the mouse is over the link. Hover over the link below after running the cell.

Click here to go to the R repository.

0.6 Acknowledgement

Most of the material here is inspired by the source linked here.

1. The First Touch of R

1.1 Using R as a Calculator to Do Arithmetic

You can perform simple arithmetic by typing numbers and operators into a cell. Use parentheses generously to make your expressions clear. Use the # sign to add comments: anything typed after # on a line is ignored by R. Comments are very important for documenting your code.

# Addition  #comments
3+5

## [1] 8

3-5  #subtraction

## [1] -2

3*5  # multiplication

## [1] 15

3/5  #division

## [1] 0.6

#Exponentiation could be done in two ways
2**3

## [1] 8

2^3

## [1] 8

#integer division. This is different from 3/5.
3 %/% 5

## [1] 0

# pay attention to their difference
5/3

## [1] 1.666667

5 %/% 3

## [1] 1

# modulus or remainder
5 %% 3

## [1] 2

3+5/2*5-2^3 ##very hard to understand what is going on!

## [1] 7.5

3+((5/2)*(5-2))^3 # use parentheses to make clear what you want.

## [1] 424.875
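To see how R applies operator precedence (^ first, then * and / from left to right, then + and - from left to right), the sketch below rewrites the first expression above with every implicit parenthesis made explicit. It also checks the usual relationship between %/% and %%. The variable names a, b, q and r are illustrative only.

```r
# The unparenthesised expression from above ...
a <- 3 + 5/2*5 - 2^3
# ... is evaluated as if it were written like this:
b <- 3 + ((5/2)*5) - (2^3)
print(c(a, b))    # both are 7.5

# Integer division and remainder fit together:
# dividend == (dividend %/% divisor) * divisor + dividend %% divisor
q <- 17 %/% 5     # 3
r <- 17 %% 5      # 2
print(q * 5 + r)  # 17
```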

1.2 Assignment

A variable in R is a named storage location whose value we can access and change through R commands. A valid variable name consists of letters, numbers, the dot, and the underscore character, and it starts with a letter or a dot. However, please don't start variable names with a dot in this unit! Always start with a letter.

In R, a variable is created at the moment you assign a value to it; after that, you can work with it. You can assign values to variables using $<-$ (a less-than sign followed by a hyphen) or the $=$ sign. It is recommended to use $<-$, and I am going to use this symbol. Experts recommend reserving $=$ for specifying arguments to functions.

x <- 5
x #implicit printing or auto-printing

## [1] 5

print(x) #explicit printing. Any difference between the two outputs is only due to how R is set up in JupyterHub

## [1] 5

# we will learn more on the difference and about [1] before the result, when we learn about vectors.

x <- x+3
print(x)

## [1] 8

y <- 7
z <- x*y
z

## [1] 56

R is case sensitive. Therefore, x and X are different variables.

x <- 5
X <- 7
print(x)

## [1] 5

print(X)

## [1] 7

#Scientific notation
2.54e5  #2.54 * 10 ^ 5

## [1] 254000

7456.3e-2  #7456.3 * 10^(-2)

## [1] 74.563

#rounding numbers
2/3

## [1] 0.6666667

round(2/3,4) #rounds the result of 2/3 to 4 decimal places

## [1] 0.6667

? round  # to get more information about this function

## starting httpd help server ... done

Exercise

According to the Australian Bureau of Statistics, the Australian population in 2000 was 19.2 million. If the population grows at 1.7% per year, what is the predicted Australian population for 2020? If $P_0$ is the initial population, $r$ is the annual growth rate, and we are interested in finding the population $t$ years later, $P_t$, we use the following formula \[P_t = P_0(1+r)^t\]
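One way to turn the growth formula into reusable R code is a small function. This is only a sketch: the name `predict_population()` and its argument names are our own choices, and the usage line deliberately uses made-up numbers so the exercise is left for you. Functions are covered properly later, so treat the syntax as a preview.

```r
# Hypothetical helper implementing P_t = P_0 * (1 + r)^t
#   P0: initial population
#   r:  annual growth rate as a proportion (1.7% is 0.017)
#   t:  number of years
predict_population <- function(P0, r, t) {
  P0 * (1 + r)^t
}

# Made-up example: 100 people growing at 10% per year for 2 years
predict_population(100, 0.1, 2)   # 121
```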

1.3 Managing Variables

List of Current Variables

To list the existing variables in the current environment, use the ls() or objects() function.

ls()

## [1] "x" "X" "y" "z"

print(ls()) # Single and double quotes delimit character constants. They can be used interchangeably

## [1] "x" "X" "y" "z"

myVariables <- ls()  # store the names of the existing variables in a new variable

print(myVariables)

## [1] "x" "X" "y" "z"

objects()

## [1] "myVariables" "x"           "X"           "y"           "z"

Deleting Variables

You can delete any variable using the rm() or remove() function.

ls()

## [1] "myVariables" "x"           "X"           "y"           "z"

objects()

## [1] "myVariables" "x"           "X"           "y"           "z"

rm(x)  # removes a variable

ls()

## [1] "myVariables" "X"           "y"           "z"

You can also delete all the variables at once. This is particularly useful when you finish your session and want to clean up all the mess!

ls()

## [1] "myVariables" "X"           "y"           "z"

rm(list=ls())

print(ls())

## character(0)
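rm() is not limited to one variable or to the whole environment: it accepts several names at once. A small sketch (the variable names are arbitrary) that uses the base function exists() to confirm what is left:

```r
# Create a few variables, then remove two of them in a single call
a <- 1
b <- 2
keep <- 3
rm(a, b)                          # rm() accepts several names at once
exists("a", inherits = FALSE)     # FALSE: a has been deleted
exists("keep", inherits = FALSE)  # TRUE: keep is still there
```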

2. Data Types

2.1 Main Data Classes

R has five basic or atomic classes of objects:

numeric:
double (real numbers): values like 2.3, 3.14, -5.7634 , …
integer: values like 0,1,2, -4, …
character: values like "GDDS", 'exe'
logical: TRUE and FALSE (always in capital letters)
complex: we will not use this class in this unit.

typeof(2) # numbers by default are double

## [1] "double"

typeof(2L) # to force to be integer

## [1] "integer"

typeof(3.14)

## [1] "double"

typeof(TRUE)

## [1] "logical"

typeof("TRUE")

## [1] "character"
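typeof() reports how a value is stored internally, while class() is what most functions look at, and the is.* functions answer yes/no questions about a value's type. A short sketch comparing them:

```r
# typeof() reports the internal storage type
typeof(1:3)           # "integer": the colon operator produces integers
typeof(2 + 3i)        # "complex"
class(2)              # "numeric" (although typeof(2) is "double")

# The is.* family answers yes/no questions about a value's type
is.numeric(2L)        # TRUE: integers count as numeric
is.character("TRUE")  # TRUE: the quotes make it a character string
is.logical(TRUE)      # TRUE
```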

2.2 Vectors

The most basic type of R objects is a vector. All the objects we used so far are vectors of length 1. Vectors are variables with one or more values of the same type, e.g., all are of numeric class. For example, a numeric vector might consist of the numbers (1.2, 2.3, 0.2, 1.1).

Vectors are created by the c() function (the combine or concatenate function).
They can also be created by the vector() function: v <- vector("numeric", length=5)
A vector should contain objects of the same class.
If you put objects of different classes in a vector, an implicit coercion happens (the class of the values is changed).
Vectors can also be created using the seq() and rep() functions.

v1 <- c(5,7,9) # a vector called v1 is created.

v1

## [1] 5 7 9

print(v1)

## [1] 5 7 9

#this says v1 is a vector, or a sequence of objects, and the first one is 5.

v2 <- 3:35 # a sequence of consecutive integers is put in v2. The sequence starts at 3 and goes to 35
print(v2)

##  [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [24] 26 27 28 29 30 31 32 33 34 35

# [1] and [24] at the start of each output line are the positions of the first item on that line: the 1st item is 3 and the 24th item is 26.

v3 <- c("Helo", "Hi", "Bye") # a vector of characters
print(v3)

## [1] "Helo" "Hi"   "Bye"

v4 <- c(TRUE, TRUE, FALSE, TRUE, TRUE) # a vector of logical values
v4

## [1]  TRUE  TRUE FALSE  TRUE  TRUE

length(v4) # gives the length of a vector

## [1] 5

v5 <- seq(2,8) #another way of making a vector of consecutive numbers. Same as 2:8
v5

## [1] 2 3 4 5 6 7 8

v6 <- seq(from=3, to=10, by=2) # equally you can write seq(3,10,2)
print(v6)

## [1] 3 5 7 9

#learn more about seq() function by typing ? seq
?seq

v7 <- vector(mode="numeric", length=5) # another way of creating a vector
print(v7)

## [1] 0 0 0 0 0

v8 <- c(5, "a", 2) #different types, so a coercion happens. Be very careful about this.
print(v8)

## [1] "5" "a" "2"

#accessing elements of a vector
v8[1]

## [1] "5"

print(v8[2])

## [1] "a"

vv <- c(1,2,3)
vv

## [1] 1 2 3

vv[2] #prints the second item

## [1] 2

vv[2] <- 257 # changes the value stored in the second element
vv

## [1]   1 257   3

#to choose more than one element from a vector
x <- c(12.2, 52.3, 10.2, 11.1)
x[1] # only the first element

## [1] 12.2

x[c(1,3)] # the first and third elements

## [1] 12.2 10.2

# Adding an element to the end of a vector
v <- c(1,2,3)
print(v)

## [1] 1 2 3

v <- c(v, 100) # 100 is added to the end of a vector
print(v)

## [1]   1   2   3 100

# Create sequential data
x1 <- 0:10  # Assigns the numbers 0 through 10 to x1
x2 <- 10:0  # Assigns the numbers 10 through 0 to x2
x3 <- seq(10)  # Counts from 1 to 10
x4 <- seq(30, 0, by = -3)  # Counts down by 3

x <- c(1,3,6,9,0)
x

## [1] 1 3 6 9 0

x[-2] # all the elements except the second element

## [1] 1 6 9 0

x[3] <- 200 #modify an element
x

## [1]   1   3 200   9   0

# to empty a vector, assign NULL to it
x <- NULL
x

## NULL

x <- c(2, 9, 7)
x

## [1] 2 9 7

y <- c(x, x, 10)
y

## [1]  2  9  7  2  9  7 10

round(seq(1,3,length=10), 2)

##  [1] 1.00 1.22 1.44 1.67 1.89 2.11 2.33 2.56 2.78 3.00

seq(from = 2, by = -0.1, length.out = 4)

## [1] 2.0 1.9 1.8 1.7

x <- rep(3,4)
x

## [1] 3 3 3 3

rep(1:5,3)

##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

x <- c(7,3,5,2,0,1)
y <- x[-3]
y

## [1] 7 3 2 0 1

y <- x[-length(x)] # always drops the last element
y

## [1] 7 3 5 2 0
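rep() has two ways of repeating, and negative indexing can drop several elements at once. A short sketch of both:

```r
# rep(): times repeats the whole vector, each repeats every element
rep(1:3, times = 2)   # 1 2 3 1 2 3
rep(1:3, each = 2)    # 1 1 2 2 3 3

# Negative indices drop elements; several can be dropped at once
x <- c(7, 3, 5, 2, 0, 1)
x[-c(1, 2)]           # 5 2 0 1
x[-(1:3)]             # 2 0 1 (parentheses needed: -1:3 would mean -1 0 1 2 3)
```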

2.3 Lists

Another basic object in R is the list. A list is very similar to a vector, but it can contain objects of different classes. You can create a list using the list() function. Lists are mainly used to collect the outputs of functions; later we will see an important example with the lm() function.

L1 <- list(5, "a", 2)
print(L1)

## [[1]]
## [1] 5
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] 2

# L1 has 3 elements, and each element is considered as a vector
#pay attention to the double brackets. They show the elements of the list

L1 #auto printing

## [[1]]
## [1] 5
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] 2

length(L1)

## [1] 3

L2 <- list(c(1,2,3), c("One", "Two"), TRUE)
print(L2)

## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "One" "Two"
## 
## [[3]]
## [1] TRUE

L2

## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "One" "Two"
## 
## [[3]]
## [1] TRUE

L1[1]

## [[1]]
## [1] 5

print(L2[1])

## [[1]]
## [1] 1 2 3

print(L2[[1]])

## [1] 1 2 3
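The difference between single and double brackets shows up clearly in class(). A sketch using the same list L2:

```r
L2 <- list(c(1, 2, 3), c("One", "Two"), TRUE)
class(L2[1])        # "list": single brackets keep the list wrapper
class(L2[[1]])      # "numeric": double brackets return the element itself
L2[[1]][2]          # 2: take the first element, then its second value
L2[[2]] <- "Three"  # double brackets can also replace an element
L2[[2]]             # "Three"
```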

2.4 Numbers

Numbers in R are treated as numeric (real numbers with double precision). If you want an integer, you need to explicitly add L to the end of the number; otherwise it is a double.

Special numbers:

Inf, infinity, e.g. $\frac{1}{0}$
NaN, not a number, e.g. $\frac{0}{0}$
NA, which can be thought of as a missing value

x <- 1
print(x)

## [1] 1

class(x)

## [1] "numeric"

typeof(x)

## [1] "double"

y <- 1L
print(y)

## [1] 1

class(y)

## [1] "integer"

typeof(y)

## [1] "integer"

c1 <- "Heloo" # character variable
c2 <- "The World!" # another character variable
paste(c1, c2)

## [1] "Heloo The World!"

print(c(c1, c2))

## [1] "Heloo"      "The World!"

sqrt(-2)  #NaN stands for not a number

## Warning in sqrt(-2): NaNs produced

## [1] NaN
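The special numbers can be produced and tested directly. A sketch showing how is.nan(), is.na() and is.finite() treat them:

```r
1/0             # Inf
-1/0            # -Inf
0/0             # NaN
is.nan(0/0)     # TRUE
is.na(0/0)      # TRUE: NaN also counts as missing
is.nan(NA)      # FALSE: NA is missing but not NaN
is.finite(1/0)  # FALSE
```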

2.5 Changing Class of a Value

You saw that a vector contains values of only one class. If values of different classes are mixed together in a vector, an implicit coercion happens: R converts all the values to a single common class. Sometimes we want to change the type of a value ourselves; we do this with an explicit coercion using the as.SomeClass() functions:

as.numeric() to change the type to numeric, if possible
as.logical() to change the type to logical, if possible
as.character()
as.complex()
as.integer()

Sometimes R cannot convert one type to another; it then gives NA and a warning.

x <- 1:5 #sequence of numbers
class(x)

## [1] "integer"

y <- as.numeric(x)
class(y)

## [1] "numeric"

z <- as.logical(x)
print(z)

## [1] TRUE TRUE TRUE TRUE TRUE

class(z)

## [1] "logical"

u <- as.character(z)
print(u)

## [1] "TRUE" "TRUE" "TRUE" "TRUE" "TRUE"

class(u)

## [1] "character"

t <- as.numeric(u)

## Warning: NAs introduced by coercion

print(t)

## [1] NA NA NA NA NA

class(t)

## [1] "numeric"
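Implicit coercion always moves towards the more general class (logical, then integer, then double, then character), and explicit coercion has a few behaviours worth knowing. A sketch:

```r
# Implicit coercion moves "up": logical -> integer -> double -> character
class(c(TRUE, 2L))   # "integer"
class(c(1L, 2.5))    # "numeric"
class(c(1, "a"))     # "character"

# Explicit coercion examples
as.numeric("3.14")   # 3.14: text that looks like a number converts fine
as.integer(3.9)      # 3: the decimal part is dropped, not rounded
as.numeric(TRUE)     # 1
as.logical("T")      # TRUE
as.logical("yes")    # NA: not a recognised logical value
```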

#lists have no problem mixing data types. Very powerful!
x <- list(14, "Hello", TRUE, list(23, "Hi", TRUE, FALSE))
x

## [[1]]
## [1] 14
## 
## [[2]]
## [1] "Hello"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [[4]][[1]]
## [1] 23
## 
## [[4]][[2]]
## [1] "Hi"
## 
## [[4]][[3]]
## [1] TRUE
## 
## [[4]][[4]]
## [1] FALSE

print(x)

## [[1]]
## [1] 14
## 
## [[2]]
## [1] "Hello"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [[4]][[1]]
## [1] 23
## 
## [[4]][[2]]
## [1] "Hi"
## 
## [[4]][[3]]
## [1] TRUE
## 
## [[4]][[4]]
## [1] FALSE

#elements of a list are printed with double brackets; elements of other objects with single brackets

2.6 Factors

Categorical data in R are represented using factors. We will learn a lot about this type of data soon. Factors are stored as integers, but each integer is assigned a label. By default, R sorts the levels of a factor in alphabetical order. Factors can be unordered or ordered: R treats unordered factors as nominal categorical variables and ordered factors as ordinal categorical variables.

x <- factor(c("male", "fmale", "male", "male", "fmale", "male")) #create a factor object
print(x)

## [1] male  fmale male  male  fmale male 
## Levels: fmale male

levels(x) #alphabetical order

## [1] "fmale" "male"

nlevels(x)

## [1] 2

unclass(x)

## [1] 2 1 2 2 1 2
## attr(,"levels")
## [1] "fmale" "male"

table(x) #gives frequency count

## x
## fmale  male 
##     2     4

levels(x)

## [1] "fmale" "male"

summary(x)

## fmale  male 
##     2     4

#change the order of levels
#this is important in linear regression. The first level is used as the baseline level.
x <- factor(c("male", "fmale", "male", "male", "fmale", "male"), levels=c("male", "fmale"))
print(x)

## [1] male  fmale male  male  fmale male 
## Levels: male fmale

d <- c(1,1,2,3,1,3,3,2)
d[1]+d[2] # d is a plain numeric vector, so + works

## [1] 2

fd <- factor(d)
print(fd)

## [1] 1 1 2 3 1 3 3 2
## Levels: 1 2 3

fd[1]+fd[2] #factors, you will get warning

## Warning in Ops.factor(fd[1], fd[2]): '+' not meaningful for factors

## [1] NA

unclass(fd) # bring down to integer vector

## [1] 1 1 2 3 1 3 3 2
## attr(,"levels")
## [1] "1" "2" "3"

rd <- factor(d, labels=c("A", "B", "C")) # a factor is an integer vector where each integer has a label
print(rd)

## [1] A A B C A C C B
## Levels: A B C

levels(rd) <- c("AA", "BB", "CC")
print(rd)

## [1] AA AA BB CC AA CC CC BB
## Levels: AA BB CC

is.factor(d)

## [1] FALSE

is.factor(fd)

## [1] TRUE
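A related trap: a factor built from numbers does not convert back to those numbers with as.numeric(), because that returns the underlying integer codes. A sketch with made-up values:

```r
f <- factor(c(10, 20, 10, 30))
as.numeric(f)                # 1 2 1 3: the level codes, not the values
as.numeric(as.character(f))  # 10 20 10 30: go through the labels first
as.numeric(levels(f))[f]     # 10 20 10 30: same result, converting each level only once
```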

#ordered factor variable
x1 <- factor(c("low", "high", "medium", "high", "low", "medium", "high"))
print(x1)

## [1] low    high   medium high   low    medium high  
## Levels: high low medium

x1f <- factor(x1, levels = c("low", "medium", "high"))
print(x1f)

## [1] low    high   medium high   low    medium high  
## Levels: low medium high

x1o <- ordered(x1, levels = c("low", "medium", "high"))
print(x1o)

## [1] low    high   medium high   low    medium high  
## Levels: low < medium < high

min(x1o) ## works!

## [1] low
## Levels: low < medium < high

is.factor(x1o)

## [1] TRUE

attributes(x1o)

## $levels
## [1] "low"    "medium" "high"  
## 
## $class
## [1] "ordered" "factor"

The gl() function generates factor levels. It takes two integers as input, which indicate how many levels there are and how many times each level is repeated:

gl(n, k, labels)
n is the number of levels
k is the number of repetitions of each level
labels is a vector of labels

v <- gl(3, 4, labels = c("H1", "H2","H3"))
print(v)

##  [1] H1 H1 H1 H1 H2 H2 H2 H2 H3 H3 H3 H3
## Levels: H1 H2 H3

class(v)

## [1] "factor"
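A few more gl() calls, as a sketch. Without labels, the levels are simply 1 to n; the length argument repeats the whole pattern; and ordered = TRUE gives an ordered factor.

```r
# Without labels, gl() uses 1, 2, ..., n as the level labels
gl(2, 3)               # 1 1 1 2 2 2
# length repeats the whole pattern: with k = 1 the levels alternate
gl(2, 1, length = 6)   # 1 2 1 2 1 2
# ordered = TRUE creates an ordered factor (low < medium < high)
gl(3, 1, labels = c("low", "medium", "high"), ordered = TRUE)
```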

2.7 Missing Values

A variable might not have a value, or its value might be missing. In R, missing values are displayed by the symbol NA (not available).

NA means not available
NA makes certain calculations impossible
is.na() tests for missing values
is.nan() tests for NaN
NA values have a class too (for example, NA is logical and NA_real_ is numeric)

x1 <- c(4, 2.5, 3, NA, 1)
summary(x1)  # Works with NA

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   2.125   2.750   2.625   3.250   4.000       1

mean(x1)  # returns NA

## [1] NA

mean(x1, na.rm=TRUE)

## [1] 2.625

is.na(x1)

## [1] FALSE FALSE FALSE  TRUE FALSE

# To find missing values
which(is.na(x1))  # Give index number

## [1] 4

# Ignore missing values with na.rm = TRUE
mean(x1, na.rm = TRUE)

## [1] 2.625

# Replace missing values with 0 (or other number)
# In data wrangling you will learn a lot about this.
x2 <- x1
x2[is.na(x2)] <- 0
x2

## [1] 4.0 2.5 3.0 0.0 1.0
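NA spreads through most calculations, and it cannot be tested with ==. A sketch of the common patterns:

```r
x <- c(4, NA, 1)
x + 1                 # 5 NA 2: arithmetic with NA gives NA
sum(x)                # NA
sum(x, na.rm = TRUE)  # 5
NA == NA              # NA, so use is.na() rather than == to find missing values
sum(is.na(x))         # 1: number of missing values (TRUE counts as 1)
mean(is.na(x))        # proportion of missing values, here 1/3
```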

2.8 Subsetting

[] always returns an object of the same class
[[]] is used to extract elements from a list or data frame. It always returns a single element.
$\$$ is used to extract elements from a list or data frame by name

x <- c("a1", "a2", "a3", "a4", "a5", "a6")

x[1] #extracts the first item. it's a vector

## [1] "a1"

x[2:5] # extracts a sequence. it's a vector

## [1] "a2" "a3" "a4" "a5"

x <- list(prime=c(2,3,5,7), even=c(0,2,4,6), odd=c(1,3,5,7), digit=3.14)

print(x)

## $prime
## [1] 2 3 5 7
## 
## $even
## [1] 0 2 4 6
## 
## $odd
## [1] 1 3 5 7
## 
## $digit
## [1] 3.14

print(x[1]) #extracts the first element of the list, and it is a list

## $prime
## [1] 2 3 5 7

class(x[1])

## [1] "list"

print(x[[1]]) #extracts the first element and returns a vector.

## [1] 2 3 5 7

print(x[4])

## $digit
## [1] 3.14

print(x[[4]])

## [1] 3.14

x$digit

## [1] 3.14

x[c(1,4)]

## $prime
## [1] 2 3 5 7
## 
## $digit
## [1] 3.14

2.9 Vectorised Operations

Vectorised operations make life much easier! In R we can treat a vector like a single variable: when we want to apply a calculation to all the members of a vector, or element by element between two vectors, we can write it once without a loop.

x <- 1:4
2*x

## [1] 2 4 6 8

y <- 2:5
print(x+y)

## [1] 3 5 7 9

x[x>2]

## [1] 3 4

print(x*y)

## [1]  2  6 12 20

print(x>y)

## [1] FALSE FALSE FALSE FALSE

# Matrices will be covered soon.
m1 <- matrix(1:4,2,2)
m2 <- matrix(2:5, 2,2)

m1+m2

##      [,1] [,2]
## [1,]    3    7
## [2,]    5    9

m1*m2

##      [,1] [,2]
## [1,]    2   12
## [2,]    6   20

m1%*%m2 #matrix multiplication

##      [,1] [,2]
## [1,]   11   19
## [2,]   16   28
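When two vectors in a vectorised operation have different lengths, R recycles the shorter one. A sketch; if the longer length is not a multiple of the shorter one, R still recycles but gives a warning.

```r
c(1, 2, 3, 4) + c(10, 20)  # 11 22 13 24: c(10, 20) is reused as 10 20 10 20
c(1, 2, 3, 4) * 2          # 2 4 6 8: a single number is recycled to every element
```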

R functions can operate on entire vectors, and conditions can be used to select certain elements within a vector. Here is a list of frequently used functions:

max(x)
min(x)
sum(x)
mean(x)
var(x)
sd(x)
median(x)
range(x)
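Applied to a small vector whose results are easy to check by hand, these functions give the following (a sketch):

```r
x <- c(1, 2, 3, 4, 5)
max(x)     # 5
min(x)     # 1
sum(x)     # 15
mean(x)    # 3
var(x)     # 2.5 (sample variance, divides by n - 1)
sd(x)      # square root of var(x), about 1.58
median(x)  # 3
range(x)   # 1 5: the minimum and maximum together
```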

3. Data Tables

3.1 Matrices

A matrix is a rectangular array of numbers. Technically, it is a vector with an additional dim (dimension) attribute that stores the numbers of rows and columns. The vectors we considered so far were one-dimensional; matrices are a special type of vector with a dimension attribute. In other words, matrices are multi-dimensional vectors.

m <- matrix(nrow=2, ncol=3) #empty matrix with dimension
m

##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA

print(m)

##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA

attributes(m)

## $dim
## [1] 2 3

dim(m)

## [1] 2 3

print(paste(dim(m)[1], " + ", dim(m)[2]))

## [1] "2  +  3"

m <- matrix(c(1,3,6,2,8,4), nrow=2, ncol=3 ) #by default, matrices are filled column-wise
print(m)

##      [,1] [,2] [,3]
## [1,]    1    6    8
## [2,]    3    2    4

str(m) # one of the most important functions

##  num [1:2, 1:3] 1 3 6 2 8 4

m[2,2]

## [1] 2
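Leaving one index empty selects a whole row or column, and byrow = TRUE fills a matrix row by row instead of column by column. A sketch with the same values:

```r
m <- matrix(c(1, 3, 6, 2, 8, 4), nrow = 2, ncol = 3)
m[1, ]   # 1 6 8: first row (column index left empty)
m[, 2]   # 6 2: second column (row index left empty)

# byrow = TRUE fills the same values row by row instead
mr <- matrix(c(1, 3, 6, 2, 8, 4), nrow = 2, byrow = TRUE)
mr       # rows are 1 3 6 and 2 8 4
t(m)     # transpose: a 3 x 2 matrix
```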

Other commonly used approaches to create matrix are cbind() and rbind().

#two other methods to create matrices
x <- c(1,11,111)
y <- c(2,22,222)
m1 <- cbind(x,y) #column-binding
print(m1)

##        x   y
## [1,]   1   2
## [2,]  11  22
## [3,] 111 222

print("****")

## [1] "****"

m2 <- rbind(x,y) #row-binding
print(m2)

##   [,1] [,2] [,3]
## x    1   11  111
## y    2   22  222

3.2 Data Frames

Data frames are very important objects in R. When you have $m$ observations of $n$ attributes, you have a data frame of size $m\times n$. Because the attributes can be of any class, a data frame is technically a list, with each component being a vector corresponding to a column of the data. Therefore, data frames are a special type of list in which every element has the same length. Data frames can store a different class of object in each column, whereas a matrix must have the same class for every element.

# to create a dataframe
x <- c(1,2,3)
y <- c("a", "b", "c")
z <- c(TRUE, TRUE, FALSE)
df <- data.frame(x,y,z)
print(df)

##   x y     z
## 1 1 a  TRUE
## 2 2 b  TRUE
## 3 3 c FALSE

attributes(df)

## $names
## [1] "x" "y" "z"
## 
## $row.names
## [1] 1 2 3
## 
## $class
## [1] "data.frame"

nrow(df)

## [1] 3

ncol(df)

## [1] 3

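df[2,2] # output below is from R < 4.0.0; since R 4.0.0, data.frame() keeps strings as characters by default, so this prints [1] "b"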

## [1] b
## Levels: a b c

z <- data.frame(c(1,2), c(3,4))
z

##   c.1..2. c.3..4.
## 1       1       3
## 2       2       4

class(z)

## [1] "data.frame"

z1 <- data.frame(cbind(c(1,2), c(3,4)))
z1

##   X1 X2
## 1  1  3
## 2  2  4

class(z1)

## [1] "data.frame"
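The column names c.1..2. and c.3..4. above are generated automatically from the unnamed arguments. Naming the arguments gives readable column names; a sketch with the arbitrary names a and b:

```r
z2 <- data.frame(a = c(1, 2), b = c(3, 4))
z2
names(z2)   # "a" "b"
z2$b        # 3 4: a column taken out by name with $
```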

Names

x <- c(3,5,7)
names(x)

## NULL

names(x) <- c("low", "med", "high")
print(x)

##  low  med high 
##    3    5    7

names(x)

## [1] "low"  "med"  "high"

x

##  low  med high 
##    3    5    7

names(x) <- NULL
x

## [1] 3 5 7

y <- list(low=3, med=5, high=7)
print(y)

## $low
## [1] 3
## 
## $med
## [1] 5
## 
## $high
## [1] 7

# Access list elements by their name
y$low

## [1] 3

print(y$low)

## [1] 3

m <- matrix(1:6, nrow=3, ncol=2)
dimnames(m)<- list(c("a", "b", "c"), c("d", "e"))

print(m)

##   d e
## a 1 4
## b 2 5
## c 3 6

colnames(m) <- c("male", "fmale")
rownames(m) <- c("ice-cream", "coffee", "cake")
print(m)

##           male fmale
## ice-cream    1     4
## coffee       2     5
## cake         3     6

print(df)

##   x y     z
## 1 1 a  TRUE
## 2 2 b  TRUE
## 3 3 c FALSE

row.names(df) <- c("f1", "f2", "f3")
print(df)

##    x y     z
## f1 1 a  TRUE
## f2 2 b  TRUE
## f3 3 c FALSE

colnames(df) <- c("rank", "character", "value")
print(df)

##    rank character value
## f1    1         a  TRUE
## f2    2         b  TRUE
## f3    3         c FALSE

names(df) <- c("r1", "r2", "r3")
print(df)

##    r1 r2    r3
## f1  1  a  TRUE
## f2  2  b  TRUE
## f3  3  c FALSE

attributes(df)

## $names
## [1] "r1" "r2" "r3"
## 
## $row.names
## [1] "f1" "f2" "f3"
## 
## $class
## [1] "data.frame"

class(df)

## [1] "data.frame"

mode(df)

## [1] "list"

typeof(df)

## [1] "list"
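Because a data frame is a list of equal-length columns, list tools work on it; for instance, sapply() can apply class() to every column. The sketch rebuilds a small data frame like df, setting stringsAsFactors = FALSE explicitly so the character column stays character on any R version:

```r
df <- data.frame(r1 = c(1, 2, 3), r2 = c("a", "b", "c"), r3 = c(TRUE, TRUE, FALSE),
                 stringsAsFactors = FALSE)
sapply(df, class)  # r1 "numeric", r2 "character", r3 "logical"
df[["r2"]]         # a column extracted with list-style double brackets
length(df)         # 3: the length of a list is its number of elements (columns)
```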

x <- 5
print(x)

## [1] 5

names(x)

## NULL

names(x) <- c("low")
print(x)

## low 
##   5

names(x)

## [1] "low"

attributes(x)

## $names
## [1] "low"

Matrices and data frames are very similar to each other, as both are generally two-dimensional. However, matrices are extensions of vectors, and data frames are extensions of lists. All the data in a matrix must be of the same type. Therefore, when your data has different data types, use data frames.

m1 <- matrix(1:25,5,5)
m1

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   15   20   25

str(m1)

##  int [1:5, 1:5] 1 2 3 4 5 6 7 8 9 10 ...

is.matrix(m1)

## [1] TRUE

is.data.frame(m1)

## [1] FALSE

df1 <- as.data.frame(m1)
df1

##   V1 V2 V3 V4 V5
## 1  1  6 11 16 21
## 2  2  7 12 17 22
## 3  3  8 13 18 23
## 4  4  9 14 19 24
## 5  5 10 15 20 25

str(df1)

## 'data.frame':    5 obs. of  5 variables:
##  $ V1: int  1 2 3 4 5
##  $ V2: int  6 7 8 9 10
##  $ V3: int  11 12 13 14 15
##  $ V4: int  16 17 18 19 20
##  $ V5: int  21 22 23 24 25

#object.size() reports how much memory an object takes up in the computer
print(paste("the size of df1 is ", object.size(df1), " bytes and the size of m1 is ", object.size(m1), " bytes" ))

## [1] "the size of df1 is  1264  bytes and the size of m1 is  328  bytes"

3.3 Reading and Writing Data in R

Generally, we read data from a file. In this unit we will focus on reading .txt (tab-delimited) and .csv (comma-separated values) data files. In all cases, we will read a data file into a data frame; that is why being able to manipulate a data frame is very important. Make sure that either the data file is in your current working directory or you give a path to the location of the file. Besides the name of the file, you can pass a number of other parameters; see ?read.table or ?read.csv to get an idea.

read.table() to read a .txt data file, and read.csv() for .csv files
source() to run an .R script and make the code inside the file available
write.table() and write.csv() to export data to a file

mydata <- read.table("c:/mydata.csv", header=TRUE, sep=",", row.names="id")

After working with a dataset, we might like to save it.

write.table(mydata, "c:/mydata.txt", sep="\t")

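Important parameters:

header=TRUE: the first row contains the column names
sep="": (the read.table() default) fields are separated by white space, which includes tabs
sep="\t": fields are separated by tabs only
sep=",": fields are separated by commas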

getwd() #gives you the current working directory

## [1] "\\\\ad.monash.edu/home/User005/xiaoleig/Documents/Other/Monash work/FIT5197/Tutes_new/Week 1"

#pay attention to the way that a directory is represented in your OS

dir() # a list of files and folders

##  [1] "airfoil_self_noise.txt" "data.a1.txt"           
##  [3] "data.a2.txt"            "data.b.txt"            
##  [5] "mydata.csv"             "mydata222.txt"         
##  [7] "plot1.jpeg"             "plot2.png"             
##  [9] "saving_plot4.pdf"       "Tute_1.html"           
## [11] "Tute_1.Rmd"

data <- read.table(file='airfoil_self_noise.txt')

str(data)

## 'data.frame':    1502 obs. of  6 variables:
##  $ V1: int  1000 1250 1600 2000 2500 3150 4000 5000 6300 8000 ...
##  $ V2: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ V3: num  0.305 0.305 0.305 0.305 0.305 ...
##  $ V4: num  71.3 71.3 71.3 71.3 71.3 71.3 71.3 71.3 71.3 71.3 ...
##  $ V5: num  0.00266 0.00266 0.00266 0.00266 0.00266 ...
##  $ V6: num  125 126 128 127 126 ...

dim(data)

## [1] 1502    6

head(data)

##     V1 V2     V3   V4         V5      V6
## 1 1000  0 0.3048 71.3 0.00266337 125.201
## 2 1250  0 0.3048 71.3 0.00266337 125.951
## 3 1600  0 0.3048 71.3 0.00266337 127.591
## 4 2000  0 0.3048 71.3 0.00266337 127.461
## 5 2500  0 0.3048 71.3 0.00266337 125.571
## 6 3150  0 0.3048 71.3 0.00266337 125.201

write.csv(data, file="mydata.csv")

dir()

##  [1] "airfoil_self_noise.txt" "data.a1.txt"           
##  [3] "data.a2.txt"            "data.b.txt"            
##  [5] "mydata.csv"             "mydata222.txt"         
##  [7] "plot1.jpeg"             "plot2.png"             
##  [9] "saving_plot4.pdf"       "Tute_1.html"           
## [11] "Tute_1.Rmd"

write.table(data, file="mydata222.txt")

dir()

##  [1] "airfoil_self_noise.txt" "data.a1.txt"           
##  [3] "data.a2.txt"            "data.b.txt"            
##  [5] "mydata.csv"             "mydata222.txt"         
##  [7] "plot1.jpeg"             "plot2.png"             
##  [9] "saving_plot4.pdf"       "Tute_1.html"           
## [11] "Tute_1.Rmd"

# Split up data
a1 <- data[1:14, 1:3]  # Starting data
a2 <- data[1:14, 4:6]  # The remaining columns (4 to 6) of the same rows
b <- data[15:16, ]     # New rows to add
write.table(a1, "data.a1.txt", sep="\t")
write.table(a2, "data.a2.txt", sep="\t")
write.table(b, "data.b.txt", sep="\t")
rm(list=ls()) # Clear out everything to start fresh

# Import data
a1t <- read.table("data.a1.txt", sep="\t")
a2t <- read.table("data.a2.txt", sep="\t")
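A self-contained sketch of a full write-and-read round trip. The data frame and its column names are made up; tempfile() is used so no files in the working directory are touched, and row.names = FALSE stops the row labels from being written as an extra column.

```r
# A small made-up data frame
scores <- data.frame(id = 1:3, score = c(2.5, 3, 4.5), name = c("a", "b", "c"),
                     stringsAsFactors = FALSE)

# tempfile() gives a path in a temporary folder
csv_file <- tempfile(fileext = ".csv")
write.csv(scores, csv_file, row.names = FALSE)  # don't write the 1, 2, 3 row labels
back_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

# The same round trip with a tab-delimited text file
txt_file <- tempfile(fileext = ".txt")
write.table(scores, txt_file, sep = "\t", row.names = FALSE)
back_txt <- read.table(txt_file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

str(back_csv)
str(back_txt)
```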

3.4 Managing your files

getwd(): to get the current working directory, in essence where you are now
setwd(): to change the working directory
dir(): gives you a list of all files and folders
ls(): lists the existing variables

#options() # gives you the settings of R. Most of its parameters are not changeable in jupyterhub

3.5 Built-in Datasets

There are plenty of interesting datasets already available in R. In fact, there is a package, datasets, which is installed and loaded by default and contains many datasets. We will use these built-in datasets a lot.

#To see a list of the available datasets
data()

?airmiles

str(airmiles)

##  Time-Series [1:24] from 1937 to 1960: 412 480 683 1052 1385 ...
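Datasets from the datasets package can be used directly by name. A sketch using airmiles and two other datasets from the same package, mtcars and iris:

```r
dim(mtcars)           # 32 11: 32 cars described by 11 variables
head(mtcars$mpg)      # first few values of the mpg (miles per gallon) column
levels(iris$Species)  # "setosa" "versicolor" "virginica"
length(airmiles)      # 24 yearly values
```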

3.6 Packages

Packages are collections of R functions that are ready to use.

library() lists all installed packages
search() shows the packages currently loaded
install.packages() installs a package; you don't need this in JupyterHub
require() (or library() with a package name) loads a package so you can use it

# See current packages
search()   # Shows packages that are currently loaded

## [1] ".GlobalEnv"        "package:stats"     "package:graphics" 
## [4] "package:grDevices" "package:utils"     "package:datasets" 
## [7] "package:methods"   "Autoloads"         "package:base"

# TO INSTALL AND USE PACKAGES
# Can use menus: Tools > Install Packages... (or use Package window)
# Or use scripts, which can be saved and incorporated into source code
#install.packages("ggplot2")  # Downloads package from CRAN and installs in R

# Make package available; 
require("ggplot2")

## Loading required package: ggplot2
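require() returns TRUE or FALSE instead of stopping, so it can be used to check whether a package is available, whereas library() stops with an error when a package is missing. The sketch uses a deliberately made-up package name, notARealPackage123, to show the FALSE case; the warning R gives is suppressed to keep the output short.

```r
# require() returns its TRUE/FALSE result invisibly, so store it to inspect it
ok <- suppressWarnings(require("notARealPackage123", quietly = TRUE))
ok                             # FALSE: no package with this made-up name is installed
"package:stats" %in% search()  # TRUE: stats is attached by default
```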

3.7 Frequently used functions

length(object) # number of elements or components
str(object) # structure of an object
class(object) # class or type of an object
names(object) # names
c(object,object,…) # combine objects into a vector
cbind(object, object, …) # combine objects as columns
rbind(object, object, …) # combine objects as rows
ls() # list current objects
rm(object) # delete an object
sort()

# sort is another useful function
x <- c(2,5,3,9,4,1)
x

## [1] 2 5 3 9 4 1

sort(x, decreasing = FALSE)

## [1] 1 2 3 4 5 9

sort(x, decreasing = TRUE)

## [1] 9 5 4 3 2 1
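sort() returns the sorted values, while order() returns the positions that would sort them; order() is what you need to sort the rows of a data frame. A sketch (the small data frame is made up):

```r
x <- c(2, 5, 3, 9, 4, 1)
order(x)      # 6 1 3 5 2 4: positions of the smallest, next smallest, ... values
x[order(x)]   # 1 2 3 4 5 9: same as sort(x)
rev(sort(x))  # 9 5 4 3 2 1: same as sort(x, decreasing = TRUE)

# Sorting the rows of a data frame by one column
df <- data.frame(name = c("a", "b", "c"), score = c(20, 35, 10), stringsAsFactors = FALSE)
df[order(df$score), ]  # rows in the order c, a, b
```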

4. Controlling the Execution flow

4.1 Logical Expressions

R allows us to create logical expressions and vectors in order to manipulate logical quantities. To create logical vectors, you may use the Boolean values TRUE, FALSE, or NA (for missing / not available) directly, in addition to using conditions and logical operations. Note that in arithmetic R treats TRUE as 1 and FALSE as 0.

R Relational Operators

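< (less than)
> (greater than)
<= (less than or equal to)
>= (greater than or equal to)
== (equal to)
!= (not equal to)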

R Logical Operators

x & y (x and y): element-wise logical AND
x && y: logical AND on single values
x | y (x or y): element-wise logical OR
x || y: logical OR on single values
!x (not x): logical NOT

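The operators & and | work element-wise, producing a result as long as the longer operand. By contrast, && and || produce a single logical value and are meant for single TRUE/FALSE operands, such as the condition of an if statement. Older versions of R examined only the first element of longer operands; since R 4.3.0, giving && or || an operand longer than one is an error.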

2 > 3

## [1] FALSE

4 != 5

## [1] TRUE

(3 != 12) & (2.7 >= 1.9)

## [1] TRUE

y <- c(TRUE, TRUE, FALSE, TRUE, 5 > 2)
y

## [1]  TRUE  TRUE FALSE  TRUE  TRUE

sum(y)

## [1] 4

z <- 3
z>= 3 && z<7

## [1] TRUE

z<10 || z>5

## [1] TRUE

x <- c(1:10)
x[(x>8) | (x<5)]

## [1]  1  2  3  4  9 10

x <- c(1:10)
x[(x>=8) & (x>=5)]

## [1]  8  9 10

x <- 1
y <- 3
(x==1) & (y==3)

## [1] TRUE

(x==1) | (y!=3)

## [1] TRUE

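x <- c(TRUE, TRUE, FALSE, TRUE, FALSE,  0, 5) # mixing logical and numeric values makes x numeric: TRUE becomes 1, FALSE becomes 0
y <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
x

## [1] 1 1 0 1 0 0 5

# in logical operations, zero counts as FALSE and any nonzero number as TRUE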

!x

## [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE

x & y

## [1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

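x && y # an error in R >= 4.3.0 because x and y are longer than one; the output below is from an older version, which used only the first elements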

## [1] FALSE

x | y

## [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE

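x || y # an error in R >= 4.3.0 because x and y are longer than one; the output below is from an older version, which used only the first elements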

## [1] TRUE

x <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
y <- c(TRUE, TRUE,TRUE, TRUE,TRUE)

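x && y # an error in R >= 4.3.0 because x and y are longer than one; the output below is from an older version, which used only the first elements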

## [1] TRUE

#ifelse function
x <- seq(10)
ifelse(x %% 2 == 0,"even","odd")

##  [1] "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even"
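ifelse() is vectorised and works element by element, while && and || expect single values. To use a whole vector in a condition with && or ||, reduce it to one TRUE/FALSE first, for example with any() or all() (used in the next section). A sketch:

```r
x <- c(-2, 0, 3)
ifelse(x > 0, "positive", "not positive")  # one result per element

n <- 5
# && evaluates its right-hand side only when the left-hand side is TRUE,
# so is.numeric(n) is checked before n > 0 is evaluated
if (is.numeric(n) && n > 0) print("n is a positive number")

# Reduce vectors to single values before combining them with && or ||
all(x > -5) && any(x > 2)  # TRUE
```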

4.2 Control Structures

Control structures let you control the flow of execution of a program:

if, else: test a condition
for: loop a fixed number of times
while: loop while a condition is TRUE
break: exit a loop
next: skip to the next iteration
return: exit a function
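Of these, return is the only one that acts on a function rather than a loop. Here is a minimal sketch. The helper first_negative() is hypothetical and written only for illustration. It leaves the function as soon as it finds a negative value.

```r
# hypothetical helper: position of the first negative value, or NA if there is none
first_negative <- function(v) {
    for (i in seq_along(v)) {
        if (v[i] < 0) return(i)  # exit the function immediately
    }
    NA                           # reached only when no value is negative
}
first_negative(c(3, 1, -2, 5))   # 3
first_negative(c(3, 1))          # NA
```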

If structure

if statement

if (test_expression) { statement }

if-else

if (test_expression) { statement1 } else { statement2 }

Nested if

if (test_expression1) { statement1 } else if (test_expression2) { statement2 } else if (test_expression3) { statement3 } else { statement4 }

x <- 2
if(x == 2){
    print("Yesss")
}

## [1] "Yesss"

if(x > 2){
    print("Greater")
} else if(x < 2) {
    print("Smaller")
} else {
    print("Equal")
}

## [1] "Equal"

any() and all() functions

x <- 1:10
if (any(x > 4)) print("Well done!")

## [1] "Well done!"

if (any(x > 12)) print("No Way!")
if (all(x > 7)) print("Another one!")
if (all(x > 0)) print("Hit the road!")

## [1] "Hit the road!"
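The condition of an if must be a single TRUE or FALSE. Since R 4.2.0, a condition of length greater than one is an error, which is why any() and all() are useful here: they collapse a logical vector into one value. They can still return NA when the vector contains NA, and if() rejects NA as well. The argument na.rm = TRUE drops missing values, and isTRUE() treats NA as FALSE. A small sketch with an example vector:

```r
v <- c(2, NA, 9)
any(v > 5)                 # TRUE: one TRUE is enough, whatever the NA is
all(v > 5)                 # FALSE: 2 > 5 is FALSE
all(v > 1)                 # NA: the answer depends on the missing value
all(v > 1, na.rm = TRUE)   # TRUE: the NA is ignored
if (isTRUE(all(v > 1))) print("all above 1") else print("not confirmed")
```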

For structure

for (val in sequence) { statement }

for (i in 1:5){
    print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

y <- c("a", "b", "c", "d")
# seq_along(y) gives 1, 2, ..., length(y): one iteration per element of y
for (i in seq_along(y)){
    print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4

seq_along(6) # 6 is a vector of length one, so this returns 1, not 1:6

## [1] 1
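A common reason to prefer seq_along() over 1:length() is the empty vector case. There, 1:length(v) counts down from 1 to 0 and runs the loop twice. A minimal sketch:

```r
v <- numeric(0)     # an empty vector
1:length(v)         # 1 0: two iterations, usually a bug
seq_along(v)        # integer(0): no iterations
n_iter <- 0
for (i in seq_along(v)) n_iter <- n_iter + 1
n_iter              # 0
```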

for(k in c("a", "b", "c", "d")){
    print(k)
}

## [1] "a"
## [1] "b"
## [1] "c"
## [1] "d"

# nested for loop
m <- matrix(nrow=2, ncol=3)
for (i in 1:nrow(m)){
    for(j in 1:ncol(m)){
        m[i,j] <- i*j
    }
}
m

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    2    4    6
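The same product table can also be built without explicit loops. The sketch below uses outer(), which applies `*` to every pair of elements from its two arguments. The name m2 is just for illustration.

```r
m2 <- outer(1:2, 1:3)   # m2[i, j] = i * j, computed without explicit loops
m2
```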

count <- 5
while(count >0){
    print(count)
    count <- count -1
}

## [1] 5
## [1] 4
## [1] 3
## [1] 2
## [1] 1

for(i in 1:10){
    if(i %% 2==0){
        next
    }
    print(i)
}

## [1] 1
## [1] 3
## [1] 5
## [1] 7
## [1] 9

for(i in 1:20){
    if(i %% 2==0){
        next
    }
    print(i)
    if(i>10){
        break
    }
}

## [1] 1
## [1] 3
## [1] 5
## [1] 7
## [1] 9
## [1] 11

While structure

while (test_expression) { statement }

count <- 1

while (count < 6) {
   print(count)
   count <- count+1
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
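A while loop is most useful when the number of iterations is not known in advance. A minimal sketch, with example values chosen for illustration: count how many doublings it takes for 1 to exceed 100.

```r
value <- 1
steps <- 0
while (value <= 100) {
    value <- value * 2   # double the value
    steps <- steps + 1   # count the doublings
}
steps   # 7, since 2^7 = 128 is the first power of 2 above 100
```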

5. Functions

5.1 User defined Functions

It is easy to define a function. The general form is: func_name <- function(argument) { statement }. Here is a simple example.

# your function needs a name
myfunc1 <- function(n){
    n*n
} # the function returns the value of its last expression

# to call a function
myfunc1(5)

## [1] 25

t <- 1:5
myfunc1(t)

## [1]  1  4  9 16 25

x <- 10
y <- myfunc1(x)
y

## [1] 100

# the argument y has a default value: if you do not supply it, it is 2
f2 <- function(x,y=2){
    x+y
}

f2(5,5)

## [1] 10

f2(5)

## [1] 7
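Arguments can also be matched by name, in which case their order does not matter. A minimal sketch that repeats the definition of f2 so the block runs on its own:

```r
f2 <- function(x, y = 2) {
    x + y
}
f2(y = 10, x = 1)   # 11: arguments matched by name, in any order
f2(1, 10)           # 11: arguments matched by position
f2(x = 1)           # 3: y takes its default value
```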

fun3 <- function(x){
    str(x)
}

fun3(f2)

## function (x, y = 2)  
##  - attr(*, "srcref")=Class 'srcref'  atomic [1:8] 2 7 4 1 7 1 2 4
##   .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x0000000017fc5648>

col.mean <- function(y, removeNA=TRUE){
    nc <- ncol(y)
    means <- numeric(nc) # a numeric vector of length nc, filled with 0
    for(i in seq_len(nc)){
        means[i] <- mean(y[,i], na.rm=removeNA) # mean() returns NA if any value is NA, unless na.rm=TRUE
    }
    means
}
# the value of the very last expression (means) is returned

col.mean(airquality)

## [1]  42.129310 185.931507   9.957516  77.882353   6.993464  15.803922

dim(airquality)

## [1] 153   6
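Base R already provides vectorized alternatives to the loop in col.mean(). In the minimal sketch below, colMeans() and sapply() with mean() should give the same values as the hand-written function.

```r
loop_free <- colMeans(airquality, na.rm = TRUE)        # built-in, vectorized
per_column <- sapply(airquality, mean, na.rm = TRUE)   # applies mean() to each column
round(loop_free, 2)
all.equal(loop_free, per_column)                       # TRUE
```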

make.power <- function(n){ # a function that returns a function (a closure)
    pow <- function(x){
        x^n
    }
    pow
}

cube <- make.power(3)

cube(2)

## [1] 8
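Each function returned by make.power() remembers the value of n from the call that created it. A minimal sketch, using a shorter but equivalent definition so the block runs on its own:

```r
make.power <- function(n) {
    function(x) x^n      # the returned function keeps access to n
}
square <- make.power(2)
cube <- make.power(3)
square(4)                # 16
cube(2)                  # 8
environment(cube)$n      # 3: the value of n captured when cube was created
```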

drawFun <- function(f){
    x <- seq(-5, 5, len=1000)
    y <- sapply(x, f)
    plot(x, y, type="l", col="blue")
}

drawFun(cos)

R Built-in Functions

To use R's built-in functions, we must supply the arguments they expect. A function takes arguments as input and returns an object as output.

x <- 1:10
sum(x)

## [1] 55

length(x)

## [1] 10

median(x)

## [1] 5.5

?seq # open the help page for seq

# type the name of a function without parentheses or arguments to print its code
seq

## function (...) 
## UseMethod("seq")
## <bytecode: 0x00000000174ba060>
## <environment: namespace:base>

# UseMethod means seq() is generic: several methods (functions) are
# associated with it, chosen by the class of the first argument
# some methods may be hidden (not exported); methods() lists them

methods(seq)

## [1] seq.Date    seq.default seq.POSIXt 
## see '?methods' for accessing help and source code

#seq.Date

seq()

## [1] 1

args(seq)

## function (...) 
## NULL

args(round)

## function (x, digits = 0) 
## NULL
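As args() shows, built-in functions accept arguments by position or by name. Named arguments can be given in any order. A minimal sketch with round() and seq():

```r
round(3.14159, digits = 2)       # 3.14, digits given by name
round(3.14159, 2)                # 3.14, digits given by position
seq(1, 10, by = 3)               # 1 4 7 10
seq(to = 10, from = 1, by = 3)   # same result: named arguments can come in any order
```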

ls

## function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE, 
##     pattern, sorted = TRUE) 
## {
##     if (!missing(name)) {
##         pos <- tryCatch(name, error = function(e) e)
##         if (inherits(pos, "error")) {
##             name <- substitute(name)
##             if (!is.character(name)) 
##                 name <- deparse(name)
##             warning(gettextf("%s converted to character string", 
##                 sQuote(name)), domain = NA)
##             pos <- name
##         }
##     }
##     all.names <- .Internal(ls(envir, all.names, sorted))
##     if (!missing(pattern)) {
##         if ((ll <- length(grep("[", pattern, fixed = TRUE))) && 
##             ll != length(grep("]", pattern, fixed = TRUE))) {
##             if (pattern == "[") {
##                 pattern <- "\\["
##                 warning("replaced regular expression pattern '[' by  '\\\\['")
##             }
##             else if (length(grep("[^\\\\]\\[<-", pattern))) {
##                 pattern <- sub("\\[<-", "\\\\\\[<-", pattern)
##                 warning("replaced '[<-' by '\\\\[<-' in regular expression pattern")
##             }
##         }
##         grep(pattern, all.names, value = TRUE)
##     }
##     else all.names
## }
## <bytecode: 0x0000000013a54ce0>
## <environment: namespace:base>

6. Simulation

6.1 Generating Random Numbers

R provides functions for working with probability distributions. They let us simulate random variables from a given distribution and evaluate its density, cumulative distribution, and quantiles. For the normal distribution:

rnorm: generates random normal variables
dnorm: evaluates the normal probability density
pnorm: evaluates the cumulative distribution function of the normal distribution
qnorm: evaluates normal quantiles

For each probability distribution, there are four related functions, distinguished by their first letter:

d for density
r for random number generation
p for cumulative distribution
q for quantile function

Examples:

dnorm(x, mean=0, sd=1, log=FALSE)
pnorm(q, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)
qnorm(p, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)
rnorm(n, mean=0, sd=1)

If $F$ is the cumulative distribution function of the standard normal distribution, then $\text{pnorm}(q)=F(q)$ and $\text{qnorm}(p)= F^{-1}(p)$.
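A quick numerical check of these relations for the standard normal distribution: qnorm() undoes pnorm(), and dnorm(0) is the peak of the density, $1/\sqrt{2\pi}$.

```r
pnorm(1.96)          # about 0.975: P(Z <= 1.96)
qnorm(0.975)         # about 1.96: the inverse of pnorm
qnorm(pnorm(1.5))    # 1.5: qnorm undoes pnorm
dnorm(0)             # 1 / sqrt(2 * pi), the peak of the standard normal density
```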

#Simulation
# rnorm, dnorm, pnorm, 
x <- rnorm(10)
x

##  [1]  0.7170349 -1.1499620  0.6184537 -0.6282082  1.1563672 -0.3281488
##  [7]  0.1507257  0.1272213 -1.3083004 -1.0102714

x <- rnorm(10,20,2) # 10 values with mean 20 and standard deviation 2
x

##  [1] 19.67680 20.98605 21.33224 21.49759 18.40721 22.18899 21.81866
##  [8] 21.21492 21.12033 22.02307

summary(x)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.41   21.02   21.27   21.03   21.74   22.19

set.seed(1)
rnorm(5)

## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

rnorm(5)

## [1] -0.8204684  0.4874291  0.7383247  0.5757814 -0.3053884

set.seed(1)
rnorm(5)

## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

rnorm(5)

## [1] -0.8204684  0.4874291  0.7383247  0.5757814 -0.3053884
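The output above shows that resetting the seed reproduces exactly the same sequence. The same idea can be checked programmatically. The seed 42 is an arbitrary example value.

```r
set.seed(42)
a <- rnorm(3)
set.seed(42)         # reset the generator to the same state
b <- rnorm(3)
identical(a, b)      # TRUE: same seed, same draws
```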

ppois(2,2) # cumulative distribution: Pr(X <= 2) for a Poisson with lambda = 2

## [1] 0.6766764

ppois(4,2) # Pr(X <= 4)

## [1] 0.947347
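The cumulative probability is the sum of the individual probabilities. For a Poisson distribution, $\Pr(X=k)=e^{-\lambda}\lambda^k/k!$. A minimal sketch that checks ppois(2, 2) three ways. It also shows the upper tail, obtained with lower.tail = FALSE.

```r
lambda <- 2
sum(dpois(0:2, lambda))                      # 0.6766764, same as ppois(2, 2)
exp(-lambda) * (1 + lambda + lambda^2 / 2)   # the same value from the Poisson formula
ppois(2, lambda, lower.tail = FALSE)         # Pr(X > 2) = 1 - ppois(2, 2)
```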

set.seed(20)
x <- rnorm(100)
e <- rnorm(100,0,2)
y <- 0.5+2*x+e
summary(y)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -6.4084 -1.5402  0.6789  0.6893  2.9303  6.5052

plot(x,y)
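Because y was simulated from a known linear model, fitting a regression should approximately recover the true intercept 0.5 and slope 2. The estimates will not match exactly because of the noise e. A minimal sketch that repeats the simulation so it runs on its own:

```r
set.seed(20)
x <- rnorm(100)
e <- rnorm(100, 0, 2)
y <- 0.5 + 2 * x + e
fit <- lm(y ~ x)
coef(fit)   # estimates close to, but not exactly, the true intercept 0.5 and slope 2
```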

6.2 Random Sampling

The sample() function draws randomly from a specified set of (scalar) objects, allowing you to sample from arbitrary distributions.

Summary:

Drawing samples from a specific probability distribution can be done with the r functions, such as rnorm() and rpois()
Standard distributions include the Normal, Poisson, Binomial, Exponential, Gamma, etc.
The sample() function can be used to draw random samples from arbitrary vectors
Setting the random number generator seed via set.seed() is critical for reproducibility
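Sampling from an arbitrary discrete distribution uses the prob argument of sample(). Repeated draws from the same values need replace = TRUE. A minimal sketch: the biased coin probabilities and the number of die rolls are example values.

```r
set.seed(1)
coin <- sample(c("H", "T"), 20, replace = TRUE, prob = c(0.7, 0.3))  # a biased coin
table(coin)
die <- sample(1:6, 1000, replace = TRUE)    # 1000 rolls of a fair die
mean(die)                                   # close to 3.5
```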

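set.seed(1) # R 3.6.0 changed the default sampling algorithm, so newer versions of R give different draws than those shown below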
sample(1:10, 4) # without replacement

## [1] 3 4 5 7

sample(1:10,4)

## [1] 3 9 8 5

sample(letters, 5)

## [1] "q" "b" "e" "x" "p"

sample(1:10) #permutation

##  [1]  4  7 10  6  9  2  8  3  1  5

sample(1:10)

##  [1]  2  3  4  1  9  5 10  8  6  7

sample(1:10, replace=TRUE) #sample with replacement

##  [1] 2 9 7 8 2 8 5 9 7 8

7. Plotting

7.1 Building graphics from data

Data frames are a powerful tool for organizing data. However, large data sets are hard to interpret, no matter how well organized they are. A graph is often much easier to interpret than a table of numbers.

Some of the key base plotting functions:

plot(): creates a plot based on the object type of the input
lines(): adds lines to the plot (connects the points)
points(): adds points
text(): adds text labels to a plot at given x,y coordinates
title(): adds titles
mtext(): adds arbitrary text to the margins
axis(): adds axis ticks/labels

Some important parameters:

pch: the plotting symbol (plotting character)
lty: the line type; solid, dashed, …
lwd: the line width; lwd=2
col: the color; col="red"
xlab: the x-axis label; xlab="units"
ylab: the y-axis label; ylab="price"

plot(c(2,3), c(3,4))

x <- seq(-2*pi,2*pi,0.1)
plot(x, sin(x),
    main="my Sine function",
    xlab="the values",
    ylab="the sine values")

Different values for type:

"p": points (default)
"l": lines
"b": both points and lines
"c": empty points joined by lines
"o": overplotted points and lines
"s" and "S": stair steps
"h": histogram-like vertical lines
"n": does not produce any points or lines
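To compare several of these types at once, the plots can be drawn in a 2 x 2 grid. A minimal sketch: par() returns the previous layout when it is changed, so that layout can be restored afterwards.

```r
x <- seq(-2*pi, 2*pi, 0.5)
op <- par(mfrow = c(2, 2))          # 2 x 2 grid; op keeps the previous setting
for (tp in c("p", "l", "b", "h")) {
    plot(x, sin(x), type = tp, main = paste0('type = "', tp, '"'))
}
par(op)                             # restore the previous layout
```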

x <- seq(-2*pi,2*pi,0.1)
plot(x, sin(x),
    main="my Sine function",
    xlab="the values",
    ylab="the sine values",
    type="s",
    col="blue")

Calling plot() again replaces the current graph with a new one. Sometimes, however, we want to overlay several plots in order to compare the results. This is done with lines() and points(), which add lines and points, respectively, to the existing plot.

plot(x, sin(x),
 main="Overlaying Graphs",
 type="l",
 col="blue")

lines(x,cos(x), col="red")

legend("topleft",
      c("sin(x)","cos(x)"),
      fill=c("blue","red")
)

By setting graphical parameters we can, for example, put several graphs in a single figure. The par() function sets or queries global graphics parameters. R has many graphical parameters that control how graphs are displayed:

before making any change, record the current settings with oldpar <- par(no.readonly = TRUE); plain par() also returns read-only parameters, which cause "cannot be set" warnings when restored
las: the orientation of axis labels on the plot
bg: the background color
mar: the margin size
oma: the outer margin size
mfrow: c(nr, nc) draws the next plots in an nr-by-nc array, filled row by row
mfcol: like mfrow, but the plots are filled column by column
at the end, restore the original settings with par(oldpar)
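Inside a function, a common pattern is to restore the settings with on.exit(). The settings are then restored even if the plotting code fails. A minimal sketch: side_by_side() is a hypothetical helper written for illustration.

```r
# hypothetical helper: histogram and boxplot side by side
side_by_side <- function(v) {
    op <- par(mfrow = c(1, 2))   # par() returns the previous value of mfrow
    on.exit(par(op))             # restore it when the function exits, even after an error
    hist(v)
    boxplot(v)
}
side_by_side(airquality$Temp)
par("mfrow")                     # back to 1 1
```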

#par() # to see all the parameters

par("mar") # to see the margins, bottom, left, top, right

## [1] 5.1 4.1 4.1 2.1

# par(mfrow=c(1,2))    # set the plotting area into a 1*2 array

oldpar <- par(no.readonly = TRUE) # save only the parameters that can be set again
# make labels and margins smaller
par(cex=0.7, mai=c(0.1,0.1,0.2,0.1))

Temperature <- airquality$Temp

# define area for the histogram
par(fig=c(0.1,0.7,0.3,0.9))
hist(Temperature)

# define area for the boxplot
par(fig=c(0.8,1,0,1), new=TRUE)
boxplot(Temperature)

# define area for the stripchart
par(fig=c(0.1,0.67,0.1,0.25), new=TRUE)
stripchart(Temperature, method="jitter")

par(oldpar) # restore the saved settings

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

plot(mtcars$wt, mtcars$mpg, main="MPG and weight", col="blue", pch=5)
abline(lm(mtcars$mpg~mtcars$wt), col="red", lwd=3)

plot(mtcars$wt, mtcars$mpg, 
     main="MPG and weight", 
     col="blue", 
     pch=5,
    xlab="wt",
    ylab="mpg")
abline(lm(mtcars$mpg~mtcars$wt), col="red", lwd=3)

oldpar <- par(no.readonly = TRUE) # save only the parameters that can be set again
par(mfrow = c(1,2))
hist(islands, breaks = 16)
boxplot(islands)

par(oldpar)

drawFun <- function(f){
    x <- seq(-5, 5, len=1000)
    y <- sapply(x, f)
    plot(x, y, type="l", col="blue")
}

drawFun(sin)
abline(h=0, col="red", lwd=3, lty=1)    # horizontal line y = 0
abline(v=2, col="green", lwd=3, lty=2)  # vertical line x = 2
abline(2,1, col="pink", lwd=3, lty=3)   # intercept 2, slope 1: y = 2 + x

#develop a function which overlays a normal approximation density function and kernel density function over a histogram

funn <- function(x){
    hist(x, col="red", breaks=10, freq=FALSE) # freq=FALSE puts densities on the y-axis
    xfit <- seq(min(x)-10, max(x)+10, length=40)
    yfit <- dnorm(xfit, mean=mean(x), sd=sd(x)) # normal approximation
    lines(xfit, yfit, col="blue", lwd=2)
    d <- density(x) # kernel density estimate of x
    lines(d, col="green", lwd=2)
}

funn(mtcars$mpg)

7.2 Saving Graphs

Temperature <- airquality$Temp

# save as a JPEG file in the current directory
jpeg(file="plot1.jpeg")
hist(Temperature, col="darkgreen")
dev.off()

## png 
##   2

#saving as a png
png(file="plot2.png",
   width=600, height=350)
hist(Temperature, col="gold")
dev.off()

## png 
##   2

#saving as a pdf file
pdf(file="saving_plot4.pdf")
hist(Temperature, col="violet")
dev.off()

## png 
##   2
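The pattern is the same for every file device: open the device, draw, then close it with dev.off() so the file is written. A minimal sketch that writes to a temporary file so nothing is left in the working directory. The file name is an example.

```r
out <- file.path(tempdir(), "temperature.pdf")  # example file name in a temporary folder
pdf(file = out)
hist(airquality$Temp, col = "violet")
invisible(dev.off())   # closing the device writes the file
file.exists(out)       # TRUE
```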

x <- seq(-4,4, 0.01)
y <- sin(x)
plot(x,y, ylim=c(-2,7), type="l", col="blue")
lines(c(1.5,2.5,3),c(3,3,5), col="red")
lines(c(0,0.5,1),c(0,2,0), col="green")

plot(c(1,2,3), c(1,2,4))

x <- c(1,2,3)
y <- c(1,3,8)
plot(x,y)
lmout <- lm(y ~ x)
abline(lmout)

plot() is a generic function, meaning that it is a placeholder for a family of functions. The function that actually gets called depends on the class of the object it is called on. After creating a graph with plot(), you can add components one by one:

abline() adds a straight line to the current graph
lines() takes a vector of x values and a vector of y values and joins the points to each other
points() adds a set of (x,y) points
legend() adds a legend to a multicurve graph
text() places text anywhere in the current graph
mtext() adds text in the margins
polygon() draws arbitrary polygonal objects
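A minimal sketch that combines several of these components on one graph. The data points and label texts are example values.

```r
x <- c(0, 2, 3)
y <- c(1, 3, 8)
plot(x, y, type = "l", col = "blue")      # base plot: a line through the points
points(x, y, pch = 19, col = "red")       # add filled points on top
text(1, 6, "added with text()")           # label inside the plotting region
mtext("added with mtext()", side = 3)     # text in the top margin
legend("topleft", legend = c("line", "points"),
       col = c("blue", "red"), lty = c(1, NA), pch = c(NA, 19))
```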

plot(c(0,2,3), c(1,2,4))

x <- c(0,2,3)
y <- c(1,3,8)
plot(x,y) # same as before
fit <- lm(y ~ x) # a regression line
#The call to abline() then adds a line to the current graph. 
#abline(c(2,1)) adds y = x + 2
abline(fit) #adds a line to a plot.
abline(h=1, col="red")
abline(v=2, col="blue")
abline(3,4, col="green") # intercept 3, slope 4: y = 3 + 4x
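abline(fit) draws the fitted line using the intercept and slope stored in the model. The sketch below shows those coefficients and draws the same line from them explicitly. For these three points, the least-squares fit has intercept 3/7 and slope 15/7.

```r
x <- c(0, 2, 3)
y <- c(1, 3, 8)
fit <- lm(y ~ x)
coef(fit)    # intercept 3/7 (about 0.43), slope 15/7 (about 2.14)
plot(x, y)
abline(a = coef(fit)[1], b = coef(fit)[2], col = "purple")  # same line as abline(fit)
```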

plot(x,y, type="l", col="blue")
lines(c(1.5,2.5),c(3,3), col="red")
text(2.5,4,"R is COOL")

f <- function(x) return(sin(x))
curve(f,0,2)
polygon(c(1.2,1.4,1.4,1.2),c(0,0,f(1.3),f(1.3)),col="gray")

f <- function(x) return(1-exp(-x))
curve(f,0,2)
polygon(c(1.2,1.4,1.4,1.2),c(0,0,f(1.3),f(1.3)),col="gray")

plot(x,y)
lines(lowess(x,y))

g <- function(t) { return((t^2+1)^0.5) } # define g(); the whole expression must be inside return()
x <- seq(0,5,length=10000) # 10000 equally spaced points from 0 to 5
y <- g(x) # g() is vectorized, so y holds g() evaluated at every point of x
plot(x,y,type="l")
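In R, return() is an ordinary function call. Anything written after its closing parenthesis is never applied, because the function has already exited. A minimal sketch of the pitfall; bad() and good() are names used only for illustration.

```r
bad  <- function(t) return(t^2 + 1)^0.5    # return() exits before ^0.5 is applied
good <- function(t) return((t^2 + 1)^0.5)  # the whole expression is inside return()
bad(3)    # 10
good(3)   # 3.162278, i.e. sqrt(10)
```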

curve((x^2+1)^0.5,0,5)

x <- c(0,2,3)
y <- c(1,3,8)
plot(x,y) # same as before
fit <- lm(y ~ x) # a regression line
#The call to abline() then adds a line to the current graph. 
#abline(c(2,1)) adds y = x + 2
abline(fit) #adds a line to a plot.
abline(h=1, col="red")
abline(v=2, col="blue")
abline(3,4, col="green") # intercept 3, slope 4: y = 3 + 4x
curve((x^2+1)^0.5,0,5,add=TRUE, col="yellow")

f <- function(x) return((x^2+1)^0.5)
plot(f,0,5) # given a function, plot() dispatches to plot.function(), which draws it with curve()

Introductory Programming in R

Asef Nazari

19 July 2018