R Programming Quick Notes :: Part

Let us start our journey with vectors.

Let us assume a vector with at least n elements in it. The following are some of the operations one can perform on that vector:

[n] - return the element value at the n'th position
[-n] - return all the elements except the one at the n'th position
[m:n] - return all the elements starting at the m'th position and ending at the n'th position (where n > m)
[-(m:n)] - return all the elements except the ones starting at the m'th position and ending at the n'th position (where n > m)
[c(i, j, k, ...)] - return the elements in the specified positions i, j, k, ...
[-c(i, j, k, ...)] - return all the elements except those in the specified positions i, j, k, ...

Output (vector_ops.R)

> a <- sample(1:25, 10)
> print(a)
 [1]  2 11  7 18 16 22 12  9 24 25
>
> b <- sample(1:10, 10, replace = TRUE)
> print(b)
 [1]  2  1  2  4  7 10  2  4  5  9
>
> print(a[5])
[1] 16
>
> print(a[-5])
[1]  2 11  7 18 22 12  9 24 25
>
> print(b[4:6])
[1]  4  7 10
>
> print(b[-(4:6)])
[1] 2 1 2 2 4 5 9
>
> print(a[c(3, 5, 7)])
[1]  7 16 12
>
> print(a[-c(3, 5, 7)])
[1]  2 11 18 22  9 24 25
>
> print(b/2)
 [1] 1.0 0.5 1.0 2.0 3.5 5.0 1.0 2.0 2.5 4.5
>
> c <- a > 15
> print(a[c])
[1] 18 16 22 24 25
>
> d <- (a > 15) & (a <= 20)
> print(a[d])
[1] 18 16
>
> e <- (b > 3) & (b <= 7)
> print(b[e])
[1] 4 7 4 5
>
> str(a)
 int [1:10] 2 11 7 18 16 22 12 9 24 25
>
> length(b)
[1] 10
>
> which(a > 15)
[1]  4  5  6  9 10
>
> table(b)
b
 1  2  4  5  7  9 10
 1  3  2  1  1  1  1
>
> min(a)
[1] 2
>
> max(b)
[1] 10
>
> sum(a)
[1] 146
>
> mean(b)
[1] 4.6
>
> median(a)
[1] 14
>
> summary(b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0     2.0     4.0     4.6     6.5    10.0

The sample() function returns a vector of elements drawn at random from the specified vector (first argument), of the specified size (second argument), either with replacement (replace = TRUE) or without replacement (the default).
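
Since sample() draws at random, each run of the script produces different values. As a minimal sketch (the seed value 42 is an arbitrary choice and not part of the original script), one can call set.seed() before sample() to make a draw repeatable:

```r
# Fix the random number generator state so the draw can be repeated.
set.seed(42)                      # 42 is an arbitrary seed chosen for this sketch
first <- sample(1:25, 10)         # 10 values drawn without replacement

set.seed(42)                      # resetting the same seed ...
second <- sample(1:25, 10)        # ... reproduces the same draw

print(identical(first, second))   # TRUE
print(anyDuplicated(first))       # 0, because replace = FALSE is the default
```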

The logical expression a > 15 (or, for that matter, (a > 15) & (a <= 20)) returns a logical (TRUE or FALSE) vector that is the result of performing the logical operation on each element of the vector a.
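
To make the intermediate logical vector visible, here is a small sketch that hard-codes the values of a from the listing above. The name mask is introduced only for this illustration:

```r
a <- c(2, 11, 7, 18, 16, 22, 12, 9, 24, 25)   # values of a from the listing above

mask <- a > 15      # one TRUE/FALSE per element of a
print(mask)         # FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE

print(sum(mask))    # TRUE counts as 1, so this counts the matches: 5
print(a[mask])      # keeps only the elements where mask is TRUE: 18 16 22 24 25
```

Because the mask is an ordinary vector, it can be stored, combined with & or |, and reused for indexing.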

The str() function displays the structure of any R object, such as a vector, list, matrix, or data.frame.

The length() function returns a count of the number of elements in the vector.

The which() function returns a vector of index positions where the specified logical expression is TRUE.

The table() function returns a frequency table that lists each unique element of the specified vector and how many times that element occurs.
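
The result of table() can itself be indexed by value, using the value's name as a character string. A brief sketch, again hard-coding b from the listing above (the names counts, twos, and most_common are hypothetical):

```r
b <- c(2, 1, 2, 4, 7, 10, 2, 4, 5, 9)   # values of b from the listing above

counts <- table(b)                       # frequency of each unique value

twos <- as.vector(counts["2"])           # look up the count by name
print(twos)                              # 3

# which.max() gives the position of the largest count;
# names() maps that position back to the value it belongs to.
most_common <- names(counts)[which.max(counts)]
print(most_common)                       # "2"
```

Note that the names of a table are character strings, so most_common is "2" rather than the number 2.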

The min() function finds the minimum value in the vector.

The max() function finds the maximum value in the vector.

The sum() function computes the sum of all the elements in the numeric vector.

The mean() function computes the arithmetic mean of all the elements in the numeric vector.
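
One caveat is worth a short sketch: if the vector contains a missing value (NA), mean() returns NA unless na.rm = TRUE is specified. The same argument is accepted by min(), max(), sum(), and median(). The vector w below is made up for illustration:

```r
w <- c(150, 170, NA, 160)        # hypothetical weights with one missing value

print(mean(w))                   # NA, because one element is unknown
print(mean(w, na.rm = TRUE))     # 160, the NA is dropped before averaging
print(sum(w, na.rm = TRUE))      # 480
```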

The median() function finds the median from all the elements in the numeric vector.

The summary() function summarizes the distribution of the values in the numeric vector: the minimum, the first quartile, the median, the mean, the third quartile, and the maximum.
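
The quartiles reported by summary() can also be obtained on their own with quantile(). A minimal sketch using the values of b from the listing above:

```r
b <- c(2, 1, 2, 4, 7, 10, 2, 4, 5, 9)   # values of b from the listing above

q <- quantile(b)     # minimum, the three quartiles, and the maximum
print(q)             # values 1, 2, 4, 6.5, 10 (named 0%, 25%, 50%, 75%, 100%)

print(q[["75%"]])    # 6.5, the same 3rd Qu. value that summary(b) reports
```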

Let us assume a list with at least n elements in it.

The following are some of the operations one can perform on a list:

[n] - return the element value at the n'th position as a list
[[n]] - return the element at the n'th position as is (with its own type), unlike the [n] operator, which wraps it in a list
[m:n] - return all the elements starting at the m'th position and ending at the n'th position (where n > m) as a list
[c(i, j, k, ...)] - return the elements in the specified positions i, j, k, ... as a list
[[c(m, n)]] - return the element value at the n'th position of the collection located at the m'th position
[[m]][[n]] - is the same as [[c(m, n)]]
[['xyz']] - return the element value with the name xyz
$xyz - same as [['xyz']]

Output (list_ops.R)

> a <- list(5, 'ABC', c(1, 2, 3), 7.5, FALSE)
> print(a)
[[1]]
[1] 5

[[2]]
[1] "ABC"

[[3]]
[1] 1 2 3

[[4]]
[1] 7.5

[[5]]
[1] FALSE

>
> print(a[3])
[[1]]
[1] 1 2 3

> class(a[3])
[1] "list"
>
> print(a[[1]])
[1] 5
> class(a[[1]])
[1] "numeric"
>
> print(a[2:3])
[[1]]
[1] "ABC"

[[2]]
[1] 1 2 3

> class(a[2:3])
[1] "list"
>
> print(a[c(1, 3)])
[[1]]
[1] 5

[[2]]
[1] 1 2 3

>
> print(a[[c(3, 2)]])
[1] 2
>
> print(a[[3]][[2]])
[1] 2
>
> b <- list(b1 = 10, b2 = 'PQR', b3 = 6:8, b4 = 8.75, b5 = TRUE)
> print(b)
$b1
[1] 10

$b2
[1] "PQR"

$b3
[1] 6 7 8

$b4
[1] 8.75

$b5
[1] TRUE

>
> print(b[['b1']])
[1] 10
>
> print(b$b3)
[1] 6 7 8
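
The practical consequence of [ versus [[ shows up as soon as one computes with the result. A small sketch, reusing part of the list b from the listing above (the tryCatch() wrapper is added here only so the script keeps running past the error):

```r
b <- list(b1 = 10, b2 = 'PQR', b3 = 6:8)

# [[ ]] extracts the integer vector itself, so arithmetic works.
doubled <- b[['b3']] * 2
print(doubled)                    # 12 14 16

# [ ] returns a list of length 1, and a list cannot be multiplied.
result <- tryCatch(b['b3'] * 2, error = function(e) 'error')
print(result)                     # "error"

print(length(b['b3']))            # 1 (a list holding one element)
print(length(b[['b3']]))          # 3 (the vector 6 7 8)
```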

Now, let us switch gears to explore the matrix.

One of the ways by which a matrix can be created is using the format:

matrix(vector_of_data, no_of_rows, no_of_columns)

By default, the data elements in a matrix are filled column-wise. One can change that behavior to fill the data elements row-wise by specifying the argument byrow = TRUE.
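
A tiny sketch of the two fill orders on a 2 x 3 matrix (the names by_col and by_row are chosen for this illustration):

```r
by_col <- matrix(1:6, 2, 3)                 # default: filled down each column
by_row <- matrix(1:6, 2, 3, byrow = TRUE)   # filled across each row

print(by_col[1, ])   # 1 3 5
print(by_row[1, ])   # 1 2 3

# Filling row-wise gives the same result as filling a 3 x 2 matrix
# column-wise and then transposing it.
print(identical(by_row, t(matrix(1:6, 3, 2))))   # TRUE
```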

One can assign names to each of the rows and columns in a matrix using the argument dimnames = list(row_names, column_names).

Let us assume a matrix with m rows and n columns.

The following are some of the operations one can perform on a matrix:

[x, y] - return the data element at the x'th row and y'th column (where x <= m and y <= n)
[x,] - return the data elements across all the columns in the x'th row (where x <= m) as a vector
[,x] - return the data elements across all the rows in the x'th column (where x <= n) as a vector
[x:y,] - return all the data elements across all the columns starting at the x'th row and ending at the y'th row (where x <= m and y <= m) as a matrix
[,x:y] - return all the data elements across all the rows starting at the x'th column and ending at the y'th column (where x <= n and y <= n) as a matrix
['r', 'c'] - return the data element at the row with the name 'r' and at the column with the name 'c'
%*% - perform a true matrix multiplication (not element-wise)

Output (matrix_ops.R)

> a <- matrix(1:20, 4, 5)
> print(a)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
>
> class(a)
[1] "matrix"
>
> dim(a)
[1] 4 5
>
> b <- matrix(1:20, 4, 5, byrow = TRUE)
> print(b)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
>
> m <- 1:5
> n <- 6:10
> o <- 11:15
> p <- 16:20
>
> c <- cbind(m, n, o, p)
> print(c)
     m  n  o  p
[1,] 1  6 11 16
[2,] 2  7 12 17
[3,] 3  8 13 18
[4,] 4  9 14 19
[5,] 5 10 15 20
>
> d <- rbind(m, n, o, p)
> print(d)
  [,1] [,2] [,3] [,4] [,5]
m    1    2    3    4    5
n    6    7    8    9   10
o   11   12   13   14   15
p   16   17   18   19   20
>
> e <- matrix(1:20, 4, 5,
+             dimnames = list(c('r1', 'r2', 'r3', 'r4'),
+                             c('c1', 'c2', 'c3', 'c4', 'c5')))
>
> print(e)
   c1 c2 c3 c4 c5
r1  1  5  9 13 17
r2  2  6 10 14 18
r3  3  7 11 15 19
r4  4  8 12 16 20
>
> rownames(e)
[1] "r1" "r2" "r3" "r4"
>
> colnames(e)
[1] "c1" "c2" "c3" "c4" "c5"
>
> print(a[1,])
[1]  1  5  9 13 17
>
> print(b[,2])
[1]  2  7 12 17
>
> print(c[2:3,])
     m n  o  p
[1,] 2 7 12 17
[2,] 3 8 13 18
>
> print(d[,2:3])
  [,1] [,2]
m    2    3
n    7    8
o   12   13
p   17   18
>
> print(e[2,4])
[1] 14
>
> print(e['r2', 'c3'])
[1] 10
>
> f <- t(e)
> print(f)
   r1 r2 r3 r4
c1  1  2  3  4
c2  5  6  7  8
c3  9 10 11 12
c4 13 14 15 16
c5 17 18 19 20
>
> a * b
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   10   27   52   85
[2,]   12   42   80  126  180
[3,]   33   84  143  210  285
[4,]   64  136  216  304  400
>
> a / b
          [,1]      [,2]      [,3]      [,4]     [,5]
[1,] 1.0000000 2.5000000 3.0000000 3.2500000 3.400000
[2,] 0.3333333 0.8571429 1.2500000 1.5555556 1.800000
[3,] 0.2727273 0.5833333 0.8461538 1.0714286 1.266667
[4,] 0.2500000 0.4705882 0.6666667 0.8421053 1.000000
>
> a %*% f
      r1  r2  r3  r4
[1,] 565 610 655 700
[2,] 610 660 710 760
[3,] 655 710 765 820
[4,] 700 760 820 880

The dim() function returns the dimensions (rows, columns) for the specified matrix.
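
dim() is also a quick way to check whether two matrices can be multiplied with %*%: the number of columns of the left operand must equal the number of rows of the right operand. A short sketch (the tryCatch() wrapper and the names product and bad are additions for illustration):

```r
a <- matrix(1:20, 4, 5)
f <- t(a)

print(dim(a))              # 4 5
print(dim(f))              # 5 4

product <- a %*% f         # (4 x 5) times (5 x 4) gives a 4 x 4 matrix
print(dim(product))        # 4 4
print(product[1, 1])       # 565

# (4 x 5) times (4 x 5) does not line up, so %*% raises an error.
bad <- tryCatch(a %*% a, error = function(e) 'non-conformable')
print(bad)                 # "non-conformable"
```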

The rownames() function returns the row names for the specified matrix.

The colnames() function returns the column names for the specified matrix.

The cbind() function combines the specified vectors as the columns of a matrix.

The rbind() function combines the specified vectors as the rows of a matrix.

The t() function performs a transpose operation on the specified matrix.
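
As noted in the list of operations, selecting a single row or column returns a plain vector. When a one-row (or one-column) matrix is wanted instead, the drop = FALSE argument of the [ operator keeps the matrix structure. A minimal sketch:

```r
a <- matrix(1:20, 4, 5)

row_vec <- a[1, ]                  # plain integer vector of length 5
row_mat <- a[1, , drop = FALSE]    # still a matrix, with 1 row and 5 columns

print(dim(row_vec))                # NULL, a vector has no dimensions
print(dim(row_mat))                # 1 5
```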

Let us assume a data.frame with n rows and m columns named c1, c2, c3, ..., cm.

The following are some of the operations one can perform on a data.frame:

['c1'] - return a data.frame with just the column with name c1
[c('c2', 'c3')] - return a data.frame with the columns named c2 and c3
[['c4']] - return the data elements across all the rows for the column named c4 as a vector
$c5 - return the data elements across all the rows for the column named c5 as a vector
[x,] - return the data elements across all the columns in the x'th row (where x <= n) as a data.frame
[x, c('c6', 'c7')] - return the data elements for the columns named c6 and c7 in the x'th row (where x <= n) as a data.frame
[x:y,] - return all the data elements across all the columns starting at the x'th row and ending at the y'th row (where x <= n and y <= n) as a data.frame
[x:y, c('c8', 'c9')] - return the data elements for the columns named c8 and c9 starting at the x'th row and ending at the y'th row (where x <= n and y <= n) as a data.frame
[c(x, y, z),] - return all the data elements across all the columns for rows at positions x, y, and z (where x <= n, y <= n, and z <= n) as a data.frame
[c(x, y, z), c('c1', 'c2')] - return the data elements for the columns named c1 and c2 for rows at positions x, y, and z (where x <= n, y <= n, and z <= n) as a data.frame
[x, 'c3'] - return the data element at the x'th row for the column with the name 'c3'
[lexpr,] - return all the data elements across all the columns for the rows that satisfy the logical expression lexpr, where the logical expression will involve one or more of the columns

Output (dataframe_ops.R)

> a <- sample(21:30, 10)
> b <- sample(c('Blue', 'Green', 'Orange', 'Red', 'Yellow'), 10, replace = TRUE)
> c <- sample(c(NA, 'HS', 'BS', 'MS', 'PHD'), 10, replace = TRUE)
> d <- sample(c(NA, TRUE, FALSE), 10, replace = TRUE)
> e <- sample(round(runif(5, min = 5, max = 6.5), digits = 1), 10, replace = TRUE)
> f <- sample(c(NA, 150, 160, 170, 180, 190, 200), 10, replace = TRUE)
>
> df <- data.frame(age = a,
+                  color = b,
+                  education = c,
+                  employed = d,
+                  height = e,
+                  weight = f)
> print(df)
   age  color education employed height weight
1   29    Red       PHD       NA    5.1    150
2   25 Orange        BS     TRUE    5.1    170
3   26   Blue        BS       NA    5.0    170
4   28 Yellow        MS       NA    5.9    150
5   21   Blue        MS       NA    5.1    160
6   23   Blue        MS       NA    5.9    150
7   24 Orange        HS    FALSE    5.0     NA
8   27  Green      <NA>     TRUE    5.5     NA
9   30    Red        BS    FALSE    5.9    190
10  22 Yellow        HS       NA    5.9    150
>
> class(df)
[1] "data.frame"
>
> nrow(df)
[1] 10
>
> ncol(df)
[1] 6
>
> dim(df)
[1] 10  6
>
> length(df)
[1] 6
>
> str(df)
'data.frame':	10 obs. of  6 variables:
 $ age      : int  29 25 26 28 21 23 24 27 30 22
 $ color    : Factor w/ 5 levels "Blue","Green",..: 4 3 1 5 1 1 3 2 4 5
 $ education: Factor w/ 4 levels "BS","HS","MS",..: 4 1 1 3 3 3 2 NA 1 2
 $ employed : logi  NA TRUE NA NA NA NA ...
 $ height   : num  5.1 5.1 5 5.9 5.1 5.9 5 5.5 5.9 5.9
 $ weight   : num  150 170 170 150 160 150 NA NA 190 150
>
> head(df)
  age  color education employed height weight
1  29    Red       PHD       NA    5.1    150
2  25 Orange        BS     TRUE    5.1    170
3  26   Blue        BS       NA    5.0    170
4  28 Yellow        MS       NA    5.9    150
5  21   Blue        MS       NA    5.1    160
6  23   Blue        MS       NA    5.9    150
>
> tail(df)
   age  color education employed height weight
5   21   Blue        MS       NA    5.1    160
6   23   Blue        MS       NA    5.9    150
7   24 Orange        HS    FALSE    5.0     NA
8   27  Green      <NA>     TRUE    5.5     NA
9   30    Red        BS    FALSE    5.9    190
10  22 Yellow        HS       NA    5.9    150
>
> names(df)
[1] "age"       "color"     "education" "employed"  "height"    "weight"
>
> df['age']
   age
1   29
2   25
3   26
4   28
5   21
6   23
7   24
8   27
9   30
10  22
>
> df[c('color', 'education')]
    color education
1     Red       PHD
2  Orange        BS
3    Blue        BS
4  Yellow        MS
5    Blue        MS
6    Blue        MS
7  Orange        HS
8   Green      <NA>
9     Red        BS
10 Yellow        HS
>
> df[['employed']]
 [1]    NA  TRUE    NA    NA    NA    NA FALSE  TRUE FALSE    NA
>
> df$age
 [1] 29 25 26 28 21 23 24 27 30 22
>
> df[5,]
  age color education employed height weight
5  21  Blue        MS       NA    5.1    160
>
> df[5, c('height', 'weight')]
  height weight
5    5.1    160
>
> df[2:5,]
  age  color education employed height weight
2  25 Orange        BS     TRUE    5.1    170
3  26   Blue        BS       NA    5.0    170
4  28 Yellow        MS       NA    5.9    150
5  21   Blue        MS       NA    5.1    160
>
> df[2:5, c('age', 'education')]
  age education
2  25        BS
3  26        BS
4  28        MS
5  21        MS
>
> df[c(1, 3, 5),]
  age color education employed height weight
1  29   Red       PHD       NA    5.1    150
3  26  Blue        BS       NA    5.0    170
5  21  Blue        MS       NA    5.1    160
>
> df[c(1, 3, 5), c('employed', 'height')]
  employed height
1       NA    5.1
3       NA    5.0
5       NA    5.1
>
> df[5, 'weight']
[1] 160
>
> g <- df$weight > 160
>
> df[g,]
     age  color education employed height weight
2     25 Orange        BS     TRUE    5.1    170
3     26   Blue        BS       NA    5.0    170
NA    NA   <NA>      <NA>       NA     NA     NA
NA.1  NA   <NA>      <NA>       NA     NA     NA
9     30    Red        BS    FALSE    5.9    190
>
> h <- (df$age > 22) & (df$education == 'MS')
>
> df[h,]
   age  color education employed height weight
4   28 Yellow        MS       NA    5.9    150
6   23   Blue        MS       NA    5.9    150
NA  NA   <NA>      <NA>       NA     NA     NA
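
The rows labelled NA in the last two results are not real rows of df. They appear because the logical vectors g and h contain NA wherever weight or education is missing, and indexing with NA yields a row filled with NA. Below is a minimal sketch of two common ways to keep only the rows where the condition is TRUE, using a small made-up data.frame:

```r
# Hypothetical data with two missing weights.
df <- data.frame(age = c(25, 24, 27, 30),
                 weight = c(170, NA, NA, 190))

g <- df$weight > 160            # TRUE NA NA TRUE

with_na <- df[g, ]              # each NA in g produces an all-NA row
print(nrow(with_na))            # 4

# which() returns only the positions where g is TRUE, so NA is skipped.
no_na <- df[which(g), ]
print(nrow(no_na))              # 2

# subset() likewise treats NA in the condition as FALSE.
also_no_na <- subset(df, weight > 160)
print(also_no_na$age)           # 25 30
```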

The runif() function returns the specified number (the first argument) of random values drawn from a uniform distribution over the interval from min to max.
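
A short sketch of runif() on its own, mirroring the call used to build the height column (the seed value 1 is an arbitrary choice for repeatability and is not part of the original script):

```r
set.seed(1)                              # arbitrary seed, only for repeatability

x <- runif(5, min = 5, max = 6.5)        # 5 random values between 5 and 6.5
print(length(x))                         # 5
print(all(x >= 5 & x <= 6.5))            # TRUE

heights <- round(x, digits = 1)          # keep one decimal place, as in the listing
print(heights)
```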

The nrow() function returns the number of rows for the specified data.frame.

The ncol() function returns the number of columns for the specified data.frame.
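
The listing above also shows that length(df) returns 6, the same value as ncol(df). That is because a data.frame is stored as a list of equal-length columns. A small sketch with a made-up data.frame:

```r
df <- data.frame(x = 1:4, y = c('a', 'b', 'c', 'd'))   # hypothetical data

print(is.list(df))               # TRUE, a data.frame is a list of columns
print(length(df))                # 2, the number of columns
print(length(df) == ncol(df))    # TRUE
print(nrow(df))                  # 4
```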

The head() function returns the first 6 rows (the default) of the specified data.frame.

The tail() function returns the last 6 rows (the default) of the specified data.frame.
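
Both functions accept a second argument n to request a different number of rows. A minimal sketch with a made-up data.frame:

```r
df <- data.frame(id = 1:10, value = (1:10)^2)   # hypothetical data

print(nrow(head(df)))        # 6, the default
print(head(df, n = 3)$id)    # 1 2 3
print(tail(df, n = 2)$id)    # 9 10
```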