Author Archives: Brett

R Open Lab Fall 2018 – Randomness and linear regression

Today’s open lab was a short one. We first looked at how to generate random samples under certain conditions. Then we worked through a simple example of linear regression in R. The purpose of this lab is to help attendees understand how randomness works in R and how to use a linear regression model in their own scenarios.
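As a rough illustration of these two topics (a minimal sketch, not the lab's actual script; the variable names and the simulated model y = 2 + 3x + noise are made up for this example), the following draws reproducible random samples and fits a simple linear regression with lm():

```r
# Fix the seed so that the "random" draws are reproducible
set.seed(2018)

# Random samples under different conditions
u <- runif(5, min = 0, max = 1)                   # 5 uniform draws on [0, 1]
z <- rnorm(100, mean = 0, sd = 1)                 # 100 standard normal draws
dice <- sample(1:6, size = 10, replace = TRUE)    # 10 rolls of a fair die

# Simulate data from a known (hypothetical) linear model: y = 2 + 3x + noise
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# Fit the regression and look at the estimated coefficients
fit <- lm(y ~ x)
coef(fit)

# Predict the response for new values of x
predict(fit, newdata = data.frame(x = c(1, 5)))
```

Because the data are simulated from a known model, the estimated slope should land close to 3, which is a handy sanity check when learning lm().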

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

 

R Open Lab Fall 2018 – Data manipulation

Today we covered the topic of data manipulation. We first reviewed the basic ways to subset data frames, such as logical expressions and the subset function. Then, we looked at ways to combine, merge, and split data frames. Finally, we covered how to use the plyr package.
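The operations mentioned above can be sketched in base R as follows. This is only an illustration under assumptions: the two small data frames and their column names are made up, and plyr is left out so the example has no package dependencies.

```r
# Two small, made-up data frames for illustration
students <- data.frame(id = 1:4,
                       name = c("Ann", "Bob", "Cat", "Dan"),
                       score = c(90, 72, 85, 60))
majors <- data.frame(id = c(1, 2, 4),
                     major = c("Stats", "Math", "Stats"))

# Subset with a logical expression
high <- students[students$score > 80, ]

# The same rows with subset(), keeping only two columns
high2 <- subset(students, score > 80, select = c(name, score))

# Combine: stack rows with rbind()
more <- rbind(students, data.frame(id = 5, name = "Eve", score = 77))

# Merge on the shared key column; all.x = TRUE keeps students without a major
merged <- merge(students, majors, by = "id", all.x = TRUE)

# Split into a list of data frames, one per major (rows with NA major are dropped)
by_major <- split(merged, merged$major)
```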

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

R Open Lab Fall 2018 – More visualization

Today we will explore more advanced data visualization in R. First, we will review the basic graphical functions covered in the last open lab and learn how to use additional parameters to achieve different goals. Then, we will focus on the powerful package ggplot2.

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

R Open Lab Fall 2018 – Functions, environment, and apply

The topic of this week is functions, environments, and the apply family in R. We first cover how to define your own function in R, then we bring in the concept of environments, since it determines where a function looks up its variables. Finally, we go over the apply family. Recall that we learned loops as one of the basic concepts at the very beginning; you can review them in the Starter Kit and the More Fundamentals lab. Although a loop is conceptually simple and intuitive, it tends to be verbose and can be slow in R when it processes data one element at a time. The apply family comes in handy in this case.
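The three ideas can be sketched together as follows. This is a minimal illustration, not the lab's script; the function names power_sum, add_a, and shadow are hypothetical and chosen only for this example.

```r
# Define a function with a default argument
power_sum <- function(x, p = 2) {
  sum(x^p)
}
power_sum(1:3)         # uses the default p = 2
power_sum(1:3, p = 3)  # overrides the default

# Environments: a function looks up free variables where it was defined,
# and assignments inside a function do not change the outer variable
a <- 10
add_a <- function(x) {
  x + a        # `a` is found in the enclosing environment
}
shadow <- function() {
  a <- 1       # local `a`; the outer `a` is untouched
  a
}

# Loop versus sapply: both compute the squares of 1 to 5
squares_loop <- numeric(5)
for (i in 1:5) {
  squares_loop[i] <- i^2
}
squares_apply <- sapply(1:5, function(i) i^2)
```

The sapply version avoids pre-allocating a result vector and managing an index, which is the main convenience the apply family offers over an explicit loop.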

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018/blob/master/function%2C%20environment%2C%20apply.R

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

R Open Lab Fall 2018 – Getting Started

This is the first R open lab of this semester. We focus on introducing basic concepts to new users of the R language. The file we used is called Starter Kit.

Here is the link to the GitHub repository with all the scripts: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

Spring 2018 R Open Lab: Advanced Visualization

Apr 18

Today we will explore advanced data visualization in R. First, we will review the basic graphics functions in R and learn how to use additional parameters to achieve different goals. Then, we will introduce the powerful package ggplot2. Here is the code:

# Quick review of basic visualization
library(ggplot2)
plot(diamonds$carat, diamonds$price, main = "Price vs Carat", xlab = "Carat", ylab = "Price")
pairs(~carat+depth+table+price, data = diamonds)
barplot(table(diamonds$cut))
hist(diamonds$price, breaks = 100)
boxplot(diamonds$price~diamonds$cut)
pie(c(10, 2, 4, 7), c("A", "B", "C", "D"))

d <- diamonds[sample(1:nrow(diamonds), 1000), ]

# Plot by factor
plot(d$carat, d$price, col = d$cut)
# Add legend
legend("bottomright",
       legend = levels(diamonds$cut),
       fill = 1:5, cex = 0.4)
# Add line
ols <- lm(price~carat, data = d)
abline(ols, lty = 2, lwd = 2)
# Add point
points(2, 2500, pch = 3)
# Add text
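text(2, 2000, "new point")
# Useful parameters (argument names, not commands to run):
# pch  - plotting symbol
# main - plot title
# xlab - x-axis label
# ylab - y-axis label
# lty  - line type
# lwd  - line width
# cex  - character expansion (text and symbol size)
# col  - color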

 

# ggplot2 package
p <- ggplot(data = d)
p + geom_point(mapping = aes(x = carat, y = price,
                             col = cut)) # refer to the column by name inside aes()

# facet
p+geom_point(mapping = aes(x = carat, y = price))+
facet_wrap(~cut, nrow = 2)
p+geom_point(mapping = aes(x = carat, y = price))+
facet_grid(~cut)

# regression line
p+geom_point(mapping = aes(x = carat, y = price))+
  geom_smooth(mapping = aes(x = carat, y = price), method = "auto")

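# other functions to explore (the arguments below are example choices)
p + geom_histogram(mapping = aes(x = price), bins = 30)
p + geom_bar(mapping = aes(x = cut))
p + geom_boxplot(mapping = aes(x = cut, y = price))
p + geom_point(mapping = aes(x = carat, y = price)) +
  geom_abline(intercept = coef(ols)[1], slope = coef(ols)[2]) +
  labs(title = "Price vs Carat", x = "Carat", y = "Price")
ggplot(data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm)
# also worth a look: geom_text()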


Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab or email me or leave a comment.

See you all next week!

Spring 2018 R Open Lab: Apply Family

Apr 11

The topic of this week is the apply family in R. Recall that we learned loops as one of the basic concepts at the very beginning; you can review them in the Starter Kit and the More Fundamentals lab. Although a loop is conceptually simple and intuitive, it tends to be verbose and can be slow in R when it processes data one element at a time. The apply family comes in handy in this case. In this lab, we will cover apply, lapply, sapply, mapply, tapply, and sweep. Here is the code for this lab:

# apply: better than loops!
m <- matrix(1:9, 3, 3, byrow = TRUE)
for (i in 1:3) {
  print(mean(m[i, ]))
}

rowMeans(m)
apply(m, 1, mean)
apply(m, 2, mean)

sos <- function(x, y) {
  return(x^2+y^2)
}
apply(m, 1, sos, y = 3)

library(ggplot2) # provides the data frame `diamonds`
apply(diamonds[, 2:4], 2, table)

# sweep: mu must be defined before it is used
mu <- apply(m, 2, mean)
sweep(m, 2, mu, "*")
sweep(m, 2, mu, FUN = "-")

 

# lapply and sapply
lapply(m, sos, y = 3)
l <- list(c(1, 2, 3), 4, 5, m)
lapply(l, sos, 3)
sapply(l, sos, 3)

lapply(1:10, function(x) x^2)
sapply(1:10, function(x) x^2, simplify = FALSE) # returns a list, like lapply
unlist(lapply(1:10, function(x) x^2))
sapply(1:10, function(x) x^2)

 

# mapply
mapply(rep, 1:4, 4:1)
mapply(rep, 2:9, 4)

 

# tapply
s <- c(10:19, 2:5, 3:15)
i <- factor(c(rep(1, 10), rep(2, 4), rep(3, 13)))
tapply(s, i, sum)


Here are a few practice problems you can try by yourself (all of them require the data frame diamonds defined in the package ggplot2):

Task 1: Find the color and clarity of the 5 entries with the largest price using the apply family.

Task 2: Compute the leave-one-out mean for carat and find which observation has the greatest leave-one-out mean.

Task 3: Compute the mean and standard deviation for the different groups of cut.
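For Task 2, it may help to see what a leave-one-out mean is on a toy vector before moving to diamonds. The sketch below is only a hint under assumptions: the vector x is made up and stands in for a numeric column such as carat.

```r
# Toy vector standing in for a numeric column such as carat
x <- c(2, 4, 6, 8)

# Leave-one-out mean: for each i, the mean of x with x[i] removed
loo <- sapply(seq_along(x), function(i) mean(x[-i]))

# Equivalent closed form that needs no loop at all
loo_fast <- (sum(x) - x) / (length(x) - 1)

# Index of the observation whose removal gives the largest mean
which.max(loo)
```

Removing the smallest value leaves the largest mean, so the answer here is the first observation.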


Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab or email me or leave a comment.

See you all next week!

Spring 2018 R Open Lab: Exploratory Data Analysis

Mar 28

This week’s topic is Exploratory Data Analysis in R. The goal of the lab is to give attendees some ideas about what they can learn when they first get a data set in their hands, and how to go about it. The lab started with an introduction to the concept of a data frame and how to create one in R. Then, we talked about different ways to import data into R. After that, we learned ways to explore the features of a data frame using the data set “diamonds.csv”. With the information we learned, we started to manipulate the data frame into our desired form by reordering and subsetting. We ended this lab with two simple practice exercises on what we had learned so far.
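The workflow described above might look roughly like this. It is a sketch under assumptions: the tiny data frame is made up for illustration (the lab itself used diamonds.csv), and the commented read.csv call assumes the file sits in the working directory.

```r
# A small made-up data frame
df <- data.frame(carat = c(0.3, 1.2, 0.7, 2.0),
                 cut = c("Good", "Ideal", "Fair", "Ideal"),
                 price = c(400, 6500, 2100, 15000))

# Importing from a file works similarly (path assumed):
# df <- read.csv("diamonds.csv")

# First look at the data
dim(df)            # number of rows and columns
str(df)            # column types
head(df, 2)        # first two rows
summary(df$price)  # five-number summary and mean
table(df$cut)      # counts per category

# Reorder rows by price, most expensive first
df_sorted <- df[order(df$price, decreasing = TRUE), ]

# Subset: Ideal-cut rows, selected columns only
ideal <- df[df$cut == "Ideal", c("carat", "price")]
```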

Here is the link to the script for this open lab:

https://drive.google.com/file/d/12ejSGZspc5_rBjDfvbu61H3lGdmYjAyf/view?usp=sharing

Here is the data set we used for this lab:

https://drive.google.com/file/d/19crfzpYAS3T0ZXaVxkWQVgk8dFbboFdf/view?usp=sharing

Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab or email me or leave a comment.

See you all next week!

Spring 2018 R Open Lab: Character Strings

Feb 28

For this week, the topic we discussed is character strings in R. This lab’s content is a stepping stone for text analysis. We started by introducing the concepts of characters, character strings, and character string vectors in R. Then, we talked about operations on strings such as extracting substrings and combining different character strings. Finally, we learned about extracting and replacing certain patterns within a text data set.
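A few base R functions cover the operations listed above. The following is a minimal sketch rather than the lab's script; the strings are made-up examples.

```r
s <- "R Open Lab"
v <- c("apple pie", "banana split", "cherry tart")  # made-up examples

nchar(s)                          # number of characters
substr(s, 3, 6)                   # extract a substring by position
paste("Spring", 2018, sep = " ")  # combine strings
toupper(s)                        # convert to upper case

# Pattern matching with regular expressions
grepl("an", v)                    # which elements contain "an"?
sub("a", "A", v)                  # replace the first match in each element
gsub("a", "A", v)                 # replace every match
last_word <- regmatches(v, regexpr("[a-z]+$", v))  # extract the last word
```

sub and gsub differ only in how many matches they replace, which is a common source of confusion when cleaning text.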

Here is the link to the script for this open lab:

https://drive.google.com/file/d/12Etf6qQtpJPymIYrGRqLiaxt8LchNBUB/view?usp=sharing

Here is a reference for regular expressions in R:

https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf

Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab or email me or leave a comment.

See you all next week!

Spring 2018 R Open Lab: More Fundamentals

Last week, we walked through the R Starter Kit, which introduced most of the useful basic concepts in R such as vectors, matrices, and loops. This week, we continued with more basics in R and demonstrated examples. The goal of this lab is to give attendees a better understanding of how the R language works so that they can translate their specific real-life problems into R code smoothly.
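As a quick refresher on the basics named above, here is a minimal sketch of vectors, matrices, and loops. It is not taken from the Starter Kit; the values are made up for illustration.

```r
# Vectors: element-wise arithmetic and logical indexing
v <- c(3, 1, 4, 1, 5)
v * 2
big <- v[v > 2]   # keep only the elements greater than 2

# Matrices: filled column by column unless byrow = TRUE
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)            # 2 rows, 3 columns
m[2, 3]           # element in row 2, column 3
t(m) %*% m        # matrix product, a 3 x 3 matrix

# Loops: accumulate a running total over the vector
total <- 0
for (x in v) {
  total <- total + x
}
total
```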

Here is the link to the script for this open lab:

https://drive.google.com/file/d/1SePSSVF980EJfCxv4eZ4AQlCP7vlTbe7/view?usp=sharing

The script also has comments and explanations. You can open it with RStudio and run it step by step.

Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab or email me or leave a comment. See you all next week!