Category Archives: Open Lab

R Open Lab Fall 2018 – Text data in R

R is also a powerful tool for working with text data. In this session we first clarified the difference between a character and a string by working through some tricky examples, and covered basic text operations such as extracting substrings, combining, and replacing. We then introduced a txt file for attendees to experiment with, and they found the results very interesting.

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

R Open Lab Fall 2018 – More visualization

Today we will explore more advanced data visualization in R. First, we will review the basic graphical functions covered in the last open lab and learn how to use additional parameters to achieve different goals. Then, we will focus on the powerful package ggplot2.

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

R Open Lab Fall 2018 – Dataframe and basic visualization

This week we stepped into the most basic but important data structure, the dataframe, and introduced several ways of constructing and importing dataframes. Meanwhile, we reviewed the basic idea of extracting data by index or condition through practice exercises. Then we focused on how to get a first-glance overview of a dataset, both numerically and graphically.

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

R Open Lab Fall 2018 – Functions, environment, and apply

This week's topic is functions, environments, and the apply family in R. We first cover how to define your own function, then bring in the concept of environments, since the two are closely related. Finally, we go over the apply family. Recall that we learned loops as one of the basic concepts at the very beginning; you can review them in the Starter Kit and the lab featuring More Fundamentals. Although loops are conceptually simple and intuitive, they can be verbose and slow in R. The apply family comes in handy in this case.

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018/blob/master/function%2C%20environment%2C%20apply.R

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

R Open Lab Fall 2018 – More Fundamentals

We briefly reviewed what R and RStudio are and how they work together, then continued with the R starter kit. First we talked about calculation, commenting, and assigning values to variables in R, and worked through some tricky examples to clarify coding conventions such as case sensitivity. The most important part of this open lab was introducing the different classes in R, a concept that underpins much of what follows. We demonstrated filtering and slicing with examples and showed that R offers multiple ways to do both. We also began functions and loops, which will be explained and practiced in the next session. We hope all of you will become more interested in R and see how powerful it is in the world of data.

Here is the link to our open lab’s GitHub repository: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

R Open Lab Fall 2018 – Getting Started

This is the first R open lab of this semester. We focus on introducing basic concepts to new users of the R language. The file we used is called Starter Kit.

Here is the link to the GitHub repository with all the scripts: https://github.com/wbh0912/R-Open-Lab-Fall-2018

If you have further questions regarding topics covered in the material, please feel free to drop in during consultation hours or leave a comment.

Python Open Labs: April 23, 2018

Today was the last Python Open Lab of the semester – congrats to all of the students who have made it this far and picked up skills in a new programming language!

Over the course of the semester, we’ve been learning the basics of Python: how to initialize lists, create dictionaries, iterate through items, and define functions and classes.
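As a quick refresher, here is a minimal sketch that touches each of those basics in a few lines. The names and numbers (scores, grades, average, Student) are made up for illustration and are not taken from the lab materials.

```python
# A list of scores and a dictionary mapping names to scores (made-up data).
scores = [88, 92, 79]
grades = {"Ana": 88, "Ben": 92, "Cy": 79}

# Iterate through the dictionary's key/value pairs.
for name, score in grades.items():
    print(name, score)


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


class Student:
    """A tiny class bundling a name with a list of scores."""

    def __init__(self, name, student_scores):
        self.name = name
        self.scores = student_scores

    def average(self):
        # Reuse the module-level function defined above.
        return average(self.scores)


ana = Student("Ana", [88, 94])
print(average(scores))
print(ana.average())
```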

The students wanted to see how programming could be applied to a specific problem and how it could be used to analyze existing information or data. I chose to design the last lesson around data visualization. We particularly focused on how to create visualizations using the seaborn library.

The seaborn library is a visualization library built on matplotlib. It also ships with example datasets that load as dataframes, similar to how pandas stores an external file. I have recently been exploring seaborn and already find it a very flexible and intuitive library. Borrowing concepts from a DataCamp tutorial, we were able to create some very beautiful visualizations using only a few lines of code.

Check out some things we were able to make below!

a swarm plot displaying customer tip amounts

a facet grid displaying total bill amounts based on varying aspects of gender and dining time

a colored heat map displaying information related to airplane flights

Students really enjoyed using seaborn and some were even able to apply it to their own datasets. Lots of people were specifically fans of the swarm plots.
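For anyone who wants to try a swarm plot at home, here is a minimal sketch. It assumes seaborn, pandas, and matplotlib are installed, and it uses a small made-up table of tips rather than the dataset from class; the column names and the output file name are also just illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up tip data (hypothetical values, not the dataset used in class).
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sun", "Sun"],
    "tip": [2.0, 3.5, 1.5, 4.0, 3.0, 5.0, 2.5, 4.5],
})

# One point per row, grouped by day; points are nudged sideways to avoid overlap.
ax = sns.swarmplot(data=tips, x="day", y="tip")
ax.set_title("Tip amount by day")

# Write the figure to an image file (the file name is an arbitrary choice).
plt.savefig("tips_swarm.png")
```

In an interactive session you could drop the matplotlib.use line and call plt.show() instead of saving to a file.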

Yang Rui (left) and Elena Dubova (right) learning to master seaborn

If you’d like to follow the lesson for today’s class more closely, please click here for step-by-step instructions and enjoy coding things up in your favorite text editor.

Python has become a really popular programming language in recent years. I am glad to see more and more people taking the initiative to learn it and can’t wait to see the amazing challenges my students will take on in the future!

Navie Narula

Spring 2018 R Open Lab: Advanced Visualization

Apr 18

Today we will explore advanced data visualization in R. First, we will review the basic graphic functions in R and learn how to use additional parameters to achieve different goals. Then, we will introduce the powerful package ggplot2. Here is the code:

# Quick review of basic visualization
library(ggplot2)
plot(diamonds$carat, diamonds$price, main = "Price vs Carat", xlab = "Carat", ylab = "Price")
pairs(~carat+depth+table+price, data = diamonds)
barplot(table(diamonds$cut))
hist(diamonds$price, breaks = 100)
boxplot(diamonds$price~diamonds$cut)
pie(c(10, 2, 4, 7), c("A", "B", "C", "D"))

d <- diamonds[sample(1:nrow(diamonds), 1000), ]

# Plot by factor
plot(d$carat, d$price, col = d$cut)
# Add legend
legend("bottomright",
       legend = levels(diamonds$cut),
       fill = 1:5, cex = 0.4)
# Add line
ols <- lm(price~carat, data = d)
abline(ols, lty = 2, lwd = 2)
# Add point
points(2, 2500, pch = 3)
# Add text
text(2, 2000, "new point")
# Useful parameters
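# pch  - point symbol
# main - plot title
# xlab - x-axis label
# ylab - y-axis label
# lty  - line type
# lwd  - line width
# cex  - character expansion (text and symbol size)
# col  - color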

 

# ggplot2 package
p <- ggplot(data = d)
# Map color to the `cut` column by name inside aes(), rather than passing d$cut
p + geom_point(mapping = aes(x = carat, y = price,
                             color = cut))

# facet
p+geom_point(mapping = aes(x = carat, y = price))+
facet_wrap(~cut, nrow = 2)
p+geom_point(mapping = aes(x = carat, y = price))+
facet_grid(~cut)

# regression line
p+geom_point(mapping = aes(x = carat, y = price))+
  geom_smooth(mapping = aes(x = carat, y = price), method = "auto")

# other functions to explore
# ggplot(data = ...) can be combined with layers such as:
#   geom_histogram(mapping = aes(...))
#   geom_bar(mapping = aes(...))
#   stat_function(fun = ...)
#   labs(title = ..., x = ..., y = ...)
#   geom_text()
#   geom_abline()
#   geom_boxplot()


Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab, email me, or leave a comment.

See you all next week!

Spring 2018 R Open Lab: Apply Family

Apr 11

The topic of this week is the apply family in R. Recall that we learned loops as one of the basic concepts at the very beginning; you can review them in the Starter Kit and the lab featuring More Fundamentals. Although loops are conceptually simple and intuitive, they can be verbose and slow in R. The apply family comes in handy in this case. In this lab, we will cover apply, lapply, sapply, mapply, tapply, and sweep. Here is the code for this lab:

# apply: better than loops!
m <- matrix(1:9, 3, 3, byrow = TRUE)
for (i in 1:3) {
  print(mean(m[i, ]))
}

rowMeans(m)
apply(m, 1, mean)
apply(m, 2, mean)

sos <- function(x, y) {
  return(x^2 + y^2)
}
apply(m, 1, sos, y = 3)

library(ggplot2) # provides the data frame `diamonds`
apply(diamonds[, 2:4], 2, table)

mu <- apply(m, 2, mean) # column means; must be defined before sweep() uses it
sweep(m, 2, mu, "*")
sweep(m, 2, mu, FUN = "-")

 

# lapply and sapply
lapply(m, sos, y = 3)
l <- list(c(1, 2, 3), 4, 5, m)
lapply(l, sos, 3)
sapply(l, sos, 3)

lapply(1:10, function(x) x^2)
sapply(1:10, function(x) x^2, simplify = FALSE)
unlist(lapply(1:10, function(x) x^2))
sapply(1:10, function(x) x^2)

 

# mapply
mapply(rep, 1:4, 4:1)
mapply(rep, 2:9, 4)

 

# tapply
s <- c(10:19, 2:5, 3:15)
i <- factor(c(rep(1, 10), rep(2, 4), rep(3, 13)))
tapply(s, i, sum)


Here are a few practice problems you can try by yourself (all of them require the data frame diamonds defined in the package ggplot2):

Task 1: Find the color and clarity of the 5 entries with the largest price using the apply family.

Task 2: Compute the leave-one-out mean for carat and find which observation has the greatest leave-one-out mean.

Task 3: Compute mean and standard deviation for different groups of cut.


Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab, email me, or leave a comment.

See you all next week!

Python Open Labs: April 9, 2018

We spent class today reviewing functions and how they work in Python. Students were given problem statements and were asked to write functions to return the correct output. We went over multiple problems, and I’ll step through one in this blog post today.

Imagine that you are given two inputs in the form of strings: Jewels and Stones.

Jewels contain unique characters, while Stones do not.

Here is an example of what these inputs might look like:

Jewels = "aA"

Stones = "aAAbbbb"

Students were asked to write a function to count the number of Jewels present in Stones. In the example above, the output would be 3 given that “a” and “A” are Jewels and that there is 1 of “a” and 2 of “A” in Stones.

Students in the class understood that functions must start with a definition and contain a return statement. What was more difficult to come up with was the syntax used to solve the problem within the actual function itself.

Students eventually came up with the idea to initialize a value to store a result and simply loop through the Stones string to check how many Jewels appear in the Stones input.

Here is how one might solve the problem in code using Python:

def countJewelsinStones(Jewels, Stones):
    count = 0
    for s in Stones:
        if s in Jewels:
            count += 1
    return count

We can see that the approach is not only simple, but also uses concepts we’ve reviewed in previous lessons such as conditionals and for loops. I am excited to see that many students in the class were able to solve this problem with little assistance and can’t wait to see what they accomplish next!
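For comparison, here is a more compact variant of the same idea. It is only a sketch, and the name count_jewels is my own for this post rather than something from class: it turns Jewels into a set so each membership check is quick, then adds up one for every stone that is a jewel.

```python
def count_jewels(jewels, stones):
    """Count how many characters in `stones` also appear in `jewels`."""
    jewel_set = set(jewels)  # a set makes each membership check fast
    return sum(1 for s in stones if s in jewel_set)


print(count_jewels("aA", "aAAbbbb"))  # prints 3
```

Both versions give the same answer; the loop version is easier to step through, while this one is shorter once generator expressions feel familiar.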

Navie Narula