Research Data Service Fall Walk-in Hours

The Research Data Service will resume walk-in hours on Tuesday, September 4th. Walk-in hours run from 12pm – 4pm, Monday through Thursday. See the calendar for specific areas of help, but generally, help with R, Stata, SPSS & SAS is available from 12pm – 2pm and help with GIS from 2pm – 4pm. Data and general reference help is available during walk-in hours as well as by appointment.

R and Python Open Labs will start the third week of September with dates & times announced the week prior on the Workshops & Training page.

Python Open Labs: April 23, 2018

Today was the last Python Open Lab of the semester – congrats to all of the students who have made it this far and picked up skills in a new programming language!

Over the course of the semester, we’ve been learning the basics of Python: how to initialize lists, create dictionaries, iterate through items, and define functions and classes.

The students wanted to see how programming could be applied to a specific problem and how it could be used to analyze existing information or data. I chose to design the last lesson around data visualization. We particularly focused on how to create visualizations using the seaborn library.

The seaborn library is a visualization library built on top of matplotlib. It can also load datasets as dataframes, similar to how pandas reads an external file. I have recently been exploring seaborn and already find it a very flexible and intuitive library. Borrowing concepts from a tutorial via DataCamp, we were able to create some very beautiful visualizations using only a few lines of code.

Check out some things we were able to make below!

a swarm plot displaying customer tip amounts

a facet grid displaying total bill amounts based on varying aspects of gender and dining time

a colored heat map displaying information related to airplane flights

Students really enjoyed using seaborn and some were even able to apply it to their own datasets. Lots of people were specifically fans of the swarm plots.

Yang Rui (left) and Elena Dubova (right) learning to master seaborn

If you’d like to follow the lesson for today’s class more closely, please click here for step-by-step instructions and enjoy coding things up in your favorite text editor.

Python has become a really popular programming language in recent years. I am glad to see more and more people taking the initiative to learn it and can’t wait to see the amazing challenges my students will take on in the future!

Navie Narula

Spring 2018 R Open Lab: Advanced Visualization

Apr 18

Today we will explore advanced data visualization in R. First, we will review the basic graphics functions in R and learn how to use additional parameters to achieve different goals. Then, we will introduce the powerful package ggplot2. Here is the code:

# Quick review of basic visualization
library(ggplot2)
plot(diamonds$carat, diamonds$price, main = "Price vs Carat", xlab = "Carat", ylab = "Price")
pairs(~carat+depth+table+price, data = diamonds)
barplot(table(diamonds$cut))
hist(diamonds$price, breaks = 100)
boxplot(diamonds$price~diamonds$cut)
pie(c(10, 2, 4, 7), c("A", "B", "C", "D"))

d <- diamonds[sample(1:nrow(diamonds), 1000), ]

# Plot by factor
plot(d$carat, d$price, col = d$cut)
# Add legend
legend("bottomright",
legend = levels(diamonds$cut),
fill = 1:5, cex = 0.4)
# Add line
ols <- lm(price~carat, data = d)
abline(ols, lty = 2, lwd = 2)
# Add point
points(2, 2500, pch = 3)
# Add text
text(2, 2000, "new point")
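# Useful parameters
# pch  - point character (plotting symbol)
# main - plot title
# xlab - x-axis label
# ylab - y-axis label
# lty  - line type
# lwd  - line width
# cex  - character expansion (text and symbol size)
# col  - color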

 

# ggplot2 package
p <- ggplot(data = d)
p + geom_point(mapping = aes(x = carat, y = price,
col = cut)) # refer to the column by name inside aes()

# facet
p+geom_point(mapping = aes(x = carat, y = price))+
facet_wrap(~cut, nrow = 2)
p+geom_point(mapping = aes(x = carat, y = price))+
facet_grid(~cut)

# regression line
p+geom_point(mapping = aes(x = carat, y = price))+
geom_smooth(mapping = aes(x = carat, y = price), method = "auto")

# other functions to explore
ggplot(data = )+
geom_histogram(mapping = aes())+
geom_bar(mapping = aes())+
stat_function(mapping = , fun = )+
labs(title = , x = , y = )+
geom_text()+
geom_abline()+
geom_boxplot()


Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab or email me or leave a comment.

See you all next week!

Spring 2018 R Open Lab: Apply Family

Apr 11

This week's topic is the apply family in R. Recall that we learned loops as one of the basic concepts at the very beginning; you can review them in the Starter Kit and the lab featuring More Fundamentals. Although a loop is conceptually simple and intuitive, it is often verbose and can be slow in R. The apply family comes in handy in this case. In this lab, we will cover apply, lapply, sapply, mapply, tapply, and sweep. Here is the code for this lab:

# apply: better than loops!
m <- matrix(1:9, 3, 3, byrow = TRUE)
for (i in 1:3) {
print(mean(m[i, ]))
}

rowMeans(m)
apply(m, 1, mean)
apply(m, 2, mean)

sos <- function(x, y) {
return(x^2+y^2)
}
apply(m, 1, sos, y = 3)

apply(diamonds[, 2:4], 2, table) # data frame `diamonds` is defined in package `ggplot2`

mu <- apply(m, 2, mean) # column means; must be defined before sweep() uses it
sweep(m, 2, mu, "*")
sweep(m, 2, mu, FUN = "-")

 

# lapply and sapply
lapply(m, sos, y = 3)
l <- list(c(1, 2, 3), 4, 5, m)
lapply(l, sos, 3)
sapply(l, sos, 3)

lapply(1:10, function(x) x^2)
sapply(1:10, function(x) x^2, simplify = F)
unlist(lapply(1:10, function(x) x^2))
sapply(1:10, function(x) x^2)

 

# mapply
mapply(rep, 1:4, 4:1)
mapply(rep, 2:9, 4)

 

# tapply
s <- c(10:19, 2:5, 3:15)
i <- factor(c(rep(1, 10), rep(2, 4), rep(3, 13)))
tapply(s, i, sum)


Here are a few practice problems you can try by yourself (all of them require the data frame diamonds defined in the package ggplot2):

Task 1: Find the color and clarity of the 5 entries with the largest price using the apply family.

Task 2: Compute the leave-one-out mean for carat and find which observation has the greatest leave-one-out mean.

Task 3: Compute the mean and standard deviation for each group of cut.


Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab or email me or leave a comment.

See you all next week!

Python Open Labs: April 9, 2018

We spent class today reviewing functions and how they work in Python. Students were given problem statements and were asked to write functions to return the correct output. We went over multiple problems, and I’ll step through one in this blog post today.

Imagine that you are given two inputs in the form of strings: Jewels and Stones.

Jewels contains unique characters, while Stones may contain repeats.

Here is an example of what these inputs might look like:

Jewels = "aA"

Stones = "aAAbbbb"

Students were asked to write a function to count the number of Jewels present in Stones. In the example above, the output would be 3 given that “a” and “A” are Jewels and that there is 1 of “a” and 2 of “A” in Stones.

Students in the class understood that functions must start with a definition and contain a return statement. What was more difficult to come up with was the syntax used to solve the problem within the actual function itself.

Students eventually came up with the idea to initialize a value to store a result and simply loop through the Stones string to check how many Jewels appear in the Stones input.

Here is how one might solve the problem in code using Python:

def countJewelsinStones(Jewels, Stones):
    count = 0
    for s in Stones:
        if s in Jewels:
            count += 1
    return count

We can see that the approach is not only simple, but also uses concepts we’ve reviewed in previous lessons such as conditionals and for loops. I am excited to see that many students in the class were able to solve this problem with little assistance and can’t wait to see what they accomplish next!
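As a quick check of the approach described above, here is a minimal sketch of a more compact variant. It was not part of the class; the function name count_jewels_in_stones and the use of a set are assumptions made for illustration. It counts the same thing, but converts the jewels to a set first so that each membership check stays fast even for long inputs.

```python
def count_jewels_in_stones(jewels, stones):
    """Count how many characters in `stones` also appear in `jewels`."""
    jewel_set = set(jewels)  # a set makes each "in" check fast
    # Add 1 for every stone that is a jewel
    return sum(1 for s in stones if s in jewel_set)


# The example from this post: one "a" and two "A" in the stones
print(count_jewels_in_stones("aA", "aAAbbbb"))  # prints 3
```

For short strings like the ones in class, the loop version and this version behave the same; the set only starts to matter when the inputs get large.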

Navie Narula

Python Open Lab: Week 7

Due to the complex nature of functions, on April 1, we started with a review of functions with the following problem:

We then introduced classes and methods:

Classes:

Python is an “object-oriented programming language.” This means that much of the code is organized around a construct called a class. Programmers use classes to keep related data and behavior together. A class is defined using the keyword “class.”

A class is a code template for creating objects. Objects have member variables and have behaviour associated with them. In Python, a class is created with the keyword class. An object is created by calling the constructor of the class. This object is then called an instance of the class.

In Python, we create instances in the following manner: instance = ClassName(arguments)

How to create a class:

The simplest class can be created using the class keyword:

In [1]: class Snake:
   ...:     pass
In [2]: snake = Snake()
In [3]: print(snake)
<__main__.Snake object at 0x109c05630>

Methods:

Once there are attributes that “belong” to the class, you can define functions that access those attributes. These functions are called methods. When you define a method, its first parameter must refer to the instance itself; by convention it is named self. For example, you can define a class Snake, which has one attribute name and one method change_name. The method change_name takes an argument new_name in addition to self.
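The Snake class described above could be sketched as follows. This is only an illustration, not necessarily the code used in the lab: the __init__ constructor and the sample names "python" and "anaconda" are assumptions, since the description only mentions the attribute name and the method change_name.

```python
class Snake:
    def __init__(self, name):
        # Assumed constructor: store the starting name on the instance
        self.name = name

    def change_name(self, new_name):
        # self is the instance the method was called on
        self.name = new_name


snake = Snake("python")       # create an instance: ClassName(arguments)
print(snake.name)             # the attribute set in __init__
snake.change_name("anaconda")
print(snake.name)             # the attribute after calling the method
```

Notice that self is never passed explicitly when calling snake.change_name("anaconda"); Python supplies the instance as the first argument automatically.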

Here is a fun example to practice classes/methods:

Spring 2018 R Open Lab: Exploratory Data Analysis

Mar 28

This week’s topic is Exploratory Data Analysis in R. The goal of the lab is to give attendees some ideas about what they can learn when they first have a data set in their hands, and the corresponding approaches. The lab started with an introduction to the concept of a data frame and how to create one in R. Then, we talked about different ways to import data into R. After that, we learned ways to explore features of a data frame using the data set “diamonds.csv”. With the information we learned, we started to manipulate the data frame into our desired form by reordering and subsetting. We ended this lab with two simple practice exercises on what we had learned so far.

Here is the link to the script for this open lab:

https://drive.google.com/file/d/12ejSGZspc5_rBjDfvbu61H3lGdmYjAyf/view?usp=sharing

Here is the data set we used for this lab:

https://drive.google.com/file/d/19crfzpYAS3T0ZXaVxkWQVgk8dFbboFdf/view?usp=sharing

Thank you all for showing up. If you have further questions regarding topics covered in the material, please feel free to drop by during next week’s lab or email me or leave a comment.

See you all next week!

Python Open Labs: March 26, 2018

Over the past few weeks, students have been learning how to iterate through items, whether they are in strings, lists, tuples, or dictionaries. Students have mainly been using for loops to grab the value of each item, and lots of progress has been made in writing them with little to no instruction.

In this blog post, I wanted to go over two different ways to write for loops that I presented in class. One method uses the loop to take on the literal value of an item, whereas the other method uses the loop to take on the index of the item.

Let’s say that we have a list:

lst = [1, 2, 3, 4, 5, 6]

If I wanted to iterate through the list such that my iterator takes on the literal value of the list, then I would write my for loop like this:

for item in lst:
    print(item)

The output of this program would simply be:

1
2
3
4
5
6

If I wanted my for loop to instead take on the index value of each item within the list, then I would write my for loop like this:

for item in range(len(lst)):
    print(item)

The output of the program would now be:

0
1
2
3
4
5

The only difference between this for loop and the previous one is the additional use of the built-in functions range and len, which allow the iterator to take on an item’s index value.

Note that using the for loop structure with these functions also allows you to access the literal value of each item when you index the list within the loop.

Here is an example of what that looks like:

for item in range(len(lst)):
    print(item, lst[item])

The output of this program is:

0 1
1 2
2 3
3 4
4 5
5 6

Writing for loops either way is acceptable, but it’s important to know which one is most relevant to your program. If I were simply looking to add one to every item in the list and print the result, using the for loop without range or len is just fine. If I had multiple lists of the same length to iterate over, I might want to use range and len so that a single loop can index into all of them.

For instance, let’s say I had two lists that were of the same length.

animals = ["dog", "bird", "horse"]

nums = [2, 11, 19]

I would only need to iterate through one list to grab the values from both by using range and len.

Here is what that looks like:

for item in range(len(animals)):
    print(animals[item], nums[item])

My output would look like:

dog 2
bird 11
horse 19
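For readers who want to go one step further, Python also has two built-in functions, enumerate and zip, that cover the same cases as range and len. These were not part of the lesson, so treat the following as an optional sketch that reuses the lists from this post.

```python
lst = [1, 2, 3, 4, 5, 6]
animals = ["dog", "bird", "horse"]
nums = [2, 11, 19]

# enumerate yields (index, value) pairs, so no manual indexing is needed
for index, value in enumerate(lst):
    print(index, value)

# zip pairs up items from both lists position by position
for animal, num in zip(animals, nums):
    print(animal, num)
```

Both loops print the same lines as the range and len versions above. One difference to keep in mind: zip stops at the end of the shorter list, so it will not raise an error if the lists turn out to have different lengths.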

I hope you found this blog post about for loop structure and output helpful and are confident enough to know which ones to use in your own programs!

Navie Narula

Python Open Lab: Week 4

On March 5, we built on the lesson from week 3, and reviewed functions. Here is one of the problems you can try:

The full Jupyter notebook is available for this lesson. Please comment or send us an email, and we can make this lesson (or any other lesson from the semester) available to you!