New Additions to the Collection, May 22, 2017

The following titles were recently added or updated.

CU Numeric Data Catalog Holdings


CU Spatial Data Catalog Holdings


Python Open Labs

In the penultimate session of Python Open Labs, we briefly reviewed CSV concepts and covered XML parsing with the BeautifulSoup library.

Boilerplate code for practice is available at the Google Drive link below, in the Session-19 folder.
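For anyone who wants a quick refresher on XML parsing, here is a minimal sketch. BeautifulSoup was the library used in the session; this sketch swaps in the standard library's xml.etree.ElementTree so that it runs without extra installs. The sample document, its element names, and the function name parse_books are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are made up.
SAMPLE_XML = """
<catalog>
  <book id="b1"><title>Learning Python</title><price>30.50</price></book>
  <book id="b2"><title>Data Cleaning</title><price>24.00</price></book>
</catalog>
"""


def parse_books(xml_text):
    """Return a list of dicts, one per <book> element."""
    root = ET.fromstring(xml_text.strip())
    books = []
    # findall("book") matches direct children of the root named <book>.
    for book in root.findall("book"):
        books.append({
            "id": book.get("id"),                      # attribute value
            "title": book.findtext("title"),           # text of child <title>
            "price": float(book.findtext("price")),    # text converted to float
        })
    return books


if __name__ == "__main__":
    for entry in parse_books(SAMPLE_XML):
        print(entry["id"], entry["title"], entry["price"])
```

The same idea carries over to BeautifulSoup: parse the text once, locate the repeating element, then read attributes and child text from each match.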

All of the course slides and examples are made available on:

Next week, on April 21, I will be going over the basic concepts and some practice problems from the Python topics we have covered!

Python Open Labs – CSV Files

As we move toward the end of the Spring semester and have covered most of the basics in Python, recent sessions have focused on introducing Python modules requested by attendees.

Last week we had a second session on web scraping with BeautifulSoup. I have updated the practice code for it in the Session-17 folder at the Google Drive link mentioned below.

This week, on April 7, 2017, I introduced the Python csv module for reading and writing data in CSV files. It is an easy module to pick up, used primarily for reading CSV data, and it requires the user to understand only a few details. The relevant sample code and a practice CSV file can be found at the Google Drive link below, in the Session-18 folder.
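As a rough illustration of reading and writing with the csv module, here is a minimal sketch. It is not the session's sample code: the rows, column names, file name, and helper functions (write_rows, read_rows, round_trip) are assumptions made up for this example.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the column names are made up for illustration.
ROWS = [
    {"name": "Ada", "score": "91"},
    {"name": "Grace", "score": "88"},
]


def write_rows(path, rows):
    """Write a list of dicts to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "score"])
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read a CSV file back into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


def round_trip(rows):
    """Write rows to a temporary file, read them back, and clean up."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "practice.csv")
        write_rows(path, rows)
        return read_rows(path)


if __name__ == "__main__":
    for row in round_trip(ROWS):
        # Every value comes back as a string, so convert numbers explicitly.
        print(row["name"], int(row["score"]))
```

Note that the csv module returns every field as a string, which is why the score is converted with int() before use.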

All of the course slides and examples are made available on:

Next week’s topic is XML file parsing!

See you next Friday from 1:30 PM to 3:30 PM at the DSSC (Room 215), Lehman Library at Columbia SIPA!

Python Open Labs – Web Scraping

For those who have been following this blog series, sorry for the late update on Python Open Labs.

Last week we covered some basics of web scraping with Python, but before I start, let me make the customary disclaimer.

Make sure that any website you want to scrape has granted you the required permissions to do so, and that you are not violating its terms of use.

So, on to the updates. In a nutshell, web scraping is a way of extracting useful, relevant information from web pages, i.e. HTML pages. It can be broken down into the following steps:

  1. Download the web page content (use the urllib or requests module in Python).
  2. View the page source in a web browser to examine the HTML structure of the page and locate the information of interest for the task at hand.
  3. Figure out the HTML structure (class, id, tag, etc.) that will help your Python script locate the information.
  4. Use the BeautifulSoup Python module to parse the page and get as close as possible to the relevant information in the HTML structure, then extract the information using string methods.

Steps 2 to 4 go hand in hand, i.e. each one helps you build on the others. For example, the more you understand about the HTML structure surrounding the information you want, the more specific the inputs you can give to the BeautifulSoup methods that extract it.

For the previous session, I have uploaded sample Python files with commented code to the Google Drive link mentioned below, in the Session-16 folder. Make sure you work through them. Doubts, queries, and feedback are always welcome 🙂
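The steps above can be sketched roughly as follows. To keep the example runnable offline and without extra installs, it makes two substitutions that should be read as assumptions: an inline HTML string stands in for the downloaded page (step 1), and the standard library's html.parser does the parsing instead of BeautifulSoup. The page content, the class name "session", and the names ClassTextCollector and extract_sessions are all hypothetical.

```python
from html.parser import HTMLParser

# Stand-in for step 1. In a real script this text would come from
# urllib.request.urlopen(url).read() or requests.get(url).text.
SAMPLE_HTML = """
<html><body>
  <h1>Open Lab Schedule</h1>
  <ul>
    <li class="session">Session 16: Web Scraping</li>
    <li class="session">Session 17: More Web Scraping</li>
    <li class="note">Room 215</li>
  </ul>
</body></html>
"""


class ClassTextCollector(HTMLParser):
    """Collect the text of every given tag whose class attribute matches."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.inside = False   # True while we are inside a matching tag
        self.results = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == self.tag and dict(attrs).get("class") == self.css_class:
            self.inside = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.results[-1] += data


def extract_sessions(html_text):
    """Steps 3 and 4: locate <li class="session"> items and return their text."""
    collector = ClassTextCollector("li", "session")
    collector.feed(html_text)
    return [text.strip() for text in collector.results]


if __name__ == "__main__":
    sessions = extract_sessions(SAMPLE_HTML)
    # Final clean-up with plain string methods: keep the part after ": ".
    titles = [item.split(": ", 1)[1] for item in sessions]
    print(titles)
```

With BeautifulSoup installed, the locating step would typically be a single call such as BeautifulSoup(html_text, "html.parser").find_all("li", class_="session"), followed by the same string clean-up.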

All of the course slides and examples are made available on:

We will continue the web scraping lecture on March 31, 2017, after which I will also upload a comprehensive document with some additional relevant sources and more interesting code.

Happy scraping!

See you next Friday from 1:30 PM to 3:30 PM at the DSSC (Room 215), Lehman Library at Columbia SIPA!

R Open Lab – Merge and Filter Data

During the first 20-30 minutes of yesterday’s open lab, we talked about how to merge datasets and filter data using base R and the dplyr package. The rest of the open lab was free discussion among participants and instructors.

Thank you to all who showed up!

You are welcome to explore the materials I used for the open lab:

Enjoy the spring!

DSSC Extended Walk-in Hours

We’ve extended our walk-in hours from two hours to four hours per day from March 27th through April 27th, Monday – Thursday.

We’ve also added a calendar listing the type of help you can expect during walk-in hours, as well as some of the other activities in the DSSC, such as the Open Labs and workshops.

The new hours are as follows, although it is best to consult the calendar in case of changes. Outside of these hours, you can always request a one-on-one consultation with one of our staff.

Monday 12pm – 4pm: help with R, Stata, SPSS and SAS

Tuesday 12pm – 4pm: help with GIS

Wednesday 12:30pm – 4:30pm: help with R, Stata, SPSS and SAS

Thursday 12pm – 4pm: help with GIS

R Open Lab – ggplot

Data visualization is an integral part of data exploration and presentation. Yesterday, we talked about ggplot2, a package which provides a mature and consistent system for plotting in R.

We explored the advantages and disadvantages of ggplot2, as well as the syntax and usage of the package.

As always, thank you to everyone who showed up.

Materials I used for the open lab can be found here.

Enjoy spring! ❤️

Python Open Labs – Format Strings


In this week's 15th session of Python Open Labs, we looked at some miscellaneous topics and revised the basic concepts of file reading and string handling from previous sessions. We also briefly looked into format strings / format specifiers for string construction in Python. The relevant slides are available in the Session-15 folder at the Google Drive link mentioned below.
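As a quick reminder of what format specifiers look like, here is a minimal sketch. The post does not say which formatting style the slides use, so this shows both the older %-style and str.format() as an assumption; the function names describe and table_row and the sample values are made up.

```python
def describe(name, score, total):
    """Build the same sentence with %-style specifiers and with str.format()."""
    percent = 100.0 * score / total
    # %s = string, %d = integer, %.1f = float with one decimal, %% = literal %
    old_style = "%s scored %d/%d (%.1f%%)" % (name, score, total, percent)
    # {} placeholders take the same kind of specifiers after a colon.
    new_style = "{} scored {:d}/{:d} ({:.1f}%)".format(name, score, total, percent)
    return old_style, new_style


def table_row(label, value):
    """Left-align the label in 8 characters, right-align the value in 6."""
    return "{:<8}|{:>6.2f}".format(label, value)


if __name__ == "__main__":
    old_style, new_style = describe("Ada", 45, 50)
    print(old_style)
    print(new_style)
    print(table_row("pi", 3.14159))
```

Both styles produce identical text here; the width and alignment specifiers in table_row are what make columns line up when printing several rows.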

All of the course slides and examples are made available on:

As always, please keep up with your programming practice. A suggested link for practice is:

See you next Friday from 1:30 PM to 3:30 PM at the DSSC (Room 215), Lehman Library at Columbia SIPA! We will be covering some basics of web scraping.

R Open Labs – Apply Family

This Wednesday we talked about the apply family of functions in base R. We covered apply(), tapply(), lapply(), sapply() and vapply(). We also briefly introduced the concept of factors in R.

As always, thank you to all who showed up! The R Open Lab will be cancelled for the next two weeks due to midterm week and spring break.

Good luck on your midterms and have a wonderful spring break! 😄