Category Archives: Statistics & Data

Survey Documentation and Analysis (SDA)

Survey Documentation and Analysis (SDA) is a web-based interface for accessing and analyzing data. Data can be accessed from IPUMS or from the Inter-university Consortium for Political and Social Research (ICPSR).

SDA allows you to:

  • Browse the codebook describing a dataset
  • Calculate frequencies or crosstabulations (with charts)
  • Compare means
  • Calculate a correlation matrix
  • Compare correlations
  • Perform multiple regression
  • Perform logit/probit regression
  • List values of individual cases
  • Recode variables (into a public work area)
  • Compute a new variable
  • List/delete derived variables
  • Download a customized subset

SDA lets you analyze data at a depth appropriate to your level of experience. Note that SDA can only analyze datasets that reside on an SDA server. If you would like to test-drive SDA, or see whether it is useful for your research, check out the General Information page.

R Open Labs – Basic Syntax

Hi all, last Wednesday we kicked off the first session of R Open Labs in the DSSC (based in Lehman Library). We started with basic syntax and briefly discussed how to explore the features of our datasets. We used data from Wal-Mart, and we will continue exploring this dataset over the next few sessions. Beginners are welcome to join!

See you this Wednesday, 10/12/2016, at 10:00 AM!

Python Open Labs Session – 2

In the second session of Python Open Labs, the focus was on conditional statements. Topics covered included conditional operators, conditional statements, Boolean expressions, Python’s obsession with indentation (and the idea of scope!), two-way decisions, and multi-way decisions (using if-else and elif).

The code on the slides is written in Python 2.7, so if you use Python 3, please remember to replace raw_input with input and add parentheses to your print statements!
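
For anyone following along in Python 3, here is a small sketch of the session's two-way and multi-way decisions with the Python 3 syntax applied. The function names and the grade thresholds (90/80/70) are hypothetical, chosen only for illustration, and a fixed list of scores stands in for input() so the example runs without typing anything.

```python
# Python 3 versions of the Python 2.7 idioms used on the slides:
#   Python 2.7: print "text"             ->  Python 3: print("text")
#   Python 2.7: s = raw_input("Score: ") ->  Python 3: s = input("Score: ")
# Both raw_input() and Python 3's input() return a string, so convert it
# (e.g. with float()) before comparing it to a number.


def pass_or_fail(score):
    # Two-way decision: the Boolean expression `score >= 70` picks one branch.
    # The indented lines under each keyword form that branch's body.
    # (The 70-point threshold is a hypothetical example.)
    if score >= 70:
        result = "pass"
    else:
        result = "fail"
    return result


def letter_grade(score):
    # Multi-way decision: conditions are checked top to bottom and only the
    # first branch whose condition is True runs. (Hypothetical thresholds.)
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"


# Hard-coded scores stand in for input() so the example runs non-interactively.
for score in [95, 82, 71, 40]:
    print(score, letter_grade(score), pass_or_fail(score))
```

Because elif branches are tested in order, a score of 95 matches only the first branch even though it also satisfies `score >= 80` and `score >= 70`.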

The course material from all Python Open Labs sessions is available on the Lion Mail Drive link here.
PS: The Google Drive link requires a Columbia UNI login. If you don’t have one, or prefer to use another email address, please request access through the Drive link and Kunal will grant it.

R Open Lab: Week 1

I <3 R

Last week, we kicked off R Open Labs with a demo on Base Graphics, or how to make graphics using basic commands. We also gave a brief intro to Swirl, a great package for learning R.

You can catch up here with these helpful slides: Base Graphics System

Check back next week, Wednesday, Feb. 3 at 10 am, for a quick demo on how to load different data files into R, a free I <3 R button, and plenty of time to practice your code and ask questions.

Got feedback or want to suggest a package to demo? Leave a comment or take our short survey!

Worth A Read

It might seem quiet around the DSSC this week as we gear up for the fall semester, but it’s the perfect time to catch up on summer reading. Here are a few interesting data stories that have come across our radar in recent days:

  • Nathan Yau of Flowing Data made a Statistical Atlas in R, using 2010 census data and an 1874 design aesthetic.
  • Ramiro Gómez shows how to use matplotlib to make a surprisingly evocative map of the UK from the locations of pubs.
  • LMGTDFY, Let Me Get That Data For You, searches .gov websites to find out what data is publicly available for download.

Got a data project in mind? Stop by the DSSC, 1 pm to 5 pm, Monday to Friday, to explore what’s possible and cool off from the heat!

NYC 2010 Tract Boundaries

Map of 2010 Census Tracts

There are many places to get Census boundaries, but for NYC the layers from the NYC Department of City Planning's BYTES of the Big Apple are often the most detailed.

However, these boundaries do not contain the fields needed to join them with some of the more popular sources of Census variables, such as the 2010 Decennial Census or the American Community Survey 5-year estimates.

Creating the various join fields doesn’t take much time, and as the examples show, the IDs are very similar, with just a couple of minor variations. The boundaries are available in the data catalog.

NYC Planning identifies tracts with a seven-character ID: the first digit is the borough ID and the remaining six are the Census Bureau-defined tract ID. This ID is only useful for joining to tables created by NYC Planning.

Example | Borough ID | Tract ID
1010400 | 1 | 010400

The Census Bureau uses an 11-character ID for joining with data from the Census Bureau's American FactFinder or from Social Explorer:

Example | State Code | County Code | Tract ID
36061010400 | 36 | 061 | 010400

NHGIS uses a 14-character ID, which adds a "G" prefix and pads the state and county codes with a trailing zero:

Example | Prefix | State ID | County ID | Tract ID
G3600610010400 | G | 360 | 0610 | 010400

Infoshare uses a 10-character ID, with the tract number written with a decimal point:

Example | County ID | Tract ID
0610104.00 | 061 | 0104.00
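
Because every format is built from the same pieces, the extra join fields can be derived from the NYC Planning ID with a few string operations. The Python below is a minimal sketch, assuming the seven-character ID is stored as text (not as a number); the function names are hypothetical, and the borough-to-county lookup uses the standard county FIPS codes for the five boroughs.

```python
# Derive Census, NHGIS and Infoshare tract IDs from a seven-character
# NYC Planning tract ID (borough digit + six-digit Census tract).
# Function names are hypothetical, for illustration only.

# Borough digit -> three-digit county FIPS code.
BORO_TO_COUNTY = {
    "1": "061",  # Manhattan (New York County)
    "2": "005",  # Bronx (Bronx County)
    "3": "047",  # Brooklyn (Kings County)
    "4": "081",  # Queens (Queens County)
    "5": "085",  # Staten Island (Richmond County)
}
STATE_FIPS = "36"  # New York State


def split_nyc_id(nyc_id):
    """Return (county FIPS, six-digit tract) for a seven-character NYC Planning ID."""
    if len(nyc_id) != 7 or not nyc_id.isdigit() or nyc_id[0] not in BORO_TO_COUNTY:
        raise ValueError("expected a seven-digit NYC Planning tract ID, got %r" % nyc_id)
    return BORO_TO_COUNTY[nyc_id[0]], nyc_id[1:]


def census_geoid(nyc_id):
    # 11 characters: state (2) + county (3) + tract (6)
    county, tract = split_nyc_id(nyc_id)
    return STATE_FIPS + county + tract


def nhgis_gisjoin(nyc_id):
    # 14 characters: "G" + state (2) + "0" + county (3) + "0" + tract (6)
    county, tract = split_nyc_id(nyc_id)
    return "G" + STATE_FIPS + "0" + county + "0" + tract


def infoshare_id(nyc_id):
    # 10 characters: county (3) + tract with a decimal point before the last two digits (7)
    county, tract = split_nyc_id(nyc_id)
    return county + tract[:4] + "." + tract[4:]


for converter in (census_geoid, nhgis_gisjoin, infoshare_id):
    print(converter.__name__, converter("1010400"))
```

For the example tract 1010400 this prints 36061010400, G3600610010400 and 0610104.00, matching the examples in the tables. Keeping the new fields as text matters: a spreadsheet or GIS that stores them as numbers will drop the leading zeros and the joins will fail.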

The 2010 tract boundaries can be joined with data from:

  • The 2010 Decennial Census
  • American Community Survey (ACS) 5-year estimates
    (except the 2005-2009 estimates, which use the 2000 boundaries!)

Polling, Surveys, & Public Opinion

Looking for opinion poll or survey information? Search these databases:

iPoll Databank: US polls originally gathered by academic, commercial and media survey organizations such as the Gallup Organization, Harris Interactive, Pew Research Associates, and many more, from 1935 to the present.

Polling the Nations: Polls on a variety of subjects conducted by more than 1,000 polling organizations in the United States and 100 other countries, from 1986 to the present.

Also peruse this research guide on Opinion Poll Data.

Looking for Statistical Time Series?

The libraries provide access to both Bloomberg and Datastream.

Bloomberg: Provides current and historical financial quotes, business newswires, and descriptive information, research and statistics on over 52,000 companies worldwide. Printing and downloading are available. For search help, see our Bloomberg Help Guide.

Datastream: Provides statistical time series covering stocks, bonds, commodities and economic data for numerous countries, along with company profiles. Coverage varies by dataset.

One Bloomberg terminal is available in the Digital Social Science Center in Lehman Library, and many others are available in the Business Library. Datastream is available on both the Digital Social Science Center and Business Library computers.

HELP When You Need It

Don't know where to start?
Need help using the Libraries' collections?
New to using statistical software?
What is the best way to handle citations?
Can't find the right GIS mapping tool to complete your assignment?

Check out what the Digital Social Science Center (DSSC) has to offer.

  • Walk in and ask.
    Librarians are available weekdays during regularly scheduled hours throughout the year in the DSSC Consulting Office, located in the glass-walled area immediately on your right as you enter the main DSSC area (IAB323) on the second floor of Lehman Library.
     
    Additionally, the DSSC Data Service is a space set up for those doing quantitative work or looking for numeric or spatial data. The DSSC Data Service is on the lower level of Lehman (IAB215) and can be reached from the main area of the DSSC by a staircase. It also maintains its own regularly scheduled hours throughout the year.
     
  • Make an appointment.
    Send an email to dssc@libraries.cul.columbia.edu to request an appointment. Briefly explain what you need and someone will get back in touch. For questions involving numeric or spatial data, statistical software or mapping, you can reach the DSSC Data Service directly at dssc.data@columbia.edu.
     
  • Send an email.
    Follow the same steps as for an appointment request. If your email contains only a question, a librarian will answer by email.
     
  • Telephone.
    You can reach a librarian in the DSSC Consulting Office at 212-854-8043, or a librarian in the DSSC Data Service at 212-854-6012, during each location's regularly scheduled hours.
     
  • Help yourself.
    The DSSC main page and the DSSC Data Services page each give an overview of the collections they support.
    Check to see if there are any in-library workshops offered by the DSSC (these occur most frequently early in a semester).
    Self-paced online tutorials are available from Lynda.com Software Tutorials, covering a wide range of software, and from GIS Self-paced Online Courses.

Census Bureau Honors Veterans

In honor of America's war veterans, the Census Bureau has added a new web page to its InfoGraphics series. The InfoGraphics series is part of the Bureau's efforts to use data visualization techniques to present the information in its data collections in an easy-to-understand format.

The Memorial Day Information Graphic is located at http://www.census.gov/how/infographics/memorial_day.html. It includes a timeline across major conflicts with counts of those who served and who died. There are also graphics on the current makeup of our armed forces.