Tag Archives: data

New Additions to the Collection, May 22, 2017

The following titles were recently added or updated.

CU Numeric Data Catalog Holdings


CU Spatial Data Catalog Holdings


New Additions to the Collection, May 15, 2017

The following titles were recently added or updated.

CU Numeric Data Catalog Holdings


Recently updated

CU Spatial Data Catalog Holdings


NYC 2010 Tract Boundaries

Map of 2010 Census Tracts


There are so many places to get Census boundaries, but often for NYC, the layers from NYC Dept of City Planning BYTES of the Big Apple are the most detailed.

However, these boundaries do not contain fields to join with some of the more popular sources for Census variables, either for the 2010 Decennial Census or the American Community Survey 5-year estimates.

It doesn’t take too much time to create the various fields, and as you can see in the examples, these are very similar, with just a couple of minor variations. The boundaries are available in the data catalog.

NYC Planning uses a seven-character ID to identify tracts. The first digit is the borough ID and the remaining six are the Census Bureau defined tract ID. This ID is only good for joining to the tables created by NYC Planning.

Example    Borough ID    Tract ID
1010400    1             010400

The Census Bureau uses an 11-character ID for joining with data from the Census Bureau American FactFinder or Social Explorer.

Example        State Code    County Code    Tract ID
36061010400    36            061            010400

NHGIS uses a 14-character ID

Example           Prefix    State ID    County ID    Tract ID
G3600610010400    G         360         0610         010400

Infoshare uses a 10-character ID

Example       County ID    Tract ID
0610104.00    061          0104.00
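
As a rough sketch of how these fields could be generated from the NYC Planning ID, the Python below builds the other three IDs from the seven-character one. The function name build_join_ids and the dictionary keys are made up for illustration, and the borough-to-county lookup uses the standard county FIPS codes for the five boroughs, which the examples above only show for Manhattan (061).

```python
# Borough codes used by NYC Planning mapped to Census county FIPS codes.
BOROUGH_TO_COUNTY = {
    "1": "061",  # Manhattan (New York County)
    "2": "005",  # Bronx
    "3": "047",  # Brooklyn (Kings County)
    "4": "081",  # Queens
    "5": "085",  # Staten Island (Richmond County)
}
STATE_FIPS = "36"  # New York State


def build_join_ids(boro_ct):
    """Derive the join IDs from a seven-character NYC Planning tract ID.

    build_join_ids is a hypothetical helper name, not part of any library.
    """
    boro_ct = str(boro_ct)
    if len(boro_ct) != 7 or boro_ct[0] not in BOROUGH_TO_COUNTY:
        raise ValueError("expected a 7-character ID starting with 1-5")
    county = BOROUGH_TO_COUNTY[boro_ct[0]]
    tract = boro_ct[1:]  # six-digit Census tract ID
    return {
        # state + county + tract
        "census": STATE_FIPS + county + tract,
        # G + state padded with 0 + county padded with 0 + tract
        "nhgis": "G" + STATE_FIPS + "0" + county + "0" + tract,
        # county + tract with a decimal point before the last two digits
        "infoshare": county + tract[:4] + "." + tract[4:],
    }


print(build_join_ids("1010400"))
```

The same string operations can be done in a GIS field calculator; the key point is to keep the IDs as text so leading zeros are not lost.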

The 2010 Tract boundaries can be joined with data from

  • The 2010 Decennial Census
  • American Community Survey (ACS) 5-yr estimates
    (except ’05-’09 which uses the 2000 boundaries!)

Journalism Library Data Viz Contest – Decorate Our Wall EXTENDED

Hold the phone!!! The Journalism Library is excited to EXTEND the deadline for the data visualization contest! Submit your entries by Wednesday, May 15th, 5pm, for a chance to see yours in poster-size proudly displayed in the Journalism Library. The winner will also receive a libraries mug and the opportunity to submit the winning work to Columbia University Libraries Academic Commons.

Contest Rules:

  • submissions must use publicly available data; data is broadly defined and can include video, audio, photo
  • submissions must be received no later than 5pm on May 15th – please send to journalism@libraries.cul.columbia.edu
  • you may use previously submitted class work!
  • submissions must be in PDF – please do not include your name in the filename, but please share with us the following in your email!
    • your full name
    • graduation year
    • title for your creation
    • data source/s used

Try Ashley's new cool data tools page for help and ideas!

All submissions will be judged based on accurate use of data and originality in aesthetic presentation. The panel of judges includes Journalism Librarian Cristina Ergunay, Data Services Coordinator Ashley Jester, and JSchool Professors Susan McGregor and Mark Hansen!

The winner will be announced at the JSchool Innovation Showcase on May 17th. We look forward to your submissions! 

Resource Spotlight – Ashley Jester

Please welcome Ashley Jester, the new Data Services Coordinator in the Social Sciences Libraries! Ashley holds a PhD from Stanford in Political Science with advanced specializations in international relations, organizational behavior, and political economy. She's here to assist you with your research, from the initial steps of background research and finding data to analysis and interpretation of data. She's great with Excel, STATA, SPSS, and R, and you can find her staffing the reference desks at the Digital Social Science Center (DSSC) and Data Services in Lehman Library. She's also available for individual/small group meetings and consultations, so don't hesitate to call upon Ashley!

ashley.jester@columbia.edu, (212) 854-0514

Citing Datasets

Citing datasets in your work is just as important as citing journal articles and books. IASSIST, an international organization which supports data for research and teaching in the social sciences, recommends minimum elements required for dataset identification and retrieval. Peruse the IASSIST Quick Guide to Data Citation for examples in APA, Chicago, and MLA citation styles.  Make sure all your sources are properly cited.


Using OpenStreetMap XML data


OpenStreetMap is a great resource, mostly created and maintained by its users. The data is free of charge and there are several ways of accessing the data for use in a GIS project.

One of the easiest and quickest ways is to download data from CloudMade in shapefile format, but occasionally not all of the features shown in OpenStreetMap are available in the shapefile download.

An alternative is to download the OSM XML data, open it in QGIS, and if needed, export to shapefile.

To do this, open QGIS and under the Plugins menu, select Manage Plugins and turn on the OpenStreetMap plugin. If it’s not there then you will have to add it from Fetch Python Plugins.

The plugin allows for viewing downloaded OSM XML data, downloading large scale areas directly, and uploading edits you’ve made (account required).

Click on the Load OSM from file icon and navigate to the downloaded OSM XML layer.

Put a check mark next to the fields you want to create (these will only populate if the information is encoded). Keep a check mark next to Use custom renderer if you want to symbolize your data similar to the OSM scheme.
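
To get a sense of which fields will actually populate before loading the file, one option is to count the tag keys in the OSM XML yourself. This is only a sketch using Python's standard library, separate from the QGIS plugin; the function name count_tag_keys and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tag_keys(osm_xml_text):
    """Count how often each tag key appears on nodes, ways, and relations."""
    root = ET.fromstring(osm_xml_text)
    counts = Counter()
    for element in root:
        if element.tag in ("node", "way", "relation"):
            for tag in element.findall("tag"):
                counts[tag.get("k")] += 1
    return counts


# Tiny made-up sample in OSM XML form (not real OpenStreetMap data).
SAMPLE = """<osm version="0.6">
  <node id="1" lat="40.8075" lon="-73.9626">
    <tag k="name" v="Example Cafe"/>
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="40.8080" lon="-73.9630"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Example Street"/>
  </way>
</osm>"""

for key, count in count_tag_keys(SAMPLE).most_common():
    print(key, count)
```

For a downloaded file, read its contents and pass the text to the function, or swap ET.fromstring for ET.parse(path).getroot(). Keys that never appear in the counts are fields that would come out empty.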

If you need to work with any of the features in shapefile format, right click on a layer and select Save vector layer as, and choose ESRI shapefile from the Format pull down menu if not already selected.

And that’s it!

New datasets in CU Spatial Data Catalog

We’ve been working all summer to add records for our newer datasets in the CU spatial data catalog. We will be detailing some of these new additions over the next few days. Of notable interest is the addition of the vector layers for the 2008 ESRI Data & Maps Collection. These layers include a variety of data types from throughout the world and are in ESRI’s compressed Smart Data Compression (SDC) format. They can also be downloaded remotely by current Columbia affiliates. Please note that most of the superseded data layers from previous years’ collections have been removed from the catalog in order to make finding the most recent data easier. If you do require any of the older editions, please come see us in the EDS lab.

BIN numbers, park names, & building heights!

Finally! We have better answers to questions we get asked quite frequently: DOITT has added variables to their datasets.

I’m not sure when the layers were updated (6/09?) but this is something that will help students and researchers a lot.

The Building footprints layer now has a field containing the BIN, and is offered as a file geodatabase for use in ESRI software, which should help with drawing speed and file size.

All other layers now have attributes attached, the spot elevation layer looks like it was also updated with building heights (heights are from MSL?), and the street centerline layer has street widths.

This is a huge improvement over what was offered previously and I am very glad to see these layers publicly available!

HGL’s New Look

The Harvard Geospatial Library has a whole new look and feel, using OpenLayers for the display and navigation map.

Searching and browsing datasets is also much improved, including the updated advanced search option.

A good portion of the 6,500+ records are publicly downloadable, which makes this an amazing resource even for non-Harvard affiliates.

I’m very impressed with what they’ve put together.