Hi all, last Wednesday we kicked off the first session of R Open Lab in the DSSC (based in Lehman Library). We started with basic syntax and briefly discussed how to explore the features of our datasets. We used data from Wal-Mart, and we will continue exploring this dataset for the next few sessions. Beginners are welcome to join!
See you this Wednesday 10/12/2016 at 10:00 AM!
In the second session of Python Open Labs, the focus was on conditional statements. Topics covered include: Conditional Operators, Conditional Statements, Boolean Expressions, Python’s obsession with indentation (and the idea of scope!), Two Way Decisions, and Multi-way Decisions (using if-else and elif).
The code on the slides is in Python 2.7, so if you use Python 3, please remember to replace raw_input with input and add parentheses to your print statements!
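For anyone following along in Python 3, here is a minimal sketch of the two-way and multi-way decisions mentioned above. It is not taken from the slides; the function names `is_even` and `describe_number` are made up for illustration, and the comments show what the Python 2.7 equivalents of `print` and `input` would look like.

```python
def is_even(n):
    """Two-way decision: exactly one of the two branches runs."""
    if n % 2 == 0:
        return True
    else:
        return False


def describe_number(n):
    """Multi-way decision: the first condition that is true wins."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


# Python 2.7:  print "5 is", describe_number(5)
# Python 3 needs parentheses because print is a function:
print("5 is", describe_number(5))

# Python 2.7:  value = raw_input("Enter a number: ")
# Python 3:    value = input("Enter a number: ")
# Either way the result is a string, so convert it with int(value)
# before comparing it to a number.
```

Note that the indentation is what tells Python which lines belong to each branch, so the body of every if, elif, and else must be indented consistently.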
The course material from all Python Open Labs sessions is available at the Lion Mail Drive link here.
PS: The Google Drive link requires a Columbia UNI login. If you don’t have one, or you prefer to use another email ID, please request access on the drive link and Kunal will grant it.
Last week, we kicked off R Open Labs with a demo on Base Graphics, or how to make graphics using basic commands. We also gave a brief intro to Swirl, a great package for learning R.
You can catch up here with these helpful slides: Base Graphics System
Check back next week, Wednesday, Feb. 3 at 10 am, for a quick demo on how to load different data files into R, a free I <3 R button, and plenty of time to practice your code and ask questions.
Got feedback or want to suggest a package to demo? Leave a comment or take our short survey!
It might seem quiet around the DSSC this week as we gear up for the fall semester, but it’s the perfect time to catch up on summer reading. Here are a few interesting data stories that have come across our radar in recent days:
- Nathan Yau of Flowing Data made a Statistical Atlas in R, using 2010 census data and an 1874 design aesthetic.
- Ramiro Gómez provides directions in matplotlib to make a surprisingly evocative map of the UK, using the locations of pubs.
- LMGTDFY, Let Me Get That Data For You, searches .gov websites to find out what data is publicly available for download.
Got a data project in mind? Stop by the DSSC, 1pm to 5pm, Monday to Friday, to explore what’s possible and cool off from this heat!
There are so many places to get Census boundaries, but often for NYC, the layers from NYC Dept of City Planning BYTES of the Big Apple are the most detailed.
However, these boundaries do not contain fields to join with some of the more popular sources for Census variables, either for the 2010 Decennial Census or the American Community Survey 5-year estimates.
It doesn’t take too much time to create the various fields, and as you can see in the examples, these are very similar with just a couple minor variations. The boundaries are available in the data catalog.
NYC Planning uses a seven-character ID to identify tracts: the first digit is the borough ID and the remaining six are the tract ID defined by the Census Bureau. This ID is only suitable for joining to tables created by NYC Planning.
The Census Bureau uses an 11-character ID, which is needed for joining with data from the Census Bureau's American FactFinder or from Social Explorer.
NHGIS uses a 14-character ID.
Infoshare uses a 10-character ID.
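As a rough illustration of how two of these join fields can be derived from the NYC Planning ID, here is a minimal Python sketch. It assumes the standard Census layout of state FIPS (2 digits) + county FIPS (3 digits) + tract (6 digits), and the NHGIS GISJOIN layout of "G" + state FIPS + "0" + county FIPS + "0" + tract; check both against the documentation of the source you are joining to. The function names are hypothetical, and the Infoshare ID is left out because its layout is not described here.

```python
# New York State FIPS code.
STATE_FIPS = "36"

# NYC Planning borough code -> county FIPS code.
BOROUGH_TO_COUNTY_FIPS = {
    "1": "061",  # Manhattan (New York County)
    "2": "005",  # Bronx (Bronx County)
    "3": "047",  # Brooklyn (Kings County)
    "4": "081",  # Queens (Queens County)
    "5": "085",  # Staten Island (Richmond County)
}


def split_boro_ct(boro_ct):
    """Split a 7-character NYC Planning tract ID into (county FIPS, tract)."""
    boro_ct = str(boro_ct)
    if len(boro_ct) != 7 or not boro_ct.isdigit():
        raise ValueError("expected a 7-digit ID, got %r" % boro_ct)
    county = BOROUGH_TO_COUNTY_FIPS[boro_ct[0]]  # KeyError if borough not 1-5
    tract = boro_ct[1:]
    return county, tract


def census_geoid(boro_ct):
    """11-character Census Bureau tract ID (assumed layout: state+county+tract)."""
    county, tract = split_boro_ct(boro_ct)
    return STATE_FIPS + county + tract


def nhgis_gisjoin(boro_ct):
    """14-character NHGIS ID (assumed layout: G + state + 0 + county + 0 + tract)."""
    county, tract = split_boro_ct(boro_ct)
    return "G" + STATE_FIPS + "0" + county + "0" + tract


# Manhattan tract 000100, as stored by NYC Planning:
print(census_geoid("1000100"))
print(nhgis_gisjoin("1000100"))
```

Keeping the IDs as strings throughout matters, since converting them to numbers would drop leading zeros in the tract portion and break the join.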
The 2010 Tract boundaries can be joined with data from:
- The 2010 Decennial Census
- American Community Survey (ACS) 5-yr estimates
(except ’05-’09 which uses the 2000 boundaries!)
Looking for opinion poll or survey information? Search these databases:
iPoll Databank: US polls originally gathered by academic, commercial and media survey organizations such as Gallup Organization, Harris Interactive, Pew Research Associates, and many more. From 1935 to present.
Polling the Nations: Polls on a variety of subjects conducted by over 1000 polling organizations in the United States and 100 other countries from 1986 to the present time.
Also peruse this research guide on Opinion Poll Data.
The libraries provide access to both Bloomberg and DataStream.
Bloomberg: Provides current and historical financial quotes, business newswires, and descriptive information, research and statistics on over 52,000 companies worldwide. Printing and downloading are available. For search help see our Bloomberg Help Guide.
Datastream: Provides statistical time series covering stocks, bonds, commodities and economic data for numerous countries, along with company profiles. Coverage varies by dataset.
One Bloomberg terminal is available in the Digital Social Science Center in Lehman Library and many others are available in the Business Library. Datastream is available on both the Digital Social Science Center and Business Library computers.
In honor of America's war veterans, the Census Bureau has added a new web page to its InfoGraphics series. The InfoGraphics series is part of the Bureau's efforts to use data visualization techniques to present the information contained in its data collections in an easy-to-understand format.
The Memorial Day Information Graphic is located at http://www.census.gov/how/infographics/memorial_day.html. It includes a timeline across major conflicts with counts of those who served and who died. There are also graphics pertaining to the current makeup of our armed forces.