Research Data Services Blog – Providing support for and consultations on the research data needs at Columbia University

Text Mine ProQuest with ChatGPT

Posted on June 16, 2025 by mpd2149

On June 13, 2025, ProQuest announced that the popular text-mining environment TDM Studio now includes a beta feature that lets users integrate GPT models into their R or Python workbench notebooks. TDM Studio, available at no cost to researchers at Columbia, opens up the ProQuest databases to large-scale analyses of the full-text corpora. It comes […]

Libraries Acquire Full-Text Corpus Data

Posted on March 18, 2025 by mpd2149

In December, the Libraries acquired twelve full-text corpus datasets, compiled by Mark Davies, a retired professor of linguistics from Brigham Young University. The corpora will help Columbia researchers across many disciplines to understand how language is and has been used around the world, and they serve as another mark in the Libraries’ commitment to supporting […]

Data Engineering in Python with Polars 1

Posted on March 5, 2025 by mpd2149

Today, we begin learning Polars, a Python data analysis library that offers an alternative to pandas. We’ll learn how Polars is similar to and different from pandas and why it is an appealing choice in 2025 for ETL (extract-transform-load) operations. […]

SQL and NoSQL Databases in Python with Pandas

Posted on February 19, 2025 by mpd2149

Today we looked at using databases in Python. […]

Git and Gitting Organized (Also, Text Editing)

Posted on February 6, 2025 by mpd2149

Today we talk a bit about project management and see how to use Git with VS Code. […]

Resource Spotlight: newly-purchased Dave Leip election datasets

Posted on December 4, 2024 by Wei Yin

Research Data Services (RDS) just purchased a few new election datasets from Dave Leip: “United States Presidential Results” and “US Presidential Primary Election Results for Republican Party and Democratic Party”. All RDS-licensed Dave Leip datasets can be found in CLIO. This resource is available only to current Columbia affiliates. Please […]

Day One Exploratory Data Analysis with JavaScript

Posted on November 26, 2024 by mpd2149

Today we return to our Observable notebooks to learn how to do lightning-fast exploratory data analysis! […]

Resource Spotlight: University of Florida Election Lab

Posted on November 12, 2024 by Eric

The University of Florida Election Lab Data Resources is a new resource that presents precinct-level data for US national, state, and local elections for recent election years (as far back as 2010). […]

Day One Generating Jamstack Websites

Posted on November 7, 2024 by mpd2149

Today we ported our knowledge of the Observable workflow into making our own bespoke Jamstack websites. This was a rocky road, but everyone won in the end! […]

Installing Observable Framework from Zero

Posted on November 2, 2024 by mpd2149

On November 7, we’ll be deconstructing websites built with Observable’s “Framework” framework for making data-driven web apps like dashboards. But before we can deconstruct, we have to construct. This short video shows you how to get an Observable Framework site running in four steps. […]