In the second ‘plm’ package blog, we obtained regression output for both the fixed effects and the random effects models. A common next question is which model to choose, and the Hausman test helps answer it. The fixed effects output is named “grun.fe” and the random effects output is named “grun.re”. The function […]
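Here is a minimal sketch of the Hausman test with plm's `phtest()`. It keeps the object names grun.fe and grun.re from the post. The model formula `inv ~ value + capital` on the Grunfeld data is an assumption borrowed from the plm documentation, because the excerpt does not show the post's own specification. Under the null hypothesis, both estimators are consistent and random effects is efficient. A small p-value (for example, below 0.05) therefore points toward the fixed effects model.

```r
library(plm)
data("Grunfeld", package = "plm")

# Assumed specification (as in the plm documentation), not necessarily the post's model
grun.fe <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "within")
grun.re <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "random")

# Hausman test: compares the fixed and random effects coefficient estimates
hausman <- phtest(grun.fe, grun.re)
print(hausman)
```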

# Category: Statistics & Data

## Introduction to R ‘plm’ package (2)

The first blog on the “plm” package covered how to define panel data. This blog introduces the syntax for both fixed effects and random effects regression models. The dataset “Grunfeld” is a balanced panel of 10 observational units (firms) observed annually from 1935 to 1954, and we are going to use this dataset to run […]
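The sketch below fits the two specifications with `plm()`. Setting `model = "within"` gives fixed effects, and `model = "random"` gives random effects. The regression of investment (`inv`) on firm value (`value`) and capital stock (`capital`) is an assumption taken from the plm documentation for Grunfeld, not necessarily the model used in the post.

```r
library(plm)
data("Grunfeld", package = "plm")

# Fixed effects (within) estimator; assumed formula from the plm documentation
grun.fe <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "within")
summary(grun.fe)
fixef(grun.fe)   # firm-specific intercepts recovered from the within model

# Random effects estimator (Swamy-Arora variance components by default)
grun.re <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "random")
summary(grun.re)
```

The within model reports no overall intercept, because the firm effects are swept out by demeaning. The random effects model keeps an intercept.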

## Introduction to R ‘plm’ package (1)

This blog is an introduction to using the ‘plm’ package for panel data analysis. Panel data are datasets that follow the same units (for example, respondents) on the same variables across different time periods (such as years or months). In practice, it is common for researchers to have an unbalanced panel dataset (for example, GDP data could be missing in different years […]
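Here is a minimal sketch of declaring a panel with `pdata.frame()` and checking balance with `pdim()`. The Grunfeld data shipped with plm stand in for your own dataset. The three dropped rows are an arbitrary, hypothetical way to mimic missing years.

```r
library(plm)
data("Grunfeld", package = "plm")

# Declare the panel structure: firm is the unit index, year is the time index
pgrun <- pdata.frame(Grunfeld, index = c("firm", "year"))
pdim(pgrun)   # reports n (units), T (periods), N (rows) and whether it is balanced

# Hypothetical missing observations to produce an unbalanced panel
unbal <- pdata.frame(Grunfeld[-c(1, 25, 60), ], index = c("firm", "year"))
pdim(unbal)$balanced
```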

## Introduction to R ‘survey’ package (4)

In the previous 3 blogs, I introduced how to define survey data and compute descriptive statistics (here are the links for R ‘survey’ package blog (1) (2) (3)). Today, I am going to introduce the basic regression syntax in this package. The function is `svyglm()`, which fits generalized linear regression models using survey data. Let’s use the two-stage cluster sample […]
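The sketch below shows `svyglm()` on the two-stage cluster sample `apiclus2`, where school districts are sampled first and schools within them second. The design follows the survey package documentation. The covariates `ell`, `meals` and `mobility` are an assumption borrowed from that documentation and may not match the post's model.

```r
library(survey)
data(api)

# Two-stage cluster design: districts (dnum), then schools (snum), with fpc at each stage
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Linear regression (gaussian family by default) with design-based standard errors;
# covariates are assumed, taken from the package documentation
fit <- svyglm(api00 ~ ell + meals + mobility, design = dclus2)
summary(fit)
```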

## Introduction to R ‘survey’ package (3)

After defining your survey design (please refer back to ‘survey’ package blogs (1) and (2)), you can use the functions below to describe your survey data and estimate population quantities. Let’s continue with the apiclus1 data. After calling the svydesign() function, you have a survey design object, dclus1, which we created last week. In this dataset, […]
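The sketch below runs a few descriptive functions on the one-stage cluster design `dclus1`. So that it runs on its own, `dclus1` is defined here as in the survey package documentation. The post's exact definition and calls may differ.

```r
library(survey)
data(api)

# One-stage cluster sample: school districts (dnum) are the sampled clusters
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

m <- svymean(~api00, dclus1)   # design-based estimate of the population mean
m
confint(m)                     # confidence interval for that mean
svytotal(~enroll, dclus1)      # estimated population total enrollment
svyby(~api00, ~stype, dclus1, svymean)   # mean API score by school type
```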

## Introduction to R package ‘survey’ (2)

Here are more types of survey data beyond the case (simple random sample) we introduced before. The ‘survey’ package contains several sample datasets from the California Academic Performance Index. After installing and loading the ‘survey’ package, you can load these data samples with the command data(api), and you will see that 5 datasets are loaded in R, […]
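The sketch below loads the API samples and declares one of the non-SRS designs: the stratified sample `apistrat`, with strata defined by school type (`stype`). The `svydesign()` call follows the survey package documentation.

```r
library(survey)
data(api)   # loads apipop, apisrs, apistrat, apiclus1 and apiclus2

# Number of schools in each of the five data frames
sapply(list(apipop = apipop, apisrs = apisrs, apistrat = apistrat,
            apiclus1 = apiclus1, apiclus2 = apiclus2), nrow)

# Stratified random sample: strata are school types; fpc holds stratum population sizes
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)
dstrat
```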

## Introduction to R package ‘survey’ (1)

If you are using R for survey data analysis, you might find the ‘survey’ package useful. I assume that you already know how to read/import data in R, so this blog skips the steps of data cleaning and loading. After importing survey data in R, here are some functions you […]
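Here is a minimal sketch of declaring a simple random sample with `svydesign()`. It uses the package's own `apisrs` data as a stand-in for imported survey data. The post's own functions and data are not shown in this excerpt.

```r
library(survey)
data(api)

# Simple random sample: no clustering (id = ~1); fpc is the population size,
# from which svydesign() derives the sampling probabilities and weights
dsrs <- svydesign(id = ~1, fpc = ~fpc, data = apisrs)
summary(dsrs)
svymean(~api00, dsrs)    # estimated population mean API score
svytable(~stype, dsrs)   # estimated number of schools by type
```

Because no `weights` argument is given, each school's weight is derived from `fpc` as the population size divided by the sample size.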

## Survey Documentation and Analysis (SDA)

Survey Documentation and Analysis (SDA) is a web-based interface that lets you access and analyze data. The data can be accessed from IPUMS or from the Inter-university Consortium for Political and Social Research (ICPSR). SDA allows you to:

- Browse the codebook describing a dataset
- Calculate frequencies or crosstabulations (with charts)
- Compare means […]

## R Open Labs – Basic Syntax

Hi all, last Wednesday we kicked off the first session of R Open Lab in the DSSC (based in Lehman Library). We started with basic syntax and briefly discussed how to explore the features of our datasets. We used data from Wal-Mart, and we will continue exploring this dataset over the next few sessions. Beginners are welcome […]
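The Wal-Mart data from the session are not included here. This sketch therefore uses R's built-in `mtcars` data frame as a stand-in to show a few base R functions for a first look at a dataset.

```r
# mtcars ships with base R; it stands in for the session's Wal-Mart data
df <- mtcars

dim(df)           # number of rows and columns
names(df)         # variable names
str(df)           # type and first values of each column
head(df, 3)       # first three rows
summary(df$mpg)   # min, quartiles, median, mean and max of one variable
table(df$cyl)     # frequency counts for a discrete variable
```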

## Python Open Labs Session – 2

In the second session of Python Open Labs, the focus was on conditional statements. Topics covered include: conditional operators, conditional statements, Boolean expressions, Python’s obsession with indentation (and the idea of scope!), two-way decisions, and multi-way decisions (using if-else and elif). The code on the slides is in Python 2.7, so if you use anything […]