Introduction to the R package ‘survey’ (1)

If you are using R for survey data analysis, you might find the ‘survey’ package useful.

I assume you already know how to read or import data into R, so this blog skips the data cleaning and loading steps. Once your survey data are in R, here are some functions you must know for survey data analysis.

All the functions introduced in this blog start with the prefix “svy”. The first step is to define your survey design, using svydesign(). This week’s blog shows how to specify a simple random sampling (SRS) design; next week, I will post more about how to specify stratified and clustered survey designs.

Simple Random Sampling: Subjects are drawn at random, and every member of the population has the same chance of being selected.
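As a small illustration, assuming (hypothetically) that residents are simply numbered 1 to 6291, base R's sample.int() draws a simple random sample without replacement, so each resident has the same chance of selection and no one is picked twice. This is only a sketch of the sampling idea; the survey package itself does not draw samples.

```r
# Sketch: draw an SRS of n IDs from a population numbered 1..N.
# Numbering residents 1..N is an assumption made for illustration.
set.seed(2024)                    # make the draw reproducible
N <- 6291                         # population size
n <- 305                          # sample size
sampled_ids <- sample.int(N, n)   # n distinct IDs, drawn without replacement

length(sampled_ids)               # 305
anyDuplicated(sampled_ids)        # 0: no resident is selected twice
```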

svydesign(ids = ~1, strata = NULL, fpc = rep(N, n), data = dat)

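ids = ~1: no clustering; each respondent is sampled directly rather than as part of a cluster
strata = NULL: no stratification
fpc = rep(N, n): finite population correction, where N is the population size and n is the sample size; rep(N, n) repeats N once for each of the n rows in your data
data = dat: dat is the name of your survey data frame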
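To see what fpc = rep(N, n) does, here is a sketch that builds the template design on simulated data (the data frame dat and its age column are hypothetical, not real survey responses) and compares it with the same sample analyzed without an fpc. With fpc supplied, svydesign() should derive each sampling weight as N/n, and the standard error of a mean should shrink by the factor sqrt(1 - n/N).

```r
library(survey)

# Hypothetical stand-in data: 305 simulated respondents (not real survey data).
set.seed(1)
N <- 6291
n <- 305
dat <- data.frame(age = round(rnorm(n, mean = 45, sd = 12)))

# SRS design with the finite population correction, as in the template.
srs_fpc <- svydesign(ids = ~1, strata = NULL, fpc = rep(N, n), data = dat)

# Same sample and same weights (N / n), but no fpc:
# the sample is treated as if drawn with replacement.
srs_nofpc <- svydesign(ids = ~1, weights = rep(N / n, n), data = dat)

# With fpc supplied, every weight is N / n (about 20.6 here).
head(weights(srs_fpc))

# Standard errors of the mean age under the two designs.
se_fpc   <- SE(svymean(~age, srs_fpc))
se_nofpc <- SE(svymean(~age, srs_nofpc))

# The ratio of the two standard errors should equal sqrt(1 - n/N).
c(ratio = as.numeric(se_fpc / se_nofpc), sqrt_fpc = sqrt(1 - n / N))
```

Because only 305 of 6291 residents (under 5%) are sampled here, the correction reduces the standard error only slightly; it matters more when a large share of the population is sampled.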

Assign the design to a new name so that you can use it in the data analysis steps that follow.

For example, suppose you randomly selected 305 of the 6291 residents in a neighborhood and surveyed them about their basic demographic and socio-economic characteristics (such as age, gender, race, and household income) and their attitudes toward online shopping. Your survey dataset is named ‘shopping’, with n = 305 and N = 6291, and you want to define it as a survey design object called “shoppingsvy”.

First, import the ‘shopping’ dataset into R. Then load the survey package and define your data as a simple random sampling design named “shoppingsvy” as follows:
library(survey)
shoppingsvy <- svydesign(ids = ~1, strata = NULL, fpc = rep(6291, 305), data = shopping)
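Since the real ‘shopping’ dataset is not available here, the sketch below uses a simulated stand-in with 305 rows so the design call can actually run. The columns age and shops_online (1 = shops online, 0 = does not) are invented for illustration. It also shows the named design object being reused by other svy functions, here svymean() and svytotal().

```r
library(survey)

# Hypothetical stand-in for the 'shopping' dataset: 305 simulated respondents.
# The column names age and shops_online are invented for this sketch.
set.seed(305)
shopping <- data.frame(
  age          = sample(18:80, 305, replace = TRUE),
  shops_online = rbinom(305, size = 1, prob = 0.6)
)

# SRS design: no clusters, no strata, N = 6291 residents, n = 305 respondents.
shoppingsvy <- svydesign(ids = ~1, strata = NULL,
                         fpc = rep(6291, 305), data = shopping)

# Each respondent represents 6291 / 305 (about 20.6) residents.
summary(weights(shoppingsvy))

# Reuse the named design: estimated mean age and share who shop online.
svymean(~age + shops_online, shoppingsvy)

# Estimated number of residents (out of 6291) who shop online.
svytotal(~shops_online, shoppingsvy)
```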