In the previous three posts, I introduced how to define survey data and compute descriptive statistics (here are the links for R ‘survey’ package blog (1) (2) (3)). Today, I am going to introduce the basic regression syntax in this package.
svyglm() # fits a generalized linear model to survey data
Let’s use the two-stage cluster sample “apiclus2”, introduced in blog (2), as an example. Assume api00 is the dependent variable, ell, meals and mobility are the independent variables, and the survey design has been defined with the svydesign() function and stored as “dclus2”. The syntax of this generalized linear model is written as follows.
svyglm(api00 ~ ell + meals + mobility, design = dclus2)
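For readers who want to run the model end to end, here is a minimal sketch. It assumes the design is built as in the survey package's own apiclus2 example (districts dnum, then schools snum, with population sizes fpc1 and fpc2); if you defined dclus2 differently in blog (2), keep your own definition. The object name fit is just an illustrative choice.

```r
library(survey)

# Load the API example data sets shipped with the survey package (includes apiclus2)
data(api)

# Assumed design: two-stage cluster sample, districts (dnum) then schools (snum),
# with finite population corrections fpc1 and fpc2
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Linear regression (family defaults to gaussian())
fit <- svyglm(api00 ~ ell + meals + mobility, design = dclus2)

summary(fit)   # coefficients with design-based standard errors
coef(fit)      # point estimates only
confint(fit)   # confidence intervals for the coefficients
```

The result inherits from the usual glm class, so familiar extractor functions such as coef() and summary() work, with standard errors that account for the sampling design.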
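# The default family is gaussian(), which gives a linear regression. For other generalized linear models, for example binomial logistic regression, set the family argument as follows. stype is the dependent variable in this model; because it is a factor with three levels (E, H, M), R treats the first level (E) as failure and all other levels as success. The svyglm() documentation recommends quasibinomial() rather than binomial for survey data, to avoid a warning about non-integer numbers of successes; the point estimates are unchanged.
svyglm(stype ~ ell + meals + mobility, design = dclus2, family = quasibinomial())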
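A slightly fuller sketch of the logistic case follows. It makes the binary outcome explicit instead of relying on how R codes a multi-level factor. The variable elem (1 for elementary schools, 0 otherwise) and the object name fit_logit are illustrative names that are not part of the data set, and the design definition is assumed to follow the survey package's apiclus2 example.

```r
library(survey)

# Load the API example data sets shipped with the survey package (includes apiclus2)
data(api)

# Assumed design: two-stage cluster sample with finite population corrections
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Add an explicit 0/1 outcome to the design object (elem is a made-up name)
dclus2 <- update(dclus2, elem = as.numeric(stype == "E"))

# Logistic regression; quasibinomial() avoids the non-integer successes warning
fit_logit <- svyglm(elem ~ ell + meals + mobility,
                    design = dclus2, family = quasibinomial())

summary(fit_logit)     # coefficients on the log-odds scale
exp(coef(fit_logit))   # odds ratios
```

Creating the outcome with update() keeps the new variable attached to the design object, so the weights and cluster structure stay aligned with the data.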
The full manual for the ‘survey’ package is here. Please check this link for more functions and for detailed descriptions, arguments and examples. If you would like to discuss any further questions based on this blog, feel free to email data@library.columbia.edu.