James Scott edited this page May 17, 2013 · 5 revisions

Welcome

Welcome to the UT Summer Statistics Institute! This short course covers several topics in multivariate data analysis in R, including:

  • simple group-wise models (introduction to ANOVA),
  • linear regression,
  • binary, ordinal, and multinomial logistic regression,
  • Poisson regression and its generalizations,
  • survival analysis using the Cox proportional hazards model, and
  • linear and generalized linear mixed models.

First steps

  1. Install the latest versions of both R and RStudio. R is the engine here. If you're an R veteran, feel free to use the interface that comes with the base distribution for your OS, or indeed any front-end of your choosing. But if you're relatively new to R, you should consider RStudio. It's a slick graphical front-end to R (technically an integrated development environment, or IDE) that provides the same interface on every platform, along with a lot of nice features. For newbies, RStudio's nicest feature is its ability to import data sets interactively. For experienced R users, that same feature means never worrying about the "Working Directory" ever again!

  2. Begin by installing the following libraries, or packages: mosaic, lattice, grid, survival, splines, Hmisc. (The main one we need for now is mosaic, but mosaic depends on the others. In fact, if you check the "Install Dependencies" box, they should be installed automatically along with mosaic.) If you need help installing packages in RStudio, consult this YouTube video. There will be other packages to install later, but this will get you started.
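If you prefer typing to clicking, the installation step can also be done from the R console. The following is a minimal sketch, assuming a CRAN mirror is already configured (RStudio normally sets one for you). Note that grid and splines ship with R itself, so only the remaining packages may need downloading; the variable names `pkgs` and `missing_pkgs` are just illustrative.

```r
# Packages from step 2 that come from CRAN.
# (grid and splines are part of base R and are always available.)
pkgs <- c("mosaic", "lattice", "survival", "Hmisc")

# Find which of them are not installed yet.
# requireNamespace() returns FALSE instead of failing when a package is absent.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only the missing ones. By default install.packages() also pulls in
# each package's required dependencies (Depends, Imports, LinkingTo).
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Attach mosaic for the current session.
library(mosaic)
```

You only need to install packages once, but `library(mosaic)` has to be run in every new R session in which you want to use it.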

  3. Start playing around! That is, make a concerted effort to familiarize yourself with the R environment, especially if you are new to R or haven't used it in a while. There are hundreds, if not thousands, of tutorials on the web, ranging from the concise to the voluminous. But for a self-contained introduction to doing basic statistical analyses with R, I highly recommend the textbook put together by Daniel Kaplan of Macalester College. He's done us all a great service by putting Chapters 1-5 of the book online for free. Each chapter describes some basic statistical ideas and ends with a section called "Computational Technique" that shows how to implement that chapter's ideas in R. As a review, we will discuss material roughly at the level of his Chapters 4 and 5 in the first hour and a half of class, so you will get much more out of the course if you've read those chapters before we begin.
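As one thing to try while you explore, here is a minimal sketch of a simple group-wise model (the first topic on the course list) using only base R. The built-in `mtcars` data set and the choice of `mpg` grouped by `cyl` are illustrative assumptions, not taken from the course materials. The point it shows is that a linear model with a single categorical predictor simply fits the group means, and `anova()` then asks whether those means differ.

```r
# Built-in data: fuel economy (mpg) for 32 cars, grouped by
# number of cylinders (4, 6, or 8).
data(mtcars)

# Group-wise model computed directly: the mean mpg within each cylinder group.
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(group_means)

# The same model written as a linear model with a factor predictor.
# The intercept is the mean of the baseline group (4 cylinders); the other
# coefficients are each group's difference from that baseline.
fit <- lm(mpg ~ factor(cyl), data = mtcars)
print(coef(fit))

# One-way ANOVA table: is the variation between group means large
# relative to the variation within groups?
print(anova(fit))
```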

  4. This is strictly optional. But if you are looking for a bit of background on multiple regression analysis (which forms the intellectual core of this course), then I recommend Edward Tufte's oldie-but-goodie "Data Analysis for Politics and Policy." It's short, out of print, and available online for free.
