Welcome to CSI 771

Computational Statistics

Fall, 2000

Instructor: James Gentle

Instructor's email: jgentle@gmu.edu

Class meets on Wednesdays from 4:30pm to 7:10pm.

This Web page will evolve as the semester progresses.


This course is about modern, computationally-intensive methods in statistics. It emphasizes the role of computation as a fundamental tool of discovery in statistical analysis.

Topics to be covered include

  • Monte Carlo studies in statistics
  • Data partitioning and resampling
  • Graphical methods in computational statistics
  • Nonparametric probability density estimation
  • Statistical models and data fitting

    Prerequisites for this course include a course in applied statistics and a course in statistical inference.


    The text for the course is Computationally-Intensive Methods of Statistics, which will be distributed as separate sections during the semester.
    Corrections will be accumulated during the semester.

    Student work in the course (and the relative weighting of this work in the overall grade) will consist of

  • a number of small assignments, problems, etc. (15)
  • a semester project to replicate and extend a published Monte Carlo study (30)
  • an in-class midterm (25)
  • a final exam consisting of an in-class component and a take-home component (30)
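    As a rough illustration of how the weights above combine, the sketch below computes an overall grade as a weighted average. It assumes each component is scored on a 0 to 100 scale; that scale, the function name, and the example scores are hypothetical and not taken from the course.

```python
# Weights as listed above; they are percentages and sum to 100.
WEIGHTS = {"assignments": 15, "project": 30, "midterm": 25, "final": 30}

def overall_grade(scores):
    """Weighted average of component scores (each assumed to be on 0-100)."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight

# Hypothetical component scores, for illustration only.
example = overall_grade(
    {"assignments": 90, "project": 80, "midterm": 70, "final": 85}
)
print(example)
```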

    Each student will prepare a Web page for presentation of the project and for some of the smaller assignments.

    August 30

    Course overview; method of communication
    Computer organization: Unix and basic tools; S-Plus
    Computational statistics
    Monte Carlo studies
    Random number generation in S-Plus

    September 6

    Discussion of Monte Carlo studies; Student presentations of descriptions of articles (first project milestone)
    Monte Carlo methods for statistical inference

    September 13

    Discussion of projects if necessary (second project milestone)
    Markov chain Monte Carlo
    Assignment: Exercises 1.1, 1.2, 1.3

    September 20

    Student presentations of plans for projects (third project milestone)
    Markov chain Monte Carlo
    Data partitioning: cross validation; jackknife

    September 27

    Data partitioning: cross validation; jackknife
    Bootstrap methods
    Assignment: Exercises 1.6, 1.8, 1.9, 1.10, 1.11, 1.12, 1.16
    Addition for 1.8.b: "Consider some special cases, especially when p and g are very close. Consider, for example, the degenerate case in which p(x)=g(x)=6x(1-x), for 0 < x < 1."
    Correction for 1.16: Insert "When T is the sample mean, that is, when J(T) = T, "
    (The point of the exercise is to provide additional intuition for the jackknifed variance estimator.)
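    The sample-mean case mentioned in the correction for Exercise 1.16 can be checked numerically. The sketch below assumes the usual pseudovalue definition of the jackknife (pseudovalue j is n*T minus (n-1) times the leave-one-out statistic, and J(T) is the mean of the pseudovalues); the text's own notation may differ, and the data values are made up for illustration. Under that assumption, J(T) equals T for the sample mean, and the jackknife variance estimate equals the familiar s^2/n.

```python
import statistics

def jackknife(xs, stat):
    """Return (J(T), jackknife variance estimate) for statistic `stat`.

    Assumes the standard pseudovalue form of the jackknife.
    """
    n = len(xs)
    t_full = stat(xs)
    # Leave-one-out values of the statistic.
    loo = [stat(xs[:j] + xs[j + 1:]) for j in range(n)]
    # Pseudovalues: n*T - (n-1)*T_(-j).
    pseudo = [n * t_full - (n - 1) * t for t in loo]
    j_t = sum(pseudo) / n
    var_hat = sum((p - j_t) ** 2 for p in pseudo) / (n * (n - 1))
    return j_t, var_hat

# Made-up data, for illustration only.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]
j_mean, var_jack = jackknife(data, statistics.mean)
usual = statistics.variance(data) / len(data)  # s^2 / n

print(j_mean, statistics.mean(data))  # agree up to rounding
print(var_jack, usual)                # agree up to rounding
```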

    October 4

    Bootstrap methods
    Assignment: Consider the plug-in variance estimator (that is, the sum of squares divided by n instead of n-1). Let t be the functional that yields this estimator. Using t(P_n^(1)) and t(P_n), determine the correction for the bias.
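    A numerical check can accompany the analytic derivation. The sketch below assumes that P_n^(1) denotes the empirical distribution of a bootstrap resample drawn from P_n; that reading of the notation is an assumption, and the function names, sample size, and seeds are hypothetical. It approximates the bootstrap expectation of t(P_n^(1)) by Monte Carlo and uses the difference from t(P_n) as the estimated bias.

```python
import random

def plug_in_variance(xs):
    """Sum of squared deviations divided by n (not n-1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def bootstrap_bias_estimate(xs, stat, n_boot=2000, seed=0):
    """Monte Carlo estimate of E*[stat(resample)] - stat(xs)."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(n_boot):
        # Resample n points with replacement from the observed data.
        resample = [xs[rng.randrange(n)] for _ in range(n)]
        total += stat(resample)
    return total / n_boot - stat(xs)

# Simulated data, for illustration only.
data_rng = random.Random(771)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(20)]

t_hat = plug_in_variance(sample)
bias_hat = bootstrap_bias_estimate(sample, plug_in_variance)
corrected = t_hat - bias_hat  # bias-corrected estimate

print(t_hat, bias_hat, corrected)
```

    The estimated bias comes out negative, so the corrected value is larger than the plug-in estimate; comparing bias_hat with t_hat for several values of n is one way to check a formula derived by hand.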

    October 11

    Columbus Day holiday (no class)

    October 18

    More on jackknifing and bootstrap; review of homework and other problems.

    October 25

    Midterm (in class)

    November 1

    Student presentations of Monte Carlo studies (fourth project milestone)
    Probability density estimation
    Assignment: Exercises 2.1, 2.2, 2.3

    November 8

    Student reviews of Monte Carlo studies (fifth project milestone)

    Assignment: Exercises 2.5, 2.10, 2.13

    November 15

    Probability density estimation
    Structure in multivariate data
    Assignment: Exercises 2.21, 2.22, 2.24 (due Nov 29)

    November 22

    Structure in multivariate data
    Graphical displays, grand tour

    November 29

    Student final presentations of Monte Carlo studies (sixth project milestone)

    December 6

    Statistical model building
    Transformations to fit models
    Hand out take-home portion of final exam

    December 13

    Take-home portion of final exam due
    In-class portion of final exam

    Computational Resources

    Labs with Unix workstations are available for use in this class in both CSI and IT&E.
  • CSI facilities.
  • Software available in SITE labs.

    Other Resources

  • S (or S-Plus) Cheatsheet (courtesy of Barry Brown, University of Texas at Houston)

    The most important WWW repository of statistical stuff (datasets, programs, general information, connection to other sites, etc.) is StatLib Index at Carnegie Mellon.

    Students

    The students in the class all have homepages on which they put parts of their assignments and other interesting stuff.
    Yaru Li
    Mark Lukens
    Jon Schuler
    Chunguang Yu
