- Autumn 2022
Welcome to Modern Data Analysis Techniques
Team-taught by Miguel Morales (Physics) and Bryna Hazelton (eScience), this class introduces current techniques and best practices in the statistically rigorous analysis of large data sets. The class is organized around four themes: practical statistics, advanced data visualization, collaborative analysis code, and advanced data analysis practices.
Everyone learns much better in person, so please come to class in person whenever possible. The room is a new space, B143, across from the SPS lounge. That said, many advanced students need to travel for research and COVID is still around, so we will offer Zoom on demand and will endeavor to record the classes. Send Miguel an email if you want Zoom for a class; this will be the link we use.
Miguel Morales, Monday 1:30-2:30, plus by appointment or opportunity. C325.
Bryna Hazelton, Thursday 2-3 or by appointment. C-wing 6th floor (eScience Institute).
This is a graduate elective, so what you get out of the course largely depends on what you put into it. Further, this class is designed to scale depending on your interests and time. At one end, it is designed to provide motivated students with a firm grounding in advanced statistics and data analysis tools that can be used on a wide range of academic and professional problems. At the other end, it is designed to serve as a low-pressure survey of modern analysis techniques. During the first week you will detail what your goals are, and your grade will be based on how well you achieve them. There will be no exams; the homework and final project form the basis of your grade.
Themes: Practical Statistics; Data Visualization; Collaborative Analysis; Advanced Data Analysis Practices
Th: Welcome; course overview; what does sigma mean? (video; slides)
Homework: Intro quiz
T: Introduction to git & GitHub (video; slides)
Th: Statistical building blocks (video; slides)
Homework: Homework #1 (git game)
T: Data visualization pt. 1; (video; slides)
Th: No class
Homework: Homework #2 (intro to stats)
Wikipedia entries can be useful; look under ‘related distributions’.
T: Data visualization pt. 2, workshopping plots, analysis plans, worry lists; (video; slides)
Th: Trials factors; parameter distributions; (video; slides)
Homework: Homework #3
T: Parameters cont.; Fisher matrix; triangle plots; variable backgrounds; (video; slides)
Th: Statistically valid plots; jackknife tests; (video; slides)
T: Developing an analysis plan; (video; slides)
Th: Confidence intervals; (video; slides)
T: Metadata, Provenance & Test Thickets; (video; slides)
Th: Stats mini-review; the blob, analysis dragons; (video, slides)
T: Deconvolution/forward modeling; ML overview (video, slides)
Th: Machine Learning (Sam Tetef); plots as a language (Sam's slides, Miguel's slides, video)
T: Blind & semi-blind analyses; data rampages (video, slides)
T: Presentations: Jordan Fonseca; Charles Cardot (video)
Th: Presentations: Omar Beesley; Cautionary examples of statistical errors (video; slides)
T: Presentations: Michaela Guzzetti; Akira Pfeffer; Chris Munley (video)
Th: Presentations: Caio Nascimento; Valeria Hurtado; Murali Saravanan (video)
Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW’s policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy (https://registrar.washington.edu/staffandfaculty/religious-accommodations-policy/). Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form (https://registrar.washington.edu/students/religious-accommodations-request/).