- Spring 2021
Zoom link: https://washington.zoom.us/j/96898385387
Welcome to Modern Data Analysis Techniques
Team-taught by Miguel Morales (Physics) and Bryna Hazelton (eScience), this class aims to introduce current techniques and best practices for the statistically rigorous analysis of large data sets. The class is organized around four themes: practical statistics, advanced data visualization, building collaborative analysis code, and advanced data analysis practices.
Grading
As this is a graduate elective, what you get out of the course largely depends on what you put into it. The class is also designed to scale with your interests and available time. At one end, it gives motivated students a firm grounding in advanced statistics and data analysis tools that apply to a wide range of academic and professional problems. At the other end, it serves as a low-pressure survey of modern analysis techniques. During the first week, you will describe your goals for the course, and your grade will be based on how well you achieve them. There will be no exams; the homework and final project form the basis of your grade.
Syllabus
(Lecture titles link to Zoom; links to slide PDFs follow. The syllabus is still under development and subject to change.)
Week 1
T: Welcome; course overview; what does sigma mean? slides
Th: Analysis chains; Introduction to git & GitHub slides (analysis chain; git)
Homework: Intro quiz
Week 2
T: Statistical building blocks (slides; MATLAB pt. 1 & pt. 2)
Th: Data visualization pt. 1; slides
Homework: Homework #1
Week 3
T: Data visualization pt. 2 (more examples); Poisson ± sigma difference; trials factors; parameter distributions; slides
Th: Workshopping your plots; systematics scavenger hunt; time-position variable backgrounds; slides
Homework: Homework #2
Week 4
T: Jackknife & statistically valid plots; examples; HW discussion; slides
Th: Stats review & common errors; Python hints; slides
Week 5
T: Developing an analysis plan: statistical worries; git issues; slides
Th: Confidence intervals; slides
Homework: Homework #3
Week 6
T: Metadata, Provenance & Test Thickets; slides
Th: Machine Learning (the blob pt 1); slides
Week 7
T: Parameters, inherited analyses (the blob pt 2); slides
Th: Blind & semi-blind analyses (Hertzog guest)
Week 8
Th (5/20): Presentations: Michael Pun, Debby Tran, Samantha Tetef
Week 9
T (5/25): Presentations: Anna Wirth-Singh, Chris Thomas
Th (5/27): Presentations: Samantha Gilbert, Miguel Morales
Week 10
T: Presentations: Tharindu W. Fernando; Data rampages
Th: Presentations: Rodolfo Garcia, Robert Pecoraro, David Wang
Holding pen (early): example of multi-dimensional probability; multi-parameter distributions, multi-dimensional spaces and triangle plots
Holding pen (late): art of parameters, blind & semi-blind analyses, peer-reviewed code.