EOSC 510 · Data Analysis in Atmospheric, Earth and Ocean Sciences

Instructor: Valentina Radic
TA: Sam Anderson

Course Description
This is a course for graduate-level students in programs across the geosciences (e.g., Atmospheric Sciences, Environmental Sciences, Geophysics, Geological Engineering, Geology, Oceanography), in which students will develop a deeper understanding of the research process in their specialization. Specifically, students will gain advanced technical skills in data analysis and empirical modeling to tackle research questions drawn from across the spectrum of Earth, ocean, atmospheric and planetary sciences. The focus of the course is not on the techniques per se, but on developing research objectives with identified data sources, and on choosing and applying appropriate analysis techniques to deliver discipline-specific results. The computer labs and assignments support students' 'learning by doing'.

Examples of research questions and methods for tackling them:
Questions: What is the relationship among multiple variables in a given system or phenomenon? Which variables are the dominant drivers of a given phenomenon?
Methods: Linear regression, multiple linear regression, stepwise regression

Questions: What are the most significant modes (behaviors) in a system, and how are they inter-related? How can a large dataset be 'compressed' without losing its essential information, i.e. how can the degrees of freedom in a system be meaningfully reduced?
Methods: Principal component analysis and canonical correlation analysis

Questions: How can a noisy signal be decomposed to extract the signals of interest? How can a time series be analyzed effectively?
Methods: Fourier spectral analysis, filters, and singular spectrum analysis

Questions: What are the most characteristic features (temporal or spatial patterns) in a given large dataset? How can a large dataset be split into 'meaningful' clusters or groups?
Methods: Classification and clustering (e.g. Self-Organizing Maps, hierarchical clustering)

Questions: How can a model be derived from the data alone, i.e. without any a priori knowledge of the physical processes in the system? How can the performance of such a model be tested? How should empirical models be correctly calibrated, validated and tested?
Methods: Linear and non-linear empirical models; model calibration, optimization and validation
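As a minimal sketch of the calibrate-then-test idea in the last example above, the snippet below fits a straight-line empirical model on a calibration subset of synthetic data and then reports its error on an independent test subset. It is written in Python (standard library only) rather than the MATLAB used in the labs; the synthetic data, the 70/30 split, and the helper names `fit_linear` and `rmse` are hypothetical choices for illustration. The separate validation subset that would normally be used for model selection (e.g. choosing the number of hidden neurons) is omitted for brevity.

```python
import math
import random

def fit_linear(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def rmse(a, b, x, y):
    """Root-mean-square error of the model y = a + b*x on the given data."""
    return math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x))

# Hypothetical synthetic data: y = 2 + 0.5*x plus Gaussian noise (sd = 0.3).
rng = random.Random(0)
x = [i / 10 for i in range(100)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 0.3) for xi in x]

# Shuffle the sample indices, then split them into a calibration set (70%)
# and an independent test set (30%) that the fit never sees.
idx = list(range(len(x)))
rng.shuffle(idx)
cal, test = idx[:70], idx[70:]

# Calibrate on the calibration set only, then evaluate on both sets.
a, b = fit_linear([x[i] for i in cal], [y[i] for i in cal])
rmse_cal = rmse(a, b, [x[i] for i in cal], [y[i] for i in cal])
rmse_test = rmse(a, b, [x[i] for i in test], [y[i] for i in test])
print(f"a={a:.2f}, b={b:.2f}, RMSE cal={rmse_cal:.3f}, test={rmse_test:.3f}")
```

A test error much larger than the calibration error would be a sign of overfitting; for a linear model on data like these, the two should be of similar size.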

Course Outline
The course is taught through lectures, labs, and online supporting material (online notes and videos). The online material covers the theoretical development of each method. The lectures demonstrate in detail the application of each analysis technique to a variety of research projects from the geosciences. The emphasis is on a conceptual and practical understanding (as opposed to only a theoretical understanding) of the methods, demonstrating how they work in practice, i.e. when applied to real datasets. The instructor will introduce a dataset and research questions drawn from a given field (e.g. atmospheric science, volcanology, seismology, glaciology), walk the students through the application of a given method, present the results, and lead an interactive discussion on the successes and limitations of the application. Labs are designed as workshops in which students program in MATLAB to solve a given set of data-oriented research problems. During the labs, students will work individually or in groups, while the instructor provides assistance and guidance. For each lab, a problem set description will be given that introduces a dataset and outlines a set of research questions.

Each week, students are expected to view/read the online material (about 1.5 hours per week) and come prepared for the weekly lecture (1.5 hours) on Tuesday. Labs on Thursdays consist of hands-on computer exercises (1.5 hours) on the concepts covered during the lectures. Students need to bring their own laptops to the labs. Labs on Tuesday afternoons (2-3 hours), which are mandatory only for undergraduate students, consist of a recap of the online material, extra tutorials on the lab exercises, and office hours.

Topic outline by week (Winter Term 2018/19):

All material posted is protected by copyright. All rights reserved. Please contact Valentina Radic for permission to copy, distribute or reprint. 

• Week 1 (2-4 Jan)
Thursday: Introduction to the course (presentation from the class)
Quiz 1 (to be handed in or emailed to instructor by Thu, 10th Jan)

• Week 2 (7-11 Jan)
Online lectures/readings before the class (Tue): Chapter 1. Mean and variance, Correlation, Linear regression, Multiple linear regression, MATLAB programming (Ch1.pdf, Ch1_Q_solns.pdf - PDF file containing solutions to questions posed in the videos) 
Recommended: useful material (from the EOSC 250 course) if you need to brush up on calculus and linear algebra
Lecture (Tue): Multiple linear regression and stepwise regression in MATLAB (example on synthetic and real data; Tutorial2.zip - class presentation, MATLAB scripts and data files)
Extra lab (Tue 2-5 pm): Intro to MATLAB (Lab_material.zip)
Lab (Thu): Multiple linear regression and stepwise regression (Lab2.zip, Lab2_solutions.m)
Quiz 2 (to be handed in or emailed to instructor by Thu, 17th Jan)

• Week 3 (16-18 Jan)
Online lectures/readings before the class (Tue): Chapter 2. Principal component analysis (PCA) and rotated PCA: Geometric approach, Eigenvector approach, Complex data (Ch2a.pdf, Ch2_Q_solns.pdf)
Lecture (Tue): Intro to PCA, PCA in MATLAB (examples on synthetic data); Tutorial3.zip
Extra lab & office hours (Tue 2-5 pm): Multiple linear regression and stepwise regression
Lab (Thu): PCA (Lab3.zip, Lab3_solutions.m)
Quiz 3 (to be handed in or emailed to instructor by Thu, 24th Jan)
ASSIGNMENT 1 (solutions)

• Week 4 (21-25 Jan)
Online lectures/readings before the class (Tue): Chapter 2. PCA applied on real data, Scaling; degeneracy, Smaller covariance matrix; mean removal, Singular value decomposition, Missing data; significance tests (Ch2b.pdf)
Lecture (Tue): PCA in MATLAB (example on real data); Tutorial4.zip - large file!
Extra lab & office hours (Tue 2-5 pm): PCA
Lab (Thu): PCA on real data (Lab4.zip, Lab4_solutions.m, Lab4_part2_solutions.m)
Quiz 4 (to be handed in or emailed to instructor by Thu, 31st Jan)

• Week 5 (28 Jan-1 Feb)
Online lectures/readings before the class (Tue): Chapter 2. Rotated PCA, Varimax; teleconnection patterns, PCA versus Rotated PCA (optional: PCA for vectors);
Chapter 3. Canonical correlation analysis (CCA), CCA theory (part 1), CCA theory (part 2), Pre-filter by PCA, Maximum covariance analysis (Ch2c.pdf, Ch3.pdf, Ch3_Q_solns.pdf)
Lecture (Tue): Rotated PCA and CCA in MATLAB (synthetic and real data); Tutorial5.zip
Extra lab & office hours (Tue 2-5 pm): PCA 
Lab (Thu): Rotated PCA and CCA (Lab5.zip)
Quiz 5 (to be handed in or emailed to instructor by Thu, 7th Feb)
ASSIGNMENT 2 (solutions)

• Week 6 (4-8 Feb)
Online lectures/readings before the class (Tue): Chapter 4. Time series, Fourier spectral analysis (FSA): autospectrum, Autospectrum (part 1), Autospectrum (part 2), Cross-spectrum (Ch4a.pdf, Ch4_Q_solns.pdf)
Lecture (Tue): FSA on synthetic data in MATLAB; Tutorial6.zip
Extra lab & office hours (Tue 2-5 pm): Rotated PCA and CCA
Lab (Thu): FSA on real data (Lab6.zip, Lab6_solutions.m)
Quiz 6 (to be handed in or emailed to instructor by Thu, 14th Feb)

• Week 7 (11-25 Feb)
Online lectures/readings before the class (Tue): Chapter 4. Windows, Filters (part 1), Filters (part 2), Singular spectrum analysis (SSA), Multichannel singular spectrum analysis (Ch4b.pdf, Ch4c.pdf)
Lecture (Tue): Filtering and SSA in MATLAB; Tutorial7.zip
Extra lab & office hours (Tue 2-5 pm): FSA
Lab (Thu): Filtering and SSA (Lab7.zip)
Quiz 7 (to be handed in or emailed to instructor by Thu, 28th Feb)

• Week 8 (25 Feb-1 Mar)
Online lectures/readings before the class (Tue): Chapter 5. Classification and clustering, Classification: k-nearest neighbour classifier, Conditional probabilities, Bayes' theorem, Logistic regression, Clustering: k-means clustering, Hierarchical clustering (Ch5a.pdf, Ch5_Q_solns.pdf, Ch5b.pdf)
Lecture (Tue): Clustering; Tutorial8.zip
Extra lab & office hours (Tue 2-5 pm): Filtering and SSA 
Lab (Thu): Classification and clustering (Lab8.zip, Lab8_solutions.m)
Quiz 8 (to be handed in or emailed to instructor by Thu, 7th Mar)
ASSIGNMENT 3 (solutions)

• Week 9 (4-8 Mar)
Online lectures/readings before the class (Tue): Chapter 5. Self-organizing maps, Chapter 6. Feed-forward neural network models: McCulloch and Pitts model, Perceptrons, Limitations of perceptrons (Ch5c.pdf, Ch6a.pdf, Ch6_Q_solns.pdf)
Lecture (Tue): Application of Self-organizing maps (SOMs); Tutorial9.zip, somtoolbox.zip
Extra lab & office hours (Tue 2-5 pm): Classification and clustering
Lab (Thu): SOMs (Lab9.zip, Lab9_solutions.m)
Quiz 9 (to be handed in or emailed to instructor by Thu, 14th Mar)

• Week 10 (11-15 Mar)
Online lectures/readings before the class (Tue): Chapter 6. Multi-layer perceptrons (MLP) - part 1, MLP - part 2, MLP - part 3, Back-propagation, Hidden neurons, MLP classifier (Ch6b.pdf)
Lecture (Tue): Non-linear empirical modelling (example on synthetic data); Tutorial10.zip
Extra lab & office hours (Tue 2-5 pm): SOMs
Lab (Thu): Non-linear empirical modelling (Lab10.zip, Lab10_solutions.m)
Quiz 10 (to be handed in or emailed to instructor by Thu, 21st Mar)

• Week 11 (18-23 Mar)
Online lectures/readings before the class (Tue): Chapter 7. Nonlinear optimization, Gradient descent methods; Chapter 8. Learning and generalization: Mean squared error and maximum likelihood, Objective functions and robustness, Variance and bias errors, Regularization (Ch7.pdf, Ch8a.pdf, Ch7_Q_solns.pdf, Ch8_Q_solns.pdf)
Lecture (Tue): Non-linear empirical modelling (example on real data); Tutorial11.zip
Extra lab & office hours (Tue 2-5 pm): Non-linear empirical modelling
Lab (Thu): Non-linear empirical modelling (Lab11.zip)
ASSIGNMENT 4 (solutions)

• Week 12 (25-29 Mar)
Online lectures/readings before the class (Tue): Chapter 8. Cross-validation, Bayesian neural networks, Errors of ensembles, Nonlinear ensemble averaging; boosting, Linearization from time-averaging, Regularization of linear models (Ch8b.pdf)
Lecture (Tue): Recap of the course material + guest presentation (Recap.pdf)
Extra lab & office hours (Tue 2-5 pm): In-class work on students' projects 
Lab (Thu): In-class work on students' projects 

• Week 13 (1-4 Apr)
Presentations (Tue): Students' project presentations (part 1; grads)
Presentations (Tue 2-5 pm): Students' project presentations (part 2; undergrads and grads)
Presentations (Thu): Students' project presentations (part 3; grads)
Expectations and grading rubric (for presentations and final reports)
Presentations schedule