EOSC 510 · Data Analysis in Atmospheric, Earth and Ocean Sciences

IMPORTANT: The course is offered in Term 1 (fall) of the school year 2020/21. Due to COVID-19, the course is offered online only.  The schedule is:
Tuesdays (lecture) 11:00 - 12:30 
Thursdays (tutorial/lab) 11:00 - 12:30 or 18:30 - 20:00
Please register for the preferred tutorial/lab time on Thursdays (it's the same tutorial offered at two different times). Once selected, you'll need to stick to that time throughout the term. This option is given to enable students with time zone conflicts to make it to the tutorials/labs. Tuesdays' lectures will be recorded and available for viewing throughout the term. 

Course Availability & Schedule

Instructor: Valentina Radic
Teaching Assistant: TBA

Course Description
This is a course for graduate-level students in programs across the geosciences (e.g., Atmospheric Sciences, Environmental Sciences, Geophysics, Geological Engineering, Geology, Oceanography), in which students will develop a deeper understanding of the research process in their specialization. Specifically, students will gain advanced technical skills in data analysis and empirical modeling to tackle research questions drawn from across the spectrum of Earth, ocean, atmospheric and planetary sciences. The focus of the course is not on the techniques per se, but on developing research objectives with identified data sources, and on choosing and applying appropriate analysis techniques to deliver discipline-specific results. The computer labs and assignments facilitate students' 'learning by doing'.

Examples of research questions and methods for tackling them:
Questions: What is the relationship among multiple variables in a given system or phenomenon? Which variables are the dominant drivers of a given phenomenon?
Methods: Linear regression, multiple linear regression, stepwise regression
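
As a rough illustration of the first pair of questions, the sketch below fits a multiple linear regression by ordinary least squares (normal equations) in plain Python. The data are synthetic, and the function names `solve_linear_system` and `fit_multiple_regression` are hypothetical; they are not taken from the course labs.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(predictors, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    # Design matrix with a leading column of ones for the intercept.
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data: y = 2.0 + 1.5*x1 - 0.5*x2 (values are made up).
predictors = [(0.0, 1.0), (1.0, 0.0), (2.0, 3.0), (3.0, 1.0), (4.0, 5.0)]
y = [2.0 + 1.5 * x1 - 0.5 * x2 for x1, x2 in predictors]
coefficients = fit_multiple_regression(predictors, y)
print([round(c, 6) for c in coefficients])  # intercept, then one slope per predictor
```

Because the synthetic data contain no noise, the fitted coefficients should recover the values used to generate `y` up to rounding error. With real data, the relative size of the (standardized) slopes is one way to judge which variables are dominant drivers.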

Questions: What are the most significant modes (behaviors) in a system and how are they inter-related? How to 'compress' a large dataset without losing its essential information, i.e. how to meaningfully reduce the degrees of freedom in a system?
Methods: Principal component analysis and canonical correlation analysis
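
To make the idea of 'compressing' data concrete, here is a minimal sketch of principal component analysis for two variables, using the closed-form eigen-decomposition of a 2x2 covariance matrix. The data points and the function name `pca_2d` are made up for illustration; real applications involve many more variables and a numerical eigen-solver.

```python
import math


def pca_2d(points):
    """PCA of 2-D data via the eigen-decomposition of its covariance matrix."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Remove the mean so the covariance describes variability only.
    centred = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    eigenvalues = (half_trace + spread, half_trace - spread)
    # Orientation of the leading eigenvector (mode 1).
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    mode1 = (math.cos(angle), math.sin(angle))
    # Principal component 1: projection of each centred point onto mode 1.
    pc1 = [x * mode1[0] + y * mode1[1] for x, y in centred]
    explained = eigenvalues[0] / (eigenvalues[0] + eigenvalues[1])
    return mode1, pc1, explained


# Made-up points scattered closely around the line y = 2x.
points = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.1), (1.0, 2.0), (2.0, 3.9)]
mode1, pc1, explained = pca_2d(points)
print("mode 1 direction:", mode1)
print("fraction of variance explained by mode 1:", explained)
```

Here a single principal component carries almost all of the variance, so the two original variables can be replaced by one series (`pc1`) plus one pattern (`mode1`) with little loss of information.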

Questions: How to decompose a noisy signal in order to find any signals of interest? How to effectively analyze a time series?
Methods: Singular spectrum analysis
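
The following is a minimal sketch of basic singular spectrum analysis, assuming a lag-covariance formulation and keeping only the leading mode. It is written in plain Python, uses power iteration in place of a full eigen-solver, and does not centre the series; the test series and the name `leading_ssa_component` are hypothetical.

```python
import math


def leading_ssa_component(series, window, n_iter=200):
    """Reconstruct the leading SSA mode of a 1-D series."""
    n = len(series)
    k = n - window + 1
    # Trajectory matrix: each row is a lagged segment of the series.
    traj = [series[i:i + window] for i in range(k)]
    # Lag-covariance matrix C = X^T X / K (window x window).
    cov = [[sum(row[a] * row[b] for row in traj) / k for b in range(window)]
           for a in range(window)]
    # Power iteration converges to the leading eigenvector of C.
    vec = [1.0] * window
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(window)) for a in range(window)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Principal component: projection of each lagged row onto the eigenvector.
    pc = [sum(row[j] * vec[j] for j in range(window)) for row in traj]
    # Diagonal averaging maps the rank-one matrix pc[i]*vec[j] back to a series.
    recon = [0.0] * n
    counts = [0] * n
    for i in range(k):
        for j in range(window):
            recon[i + j] += pc[i] * vec[j]
            counts[i + j] += 1
    return [r / c for r, c in zip(recon, counts)]


def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


# Made-up series: a slow trend plus a rapidly alternating disturbance.
trend = [0.1 * t for t in range(40)]
noise = [0.5 if t % 2 == 0 else -0.5 for t in range(40)]
series = [a + b for a, b in zip(trend, noise)]
smooth = leading_ssa_component(series, window=4)

print("RMSE of raw series vs trend:   ", rmse(series, trend))
print("RMSE of leading mode vs trend: ", rmse(smooth, trend))
```

In this toy case the leading mode isolates the slow trend and suppresses the alternating disturbance. An oscillatory signal would typically need a pair of modes rather than one, and the choice of window length matters in practice.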

Questions: What are the most characteristic features (temporal or spatial patterns) in a given large dataset? How to split a large dataset into ‘meaningful’ clusters/groups?
Methods: Classification and clustering (e.g. Self-Organizing Maps, hierarchical clustering)
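
As a small illustration of splitting data into groups, the sketch below applies agglomerative (hierarchical) clustering with single linkage to a handful of one-dimensional values. The values and the function name `agglomerative_clusters` are made up; this is a brute-force version meant to show the merging logic, not an efficient implementation.

```python
def agglomerative_clusters(values, n_clusters):
    """Single-linkage agglomerative clustering of 1-D values."""
    # Start with every observation in its own cluster.
    clusters = [[v] for v in values]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest gap between any two members.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        # Merge the closest pair, then repeat.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Made-up observations that visibly fall into three groups.
observations = [1.0, 1.2, 0.8, 5.1, 4.9, 9.7, 10.2, 5.3]
groups = agglomerative_clusters(observations, 3)
print(groups)  # [[0.8, 1.0, 1.2], [4.9, 5.1, 5.3], [9.7, 10.2]]
```

Stopping the merging at a different number of clusters corresponds to cutting the cluster tree (dendrogram) at a different level.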

Questions: How to derive a model just by using the data, i.e. without any a priori knowledge of the physical processes in the system? How to test the performance of such a model? How to correctly calibrate, validate and test empirical models?
Methods: Linear and non-linear empirical models; model calibration, optimization and validation
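
The sketch below shows the basic idea of calibrating an empirical model on one part of the data and checking it on held-out data. It assumes a simple linear model, synthetic data and a 70/30 random split; all names and numbers are illustrative only. A separate test set, kept aside until the very end, would be split off in the same way.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(x, y, slope, intercept):
    """Root-mean-square error of the fitted line on the given points."""
    errors = [(slope * a + intercept - b) ** 2 for a, b in zip(x, y)]
    return math.sqrt(sum(errors) / len(errors))


# Synthetic "observations": a known line plus Gaussian noise (seeded for repeatability).
rng = random.Random(0)
x = [i / 10.0 for i in range(100)]
y = [3.0 * a + 1.0 + rng.gauss(0.0, 0.5) for a in x]

# Randomly hold out 30% of the points; they are never used for fitting.
indices = list(range(len(x)))
rng.shuffle(indices)
calibration, validation = indices[:70], indices[70:]

x_cal = [x[i] for i in calibration]
y_cal = [y[i] for i in calibration]
x_val = [x[i] for i in validation]
y_val = [y[i] for i in validation]

# Calibrate on one subset, then evaluate on data the model has not seen.
slope, intercept = fit_line(x_cal, y_cal)
rmse_cal = rmse(x_cal, y_cal, slope, intercept)
rmse_val = rmse(x_val, y_val, slope, intercept)
print("slope, intercept:", slope, intercept)
print("calibration RMSE:", rmse_cal)
print("validation RMSE: ", rmse_val)
```

A validation error much larger than the calibration error is a common sign of overfitting, which becomes a central concern once more flexible non-linear models are used.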

The materials below are from school year 2018/19 (the new materials for 2020/21 will be modified and updated):

Course Outline
The course is taught through lectures and labs, supported by online material (notes and videos). The online material covers the theoretical development of each method. The lectures demonstrate in detail the application of each analysis technique to a variety of research projects from the geosciences. The emphasis is on a conceptual and practical understanding (as opposed to a purely theoretical understanding) of the methods, gained by demonstrating how they work in practice, i.e. when applied to real datasets. The instructor will introduce a dataset and research questions drawn from a given field (e.g. atmospheric science, volcanology, seismology, glaciology), walk the students through the application of a given method, present results and lead an interactive discussion on the successes and limitations of the application. Labs are designed as workshops where students will do programming (in MATLAB or Python) to solve a given set of data-oriented research problems. During the labs, students will work individually or in groups, while the instructor provides assistance and guidance. For each lab, a problem set description will be given that introduces a dataset and outlines a set of research questions.

Each week, students are expected to view/read the online material (ca. 1.5 hours per week) and come prepared to the weekly lecture (1.5 hours) on Tuesday. Labs on Thursdays consist of 'hands-on' computer exercises (1.5 hours) on the concepts covered during lectures. Students need to bring their own laptops to the labs.

• Week 1 
Thursday: Introduction to the course (presentation from the class)

• Week 2 
Online lectures/readings before the class (Tue): Chapter 1. Mean and variance, Correlation, Linear regression, Multiple linear regression, MATLAB programming (Ch1.pdf, Ch1_Q_solns.pdf - PDF file containing solutions to questions posed in the videos) 
Recommended: useful material (from EOSC250 course) if you need to brush up on calculus and linear algebra 
Lecture (Tue): Multiple linear regression and stepwise regression in MATLAB (example on synthetic and real data; Tutorial2.zip - class presentation, MATLAB scripts and data files)
Extra lab (Tue 2-5 pm): Intro to MATLAB (Lab_material.zip)
Lab (Thu): Multiple linear regression and stepwise regression (Lab2.zip)

• Week 3 
Online lectures/readings before the class (Tue): Chapter 2. Principal component analysis (PCA) and rotated PCA: Geometric approach, Eigenvector approach, Complex data (Ch2a.pdf, Ch2_Q_solns.pdf)
Lecture (Tue): Intro to PCA, PCA in MATLAB (examples on synthetic data); Tutorial3.zip
Extra lab & office hours (Tue 2-5 pm): Multiple linear regression and stepwise regression
Lab (Thu): PCA (Lab3.zip)

• Week 4 
Online lectures/readings before the class (Tue): Chapter 2. PCA applied on real data, Scaling; degeneracy, Smaller covariance matrix; mean removal, Singular value decomposition, Missing data; significance tests (Ch2b.pdf)
Lecture (Tue): PCA in MATLAB (example on real data); Tutorial4.zip - large file!
Extra lab & office hours (Tue 2-5 pm): PCA
Lab (Thu): PCA on real data (Lab4.zip)

• Week 5 
Online lectures/readings before the class (Tue): Chapter 2. Rotated PCA, Varimax; teleconnection patterns, PCA versus Rotated PCA, (Optional: PCA for vectors),
Chapter 3. Canonical correlation analysis (CCA), CCA theory (part 1), CCA theory (part 2), Pre-filter by PCA, Maximum covariance analysis (Ch2c.pdf, Ch3.pdf, Ch3_Q_solns.pdf)
Lecture (Tue): Rotated PCA and CCA in MATLAB (synthetic and real data); Tutorial5.zip
Extra lab & office hours (Tue 2-5 pm): PCA 
Lab (Thu): Rotated PCA and CCA (Lab5.zip)

• Week 6 
Online lectures/readings before the class (Tue): Chapter 4. Time series, Fourier spectral analysis: autospectrum, Autospectrum (part 1), Autospectrum (part 2), Cross-spectrum (Ch4a.pdf, Ch4_Q_solns.pdf)
Lecture (Tue): FSA on synthetic data in MATLAB; Tutorial6.zip
Extra lab & office hours (Tue 2-5 pm): Rotated PCA and CCA
Lab (Thu): FSA on real data (Lab6.zip)

• Week 7 
Online lectures/readings before the class (Tue): Chapter 4. Windows, Filters (part 1), Filters (part 2), Singular spectrum analysis, Multichannel singular spectrum analysis (Ch4b.pdf, Ch4c.pdf)
Lecture (Tue): Filtering and SSA in MATLAB; Tutorial7.zip
Extra lab & office hours (Tue 2-5 pm): FSA
Lab (Thu): Filtering and SSA (Lab7.zip)

• Week 8 
Online lectures/readings before the class (Tue): Chapter 5. Classification and clustering, Classification: k-nearest neighbour classifier, Conditional probabilities, Bayes' theorem, Logistic regression, Clustering: k-means clustering, Hierarchical clustering (Ch5a.pdf, Ch5_Q_solns.pdf, Ch5b.pdf)
Lecture (Tue): Clustering; Tutorial8.zip
Extra lab & office hours (Tue 2-5 pm): Filtering and SSA 
Lab (Thu): Classification and clustering (Lab8.zip)

• Week 9 
Online lectures/readings before the class (Tue): Chapter 5. Self-organizing maps, Chapter 6. Feed-forward neural network models: McCulloch and Pitts model, Perceptrons, Limitations of perceptrons (Ch5c.pdf, Ch6a.pdf, Ch6_Q_solns.pdf)
Lecture (Tue): Application of Self-organizing maps (SOMs); Tutorial9.zip, somtoolbox.zip
Extra lab & office hours (Tue 2-5 pm): Classification and clustering
Lab (Thu): SOMs (Lab9.zip)

• Week 10 
Online lectures/readings before the class (Tue): Chapter 6. Multi-layer perceptrons (MLP) - part 1, MLP - part 2, MLP - part 3, Back-propagation, Hidden neurons, MLP classifier (Ch6b.pdf)
Lecture (Tue): Non-linear empirical modelling (example on synthetic data); Tutorial10.zip
Extra lab & office hours (Tue 2-5 pm): SOMs
Lab (Thu): Non-linear empirical modelling (Lab10.zip)

• Week 11 
Online lectures/readings before the class (Tue): Chapter 7. Nonlinear optimization, Gradient descent methods, Chapter 8. Learning and generalization: Mean squared error and maximum likelihood, Objective functions and robustness, Variance and bias errors, Regularization (Ch7.pdf, Ch8a.pdf, Ch7_Q_solns.pdf, Ch8_Q_solns.pdf)
Lecture (Tue): Non-linear empirical modelling (example on real data); Tutorial11.zip
Extra lab & office hours (Tue 2-5 pm): Non-linear empirical modelling
Lab (Thu): Non-linear empirical modelling (Lab11.zip)

• Week 12 
Online lectures/readings before the class (Tue): Chapter 8. Cross-validation, Bayesian neural networks, Errors of ensembles, Nonlinear ensemble averaging; boosting, Linearization from time-averaging, Regularization of linear models (Ch8b.pdf)
Lecture (Tue): Recap of the course material + guest presentation (Recap.pdf)
Extra lab & office hours (Tue 2-5 pm): In-class work on students' projects 
Lab (Thu): In-class work on students' projects 

• Week 13 (1-4 Apr)
Presentations (Tue): Students' project presentations (part 1; grads)
Presentations (Tue 2-5 pm): Students' project presentations (part 2; undergrads and grads)
Presentations (Thu): Students' project presentations (part 3; grads)