Professor's Syllabus
Introduction to Data Science and Advanced Programming, HEC Lausanne, Fall Semester 2025
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
This advanced course introduces students to the Python programming language, core concepts of statistical learning, and high-performance computing. It is designed for Master’s students in Economics and Finance to build the computational and analytical skills necessary for modern quantitative analysis.
The course consists of three 45-minute lecture sessions and one 45-minute hands-on session each week.
It is offered at HEC Lausanne during the Fall Semester 2025 (Monday, September 15 - Monday, December 15, 2025).
Course Website
Course Objectives
By the end of this course, you should be able to:
- Write clean, efficient, and well-documented Python code.
- Manipulate and analyze data using NumPy and Pandas.
- Create insightful visualizations with Matplotlib and Seaborn.
- Understand the fundamental theory of statistical learning, including the bias–variance trade-off and model assessment.
- Implement and evaluate machine learning models for regression, classification, and clustering using scikit-learn.
- Use tree-based methods and ensemble learning.
- Gain awareness of deep learning concepts and implement a simple neural network.
- Apply basic high-performance computing (HPC) techniques to accelerate Python code.
- Independently manage and execute a data science project from conception to presentation.
Meeting time and location
- Time: Mondays, 12:30–16:00
- Place: Internef 263
TA sessions: Weekly on Mondays (15:15–16:00) with Anna Smirnova, Francesco Brunamonti, and Zhongshan Chen. Individual TA sessions are available on Fridays upon request.
Class enrollment on the Nuvolos Cloud
- All lecture materials (slides, code, and further readings) will be distributed via the Nuvolos Cloud.
- To enroll in this class, please click on this enrollment key and follow the steps.
Video to get started with Nuvolos
- First steps on Nuvolos:
Approximate Schedule
Witness the incredible transformation of a programmer throughout the course, from humble beginnings to a master of the craft!
Part I: Stone Age Programmer | Part II: Industrial Data Era | Part III: Future Master Programmer |
Part I: Python Foundations (Weeks 1–6)
Week 1 (Sep 15): Course Overview & Setup
- Lecture slides, week 1
- Topics:
- Introduction to the course
- Structure, grading, and capstone project
- Introduction to Nuvolos cloud computing platform
- Unix/Linux basics
Week 2 (Sep 22): No Class
- Swiss Federal Fast (Public Holiday)
Week 3 (Sep 29): Python Fundamentals I
- Lecture slides, week 3
- Topics:
- Python basics (variables, types)
- Control flow (loops, branching)
- String Manipulation
- Productivity: Git version control, programming style (if time permits)
Week 4 (Oct 6): Python Fundamentals II
- Lecture slides, week 4
- Topics:
- Functions
- Basic data structures (lists, tuples, dictionaries)
- Recursion
- Jupyter Notebooks (if time permits)
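The Week 4 topics (functions, basic data structures, recursion) can be illustrated with a minimal sketch. The function name `fib` and the choice of the Fibonacci sequence are illustrative assumptions, not taken from the lecture slides.

```python
from typing import Dict, Optional


def fib(n: int, memo: Optional[Dict[int, int]] = None) -> int:
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1) recursively.

    Hypothetical example: a dictionary caches results so that each
    subproblem is solved only once.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if memo is None:
        memo = {}  # dictionary: maps n to an already computed fib(n)
    if n < 2:  # base cases stop the recursion
        return n
    if n not in memo:  # recursive case: combine two smaller subproblems
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]


# A list built with a comprehension that calls the function repeatedly.
first_ten = [fib(i) for i in range(10)]
```

Without the dictionary, the same recursion would recompute identical subproblems many times, which is one simple way to see why the choice of data structure affects run time.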
Week 5 (Oct 13): Special Session: Generative AI
- Lecture slides, week 5
- Topics:
- Hands-on: Large Language Models & Autonomous Agents (guest lecture by Anna Smirnova)
Week 6 (Oct 20): Python Fundamentals III
- Lecture slides, week 6
- Topics:
- Selected Topics on Object-Oriented Programming
- Selected Topics on Python Classes and Inheritance
- Basics of Program Efficiency
- A Preview of Libraries (take-home materials)
- Productivity: Basics of Testing and Debugging (take-home materials)
- Productivity: Basics of Testing and Debugging – Notebook (take-home materials)
Part II: Basics of Data Science (Weeks 7–11)
Week 7 (Oct 27): Linear Regression
- Lecture slides, week 7
- Topics:
- Supervised Learning - the general idea
- Linear Regression (with multiple variables)
- Gradient Descent
- Polynomial Regression
- Tuning Model Complexity
- Stock Market Prediction (if time permits)
- Introduction to Pandas (quick tour; self-study)
- Further Reading: ISL Ch. 3, 5, 6.
- Further Reading: PML Ch. 6.3–6.5 (Bayesian linear regression, uncertainty, model comparison), Ch. 7.1–7.3 (Overfitting, generalization, cross-validation), PML Ch. 6.6 (Regularization as priors: ridge ↔ Gaussian, Lasso ↔ Laplace)
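As a rough illustration of how linear regression and gradient descent from Week 7 fit together, the following sketch fits a line by batch gradient descent on the mean squared error. The function name `fit_line_gd`, the synthetic data, the learning rate, and the iteration count are all assumptions chosen for this example; the lecture may use a different formulation (for instance, NumPy or scikit-learn).

```python
from typing import Sequence, Tuple


def fit_line_gd(
    xs: Sequence[float],
    ys: Sequence[float],
    lr: float = 0.05,      # assumed learning rate, small enough for this data
    n_iter: int = 5000,    # assumed number of gradient steps
) -> Tuple[float, float]:
    """Fit y ≈ w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Residuals of the current model on all training points.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(residual^2) with respect to w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic, noise-free data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_line_gd(xs, ys)
```

On this noise-free data the estimates approach the generating slope and intercept; a learning rate that is too large would make the iteration diverge instead.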
Week 8 (Nov 3): Classification
- Lecture slides, week 8
- Topics:
- Supervised Learning: Classification
- k-Nearest-Neighbours
- How to evaluate Classifiers
- Naive Bayes
- Decision Trees
- Combining Models (Boosting, Bagging – if time permits)
- Further Reading: ISL Ch. 4, Ch. 8
- Further Reading: PML Ch. 8.1–8.4 (Logistic regression, generative vs discriminative classifiers), PML Ch. 8.5 (Bayesian logistic regression, optional)
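The k-nearest-neighbours idea from Week 8 can be sketched in a few lines: a query point receives the majority label among its k closest training points. The function name `knn_predict`, the toy data, and the labels below are hypothetical; in the course, scikit-learn's classifiers would more likely be used.

```python
from collections import Counter
from math import dist
from typing import Hashable, Sequence


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data with two well-separated groups (illustrative only).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["low", "low", "low", "high", "high", "high"]

label_a = knn_predict(train_X, train_y, (1.1, 1.0))
label_b = knn_predict(train_X, train_y, (4.0, 4.0))
```

An odd k avoids ties in a two-class problem; how to choose k in general is a model-assessment question (for example, via cross-validation).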
Week 9 (Nov 10): Unsupervised Machine Learning
- Lecture slides, week 9
- Topics:
- k-Means
- Gaussian Mixture Models
- Expectation Maximization
- Principal Component Analysis
- Hierarchical Clustering
- Density-based Clustering
- Further Reading: ISL Ch. 12 (Unsupervised Learning in the 2nd edition; Ch. 10 in the 1st edition)
- Further Reading: PML Ch. 10.1–10.4 (PCA as latent factor model), PML Ch. 11.1–11.3 (Clustering, mixture models, EM algorithm)
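For the k-means topic in Week 9, a minimal sketch of the standard two-step iteration (often called Lloyd's algorithm) may help: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The function name `kmeans`, the explicit initial centroids, and the toy points are assumptions made for this example; scikit-learn's `KMeans` adds smarter initialization and other refinements.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point],
    centroids: Sequence[Point],   # initial centroids, supplied by the caller
    n_iter: int = 10,             # assumed cap on the number of iterations
) -> Tuple[List[Point], List[int]]:
    """Run the assignment/update iteration and return (centroids, labels)."""
    current = [tuple(c) for c in centroids]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(current)), key=lambda j: dist(p, current[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j, old in enumerate(current):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == current:       # no centroid moved: converged
            break
        current = updated
    return current, labels


# Two well-separated toy clusters (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

The result depends on the initial centroids, which is why practical implementations usually run several random restarts.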
Week 10 (Nov 17): Deep Learning Primer
- Lecture slides, week 10
- Topics:
- Deep learning basics
- Multi-layer perceptron
- Feed-forward networks
- Network training - SGD
- Error back-propagation
- Some notes on overfitting
- Introduction to TensorFlow, applied to supervised machine learning problems
- Further Reading: ISL Ch. 10
- Further Reading: PML Ch. 16 (Neural networks), PML Ch. 17 (Deep learning, optimization & generalization)
Week 11 (Nov 24): Best Practices in Data Science
- Lecture slides, week 11
Part III: Advanced Programming & Wrap-Up (Weeks 12–14)
Week 12 (Dec 1): Introduction to High-Performance Computing
- Lecture slides, week 12
- Topics:
- Concepts of shared memory parallelization
- Concepts of distributed memory parallelization
- Hybrid parallelization
Week 13 (Dec 8): High-Performance Computing with Python
- Lecture slides, week 13: concepts of accelerating code in practice and shared-memory parallelization (slides 1–12 of this lecture).
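The shared-memory pattern mentioned for Week 13 can be sketched with Python's standard library: split the work into chunks, let several threads (which share one address space) process them, and combine the partial results. The function names and chunking scheme below are assumptions for illustration. Note that in CPython the global interpreter lock usually prevents pure-Python, CPU-bound code like this from running faster with threads, so the sketch shows the decomposition pattern rather than an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partial_sum_of_squares(chunk: Sequence[int]) -> int:
    """Work done by one thread: sum of squares over its chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: List[int], n_workers: int = 4) -> int:
    """Split `data` into roughly equal chunks and reduce the partial sums."""
    # Ceiling division so that at most n_workers chunks are created.
    chunk_size = max(1, (len(data) + n_workers - 1) // n_workers)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Threads share the same memory, so chunks are passed without copying
    # between processes; pool.map preserves the order of the chunks.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


data = list(range(1_000))
total = parallel_sum_of_squares(data)
```

For real speedups on numerical workloads, vectorized libraries such as NumPy or process-based parallelism are the more common choices in Python.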
Week 14 (Dec 15): Capstone Project Presentations
- Topics:
- Students voluntarily present their projects
- Wrap-up and course summary
The Programmer’s Journey
Grading
- Every student has to provide a capstone project that illustrates what was learned.
- Each student must individually propose a data science project and work on it over the course of the semester.
- The due date to submit the project is in the last week of the semester.
- The deliverables are:
- i) a report of about 10 pages.
- ii) a GitHub repository with the related code and data.
- iii) a video recording of at most 10 minutes that presents the project and its findings.
- We will award the grades based on whether the capstone project demonstrates an understanding of the material. There will be no exams.
- There will be opportunities to earn “bonus points” through homework assignments.
Lecturer
- Simon Scheidegger (University of Lausanne, Department of Economics)
- Simon Scheidegger: simon.scheidegger@unil.ch
TAs and support
- Anna Smirnova anna.smirnova@unil.ch (TA lead)
- Francesco Brunamonti francesco.brunamonti@unil.ch
- Zhongshan Chen zhongshan.chen@unil.ch
- Nuvolos Support: support@nuvolos.cloud
Google document for the QA sessions:
References
- Guttag, Introduction to Computation and Programming Using Python, MIT Press
- Langtangen, A Primer on Scientific Programming with Python, Springer
- Goodfellow, Bengio, Courville, Deep Learning, MIT Press
- Murphy, Probabilistic Machine Learning: An Introduction, MIT Press
- James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, 2nd Edition – statlearning.com
- QuantEcon
Auxiliary materials
Session # | Title | Screencast |
---|---|---|
1 | Git intro | <iframe src="https://player.vimeo.com/video/516690761" width="640" height="400" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen></iframe> |
1 | Terminal intro | <iframe src="https://player.vimeo.com/video/516691661" width="640" height="400" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen></iframe> |