Introduction to Data Science and Advanced Programming, HEC Lausanne, Fall Semester 2025

Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.

This advanced course introduces students to the Python programming language, core concepts of statistical learning, and high-performance computing. It is designed for Master’s students in Economics and Finance to build the computational and analytical skills necessary for modern quantitative analysis.

The course consists of three 45-minute lecture sessions and one 45-minute hands-on session each week.

It is offered at HEC Lausanne during the Fall Semester 2025 (Monday, September 15 - Monday, December 15, 2025).


Course Website


Course Objectives

By the end of this course, you should be able to:

  • Write clean, efficient, and well-documented Python code.
  • Manipulate and analyze data using NumPy and Pandas.
  • Create insightful visualizations with Matplotlib and Seaborn.
  • Understand the fundamental theory of statistical learning, including the bias–variance trade-off and model assessment.
  • Implement and evaluate machine learning models for regression, classification, and clustering using scikit-learn.
  • Use tree-based methods and ensemble learning.
  • Gain awareness of deep learning concepts and implement a simple neural network.
  • Apply basic high-performance computing (HPC) techniques to accelerate Python code.
  • Independently manage and execute a data science project from conception to presentation.

Meeting time and location

  • Time: Mondays, 12:30–16:00
  • Place: Internef 263

TA sessions: Weekly on Mondays (15:15–16:00) with Anna Smirnova, Francesco Brunamonti, and Zhongshan Chen. Individual TA sessions are available on Fridays upon request.


Class enrollment on the Nuvolos Cloud

  • All lecture materials (slides, code, and further readings) will be distributed via the Nuvolos Cloud.
  • To enroll in this class, please click on this enrollment key and follow the steps.

Video to get started with Nuvolos

  • First steps on Nuvolos:

Approximate Schedule

Witness the incredible transformation of a programmer throughout the course, from humble beginnings to a master of the craft!

Part I: Stone Age Programmer

Part II: Industrial Data Era

Part III: Future Master Programmer

Part I: Python Foundations (Weeks 1–6)

Week 1 (Sep 15): Course Overview & Setup

Week 2 (Sep 22): No Class

  • Federal Fast Monday (public holiday)

Week 3 (Sep 29): Python Fundamentals I

Week 4 (Oct 6): Python Fundamentals II

Week 5 (Oct 13): Special Session: Generative AI

  • Lecture slides, week 5
    • Topics:
      • Hands-on: Large Language Models & Autonomous Agents (guest lecture by Anna Smirnova)

Week 6 (Oct 20): Python Fundamentals III


Part II: Basics of Data Science (Weeks 7–11)

Week 7 (Oct 27): Linear Regression

Week 8 (Nov 3): Classification

  • Lecture slides, week 8
    • Topics:
      • Supervised Learning: Classification
      • k-Nearest Neighbours
      • How to Evaluate Classifiers
      • Naive Bayes
      • Decision Trees
      • Combining Models (Boosting, Bagging – if time permits)
      • Further Reading: ISL Ch. 4, Ch. 8
      • Further Reading: PML Ch. 8.1–8.4 (Logistic regression, generative vs discriminative classifiers), PML Ch. 8.5 (Bayesian logistic regression, optional)

Week 9 (Nov 10): Unsupervised Machine Learning

  • Lecture slides, week 9
    • Topics:
      • k-Means
      • Gaussian Mixture Models
      • Expectation–Maximization
      • Principal Component Analysis
      • Hierarchical Clustering
      • Density-based Clustering
      • Further Reading: ISL Ch. 10
      • Further Reading: PML Ch. 10.1–10.4 (PCA as latent factor model), PML Ch. 11.1–11.3 (Clustering, mixture models, EM algorithm)

Week 10 (Nov 17): Deep Learning Primer

Week 11 (Nov 24): Best Practices in Data Science


Part III: Advanced Programming & Wrap-Up (Weeks 12–14)

Week 12 (Dec 1): Introduction to High-Performance Computing

  • Lecture slides, week 12
    • Topics:
      • Concepts of shared memory parallelization
      • Concepts of distributed memory parallelization
      • Hybrid parallelization

Week 13 (Dec 8): High-Performance Computing with Python

Week 14 (Dec 15): Capstone Project Presentations

  • Topics:
    • Students voluntarily present their projects
    • Wrap-up and course summary

The Programmer’s Journey


Grading

  • Every student has to provide a capstone project that illustrates what was learned.
  • Each student individually has to propose a data science project and work on it over the course of the semester.
  • The due date to submit the project is in the last week of the semester.
  • The deliverables are:
    • i) a report of about 10 pages in length.
    • ii) a GitHub repository with the related code and data.
    • iii) a video recording of at most 10 minutes that presents the project, its findings, etc.
  • We will award the grades based on whether the capstone project demonstrates an understanding of the material. There will be no exams.
  • There will be opportunities to collect "bonus points" via homework assignments.

Lecturer


TAs and support


Google document for the Q&A sessions:


References


Auxiliary materials

Session # | Title | Screencast
1 | Git intro | https://player.vimeo.com/video/516690761
1 | Terminal intro | https://player.vimeo.com/video/516691661