Bad: One giant cell with everything

```python
# DON'T DO THIS
import pandas as pd
import numpy as np
# ... 200 lines of code ...
```
Good: Logical, sequential cells

```python
# Cell 1: Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

```python
# Cell 2: Load data
df = pd.read_csv('data/sales.csv')
df.head()
```
```python
# Cell 3: Data cleaning
null_count = df.isna().sum().sum()  # count missing values before dropping them
df = df.dropna()
df['date'] = pd.to_datetime(df['date'])
```
1. One logical step per cell
2. Display results
```python
# Show what you did
print(f"Removed {null_count} null values")
print(f"Final dataset: {len(df)} rows")
```
3. Add markdown between code cells
```markdown
## Data Cleaning
We need to handle missing values and convert dates.
```
4. Use meaningful variable names
```python
# Good
revenue_by_region = df.groupby('region')['revenue'].sum()

# Bad
x = df.groupby('a')['b'].sum()
```
Always add context:
```python
# Create visualization
plt.figure(figsize=(12, 6))
plt.plot(df['date'], df['sales'])
plt.title('Daily Sales - Q4 2025', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.grid(True, alpha=0.3)
plt.show()
```
**Figure 1**: Daily sales show clear weekly patterns with peaks on Fridays and dips on Sundays. Note the spike on Black Friday (Nov 28).
Explain what the reader should notice!
A report structure that works well:

```markdown
# [Project Title]

**Author**: Your Name
**Date**: YYYY-MM-DD
**Last Updated**: YYYY-MM-DD

## 1. Executive Summary
[Brief overview of findings]

## 2. Business Question
[What are we trying to answer?]

## 3. Data Description
[Where the data came from, time period, etc.]

## 4. Methodology
[What analysis techniques did you use?]

## 5. Analysis
[The actual code and results]

## 6. Findings
[Key takeaways]

## 7. Recommendations
[What should be done based on the findings?]

## 8. Limitations & Next Steps
[What couldn't be answered? What's next?]
```
Export your notebook with `jupyter nbconvert`:

1. HTML (most common):
```bash
jupyter nbconvert --to html notebook.ipynb
```
2. PDF (requires LaTeX):
```bash
jupyter nbconvert --to pdf notebook.ipynb
```
3. Markdown:
```bash
jupyter nbconvert --to markdown notebook.ipynb
```
4. Python script:
```bash
jupyter nbconvert --to python notebook.ipynb
```
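You can also control the output name and location: `--output` sets the file name and `--output-dir` the destination folder. A quick sketch, with `reports/` and `sales_report` as assumed names:

```bash
# Export to reports/sales_report.html instead of the notebook's own folder
jupyter nbconvert --to html notebook.ipynb \
  --output sales_report \
  --output-dir reports/
```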
Remove code cells (show only results):
```bash
jupyter nbconvert --to html notebook.ipynb \
  --no-input
```
Remove output (show only code):
```bash
jupyter nbconvert --to html notebook.ipynb \
  --no-output
```
Hide specific cells: tag them in Jupyter (e.g. with a `remove_cell` tag), then drop the tagged cells at export time (depending on your nbconvert version you may also need `--TagRemovePreprocessor.enabled=True`):
```bash
jupyter nbconvert --to html notebook.ipynb \
  --TagRemovePreprocessor.remove_cell_tags='{"remove_cell"}'
```

Best format for different audiences:
| Audience | Format | Why |
|---|---|---|
| Non-technical | HTML (no code) | Easy to view, looks professional |
| Technical | HTML (with code) | Can see methodology |
| Collaborators | .ipynb file | Can run and modify |
| Publication | PDF | Print-ready, formal |
| Web | HTML + GitHub Pages | Publicly accessible |
Problem: "It works on my machine"
Solution: Document environment
1. List dependencies:
```bash
# Create requirements file
pip freeze > requirements.txt

# Or with uv
uv pip freeze > requirements.txt
```
2. Create a README covering installation, data, and how to run the analysis:
````markdown
## Installation
```bash
uv pip install -r requirements.txt
```

## Data
Download from: [URL]
Place in: `data/`

## Usage
```bash
jupyter notebook analysis.ipynb
```
````
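Beyond `requirements.txt`, it also helps to record key versions inside the notebook itself so they show up in the exported report. A minimal sketch of such a cell (print whichever libraries your analysis actually uses):

```python
# Record the environment in a visible cell near the top of the notebook
import sys
import pandas as pd
import numpy as np
import matplotlib

print(f"Python:     {sys.version.split()[0]}")
print(f"pandas:     {pd.__version__}")
print(f"numpy:      {np.__version__}")
print(f"matplotlib: {matplotlib.__version__}")
```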
---
# Reproducibility Checklist
- [ ] All dependencies listed in `requirements.txt`
- [ ] Data sources documented (with download links if possible)
- [ ] Random seeds set (`np.random.seed(42)`; see the sketch after this checklist)
- [ ] File paths relative, not absolute
- [ ] Clear instructions in README
- [ ] Output cleared before committing to git
- [ ] Notebook runs top-to-bottom without errors
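For the random-seed item, a minimal setup cell placed near the top of the notebook might look like this (extend it with any other libraries you use that carry their own random state):

```python
# Set seeds once so sampling and shuffling give the same result on every run
import random
import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
```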
---
# Git + Jupyter Best Practices
**Problem**: Jupyter notebooks store metadata and cell outputs that change on every run, producing noisy diffs
**Solution**: Clear outputs before committing
```bash
# Clear all outputs
jupyter nbconvert --clear-output --inplace notebook.ipynb

# Or use nbstripout
pip install nbstripout
nbstripout notebook.ipynb
```

Or: configure git to strip outputs automatically

```bash
nbstripout --install
```
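Putting the pieces together, a typical commit might look like this (a minimal sketch; the file names are placeholders):

```bash
# Strip outputs, then stage and commit the clean notebook with its supporting files
jupyter nbconvert --clear-output --inplace analysis.ipynb
git add analysis.ipynb requirements.txt README.md
git commit -m "Add sales analysis notebook (outputs cleared)"
```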
You have sales data for the past year. Create a professional analysis notebook.

Data: `sales_2025.csv` (provided)

Your report should include the sections sketched in the starter below.

Requirements: follow the best practices covered above (clear structure, markdown narration, interpreted figures, reproducible setup).
```markdown
# Annual Sales Analysis 2025

## Executive Summary
[Your findings in 2-3 sentences]

## Data Overview
```

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv('data/sales_2025.csv')
df.head()
```

```markdown
The dataset contains {len(df)} transactions from...

## Sales by Category
```

[Continue building out the analysis...]
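As one possible next step, here is a sketch of the "Sales by Category" cell; the `category` and `revenue` column names are assumptions about `sales_2025.csv`, so adjust them to your actual columns:

```python
# Hypothetical example: total revenue per product category
# ('category' and 'revenue' are assumed column names)
revenue_by_category = (
    df.groupby('category')['revenue']
      .sum()
      .sort_values(ascending=False)
)

plt.figure(figsize=(10, 5))
revenue_by_category.plot(kind='bar')
plt.title('Revenue by Category - 2025')
plt.xlabel('Category')
plt.ylabel('Revenue ($)')
plt.tight_layout()
plt.show()
```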
Bad interpretation:
"The graph shows sales over time."
Good interpretation:
"Sales increased steadily through Q1-Q3, peaking at $2.1M in
September before declining 15% in Q4. The Q4 decline is
primarily driven by reduced enterprise sales, which dropped
23% compared to Q3."
What makes it good: specific numbers ($2.1M, 15%, 23%), a clear trend, and an explanation of what drove the change.

Common mistakes to avoid:
1. No context: Jumping straight into code
2. Too much code: Showing every exploratory step
3. No interpretation: Figures without explanation
4. Unclear flow: Random order of analyses
5. No conclusion: Analysis without recommendations
6. Assuming knowledge: Not defining terms/metrics
Remember: Your notebook is a story, not just code!
Save time by creating reusable templates:
```python
# In your templates/ folder
# data_analysis_template.ipynb
"""
Contains:
- Standard imports
- Data loading section
- Exploratory analysis section
- Visualization section
- Results section
- Conclusion section
"""
```
Create your own templates for the analyses you repeat most often.

Key takeaways:
- Documentation matters: markdown narration and figure captions turn code into a report
- Jupyter notebooks as reports: follow a clear structure, from executive summary to recommendations
- Sharing & reproducibility: export with nbconvert, list dependencies, and clear outputs before committing
Create a Complete Data Analysis Report
Choose a dataset (or use one provided) and create a professional analysis:
Requirements: apply the best practices from this section (clear structure, markdown narration, interpreted figures, reproducible environment).
Deliverables:
- `.ipynb` file (original notebook)
- `.html` file (exported report)
- `requirements.txt`
- `README.md`

Next week: Final project workshop - putting it all together!