---
title: "AIO2025M05 Vectorized Linear Regression Demo"
emoji: "📊"
colorFrom: "indigo"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.38.0"
app_file: "app.py"
pinned: false
license: "mit"
---

# AIO2025 Module 05 - Vectorized Linear Regression Demo

This interactive demo showcases Linear Regression implemented from scratch using numpy and gradient descent. Learn how linear regression works with pure matrix operations (vectorization) and visualize the training process step by step.

## 📊 Features

### Core Functionality
- **Dual Implementation**: Compare simple (Python loops) vs vectorized (NumPy) linear regression
- **Performance Comparison**: Real-time comparison showing speedup from vectorization
- **Vectorized Operations**: Efficient matrix operations for fast training
- **Gradient Descent**: Manual implementation of gradient descent optimization
- **Real-time Visualization**: Watch the loss decrease as the model learns
- **Interactive Parameters**: Adjust learning rate and epochs in real-time
- **Multiple Datasets**: Built-in regression datasets for experimentation

### Linear Regression Components
- **Epochs**: Number of training iterations (1-1000)
- **Learning Rate**: Step size for gradient descent
  - Slider with discrete values (powers of 10): 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1
  - Slider shows power: 0→1e-6, 1→1e-5, 2→1e-4, 3→1e-3, 4→1e-2, 5→1e-1, 6→1
  - Current value displayed in real-time
- **Batch Size**: Dynamic slider for mini-batch size selection
  - Powers of 2 from 1 (2^0) up to training set size
  - Automatically adjusts maximum based on dataset size
  - Includes "Full Batch" option at maximum
  - Both simple and vectorized versions use the same batch size
- **Train/Val Split**: Configurable data split ratio
- **Bias Term**: Automatic bias addition to feature matrix
- **Overflow Protection**: Both implementations include safety checks for numerical stability

### Visualizations
- **Loss Curves**: Separate charts for training and validation loss over epochs (Vectorized)
- **Performance Table**: Side-by-side comparison of simple vs vectorized implementations
- **Training Time**: Real-time measurement of training duration for both methods
- **Speedup Metric**: Shows how much faster vectorization is compared to simple loops
- **Training Details**: Displays learned parameters (θ) and model performance
- **Prediction Results**: Shows prediction for new input data
- **Split Information**: Training and validation set statistics

## 🚀 Quick Start

1. **Select Data**: Choose from sample datasets or upload your own CSV/Excel files
2. **Configure Target**: Select the target column for regression
3. **Set Parameters**: Adjust epochs and learning rate
4. **Enter Features**: Provide values for the new data point
5. **Run Training**: Execute gradient descent and view results

## 📊 Sample Datasets

### Regression Datasets
- **Diabetes**: Medical regression dataset (default)
- **California Housing**: Housing price prediction dataset

## 🛠️ Technical Details

### Dependencies
- `numpy`: Core numerical operations and matrix computations
- `pandas`: Data manipulation
- `plotly`: Interactive visualizations
- `gradio`: Web interface
- `scikit-learn`: Dataset loading only

### Algorithm Implementation

#### 1. Data Preprocessing & Normalization
```python
# Normalize features (standardization)
X_mean = np.mean(X_train, axis=0)
X_std = np.std(X_train, axis=0)
X_norm = (X - X_mean) / X_std

# Normalize labels
y_mean = np.mean(y_train)
y_std = np.std(y_train)
y_norm = (y - y_mean) / y_std

# Add bias term to normalized features
X_bias = np.c_[np.ones(X_norm.shape[0]), X_norm]
```

#### 2. Prediction Function
```python
def predict(X, theta):
    return X.dot(theta)  # ŷ = X @ θ
```

#### 3. Loss Function
```python
def compute_loss(y_hat, y):
    return np.mean((y_hat - y) ** 2)  # MSE
```

#### 4. Gradient Computation
```python
def compute_gradient(y_hat, y, X):
    N = len(y)
    return 2 * X.T.dot(y_hat - y) / N  # ∇L = 2X^T(ŷ-y)/N
```

#### 5. Parameter Update
```python
def update_theta(theta, gradient, lr):
    return theta - lr * gradient  # θ = θ - lr * ∇L
```

#### 6. Denormalization
```python
# Denormalize predictions back to original scale
y_pred = y_pred_norm * y_std + y_mean
```

### Architecture
- **Dual Implementation**: 
  - `src/simple_linear_regression.py`: Plain Python with loops (educational)
  - `src/vectorized_linear_regression.py`: NumPy vectorized (production-ready)
- **Modular Design**: Separated core logic for easy understanding
- **Vectorized Operations**: NumPy matrix operations for efficiency
- **Performance Benchmarking**: Automatic timing and comparison
- **Dynamic UI**: Automatic parameter and input generation
- **Error Handling**: Comprehensive validation and error messages

## 💡 Key Concepts

### Linear Regression Benefits
- **Simple & Interpretable**: Easy to understand and explain
- **Fast Training**: Vectorized operations make training efficient
- **Baseline Model**: Good starting point for regression tasks
- **Mathematical Foundation**: Clear mathematical derivation

### Gradient Descent Variants
- **Stochastic Gradient Descent (SGD)**: Updates after each sample (batch_size=1)
  - Simple LR uses this approach
  - Fast updates but noisy gradients
- **Mini-Batch Gradient Descent**: Updates after processing small batches
  - Vectorized LR supports configurable batch sizes
  - Balance between speed and stability
- **Batch Gradient Descent**: Updates after processing all data (Full Batch)
  - Most stable but slower per epoch
- **Learning Rate**: Controls convergence speed and stability
- **Loss Monitoring**: Track MSE to ensure convergence

### Simple vs Vectorized Implementation

#### Simple Linear Regression (Python Loops with Mini-Batch)
```python
# Mini-batch training with Python loops
for batch_start in range(0, n_samples, batch_size):
    # Accumulate gradients for the batch
    dw_batch, db_batch = [0.0] * n_features, 0.0
    
    for i in range(batch_start, batch_end):
        y_hat = predict(X_train[i], weights, bias)
        loss = compute_loss(y_hat, y_train[i])
        dw, db = compute_gradient(y_hat, y_train[i], X_train[i])
        # Accumulate gradients
        dw_batch = [dw_batch[j] + dw[j] for j in range(n_features)]
        db_batch += db
    
    # Average gradients and update
    dw_batch = [dw / batch_size for dw in dw_batch]
    db_batch /= batch_size
    weights, bias = update_parameters(weights, bias, dw_batch, db_batch, lr)
```

**Characteristics:**
- ✅ Easy to understand for beginners
- ✅ Clear step-by-step logic
- ✅ Includes normalization (same as vectorized)
- ✅ Supports mini-batch gradient descent (same batch size as vectorized)
- ❌ Slow for large datasets
- ❌ Uses Python loops (no vectorization)
- ⚠️ May overflow with high learning rates (protected with safety checks)

#### Vectorized Linear Regression (NumPy with Mini-Batch)
```python
# Process mini-batches
for i in range(0, n_samples, batch_size):
    X_batch = X_train[i:i+batch_size]
    y_batch = y_train[i:i+batch_size]
    
    y_hat = X_batch @ theta
    loss = np.mean((y_hat - y_batch) ** 2)
    gradient = 2 * X_batch.T @ (y_hat - y_batch) / batch_size
    theta = theta - lr * gradient
```

**Characteristics:**
- ✅ **10-100× faster** than simple loops
- ✅ Supports mini-batch gradient descent (configurable batch size)
- ✅ Leverages optimized C/Fortran libraries
- ✅ Handles millions of samples efficiently
- ✅ Production-ready implementation
- ✅ Better numerical stability (less prone to overflow)

### Vectorization Advantages
- **Performance**: Matrix operations are highly optimized in NumPy
- **Clarity**: Clean mathematical notation matching equations
- **Scalability**: Handles large datasets efficiently
- **No Loops**: Eliminates slow Python loops
- **Parallel Processing**: Can utilize multiple CPU cores

### Normalization Benefits
- **Numerical Stability**: Prevents numerical overflow/underflow
- **Faster Convergence**: Features on same scale speed up gradient descent
- **Reasonable Loss Values**: MSE values are easier to interpret
- **Better Learning Rates**: Can use higher learning rates without instability
- **Automatic Denormalization**: Predictions returned in original scale

## 🔧 Customization

### Adding New Datasets
1. Place CSV files in the `data/` directory
2. Update `SAMPLE_DATA_CONFIG` in `app.py`
3. Ensure target column is numeric

### Modifying Parameters
- Edit parameter ranges in the UI components
- Adjust default values for different use cases
- Add regularization if needed (L1/L2)

## 📈 Training Tips

- **Learning Rate**: Use the slider to select from discrete values (powers of 10)
  - Slider position: 0→1e-6, 1→1e-5, 2→1e-4, 3→1e-3, 4→1e-2, 5→1e-1, 6→1
  - Too high (>0.1): May cause overflow or instability
  - Too low (<0.0001): Training will be very slow
  - Recommended: Slider position 3-4 (0.001 - 0.01) for most cases
  - Current value displayed in real-time
- **Batch Size**: Use the slider to select batch size (powers of 2)
  - Slider shows power: 0→1, 1→2, 2→4, 3→8, etc.
  - Maximum adjusts based on training set size
  - Small batches (1-8): Faster updates, noisier gradients, may escape local minima
  - Medium batches (16-64): Good balance between speed and stability
  - Large batches/Full Batch: More stable gradients, slower convergence
- **Epochs**: 100-500 epochs usually sufficient for convergence
  - With smaller batches, you may need more epochs
  - With full batch, fewer epochs may suffice
- **Overfitting**: Monitor validation loss vs training loss
- **Normalization**: Features and labels are automatically normalized for optimal convergence
- **Loss Values**: Loss is computed on normalized scale, so values will be small (typically < 1)
- **Overflow Protection**: Simple LR includes safety checks; if it overflows, reduce learning rate

## 🎯 Use Cases

### Regression Tasks
- Price prediction (housing, stocks)
- Trend analysis
- Demand forecasting
- Scientific modeling
- Risk assessment

## 📝 Implementation Notes

- **Dual Implementation**: Both simple (loops) and vectorized (NumPy) versions included
- **Performance Comparison**: Real-time benchmarking shows vectorization benefits
- **Same Training Configuration**: Both versions use identical hyperparameters
  - Same batch size (mini-batch gradient descent support for both)
  - Same learning rate
  - Same epochs
  - Fair comparison showing pure vectorization speedup
- **Pure NumPy**: No sklearn models used (only for data loading)
- **From Scratch**: All core functions implemented manually
- **Educational Focus**: Designed for learning linear regression internals
  - Simple version: Easy to understand, step-by-step logic with plain Python normalization and mini-batch support
  - Vectorized version: Production-ready, optimized performance with NumPy normalization
- **Normalized Training**: Both implementations use feature & label standardization for numerical stability
- **Automatic Denormalization**: Predictions returned in original scale (both versions)
- **Real-time Updates**: Instant parameter adjustment and visualization

## 🔗 Related Resources

- [Linear Regression Theory](https://en.wikipedia.org/wiki/Linear_regression)
- [Gradient Descent Explained](https://en.wikipedia.org/wiki/Gradient_descent)
- [NumPy Documentation](https://numpy.org/doc/)
- [Matrix Calculus](https://en.wikipedia.org/wiki/Matrix_calculus)

## 📐 Mathematical Formulation

**Model**: ŷ = X @ θ where θ = [θ₀, θ₁, ..., θₙ]

**Loss**: L = (1/N) Σ(ŷᵢ - yᵢ)²

**Gradient**: ∇L = (2/N) X^T (ŷ - y)

**Update**: θ ← θ - α∇L (α is learning rate)

---

*This demo is part of AIO2025 Module 05 - Vectorized Linear Regression*