# Progressive Enhancement: From Script to Pipeline
This guide shows how to smoothly transition from experimental scripts to production-ready pipelines.
## The Journey: 5 Levels of Enhancement
### Level 0: Raw Script (Start Here!)
You're a researcher with an idea. Start with a normal Python script:
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset  # MyDataset assumed to live alongside MyModel

# Simple training script
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=32)

print("Starting training...")
for epoch in range(10):
    print(f"Epoch {epoch+1}/10")
    for i, batch in enumerate(dataloader):
        loss = train_step(model, batch, device)  # your own step function (sketch below)
        if i % 10 == 0:
            print(f"Batch {i}, Loss: {loss:.4f}")
```
**Pros:** Quick to write, easy to experiment.

**Cons:** No progress bars, no logging, hard to track experiments.
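The script calls `train_step`, which you write yourself. A minimal sketch, assuming an `optimizer` and a loss `criterion` are already constructed at module scope (this is the same function that gets extracted formally at Level 4):

```python
def train_step(model, batch, device):
    inputs, targets = batch
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # forward pass + loss
    loss.backward()                           # backpropagate
    optimizer.step()                          # update weights
    return loss.item()                        # return the loss as a plain float
```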
### Level 1: Add Progress Bars
Want to see progress? Just add one import:
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset
from tipi.helpers import progress_bar  # ← Add this

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=32)

# Add progress bars with minimal changes
for epoch in progress_bar(range(10), desc="Epochs"):         # ← Changed
    for batch in progress_bar(dataloader, desc="Training"):  # ← Changed
        loss = train_step(model, batch, device)
```
What you get:
- Beautiful progress bars (via Rich)
- Automatic time estimation
- Works exactly like `tqdm`
- No pipeline required!
Changes needed: 2 lines!
### Level 2: Add Experiment Logging
Want to track your experiments in WandB?
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset
from tipi.helpers import progress_bar, logger  # ← Add logger

# Initialize logging (runs once)
logger.init(project="my_research", entity="my_team")  # ← Add this

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=32)

for epoch in progress_bar(range(10), desc="Epochs"):
    epoch_loss = 0
    for batch in progress_bar(dataloader, desc="Training"):
        loss = train_step(model, batch, device)
        epoch_loss += loss
        logger.log({"batch_loss": loss})  # ← Add this
    logger.log({"epoch_loss": epoch_loss / len(dataloader)})  # ← Add this
```
What you get:
- Automatic WandB logging
- Experiment tracking
- Metric visualization
- Still just a script!
Changes needed: 3 lines!
### Level 3: Better Device Management
Tired of CUDA boilerplate?
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset
from tipi.helpers import (
    progress_bar,
    logger,
    device_manager,  # ← Add this
)

logger.init(project="my_research", entity="my_team")

# Let the helper pick the best device
device = device_manager.get_device()  # ← Simplified!
model = MyModel().to(device)
dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=32)

for epoch in progress_bar(range(10), desc="Epochs"):
    for batch in progress_bar(dataloader, desc="Training"):
        loss = train_step(model, batch, device)
        logger.log({"loss": loss})
```
What you get:

- Automatic best device selection (see the sketch below)
- Handles multi-GPU scenarios
- Checks VRAM availability
- Still just helpers!
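For a sense of what the helper replaces, here is an illustrative sketch of manual device selection with a VRAM check. This is not `device_manager`'s actual implementation, just the kind of boilerplate it absorbs:

```python
import torch

def pick_device() -> torch.device:
    """Illustrative only: pick the CUDA device with the most free VRAM."""
    if torch.cuda.is_available():
        free_bytes = [
            torch.cuda.mem_get_info(i)[0]  # returns (free, total) per device
            for i in range(torch.cuda.device_count())
        ]
        return torch.device(f"cuda:{free_bytes.index(max(free_bytes))}")
    return torch.device("cpu")
```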
### Level 4: Extract Reusable Functions
Your script is working great! Now extract functions for reuse:
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset
from tipi.helpers import progress_bar, logger, device_manager
from tipi.decorators import pipeline_process  # ← New!

logger.init(project="my_research", entity="my_team")


def train_step(model, batch, device):
    """Your existing training logic."""
    # optimizer and criterion are assumed to be constructed at module scope
    inputs, targets = batch
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
    return loss.item()


@pipeline_process  # ← Add decorator (optional for now)
def train_model(epochs: int = 10):
    """Now this function can become a pipeline process later!"""
    device = device_manager.get_device()
    model = MyModel().to(device)
    dataset = MyDataset()
    dataloader = DataLoader(dataset, batch_size=32)

    for epoch in progress_bar(range(epochs), desc="Epochs"):
        for batch in progress_bar(dataloader, desc="Training"):
            loss = train_step(model, batch, device)
            logger.log({"loss": loss})

    return model


# Still works as a script!
if __name__ == "__main__":
    trained_model = train_model(epochs=10)
    torch.save(trained_model.state_dict(), "model.pth")
```
What you get:
- Reusable functions
- Ready for pipeline conversion
- Still runs as a script
- Type hints for parameters
The decorator: `@pipeline_process` doesn't change behavior yet, but makes the function pipeline-ready!
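Conceptually, such a behavior-preserving decorator only tags the function and returns it unchanged. A hypothetical sketch, not tipi's actual implementation:

```python
def pipeline_process(func):
    # Hypothetical: attach metadata a pipeline builder could discover later.
    func._is_pipeline_process = True
    return func  # calling the function directly still behaves exactly as before
```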
### Level 5: Convert to Full Pipeline
Ready for production? Create a config file and you're done!
#### Step 1: Organize your code
```python
# processes/training.py
from tipi import PipelineProcess
from tipi.helpers import progress_bar, logger


class TrainingProcess(PipelineProcess):
    def __init__(self, controller, force: bool = False, epochs: int = 10):
        super().__init__(controller, force)
        self.epochs = epochs

    def execute(self):
        # Get permanences from the pipeline
        device = self.controller.get_permanence("device")
        model = self.controller.get_permanence("network").model
        dataloader = self.controller.get_permanence("data").train_loader

        # Your SAME training logic from Level 4!
        for epoch in progress_bar(range(self.epochs), desc="Epochs"):
            for batch in progress_bar(dataloader, desc="Training"):
                loss = self.train_step(model, batch, device)
                logger.log({"loss": loss})

        return None

    def train_step(self, model, batch, device):
        """Copy from your script - no changes!"""
        # ... same code ...
```
#### Step 2: Create config
```toml
# configs/my_pipeline/execute_pipeline.toml
[permanences.device]
type = "Device"

[permanences.network]
type = "Network"
params = { model = "resnet50", num_classes = 10, pretrained = true }

[permanences.data]
type = "Data"
params = { dataset = "CIFAR10", batch_size = 32 }

[permanences.progress_manager]
type = "ProgressManager"

[permanences.wandb_logger]
type = "WandBManager"
params = { project = "my_research", entity = "my_team" }

[processes.training]
type = "TrainingProcess"
params = { epochs = 10 }

[processes.validation]
type = "ValidationProcess"

[processes.testing]
type = "TestingProcess"
```
#### Step 3: Run via CLI
```bash
# Run the full pipeline
tipi run my_pipeline

# Or with a different config
tipi run my_pipeline --config custom_config.toml

# Or programmatically
python -c "
from tipi import PipelineRunner
runner = PipelineRunner('my_pipeline')
runner.run()
"
```
What you get:
- Full pipeline management
- Config-driven experiments
- Nested progress bars
- Permanence lifecycle management
- WandB sweep support
- Production-ready code
- Testable components
## Comparison Table
| Feature | Level 0 | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|---|
| Lines of code | 15 | 17 | 20 | 20 | 30 | 40+ |
| Progress bars | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Experiment logging | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Device management | Manual | Manual | Manual | ✅ | ✅ | ✅ |
| Reusable code | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Config-driven | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Pipeline features | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Testable | ❌ | ❌ | ❌ | ❌ | ⚠️ | ✅ |
| Production-ready | ❌ | ❌ | ❌ | ❌ | ⚠️ | ✅ |
## Key Principles
### 1. Zero Friction Start
- Researchers start with normal scripts
- No framework knowledge required
- No config files needed initially
### 2. Progressive Enhancement
- Each level adds ONE new concept
- Previous levels still work
- No forced migration
### 3. Gradual Type Safety

- Start untyped (scripts)
- Add types when extracting functions (Level 4; see the example below)
- Full type safety in pipeline (Level 5)
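For instance, the only signature change when extracting at Level 4 is the annotation (taken from the Level 4 example above):

```python
# Script era: untyped
def train_model(epochs=10):
    ...

# Level 4: typed and pipeline-ready
def train_model(epochs: int = 10):
    ...
```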
### 4. Helpers Work Everywhere

- `progress_bar()` works in scripts and pipelines
- `logger` works standalone and with the pipeline
- `device_manager` adapts to context
### 5. Copy-Paste Friendly
- Training logic from Level 3 works in Level 5
- Functions can be copied between scripts and processes
- Minimal refactoring needed
## Example: Real Research Workflow
### Day 1: New Idea

```python
# quick_experiment.py
for epoch in range(5):
    loss = train()  # train() is whatever your experiment does
    print(loss)
```
### Day 2: Looks Promising

```python
# quick_experiment.py
from tipi.helpers import progress_bar, logger

logger.init(project="new_idea")

for epoch in progress_bar(range(10)):
    loss = train()
    logger.log({"loss": loss})
```
### Week 1: Multiple Experiments

```python
# experiment_v1.py, experiment_v2.py, experiment_v3.py
# All using helpers - easy to compare in WandB
```
### Week 2: Extract Common Code

```python
# train_utils.py
from tipi.decorators import pipeline_process

@pipeline_process
def train_model(learning_rate: float = 0.001):
    # Shared training logic
    pass
```

```python
# experiment_v4.py
from train_utils import train_model

train_model(learning_rate=0.01)
```
### Month 1: Production Pipeline

```toml
# configs/production/pipeline.toml
[processes.training]
type = "train_model"  # Your function!
params = { learning_rate = 0.001 }
```

```bash
tipi run production
```
## Migration Checklist
Moving from Level N to Level N+1:
### Level 0 → 1: Add Progress Bars

- [ ] Import `progress_bar` from helpers
- [ ] Wrap your loops with `progress_bar()`
- [ ] Add descriptive names
### Level 1 → 2: Add Logging

- [ ] Import `logger` from helpers
- [ ] Call `logger.init()` once at start
- [ ] Add `logger.log()` calls for metrics
### Level 2 → 3: Better Device Management

- [ ] Import `device_manager` from helpers
- [ ] Replace manual device selection with `device_manager.get_device()` (see the snippet below)
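The swap itself, taken from the Level 3 example:

```python
# Before
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# After
from tipi.helpers import device_manager
device = device_manager.get_device()
```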
### Level 3 → 4: Extract Functions

- [ ] Identify reusable code blocks
- [ ] Extract them into functions with type hints
- [ ] Add the `@pipeline_process` decorator
- [ ] Test standalone execution
### Level 4 → 5: Full Pipeline

- [ ] Create a `processes/` directory
- [ ] Convert functions to `PipelineProcess` classes
- [ ] Create a TOML config file
- [ ] Register in the pipeline builder
- [ ] Test via the CLI
## Tips for Researchers
### When to move to the next level?
Level 0 → 1: When you're tired of seeing `print()` statements
Level 1 → 2: When you lose track of which experiment was which
Level 2 → 3: When GPU selection becomes annoying
Level 3 → 4: When you copy-paste code between experiments
Level 4 → 5: When you need to run in production or share with team
### You don't have to reach Level 5!
- Many experiments stay at Level 2-3
- Only productionize what's proven to work
- Keep prototyping at lower levels
### Mix and match

```python
# experiment_hybrid.py
from tipi.helpers import progress_bar, logger
from my_pipeline.processes.training import train_epoch  # ← From Level 5

# Quick experiment using production code
# (my_model and data are assumed to be set up earlier in the script)
for epoch in progress_bar(range(3)):
    loss = train_epoch(my_model, data)  # ← Production function
    logger.log({"quick_test": loss})    # ← Script logging
```
## Questions?
- **Q: Do I have to use all helpers?** A: No! Use only what you need. `progress_bar` alone is useful.
- **Q: Can I use this with my existing code?** A: Yes! Just import the helpers and start adding them incrementally.
- **Q: What if I don't want WandB?** A: `logger` will gracefully degrade to console output.
- **Q: Can I stay at Level 3 forever?** A: Absolutely! Only move to Level 5 when you need production features.
- **Q: Does this work with my existing tqdm code?** A: Yes! `progress_bar` is a drop-in replacement for `tqdm` (see the sketch below).
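For example, migrating an existing `tqdm` loop is just an import swap (a sketch reusing the Level 1 example):

```python
# Before: plain tqdm
from tqdm import tqdm

for batch in tqdm(dataloader, desc="Training"):
    train_step(model, batch, device)

# After: tipi's progress_bar, same call shape
from tipi.helpers import progress_bar

for batch in progress_bar(dataloader, desc="Training"):
    train_step(model, batch, device)
```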