# Progressive Enhancement: From Script to Pipeline
This guide shows how to smoothly transition from experimental scripts to production-ready pipelines.
## The Journey: 5 Levels of Enhancement
### Level 0: Raw Script (Start Here!)
You're a researcher with an idea. Start with a normal Python script:
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset  # MyDataset assumed to live alongside MyModel

# Simple training script
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=32)

print("Starting training...")
for epoch in range(10):
    print(f"Epoch {epoch+1}/10")
    for i, batch in enumerate(dataloader):
        loss = train_step(model, batch, device)  # your own step function (sketch below)
        if i % 10 == 0:
            print(f"Batch {i}, Loss: {loss:.4f}")
```
**Pros:** Quick to write, easy to experiment.

**Cons:** No progress bars, no logging, hard to track experiments.
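The script calls `train_step`, which you write yourself. A minimal sketch, assuming an `optimizer` and a loss `criterion` are already constructed at module scope (this is the same function that gets extracted formally at Level 4):

```python
def train_step(model, batch, device):
    inputs, targets = batch
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # forward pass + loss
    loss.backward()                           # backpropagate
    optimizer.step()                          # update weights
    return loss.item()                        # return the loss as a plain float
```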
### Level 1: Add Progress Bars
Want to see progress? Just add one import:
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset
from tipi.helpers import progress_bar  # ← Add this

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=32)

# Add progress bars with minimal changes
for epoch in progress_bar(range(10), desc="Epochs"):         # ← Changed
    for batch in progress_bar(dataloader, desc="Training"):  # ← Changed
        loss = train_step(model, batch, device)
```
What you get:
- Beautiful progress bars (via Rich)
- Automatic time estimation
- Works exactly like `tqdm`
- No pipeline required!
Changes needed: 2 lines!
### Level 2: Add Experiment Logging
Want to track your experiments in WandB?
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset
from tipi.helpers import progress_bar, logger  # ← Add logger

# Initialize logging (runs once)
logger.init(project="my_research", entity="my_team")  # ← Add this

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=32)

for epoch in progress_bar(range(10), desc="Epochs"):
    epoch_loss = 0
    for batch in progress_bar(dataloader, desc="Training"):
        loss = train_step(model, batch, device)
        epoch_loss += loss
        logger.log({"batch_loss": loss})  # ← Add this
    logger.log({"epoch_loss": epoch_loss / len(dataloader)})  # ← Add this
```
What you get:
- Automatic WandB logging
- Experiment tracking
- Metric visualization
- Still just a script!
Changes needed: 3 lines!
### Level 3: Better Device Management
Tired of CUDA boilerplate?
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset
from tipi.helpers import (
    progress_bar,
    logger,
    device_manager,  # ← Add this
)

logger.init(project="my_research", entity="my_team")

# Let the helper pick the best device
device = device_manager.get_device()  # ← Simplified!
model = MyModel().to(device)
dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=32)

for epoch in progress_bar(range(10), desc="Epochs"):
    for batch in progress_bar(dataloader, desc="Training"):
        loss = train_step(model, batch, device)
        logger.log({"loss": loss})
```
What you get:

- Automatic best device selection (see the sketch below)
- Handles multi-GPU scenarios
- Checks VRAM availability
- Still just helpers!
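For a sense of what the helper replaces, here is an illustrative sketch of manual device selection with a VRAM check. This is not `device_manager`'s actual implementation, just the kind of boilerplate it absorbs:

```python
import torch

def pick_device() -> torch.device:
    """Illustrative only: pick the CUDA device with the most free VRAM."""
    if torch.cuda.is_available():
        free_bytes = [
            torch.cuda.mem_get_info(i)[0]  # returns (free, total) per device
            for i in range(torch.cuda.device_count())
        ]
        return torch.device(f"cuda:{free_bytes.index(max(free_bytes))}")
    return torch.device("cpu")
```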
### Level 4: Extract Reusable Functions
Your script is working great! Now extract functions for reuse:
```python
# train.py
import torch
from torch.utils.data import DataLoader

from my_model import MyModel, MyDataset
from tipi.helpers import progress_bar, logger, device_manager
from tipi.decorators import pipeline_process  # ← New!

logger.init(project="my_research", entity="my_team")


def train_step(model, batch, device):
    """Your existing training logic."""
    # optimizer and criterion are assumed to be constructed at module scope
    inputs, targets = batch
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
    return loss.item()


@pipeline_process  # ← Add decorator (optional for now)
def train_model(epochs: int = 10):
    """Now this function can become a pipeline process later!"""
    device = device_manager.get_device()
    model = MyModel().to(device)
    dataset = MyDataset()
    dataloader = DataLoader(dataset, batch_size=32)

    for epoch in progress_bar(range(epochs), desc="Epochs"):
        for batch in progress_bar(dataloader, desc="Training"):
            loss = train_step(model, batch, device)
            logger.log({"loss": loss})

    return model


# Still works as a script!
if __name__ == "__main__":
    trained_model = train_model(epochs=10)
    torch.save(trained_model.state_dict(), "model.pth")
```
What you get:
- Reusable functions
- Ready for pipeline conversion
- Still runs as a script
- Type hints for parameters
The decorator: `@pipeline_process` doesn't change behavior yet, but makes the function pipeline-ready!
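Conceptually, such a behavior-preserving decorator only tags the function and returns it unchanged. A hypothetical sketch, not tipi's actual implementation:

```python
def pipeline_process(func):
    # Hypothetical: attach metadata a pipeline builder could discover later.
    func._is_pipeline_process = True
    return func  # calling the function directly still behaves exactly as before
```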
### Level 5: Convert to Full Pipeline
Ready for production? Create a config file and you're done!
#### Step 1: Organize your code
```python
# processes/training.py
from tipi import PipelineProcess
from tipi.helpers import progress_bar, logger


class TrainingProcess(PipelineProcess):
    def __init__(self, controller, force: bool = False, epochs: int = 10):
        super().__init__(controller, force)
        self.epochs = epochs

    def execute(self):
        # Get permanences from the pipeline
        device = self.controller.get_permanence("device")
        model = self.controller.get_permanence("network").model
        dataloader = self.controller.get_permanence("data").train_loader

        # Your SAME training logic from Level 4!
        for epoch in progress_bar(range(self.epochs), desc="Epochs"):
            for batch in progress_bar(dataloader, desc="Training"):
                loss = self.train_step(model, batch, device)
                logger.log({"loss": loss})

        return None

    def train_step(self, model, batch, device):
        """Copy from your script - no changes!"""
        # ... same code ...
```
#### Step 2: Create config
```toml
# configs/my_pipeline/execute_pipeline.toml
[permanences.device]
type = "Device"

[permanences.network]
type = "Network"
params = { model = "resnet50", num_classes = 10, pretrained = true }

[permanences.data]
type = "Data"
params = { dataset = "CIFAR10", batch_size = 32 }

[permanences.progress_manager]
type = "ProgressManager"

[permanences.wandb_logger]
type = "WandBManager"
params = { project = "my_research", entity = "my_team" }

[processes.training]
type = "TrainingProcess"
params = { epochs = 10 }

[processes.validation]
type = "ValidationProcess"

[processes.testing]
type = "TestingProcess"
```
#### Step 3: Run via CLI
```bash
# Run the full pipeline
tipi run my_pipeline

# Or with a different config
tipi run my_pipeline --config custom_config.toml

# Or programmatically
python -c "
from tipi import PipelineRunner
runner = PipelineRunner('my_pipeline')
runner.run()
"
```
What you get:
- Full pipeline management
- Config-driven experiments
- Nested progress bars
- Permanence lifecycle management
- WandB sweep support
- Production-ready code
- Testable components
## Comparison Table
| Feature | Level 0 | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|---|
| Lines of code | 15 | 17 | 20 | 20 | 30 | 40+ |
| Progress bars | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Experiment logging | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Device management | Manual | Manual | Manual | ✅ | ✅ | ✅ |
| Reusable code | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Config-driven | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Pipeline features | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Testable | ❌ | ❌ | ❌ | ❌ | ⚠️ | ✅ |
| Production-ready | ❌ | ❌ | ❌ | ❌ | ⚠️ | ✅ |
## Key Principles
### 1. Zero Friction Start
- Researchers start with normal scripts
- No framework knowledge required
- No config files needed initially
### 2. Progressive Enhancement
- Each level adds ONE new concept
- Previous levels still work
- No forced migration
### 3. Gradual Type Safety

- Start untyped (scripts)
- Add types when extracting functions (Level 4; see the example below)
- Full type safety in pipeline (Level 5)
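For instance, the only signature change when extracting at Level 4 is the annotation (taken from the Level 4 example above):

```python
# Script era: untyped
def train_model(epochs=10):
    ...

# Level 4: typed and pipeline-ready
def train_model(epochs: int = 10):
    ...
```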
### 4. Helpers Work Everywhere

- `progress_bar()` works in scripts and pipelines
- `logger` works standalone and with the pipeline
- `device_manager` adapts to context
### 5. Copy-Paste Friendly
- Training logic from Level 3 works in Level 5
- Functions can be copied between scripts and processes
- Minimal refactoring needed
## Example: Real Research Workflow
### Day 1: New Idea

```python
# quick_experiment.py
for epoch in range(5):
    loss = train()  # train() is whatever your experiment does
    print(loss)
```
### Day 2: Looks Promising

```python
# quick_experiment.py
from tipi.helpers import progress_bar, logger

logger.init(project="new_idea")

for epoch in progress_bar(range(10)):
    loss = train()
    logger.log({"loss": loss})
```
### Week 1: Multiple Experiments

```python
# experiment_v1.py, experiment_v2.py, experiment_v3.py
# All using helpers - easy to compare in WandB
```
### Week 2: Extract Common Code

```python
# train_utils.py
from tipi.decorators import pipeline_process

@pipeline_process
def train_model(learning_rate: float = 0.001):
    # Shared training logic
    pass
```

```python
# experiment_v4.py
from train_utils import train_model

train_model(learning_rate=0.01)
```
### Month 1: Production Pipeline

```toml
# configs/production/pipeline.toml
[processes.training]
type = "train_model"  # Your function!
params = { learning_rate = 0.001 }
```

```bash
tipi run production
```
## Migration Checklist
Moving from Level N to Level N+1:
### Level 0 → 1: Add Progress Bars

- [ ] Import `progress_bar` from helpers
- [ ] Wrap your loops with `progress_bar()`
- [ ] Add descriptive names
### Level 1 → 2: Add Logging

- [ ] Import `logger` from helpers
- [ ] Call `logger.init()` once at start
- [ ] Add `logger.log()` calls for metrics
### Level 2 → 3: Better Device Management

- [ ] Import `device_manager` from helpers
- [ ] Replace manual device selection with `device_manager.get_device()` (see the snippet below)
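The swap itself, taken from the Level 3 example:

```python
# Before
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# After
from tipi.helpers import device_manager
device = device_manager.get_device()
```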
### Level 3 → 4: Extract Functions

- [ ] Identify reusable code blocks
- [ ] Extract them into functions with type hints
- [ ] Add the `@pipeline_process` decorator
- [ ] Test standalone execution
### Level 4 → 5: Full Pipeline

- [ ] Create a `processes/` directory
- [ ] Convert functions to `PipelineProcess` classes
- [ ] Create a TOML config file
- [ ] Register in the pipeline builder
- [ ] Test via the CLI
## Tips for Researchers
### When to move to the next level?
Level 0 → 1: When you're tired of seeing `print()` statements
Level 1 → 2: When you lose track of which experiment was which
Level 2 → 3: When GPU selection becomes annoying
Level 3 → 4: When you copy-paste code between experiments
Level 4 → 5: When you need to run in production or share with team
### You don't have to reach Level 5!
- Many experiments stay at Level 2-3
- Only productionize what's proven to work
- Keep prototyping at lower levels
### Mix and match

```python
# experiment_hybrid.py
from tipi.helpers import progress_bar, logger
from my_pipeline.processes.training import train_epoch  # ← From Level 5

# Quick experiment using production code
# (my_model and data are assumed to be set up earlier in the script)
for epoch in progress_bar(range(3)):
    loss = train_epoch(my_model, data)  # ← Production function
    logger.log({"quick_test": loss})    # ← Script logging
```
## Questions?
- **Q: Do I have to use all helpers?** A: No! Use only what you need. `progress_bar` alone is useful.
- **Q: Can I use this with my existing code?** A: Yes! Just import the helpers and start adding them incrementally.
- **Q: What if I don't want WandB?** A: `logger` will gracefully degrade to console output.
- **Q: Can I stay at Level 3 forever?** A: Absolutely! Only move to Level 5 when you need production features.
- **Q: Does this work with my existing tqdm code?** A: Yes! `progress_bar` is a drop-in replacement for `tqdm` (see the sketch below).
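For example, migrating an existing `tqdm` loop is just an import swap (a sketch reusing the Level 1 example):

```python
# Before: plain tqdm
from tqdm import tqdm

for batch in tqdm(dataloader, desc="Training"):
    train_step(model, batch, device)

# After: tipi's progress_bar, same call shape
from tipi.helpers import progress_bar

for batch in progress_bar(dataloader, desc="Training"):
    train_step(model, batch, device)
```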