Dec 1, 2024

Convention Over Configuration

Here’s what frustrated me about ML frameworks:

PyTorch: Too low-level, write your own training loops
PyTorch Lightning: Better, but boilerplate-heavy
Keras: Great API, but locked to TensorFlow (historically)
Hugging Face: Amazing for transformers, but domain-specific

I wanted: Write the architecture, point to data, get training. No boilerplate.

NeuroScript now has this. Here’s what training XOR looks like, end to end:

# 1. Write the architecture (01-xor.ns)
neuron XOR():
  in: [batch, 2]
  out: [batch, 1]
  graph:
    in ->
      Linear(2, 4)
      ReLU()
      Linear(4, 1)
      Sigmoid()
      out

# 2. Create training data (xor_train.jsonl)
{"input": [0.0, 0.0], "target": [0.0]}
{"input": [0.0, 1.0], "target": [1.0]}
{"input": [1.0, 0.0], "target": [1.0]}
{"input": [1.0, 1.0], "target": [0.0]}

# 3. Write a minimal config (xor_config.yml)
model:
  neuron: XOR
  file: examples/01-xor.ns

data:
  train: examples/data/xor_train.jsonl

training:
  epochs: 1000
  lr: 0.01

# 4. Train
python -m neuroscript_runtime.runner train --config xor_config.yml

# That's it!
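For reference, the training file in step 2 is plain JSONL: one JSON object per line, each with an `input` and a `target` key. Reading that format takes only a few lines of standard-library Python. The sketch below is an illustration of the file format, not NeuroScript's actual default loader; the function name `load_jsonl_pairs` is hypothetical.

```python
import json
from pathlib import Path


def load_jsonl_pairs(path):
    """Read one JSON object per line and return (inputs, targets) as lists.

    Hypothetical helper: assumes every record has "input" and "target" keys,
    as in the xor_train.jsonl example.
    """
    inputs, targets = [], []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if "input" not in record or "target" not in record:
                raise ValueError(
                    f"{path}:{line_no}: expected 'input' and 'target' keys"
                )
            inputs.append(record["input"])
            targets.append(record["target"])
    return inputs, targets
```

The returned lists can be handed to `torch.tensor(...)` to get `[batch, 2]` inputs and `[batch, 1]` targets matching the shapes declared in the neuron.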

What makes this work? Convention over Configuration.

The runner:

Infers the task from input/output shapes:
- [batch, 2] -> [batch, 1] = Regression -> MSE loss
- [batch, seq] -> [batch, seq, vocab] = Language Model -> CrossEntropy
- [batch, C, H, W] -> [batch, classes] = Image Classification

Picks sensible defaults:
- Optimizer: Adam (good for most things)
- Batch size: 32
- Logging: Every 100 steps
- Checkpointing: Every 1000 steps

Makes extension trivial:

from neuroscript_runtime.contracts import DataLoaderContract, ContractRegistry

class MyHuggingFaceLoader(DataLoaderContract):
    # Implement interface
    pass

# Register it
ContractRegistry.register_dataloader("huggingface", MyHuggingFaceLoader)

# Use in config:
# data:
#   format: huggingface
#   dataset: "wikitext"
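To make the first two behaviors above more concrete (shape-based task inference and default-filling), here is a rough sketch of how they could work. It is not the real runner code: the function names `infer_task` and `resolve_training_config`, the rank-based rules, and the config keys `optimizer`, `batch_size`, `log_every`, and `checkpoint_every` are all assumptions that simply mirror the examples and defaults listed in this post.

```python
# Losses named in the post; it does not name one for image classification,
# so that task is deliberately left out here.
TASK_LOSS = {"regression": "mse", "language_model": "cross_entropy"}

# Defaults as listed in the post (key names are hypothetical).
DEFAULTS = {
    "optimizer": "adam",
    "batch_size": 32,
    "log_every": 100,
    "checkpoint_every": 1000,
}


def infer_task(in_shape, out_shape):
    """Guess the task from the ranks of the declared input/output shapes.

    Shapes are lists of dimension names or sizes, e.g. ["batch", 2].
    """
    in_rank, out_rank = len(in_shape), len(out_shape)
    if in_rank == 4 and out_rank == 2:
        return "image_classification"  # [batch, C, H, W] -> [batch, classes]
    if in_rank == 2 and out_rank == 3:
        return "language_model"        # [batch, seq] -> [batch, seq, vocab]
    if in_rank == 2 and out_rank == 2:
        return "regression"            # [batch, 2] -> [batch, 1]
    raise ValueError(f"cannot infer task from {in_shape} -> {out_shape}")


def resolve_training_config(user_training):
    """Fill in defaults; keys set by the user always win."""
    return {**DEFAULTS, **user_training}


task = infer_task(["batch", 2], ["batch", 1])
config = resolve_training_config({"epochs": 1000, "lr": 0.01})
print(task, TASK_LOSS.get(task), config["batch_size"])
```

With the XOR config from earlier, only `epochs` and `lr` come from the file; everything else in `config` is a default.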

The Contract System is the secret sauce. Five extension points:

DataLoader: How to load data (default: JSONL files)
Loss: How to compute error (default: inferred from task)
Optimizer: How to update weights (default: Adam)
Checkpoint: How to save/load (default: torch.save)
Logger: How to track progress (default: console)

Ship with one good default for each. Make it trivial to swap in custom implementations. Let the community build the ecosystem.
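The "one default, easy to swap" idea is essentially a registry keyed by extension point and name. Here is a minimal, self-contained sketch of that pattern using the Logger extension point. It is not the real `ContractRegistry`: the `Registry` class, the `LoggerContract` interface with its `log(step, metrics)` method, and both logger classes are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod


class LoggerContract(ABC):
    """Hypothetical contract: anything that can record training metrics."""

    @abstractmethod
    def log(self, step, metrics):
        ...


class ConsoleLogger(LoggerContract):
    """Stand-in for the shipped default: print to the console."""

    def log(self, step, metrics):
        print(f"step {step}: {metrics}")


class ListLogger(LoggerContract):
    """A custom implementation: keep metrics in memory instead."""

    def __init__(self):
        self.records = []

    def log(self, step, metrics):
        self.records.append((step, metrics))


class Registry:
    """Maps (extension point, name) to an implementation class."""

    def __init__(self):
        self._impls = {}

    def register(self, kind, name, cls):
        self._impls[(kind, name)] = cls

    def create(self, kind, name, **kwargs):
        try:
            cls = self._impls[(kind, name)]
        except KeyError:
            raise KeyError(f"no {kind} registered under {name!r}") from None
        return cls(**kwargs)


registry = Registry()
registry.register("logger", "console", ConsoleLogger)  # the default
registry.register("logger", "list", ListLogger)        # a swapped-in custom one

# A config value such as "list" would select the implementation by name.
logger = registry.create("logger", "list")
logger.log(100, {"loss": 0.25})
```

The runner only ever talks to the contract, so swapping `console` for `list` (or anything else that implements `log`) needs no changes to the training loop.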

The full Python API:

import torch

from neuroscript_runtime.runner_v2 import train_from_config
from xor_model import XOR  # Generated by NeuroScript compiler

# Train using the config file from step 3
model = XOR()
runner = train_from_config(model, "xor_config.yml")

# Inference on a single example: XOR(1, 0) should come out close to 1
result = runner.infer(torch.tensor([[1.0, 0.0]]))
print(result)  # [0.9999] ≈ 1.0

Why this matters: You can go from idea to trained model in minutes, not hours. When you need custom behavior, the extension points are obvious. When you need full control, it’s just PyTorch under the hood—drop down anytime.

Batteries included. Escape hatch provided.