# Hybrid Model Deployment Guide

## Overview

JubJub Word uses **Option A: Pre-trained Models in Repository** for deploying hybrid Markov-LSTM models. This approach provides:

- **Fast deployment** (<30 seconds startup)
- **Consistent models** across all instances
- **No training overhead** on Railway
- **Predictable behavior** in production

## How It Works

### Architecture

```
┌─────────────────────────────────────────────────────────┐
│  Railway Deployment                                     │
│                                                         │
│  1. Install dependencies (includes PyTorch)             │
│  2. Run migrations                                      │
│  3. Load corpora from database                          │
│  4. Prebuild Markov models (fast)                       │
│  5. Load pre-trained hybrid models from repo            │
│  6. Start gunicorn                                      │
│                                                         │
│  Total Time: ~30 seconds                                │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  Local Training                                         │
│                                                         │
│  When corpora change:                                   │
│  1. Update corpus files                                 │
│  2. Run: python manage.py train_hybrid_models --all     │
│  3. Commit trained models to repo                       │
│  4. Push to trigger Railway deployment                  │
│                                                         │
│  Training Time: ~10-15 minutes (one-time)               │
└─────────────────────────────────────────────────────────┘
```

## Model Storage

### Directory Structure

```
backend/jubjub/jubjubword/
├── hybrid_models/                 # Committed to repo
│   ├── scifi/
│   │   ├── lstm_model.pt          # ~80KB (committed)
│   │   ├── vocabulary.json        # ~2KB (committed)
│   │   ├── hybrid_config.json     # ~200B (committed)
│   │   ├── best_model.pt          # Training checkpoint (ignored)
│   │   └── training_history.json  # Training log (ignored)
│   ├── fantasy/
│   ├── food/
│   ├── corporate/
│   └── medical/
├── models/                        # Markov models (generated)
│   └── markov_n2_wbTrue_*.pkl     # ~100KB each
└── DEPLOYMENT_HYBRID.md           # This file
```

### What Gets Committed

✅ **Committed** (for fast deployment):
- `lstm_model.pt` - Final trained LSTM (~80KB per corpus)
- `vocabulary.json` - Character vocabulary (~2KB)
- `hybrid_config.json` - Ensemble weights (~200 bytes)

❌ **Ignored** (training artifacts):
- `best_model.pt` - Training checkpoints
- `training_history.json` - Loss curves and metrics

Total committed size: **~500KB for all 5 corpora**

## When to Retrain Models

### Scenarios Requiring Retraining

1. **Adding words to a corpus**
   - Example: Adding 100 new sci-fi words
   - Impact: Hybrid model won't know new vocabulary
   - Action: Retrain affected corpus

2. **Removing words from a corpus**
   - Example: Filtering out inappropriate words
   - Impact: Model may still generate removed patterns
   - Action: Retrain affected corpus

3. **Creating a new corpus**
   - Example: Adding "mythology" corpus
   - Impact: No hybrid model exists
   - Action: Train new corpus

4. **Changing Markov parameters**
   - Example: Switching from n=2 to n=3
   - Impact: State space changed
   - Action: Retrain all corpora

### Scenarios NOT Requiring Retraining

- Changing frontend code
- Updating Django views
- Modifying API endpoints
- Changing ensemble weights (can update `hybrid_config.json` directly)
- Railway redeployments (models load from repo)
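
Because the ensemble weights live in `hybrid_config.json`, a weight tweak can be a plain file edit plus a commit, with no retraining. The authoritative schema is whatever `HybridMarkovLSTM.save` writes; the fields below are an illustrative sketch, not the confirmed format:

```json
{
  "markov_weight": 0.6,
  "lstm_weight": 0.4
}
```

After editing, commit and push as usual; the weights only change how the two models' probabilities are blended at generation time, so the trained `lstm_model.pt` stays valid.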

## Training Workflow

### Initial Setup (One-Time)

```bash
# 1. Install dependencies
cd backend
pip install -r requirements.txt

# 2. Ensure database is populated
python manage.py migrate
python manage.py load_corpora

# 3. Train all hybrid models (takes ~15 minutes)
python manage.py train_hybrid_models --all

# 4. Verify models were created
ls -lh jubjub/jubjubword/hybrid_models/*/lstm_model.pt
# Should see 5 files, ~80KB each

# 5. Commit models to repo
git add jubjub/jubjubword/hybrid_models/
git commit -m "feat: Add pre-trained hybrid models for all corpora"
git push origin claude/your-branch
```

### Updating Existing Corpus

When you modify a corpus (e.g., adding words to sci-fi):

```bash
# 1. Update the corpus in database
python manage.py load_corpora --verbosity=2

# 2. Retrain affected corpus only
python manage.py train_hybrid_models --corpus scifi

# 3. Test generation
python manage.py shell
>>> from jubjub.jubjubword.markov import get_markov_instance
>>> from jubjub.jubjubword.hybrid import HybridMarkovLSTM
>>> from pathlib import Path
>>> markov = get_markov_instance(corpus_slug='scifi')
>>> hybrid = HybridMarkovLSTM.load(Path('jubjub/jubjubword/hybrid_models/scifi'), markov)
>>> word, meta = hybrid.generate(max_length=10)
>>> print(f"Generated: {word}")

# 4. Commit and push
git add jubjub/jubjubword/hybrid_models/scifi/
git commit -m "feat: Retrain sci-fi hybrid model with expanded corpus"
git push origin claude/your-branch
```

### Adding New Corpus

When adding a completely new corpus:

```bash
# 1. Create corpus in database (via admin or migration)
# ...

# 2. Train model for new corpus
python manage.py train_hybrid_models --corpus mythology

# 3. Commit new model directory
git add jubjub/jubjubword/hybrid_models/mythology/
git commit -m "feat: Add hybrid model for mythology corpus"
git push origin claude/your-branch
```

## Training Options

### Basic Training

```bash
# Train all corpora with defaults
python manage.py train_hybrid_models --all

# Train specific corpus
python manage.py train_hybrid_models --corpus scifi
```

### Advanced Options

```bash
# Larger model for better quality (takes longer)
python manage.py train_hybrid_models --corpus scifi \
    --hidden-size 128 \
    --num-layers 3 \
    --epochs 100

# Faster training for testing
python manage.py train_hybrid_models --corpus scifi \
    --hidden-size 32 \
    --epochs 20

# Adjust ensemble weights
python manage.py train_hybrid_models --corpus scifi \
    --markov-weight 0.7 \
    --lstm-weight 0.3

# GPU training (if available)
python manage.py train_hybrid_models --corpus scifi --device cuda
```
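
To give intuition for what `--markov-weight` and `--lstm-weight` control, here is a minimal sketch of weighted per-character ensemble sampling. It is an assumption about the general technique, not the actual `HybridMarkovLSTM` implementation; the function name and dict-based distributions are hypothetical:

```python
import random

def ensemble_next_char(markov_probs, lstm_probs, markov_weight=0.6, lstm_weight=0.4):
    """Blend two next-character distributions and sample from the mix.

    markov_probs / lstm_probs: dicts mapping char -> probability.
    Illustrative sketch only; the real hybrid model may combine
    distributions differently (e.g. over tensors, with temperature).
    """
    chars = set(markov_probs) | set(lstm_probs)
    blended = {
        c: markov_weight * markov_probs.get(c, 0.0) + lstm_weight * lstm_probs.get(c, 0.0)
        for c in chars
    }
    total = sum(blended.values())
    blended = {c: p / total for c, p in blended.items()}  # renormalize to sum to 1
    # Inverse-CDF sampling over the blended distribution
    r, cumulative = random.random(), 0.0
    for c, p in sorted(blended.items()):
        cumulative += p
        if r < cumulative:
            return c, blended
    return c, blended  # fall through on float rounding

markov = {"a": 0.5, "b": 0.5}
lstm = {"a": 0.9, "c": 0.1}
ch, dist = ensemble_next_char(markov, lstm)
print(ch, round(dist["a"], 2))
```

With equal Markov mass on `a`/`b` and LSTM mass mostly on `a`, the blend shifts probability toward `a`, which is exactly the effect of raising `--lstm-weight`.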

### Training Output

Expect to see:

```
🚀 Training hybrid models for 1 corpora

Hyperparameters:
  Hidden size: 64
  Num layers: 2
  Epochs: 50
  Batch size: 32
  Learning rate: 0.001
  Device: cpu
  Markov weight: 0.6
  LSTM weight: 0.4

============================================================
Training: Science Fiction & Tech (scifi)
============================================================

Corpus size: 1609 words

📚 Training LSTM...
Building vocabulary...
Vocabulary size: 32
Train: 1448 words, Val: 161 words
Model parameters: 21,024
Estimated model size: 82.1 KB

Epoch 1/50 - Train Loss: 2.8456, Val Loss: 2.6123
Epoch 2/50 - Train Loss: 2.3145, Val Loss: 2.2456
...
Early stopping triggered after 35 epochs

✓ Training complete!
  Epochs trained: 35
  Best val loss: 1.4523
  Final train loss: 1.5012

🔗 Creating hybrid model...
✓ Hybrid model saved to .../hybrid_models/scifi

🎲 Sample generations:
  quanticore (LSTM confidence: 0.68)
  photonix (LSTM confidence: 0.72)
  starforge (LSTM confidence: 0.65)
  cyberdyne (LSTM confidence: 0.71)
  neurotex (LSTM confidence: 0.69)

🎉 Training complete! Models saved to .../hybrid_models
```

## Evaluation

### Compare Hybrid vs Pure Markov

```bash
python manage.py evaluate_hybrid --corpus scifi --samples 100
```

Expected improvements:
- **Pronounceability**: +5-15% (more phonetically natural)
- **Diversity**: +10-20% (unique character patterns)
- **Consistency**: Similar (both respect corpus style)

### Detailed Analysis

```bash
# Generate comparison report
python manage.py evaluate_hybrid --corpus scifi --samples 500 --report

# Output saved to: jubjub/jubjubword/evaluation_reports/scifi_evaluation.json
```

## Railway Configuration

### Current Setup (No Changes Needed)

`railway.json` already includes Markov model prebuilding:

```json
{
  "deploy": {
    "startCommand": "python manage.py migrate && python manage.py load_corpora --verbosity=2 && python manage.py prebuild_markov_models && gunicorn jubjub.wsgi:application --bind 0.0.0.0:$PORT"
  }
}
```

**Why we DON'T add hybrid training:**
- Hybrid models are pre-trained and committed to repo
- Railway loads models from disk (fast)
- No training needed on deployment (saves 10-15 minutes)
- Deployment stays under 1 minute

### What Railway Does

1. **Pulls repo** (includes pre-trained hybrid models)
2. **Installs PyTorch** (~200MB, used for inference only)
3. **Runs migrations** (sets up database)
4. **Loads corpora** (populates word lists)
5. **Prebuilds Markov models** (fast, ~1 second per corpus)
6. **Starts gunicorn** (hybrid models auto-load when requested)

### Environment Variables (Optional)

If you want to disable hybrid models temporarily:

```bash
# Railway dashboard -> Environment Variables
ENABLE_HYBRID_MODELS=false
```

Then update `views.py` to check this flag.
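
A minimal sketch of what that `views.py` check could look like. The helper and `generate_word` names are hypothetical; only the `ENABLE_HYBRID_MODELS` variable comes from the setup above:

```python
import os

def hybrid_models_enabled() -> bool:
    """Read the ENABLE_HYBRID_MODELS flag from the environment.

    Anything other than an explicit false-y value keeps the feature on,
    so existing deployments behave unchanged if the variable is unset.
    """
    value = os.environ.get("ENABLE_HYBRID_MODELS", "true").strip().lower()
    return value not in ("false", "0", "no")

# Hypothetical usage: fall back to pure Markov generation when the flag is off
def generate_word(markov, hybrid):
    if hybrid is not None and hybrid_models_enabled():
        word, _meta = hybrid.generate(max_length=10)
    else:
        word = markov.generate(max_length=10)
    return word
```

Because the flag is read per request (or per call), toggling it in the Railway dashboard takes effect on the next redeploy without retraining or code changes.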

## Troubleshooting

### Models Not Loading

**Symptom**: `FileNotFoundError: hybrid_models/scifi/lstm_model.pt not found`

**Fix**:
```bash
# Verify models exist in repo
ls backend/jubjub/jubjubword/hybrid_models/*/lstm_model.pt

# If missing, train locally
python manage.py train_hybrid_models --all

# Commit and push
git add jubjub/jubjubword/hybrid_models/
git commit -m "fix: Add missing hybrid models"
git push
```

### Poor Generation Quality

**Symptom**: Hybrid generates worse words than pure Markov

**Possible Causes**:
1. **Corpus too small** (<500 words) - LSTM can't learn patterns
2. **Overfitting** - LSTM memorized training data
3. **Bad weights** - Ensemble favoring wrong model

**Fix**:
```bash
# Retrain with early stopping and more validation data
python manage.py train_hybrid_models --corpus scifi --epochs 30

# Or adjust ensemble weights (more Markov, less LSTM)
python manage.py train_hybrid_models --corpus scifi \
    --markov-weight 0.8 \
    --lstm-weight 0.2
```

### Slow Deployment

**Symptom**: Railway deployment takes >5 minutes

**Possible Causes**:
1. PyTorch installation slow (normal first time)
2. Accidentally training models on Railway (check railway.json)

**Fix**:
```bash
# Verify railway.json does NOT include training
cat backend/railway.json | grep train_hybrid

# Should return nothing - training is NOT in startCommand
```

### Large Repository Size

**Symptom**: Git repo over 100MB

**Possible Causes**:
1. Committing training checkpoints (best_model.pt)
2. Committing training history (training_history.json)

**Fix**:
```bash
# Remove ignored files from git
git rm --cached backend/jubjub/jubjubword/hybrid_models/*/best_model.pt
git rm --cached backend/jubjub/jubjubword/hybrid_models/*/training_history.json

# Verify .gitignore includes them
cat .gitignore | grep hybrid_models
```
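
The grep above only helps if the ignore rules are actually present. A minimal set of patterns covering the training artifacts, assuming `.gitignore` sits at the repo root (adjust the path prefix if yours lives elsewhere):

```
# Hybrid training artifacts (keep lstm_model.pt, vocabulary.json, hybrid_config.json)
backend/jubjub/jubjubword/hybrid_models/*/best_model.pt
backend/jubjub/jubjubword/hybrid_models/*/training_history.json
```

Note that `.gitignore` never untracks files already committed; the `git rm --cached` step above is what removes them from the index.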

## Monitoring

### Check Model Status

```python
# In Django shell
from jubjub.jubjubword.hybrid import HybridMarkovLSTM
from jubjub.jubjubword.markov import get_markov_instance
from pathlib import Path

# Check which models exist
models_dir = Path('jubjub/jubjubword/hybrid_models')
available = [d.name for d in models_dir.iterdir() if d.is_dir()]
print(f"Available hybrid models: {available}")

# Load and test
markov = get_markov_instance(corpus_slug='scifi')
hybrid = HybridMarkovLSTM.load(models_dir / 'scifi', markov)

# Generate with metadata
word, meta = hybrid.generate(max_length=10)
print(f"Word: {word}")
print(f"LSTM confidence: {meta['avg_lstm_confidence']:.2f}")
print(f"Markov influence: {meta['avg_markov_influence']:.2f}")
print(f"LSTM influence: {meta['avg_lstm_influence']:.2f}")
```

### Performance Metrics

```bash
# Generate 1000 words and analyze
python manage.py evaluate_hybrid --corpus scifi --samples 1000 --report

# Check pronounceability distribution
# Check diversity metrics
# Compare to pure Markov baseline
```
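
For a quick sanity check outside `evaluate_hybrid`, simple heuristics can approximate the two headline metrics. These are stand-in measures, assumed for illustration, not the metrics the evaluation command actually computes:

```python
def vowel_ratio(word: str) -> float:
    """Crude pronounceability proxy: fraction of characters that are vowels."""
    vowels = set("aeiou")
    return sum(ch in vowels for ch in word) / max(len(word), 1)

def bigram_diversity(words: list[str]) -> float:
    """Diversity proxy: unique character bigrams / total bigrams across words."""
    bigrams = [w[i:i + 2] for w in words for i in range(len(w) - 1)]
    return len(set(bigrams)) / max(len(bigrams), 1)

sample = ["quanticore", "photonix", "starforge"]
print(round(vowel_ratio("photonix"), 2))
print(round(bigram_diversity(sample), 2))
```

Running the same two functions over a hybrid batch and a pure-Markov batch gives a rough, comparable baseline when a full report is overkill.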

## Future Enhancements

### Option B: Train on Deployment (If Needed)

If corpora become dynamic (user-contributed words), switch to Option B:

**railway.json** change:
```json
{
  "deploy": {
    "startCommand": "python manage.py migrate && python manage.py load_corpora --verbosity=2 && python manage.py prebuild_markov_models && python manage.py train_hybrid_models --all --epochs 30 && gunicorn jubjub.wsgi:application --bind 0.0.0.0:$PORT"
  }
}
```

**Tradeoffs**:
- ✅ Always fresh models
- ❌ 10-15 minute deployment time
- ❌ Higher compute costs

### Git LFS (If Models Exceed 100MB)

If you add many more corpora:

```bash
# Install Git LFS
git lfs install

# Track model files
git lfs track "*.pt"
git lfs track "*.pkl"

# Update .gitattributes (already configured)
```

### Incremental Training

Future feature to update models without full retrain:

```python
# Add new words to existing model
from jubjub.jubjubword.hybrid_trainer import incremental_train

incremental_train(
    corpus_slug='scifi',
    new_words=['quantumflux', 'nanocore', 'cyberdeck'],
    epochs=10  # Fine-tune only
)
```

## Summary

**Current Setup (Option A)**:
- ✅ Pre-trained models committed to repo
- ✅ Fast Railway deployments (<1 minute)
- ✅ No training overhead in production
- ✅ ~500KB model size (acceptable)

**When corpora change**:
- Train locally: `python manage.py train_hybrid_models --corpus X`
- Commit models: `git add hybrid_models/X/ && git commit`
- Deploy: `git push` (Railway auto-deploys)

**Maintenance**:
- Retrain when corpus words change
- ~15 minutes total training time (infrequent)
- Models stay in sync with corpus content

This approach balances simplicity, performance, and maintainability for JubJub Word's current scale.