# Hybrid Model Deployment Guide

## Overview

JubJub Word uses **Option A: Pre-trained Models in Repository** for deploying hybrid Markov-LSTM models. This approach provides:

- **Fast deployment** (<30 seconds startup)
- **Consistent models** across all instances
- **No training overhead** on Railway
- **Predictable behavior** in production

## How It Works

### Architecture

```
┌─────────────────────────────────────────────────────────┐
│  Railway Deployment                                     │
│                                                         │
│  1. Install dependencies (includes PyTorch)             │
│  2. Run migrations                                      │
│  3. Load corpora from database                          │
│  4. Prebuild Markov models (fast)                       │
│  5. Load pre-trained hybrid models from repo            │
│  6. Start gunicorn                                      │
│                                                         │
│  Total Time: ~30 seconds                                │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  Local Training                                         │
│                                                         │
│  When corpora change:                                   │
│  1. Update corpus files                                 │
│  2. Run: python manage.py train_hybrid_models --all     │
│  3. Commit trained models to repo                       │
│  4. Push to trigger Railway deployment                  │
│                                                         │
│  Training Time: ~10-15 minutes (one-time)               │
└─────────────────────────────────────────────────────────┘
```

## Model Storage

### Directory Structure

```
backend/jubjub/jubjubword/
├── hybrid_models/                 # Committed to repo
│   ├── scifi/
│   │   ├── lstm_model.pt          # ~80KB (committed)
│   │   ├── vocabulary.json        # ~2KB (committed)
│   │   ├── hybrid_config.json     # ~200B (committed)
│   │   ├── best_model.pt          # Training checkpoint (ignored)
│   │   └── training_history.json  # Training log (ignored)
│   ├── fantasy/
│   ├── food/
│   ├── corporate/
│   └── medical/
├── models/                        # Markov models (generated)
│   └── markov_n2_wbTrue_*.pkl     # ~100KB each
└── DEPLOYMENT_HYBRID.md           # This file
```
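
A quick way to confirm that every corpus directory carries the three committed artifacts is a small check script. This is an illustrative sketch (the helper is not part of the project; only the file names come from the tree above):

```python
from pathlib import Path

# The three per-corpus files Railway needs at startup (see the tree above).
REQUIRED = ["lstm_model.pt", "vocabulary.json", "hybrid_config.json"]

def missing_artifacts(models_dir: Path) -> dict:
    """Map each corpus directory under models_dir to any required files it lacks."""
    problems = {}
    for corpus_dir in sorted(d for d in models_dir.iterdir() if d.is_dir()):
        missing = [name for name in REQUIRED if not (corpus_dir / name).exists()]
        if missing:
            problems[corpus_dir.name] = missing
    return problems

# Example: missing_artifacts(Path("jubjub/jubjubword/hybrid_models"))
# returns {} when every corpus directory is complete.
```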

### What Gets Committed

✅ **Committed** (for fast deployment):
- `lstm_model.pt` - Final trained LSTM (~80KB per corpus)
- `vocabulary.json` - Character vocabulary (~2KB)
- `hybrid_config.json` - Ensemble weights (~200 bytes)

❌ **Ignored** (training artifacts):
- `best_model.pt` - Training checkpoints
- `training_history.json` - Loss curves and metrics

Total committed size: **~500KB for all 5 corpora**
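
The committed/ignored split above is typically enforced with ignore rules. A sketch of what the relevant `.gitignore` entries could look like (paths relative to `backend/`; the actual file may phrase them differently):

```gitignore
# Keep only inference artifacts; ignore training byproducts
jubjub/jubjubword/hybrid_models/*/best_model.pt
jubjub/jubjubword/hybrid_models/*/training_history.json
```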

## When to Retrain Models

### Scenarios Requiring Retraining

1. **Adding words to a corpus**
   - Example: Adding 100 new sci-fi words
   - Impact: Hybrid model won't know new vocabulary
   - Action: Retrain affected corpus

2. **Removing words from a corpus**
   - Example: Filtering out inappropriate words
   - Impact: Model may still generate removed patterns
   - Action: Retrain affected corpus

3. **Creating a new corpus**
   - Example: Adding "mythology" corpus
   - Impact: No hybrid model exists
   - Action: Train new corpus

4. **Changing Markov parameters**
   - Example: Switching from n=2 to n=3
   - Impact: State space changed
   - Action: Retrain all corpora

### Scenarios NOT Requiring Retraining

- Changing frontend code
- Updating Django views
- Modifying API endpoints
- Changing ensemble weights (can update hybrid_config.json directly)
- Railway redeployments (models load from repo)
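
Because ensemble weights live in `hybrid_config.json`, they can be changed without retraining. A minimal sketch of a direct config edit, assuming the file stores the weights under `markov_weight` and `lstm_weight` keys (the key names are an assumption; verify against an actual config file):

```python
import json
from pathlib import Path

def set_ensemble_weights(config_path: Path, markov_weight: float) -> dict:
    """Rewrite ensemble weights in hybrid_config.json, keeping them complementary.

    NOTE: the "markov_weight"/"lstm_weight" key names are an assumption about
    the config schema; check a real hybrid_config.json before relying on them.
    """
    config = json.loads(config_path.read_text())
    config["markov_weight"] = markov_weight
    config["lstm_weight"] = round(1.0 - markov_weight, 6)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Commit the edited `hybrid_config.json` like any other model change; no `.pt` files need to be regenerated.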

## Training Workflow

### Initial Setup (One-Time)

```bash
# 1. Install dependencies
cd backend
pip install -r requirements.txt

# 2. Ensure database is populated
python manage.py migrate
python manage.py load_corpora

# 3. Train all hybrid models (takes ~15 minutes)
python manage.py train_hybrid_models --all

# 4. Verify models were created
ls -lh jubjub/jubjubword/hybrid_models/*/lstm_model.pt
# Should see 5 files, ~80KB each

# 5. Commit models to repo
git add jubjub/jubjubword/hybrid_models/
git commit -m "feat: Add pre-trained hybrid models for all corpora"
git push origin claude/your-branch
```

### Updating Existing Corpus

When you modify a corpus (e.g., adding words to sci-fi):

```bash
# 1. Update the corpus in the database
python manage.py load_corpora --verbosity=2

# 2. Retrain affected corpus only
python manage.py train_hybrid_models --corpus scifi

# 3. Test generation
python manage.py shell
>>> from jubjub.jubjubword.markov import get_markov_instance
>>> from jubjub.jubjubword.hybrid import HybridMarkovLSTM
>>> from pathlib import Path
>>> markov = get_markov_instance(corpus_slug='scifi')
>>> hybrid = HybridMarkovLSTM.load(Path('jubjub/jubjubword/hybrid_models/scifi'), markov)
>>> word, meta = hybrid.generate(max_length=10)
>>> print(f"Generated: {word}")

# 4. Commit and push
git add jubjub/jubjubword/hybrid_models/scifi/
git commit -m "feat: Retrain sci-fi hybrid model with expanded corpus"
git push origin claude/your-branch
```

### Adding New Corpus

When adding a completely new corpus:

```bash
# 1. Create corpus in database (via admin or migration)
# ...

# 2. Train model for new corpus
python manage.py train_hybrid_models --corpus mythology

# 3. Commit new model directory
git add jubjub/jubjubword/hybrid_models/mythology/
git commit -m "feat: Add hybrid model for mythology corpus"
git push origin claude/your-branch
```

## Training Options

### Basic Training

```bash
# Train all corpora with defaults
python manage.py train_hybrid_models --all

# Train specific corpus
python manage.py train_hybrid_models --corpus scifi
```

### Advanced Options

```bash
# Larger model for better quality (takes longer)
python manage.py train_hybrid_models --corpus scifi \
    --hidden-size 128 \
    --num-layers 3 \
    --epochs 100

# Faster training for testing
python manage.py train_hybrid_models --corpus scifi \
    --hidden-size 32 \
    --epochs 20

# Adjust ensemble weights
python manage.py train_hybrid_models --corpus scifi \
    --markov-weight 0.7 \
    --lstm-weight 0.3

# GPU training (if available)
python manage.py train_hybrid_models --corpus scifi --device cuda
```

### Training Output

Expect to see:

```
🚀 Training hybrid models for 1 corpora

Hyperparameters:
  Hidden size: 64
  Num layers: 2
  Epochs: 50
  Batch size: 32
  Learning rate: 0.001
  Device: cpu
  Markov weight: 0.6
  LSTM weight: 0.4

============================================================
Training: Science Fiction & Tech (scifi)
============================================================

Corpus size: 1609 words

📚 Training LSTM...
Building vocabulary...
Vocabulary size: 32
Train: 1448 words, Val: 161 words
Model parameters: 21,024
Estimated model size: 82.1 KB

Epoch 1/50 - Train Loss: 2.8456, Val Loss: 2.6123
Epoch 2/50 - Train Loss: 2.3145, Val Loss: 2.2456
...
Early stopping triggered after 35 epochs

✓ Training complete!
  Epochs trained: 35
  Best val loss: 1.4523
  Final train loss: 1.5012

🔗 Creating hybrid model...
✓ Hybrid model saved to .../hybrid_models/scifi

🎲 Sample generations:
  quanticore (LSTM confidence: 0.68)
  photonix (LSTM confidence: 0.72)
  starforge (LSTM confidence: 0.65)
  cyberdyne (LSTM confidence: 0.71)
  neurotex (LSTM confidence: 0.69)

🎉 Training complete! Models saved to .../hybrid_models
```

## Evaluation

### Compare Hybrid vs Pure Markov

```bash
python manage.py evaluate_hybrid --corpus scifi --samples 100
```

Expected improvements:
- **Pronounceability**: +5-15% (more phonetically natural)
- **Diversity**: +10-20% (unique character patterns)
- **Consistency**: Similar (both respect corpus style)

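The diversity improvement can be sanity-checked offline with a simple statistic. Below is an illustrative metric — the fraction of distinct character bigrams across a sample of generated words — not necessarily the one `evaluate_hybrid` reports:

```python
def bigram_diversity(words: list) -> float:
    """Fraction of distinct character bigrams among all bigrams produced.

    Higher values mean generated words reuse fewer character patterns.
    Illustrative only; evaluate_hybrid may compute diversity differently.
    """
    all_bigrams = [w[i:i + 2] for w in words for i in range(len(w) - 1)]
    if not all_bigrams:
        return 0.0
    return len(set(all_bigrams)) / len(all_bigrams)
```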
### Detailed Analysis

```bash
# Generate comparison report
python manage.py evaluate_hybrid --corpus scifi --samples 500 --report

# Output saved to: jubjub/jubjubword/evaluation_reports/scifi_evaluation.json
```

## Railway Configuration

### Current Setup (No Changes Needed)

`railway.json` already includes Markov model prebuilding:

```json
{
  "deploy": {
    "startCommand": "python manage.py migrate && python manage.py load_corpora --verbosity=2 && python manage.py prebuild_markov_models && gunicorn jubjub.wsgi:application --bind 0.0.0.0:$PORT"
  }
}
```

**Why we DON'T add hybrid training:**
- Hybrid models are pre-trained and committed to repo
- Railway loads models from disk (fast)
- No training needed on deployment (saves 10-15 minutes)
- Deployment stays under 1 minute

### What Railway Does

1. **Pulls repo** (includes pre-trained hybrid models)
2. **Installs PyTorch** (~200MB, used for inference only)
3. **Runs migrations** (sets up database)
4. **Loads corpora** (populates word lists)
5. **Prebuilds Markov models** (fast, ~1 second per corpus)
6. **Starts gunicorn** (hybrid models auto-load when requested)
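
The "auto-load when requested" behavior in step 6 is typically a per-process cache, so each hybrid model is read from disk at most once per gunicorn worker. A sketch of how that lazy loading could be structured (the `get_hybrid` helper is illustrative, not project code):

```python
from pathlib import Path

MODELS_DIR = Path("jubjub/jubjubword/hybrid_models")
_cache = {}  # corpus_slug -> loaded hybrid model (one per gunicorn worker)

def get_hybrid(corpus_slug, loader):
    """Return the hybrid model for a corpus, hitting the disk only once.

    In the real app, `loader` would wrap HybridMarkovLSTM.load(path, markov);
    it is a parameter here so the caching logic stands on its own.
    """
    if corpus_slug not in _cache:
        _cache[corpus_slug] = loader(MODELS_DIR / corpus_slug)
    return _cache[corpus_slug]
```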

### Environment Variables (Optional)

If you want to disable hybrid models temporarily:

```bash
# Railway dashboard -> Environment Variables
ENABLE_HYBRID_MODELS=false
```

Then update `views.py` to check this flag.
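
A sketch of what that flag check in `views.py` could look like (only the `ENABLE_HYBRID_MODELS` variable name comes from above; the helper and the commented view wiring are hypothetical):

```python
import os

def hybrid_enabled() -> bool:
    """Read the Railway env flag; anything but an explicit 'false' enables."""
    return os.environ.get("ENABLE_HYBRID_MODELS", "true").lower() != "false"

# Inside a hypothetical generation view:
# if hybrid_enabled():
#     word, meta = hybrid.generate(max_length=10)
# else:
#     word = markov.generate(max_length=10)  # fall back to pure Markov
```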

## Troubleshooting

### Models Not Loading

**Symptom**: `FileNotFoundError: hybrid_models/scifi/lstm_model.pt not found`

**Fix**:
```bash
# Verify models exist in repo
ls backend/jubjub/jubjubword/hybrid_models/*/lstm_model.pt

# If missing, train locally
python manage.py train_hybrid_models --all

# Commit and push
git add jubjub/jubjubword/hybrid_models/
git commit -m "fix: Add missing hybrid models"
git push
```

### Poor Generation Quality

**Symptom**: Hybrid generates worse words than pure Markov

**Possible Causes**:
1. **Corpus too small** (<500 words) - LSTM can't learn patterns
2. **Overfitting** - LSTM memorized training data
3. **Bad weights** - Ensemble favoring wrong model

**Fix**:
```bash
# Retrain with early stopping and more validation data
python manage.py train_hybrid_models --corpus scifi --epochs 30

# Or adjust ensemble weights (more Markov, less LSTM)
python manage.py train_hybrid_models --corpus scifi \
    --markov-weight 0.8 \
    --lstm-weight 0.2
```

### Slow Deployment

**Symptom**: Railway deployment takes >5 minutes

**Possible Causes**:
1. PyTorch installation slow (normal first time)
2. Accidentally training models on Railway (check railway.json)

**Fix**:
```bash
# Verify railway.json does NOT include training
grep train_hybrid backend/railway.json

# Should return nothing - training is NOT in startCommand
```

### Large Repository Size

**Symptom**: Git repo over 100MB

**Possible Causes**:
1. Committing training checkpoints (best_model.pt)
2. Committing training history (training_history.json)

**Fix**:
```bash
# Remove ignored files from git
git rm --cached backend/jubjub/jubjubword/hybrid_models/*/best_model.pt
git rm --cached backend/jubjub/jubjubword/hybrid_models/*/training_history.json

# Verify .gitignore includes them
grep hybrid_models .gitignore
```

## Monitoring

### Check Model Status

```python
# In Django shell (python manage.py shell)
from pathlib import Path

from jubjub.jubjubword.hybrid import HybridMarkovLSTM
from jubjub.jubjubword.markov import get_markov_instance

# Check which models exist
models_dir = Path('jubjub/jubjubword/hybrid_models')
available = [d.name for d in models_dir.iterdir() if d.is_dir()]
print(f"Available hybrid models: {available}")

# Load and test
markov = get_markov_instance(corpus_slug='scifi')
hybrid = HybridMarkovLSTM.load(models_dir / 'scifi', markov)

# Generate with metadata
word, meta = hybrid.generate(max_length=10)
print(f"Word: {word}")
print(f"LSTM confidence: {meta['avg_lstm_confidence']:.2f}")
print(f"Markov influence: {meta['avg_markov_influence']:.2f}")
print(f"LSTM influence: {meta['avg_lstm_influence']:.2f}")
```

### Performance Metrics

```bash
# Generate 1000 words and analyze
python manage.py evaluate_hybrid --corpus scifi --samples 1000 --report

# Check pronounceability distribution
# Check diversity metrics
# Compare to pure Markov baseline
```
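
The saved report can also be inspected directly in Python. A sketch, assuming a flat `{metric_name: number}` JSON shape (the real schema may differ; inspect one report before relying on this):

```python
import json
from pathlib import Path

def summarize_report(report_path: Path) -> str:
    """Render top-level numeric metrics from an evaluation report as text.

    Assumes a flat {metric_name: number} JSON object, which may not match
    evaluate_hybrid's actual output; adjust after inspecting a real report.
    """
    report = json.loads(report_path.read_text())
    lines = [
        f"{name}: {value:.3f}"
        for name, value in report.items()
        if isinstance(value, (int, float))
    ]
    return "\n".join(lines)
```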

## Future Enhancements

### Option B: Train on Deployment (If Needed)

If corpora become dynamic (user-contributed words), switch to Option B:

**railway.json** change:
```json
{
  "deploy": {
    "startCommand": "python manage.py migrate && python manage.py load_corpora --verbosity=2 && python manage.py prebuild_markov_models && python manage.py train_hybrid_models --all --epochs 30 && gunicorn jubjub.wsgi:application --bind 0.0.0.0:$PORT"
  }
}
```

**Tradeoffs**:
- ✅ Always fresh models
- ❌ 10-15 minute deployment time
- ❌ Higher compute costs

### Git LFS (If Models Exceed 100MB)

If you add many more corpora:

```bash
# Install Git LFS
git lfs install

# Track model files
git lfs track "*.pt"
git lfs track "*.pkl"

# Update .gitattributes (already configured)
```

### Incremental Training

Future feature to update models without a full retrain:

```python
# Add new words to existing model
from jubjub.jubjubword.hybrid_trainer import incremental_train

incremental_train(
    corpus_slug='scifi',
    new_words=['quantumflux', 'nanocore', 'cyberdeck'],
    epochs=10  # Fine-tune only
)
```

## Summary

**Current Setup (Option A)**:
- ✅ Pre-trained models committed to repo
- ✅ Fast Railway deployments (<1 minute)
- ✅ No training overhead in production
- ✅ ~500KB model size (acceptable)

**When corpora change**:
- Train locally: `python manage.py train_hybrid_models --corpus X`
- Commit models: `git add hybrid_models/X/ && git commit`
- Deploy: `git push` (Railway auto-deploys)

**Maintenance**:
- Retrain when corpus words change
- ~15 minutes total training time (infrequent)
- Models stay in sync with corpus content

This approach balances simplicity, performance, and maintainability for JubJub Word's current scale.