@@ -0,0 +1,173 @@ |
| 1 | +# Storage |
| 2 | + |
| 3 | +shithub has two storage layers: |
| 4 | + |
| 5 | +1. **Object storage** — S3-compatible (MinIO in dev/test, DigitalOcean Spaces in prod). Used for avatars, attachments, and (post-MVP) LFS objects. |
| 6 | +2. **Repo filesystem storage** — bare git repositories on a local block-storage volume, in a sharded layout owned by the `RepoFS` helper. |
| 7 | + |
| 8 | +Both layers live behind the package `internal/infra/storage`. Path validation is the **security boundary** — every entry that takes user-supplied owner/repo names goes through `RepoPath`, which rejects unsafe inputs against a strict whitelist. If repo paths can be tricked, every later sprint inherits the bug; the test suite *over*-tests this. |
| 9 | + |
| 10 | +## Object storage |
| 11 | + |
| 12 | +### Interface |
| 13 | + |
| 14 | +```go |
| 15 | +type ObjectStore interface { |
| 16 | + Put(ctx, key, body io.Reader, opts PutOpts) (PutResult, error) |
| 17 | + Get(ctx, key) (io.ReadCloser, ObjectMeta, error) |
| 18 | + Stat(ctx, key) (ObjectMeta, error) |
| 19 | + Delete(ctx, key) error |
| 20 | + List(ctx, prefix, opts ListOpts) (ListResult, error) |
| 21 | + SignedURL(ctx, key, ttl, method) (string, error) |
| 22 | +} |
| 23 | +``` |
| 24 | + |
| 25 | +Two implementations: |
| 26 | + |
| 27 | +- `S3Store` — backed by minio-go. Works against any S3-compatible endpoint. `force_path_style=true` for MinIO; `false` for Spaces. |
| 28 | +- `MemoryStore` — in-process map for tests. Honors the same semantics including `If-None-Match`. |
| 29 | + |
| 30 | +### Bucket / key scheme |
| 31 | + |
| 32 | +Single bucket per environment: `shithub-dev`, `shithub-staging`, `shithub-prod`. Per-scope key prefixes ease policy and tenant isolation: |
| 33 | + |
| 34 | +``` |
| 35 | +lfs/<owner>/<repo>/<sha256> # LFS objects (post-MVP, key shape reserved) |
| 36 | +attachments/<scope>/<id>/<filename> # issue/PR/comment attachments |
| 37 | +avatars/<owner>/<size>.<ext> # rendered avatar variants |
| 38 | +backups/... # S37 |
| 39 | +``` |
| 40 | + |
| 41 | +Keys are always lowercase. |
| 42 | + |
| 43 | +### Semantics worth knowing |
| 44 | + |
| 45 | +- **Idempotent delete.** `Delete` returns nil for absent keys. |
| 46 | +- **`If-None-Match: "*"`** is the only precondition supported. Causes `Put` to fail with `ErrPreconditionFailed` when the destination already exists. Used to avoid overwrite races. |
| 47 | +- **`SignedURL`** supports `GET` and `PUT` only (no multipart). Avatar/attachment direct-uploads in later sprints rely on this; we wire it now even though no caller uses it yet. |
| 48 | +- **`List` with `Recursive=false`** uses `/` as a delimiter and surfaces folders in `CommonPrefixes` — matches S3 behavior. |
| 49 | +- **`ContentLength`** in `PutOpts` is a hint; pass 0 to let the backend buffer/stream. |
| 50 | + |
| 51 | +### MinIO vs Spaces drift |
| 52 | + |
| 53 | +The two backends share an interface but behavioral edges differ: |
| 54 | + |
| 55 | +- **Path-style addressing.** MinIO needs `force_path_style=true`. Spaces supports virtual-host-style (the default). |
| 56 | +- **Lifecycle rules.** Spaces and MinIO honor different subsets of the S3 lifecycle XML. Apply rules through their respective consoles, not via the SDK. |
| 57 | +- **ACL semantics.** Spaces supports `public-read` on objects; MinIO uses bucket policies. We don't rely on either today (all reads go through the app). |
| 58 | +- **Listing pagination.** Both honor `MaxKeys` + continuation tokens, but the page sizes they prefer differ. Don't assume an exact count per page. |
| 59 | + |
| 60 | +Run the integration tests against both backends periodically. Document new gotchas here as they surface. |
| 61 | + |
| 62 | +## Repo filesystem storage |
| 63 | + |
| 64 | +### Layout |
| 65 | + |
| 66 | +``` |
| 67 | +<root>/ |
| 68 | + <shard>/ # first 2 chars of lowercased owner ('_'-padded if shorter) |
| 69 | + <owner>/ |
| 70 | + <name>.git |
| 71 | +``` |
| 72 | + |
| 73 | +Two-character shard gives 1296 buckets — enough that no shard exceeds tens of thousands of subdirectories at our scale. Reversible (the shard is *derived*, not stored separately) and debuggable. We deliberately avoid hash-based sharding because it scatters related entries. |
| 74 | + |
| 75 | +`<root>` defaults to `/data/repos` and MUST live on the production block-storage volume — NOT the droplet root disk. The root disk is small and resets on droplet rebuilds. |
| 76 | + |
| 77 | +### Path validation rules (the security boundary) |
| 78 | + |
| 79 | +Owner and repo names must match `^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$`: |
| 80 | + |
| 81 | +- Lowercase ASCII letters, digits, hyphens only. |
| 82 | +- Cannot start or end with `-`. |
| 83 | +- Length 1..39 (matches GitHub username constraint). |
| 84 | +- No `..`, no leading `.`, no `/`, no absolute paths, no whitespace, no NUL bytes. |
| 85 | +- Display casing is a DB concern; path casing is normalized to lowercase here. |
| 86 | + |
| 87 | +Anything that fails the whitelist returns `ErrInvalidPath` with a precise reason. `RepoFS.Delete` and `RepoFS.Move` additionally guard against paths that resolve outside `<root>` (`ErrEscapesRoot`). |
| 88 | + |
| 89 | +### Default branch |
| 90 | + |
| 91 | +Every `git init --bare` invoked through `InitBare` uses `--initial-branch=trunk`. There is no path through this package that creates a bare repo with a different branch. Verified via `git symbolic-ref HEAD` returning `refs/heads/trunk` in `TestInitBare_HEADIsTrunk`. |
| 92 | + |
| 93 | +### Atomic operations |
| 94 | + |
| 95 | +`WriteAtomic(path, src)` writes to a tempfile (`.<basename>.tmp.<hex>`) in the **same directory**, fsyncs, then renames. A crash between write and rename leaves the temp file behind (callers may sweep these on startup) but never a partial file at the destination. `<root>` and any temp dir used for atomic ops MUST live on the same mount — `/data/repos/` and the temp space both live under `/data/`. |
| 96 | + |
| 97 | +`Move(old, new)` refuses to overwrite an existing destination, returning `ErrAlreadyExists`. This avoids silent corruption on concurrent moves; the loser surfaces a clear error. |
| 98 | + |
| 99 | +### Future: symlinks inside repos |
| 100 | + |
| 101 | +When tooling lands that walks repo *contents* (S17 code tab, S37 backup), it MUST use `O_NOFOLLOW` or equivalent to avoid traversing symlinks out of the repo. No content traversal happens in S04, but the constraint is captured here. |
| 102 | + |
| 103 | +## Configuration |
| 104 | + |
| 105 | +All storage settings flow through `internal/infra/config` (see `docs/internal/config.md`): |
| 106 | + |
| 107 | +| Key | Type | Default | Notes | |
| 108 | +|---|---|---|---| |
| 109 | +| `storage.repos_root` | string | `/data/repos` | Filesystem root for bare repos. Required. | |
| 110 | +| `storage.s3.endpoint` | string | `""` | Host[:port], no scheme. Empty disables S3. | |
| 111 | +| `storage.s3.region` | string | `us-east-1` | Region for SigV4 signing. | |
| 112 | +| `storage.s3.access_key_id` | string | `""` | | |
| 113 | +| `storage.s3.secret_access_key` | string | `""` | Redacted by `config print`. | |
| 114 | +| `storage.s3.bucket` | string | `""` | Single bucket per environment. | |
| 115 | +| `storage.s3.use_ssl` | bool | `false` | True for Spaces, false for local MinIO. | |
| 116 | +| `storage.s3.force_path_style` | bool | `true` | True for MinIO, false for Spaces. | |
| 117 | + |
| 118 | +If any S3 field is set, **all** required fields (endpoint, bucket, access_key_id, secret_access_key) must be set — `Validate` rejects partial configuration. |
| 119 | + |
| 120 | +## Operational helpers |
| 121 | + |
| 122 | +### `shithubd storage check` |
| 123 | + |
| 124 | +Exits 0 when: |
| 125 | + |
| 126 | +1. `storage.repos_root` exists, is a directory, and is writable (verified by creating + removing a probe file). |
| 127 | +2. PUT and GET round-trip successfully against the configured S3 bucket. |
| 128 | + |
| 129 | +When the S3 block is unconfigured, only (1) is checked — output makes the skip explicit. Used in deploy smoke tests and from operator terminals. |
| 130 | + |
| 131 | +```sh |
| 132 | +make storage-check |
| 133 | +# or: |
| 134 | +./bin/shithubd storage check |
| 135 | +``` |
| 136 | + |
| 137 | +### `make dev-storage` / `make dev-storage-down` / `make dev-storage-reset` |
| 138 | + |
| 139 | +Brings up MinIO via docker-compose, seeds the `shithub-dev` bucket via the `minio-init` one-shot, and prints the API/console URLs. Credentials are **non-default** even in dev — MinIO's defaults (minioadmin/minioadmin) are insecure. |
| 140 | + |
| 141 | +```sh |
| 142 | +make dev-storage |
| 143 | +# MinIO S3 API: http://127.0.0.1:9000 console: http://127.0.0.1:9001 |
| 144 | +# Credentials: shithub-dev / shithub-dev-secret-please-change |
| 145 | +``` |
| 146 | + |
| 147 | +## Quotas |
| 148 | + |
| 149 | +`Quota{Used, Limit}` is the placeholder type. S04 wires the type only — enforcement lives in a future policy package called from the push pipeline (S14) and attachment uploads. `Limit == 0` means unlimited. `WouldExceed(n)` and `Available()` give callers a uniform interface. |
| 150 | + |
| 151 | +When the `users` and `orgs` tables grow `disk_quota_used`/`disk_quota_limit` columns (S05/S09), this struct is the marshal target. |
| 152 | + |
| 153 | +## Testing |
| 154 | + |
| 155 | +- **Unit tests** (`*_test.go`) run with `go test ./internal/infra/storage/...` — no external dependencies. |
| 156 | + - Path-validation table covers `..`, absolute paths, leading/trailing dash, dotfiles, uppercase, unicode, length, NUL/newline, slash, punctuation. |
| 157 | + - WriteAtomic crash-survival via fault-injection reader — destination must not exist after a partial write, and no temp file may leak. |
| 158 | + - InitBare verifies `HEAD` resolves to `refs/heads/trunk`. |
| 159 | + - Memory store covers Put/Get/Stat/Delete/List (recursive + delimited)/SignedURL/IfNoneMatch/large-body round-trip. |
| 160 | +- **S3 integration tests** are in `s3_test.go` and gate on `SHITHUB_TEST_S3_ENDPOINT` (and the matching credentials). They skip cleanly when the env var is empty. CI sets these via the MinIO compose service. |
| 161 | + |
| 162 | +```sh |
| 163 | +SHITHUB_TEST_S3_ENDPOINT=127.0.0.1:9000 \ |
| 164 | +SHITHUB_TEST_S3_ACCESS_KEY_ID=shithub-dev \ |
| 165 | +SHITHUB_TEST_S3_SECRET_ACCESS_KEY=shithub-dev-secret-please-change \ |
| 166 | +SHITHUB_TEST_S3_BUCKET=shithub-dev \ |
| 167 | +go test ./internal/infra/storage/... |
| 168 | +``` |
| 169 | + |
| 170 | +## Related docs |
| 171 | + |
| 172 | +- `docs/internal/config.md` — configuration loader and env var conventions. |
| 173 | +- `docs/internal/observability.md` — metrics around storage will land in S14 (push pipeline). |