# SEO and crawler surfaces

shithub exposes a small, honest crawler surface for the hosted site and self-hosted instances:

- `GET /robots.txt` allows public pages and points crawlers at `/sitemap.xml`.
- `GET /sitemap.xml` lists the stable public marketing/discovery pages: `/`, `/about`, `/explore`, and `/trending` (both endpoints are sketched at the end of this section).
- The shared HTML layout emits a page description, canonical URL, Open Graph metadata, Twitter card metadata, and trusted JSON-LD when handlers provide those fields. All of these fields are optional on the typed page-data structs; when a handler omits one, the layout skips the corresponding tag (see the struct sketch below).
- `/about` is the durable positioning page. It follows the README, SECURITY, CONTRIBUTING, and CODE_OF_CONDUCT posture: GitHub is good software; shithub exists because users should be able to host code without AI training concerns; and the community is AGPL, security-aware, and civil.

Use `auth.base_url` as the public origin in production. It drives the canonical links generated by the crawler endpoints. If it is empty in tests or local dev, handlers fall back to the request host (sketched below).

Do not add generated search result pages, settings pages, API routes, admin pages, or smart-HTTP `.git` endpoints to the sitemap.
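For concreteness, here is a minimal sketch of the two crawler endpoints from the list above, assuming a Go implementation on `net/http`. The handler names, the hard-coded origin, and the exact `robots.txt` body are illustrative assumptions, not the project's actual code.

```go
package main

import (
	"fmt"
	"net/http"
)

// robotsHandler allows public pages and points crawlers at the sitemap.
// The exact body here is an illustrative assumption.
func robotsHandler(origin string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		fmt.Fprintf(w, "User-agent: *\nAllow: /\nSitemap: %s/sitemap.xml\n", origin)
	}
}

// sitemapHandler lists only the stable public marketing/discovery pages.
func sitemapHandler(origin string) http.HandlerFunc {
	pages := []string{"/", "/about", "/explore", "/trending"}
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/xml; charset=utf-8")
		fmt.Fprintln(w, `<?xml version="1.0" encoding="UTF-8"?>`)
		fmt.Fprintln(w, `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`)
		for _, p := range pages {
			fmt.Fprintf(w, "  <url><loc>%s%s</loc></url>\n", origin, p)
		}
		fmt.Fprintln(w, "</urlset>")
	}
}

func main() {
	origin := "https://shithub.example" // assumption: normally resolved from auth.base_url
	http.HandleFunc("/robots.txt", robotsHandler(origin))
	http.HandleFunc("/sitemap.xml", sitemapHandler(origin))
	http.ListenAndServe(":8080", nil)
}
```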
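The metadata bullet can be pictured as a typed page-data struct whose optional fields gate what the layout emits. This Go sketch is an assumption about the shape, not the real struct: the field names, the template, and the example origin are all hypothetical.

```go
package main

import (
	"html/template"
	"os"
)

// PageData is a hypothetical typed page-data struct: every SEO field is
// optional, and the layout emits a tag only when a handler filled it in.
type PageData struct {
	Title       string
	Description string      // <meta name="description">
	Canonical   string      // <link rel="canonical">
	OGImage     string      // Open Graph / Twitter card image
	JSONLD      template.JS // trusted JSON-LD, validated before it gets here
}

var head = template.Must(template.New("head").Parse(`<head>
<title>{{.Title}}</title>
{{if .Description}}<meta name="description" content="{{.Description}}">{{end}}
{{if .Canonical}}<link rel="canonical" href="{{.Canonical}}">
<meta property="og:url" content="{{.Canonical}}">{{end}}
{{if .OGImage}}<meta property="og:image" content="{{.OGImage}}">
<meta name="twitter:card" content="summary_large_image">{{end}}
{{if .JSONLD}}<script type="application/ld+json">{{.JSONLD}}</script>{{end}}
</head>
`))

func main() {
	// A handler that omits a field produces no corresponding tag.
	head.Execute(os.Stdout, PageData{
		Title:     "About",
		Canonical: "https://shithub.example/about", // hypothetical origin
	})
}
```

`template.JS` is the piece that matches the "trusted" qualifier: `html/template` would otherwise escape the JSON-LD payload, so only server-validated data should reach that field.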
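The origin-resolution rule might look like the following sketch, again assuming Go. The `publicOrigin` helper, its `baseURL` parameter standing in for `auth.base_url`, and the `/debug/origin` route are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// publicOrigin returns the configured public origin (auth.base_url in
// production); when it is empty, as in tests or local dev, it falls back
// to the scheme and host of the incoming request.
func publicOrigin(baseURL string, r *http.Request) string {
	if baseURL != "" {
		return baseURL
	}
	scheme := "http"
	if r.TLS != nil {
		scheme = "https"
	}
	return fmt.Sprintf("%s://%s", scheme, r.Host)
}

func main() {
	http.HandleFunc("/debug/origin", func(w http.ResponseWriter, r *http.Request) {
		// Empty base_url, so the request host wins.
		fmt.Fprintln(w, publicOrigin("", r))
	})
	http.ListenAndServe(":8080", nil)
}
```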