# SEO and crawler surfaces

shithub exposes a small, honest crawler surface for the hosted site and self-hosted instances:

- `GET /robots.txt` allows public pages and points crawlers at `/sitemap.xml`.
- `GET /sitemap.xml` lists the stable public marketing/discovery pages: `/`, `/about`, `/explore`, and `/trending`.
- The shared HTML layout emits a page description, canonical URL, Open Graph metadata, Twitter card metadata, and trusted JSON-LD when handlers provide those fields. Every field is optional on the typed page-data structs, and the layout omits the tags for any field a handler leaves unset.
- `/about` is the durable positioning page. It follows the README, SECURITY, CONTRIBUTING, and CODE_OF_CONDUCT posture: GitHub is good software, shithub exists because users should be able to host code without AI training concerns, and the community is AGPL, security-aware, and civil.

Use `auth.base_url` as the public origin in production. It drives the canonical links generated by crawler endpoints. If it is empty in tests or local dev, handlers fall back to the request host.

Do not add generated search result pages, settings pages, API routes, admin pages, or smart-HTTP `.git` endpoints to the sitemap.
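
One way to enforce this rule is an explicit allowlist, so that a path never reaches the sitemap by accident. The sketch below is illustrative only: `sitemapAllowlist` and `filterSitemapPaths` are hypothetical names, and the project may enforce the rule differently.

```go
package main

// sitemapAllowlist holds the only paths permitted in the sitemap.
var sitemapAllowlist = map[string]bool{
	"/":         true,
	"/about":    true,
	"/explore":  true,
	"/trending": true,
}

// filterSitemapPaths keeps allowlisted paths in their original order and
// drops everything else (search results, settings, API, admin, .git, ...).
func filterSitemapPaths(candidates []string) []string {
	kept := make([]string, 0, len(candidates))
	for _, p := range candidates {
		if sitemapAllowlist[p] {
			kept = append(kept, p)
		}
	}
	return kept
}
```

An allowlist fails closed: a newly added route stays out of the sitemap until someone adds it on purpose.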
