fortrangoingonforty/afs-ld / 73c9b05

Wire performance closeout gate

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 73c9b0500099e8b8e22e388a00f5afb246561cb5
Parents: 43a662a
Tree: ac3de7e

3 changed files

Status  File                                 +   -
M       .docs/sprints/sprint28.md            31  12
M       .github/workflows/parity-matrix.yml  6   0
M       tests/perf_baseline.rs               76  6
.docs/sprints/sprint28.md (modified)
@@ -4,13 +4,20 @@
 Sprint 27 — correctness gate in place; can freely refactor for speed.
 
 ## Goals
-Make afs-ld fast enough to feel like a production tool. Target: within 2× of Apple `ld`'s wall time on the fortsh link. Mold demonstrates linkers can be very fast; we don't need mold's speed, but we need to not be painful.
+Make afs-ld fast enough to feel like a production tool. Sprint 28 establishes
+the profiling surface, parallelizes the obvious hot paths, and enforces
+hello/runtime-link budgets in CI. The fortsh 2× Apple `ld` gate remains the
+production target, but Sprint 29 owns the fortsh fixture and final comparison.
+Mold demonstrates linkers can be very fast; we don't need mold's speed, but we
+need to not be painful.
 
 ## Deliverables
 
 ### 1. Baseline profile
 
-Profile the fortsh link (Sprint 29 produces the fixture). Categorize wall time:
+Profile representative hello-world and runtime-archive links in Sprint 28.
+Sprint 29 extends the same profile surface to the fortsh link once the fixture
+exists. Categorize wall time:
 
 - Input parsing (Mach-O headers, sections, symbols, relocations).
 - Symbol resolution (hash-map probes, archive lookups).
@@ -37,11 +44,17 @@ One thread per 4 KiB page. SHA-256 is inherently sequential within a page but tr
 
 ### 5. Bump allocator for ephemeral data
 
-Parser produces many small allocations (strings, reloc lists, atom descriptors). A per-input arena avoids fragmentation and makes bulk drop free. Implement as `src/arena.rs` — a std-only `Vec<Box<[u8]>>` chunker.
+Deferred. The current Sprint 28 profile work did not prove allocation churn is
+the next limiting bucket after the parallel parsing/relocation/signature and
+string-table clone fixes. If Sprint 29's fortsh profile shows parser allocation
+pressure, implement `src/arena.rs` as a std-only `Vec<Box<[u8]>>` chunker.
 
 ### 6. mmap for large inputs
 
-`std::fs::File` + `memmap2`? No — memmap2 is an external crate. Use `libc::mmap` via an unsafe `src/mmap.rs` wrapper. Input files are always read-only; mmap saves a read syscall and lets us share parse state across threads cheaply. Fall back to `fs::read` for GNU-thin archive members whose external path doesn't mmap cleanly (rare).
+Deferred. Object/archive loading still uses `fs::read`; this keeps the Sprint 28
+closeout safe and std-only. If fortsh-sized inputs show file-read overhead as a
+real bucket in Sprint 29, add an unsafe `src/mmap.rs` wrapper and keep a
+`fs::read` fallback for archive members whose external path cannot be mapped.
 
 ### 7. Symbol-table hash map
 
@@ -49,7 +62,10 @@ Profile shows std `HashMap` is fine for our scale. If not: replace with an open-
 
 ### 8. String interner
 
-Single global `StringInterner` shared across inputs. Interning cost: one hash lookup per name. Optimize by batching per-input: each input parses its strings into a local table, then merges into the global interner in one pass.
+Deferred. Sprint 28 made the global string table thread-shareable and removed
+the cloned string-table offset map during output writing. Per-input local
+interners remain a candidate if Sprint 29 identifies symbol seeding as a
+fortsh-scale bottleneck.
 
 ### 9. No-alloc hot paths
 
@@ -57,15 +73,16 @@ Reloc application and chain construction should not allocate per-reloc. Prealloc
 
 ### 10. Benchmarks
 
-`afs-ld/bench/` (or a `#[bench]` behind `cargo +nightly bench`) with:
-- `bench_hello_world`: small, measures startup overhead.
-- `bench_runtime_link`: mid, measures symbol-table & reloc-apply.
-- `bench_fortsh_link`: large, measures end-to-end throughput.
+Sprint 28 uses CI-enforced integration benchmarks in `tests/perf_baseline.rs`:
+
+- `bench_hello_world_profile_reports_baseline_timings`: small, measures startup overhead.
+- `bench_runtime_link_profile_reports_baseline_timings`: mid, measures symbol-table, archive parsing, and reloc-apply.
+- `bench_fortsh_link`: deferred to Sprint 29 with the real fortsh fixture.
 
 Budget targets:
 - hello-world: ≤ 20 ms.
 - runtime link: ≤ 150 ms.
-- fortsh link: ≤ 2× Apple `ld`'s wall time on the same machine.
+- fortsh link: ≤ 2× Apple `ld`'s wall time on the same machine, enforced in Sprint 29.
 
 ### 11. Determinism preserved
 
@@ -73,14 +90,16 @@ Parallelism must not reorder output. Each worker produces a deterministic result
 
 ## Testing Strategy
 
-- Benchmarks land as regression gates: nightly CI records throughput; > 10% regression fails.
+- Benchmark gate: CI runs `tests/perf_baseline.rs` with hello/runtime budgets on every push and PR.
+- Nightly throughput recording and a relative >10% regression gate are deferred until the fortsh fixture lands in Sprint 29.
 - Determinism: 100 parallel runs of the same input, assert byte-identical output every time.
 - Sprint 27 parity must remain green — no correctness regression.
 - Single-threaded fallback (`-j 1`) for debugging.
 
 ## Definition of Done
 
-- fortsh link wall time within 2× of `ld`'s.
+- hello/runtime performance budgets are enforced in CI.
+- fortsh 2× comparison is explicitly handed to Sprint 29 with its fixture.
 - All Sprint 27 scenarios still byte-identical.
 - Determinism bulletproof across parallelism.
 - No external dependencies added.
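
Reviewer note on deliverable 5: the sprint doc defers `src/arena.rs` but still names its intended shape, a std-only `Vec<Box<[u8]>>` chunker. The following is only a minimal sketch of what that could look like, not code from this repository. `Arena`, `ArenaRef`, `alloc_bytes`, `get`, and `chunk_count` are hypothetical names, and handing out index handles instead of references is an assumption made here to keep the sketch in safe Rust.

```rust
/// Hypothetical handle to a byte range stored in the arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ArenaRef {
    chunk: usize,
    start: usize,
    len: usize,
}

/// Hypothetical bump arena: fixed-size chunks, filled front to back.
pub struct Arena {
    chunks: Vec<Box<[u8]>>,
    used: usize, // bytes consumed in the last chunk
    chunk_size: usize,
}

impl Arena {
    pub fn new(chunk_size: usize) -> Self {
        Arena { chunks: Vec::new(), used: 0, chunk_size }
    }

    /// Copy `data` into the arena and return a handle to the copy.
    pub fn alloc_bytes(&mut self, data: &[u8]) -> ArenaRef {
        let need = data.len();
        let fits = self
            .chunks
            .last()
            .map_or(false, |c| c.len() - self.used >= need);
        if !fits {
            // Oversized requests get a dedicated chunk of exactly their size.
            let size = self.chunk_size.max(need);
            self.chunks.push(vec![0u8; size].into_boxed_slice());
            self.used = 0;
        }
        let chunk = self.chunks.len() - 1;
        let start = self.used;
        self.chunks[chunk][start..start + need].copy_from_slice(data);
        self.used += need;
        ArenaRef { chunk, start, len: need }
    }

    /// Resolve a handle back to its bytes.
    pub fn get(&self, r: ArenaRef) -> &[u8] {
        &self.chunks[r.chunk][r.start..r.start + r.len]
    }

    /// Number of chunks allocated so far.
    pub fn chunk_count(&self) -> usize {
        self.chunks.len()
    }
}
```

Dropping the `Arena` frees every chunk at once, which is the "bulk drop" property the original deliverable text was after. Whether this is worth building is exactly what the Sprint 29 fortsh profile is supposed to decide.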
.github/workflows/parity-matrix.yml (modified)
@@ -26,6 +26,12 @@ jobs:
       - name: Run determinism gate
         run: cargo test --test determinism -- --nocapture
 
+      - name: Run performance budget gate
+        env:
+          AFS_LD_HELLO_BUDGET_MS: "20"
+          AFS_LD_RUNTIME_BUDGET_MS: "150"
+        run: cargo test --test perf_baseline -- --nocapture
+
       - name: Run parity matrix
         env:
           PARITY_MATRIX_ARTIFACT_DIR: ${{ github.workspace }}/parity-matrix-artifacts
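
Reviewer note on the budget gate: the workflow step above passes `AFS_LD_HELLO_BUDGET_MS` and `AFS_LD_RUNTIME_BUDGET_MS` to the test binary, but the part of `tests/perf_baseline.rs` that consumes them is not visible in this diff. As an illustration only, a test could turn such a variable into an enforced limit roughly like this. `parse_budget_ms` and `check_budget` are hypothetical helpers, and treating an unset variable as "no budget" is an assumption.

```rust
use std::time::Duration;

/// Parse an optional millisecond budget. `None` means no budget is enforced.
fn parse_budget_ms(raw: Option<&str>) -> Result<Option<Duration>, String> {
    match raw {
        None => Ok(None),
        Some(text) => text
            .trim()
            .parse::<u64>()
            .map(|ms| Some(Duration::from_millis(ms)))
            .map_err(|e| format!("invalid budget {text:?}: {e}")),
    }
}

/// Return an error message if `elapsed` exceeds the budget, if one is set.
fn check_budget(name: &str, elapsed: Duration, budget: Option<Duration>) -> Result<(), String> {
    match budget {
        Some(limit) if elapsed > limit => Err(format!(
            "{name}: {elapsed:?} exceeds budget {limit:?}"
        )),
        _ => Ok(()),
    }
}
```

A test would read the variable with `std::env::var("AFS_LD_HELLO_BUDGET_MS").ok()`, pass `raw.as_deref()` to `parse_budget_ms`, and panic on an `Err` from `check_budget`. Leaving the budget optional keeps local runs on slow machines from failing while CI still enforces the 20 ms and 150 ms targets.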
tests/perf_baseline.rs (modified)
@@ -1,10 +1,14 @@
+use std::fs;
 use std::path::{Path, PathBuf};
+use std::process::Command;
 use std::time::Duration;
 
 mod common;
 
 use afs_ld::{LinkOptions, LinkProfile, Linker};
-use common::harness::{assemble, have_xcrun, have_xcrun_tool, scratch, sdk_path, sdk_version};
+use common::harness::{
+    assemble, have_tool, have_xcrun, have_xcrun_tool, scratch, sdk_path, sdk_version,
+};
 
 fn find_runtime_archive() -> Option<PathBuf> {
     let workspace = Path::new(env!("CARGO_MANIFEST_DIR")).join("..");
@@ -20,6 +24,69 @@ fn find_runtime_archive() -> Option<PathBuf> {
     None
 }
 
+fn runtime_archive_fixture() -> Result<PathBuf, String> {
+    if let Some(runtime) = find_runtime_archive() {
+        return Ok(runtime);
+    }
+    build_synthetic_runtime_archive()
+}
+
+fn build_synthetic_runtime_archive() -> Result<PathBuf, String> {
+    if !have_tool("libtool") {
+        return Err("libtool unavailable".into());
+    }
+
+    let members = [
+        ("init", "_afs_program_init"),
+        ("finalize", "_afs_program_finalize"),
+        ("write_i32", "_afs_write_i32"),
+        ("write_f64", "_afs_write_f64"),
+        ("write_newline", "_afs_write_newline"),
+        ("read_i32", "_afs_read_i32"),
+        ("alloc", "_afs_alloc"),
+        ("dealloc", "_afs_dealloc"),
+        ("bounds_check", "_afs_bounds_check"),
+        ("stop", "_afs_stop"),
+        ("date_and_time", "_afs_date_and_time"),
+        ("cpu_time", "_afs_cpu_time"),
+        ("random_seed", "_afs_random_seed"),
+        ("random_number", "_afs_random_number"),
+        ("open_unit", "_afs_open_unit"),
+        ("close_unit", "_afs_close_unit"),
+    ];
+    let mut objects = Vec::with_capacity(members.len());
+    for (stem, symbol) in members {
+        let obj = scratch(&format!("perf-runtime-{stem}.o"));
+        let src = format!(
+            "\
+            .text\n\
+            .globl {symbol}\n\
+            .p2align 2\n\
+            {symbol}:\n\
+                ret\n\
+            .subsections_via_symbols\n",
+        );
+        assemble(&src, &obj)?;
+        objects.push(obj);
+    }
+
+    let archive = scratch("libafs-perf-runtime.a");
+    let _ = fs::remove_file(&archive);
+    let output = Command::new("libtool")
+        .args(["-static", "-o"])
+        .arg(&archive)
+        .args(&objects)
+        .output()
+        .map_err(|e| format!("spawn libtool archive: {e}"))?;
+    if !output.status.success() {
+        return Err(format!(
+            "libtool archive failed: {}",
+            String::from_utf8_lossy(&output.stderr)
+        ));
+    }
+    Ok(archive)
+}
+
 fn executable_opts(inputs: Vec<PathBuf>, output: PathBuf) -> LinkOptions {
     LinkOptions {
         inputs,
@@ -139,7 +206,7 @@ fn assert_profile_basics(name: &str, profile: &LinkProfile) {
 }
 
 #[test]
-fn hello_world_profile_reports_baseline_timings() {
+fn bench_hello_world_profile_reports_baseline_timings() {
     if !have_xcrun() || !have_xcrun_tool("ld") {
         eprintln!("skipping: xcrun as/ld unavailable");
         return;
@@ -174,14 +241,17 @@ fn hello_world_profile_reports_baseline_timings() {
 }
 
 #[test]
-fn runtime_link_profile_reports_baseline_timings() {
+fn bench_runtime_link_profile_reports_baseline_timings() {
     if !have_xcrun() || !have_xcrun_tool("ld") {
         eprintln!("skipping: xcrun as/ld unavailable");
         return;
     }
-    let Some(runtime) = find_runtime_archive() else {
-        eprintln!("skipping: libarmfortas_rt.a not built");
-        return;
+    let runtime = match runtime_archive_fixture() {
+        Ok(runtime) => runtime,
+        Err(reason) => {
+            eprintln!("skipping: {reason}");
+            return;
+        }
     };
 
     let obj = scratch("perf-runtime.o");