Commits


Commits on May 8, 2026

  1. Populate rank-remap pointer descriptor for section RHS
    F2018 §10.2.2.3: rank-remap pointer assignment
    
      real(sp), pointer :: tau(:)
      real(sp), target  :: q(5, 5)
      tau(1:k) => q(1:k, 1)
    
    lower_rank_remap_pointer_assignment used to require RHS = bare Name
    and bail on any FunctionCall (section/element) — the pointer
    descriptor never got its base_addr, rank, or extents populated.
    Subsequent 'geqrf(..., tau, ...)' (assumed-size dummy 'tau(*)') then
    received tau.base_addr = NULL, and slarfg's '*tau = ...' segfaulted
    deep in the call chain.  Surfaced as SEGVs across stdlib's qr/eig/schur
    cluster:
    example_qr, example_qr_space, example_pivoting_qr*, example_eig*,
    example_schur*.
    
    Extend the source-shape match to handle FunctionCall (section
    designator on a Name): convert each Range(start:..) into Element(start),
    compute the address of the FIRST included element via
    lower_array_element_addr, and use that as the descriptor's base_addr
    with the target's bounds.
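
    A minimal sketch of the first-included-element offset computation
    described above, assuming a contiguous column-major target with unit
    element strides (the function name and signature are illustrative,
    not the compiler's actual helper):

```rust
// For each dim, take the section's start index (a Range start or an
// Element subscript), subtract the declared lower bound, and weight by
// the column-major running stride; scale the element offset by elem_size.
fn first_element_byte_offset(
    lower: &[i64],   // declared lower bound per dim
    extents: &[i64], // declared extent per dim
    starts: &[i64],  // section start index per dim
    elem_size: i64,
) -> i64 {
    let mut offset = 0i64;
    let mut running_stride = 1i64; // column-major: dim 0 varies fastest
    for d in 0..extents.len() {
        offset += (starts[d] - lower[d]) * running_stride;
        running_stride *= extents[d];
    }
    offset * elem_size
}
```

    For the `q(1:k, 1)` section above the starts equal the lower bounds,
    so the offset is 0 and the descriptor's base_addr is the target's own
    base; a section like `q(2:5, 3)` of the same real(sp) target lands at
    byte offset 44.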
    mfwolffe committed
  2. Honor type_spec in reshape ArrayConstructor lowering
    F2018 §7.8: a typed array constructor '[T :: ...]' has element type T
    regardless of the element expressions' types. The reshape lowering at
    lower_reshape_array_expr_descriptor inferred elem_ty solely via
    first_array_constructor_type_info, which examines the first value
    expression. For 'reshape([real(dp) :: 1, 2, 3, 4], [2, 2])' the values
    are integer literals (4 bytes), so the materialised descriptor was
    elem_size=4 instead of 8.
    
    The malformed elem_size propagated through the reshape result; when
    passed to an assumed-shape dummy 'a(:,:)' and used as SOURCE= in an
    ALLOCATE, afs_prepare_array_copy saw 'dest.elem_size != source.elem_size'
    (8 != 4), freed the freshly-allocated dest buffer, zeroed base_addr,
    and the next read of 'amat(1,1)' SEGV'd. Surfaced across stdlib's det /
    determinant / eig / qr clusters whose examples invoke
    'det(reshape([real(dp)::1,2,3,4], [2,2]))'.
    
    Consult type_spec first; fall back to first-element inference only when
    no type_spec is present.
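
    The selection order can be sketched as follows (the enum and helper
    names are illustrative, not the compiler's):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ElemTy { I32, RealSp, RealDp }

fn elem_size(t: ElemTy) -> usize {
    match t {
        ElemTy::I32 | ElemTy::RealSp => 4,
        ElemTy::RealDp => 8,
    }
}

// The declared '[T :: ...]' type wins; first-element inference is only
// a fallback when no type_spec is present (F2018 §7.8).
fn constructor_elem_ty(type_spec: Option<ElemTy>, first_value_ty: ElemTy) -> ElemTy {
    type_spec.unwrap_or(first_value_ty)
}
```

    For `[real(dp) :: 1, 2, 3, 4]` this yields elem_size 8 even though
    the first value is a 4-byte integer literal.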
    mfwolffe committed
  3. Accept non-allocatable source in afs_prepare_array_copy
    F2018 §9.7.1.2: SOURCE-expr in ALLOCATE need only be a value of the
    right type/kind/shape — it doesn't have to itself be an ALLOCATABLE.
    The common stdlib pattern is
    
        pure module function det(a) result(d)
          real(dp), intent(in) :: a(:,:)        ! assumed-shape dummy
          real(dp), allocatable :: amat(:,:)
          allocate(amat(size(a,1), size(a,2)), source=a)
    
    afs_prepare_array_copy required both dest.is_allocated() AND
    source.is_allocated().  Assumed-shape dummies carry flags=CONTIGUOUS
    only — they're bound to the caller's data, not owned — so
    source.is_allocated() returned false, the routine freed the
    fresh dest buffer, zeroed dest.base_addr, and the next read of
    amat(1,1) faulted.  Surfaced as SEGV across stdlib's det / determinant
    / eig / qr / lstsq / solve_chol / solve_custom clusters.
    
    Replace source.is_allocated() with !source.base_addr.is_null():
    the source is valid as long as it points to data, regardless of
    whether it owns it.
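
    The relaxed check can be sketched like this (struct and field names
    are illustrative, not the runtime's actual descriptor layout):

```rust
struct Descriptor {
    base_addr: *const u8,
    owns_data: bool, // set only when this descriptor allocated the buffer
}

impl Descriptor {
    fn is_allocated(&self) -> bool {
        self.owns_data && !self.base_addr.is_null()
    }
    // SOURCE= only needs data to copy from; ownership is irrelevant.
    fn is_valid_copy_source(&self) -> bool {
        !self.base_addr.is_null()
    }
}
```

    An assumed-shape dummy borrows the caller's buffer (`owns_data`
    false), so the old `is_allocated()` gate rejected it even though it
    points at perfectly good data.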
    mfwolffe committed

Commits on May 7, 2026

  1. Extract base from descriptor actual passed to bare-pointer dummy
    lower_arg_by_ref_full's tail path evaluates the actual via
    lower_expr_full and returns the value as-is when it's a pointer.  But
    for expressions that lower_expr_full yields as a 384-byte descriptor
    (array sections, array binops, array-result intrinsics), the callee,
    which under the assumed-size / explicit-shape ABI expects a bare
    element pointer, would read the descriptor's first 8 bytes (the
    base_addr field) as if they were the array's first element.
    Empirically this surfaced post-db04b9d as bounds-check failures of the
    form 'index <huge> outside [1, n]' when stdlib's solve/lapack_getrf
    chain was rebuilt with the WIP-aware ABI.
    
    Detect Ptr<[i8; 384]> at the tail and load through to extract the
    descriptor's base_addr before returning it as the bare-pointer arg.
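
    Illustratively (assuming a 64-bit little-endian target; the helper is
    hypothetical):

```rust
// base_addr is the descriptor's first field, bytes 0..8; the bare-pointer
// ABI wants that value, not the descriptor's own address.
fn bare_pointer_from_descriptor(descriptor: &[u8; 384]) -> usize {
    usize::from_le_bytes(descriptor[0..8].try_into().unwrap())
}
```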
    mfwolffe committed
  2. Re-apply assumed-size bare-pointer ABI lost in lower.rs split
    The lower.rs → lower/core.rs split (71a0cc3) silently dropped commit
    db04b9d's fix when the file was extracted: ArraySpec::AssumedSize was
    re-added to the descriptor-using set in arg_uses_descriptor_from_decls.
    Per F2018 §15.5.2.4 an assumed-size dummy 'a(lda,*)' is passed as a
    bare element pointer; flagging it as descriptor-bearing made every
    'a(i,j)' reference go through array_descriptor_addr → descriptor base,
    yielding descriptor_base + 16 (= upper-half of base_addr field) instead
    of the actual element. Restore db04b9d and add an explanatory comment
    so a future refactor doesn't drop it again.
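
    The membership rule can be sketched as follows (enum variants are
    illustrative; the real set lives in arg_uses_descriptor_from_decls):

```rust
enum ArraySpec { ExplicitShape, AssumedShape, AssumedSize, DeferredShape }

fn arg_uses_descriptor(spec: &ArraySpec) -> bool {
    match spec {
        // F2018 §15.5.2.4: 'a(lda,*)' (and explicit-shape dummies) take
        // a bare element pointer; keep them OUT of the descriptor set.
        ArraySpec::AssumedSize | ArraySpec::ExplicitShape => false,
        ArraySpec::AssumedShape | ArraySpec::DeferredShape => true,
    }
}
```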
    mfwolffe committed
  3. Merge pull request #25 from FortranGoingOnForty/compiler-edges
    compiler-edges: stdlib hash unblocker + descriptor/storage drill (10 commits)
    Matthew Forrester Wolffe committed
  4. Reject non-array-intrinsic callees at top of lower_array_intrinsic
    lower_array_intrinsic dispatched on `name` with a match late in the
    function but materialised the first arg's descriptor (alloca + memset
    + afs_create_section) unconditionally before that match — so calling
    it on a user-procedure name would still emit a 384-byte throwaway
    descriptor before returning None.
    
    expr.rs's FunctionCall handler reaches lower_array_intrinsic from
    two places (the `!has_named_interface` arm and the post-generic-
    resolve fallback), so for any non-generic non-intrinsic call (e.g.
    `pick(key(0:))`) each section actual lowered THREE times: twice as
    unused descriptors emitted before this dispatcher returned None,
    and once for the legitimate ref_arg_vals descriptor passed to the
    real call.
    
    Bail at the top of lower_array_intrinsic when `name` is not one of
    the array intrinsics it actually handles
    (size/lbound/ubound/shape/allocated/sum/product/maxval/minval/
    maxloc/minloc/matmul/dot_product/transpose/huge/tiny/epsilon/
    precision/range/digits/norm2).
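
    The early bail can be sketched as (the name list is transcribed from
    above; the surrounding dispatch is elided):

```rust
const ARRAY_INTRINSICS: &[&str] = &[
    "size", "lbound", "ubound", "shape", "allocated", "sum", "product",
    "maxval", "minval", "maxloc", "minloc", "matmul", "dot_product",
    "transpose", "huge", "tiny", "epsilon", "precision", "range",
    "digits", "norm2",
];

// Checked before any argument descriptor is materialised, so a
// user-procedure name costs nothing here.
fn is_handled_array_intrinsic(name: &str) -> bool {
    ARRAY_INTRINSICS.contains(&name.to_ascii_lowercase().as_str())
}
```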
    
    Result on stdlib_hash_32bit_water.f90 (cumulative with eeee0e5 +
    0592c14):
      trunk baseline:  123 GB peak (OOM, uncompilable)
      after eeee0e5:    26.6 GB,  18.4 s
      after 0592c14:    922 MB,    1.49 s
      after this fix:    35 MB,    0.22 s    (~3500x lighter than trunk)
    afs_create_section calls in -S: 5647 → 982 → 126.
    
    cli_driver 579/579 PASS.  Regression test threshold tightened from
    60 to 24 emissions for an 8-section fixture (observed-good: 14).
    mfwolffe committed
  5. Skip resolution and intrinsic arg-probes when no caller can use them
    expr.rs's FunctionCall lowering eagerly built two probe-arg-vec slices
    before any branch decision:
    * resolution_arg_vals — only consumed by resolve_generic_call_actuals
      (which short-circuits to None if the callee isn't a NamedInterface)
      and the structure-ctor fallback (gated on has_named_interface);
    * intrinsic_arg_vals — only consumed by lower_intrinsic, which matches
      on a fixed set of intrinsic names and returns None otherwise.
    
    For a non-generic non-procptr non-intrinsic callee both slices are
    discarded, but lowering each arg cost a section-descriptor
    materialization (alloca + memset + afs_create_section) per array
    section actual.  Inside nested intrinsic chains that compounded
    multiplicatively with the third "real" lowering in ref_arg_vals.
    
    stdlib_hash_32bit_water.f90's water_hash inner loop —
    `ieor(waterr32(key(i:)), waterp1)` repeated 4 times across 16 SELECT
    CASE arms — produced 5647 `afs_create_section` calls and a 26 GB
    compile peak.  Gate the resolution probe behind has_named_interface ||
    procptr_target.is_some(), and route the intrinsic probe through
    sema::validate::is_intrinsic_name(&key).
    
    Result on stdlib_hash_32bit_water.f90:
    * compile peak: 26.6 GB → 922 MB (~28x)
    * wall time:    18.4s → 1.49s (~12x)
    * afs_create_section calls in -S: 5647 → 982 (~5.7x)
    
    Combined with the earlier eeee0e5 the same file is now 133x lighter
    than its trunk-baseline 123 GB peak.  cli_driver 578/578 PASS.
    
    Adds an asm-level regression test that compiles a fixed-shape
    ieor/user-call chain and asserts the per-source-section descriptor
    emission ratio stays below the multiplicative-probe regime.
    mfwolffe committed
  6. Type ComplexBuffer ABI return temp as [fN x 2] so binops see complex
    The caller-side hidden-output buffer for ComplexBuffer ABI returns was
    allocated as `[i8 x 8]` / `[i8 x 16]`, making the call's return
    value `Ptr<[i8 x N]>`.  is_complex_ty only recognises `[fN x 2]`
    or `Ptr<[fN x 2]>`, so for `complex_local - complex_call(...)` the
    binop's complex-arithmetic branch did not fire — execution fell to
    the int/float promotion path and emitted `fsub %ptr<[i8 x 8]>` against
    the buffer pointer.  IR-verify rejected with `float op has non-float
    operand : ptr<[i8 x 8]>`, blocking stdlib_lapack_solve_chol_comp's
    CPOTF2/ZPOTF2:
        ajj = real( real(a(j,j),sp) - cdotc(...), sp )
    
    Type the alloca as `[fN x 2]` (sized by the existing kind-aware
    hidden_result_temp_bytes_for_callee), with N=4 for sp (8 bytes) and
    N=8 for dp (16 bytes).  Both the Name-callee and the type-bound
    Component-callee paths get the same treatment.  is_complex_ty,
    materialize_complex_operand, and the binop branch all then recognise
    the buffer as a complex pair and emit lane-wise fadd/fsub/fmul
    correctly.
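
    A sketch of the size-to-type mapping (names hypothetical; the 8/16
    byte counts come from the kind-aware hidden_result_temp_bytes_for_callee
    mentioned above):

```rust
#[derive(PartialEq, Debug)]
enum TempTy { F32Pair, F64Pair } // stands in for [f32 x 2] / [f64 x 2]

fn complex_return_temp_ty(hidden_result_bytes: usize) -> Option<TempTy> {
    match hidden_result_bytes {
        8 => Some(TempTy::F32Pair),  // complex(sp): two f32 lanes
        16 => Some(TempTy::F64Pair), // complex(dp): two f64 lanes
        _ => None,                   // not a complex return temp
    }
}
```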
    mfwolffe committed
  7. Memcpy ComplexBuffer-ABI return into ALLOCATE source= scalar slot
    emit_scalar_allocate_source_init_on_success used to pipe the source
    expression through coerce_to_type(raw, dest_ty) → b.store(coerced,
    dest_base).  When the source is a scalar complex(sp/dp) function call,
    lower_expr returns a pointer to the ComplexBuffer the callee wrote
    into — typed Ptr<[f32/f64 x 2]>, Ptr<[i8 x 8/16]>, or bare Ptr<i8>.
    coerce_to_type has no Ptr→Array path so it returned the pointer
    unchanged, and b.store then tried to write a pointer-sized value
    into a [f32/f64 x 2] slot.  IR-verify rejected the store with
    `value type ptr<[i8 x 8]> doesn't match pointee type [f32 x 2]`,
    leaving stdlib_stats_moment_mask uncompilable
    (`allocate(mean_, source = mean(x, 1, mask))` where mean returns
    scalar complex(sp) — the same shape across roughly a dozen routines).
    
    Recognise the buffer-pointer return at the source-init site and
    memcpy the lane pair from the buffer to the freshly allocated
    destination slot, parallel to the assignment-from-complex-call path
    elsewhere.  Regression test gates compilation only — the runtime
    path through allocatable complex scalars hits a separate
    pre-existing bug in afs_assign_allocatable / real(m_) reads on
    complex allocatables that surfaces once IR-verify is no longer
    masking it.
    mfwolffe committed
  8. Skip redundant lower_array_expr_descriptor for scalar user-function probes
    generic_dispatch_probe_value already calls array_function_result_elem_type
    at the top.  When that returns None for a Name(callee) FunctionCall and
    the callee is neither a transformational array intrinsic (pack/reshape/
    sum/merge/matmul/transpose/conjg/aimag/abs/cmplx/shape/transfer/dimag)
    nor a local array, the subsequent lower_array_expr_descriptor call just
    walks named-intrinsic match arms (all miss), then redundantly calls
    array_function_result_elem_type a second time inside
    lower_array_function_result_descriptor, only to return None.  That
    second invocation recursively probes args, and the arg-probes themselves
    re-run lower_array_expr_descriptor — O(2^depth) for nested calls.
    
    stdlib_hash_32bit_water.f90's water_hash inner loop nests four user
    function calls deep across 16 SELECT CASE arms; the compile peak ran
    to ~123 GB and the kernel SIGKILL'd the process under memory pressure.
    
    Skipping the redundant path here drops the same compile to ~26 GB peak
    (still high, but no longer triggers OOM).  The
    internal_subprogram_call_under_intrinsic_under_user_call_keeps_mangled_name
    regression remains green: lower_expr_full at the bottom of the probe
    does the real evaluation with internal_funcs threaded through, so
    internal CONTAINS-block subprograms keep their mangled link names.
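
    A toy model of the blow-up (one nested call per level; the real code
    also fans out across multiple array-section arguments):

```rust
// Unguarded, each level probes its nested call twice: once in the
// dispatcher's match arms and once in the redundant
// lower_array_function_result_descriptor path. Skipping the redundant
// path makes the probe count linear in nesting depth instead of O(2^depth).
fn probe_count(depth: u32, redundant_path_skipped: bool) -> u64 {
    if depth == 0 {
        return 1;
    }
    let fanout: u64 = if redundant_path_skipped { 1 } else { 2 };
    1 + fanout * probe_count(depth - 1, redundant_path_skipped)
}
```

    At depth 4 the unguarded count is already 31 versus 5 guarded; with
    several section actuals per call the real growth compounds further.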
    mfwolffe committed
  9. Use column-major running strides in afs_allocate_like_with_elem_size
    Setting every dim's stride to 1 worked for the dim[0]-only flat
    iteration paths but produced colliding byte offsets in every per-dim
    consumer.
    For `m = (y > 3.)` over `real :: y(2,3)` the flat compare loop wrote
    m[0..5] correctly (it iterates over dim[0].stride * elem_size), but
    the masked sum-along-dim helper indexed mask via
    `Σ idx_d * dim[d].stride * elem_size` and the all-1 strides made
    distinct (i,j) tuples land on the same byte — only 4 of 6 mask bytes
    were actually consulted, so `sum(y, 1, y > 3.)` quietly dropped the
    column-2 mask hit and returned [0, 0, 6] instead of [0, 4, 11].
    Match the column-major running stride convention that
    `materialize_array_descriptor_for_info` already uses.
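
    The convention can be sketched as:

```rust
// Column-major running strides (in elements): dim 0 has stride 1, and
// each later dim's stride is the previous stride times the previous
// extent, so distinct index tuples map to distinct byte offsets.
fn column_major_strides(extents: &[i64]) -> Vec<i64> {
    let mut strides = Vec::with_capacity(extents.len());
    let mut running = 1i64;
    for &extent in extents {
        strides.push(running);
        running *= extent;
    }
    strides
}
```

    For `y(2,3)` this yields strides [1, 2]: zero-based (i, j) maps to
    byte (i + 2*j) * elem_size, unlike the all-1 strides that sent
    distinct tuples to the same byte.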
    mfwolffe committed
  10. Add masked sum-along-dim helpers and route pack mask through mask_byte_is_true
    Adds `afs_array_sum_real8_dim_mask` and `afs_array_sum_int_dim_mask`,
    plus a shared `for_each_reduce_along_dim_with_mask` traversal that
    honors both source and mask per-dim strides, and a small
    `mask_byte_is_true(mask, byte_off)` helper that dispatches on the
    mask's `elem_size` (1, 2, 4, or 8 — `logical(int8)` through
    `logical(int64)` and the default 1-byte bool storage all reach the
    same predicate).
    
    `afs_array_pack` previously read the mask via a fixed `as *const i32`
    load, which crashed with a misaligned dereference once
    `elem_size=1` started flowing through for default logical arrays —
    switched to `mask_byte_is_true` so it works for every kind.
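
    A minimal version of the predicate, assuming the usual
    nonzero-storage-means-true convention for logical values:

```rust
// Scan the elem_size bytes backing one logical element; any nonzero
// byte means .true., so kinds 1, 2, 4, and 8 all reach one predicate
// with no aligned wide load required.
fn mask_byte_is_true(mask: &[u8], byte_off: usize, elem_size: usize) -> bool {
    mask[byte_off..byte_off + elem_size].iter().any(|&b| b != 0)
}
```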
    mfwolffe committed
  11. Align logical descriptor elem_size with bool storage and route sum(dim,mask)
    Two coupled changes:
    
    1. `ir_scalar_byte_size` and `descriptor_element_size_bytes` now
       return 1 for `IrType::Bool`, matching `IrType::Bool::size_bytes()`
       and the bytes-per-element layout that `alloca [Bool x N]` actually
       produces. The previous 4-byte report only made sense when the
       scalar type was widened to a 4-byte slot, which never happened —
       storage stayed at 1 byte. The mismatch silently broke every
       consumer of the descriptor's `elem_size` for logical arrays:
       `mask_at`, `afs_array_sum_real8_mask`, the new `_dim_mask`
       helpers, and the whole-array broadcast loop all stepped 4×
       past real data.  `sum(y, mask=m)` returned the unmasked sum,
       `sum(y, dim, mask)` returned the unmasked column sums, and
       `arr(i) = .true.` for `logical :: arr(N)` wrote 3 bytes past
       the slot. The whole-array broadcast loop now uses
       `ir_scalar_byte_size` directly so they stay paired.
    
    2. `lower_array_sum_dim_descriptor` no longer bails when a mask is
       present; it lowers the mask actual into a descriptor and
       dispatches to `afs_array_sum_{real8,int}_dim_mask`. Surfaced in
       `example_var`'s `var(y, 1, y > 3.)` line, which previously fell
       through to a scalar broadcast and crashed in
       `afs_assign_allocatable` with a misaligned source pointer.
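
    Sketch of the pairing (enum trimmed to three scalar types):

```rust
#[derive(Clone, Copy)]
enum IrType { Bool, I32, F64 }

// The reported scalar byte size must match the storage that
// `alloca [Bool x N]` actually produces: 1 byte per element. Reporting
// 4 made every elem_size consumer step 4x past the real data.
fn ir_scalar_byte_size(t: IrType) -> usize {
    match t {
        IrType::Bool => 1,
        IrType::I32 => 4,
        IrType::F64 => 8,
    }
}
```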
    mfwolffe committed
  12. Merge pull request #24 from FortranGoingOnForty/regalloc-phi-resolution
    Refuse to split phi-like vregs in linearscan; remove regalloc gates
    Matthew Forrester Wolffe committed