Hypothesis 011-b: L_1 edge-level message passing on COLLAB (triangle-rich graphs)

Status. Preregistered 2026-05-25. A smoke test (1 seed, 1 epoch) completed in a container: L_1 0.668 vs MLP 0.520 (directional only, not a claim). A full 18-seed run timed out on GitHub Actions (6 h job limit exceeded). Awaiting local execution on higher-compute hardware.

Falsification target. Whether L_1 edge-level message passing provides a classification advantage on a dataset with rich triangle structure, where the up-Laplacian component ∂_2 ∂_2^T of L_1 = ∂_1^T ∂_1 + ∂_2 ∂_2^T is non-zero.

Prior result. H011 on NCI1 was uninformative: 96% of NCI1 graphs contain no triangles, so for those graphs ∂_2 has no columns and L_1 reduces to the down-Laplacian ∂_1^T ∂_1. A proper test requires a dataset where L_1’s shared-triangle adjacency has signal to propagate.
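To make the degeneration concrete, the sketch below builds the unweighted Hodge 1-Laplacian of a graph's clique complex (triangles are taken to be the 3-cliques) with numpy, assuming edges and triangles are oriented by sorted vertex order. `hodge_l1` is a hypothetical helper, not the benchmark's implementation, which may use a different orientation, weighting, or normalisation.

```python
import numpy as np

def hodge_l1(n_nodes, edges):
    """Return (down, up, n_triangles) with L_1 = down + up for a simple graph.

    down = B1^T B1 and up = B2 B2^T, where B1 (nodes x edges) and B2
    (edges x triangles) are boundary matrices oriented by sorted vertex order.
    Triangles are the graph's 3-cliques (clique-complex assumption).
    """
    edges = sorted({tuple(sorted(e)) for e in edges})
    edge_index = {e: k for k, e in enumerate(edges)}

    # B1: edge [i, j] with i < j has boundary (+j) - (i).
    b1 = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        b1[i, k], b1[j, k] = -1.0, 1.0

    # Enumerate each triangle i < j < l once via common neighbours of edge (i, j).
    neigh = {v: set() for v in range(n_nodes)}
    for i, j in edges:
        neigh[i].add(j)
        neigh[j].add(i)
    triangles = [(i, j, l) for i, j in edges for l in sorted(neigh[i] & neigh[j]) if l > j]

    # B2: triangle [i, j, l] has boundary [j, l] - [i, l] + [i, j].
    b2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, l) in enumerate(triangles):
        b2[edge_index[(j, l)], t] = 1.0
        b2[edge_index[(i, l)], t] = -1.0
        b2[edge_index[(i, j)], t] = 1.0

    assert np.allclose(b1 @ b2, 0.0)  # the boundary of a boundary is zero
    return b1.T @ b1, b2 @ b2.T, len(triangles)

if __name__ == "__main__":
    _, up_tri, n_tri = hodge_l1(3, [(0, 1), (1, 2), (0, 2)])
    _, up_c4, n_c4 = hodge_l1(4, [(0, 1), (1, 2), (2, 3), (0, 3)])
    print(n_tri, np.abs(up_tri).sum())  # 1 9.0
    print(n_c4, np.abs(up_c4).sum())    # 0 0.0
```

On a single triangle the up term is the rank-1 matrix v v^T with v = (1, -1, 1) over edges (0,1), (0,2), (1,2). On a 4-cycle B2 has no columns, so the up term is identically zero and L_1 is just its down component, which is the situation for most NCI1 graphs.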

Why COLLAB. 5000 scientific-collaboration ego-network graphs in 3 classes (High Energy Physics, Condensed Matter, Astrophysics). Mean 9,290 triangles per graph; 100% of graphs contain at least one triangle. No node features are provided (degree is used as the 1-dim input), so classification depends entirely on graph structure: exactly the setting where higher-order topology should matter, if it matters anywhere.
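The per-graph statistics above (triangle counts, degree input) reduce to simple matrix identities. A minimal numpy sketch, assuming each graph is available as a dense symmetric 0/1 adjacency matrix without self-loops; whether the benchmark rescales or one-hot-encodes degree is not stated here, so raw degree is an assumption.

```python
import numpy as np

def degree_features(adj):
    """1-dim node input: node degree as a float column, shape (n_nodes, 1)."""
    return adj.sum(axis=1, keepdims=True).astype(np.float32)

def triangle_count(adj):
    """Triangles in a simple undirected graph: trace(A^3) / 6.

    Each triangle yields 6 closed walks of length 3 (3 start nodes x 2 directions).
    """
    a = np.asarray(adj, dtype=np.int64)
    return int(np.trace(a @ a @ a)) // 6

if __name__ == "__main__":
    k4 = np.ones((4, 4), dtype=np.int64) - np.eye(4, dtype=np.int64)
    print(triangle_count(k4))            # 4
    print(degree_features(k4).ravel())   # [3. 3. 3. 3.]
```

Applying `triangle_count` to every graph and averaging would give a figure comparable to the mean of 9,290 quoted above.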

1. Design

Same architecture as H011, applied to COLLAB:

| Arm | Operator | Level | Residual |
|---|---|---|---|
| `l1-hodge-residual` | L_1 (edge Laplacian with up-component) | Edges | External |
| `hodge-mp-residual` | L_0 (node Laplacian) | Nodes | External |
| `gin-residual` | I - L_tilde (normalised adjacency) | Nodes | External |
| `mlp-baseline` | None | Nodes | N/A |

All arms use degree as the 1-dim node feature (input_dim=1).
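For concreteness, a numpy sketch of the two node-level operators in the table and of one residual update. It assumes L_0 is the combinatorial Laplacian D - A and L_tilde the symmetric normalised Laplacian I - D^{-1/2} A D^{-1/2}, so that the `gin-residual` operator I - L_tilde equals D^{-1/2} A D^{-1/2}. The table does not define what an External residual is, so `residual_layer` (h + ReLU(S h W)) is a hypothetical stand-in; the `l1-hodge-residual` arm would apply an update of the same shape with L_1 acting on edge features.

```python
import numpy as np

def node_operators(adj):
    """Return (L0, S_gin) for a dense symmetric adjacency matrix.

    L0 = D - A is the combinatorial graph Laplacian (Hodge 0-Laplacian).
    S_gin = I - L_tilde with L_tilde = I - D^{-1/2} A D^{-1/2},
    i.e. S_gin is the symmetrically normalised adjacency.
    """
    adj = np.asarray(adj, dtype=np.float64)
    deg = adj.sum(axis=1)
    l0 = np.diag(deg) - adj
    # Guard isolated nodes (degree 0) against division by zero.
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = deg[nz] ** -0.5
    l_tilde = np.eye(len(deg)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    s_gin = np.eye(len(deg)) - l_tilde
    return l0, s_gin

def residual_layer(h, op, weight):
    """Hypothetical residual update: h + ReLU(op @ h @ weight); weight is (d, d)."""
    return h + np.maximum(op @ h @ weight, 0.0)
```

The skip connection keeps the input signal (here, degree) available to later layers even when the operator smooths it away, which is one plausible reason all graph arms carry a residual.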

2. Preregistered sub-hypotheses

| ID | Sub-hypothesis | Prediction | Rationale | Falsified if |
|---|---|---|---|---|
| H51 | l1-hodge-residual outperforms mlp-baseline on COLLAB | p_BH < 0.05 | COLLAB has no node features, so structure IS the signal. L_1 accesses triangle-level structure that the MLP (operating on degree alone) cannot. | p_BH >= 0.05 |
| H52 | l1-hodge-residual outperforms hodge-mp-residual (L_0) on COLLAB | p_BH < 0.05 | COLLAB is triangle-rich; L_1’s up-Laplacian component provides structural signal beyond node-level adjacency. This is the core test of higher-order Hodge theory. | p_BH >= 0.05 |
| H53 | l1-hodge-residual outperforms gin-residual on COLLAB | p_BH < 0.05 | Same reasoning as H52: L_1 encodes triangle co-boundary structure inaccessible to any L_0-based method. | p_BH >= 0.05 |
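p_BH presumably denotes a Benjamini-Hochberg-adjusted p-value. A minimal sketch of that adjustment and of the "falsified if" rule, assuming the three sub-hypotheses H51-H53 form the correction family and that each raw p-value comes from a one-sided paired test over seeds (the test itself is not specified in this section). `benjamini_hochberg` and `falsified` are hypothetical helpers; recent SciPy releases also provide the adjustment as `scipy.stats.false_discovery_control`.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up false-discovery-rate control)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps adjusted values monotone.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

def falsified(raw, alpha=0.05):
    """Apply 'falsified if p_BH >= alpha' to raw p-values keyed by sub-hypothesis ID."""
    ids = sorted(raw)
    adjusted = benjamini_hochberg([raw[k] for k in ids])
    return {k: bool(a >= alpha) for k, a in zip(ids, adjusted)}

if __name__ == "__main__":
    print(falsified({"H51": 0.001, "H52": 0.2, "H53": 0.04}))
    # {'H51': False, 'H52': True, 'H53': True}
```

The example shows why the adjustment matters: H53's raw p-value of 0.04 becomes 0.06 after correction, so it counts as falsified under the preregistered rule.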

3. Outcome decision tree

| Pattern | Interpretation |
|---|---|
| H51 + H52 + H53 confirmed | Higher-order Hodge structure provides unique classification signal on triangle-rich graphs. L_1 captures structural information that L_0-based methods (Hodge, GIN) cannot access. This would vindicate the Hodge approach: the value lies in L_k for k >= 1, not in L_0. |
| H51 confirmed, H52 and H53 refuted (L_1 beats MLP but not L_0) | L_1 captures structural signal, but L_0-based methods already capture it. The triangle-level information is redundant with node-level neighbourhood information on COLLAB. |
| H51 refuted (L_1 does not beat MLP on COLLAB) | Edge-level message passing with degree features fails on COLLAB. Possible causes: the 1-dim degree input is insufficient, edge-to-graph pooling loses discriminative information, or the L_1 computation on dense graphs is numerically unstable. |
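The decision tree can be encoded as a small lookup, which also makes explicit that a split outcome (H52 confirmed with H53 refuted, or vice versa) has no preregistered row. `interpret` is a hypothetical helper whose strings paraphrase the table.

```python
def interpret(h51, h52, h53):
    """Map confirmed (True) / refuted (False) sub-hypotheses to a decision-tree row."""
    if not h51:
        return "H51 refuted: edge-level message passing fails on COLLAB"
    if h52 and h53:
        return "all confirmed: L_1 carries signal beyond L_0-based methods"
    if not h52 and not h53:
        return "L_1 beats MLP but not L_0: triangle signal redundant"
    # One of H52/H53 confirmed and the other refuted.
    return "split H52/H53: no preregistered interpretation"
```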

4. Experimental design

Dataset: COLLAB (5000 graphs, 3 classes), 1-dim degree features.
Models: l1-hodge-residual, hodge-mp-residual, gin-residual, mlp-baseline.
Seeds: 30.
Epochs: 10.
Optimiser: Adam(lr=1e-2).
Hidden dim: 32.
Note: COLLAB graphs are denser than NCI1 (mean 873 edges vs 32). Per-graph L_1 computation takes 0.02-0.13s. Estimated total wall time: ~8-12 hours on CPU.
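Where the wall-time estimate lands depends on the mean per-graph cost within the quoted 0.02-0.13 s range and on whether L_1 is cached or rebuilt every epoch; this section does not say which. A back-of-envelope sketch covering L_1 construction only (training passes excluded), with the caching behaviour as an explicit, assumed parameter:

```python
def l1_build_hours(n_graphs, sec_per_graph, n_seeds, n_epochs, cached):
    """Hours spent constructing L_1 only; forward/backward passes are excluded.

    cached=True: L_1 is built once per graph and reused by every seed and epoch.
    cached=False: L_1 is rebuilt for every graph in every epoch of every seed.
    """
    passes = 1 if cached else n_seeds * n_epochs
    return n_graphs * sec_per_graph * passes / 3600.0

if __name__ == "__main__":
    for sec in (0.02, 0.13):  # per-graph bounds quoted above
        rebuilt = l1_build_hours(5000, sec, n_seeds=30, n_epochs=10, cached=False)
        once = l1_build_hours(5000, sec, n_seeds=30, n_epochs=10, cached=True)
        print(f"{sec:.2f} s/graph: rebuilt {rebuilt:.1f} h, cached {once * 60:.1f} min")
```

At 0.02 s per graph, rebuilding L_1 for 30 seeds × 10 epochs costs about 8.3 h against under 2 minutes when cached; at 0.13 s per graph the rebuilt figure grows to about 54 h.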

5. Reproduction

```shell
python -m benchmarks.hodge \
  --datasets collab \
  --models l1-hodge-residual hodge-mp-residual gin-residual mlp-baseline \
  --seeds 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 \
  --n-epochs 10 \
  --output notebooks/results/h011b_collab_l1_30seeds.json \
  --markdown notebooks/results/h011b_collab_l1_30seeds.md
```
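Because a full run exceeded the 6 h GitHub Actions job limit, one option is to split the seeds across several shorter invocations. A sketch that only generates the command lines, assuming `benchmarks.hodge` accepts any subset of seeds via `--seeds` and that per-chunk results are merged afterwards (the JSON schema is not described here, so merging is not shown). `chunked_commands` and its per-chunk file names are hypothetical. If wall time scales roughly linearly with seed count, the 8-12 hour estimate puts a 5-seed chunk at about 1.3-2 hours.

```python
import shlex

MODELS = ["l1-hodge-residual", "hodge-mp-residual", "gin-residual", "mlp-baseline"]

def chunked_commands(seeds, chunk_size, prefix="notebooks/results/h011b_collab_l1"):
    """One benchmark invocation per chunk of seeds, each with its own output files.

    The flags mirror the reproduction command; the per-chunk naming is hypothetical.
    """
    commands = []
    for start in range(0, len(seeds), chunk_size):
        chunk = seeds[start:start + chunk_size]
        tag = f"seeds{chunk[0]}-{chunk[-1]}"
        commands.append([
            "python", "-m", "benchmarks.hodge",
            "--datasets", "collab",
            "--models", *MODELS,
            "--seeds", *(str(s) for s in chunk),
            "--n-epochs", "10",
            "--output", f"{prefix}_{tag}.json",
            "--markdown", f"{prefix}_{tag}.md",
        ])
    return commands

if __name__ == "__main__":
    # Print shell-quoted commands, e.g. to paste into separate CI jobs.
    for cmd in chunked_commands(list(range(30)), chunk_size=5):
        print(shlex.join(cmd))
```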