Hypothesis 010: Does the high-pass vs low-pass operator distinction predict cross-dataset performance?

Status. Resolved 2026-05-25. H42 directionally confirmed, but for the wrong reason (gin-residual beats Hodge on both MUTAG and NCI1, not on MUTAG specifically); H43: the two arms are indistinguishable on PROTEINS; H44 partially confirmed (the gap is dataset-dependent in magnitude but not in direction); H45 refuted (Hodge loses to MLP on MUTAG even with external residual); H46 refuted (neither arm significantly outperforms MLP on PROTEINS). See §6.

Falsification target. Whether the choice between high-pass (Hodge Laplacian L_tilde) and low-pass (normalised adjacency I - L_tilde) propagation operators produces dataset-dependent classification differences when both arms use an external residual. H008-c showed that the operators are close to interchangeable on NCI1 (gin-residual 0.629 vs Hodge 0.609, a small but significant gap). This experiment tests whether the same holds on MUTAG and PROTEINS, or whether the high-pass/low-pass distinction interacts with dataset-level structural properties.

Prior results motivating this hypothesis.

H008-c: On NCI1, gin-residual (low-pass + external residual) slightly outperforms Hodge (high-pass + external residual) at p_BH = 0.010.
H006: The constant-feature Hodge signal has the rank ordering MUTAG (+0.098) > PROTEINS (+0.088) > NCI1 (+0.071) — the inverse of the full-feature Hodge-vs-MLP gain.
H001: On MUTAG, Hodge-residual (high-pass + external residual) underperforms MLP by 4 pp (p_BH = 0.019). The low-pass variant (gin-residual) has not been tested on MUTAG.

Theoretical context. The normalised Laplacian L_tilde acts as a high-pass filter in the spectral domain: its eigenvalues lie in [0, 2], with 0 corresponding to the constant eigenvector (DC component) and the largest eigenvalues (2 is reached only on bipartite components) corresponding to the most oscillatory eigenvectors. Propagation via L_tilde @ h scales each spectral component by its eigenvalue, so it attenuates low-frequency (smooth) signals and amplifies high-frequency (rapidly varying) signals across the graph. Conversely, (I - L_tilde) @ h scales each component by one minus its eigenvalue and acts as a low-pass filter that smooths features across neighbours.
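The filter behaviour described above can be checked on a toy graph. The following is a minimal sketch, assuming the symmetric normalised Laplacian and a 6-node cycle (a hypothetical example, not one of the benchmark graphs); the helper names are illustrative and are not part of the benchmarks package. On a cycle every node has degree 2, so the zero-eigenvalue eigenvector is exactly constant.

```python
import math


def cycle_normalised_laplacian(n):
    """Symmetric normalised Laplacian of the cycle C_n (every degree is 2)."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lap[i][i] = 1.0
        lap[i][(i + 1) % n] -= 0.5
        lap[i][(i - 1) % n] -= 0.5
    return lap


def matvec(mat, vec):
    """Dense matrix-vector product on plain lists."""
    return [sum(m * x for m, x in zip(row, vec)) for row in mat]


def gain(mat, vec):
    """<v, M v> / <v, v>; equals the eigenvalue when v is an eigenvector of M."""
    out = matvec(mat, vec)
    return sum(a * b for a, b in zip(vec, out)) / sum(a * a for a in vec)


n = 6
L = cycle_normalised_laplacian(n)  # high-pass operator (assumed form of L_tilde)
low_pass = [[(1.0 if i == j else 0.0) - L[i][j] for j in range(n)] for i in range(n)]

# Three eigenvectors of the cycle, ordered from smooth to oscillatory.
signals = {
    "constant (k=0)": [1.0] * n,
    "smooth (k=1)": [math.cos(2 * math.pi * i / n) for i in range(n)],
    "alternating (k=3)": [(-1.0) ** i for i in range(n)],
}

for name, h in signals.items():
    print(f"{name}: L gain = {gain(L, h):+.2f}, (I - L) gain = {gain(low_pass, h):+.2f}")
```

The signed gain of L_tilde is the eigenvalue lambda, and that of I - L_tilde is 1 - lambda: the constant signal is removed by the high-pass operator and passed unchanged by the low-pass one. On this bipartite cycle the alternating signal reaches lambda = 2, where I - L_tilde flips its sign rather than shrinking it.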

On homophilic graphs (connected nodes share labels/features), low-pass smoothing reinforces the class signal. On heterophilic graphs (connected nodes differ), high-pass filtering preserves the class signal while low-pass smoothing destroys it. The TUDataset benchmarks have varying degrees of structural homophily, which may interact with the filter choice.
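The source does not say which homophily measure is intended. One common choice, assumed here purely for illustration, is the edge homophily ratio: the fraction of edges whose endpoints share a label (for molecular graphs, for example, the atom type). The function name and the toy molecule below are hypothetical.

```python
def edge_homophily(edges, labels):
    """Fraction of undirected edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical toy molecule: a 6-carbon ring with one oxygen attached to node 0.
labels = ["C", "C", "C", "C", "C", "C", "O"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]

ratio = edge_homophily(edges, labels)
print(f"edge homophily = {ratio:.3f}")
```

A ratio near 1 indicates a homophilic graph, where neighbour averaging mixes mostly like with like; a ratio near 0 indicates a heterophilic one.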

1. Design

Run hodge-mp-residual (high-pass) and gin-residual (low-pass) on all three datasets (MUTAG, PROTEINS, NCI1) with external residual on both. MLP baseline as control.

Dataset	Hodge	gin-residual	MLP	New data needed?
NCI1	0.609	0.629	0.523	No (reuse H008-c)
MUTAG	?	?	0.789 (H001)	Yes
PROTEINS	?	?	0.675 (H002)	Yes

2. Preregistered sub-hypotheses

ID	Sub-hypothesis	Prediction	Rationale	Falsified if
H42	gin-residual outperforms Hodge on MUTAG	gin-residual > Hodge (p_BH < 0.05)	MUTAG is strongly homophilic (aromatic rings, functional groups share atom types); low-pass averaging should be more effective than high-pass differencing	p_BH >= 0.05 or Hodge >= gin-residual
H43	gin-residual outperforms Hodge on PROTEINS	Uncertain — PROTEINS may have intermediate homophily	Protein secondary-structure elements (helix/sheet/turn) may or may not be homophilically connected	Hodge strictly beats gin-residual at p_BH < 0.05
H44	The gin-residual vs Hodge gap is dataset-dependent	The operator advantage (gin-residual median - Hodge median) correlates with some dataset-level property	H006 showed graph-structural separability differs across datasets; the operator preference may track this	All three datasets show the same direction and magnitude
H45	Both gin-residual and Hodge outperform MLP on MUTAG with external residual	p_BH < 0.05 for both	The external residual should rescue the Hodge-residual arm’s failure on MUTAG (this run uses 20 epochs, matched to H001, and the same residual architecture)	Either arm <= MLP
H46	Both gin-residual and Hodge outperform MLP on PROTEINS with external residual	p_BH < 0.05 for both	H002 showed all arms matched MLP without the operator comparison; the external residual may or may not help	Either arm <= MLP

3. Outcome decision tree

Pattern	Interpretation
H42 confirmed (gin-residual > Hodge on MUTAG), consistent with NCI1	Low-pass is universally better than high-pass at this capacity. The Hodge Laplacian’s high-pass filtering is a disadvantage, not an advantage. The operator distinction exists but favours adjacency averaging.
H42 refuted (Hodge >= gin-residual on MUTAG), opposite of NCI1	The operator preference is dataset-dependent. High-pass helps on some graphs, low-pass on others. This would be a genuinely novel finding — identifying conditions under which each operator is preferred. The H006 rank-inversion may have a mechanistic explanation.
H45 refuted (both arms <= MLP on MUTAG with external residual)	The external residual is necessary but not sufficient on MUTAG. Dataset size (188 graphs) and/or feature dimensionality (7-dim) create a regime where topology-aware message passing does not help even with optimal residual architecture.
All three datasets show gin-residual ≈ Hodge	The high-pass/low-pass distinction does not matter on any tested dataset. The operator is truly irrelevant once the residual is present, which is a stronger version of the H008-c conclusion.

4. Experimental design

Datasets: MUTAG (188 graphs, 20 epochs) and PROTEINS (1113 graphs, 10 epochs). NCI1 results reused from H008-c.
Models: hodge-mp-residual, gin-residual, mlp-baseline.
Seeds: 30, matched to prior experiments.
Epochs: MUTAG: 20 (matched to H001); PROTEINS: 10 (matched to H002).
Optimiser: Adam(lr=1e-2), matched.
Hidden dim: 32, matched.
Statistical procedure: Pairwise comparisons between arms using paired Wilcoxon signed-rank tests, with BH-FDR correction at alpha=0.05.
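For reference, the BH-FDR step can be sketched in a few lines. This is a minimal pure-Python sketch of the standard Benjamini-Hochberg step-up adjustment, not the code used in benchmarks.hodge; the function name bh_adjust and the example p-values are illustrative only, and the raw p-values are assumed to come from the paired Wilcoxon tests.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p-values for three pairwise comparisons (not real results).
raw = [0.01, 0.04, 0.03]
adjusted = bh_adjust(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

A comparison is then reported as significant when its adjusted value (written p_BH in this document) falls below alpha.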

5. Reproduction

# MUTAG (20 epochs, matched to H001)
python -m benchmarks.hodge \
  --datasets mutag \
  --models hodge-mp-residual gin-residual mlp-baseline \
  --seeds 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 \
  --n-epochs 20 \
  --output notebooks/results/h010_mutag_operator_30seeds.json \
  --markdown notebooks/results/h010_mutag_operator_30seeds.md

# PROTEINS (10 epochs, matched to H002)
python -m benchmarks.hodge \
  --datasets proteins \
  --models hodge-mp-residual gin-residual mlp-baseline \
  --seeds 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 \
  --n-epochs 10 \
  --output notebooks/results/h010_proteins_operator_30seeds.json \
  --markdown notebooks/results/h010_proteins_operator_30seeds.md
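The two invocations differ only in dataset and epoch budget, so they can also be generated programmatically. The sketch below only rebuilds the argument lists shown above from the flags that appear in them; the helper name hodge_command is hypothetical, and nothing is executed.

```python
import shlex


def hodge_command(dataset, n_epochs, n_seeds=30):
    """Build the argv for one benchmark run, mirroring the commands above."""
    stem = f"notebooks/results/h010_{dataset}_operator_{n_seeds}seeds"
    return [
        "python", "-m", "benchmarks.hodge",
        "--datasets", dataset,
        "--models", "hodge-mp-residual", "gin-residual", "mlp-baseline",
        "--seeds", *[str(seed) for seed in range(n_seeds)],  # seeds 0..29
        "--n-epochs", str(n_epochs),
        "--output", f"{stem}.json",
        "--markdown", f"{stem}.md",
    ]


commands = [hodge_command("mutag", 20), hodge_command("proteins", 10)]
for argv in commands:
    print(shlex.join(argv))
```

To run them from Python rather than pasting the printed lines into a shell, each list could be passed to subprocess.run(argv, check=True).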

6. Resolved outcome (2026-05-25, 30 seeds, MUTAG 20 epochs / PROTEINS 10 epochs)

Per-arm reports in notebooks/results/h010_{mutag,proteins}_operator_30seeds.{json,md}.

Cross-dataset summary (with NCI1 from H008-c)

Dataset	Hodge (high-pass)	gin-residual (low-pass)	MLP	Hodge vs gin-residual p_BH	Direction
MUTAG (188)	0.750 [0.724, 0.789]	0.789 [0.763, 0.816]	0.789 [0.763, 0.816]	7.44 x 10^-3	low-pass wins
PROTEINS (1113)	0.686 [0.670, 0.717]	0.675 [0.657, 0.709]	0.675 [0.596, 0.706]	0.292	no difference
NCI1 (4110)	0.609 [0.581, 0.625]	0.629 [0.607, 0.641]	0.523 [0.513, 0.566]	1.01 x 10^-2	low-pass wins
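The operator advantage used in H44 (gin-residual median minus Hodge median) can be recomputed directly from the table. The sketch below copies the medians and p_BH values from the rows above; the rule used to label the direction (significant at alpha = 0.05, then sign of the gap) is an assumption about how the Direction column was derived, not something the source states.

```python
ALPHA = 0.05

# (Hodge median, gin-residual median, MLP median, Hodge-vs-gin-residual p_BH),
# copied from the cross-dataset summary table.
summary = {
    "MUTAG": (0.750, 0.789, 0.789, 7.44e-3),
    "PROTEINS": (0.686, 0.675, 0.675, 0.292),
    "NCI1": (0.609, 0.629, 0.523, 1.01e-2),
}


def direction(hodge, gin, p_bh, alpha=ALPHA):
    """Label the comparison; a non-significant or zero gap counts as no difference."""
    if p_bh >= alpha or gin == hodge:
        return "no difference"
    return "low-pass wins" if gin > hodge else "high-pass wins"


results = {}
for name, (hodge, gin, mlp, p_bh) in summary.items():
    gap = round(gin - hodge, 3)  # operator advantage of low-pass over high-pass
    results[name] = (gap, direction(hodge, gin, p_bh))
    print(f"{name}: gap = {gap:+.3f}, {results[name][1]}")
```

The gap is +0.039 on MUTAG, -0.011 on PROTEINS and +0.020 on NCI1; only the PROTEINS gap is negative, and it is not significant.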

Sub-hypotheses resolved

H42 (gin-residual > Hodge on MUTAG): CONFIRMED directionally (gin-residual 0.789 > Hodge 0.750, p_BH = 7.44 x 10^-3). However, the prediction that this would be MUTAG-specific due to homophily is not supported — the same direction holds on NCI1.
H43 (gin-residual vs Hodge on PROTEINS): Neither arm is distinguishable from the other or from MLP (all p_BH > 0.29). PROTEINS does not discriminate between operators at this configuration.
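H44 (dataset-dependent gap): PARTIALLY CONFIRMED. The gap magnitude varies (significant on MUTAG and NCI1, null on PROTEINS), but the direction never reverses significantly: Hodge's median is nominally higher on PROTEINS (0.686 vs 0.675), yet the difference is not significant (p_BH = 0.292). The low-pass operator is therefore statistically equal to or better than the high-pass operator on all three datasets.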
H45 (both arms beat MLP on MUTAG): REFUTED. Hodge (0.750) strictly underperforms MLP (0.789) at p_BH = 8.61 x 10^-3. gin-residual matches MLP (p_BH = 0.438). The external residual does not rescue the Hodge arm on MUTAG — high-pass filtering actively harms classification on this dataset.
H46 (both arms beat MLP on PROTEINS): REFUTED. Neither arm significantly outperforms MLP (Hodge vs MLP: p_BH = 0.29; gin-residual vs MLP: p_BH = 0.78).

Interpretation

The high-pass (Hodge Laplacian) vs low-pass (normalised adjacency) distinction never favours the Hodge Laplacian on any tested dataset. The low-pass operator is statistically equal or superior throughout:

MUTAG: low-pass matches MLP; high-pass loses by 4 pp. The Laplacian’s high-pass filtering attenuates the smooth class signal that MLP captures directly from atom-type features.
PROTEINS: neither operator adds measurable value over MLP. The dataset does not discriminate between architectures at this capacity (consistent with H002).
NCI1: both operators beat MLP with external residual; low-pass slightly ahead (+2 pp over Hodge, p_BH = 0.010).

What the full investigation establishes (H001-H010)

The complete preregistered investigation, comprising 13 hypotheses and 46 falsifiable sub-predictions across three datasets, converges on the following:

Topology-aware message passing with external residual outperforms MLP on NCI1 (+8-10 pp, robust across operators). This is the one positive claim that survives the full ablation series.
The operative architectural factor is the external residual connection, not the propagation operator. Without an external residual, all message-passing architectures (GIN, GAT, normalised GIN) collapse to the class prior on NCI1.
The Hodge Laplacian does not confer a unique advantage on any tested dataset. The normalised adjacency operator (low-pass) matches or outperforms the Hodge Laplacian (high-pass) on all three datasets when both use external residual.
The high-pass Hodge Laplacian is actively harmful on MUTAG, where it attenuates the class signal that the MLP captures from features alone.
The NCI1 advantage does not transfer to MUTAG or PROTEINS at this capacity and epoch budget.