Threat Model
First-cut STRIDE-style threat model for v1.1.0-rc1. This document is the baseline for the security review work in v1.2.0. Findings from the v1.2.0 audit (see
AUDIT.md) will be appended here as new threat IDs or amendments to existing ones.
Scope
This threat model covers the MCP server (alphafold-sovereign-mcp)
as deployed by an end user on their own machine, communicating over
stdio with an MCP client (typically Claude Desktop) and calling
public biomedical APIs over the internet (or refusing to, in
offline mode).
Out of scope:
- The MCP client itself (Claude Desktop, etc.).
- The upstream APIs' own security posture.
- Network-layer attacks beyond what the host OS provides (TLS is
handled by httpx).
- Streamable HTTP / OAuth (planned for v1.3; see STATUS.md "Roadmap").
Trust boundaries
┌─────────────────────────────────────────────────────────────┐
│ User's machine │
│ │
│ ┌──────────────┐ stdio JSON-RPC ┌─────────────────┐ │
│ │ MCP client │ ◄──────────────────► │ alphafold- │ │
│ │ (Claude │ │ sovereign-mcp │ │
│ │ Desktop) │ │ │ │
│ └──────────────┘ └──┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────┐ │
│ │ Local SQLite knowledge graph │ │
│ │ ~/.alphafold-sovereign-mcp/ │ │
│ └────────────────────────────────┘ │
└──────────────────────────────│──────────────────────────────┘
│ HTTPS (disabled in offline mode)
▼
┌──────────────────────────────┐
│ 14 upstream biomedical APIs │
│ (Ensembl, ClinVar, gnomAD, │
│ AlphaFold DB, …) │
└──────────────────────────────┘
Three trust boundaries:
- Client → Server: stdio is on the same host, but the MCP client
process is a different program. The server trusts only the JSON-RPC
protocol surface, not the client's intent, yet it accepts every
tools/call the client sends.
- Server → Local SQLite: same-host. The DB file is at a path
controlled by platformdirs; permissions follow the OS user.
- Server → Upstream APIs: outbound HTTPS. Outbound hosts can be
pinned via ALPHAFOLD_ALLOW_HOSTS, or outbound access can be disabled
entirely via ALPHAFOLD_OFFLINE=1.
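The outbound boundary can be illustrated with a small gate function. This is a minimal sketch, not the project's implementation: the function name `outbound_allowed` is hypothetical, and it assumes that ALPHAFOLD_ALLOW_HOSTS holds a comma-separated list of hostnames and that an unset or empty list means "no pinning". Neither assumption is confirmed by this document. The hostnames in the usage note are example values only.

```python
from collections.abc import Mapping
from urllib.parse import urlsplit


def outbound_allowed(url: str, env: Mapping[str, str]) -> bool:
    """Return True if an outbound request to `url` should be permitted.

    Hypothetical helper: the variable names come from the threat model,
    but the parsing rules below are assumptions.
    """
    # Offline mode overrides everything else.
    if env.get("ALPHAFOLD_OFFLINE") == "1":
        return False

    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Only HTTPS URLs with a parseable host are considered.
    if parts.scheme != "https" or not host:
        return False

    # ASSUMPTION: comma-separated hostnames, whitespace ignored.
    raw = env.get("ALPHAFOLD_ALLOW_HOSTS", "")
    allowed = {h.strip().lower() for h in raw.split(",") if h.strip()}

    # ASSUMPTION: an empty allow-list means no pinning is in effect.
    if not allowed:
        return True
    return host in allowed
```

With `{"ALPHAFOLD_ALLOW_HOSTS": "rest.ensembl.org"}` as the environment, a request to `https://rest.ensembl.org/...` passes the gate and any other host is refused. Exact-match comparison is used deliberately; suffix matching would let a look-alike domain through.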
STRIDE table
| ID | Threat | Category | Surface | Mitigation | Code receipt |
|---|---|---|---|---|---|
| T01 | A malicious local user impersonates Claude Desktop and invokes tools that exfiltrate cached data. | Spoofing | stdio | stdio runs on the same OS user; no cross-user authentication. The cache file is OS-permission-protected. The server has no concept of "user identity" because everything is local. | server/stdio.py, storage/knowledge_graph.py |
| T02 | A compromised upstream API returns adversarial JSON to corrupt the knowledge graph. | Tampering | client → upstream | All responses are deserialised through Pydantic models (domain/) with strict types. Unknown/malformed fields are dropped, not stored. Schema drift surfaces as a ValidationError, not silent corruption. | domain/*.py; every clients/* module returns typed models |
| T03 | A malformed tools/call argument causes a SQL injection in the knowledge graph. | Tampering | client → server → DB | All SQL is parameterised. _ALLOWED_TABLES allow-list guards export_to_dict(tables=...). CWE-89 closed; CodeQL security-extended runs on every push. | storage/knowledge_graph.py (_fetchall, _executemany); CI workflow .github/workflows/ci.yml |
| T04 | A user disputes that a tool was invoked or returned a certain result. | Repudiation | server | Every tool invocation is recorded in the SQLite knowledge graph with timestamp and arguments. The cache file is the audit trail; signing it (Sigstore Rekor or local ed25519) is on the v1.3 roadmap. | storage/knowledge_graph.py record_* methods |
| T05 | The MCP client (or a malicious tool argument) extracts sensitive cached data. | Information disclosure | client → server | The knowledge graph holds only public biomedical metadata — no PHI, no credentials. The server does not read environment variables to obtain secrets (upstream APIs we use are unauthenticated). If a future API requires a token, it will be loaded from a config file the user explicitly creates. | All clients/*.py — no os.environ.get("*_API_KEY") calls in v1.1.0-rc1 |
| T06 | Logs leak sensitive query content (e.g., a patient identifier inadvertently passed as a gene_symbol argument). | Information disclosure | server → stdout/stderr | structlog JSON logs include argument values. Recommended deployment: redirect stderr to a file the user owns. The server itself does not log to remote endpoints. | server/stdio.py uses structlog.get_logger; no remote handlers |
| T07 | An upstream API rate-limits or 5xx-storms the server, blocking legitimate requests. | Denial of service | server → upstream | aiolimiter token-bucket per host; tenacity exponential backoff with jitter; circuit breaker (CircuitBreaker in clients/_base.py) opens after failure_threshold consecutive failures and refuses requests for cooldown_seconds. | clients/_base.py (UpstreamConfig, CircuitBreaker, RetryConfig) |
| T08 | A buggy or malicious upstream returns a 100 MB JSON payload, OOMing the server. | Denial of service | server → upstream | httpx requests use a default response timeout, and the client modules read responses into bounded Pydantic models. Unbounded payloads are not streamed into memory. A max-response-bytes ceiling is not yet enforced; this is tracked. | clients/_base.py; gap: max-bytes ceiling, tracked for v1.2.0 |
| T09 | A user calls export_research_dataset with an arbitrary table name, getting access to internal tables. | Elevation of privilege | client → server → DB | _ALLOWED_TABLES allow-list explicitly enumerates exportable tables. Any other table name raises a ValueError. Unit test exists. | storage/knowledge_graph.py:_ALLOWED_TABLES; tests/test_knowledge_graph.py |
| T10 | A path-traversal argument tricks the server into writing the SQLite DB outside its allowed directory. | Elevation of privilege | client → server | The DB path is computed via platformdirs.user_data_dir(...) and is not configurable from the MCP API surface. No tool exposes a path= argument. | storage/knowledge_graph.py (constructor uses platformdirs) |
| T11 | A user runs the server in offline mode but the cache contains stale or attacker-tainted data from an earlier online session. | Tampering / Information disclosure | server → cache | This is a known limitation: in offline mode, the cache is the source of truth. Users are responsible for the integrity of their own cache file. The v1.3 air-gap bundle work will introduce a signed bundle format. | server/stdio.py (ALPHAFOLD_OFFLINE flag); gap: signed bundle, planned for v1.3 |
| T12 | A malicious PR introduces a dependency with a backdoor. | Supply chain | repo | Apache 2.0 + Dependabot + Bandit + Safety + pip-audit + CodeQL on every PR. SBOM (CycloneDX) emitted in CI. SLSA L3 build provenance + cosign signing on the release artefacts (Phase E of the polish sprint). | .github/workflows/ci.yml, release.yml; OpenSSF Scorecard badge in README |
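
Rows T03 and T09 rest on two complementary habits: values are bound as SQL parameters, and table names, which cannot be bound, are checked against a fixed allow-list before use. The sketch below shows that pattern with the standard-library sqlite3 module. It is an illustration under stated assumptions, not the code in storage/knowledge_graph.py: the function `export_rows`, the table names, and the column names are all hypothetical.

```python
import sqlite3

# Hypothetical allow-list contents; the real _ALLOWED_TABLES entries are
# not listed in this document.
_ALLOWED_TABLES = frozenset({"genes", "variants"})


def export_rows(conn: sqlite3.Connection, table: str, gene_symbol: str) -> list:
    """Fetch rows for one gene from an allow-listed table (illustrative)."""
    # SQL identifiers cannot be bound as parameters, so the table name is
    # checked against a fixed allow-list before it is interpolated.
    if table not in _ALLOWED_TABLES:
        raise ValueError(f"table not exportable: {table!r}")
    # The caller-supplied value is always bound with a placeholder.
    sql = f'SELECT gene_symbol, note FROM "{table}" WHERE gene_symbol = ?'
    return conn.execute(sql, (gene_symbol,)).fetchall()


# Tiny in-memory database for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (gene_symbol TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO genes VALUES (?, ?)",
    [("BRCA1", "a"), ("TP53", "b")],
)
```

A classic injection string such as `x' OR '1'='1` is treated as a literal gene symbol and matches nothing, while a request for `sqlite_master` fails the allow-list check before any SQL is built.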
Risk register summary
| Risk level | Count | IDs |
|---|---|---|
| High | 0 | — |
| Medium (with named mitigation) | 4 | T03, T07, T09, T12 |
| Medium (with gap) | 2 | T08, T11 |
| Low | 6 | T01, T02, T04, T05, T06, T10 |
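
T07 is rated Medium with a named mitigation largely because of the circuit breaker, so its behaviour is worth making concrete. The following is a minimal sketch of a consecutive-failure breaker, assuming the semantics described in the STRIDE table (open after failure_threshold consecutive failures, refuse requests for cooldown_seconds). The class name `SketchCircuitBreaker`, the default values, and the injectable clock are assumptions for illustration; the real CircuitBreaker in clients/_base.py may differ.

```python
import time
from collections.abc import Callable
from typing import Optional


class SketchCircuitBreaker:
    """Illustrative breaker; not the project's CircuitBreaker class."""

    def __init__(
        self,
        failure_threshold: int = 5,       # assumed default
        cooldown_seconds: float = 30.0,   # assumed default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return False while the breaker is open and cooling down."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the breaker and start counting afresh.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self) -> None:
        # Any success breaks the run of consecutive failures.
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A caller would check `allow_request()` before each upstream call and report the outcome with `record_success()` or `record_failure()`. Passing a fake clock makes the cooldown testable without sleeping.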
Identified gaps (tracked for v1.2.0+)
- T08: enforce a configurable max-response-bytes in clients/_base.py.
- T11: signed offline-bundle format so a tampered cache is detectable.
- T04: append-only signed audit log (Sigstore Rekor option).
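
One possible shape for the T08 fix is to read the response body in chunks and abort as soon as a ceiling is crossed, rather than buffering the whole payload first. The sketch below is a proposal under assumptions, not existing project code: `read_capped` and `ResponseTooLarge` are hypothetical names. It is written against any iterable of byte chunks so it runs without network access; with httpx, the chunks could come from `response.iter_bytes()` inside a `client.stream("GET", url)` context.

```python
from collections.abc import Iterable


class ResponseTooLarge(Exception):
    """Raised when an upstream body exceeds the configured ceiling."""


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Concatenate chunks, refusing to hold more than max_bytes in memory."""
    buf = bytearray()
    for chunk in chunks:
        # Check before extending so the buffer never exceeds the ceiling.
        if len(buf) + len(chunk) > max_bytes:
            raise ResponseTooLarge(f"body exceeds {max_bytes} bytes")
        buf.extend(chunk)
    return bytes(buf)
```

Because the check happens per chunk, an oversized body is rejected after at most one chunk beyond the limit has been received, and the remaining chunks are never pulled from the iterator. The returned bytes can then be handed to the existing Pydantic validation step.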
How to add or amend a threat
When a new threat is identified (e.g., by a security audit or a researcher report):
- Open a GitHub issue with the threat-model label.
- Add a new T<NN> row to the STRIDE table in a PR, or amend an existing row with new mitigations or new gaps.
- If the threat is exploitable today, follow the disclosure process in SECURITY.md.
Last updated: 2026-05-11.