Security audit: `uniprot-mcp` v1.0.1 pre-flip

Audit date: 2026-04-25
Auditor: Santiago Maniches (with mechanical assistance from Claude Opus 4.7)
Target: branch hardening-v2 head 46cf081, 38 tools, 357 offline + 4 live integration tests (audit anchored at this commit; subsequent releases expand to 41 tools and 874 offline + 44 live tests on main; see CHANGELOG)
License: Apache-2.0

This is the formal security audit performed in the run-up to the public flip. It complements docs/THREAT_MODEL.md (architectural threats and mitigations) by running each defended property through a mechanical check and recording the receipts.

1. Static-analysis matrix

Check	Tool	Scope	Result
CVE in runtime deps	`pip-audit --strict`	`httpx`, `mcp` (resolved transitive set)	No known vulnerabilities
Source-level security smells	`bandit -r src/uniprot_mcp` (LOW-severity, all confidences)	3,938 lines of code	0 issues at any severity
Type safety	`mypy --strict` (project config)	`src/uniprot_mcp/*.py` (6 source files)	clean
Lint correctness	`ruff check`	src + tests	clean
Format consistency	`ruff format --check`	src + tests	clean

2. Manual code-review audit

2.1 Dangerous-pattern grep

Pattern	Where searched	Hits
`http://` (cleartext URL)	`src/uniprot_mcp/`	0
`eval(` / `exec(` (raw, not `re.compile`)	`src/uniprot_mcp/`	0
`pickle` (serialization with code-execution risk)	`src/uniprot_mcp/`	0
`subprocess.` / `os.system` / `os.popen`	`src/uniprot_mcp/`	0
`shell=True`	`src/uniprot_mcp/`	0
`yaml.load(` (vs `yaml.safe_load`)	`src/uniprot_mcp/`	0
`open(` (file-system surface)	`src/uniprot_mcp/`	0¹
`__import__(` (dynamic import)	`src/uniprot_mcp/`	0
Bare `except:`	`src/uniprot_mcp/`	0
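
The grep pass above could be reproduced with a short script along the following lines. This is a minimal sketch, not the command the audit actually ran: the pattern set, the labels, and the `scan_tree` helper are assumptions that only mirror the table rows.

```python
import re
from pathlib import Path

# Hypothetical pattern set mirroring the table rows; the audit's real
# invocation is not shown in this document.
DANGEROUS_PATTERNS = {
    "cleartext URL": re.compile(r"http://"),
    "eval/exec": re.compile(r"\b(?:eval|exec)\("),
    "pickle": re.compile(r"\bpickle\b"),
    "process spawn": re.compile(r"subprocess\.|os\.system|os\.popen"),
    "shell=True": re.compile(r"shell\s*=\s*True"),
    "unsafe yaml.load": re.compile(r"yaml\.load\("),
    # Raw open( only: method calls such as path.open( are not matched.
    "open(": re.compile(r"(?<![\w.])open\("),
    "dynamic import": re.compile(r"__import__\("),
    "bare except": re.compile(r"^\s*except\s*:"),
}


def scan_tree(root):
    """Return (relative path, line number, label) for every hit under root."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for label, pattern in DANGEROUS_PATTERNS.items():
                if pattern.search(line):
                    hits.append((path.relative_to(root).as_posix(), lineno, label))
    return hits
```

Calling `scan_tree("src/uniprot_mcp")` and asserting that the result is empty would turn the table into a repeatable check. A plain-text scan like this also matches comments and string literals, so any hit still needs a human look.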

2.2 Network surface

Origin	Where declared	Tools that consult it
`https://rest.uniprot.org`	`BASE_URL` constant in `src/uniprot_mcp/client.py:38`	All 32 UniProt-resident tools
`https://alphafold.ebi.ac.uk`	`ALPHAFOLD_API_BASE` constant in `src/uniprot_mcp/client.py:42`	`uniprot_get_alphafold_confidence`
`https://eutils.ncbi.nlm.nih.gov/entrez/eutils`	`NCBI_EUTILS_BASE` constant in `src/uniprot_mcp/client.py:43`	`uniprot_resolve_clinvar`

Every origin is HTTPS. Adding a new origin requires modifying client.py and THREAT_MODEL.md and PRIVACY.md in the same commit, per the policy in docs/THREAT_MODEL.md §T3b.

2.3 Timeout coverage

Every httpx.AsyncClient instantiation in the codebase carries an explicit httpx.Timeout(...):

Location	Timeout
`src/uniprot_mcp/client.py:289` (the singleton `UniProtClient` shared by all UniProt-resident tools)	`httpx.Timeout(TIMEOUT)` where `TIMEOUT = 30.0`
`src/uniprot_mcp/client.py:499` (`get_clinvar_records` ephemeral client)	`httpx.Timeout(TIMEOUT)`
`src/uniprot_mcp/client.py:552` (`get_alphafold_summary` ephemeral client)	`httpx.Timeout(TIMEOUT)`
`src/uniprot_mcp/server.py:1619` (`provenance_verify` ephemeral client)	`httpx.Timeout(30.0)`

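No network call is made without this 30-second timeout. Note that httpx.Timeout(30.0) applies the limit to each phase of a request (connect, read, write, pool acquisition) rather than as a total wall-clock deadline, so a response that trickles in slowly can take longer than 30 seconds overall.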

2.4 Retry budget

Bound	Constant	Enforced in
Maximum retries per request	`MAX_RETRIES = 3`	`_req` and `id_mapping_submit`
Maximum server-dictated `Retry-After` wait	`MAX_RETRY_AFTER_SECONDS = 120.0`	`parse_retry_after`
`id_mapping_results` polling cap	30 iterations × 1 s	hard-coded loop

A client that hits 429 or 5xx forever still terminates: MAX_RETRIES + 1 attempts, then RuntimeError("Request failed after 4 attempts") propagates and the tool's _safe_error envelope kicks in.
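
The termination argument can be sketched as follows. The two constants and the error message come from this section; the `send` callable, the injected `sleep`, and the exponential fallback delay are assumptions made for illustration, and this `parse_retry_after` handles only the numeric form of the header.

```python
import time

MAX_RETRIES = 3                   # from the table above
MAX_RETRY_AFTER_SECONDS = 120.0   # from the table above


def parse_retry_after(value):
    """Clamp a numeric Retry-After value to [0, MAX_RETRY_AFTER_SECONDS].

    Returns None for missing or non-numeric input. The HTTP-date form of
    the header is not handled in this sketch.
    """
    if value is None:
        return None
    try:
        seconds = float(value)
    except ValueError:
        return None
    if seconds != seconds:  # NaN never compares equal to itself
        return None
    return min(max(seconds, 0.0), MAX_RETRY_AFTER_SECONDS)


def request_with_budget(send, sleep=time.sleep):
    """Call send() at most MAX_RETRIES + 1 times, then raise RuntimeError.

    `send` is a hypothetical callable returning (status_code, retry_after).
    """
    attempts = MAX_RETRIES + 1
    for attempt in range(attempts):
        status, retry_after = send()
        if status != 429 and status < 500:
            return status
        if attempt < attempts - 1:
            wait = parse_retry_after(retry_after)
            # Assumed fallback: exponential backoff when no header is usable.
            sleep(wait if wait is not None else float(2 ** attempt))
    raise RuntimeError(f"Request failed after {attempts} attempts")
```

Under these assumptions the worst case is four attempts separated by three waits of at most 120 seconds each, so even a server that always answers 429 with a huge Retry-After cannot hold a tool call open indefinitely.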

2.5 Input validation matrix

Every tool that accepts an identifier validates it before any HTTP call. The full coverage:

Tool	Validator	Anchored regex
`uniprot_get_entry` / `_get_sequence` / `_get_features` / `_get_variants` / `_get_go_terms` / `_get_cross_refs`	`_check_accession`	`\A(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\Z`
`uniprot_get_keyword`	`_check_keyword_id`	`\AKW-[0-9]{4}\Z`
`uniprot_get_subcellular_location`	`_check_subcellular_location_id`	`\ASL-[0-9]{4}\Z`
`uniprot_get_uniref`	`_check_uniref_id`	`\AUniRef(?:50|90|100)_…\Z`
`uniprot_get_uniparc`	`_check_uniparc_id`	`\AUPI[A-F0-9]{10}\Z`
`uniprot_get_proteome`	`_check_proteome_id`	`\AUP[0-9]{9,11}\Z`
`uniprot_get_citation`	`_check_citation_id`	`\A[0-9]{1,12}\Z`
`uniprot_features_at_position` (position)	`_check_position`	int ∈ [1, 100000]
`uniprot_lookup_variant` (change)	`_parse_variant_change`	`\A[A-Z][1-9][0-9]{0,4}[A-Z*]\Z`
`uniprot_resolve_clinvar` (change, optional)	`_parse_variant_change`	same
`uniprot_search` / etc. (free-text query)	`_check_len("query", value, MAX_QUERY_LEN=500)`	length cap; chars not constrained
`uniprot_search` (organism filter)	`_check_len("organism", value, MAX_ORGANISM_LEN=100)` + double-quote sanitise	length cap
`uniprot_provenance_verify` (URL)	`_check_len("url", value, MAX_PROVENANCE_URL_LEN=1000)` + `startswith("https://rest.uniprot.org/")`	scheme + host pinned
`uniprot_replay_from_cache` (URL)	`_check_len`	length cap; cache lookup is local-only

Every regex uses \A...\Z anchors — no re.MULTILINE slip-pasts. Length caps are constants in src/uniprot_mcp/server.py:99-128 so they show up at one location for review.
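
A short sketch shows why the `\A...\Z` anchors matter. The accession pattern is the one in the table (alternation written as a plain `|`); `check_accession`, `check_len`, and `InputError` are hypothetical stand-ins for `_check_accession`, `_check_len`, and `_InputError`, and `LOOSE_RE` exists only for comparison.

```python
import re

# Accession pattern as listed in the validation table.
ACCESSION_RE = re.compile(
    r"\A(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\Z"
)
# Same body with ^...$ anchors, shown only to illustrate the difference.
LOOSE_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
MAX_QUERY_LEN = 500  # from the table above


class InputError(ValueError):
    """Hypothetical stand-in for the project's _InputError."""


def check_accession(value):
    """Return value unchanged if it is a well-formed accession, else raise."""
    if not ACCESSION_RE.match(value):
        raise InputError(f"invalid accession: {value!r}")
    return value


def check_len(name, value, limit):
    """Return value unchanged if it fits within limit characters, else raise."""
    if len(value) > limit:
        raise InputError(f"{name} exceeds {limit} characters")
    return value
```

In Python, `$` also matches just before a trailing newline, so `LOOSE_RE.match("P04637\n")` succeeds while `ACCESSION_RE.match("P04637\n")` returns `None`. Anchoring with `\Z` closes that gap before the value is interpolated into a request path.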

2.6 SSRF posture

Two controlled redirect surfaces:

- id_mapping_results redirect: when UniProt returns redirectURL in the polling response, the URL is dispatched through the same httpx.AsyncClient whose base_url is hardcoded to https://rest.uniprot.org (src/uniprot_mcp/client.py:296-300, follow_redirects=True). base_url resolves relative redirectURL values against rest.uniprot.org, but it does not block an absolute cross-origin URL in the redirectURL field; httpx would follow it. The mitigation today is therefore UniProt's trustworthiness as upstream plus the fact that no secrets are attached to our outbound requests (no Authorization header, no API key). An explicit allowlist check before dispatch is still open as of v1.1.6, with a target window of v1.2.0; see THREAT_MODEL.md §T3 for the precise statement.
- Cross-origin allowlist: the only other origins consulted are alphafold.ebi.ac.uk and eutils.ncbi.nlm.nih.gov, each declared by named constant and used in exactly one method (get_alphafold_summary, get_clinvar_records).

A compromise of either upstream origin would let an attacker return malicious metadata. The provenance subsystem records source URL + canonical SHA-256 so a poisoned answer is detectable via uniprot_provenance_verify — but not prevented. (Detection is the security claim; prevention requires upstream-side mitigations we cannot ship.)
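
The deferred allowlist check for redirectURL might look roughly like the sketch below. It is an illustration of the idea, not the planned implementation: `resolve_redirect` and `ALLOWED_REDIRECT_HOSTS` are hypothetical names, and only `BASE_URL` corresponds to a constant named in this audit.

```python
from urllib.parse import urljoin, urlsplit

BASE_URL = "https://rest.uniprot.org"
# Hypothetical allowlist; the project does not ship one yet.
ALLOWED_REDIRECT_HOSTS = frozenset({"rest.uniprot.org"})


def resolve_redirect(redirect_url, base=BASE_URL):
    """Resolve redirect_url against base and reject anything off the allowlist."""
    absolute = urljoin(base + "/", redirect_url)
    parts = urlsplit(absolute)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS redirect: {absolute!r}")
    if parts.hostname not in ALLOWED_REDIRECT_HOSTS:
        raise ValueError(f"refusing cross-origin redirect: {absolute!r}")
    if parts.username or parts.password or parts.port not in (None, 443):
        raise ValueError(f"refusing redirect with userinfo or odd port: {absolute!r}")
    return absolute
```

Comparing the parsed hostname, rather than testing a string prefix, is what rejects lookalikes such as `https://rest.uniprot.org@evil.example/x`. A check like this covers only the URL taken from the response body; with follow_redirects=True, HTTP-level 3xx redirects issued afterwards would still be followed by httpx and would need separate handling.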

2.7 Error-channel safety

_safe_error (src/uniprot_mcp/server.py:135-160) is the single chokepoint for tool error responses:

- _InputError (our own validation type) is forwarded verbatim, since it is agent-actionable.
- ReleaseMismatchError (a controlled type from client.py) is rewritten into the standard format with the env-var name to unset; both pinned and observed release values are surfaced because they originate from our own state plus an upstream header (no raw stack-trace contents).
- Any other Exception produces the canonical message "Error in <tool>: upstream request failed; see server logs for details." The actual exception is logger.exception-ed to stderr but never reaches the LLM.

Pinned by tests:

- tests/unit/test_server_validation.py::test_safe_error_hides_internal_exception_text: asserts that 0xdeadbeef and sensitive words never appear in the agent-visible error envelope.
- tests/unit/test_pin_release.py::test_safe_error_formats_release_mismatch_distinctly: asserts the release-mismatch path.
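
The chokepoint pattern described in this section can be sketched in a few lines. This is a simplified illustration rather than the project's `_safe_error`: the `ReleaseMismatchError` branch is omitted, and `safe_error`, `InputError`, and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("uniprot_mcp_sketch")  # hypothetical logger name


class InputError(ValueError):
    """Hypothetical stand-in for the project's _InputError."""


def safe_error(tool, exc):
    """Map an exception to the only text an agent is allowed to see."""
    if isinstance(exc, InputError):
        # Validation messages are written by us, so forwarding them is safe.
        return str(exc)
    # Anything else: keep the details in the server log, return a fixed string.
    logger.error("unhandled error in %s", tool, exc_info=exc)
    return f"Error in {tool}: upstream request failed; see server logs for details."
```

Because the fallback message is built only from the tool name, nothing from the exception text (paths, addresses, upstream bodies) can leak into the agent-visible envelope, which is the property the first test above asserts.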

2.8 Provenance integrity

- Every successful request sets client.last_provenance immediately after raise_for_status succeeds. A failing request never overwrites a prior successful provenance; this is pinned by tests/unit/test_provenance.py::test_client_last_provenance_unchanged_after_4xx.
- canonical_response_hash parses the JSON, re-serialises it with sort_keys=True and compact separators, and takes the SHA-256 of the canonical UTF-8 bytes. Within-release key-order changes therefore do not break verification.
- uniprot_provenance_verify uses a fresh httpx.AsyncClient so the verifier itself is not subject to the singleton's pin-release config. A pinned-release client can still verify against a different release.
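
The canonicalisation step can be illustrated with the standard library. This is a sketch of the technique as described in the list above, not the project's function: `sketch_canonical_hash` is a hypothetical name, and leaving `ensure_ascii` at its default is an assumption that may differ from the real implementation.

```python
import hashlib
import json


def sketch_canonical_hash(body):
    """SHA-256 hex digest of the key-sorted, compact JSON form of body.

    `body` may be str or bytes containing a JSON document.
    """
    parsed = json.loads(body)
    # sort_keys removes key-order differences; compact separators remove
    # whitespace differences. ensure_ascii stays at its default (assumption).
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two payloads that differ only in key order or whitespace hash identically, while a change in array order or in any value produces a different digest. That is the property a verifier relies on when it re-fetches a URL and compares hashes.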

3. Cryptographic-commitment audit

Property	Evidence
Pre-registered benchmark prompts (30)	`tests/benchmark/prompts.jsonl` on `main`; immutable from b1549f6 onward
Per-prompt expected-answer hashes (30, all unique)	`tests/benchmark/expected.hashes.jsonl` on `main`
Cryptographic round-trip	`python tests/benchmark/verify.py expected.jsonl expected.hashes.jsonl` → `OK: 30 commitments verified`
Live-REST third-party reproducibility	`python tests/benchmark/verify_answers.py expected.jsonl` → `OK: all 30 prompts verified against https://rest.uniprot.org` (re-verified 2026-04-25)

The plaintext expected.jsonl is held local-only per .gitignore. Three tests pin this rule (tests/contract/test_benchmark_integrity.py).
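
A hash-commitment round-trip of the kind verify.py performs could be sketched as below. The record layout is an assumption: the `id` and `sha256` field names, the canonical-JSON hashing, and both helper names are hypothetical, since the actual file formats are not reproduced in this audit.

```python
import hashlib
import json


def commitment(record):
    """SHA-256 hex digest over the key-sorted, compact JSON form of a record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_commitments(plaintext_lines, hash_lines):
    """Return the ids whose plaintext is missing or does not match its hash."""
    published = {}
    for line in hash_lines:
        entry = json.loads(line)
        published[entry["id"]] = entry["sha256"]  # hypothetical field names
    mismatched = []
    seen = set()
    for line in plaintext_lines:
        record = json.loads(line)
        seen.add(record["id"])
        if published.get(record["id"]) != commitment(record):
            mismatched.append(record["id"])
    # Published commitments with no plaintext counterpart also count as failures.
    mismatched.extend(sorted(set(published) - seen))
    return mismatched
```

An empty return value corresponds to the "commitments verified" outcome in the table. Any edit to a plaintext answer after the hashes were published shows up as a mismatched id.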

4. Supply-chain audit

4.1 Dependencies

Production runtime: only httpx >= 0.27 and mcp >= 1.2. Test extras: pytest, pytest-asyncio, pytest-cov, pytest-socket, respx, hypothesis, syrupy. Dev extras: ruff, mypy, bandit, pip-audit, pre-commit. Docs extras: mkdocs, mkdocs-material. All are published on PyPI by established maintainers.

Dependabot watches pip weekly (per .github/dependabot.yml).

4.2 GitHub Actions

Every uses: reference in every workflow file is SHA-pinned to the resolved commit, with the human-readable tag preserved as a trailing comment. Dependabot watches the github-actions ecosystem weekly to bump the pins safely (commit 843ace5).
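
The SHA-pinning rule lends itself to a mechanical check. The following is a minimal sketch under stated assumptions: `unpinned_actions` is a hypothetical helper, it works on raw workflow text rather than parsed YAML, and it treats local (`./`) and `docker://` references as out of scope.

```python
import re

# Matches a `uses:` key, optionally introduced by a list dash.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*(?P<ref>\S+)")
# owner/repo[/path]@<40 lowercase hex characters>
PINNED_RE = re.compile(r"\A[^@\s]+@[0-9a-f]{40}\Z")


def unpinned_actions(workflow_text):
    """Return (line number, reference) for each `uses:` not pinned to a full SHA."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group("ref").strip("'\"")
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # out of scope for this sketch
        if not PINNED_RE.match(ref):
            findings.append((lineno, ref))
    return findings
```

Run over each file in .github/workflows, an empty result would confirm the claim above. The trailing `# v4`-style tag comment is ignored because the reference is captured only up to the first whitespace.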

4.3 Release artefacts

release.yml attaches:

- SLSA build-provenance attestation via actions/attest-build-provenance@v1.
- CycloneDX SBOM generated by cyclonedx-py requirements.
- SBOM attestation via actions/attest-sbom@v1, so the SBOM itself is provenance-tied to the artefact.
- Sigstore keyless signature via sigstore/gh-action-sigstore-python@v3.0.0.
- PyPI Trusted Publishing, with no long-lived API tokens.

Every release artefact is independently verifiable post-flip:

gh attestation verify dist/*.whl --repo smaniches/uniprot-mcp
gh attestation verify dist/*.whl --repo smaniches/uniprot-mcp --predicate-type https://cyclonedx.org/bom
python -m sigstore verify identity \
    --cert-identity 'https://github.com/smaniches/uniprot-mcp/.github/workflows/release.yml@refs/tags/v1.0.1' \
    --cert-oidc-issuer 'https://token.actions.githubusercontent.com' \
    dist/*.whl

5. Privacy posture

uniprot-mcp is a stateless gateway. No PII collected, no analytics SDK, no persistent session, no telemetry, no cookies. Three third parties:

Third party	What it sees	Necessity
`rest.uniprot.org`	source IP, User-Agent (`uniprot-mcp/<version>`), request path/query	Required — this is what the server proxies
`alphafold.ebi.ac.uk`	source IP, User-Agent, the UniProt accession in the path	Optional — used only by `uniprot_get_alphafold_confidence`
`eutils.ncbi.nlm.nih.gov`	source IP, User-Agent, the gene symbol (and optional HGVS shorthand) in query	Optional — used only by `uniprot_resolve_clinvar`

Full privacy notice: PRIVACY.md.

6. Operational maturity

Artefact	Purpose
`docs/THREAT_MODEL.md`	12-threat STRIDE walk + cross-origin allowlist policy
`docs/INCIDENT_POLICY.md`	What triggers a postmortem; blameless discipline; sunset rule
`docs/POSTMORTEM_TEMPLATE.md`	Header / timeline / root-cause / impact / detection / resolution / follow-up / lessons / 2030-compliance-officer view
`docs/INCIDENT_LOG.md`	Append-only, currently empty (project pre-public)
`tests/contract/test_incident_policy.py` (5 tests)	Drift prevention — every log entry must point at a real file; every postmortem file must be referenced from the log

7. Findings

Zero P0/P1 findings. Two P3 hardening items deferred (already tracked elsewhere):

Severity	Finding	Tracking	Mitigation while open
P3	Explicit `redirectURL` allowlist in `id_mapping_results` (currently no client-side redirect allowlist — an absolute cross-origin `redirectURL` would be followed)	`THREAT_MODEL.md` §T3 — "Deferred hardening"	`base_url` is not an allowlist; mitigation is UniProt's trustworthiness as the upstream plus the fact that no `Authorization` header or API key is ever sent — see §T3
P3	NFKC Unicode normalisation on free-text inputs (`query`, `organism`)	`THREAT_MODEL.md` §T12	All identifier validation uses ASCII subsets with `\A...\Z` anchors; impact limited to free-text query construction

Both are documented for v1.1.

8. Conclusion

The static-analysis matrix is fully green. The manual review found no P0/P1 defects; the two P3 hardening items listed in section 7 remain deferred. The supply chain, cross-origin, validation, retry, and error-channel surfaces all carry their own automated tests pinning their behaviour. The release-artefact verification chain (SLSA + Sigstore + SBOM + Trusted Publishing) is wired and ready to fire on the v1.0.1 tag once GitHub Actions billing resets.

The defended posture matches the documented threat model.

— Signed off by Santiago Maniches, ORCID 0009-0005-6480-1987, 2026-04-25.

¹ The cache.py module performs file I/O via pathlib.Path.read_text/write_text and tempfile.NamedTemporaryFile rather than raw open(), which is why the grep count is 0. The file-system surface is nonetheless acknowledged: the cache writes only when the user opts in via UNIPROT_MCP_CACHE_DIR, and atomic write via os.replace is enforced.
