Knowledge-graph tools

alphafold_sovereign.tools.knowledge_graph_tools

MCP tools for querying the local AlphaFold Sovereign Knowledge Graph.

These tools read and traverse the local SQLite knowledge graph, an ACID store (WAL journalling, versioned migrations, schema v3) that enforces foreign-key integrity across proteins, variants, diseases, drugs and their relationships.
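As a rough illustration of what WAL journalling and foreign-key integrity mean in practice, the sketch below opens a throwaway SQLite file with Python's standard sqlite3 module. The `proteins` and `variants` tables, their columns and the `open_store` helper are hypothetical stand-ins; the project's actual schema v3 is not shown on this page.

```python
import os
import sqlite3
import tempfile

# Hypothetical two-table schema, used only for this illustration.
SCHEMA = """
CREATE TABLE proteins (uniprot_id TEXT PRIMARY KEY);
CREATE TABLE variants (
    variant_id TEXT PRIMARY KEY,
    uniprot_id TEXT NOT NULL REFERENCES proteins(uniprot_id)
);
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open a SQLite file with WAL journalling and foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # persistent, stored in the file
    conn.execute("PRAGMA foreign_keys=ON")   # per-connection, off by default
    return conn


def demo() -> tuple[str, bool]:
    """Return the active journal mode and whether an orphan row was rejected."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_store(os.path.join(tmp, "kg.sqlite"))
        try:
            conn.executescript(SCHEMA)
            mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
            try:
                # No protein 'MISSING' exists, so this violates the foreign key.
                conn.execute("INSERT INTO variants VALUES ('v1', 'MISSING')")
                rejected = False
            except sqlite3.IntegrityError:
                rejected = True
            return mode, rejected
        finally:
            conn.close()
```

Note that SQLite only enforces foreign keys on connections that set `PRAGMA foreign_keys=ON`, which is why the helper sets it on every open.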

The graph ships with a curated boot seed (loaded automatically when the store is empty; disable with AFSMCP_DISABLE_KG_SEED=1) so these tools return representative results out of the box. It is extended by writing through the knowledge-graph storage API; the analysis tools do not write to it on their own (no automatic per-invocation persistence). On top of that store the tools provide:

  • Recall of stored entities with no upstream API call required
  • Cross-entity pattern queries ("which HIGH-tier variants share a WARM target?")
  • Batch export to JSON for pandas/ML pipelines
  • Optional provenance/audit tables (opt-in; empty by default)
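The cross-entity pattern query quoted in the list above ("which HIGH-tier variants share a WARM target?") could be expressed as a SQL join along the following lines. This is a sketch under assumptions: the table layout, the column names other than `clinical_tier`, and the synthetic rows are all hypothetical, and the tools may implement such queries differently.

```python
import sqlite3


def build_demo_graph() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical schema and synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE proteins (uniprot_id TEXT PRIMARY KEY, druggability_tier TEXT);
        CREATE TABLE variants (
            variant_id TEXT PRIMARY KEY,
            uniprot_id TEXT REFERENCES proteins(uniprot_id),
            clinical_tier TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO proteins VALUES (?, ?)",
        [("P_A", "WARM"), ("P_B", "HOT"), ("P_C", "WARM")],
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?)",
        [
            ("v1", "P_A", "HIGH"),
            ("v2", "P_A", "HIGH"),
            ("v3", "P_A", "LOW"),
            ("v4", "P_B", "HIGH"),
            ("v5", "P_B", "HIGH"),
            ("v6", "P_C", "HIGH"),
        ],
    )
    return conn


def high_variants_sharing_warm_target(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each WARM protein to its HIGH-tier variants when it has two or more."""
    sql = """
        SELECT v.uniprot_id, v.variant_id
        FROM variants AS v
        JOIN proteins AS p ON p.uniprot_id = v.uniprot_id
        WHERE v.clinical_tier = 'HIGH'
          AND p.druggability_tier = 'WARM'
          AND v.uniprot_id IN (
              SELECT uniprot_id FROM variants
              WHERE clinical_tier = 'HIGH'
              GROUP BY uniprot_id
              HAVING COUNT(*) >= 2
          )
        ORDER BY v.uniprot_id, v.variant_id
    """
    shared: dict[str, list[str]] = {}
    for uniprot_id, variant_id in conn.execute(sql):
        shared.setdefault(uniprot_id, []).append(variant_id)
    return shared
```

In the synthetic data only `P_A` qualifies: `P_B` is HOT rather than WARM, and `P_C` has a single HIGH-tier variant, so nothing is shared.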

Tool inventory

  1. query_variant_database — search stored variant triage results
  2. query_protein_database — search stored protein assessments
  3. get_knowledge_graph_stats — database health and coverage summary
  4. export_research_dataset — export to JSON for pandas/ML pipelines
  5. find_drug_gene_network — traverse the stored drug-gene-disease graph

query_variant_database async

query_variant_database(params: VariantQueryInput) -> dict[str, Any]

Search the local knowledge graph for stored variants.

Returns variants matching the filter criteria. No upstream API calls are made — all data is served from the local SQLite knowledge graph, which is populated by the curated boot seed and by any explicit writes through the knowledge-graph storage API (the analysis tools do not write to it on their own).

Parameters:

Name                  Type  Description                                Default
params.gene                 Gene symbol filter.                        required
params.tier                 Clinical tier (HIGH/MEDIUM/LOW/UNKNOWN).   required
params.clinvar_class        ClinVar classification string.             required
params.min_am_score         Minimum AlphaMissense score.               required
params.max_gnomad_af        Maximum gnomAD allele frequency.           required
params.limit                Maximum results.                           required
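To make the filter semantics concrete, here is a minimal in-memory sketch of how such filters might combine. It assumes the filters are ANDed together and that an unset filter matches everything; neither is confirmed by this page. `VariantFilter` is a hypothetical stand-in for `VariantQueryInput`, and every row field except `clinical_tier` is an invented name over synthetic data.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class VariantFilter:
    """Hypothetical stand-in for VariantQueryInput (the real model is not shown)."""
    gene: Optional[str] = None
    tier: Optional[str] = None
    clinvar_class: Optional[str] = None
    min_am_score: Optional[float] = None
    max_gnomad_af: Optional[float] = None
    limit: int = 50


def matches(row: dict[str, Any], f: VariantFilter) -> bool:
    """Return True when the row satisfies every filter that is set (assumed AND)."""
    if f.gene is not None and row.get("gene") != f.gene:
        return False
    if f.tier is not None and row.get("clinical_tier") != f.tier:
        return False
    if f.clinvar_class is not None and row.get("clinvar_class") != f.clinvar_class:
        return False
    if f.min_am_score is not None:
        score = row.get("am_score")
        # Rows with no score are excluded in this sketch.
        if score is None or score < f.min_am_score:
            return False
    if f.max_gnomad_af is not None:
        af = row.get("gnomad_af")
        # Rows with no frequency are excluded in this sketch.
        if af is None or af > f.max_gnomad_af:
            return False
    return True


def filter_variants(rows: list[dict[str, Any]], f: VariantFilter) -> list[dict[str, Any]]:
    """Apply the filters, then truncate to the requested limit."""
    return [row for row in rows if matches(row, f)][: f.limit]


# Synthetic rows for illustration only.
ROWS = [
    {"variant_id": "v1", "gene": "GENE_A", "clinical_tier": "HIGH",
     "clinvar_class": "Pathogenic", "am_score": 0.91, "gnomad_af": 0.00001},
    {"variant_id": "v2", "gene": "GENE_A", "clinical_tier": "LOW",
     "clinvar_class": "Benign", "am_score": 0.12, "gnomad_af": 0.02},
    {"variant_id": "v3", "gene": "GENE_B", "clinical_tier": "HIGH",
     "clinvar_class": "Pathogenic", "am_score": 0.77, "gnomad_af": None},
]
```

For example, `filter_variants(ROWS, VariantFilter(gene="GENE_A", tier="HIGH"))` keeps only the row that satisfies both conditions.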

query_protein_database async

query_protein_database(params: ProteinQueryInput) -> dict[str, Any]

Search the local knowledge graph for stored proteins.

Returns proteins matching the filter criteria. Results are served from the local SQLite knowledge graph; no upstream API calls are made. The store is populated by the curated boot seed and by explicit writes through the storage API.

Parameters:

Name                      Type  Description                           Default
params.druggability_tier        HOT/WARM/COLD/NOT_DRUGGABLE filter.   required
params.min_plddt                Minimum AF2 confidence score.         required
params.limit                    Maximum results.                      required
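One plausible way to translate these filters into SQL is a parameterised query, sketched below. The `proteins` table, the `mean_plddt` column name, the descending sort order and the `build_protein_query` helper are assumptions made for illustration; the tool's real query is not shown on this page.

```python
import sqlite3
from typing import Any, Optional


def build_protein_query(
    druggability_tier: Optional[str],
    min_plddt: Optional[float],
    limit: int,
) -> tuple[str, list[Any]]:
    """Build a parameterised SELECT; values are bound, never interpolated."""
    clauses: list[str] = []
    args: list[Any] = []
    if druggability_tier is not None:
        clauses.append("druggability_tier = ?")
        args.append(druggability_tier)
    if min_plddt is not None:
        clauses.append("mean_plddt >= ?")
        args.append(min_plddt)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT uniprot_id, druggability_tier, mean_plddt "
        f"FROM proteins{where} ORDER BY mean_plddt DESC LIMIT ?"
    )
    args.append(limit)
    return sql, args


def demo_connection() -> sqlite3.Connection:
    """In-memory table with synthetic rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proteins ("
        "uniprot_id TEXT PRIMARY KEY, druggability_tier TEXT, mean_plddt REAL)"
    )
    conn.executemany(
        "INSERT INTO proteins VALUES (?, ?, ?)",
        [("P_A", "WARM", 88.5), ("P_B", "HOT", 92.1), ("P_C", "WARM", 61.0)],
    )
    return conn
```

Usage: `sql, args = build_protein_query("WARM", 70.0, 10)` followed by `conn.execute(sql, args).fetchall()`. Binding values with `?` placeholders keeps user-supplied filter strings out of the SQL text.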

get_knowledge_graph_stats async

get_knowledge_graph_stats() -> dict[str, Any]

Return statistics about the local knowledge graph.

Shows entity counts, database size, and last activity — useful for understanding the current contents and coverage of the local store.
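For intuition, per-table counts and database size can be read from any SQLite file with standard queries and pragmas, as in the sketch below. The result keys (`entity_counts`, `size_bytes`) and the demo tables are hypothetical and do not document the tool's actual response shape; last-activity tracking is omitted because it depends on schema details not shown here.

```python
import sqlite3
from typing import Any


def graph_stats(conn: sqlite3.Connection) -> dict[str, Any]:
    """Count rows in every user table and estimate the database size."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # Skip SQLite's internal bookkeeping tables.
    tables = [name for name in names if not name.startswith("sqlite_")]
    counts = {
        # Table names come from sqlite_master, not from user input.
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in tables
    }
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return {"entity_counts": counts, "size_bytes": page_count * page_size}


def demo_connection() -> sqlite3.Connection:
    """In-memory store with synthetic rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE proteins (uniprot_id TEXT PRIMARY KEY);
        CREATE TABLE variants (variant_id TEXT PRIMARY KEY);
        INSERT INTO proteins VALUES ('P_A'), ('P_B');
        INSERT INTO variants VALUES ('v1');
        """
    )
    return conn
```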

export_research_dataset async

export_research_dataset(params: ExportInput) -> dict[str, Any]

Export the stored knowledge-graph data for downstream analysis.

Returns all stored entities as JSON-serialisable dicts, suitable for:

  • Loading into pandas DataFrames for ML feature engineering
  • Importing into R or Julia for statistical analysis
  • Feeding into downstream bioinformatics pipelines

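Example (Python)::

    import pandas as pd

    # Run inside an async function (or a REPL that supports top-level await).
    result = await export_research_dataset(ExportInput(tables=["variants"]))
    df = pd.DataFrame(result["data"]["variants"])
    high_tier = df[df["clinical_tier"] == "HIGH"]  # keep only HIGH-tier variants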

Parameters:

Name                    Type  Description                                     Default
params.tables                 Tables to export (empty = all entity tables).   required
params.limit_per_table        Maximum rows per table.                         required
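The two parameters can be pictured with a small sqlite3 sketch: an empty `tables` list falls back to every entity table, and `limit_per_table` caps each SELECT. The allow-list `ENTITY_TABLES`, the demo schema and the `export_tables` helper are hypothetical; only the `result["data"][table]` shape is taken from the example above.

```python
import json
import sqlite3
from typing import Any

# Hypothetical allow-list of exportable entity tables.
ENTITY_TABLES = ("proteins", "variants")


def export_tables(
    conn: sqlite3.Connection, tables: list[str], limit_per_table: int
) -> dict[str, Any]:
    """Export the selected tables as lists of JSON-serialisable dicts."""
    selected = list(tables) or list(ENTITY_TABLES)  # empty = all entity tables
    unknown = [name for name in selected if name not in ENTITY_TABLES]
    if unknown:
        raise ValueError(f"unknown tables: {unknown}")
    conn.row_factory = sqlite3.Row  # rows become name-addressable
    data: dict[str, list[dict[str, Any]]] = {}
    for name in selected:
        # Safe to interpolate: name was checked against the allow-list.
        rows = conn.execute(f"SELECT * FROM {name} LIMIT ?", (limit_per_table,))
        data[name] = [dict(row) for row in rows]
    return {"data": data}


def demo_connection() -> sqlite3.Connection:
    """In-memory store with synthetic rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE proteins (uniprot_id TEXT PRIMARY KEY, druggability_tier TEXT);
        CREATE TABLE variants (variant_id TEXT PRIMARY KEY, clinical_tier TEXT);
        INSERT INTO proteins VALUES ('P_A', 'WARM'), ('P_B', 'HOT');
        INSERT INTO variants VALUES ('v1', 'HIGH'), ('v2', 'LOW');
        """
    )
    return conn
```

Because each row is a plain dict of SQLite scalars, the result can be passed straight to `json.dumps` or to a DataFrame constructor.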

find_drug_gene_network async

find_drug_gene_network(params: DrugNetworkInput) -> dict[str, Any]

Traverse the local knowledge graph from a seed entity.

Given a seed (UniProt ID, gene symbol, or MONDO disease ID), expands its immediate neighbourhood in the stored drug-gene-disease graph: a gene symbol resolves to its encoded proteins and reported variants, a UniProt accession resolves to its stored protein record, and a MONDO disease resolves to drugs with an indication for it. The store is populated by the curated boot seed and by explicit writes through the storage API.

Parameters:

Name             Type  Description                    Default
params.seed            Starting entity identifier.    required
params.max_hops        Graph traversal depth (1–3).   required
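The effect of `max_hops` can be illustrated with a hop-limited breadth-first search over an adjacency map. This is a sketch of the general technique, assuming a BFS-style expansion; the tool's actual traversal strategy is not described on this page, and the `neighbourhood` helper and the synthetic graph are hypothetical.

```python
from collections import deque


def neighbourhood(
    edges: dict[str, list[str]], seed: str, max_hops: int
) -> dict[str, int]:
    """Return every entity within max_hops of seed, with its hop distance."""
    if not 1 <= max_hops <= 3:
        raise ValueError("max_hops must be between 1 and 3")
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # depth limit reached; do not expand further
        for neighbour in edges.get(node, []):
            if neighbour not in distance:  # first visit is the shortest path
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Synthetic graph for illustration: gene -> protein/variant -> drug -> disease.
EDGES = {
    "GENE_A": ["PROT_A", "VAR_1"],
    "PROT_A": ["DRUG_X"],
    "DRUG_X": ["DISEASE_Y"],
    "DISEASE_Y": ["DRUG_Z"],
}
```

With this graph, one hop from `GENE_A` reaches only its protein and variant, while three hops also reach `DRUG_X` and `DISEASE_Y` but stop short of `DRUG_Z`.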