OPERATOR research v1.0.0 · Apache-2.0

Graphify

Build knowledge graphs from codebases, docs, papers, and images using Graphify. Produces interactive HTML visualization, queryable JSON, and audit report with god nodes, communities, and surprising connections. Use when analyzing unfamiliar repos, understanding architecture, or building structured context before a complex task.


Install in your agent

Tell your agent: "install the recipes skill, then add graphify"
Or via curl: curl -sL https://recipes.wisechef.ai/skill -o ~/.claude/skills/recipes/SKILL.md


Graphify — Code/Doc → Knowledge Graph

  • Install: pipx install 'graphifyy[all]'
  • Binary: graphify (CLI for install/hooks/benchmark only)
  • Python: ${HOME}/.local/share/pipx/venvs/graphifyy/bin/python
  • Repo: https://github.com/safishamsi/graphify (5k stars, v0.3.6)

When to Use

  • Analyzing unfamiliar codebases before modifying them
  • Understanding architecture and cross-component relationships
  • Building structured context for a complex multi-file task
  • Finding "god nodes" (most connected abstractions) and surprising connections
  • Pre-loading knowledge before running AutoAgent or similar optimization loops
  • Comparing different tool options: use Graphify for docs+code+images, GitNexus for pure code call graphs

When NOT to Use

  • Repo <5 files and <50K words — just read the files directly (Graphify will warn you)
  • Pure code-only analysis — GitNexus (gitnexus analyze) is faster for call graphs
  • You need real-time file watching — use gitnexus with MCP instead

Modes

Graphify has two modes depending on your content type:

Code Mode (default)

For codebases — extracts functions, classes, imports, call graphs. Best for understanding unfamiliar repos.

Knowledge Mode (--mode knowledge)

For prose-heavy directories — Obsidian vaults, documentation, research papers, meeting notes. Extracts domain concepts instead of code symbols.

When to use knowledge mode:

  • Obsidian vaults (.md/.txt heavy)
  • Documentation directories
  • Research paper collections
  • Meeting notes or decision logs
  • Any corpus where the value is in concepts and their relationships, not code structure

Why it matters: Running code mode on prose produces function-level noise (e.g., main(), frontmatter() as god nodes) instead of conceptual structure (e.g., "WiseChef", "Agent Coordination", "Cognee" as god nodes).

Architecture

Graphify runs in two passes (code mode) or one pass (knowledge mode):

  1. AST pass (deterministic, free) — Tree-sitter extracts classes, functions, imports, call graphs from code files. Skipped in knowledge mode.
  2. Semantic pass (LLM, costs tokens) — Extracts concepts and relationships:
    • Code mode: Extracts technical concepts from docs, papers, images
    • Knowledge mode: Extracts domain concepts, entities, and cross-document relationships from all files

Results merge into a NetworkX graph → Leiden community detection → outputs:

  • graphify-out/graph.html — interactive visualization (click nodes, search, filter)
  • graphify-out/graph.json — persistent queryable graph
  • graphify-out/GRAPH_REPORT.md — god nodes, communities, surprising connections, knowledge gaps
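As a quick sanity check on the outputs, you can inspect graphify-out/graph.json with nothing but the standard library. The snippet below is a minimal sketch that assumes graph.json carries "nodes" and "edges" lists shaped like the node/edge dicts used elsewhere in this skill; the sample data here is illustrative, not from a real run.

```python
from collections import Counter

# Illustrative stand-in for json.loads(Path('graphify-out/graph.json').read_text());
# the exact schema produced by to_json() is an assumption.
graph = {
    "nodes": [{"id": "a", "label": "A"}, {"id": "b", "label": "B"}, {"id": "c", "label": "C"}],
    "edges": [{"source": "a", "target": "b"}, {"source": "a", "target": "c"}],
}

# Count how many edges touch each node (undirected degree)
degree = Counter()
for e in graph["edges"]:
    degree[e["source"]] += 1
    degree[e["target"]] += 1

# The highest-degree nodes are the candidate god nodes
top = degree.most_common(3)
print(top)  # [('a', 2), ('b', 1), ('c', 1)]
```

On a real graph.json, swap the literal dict for `json.loads(...)` and the top entries should roughly match the god nodes section of GRAPH_REPORT.md.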

Quick Start (Inside a Coding Agent)

If running from Claude Code, Codex, or OpenClaw with Agent/subagent support:

/graphify /path/to/repo

This uses the skill.md workflow with parallel subagent dispatch for semantic extraction. Fastest and most complete.

Running from Hermes (Without Subagent Dispatch)

Hermes doesn't have the Agent tool that Graphify expects for parallel semantic extraction. Use the Python API directly:

Step 1: Detect Files

PYTHON=${HOME}/.local/share/pipx/venvs/graphifyy/bin/python
$PYTHON -c "
import json
from graphify.detect import detect
from pathlib import Path
result = detect(Path('/path/to/repo'))
print(json.dumps(result, indent=2))
"

Check total_files, total_words, and needs_graph. If corpus is tiny, just read the files.
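The gating logic from "When NOT to Use" can be mirrored in a few lines. This is a sketch against an example detect() result (the values below are made up), applying the same <5 files / <50K words threshold the skill describes.

```python
# Example detect() output -- illustrative values, not from a real run
result = {"total_files": 4, "total_words": 30000, "needs_graph": False}

# Mirror the skill's rule: tiny corpora don't need a graph
tiny = result["total_files"] < 5 and result["total_words"] < 50000
if tiny or not result["needs_graph"]:
    print("Corpus is small: read the files directly instead of building a graph.")
```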

Step 2: AST Extraction (Code Files)

Write a script file rather than passing inline Python with f-strings on the command line; shell escaping breaks them:

# save as _run_ast.py in the repo
import json
from graphify.extract import collect_files, extract
from pathlib import Path

code_files = [Path(f) for f in [
    '/path/to/repo/file1.py',
    '/path/to/repo/file2.py',
]]
result = extract(code_files)
Path('.graphify_ast.json').write_text(json.dumps(result, indent=2))
print(f"AST: {len(result['nodes'])} nodes, {len(result['edges'])} edges")

Step 3: Semantic Extraction (Manual)

Since we can't dispatch subagents, read the doc files yourself and create semantic nodes/edges manually:

semantic_nodes = [
    {"id": "unique_id", "type": "concept", "label": "Human Label",
     "description": "What this is", "source": "file.md", "provenance": "EXTRACTED"},
]
semantic_edges = [
    {"source": "node_a", "target": "node_b", "relation": "uses",
     "provenance": "EXTRACTED"},  # or "INFERRED" with confidence
]

Node types: concept, artifact, process, config, constraint, directive, dependency, benchmark, boundary

Provenance: EXTRACTED (explicit in source), INFERRED (reasonable inference, add confidence: 0.0-1.0), AMBIGUOUS (uncertain)
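Since hand-built nodes and edges skip Graphify's own extraction path, it is easy to typo a type, forget a confidence on an INFERRED edge, or reference a node that does not exist. The helper below is my own sketch (not part of the graphify API) that checks the conventions listed above before you merge the semantic layer.

```python
# Allowed values, taken from the node types and provenance list above
NODE_TYPES = {"concept", "artifact", "process", "config", "constraint",
              "directive", "dependency", "benchmark", "boundary"}
PROVENANCE = {"EXTRACTED", "INFERRED", "AMBIGUOUS"}

def validate(nodes, edges):
    """Return a list of problems found in hand-built semantic nodes/edges."""
    problems = []
    ids = {n["id"] for n in nodes}
    for n in nodes:
        if n["type"] not in NODE_TYPES:
            problems.append(f"node {n['id']}: unknown type {n['type']}")
        if n["provenance"] not in PROVENANCE:
            problems.append(f"node {n['id']}: bad provenance {n['provenance']}")
    for e in edges:
        if e["source"] not in ids or e["target"] not in ids:
            problems.append(f"edge {e['source']}->{e['target']}: dangling endpoint")
        if e["provenance"] == "INFERRED" and "confidence" not in e:
            problems.append(f"edge {e['source']}->{e['target']}: INFERRED needs confidence")
    return problems

nodes = [{"id": "a", "type": "concept", "provenance": "EXTRACTED"}]
edges = [{"source": "a", "target": "b", "relation": "uses", "provenance": "INFERRED"}]
print(validate(nodes, edges))  # two problems: dangling target, missing confidence
```

Run this on your `semantic_nodes`/`semantic_edges` before Step 4; an empty list means the layer is structurally sound.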

Step 3b: Knowledge Mode Extraction (prose/docs)

When running on prose-heavy content (Obsidian vaults, docs, papers), skip the AST pass entirely and extract domain-level concepts as nodes with these types:

semantic_nodes = [
    {"id": "wisechef", "type": "project", "label": "WiseChef",
     "description": "AI platform for autonomous agents, €248/mo MRR", "source": "business-strategy.md", "provenance": "EXTRACTED"},
    {"id": "agent_coordination", "type": "concept", "label": "Agent Coordination Protocol",
     "description": "Multi-agent coordination via Discord markers and ACK system", "source": "agent-map.md", "provenance": "EXTRACTED"},
    {"id": "cognee", "type": "technology", "label": "Cognee",
     "description": "Knowledge graph engine for AI memory", "source": "tools-skills-inventory.md", "provenance": "EXTRACTED"},
    {"id": "adam", "type": "person", "label": "Adam Krawczyk",
     "description": "Founder, Kraków-based, late-night worker", "source": "contacts.md", "provenance": "EXTRACTED"},
]

semantic_edges = [
    {"source": "wisechef", "target": "agent_coordination", "relation": "uses",
     "provenance": "EXTRACTED"},
    {"source": "wisechef", "target": "cognee", "relation": "depends_on",
     "provenance": "EXTRACTED"},
    {"source": "wisechef", "target": "adam", "relation": "owned_by",
     "provenance": "EXTRACTED"},
    {"source": "agent_coordination", "target": "cognee", "relation": "stores_state_in",
     "provenance": "INFERRED", "confidence": 0.7},
]

Knowledge mode node types:

| Type | Use For | Example |
|---|---|---|
| project | Named projects/products | WiseChef, AgentPact, WiseVision |
| concept | Abstract ideas, protocols | Agent Coordination, Zero-Touch Deals |
| technology | Tools, frameworks, infra | Cognee, Honcho, Hetzner, Docker |
| person | Named individuals | Adam, Olek, Mariusz |
| organization | Companies, teams | WiseChef, MIT |
| decision | Key decisions made | "use GLM for entity extraction" |
| process | Workflows, pipelines | Nightly Ingest, Graphify nightly |

Knowledge mode edge relations: depends_on, uses, owned_by, stores_state_in, relates_to, builds_on, contradicts, is_part_of, managed_by, competes_with

Key principle: Prioritize cross-document connections over within-document structure. The value of a knowledge graph is revealing relationships that aren't visible from reading any single document.
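One way to keep yourself honest about this principle is to measure it. The helper below is a hypothetical check (not part of graphify) that uses each node's `source` field to compute what fraction of your edges cross document boundaries.

```python
def cross_document_share(nodes, edges):
    """Fraction of edges connecting concepts extracted from different source files."""
    source_of = {n["id"]: n.get("source") for n in nodes}
    cross = sum(1 for e in edges
                if source_of.get(e["source"]) != source_of.get(e["target"]))
    return cross / len(edges) if edges else 0.0

nodes = [
    {"id": "wisechef", "source": "business-strategy.md"},
    {"id": "cognee", "source": "tools-skills-inventory.md"},
    {"id": "mrr", "source": "business-strategy.md"},
]
edges = [
    {"source": "wisechef", "target": "cognee"},   # cross-document
    {"source": "wisechef", "target": "mrr"},      # within-document
]
print(cross_document_share(nodes, edges))  # 0.5
```

If the share comes back very low, expect disconnected communities in the clustered graph; go back and look for relationships between documents before building.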

Custom prompt for LLM extraction (if using subagents):

"Extract domain concepts, named entities, project names, technologies, and abstract ideas as nodes. Extract relationships like 'depends on', 'is part of', 'relates to', 'contradicts', 'builds on' as edges. Prioritize cross-document connections. Do NOT extract code-level artifacts (functions, classes, imports) — only domain-level concepts."

Step 4: Build Graph + Cluster + Export

import json
from pathlib import Path
from graphify.build import build_from_json
from graphify.cluster import cluster, score_all
from graphify.analyze import god_nodes, surprising_connections, suggest_questions
from graphify.report import generate
from graphify.export import to_html, to_json

# Merge AST + semantic
ast = json.loads(Path('.graphify_ast.json').read_text())
merged = {"nodes": ast['nodes'] + semantic_nodes,
          "edges": ast['edges'] + semantic_edges,
          "input_tokens": 0, "output_tokens": 0}

# Build → Cluster → Analyze
G = build_from_json(merged)
communities = cluster(G)  # Returns dict[int, list[str]]
cohesion = score_all(G, communities)

# Community labels
community_labels = {}
for cid, members in communities.items():
    labels = [G.nodes[m].get('label', m) for m in members if G.has_node(m)]
    community_labels[cid] = labels[0] if labels else f"Community {cid}"

god_list = god_nodes(G)
surprise_list = surprising_connections(G)
questions = suggest_questions(G, communities, community_labels)

# Export
OUT = Path('graphify-out')
OUT.mkdir(exist_ok=True)
to_json(G, communities, str(OUT / 'graph.json'))
to_html(G, communities, str(OUT / 'graph.html'))

# Report (replace each N with the counts from Step 1's detect() output)
REPO = Path('/path/to/repo')
detection_result = {"files": {"code": N, "document": N}, "total_files": N, "total_words": N}
token_cost = {"input_tokens": 0, "output_tokens": 0}
report = generate(G, communities, cohesion, community_labels, god_list,
                  surprise_list, detection_result, token_cost, str(REPO), questions)
(OUT / 'GRAPH_REPORT.md').write_text(report)

API Reference (Discovered — Not Documented)

| Module | Key Functions | Notes |
|---|---|---|
| graphify.detect | detect(path: Path) -> dict | Returns files, total_words, needs_graph |
| graphify.extract | extract(files: list[Path]) -> dict, collect_files(path) -> list | AST extraction, returns nodes+edges |
| graphify.cache | check_semantic_cache(files) -> (nodes, edges, hyperedges, uncached) | SHA256-based cache |
| graphify.build | build_from_json(data: dict) -> nx.Graph | NOT build_graph |
| graphify.cluster | cluster(G) -> dict[int, list[str]], score_all(G, communities) -> dict[int, float] | Leiden community detection |
| graphify.analyze | god_nodes(G), surprising_connections(G), suggest_questions(G, communities, labels) | Graph analysis |
| graphify.report | generate(G, communities, cohesion, labels, gods, surprises, detect, cost, root, questions) | Markdown report |
| graphify.export | to_html(G, communities, path), to_json(G, communities, path), to_svg, to_graphml, to_cypher | Multiple formats |

Always-On Integration

After building a graph, install the always-on hook so your agent reads the graph before grepping:

cd /path/to/repo
graphify claude install    # Adds CLAUDE.md section + PreToolUse hook
# or: graphify codex install / graphify claw install

Querying an Existing Graph

/graphify query "what connects X to Y?"          # BFS traversal
/graphify query "what connects X to Y?" --dfs    # DFS — trace specific path
/graphify path "NodeA" "NodeB"                    # Shortest path
/graphify explain "SomeNode"                      # Plain-language explanation
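Outside an agent, the BFS-style "what connects X to Y" query can be approximated directly over the edge list in graph.json. This is my own sketch of the traversal, not the skill's implementation; it treats edges as undirected and returns one shortest node path.

```python
from collections import deque

def connecting_path(edges, start, goal):
    """BFS over undirected edges; returns one shortest node path, or None."""
    adj = {}
    for e in edges:
        adj.setdefault(e["source"], set()).add(e["target"])
        adj.setdefault(e["target"], set()).add(e["source"])
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Illustrative edges; on a real graph, load them from graphify-out/graph.json
edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
print(connecting_path(edges, "A", "C"))  # ['A', 'B', 'C']
```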

Pitfalls

  1. PyPI package is graphifyy (double y) — the graphify name is being reclaimed
  2. Don't use f-strings in terminal for the Python scripts — shell escaping breaks. Write script files instead.
  3. build_from_json not build_graph — the function name differs from what you'd expect
  4. cluster() returns dict[int, list[str]] not a modified graph — you pass communities as a separate arg to export functions
  5. to_json/to_html take 3 args (G, communities, path_string), not just (G, path)
  6. suggest_questions needs community_labels — compute labels before calling it
  7. Semantic extraction without subagents requires manual node/edge creation — read the docs yourself and build the semantic layer by hand
  8. AST rationale nodes get warnings about invalid file_type='rationale' — these are cosmetic, the graph still builds correctly
  9. Large repos (>200 files): detection will suggest running on a subfolder. Respect this.
  10. The graphify CLI is just for install/hooks/benchmark — the actual pipeline runs inside a coding agent or via Python API
  11. Running code mode on prose produces noise — function-level nodes (main(), frontmatter()) instead of domain concepts. Always use knowledge mode for .md/.txt/.rst heavy directories like Obsidian vaults.
  12. Knowledge mode graphs need cross-document edges — within-document edges alone produce disconnected communities. Spend 50% of extraction effort on finding connections between documents.

Running on Our Obsidian Vault

# First, detect what we're working with
PYTHON=${HOME}/.local/share/pipx/venvs/graphifyy/bin/python
$PYTHON -c "
from graphify.detect import detect
from pathlib import Path
r = detect(Path('${HOME}/obsidian-vault'))
print(f'Files: {r[\"total_files\"]}, Words: {r[\"total_words\"]}')
"

Then follow the knowledge mode workflow (Step 3b) — read each .md file, extract domain concepts as nodes, create cross-document edges. Skip AST pass entirely (no code to analyze in the vault).

Expected output for our vault (~79 notes, ~52K words):

  • Nodes: ~100-150 domain concepts (projects, technologies, people, decisions, processes)
  • Edges: ~150-250 cross-document relationships
  • God nodes: WiseChef, Cognee, Adam, Agent Coordination, Infrastructure (not function names)
  • Communities: Business strategy, Agent infra, Knowledge systems, Client operations, Development tools

Verification

which graphify                                    # Should show ~/.local/bin/graphify
PYTHON=${HOME}/.local/share/pipx/venvs/graphifyy/bin/python
$PYTHON -c "import graphify; print('OK')"         # Should print OK
$PYTHON -c "from graphify.detect import detect; print('detect OK')"
$PYTHON -c "from graphify.build import build_from_json; print('build OK')"