EchoGraph is a mono-repository that orchestrates the ingestion, enrichment, matching, and human validation of regulatory documents against a corpus of cloud provider guidelines. It provides ready-to-run pipelines, APIs, and user interfaces to power governance and compliance mapping workflows.
/ingestion # Python ingestion workers and n8n workflow definitions
/processing # Text cleanup, chunking, embeddings, and relationship discovery
/api # FastAPI backend that exposes sections, matches, and metadata
/frontend # React single-page application for reviewers and knowledge workers
/infra # Docker, Kubernetes, and CI/CD automation
/docs # Architecture, playbooks, and tutorials
/tests # Unit and integration tests
/data # Storage location for raw, cleaned, and demo datasets
- Docker and Docker Compose for local orchestration
- Python 3.10+
- Node.js 18+
- Poetry (optional) for Python dependency management
- pnpm or npm/yarn for frontend dependency management
make bootstrap

The bootstrap script will:
- Create Python virtual environments for ingestion and processing workers
- Install FastAPI backend dependencies
- Install frontend dependencies with pnpm
- Download demo documents into data/demo_docs
docker compose up --build

Services provided:
- ingestion-worker: Executes scheduled or on-demand document ingestion jobs
- processing-worker: Cleans, chunks, and embeds documents, and writes vectors to Qdrant
- postgres: Stores canonical document sections, matches, and reviewer annotations
- qdrant: Holds document embeddings for similarity search
- api: FastAPI server serving guideline data and matches
- frontend: React app for exploring guidelines and validating matches
The stack ships with a bundled Caddy reverse proxy. When running locally you can access the
reviewer UI at https://localhost (after trusting the autogenerated certificate) and the API at
https://localhost/api. On remote hosts Caddy listens on ports 80/443 and forwards requests to the
internal frontend container. You can browse via https://<vm-ip> (recommended) or
http://<vm-ip> if you need a quick check before trusting the generated certificate. Directly
visiting http://<vm-ip>:5173 is still blocked because that port only binds to the loopback
interface inside the VM.
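
As a quick illustration of the addresses described above, the sketch below derives the reviewer UI and API URLs for a host and can probe them from Python. The helper names `stack_urls` and `probe` are hypothetical and not part of EchoGraph. The probe skips certificate verification only because the autogenerated certificate is untrusted until you import it; do not reuse that setting against production hosts.

```python
import ssl
import urllib.error
import urllib.request
from typing import Dict, Optional


def stack_urls(host: str = "localhost", scheme: str = "https") -> Dict[str, str]:
    """Return the proxied entry points for a host (hypothetical helper)."""
    base = f"{scheme}://{host}"
    return {"ui": base, "api": f"{base}/api"}


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status for url, or None if the host is unreachable."""
    # Assumption: the certificate is the untrusted autogenerated one,
    # so verification is disabled for this local check only.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, etc.
```

Calling `probe(stack_urls()["api"])` after `docker compose up --build` should return a status code once the proxy is up, and `None` while the stack is still starting.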
The reviewer UI now supports end-to-end analysis without leaving the browser:
- Upload documents directly – Drop new cloud guidelines or regulatory frameworks, and the backend will extract, segment, embed, and generate candidate matches automatically.
- Interactive footnotes – Selecting a guideline reveals inline highlights that act like live footnotes. Hovering or clicking a highlight surfaces the linked regulation text, similarity rationale, and confidence estimates.
- Context-rich inspection – The match panel summarizes rationale, excerpts, and metadata so IT teams can quickly judge whether internal guidance aligns with external obligations.
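
To make the footnote idea concrete, here is a minimal sketch of how highlighted spans could be tied to regulation text and a confidence estimate. The `Highlight` class and `annotate` function are hypothetical and do not reflect the actual frontend or API schema; the sketch also assumes highlights do not overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Highlight:
    start: int          # character offset where the highlight begins
    end: int            # character offset just past the highlight
    regulation: str     # linked regulation text shown on hover or click
    confidence: float   # confidence estimate in [0, 1]


def annotate(text: str, highlights: List[Highlight]) -> str:
    """Bracket each highlighted span and append a numbered footnote for it."""
    out, notes, cursor = [], [], 0
    ordered = sorted(highlights, key=lambda h: h.start)
    for n, h in enumerate(ordered, start=1):
        out.append(text[cursor:h.start])             # untouched text before the span
        out.append(f"[{text[h.start:h.end]}]^{n}")   # the highlighted span itself
        notes.append(f"{n}. {h.regulation} (confidence {h.confidence:.2f})")
        cursor = h.end
    out.append(text[cursor:])                        # remainder after the last span
    return "".join(out) + "\n" + "\n".join(notes)
```

In the real UI the footnote would be rendered interactively rather than as plain text; the point here is only the mapping from a text span to its linked regulation and confidence.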
- Ingestion: Python workers, orchestrated by n8n, download documents, extract text using pdfplumber, python-docx, or Apache Tika, and write raw JSONL files to data/raw.
- Processing: Cleanup and chunking pipelines normalize text and create embeddings using Sentence Transformers. Cleaned chunks and metadata are written to data/processed and mirrored into Qdrant or pgvector.
- Relationship Discovery: Matching jobs look up related regulation sections for each cloud guideline chunk, summarize the rationale with an LLM, and produce candidate matches.
- Human Validation: Reviewers validate or reject matches in the frontend UI; their decisions are persisted in PostgreSQL.
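
The stages above can be sketched end to end in a few lines of standard-library Python. This is an illustration under stated assumptions, not EchoGraph's implementation: every function name is hypothetical, a toy bag-of-words vector stands in for Sentence Transformers embeddings, an in-memory list stands in for Qdrant or pgvector, and the LLM rationale step is omitted.

```python
import json
import math
import re
from collections import Counter
from typing import Dict, List


def chunk_text(text: str, max_words: int = 40) -> List[str]:
    """Split cleaned text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def candidate_matches(chunks: List[str], sections: List[Dict], top_k: int = 1) -> List[Dict]:
    """For each guideline chunk, keep the top_k most similar regulation sections."""
    matches = []
    for chunk in chunks:
        scored = sorted(
            ((cosine(embed(chunk), embed(s["text"])), s["id"]) for s in sections),
            reverse=True,
        )
        for score, section_id in scored[:top_k]:
            matches.append({"chunk": chunk, "section_id": section_id,
                            "score": round(score, 3), "status": "pending"})
    return matches


def review(match: Dict, accepted: bool) -> Dict:
    """Record a reviewer decision (the real system persists this in PostgreSQL)."""
    return {**match, "status": "validated" if accepted else "rejected"}


def to_jsonl(records: List[Dict]) -> str:
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(r) for r in records)
```

A short usage example with made-up sample text:

```python
sections = [
    {"id": "R1", "text": "Encrypt customer data at rest"},
    {"id": "R2", "text": "Retain audit logs for one year"},
]
chunks = chunk_text("Enable encryption so customer data is protected at rest")
matches = candidate_matches(chunks, sections)
print(to_jsonl([review(m, accepted=True) for m in matches]))
```

Swapping `embed` and `cosine` for a real embedding model and a vector-store query changes the quality of the matches but not the shape of the flow: chunk, embed, rank, then hand candidates to a reviewer.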
- Architecture Overview
- Ingestion Playbook
- Human Validation Guide
- Deployment
- HTTPS Setup
- Bare-metal Ubuntu Deployment
Please read CONTRIBUTING.md for coding standards, pull request etiquette, and how to participate in the community.
EchoGraph is released under the GPL-3.0 license. See LICENSE for details.