# Architecture

> Four layers on top of one Postgres instance — connectors, ingestion, storage, query — all tenant-scoped, all inspectable.

Fabric runs entirely on **one Postgres instance**. Connectors feed an ingestion pipeline; the pipeline populates a knowledge graph, a vector store, and a memory graph; those stores back a unified query layer that serves chat, the REST API, and the graph explorer.

<Info>
  **One database.** No Elasticsearch to sync. No Neo4j to maintain. No vector DB sidecar. pgvector, tsvector, and recursive CTEs handle everything.
</Info>

## Data flow

<Frame>
  <img src="https://mintlify.s3.us-west-1.amazonaws.com/bulldogtechnologies/images/architecture-flow.svg" alt="Connectors → Ingestion → Storage → Query, all on Postgres" />
</Frame>

<CardGroup cols={4}>
  <Card title="Connectors" icon="plug" color="#3b82f6">
    Gmail, Drive, Slack, Fireflies, IMAP, Postgres, MySQL
  </Card>

  <Card title="Ingestion" icon="gears" color="#10a37f">
    Extract content · Chunk + embed · Extract graph · Extract memory
  </Card>

  <Card title="Storage" icon="database" color="#8b5cf6">
    graph\_nodes · graph\_edges · rag\_chunks · observations · connections
  </Card>

  <Card title="Query" icon="terminal" color="#f97316">
    Chat (SSE) · REST API · Graph UI
  </Card>
</CardGroup>

## Connectors layer

Each connector is a Python adapter in `backend/sources/` that implements three operations:

<Steps>
  <Step title="Discover" icon="magnifying-glass">
    Enumerate what's available — folders, channels, mailboxes, schemas.
  </Step>

  <Step title="Sync" icon="arrows-rotate">
    Incremental fetch by cursor (message ID, file modification time, thread timestamp). Each connector persists its own cursor so runs only pull deltas.
  </Step>

  <Step title="Normalize" icon="diagram-project">
    Turn raw source data into graph nodes and typed edges with metadata.
  </Step>
</Steps>
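
As a rough illustration of the three operations, the sketch below models a connector as a Python `Protocol` with a toy in-memory implementation. The names `Connector`, `Node`, `Edge`, and `InMemoryMailConnector`, along with every method signature, are assumptions made for this example; the real adapters in `backend/sources/` may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Node:
    id: str      # stable ID, so repeated syncs upsert instead of duplicating
    type: str    # e.g. "person", "email_thread"
    metadata: dict = field(default_factory=dict)


@dataclass
class Edge:
    source_id: str
    target_id: str
    relation: str  # e.g. "sent_by"


class Connector(Protocol):
    """Hypothetical interface covering discover, sync, and normalize."""

    def discover(self) -> list: ...
    def sync(self, cursor: Optional[int]) -> tuple: ...
    def normalize(self, raw: dict) -> tuple: ...


class InMemoryMailConnector:
    """Toy connector over a fixed message list; the cursor is the last message ID seen."""

    def __init__(self, messages: list):
        self._messages = sorted(messages, key=lambda m: m["id"])

    def discover(self) -> list:
        # Enumerate the mailboxes that exist in the source.
        return sorted({m["mailbox"] for m in self._messages})

    def sync(self, cursor: Optional[int]) -> tuple:
        # Return only messages newer than the cursor, plus the advanced cursor.
        fresh = [m for m in self._messages if cursor is None or m["id"] > cursor]
        new_cursor = fresh[-1]["id"] if fresh else cursor
        return fresh, new_cursor

    def normalize(self, raw: dict) -> tuple:
        # Map one raw message to graph nodes and a typed edge.
        thread = Node(f"thread:{raw['id']}", "email_thread", {"subject": raw["subject"]})
        person = Node(f"person:{raw['from']}", "person")
        return [thread, person], [Edge(thread.id, person.id, "sent_by")]
```

Calling `sync` a second time with the returned cursor yields an empty batch, which is the delta behavior the Sync step describes.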

Connectors run on Celery Beat schedules configured per connection.

| Source                 | Auth                    | Sync granularity                |
| ---------------------- | ----------------------- | ------------------------------- |
| **Gmail**              | OAuth 2.0               | Per message / thread            |
| **Google Drive**       | OAuth 2.0               | Per file modification time      |
| **Slack**              | OAuth 2.0               | Per channel timestamp           |
| **Fireflies**          | OAuth 2.0               | Per transcript ID               |
| **IMAP**               | Credentials (encrypted) | Per message UID                 |
| **PostgreSQL / MySQL** | Credentials (encrypted) | On-demand (schema + query exec) |

## Ingestion pipeline

<Tabs>
  <Tab title="Content extraction">
    Strip formatting, unwrap HTML, handle attachments, normalize to plain text.

    The content adapter produces a `SourceItem` per content unit with stable IDs for deduplication across syncs.
  </Tab>

  <Tab title="Entity + relationship extraction">
    The adapter emits typed graph nodes (`person`, `domain`, `email_thread`, `slack_message`, `meeting`, `file`, `folder`) and typed edges (`sent_by`, `attended`, `replied_to`, etc.) that upsert into `graph_nodes` and `graph_edges`.
  </Tab>

  <Tab title="Chunking + embedding">
    Content splits on structural boundaries and is embedded with OpenAI `text-embedding-3-small` (1536 dimensions). Chunks land in `rag_chunks` with an HNSW index on the embedding and a `tsvector` column for full-text keyword search.
  </Tab>

  <Tab title="Memory extraction">
    On conversations only. Claude Haiku extracts typed observations from question-answer pairs. Observations land in `observations` with embeddings and initial importance 0.5.
  </Tab>
</Tabs>

<Tip>
  Each stage is independently re-runnable. Regenerate embeddings without rebuilding the graph. Re-extract observations without re-chunking. The pipeline is checkpointed per-item.
</Tip>
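
One way to picture per-item checkpointing: each stage writes its output under its own key in a per-item record, and a stage reruns only when its key is missing or it is explicitly forced. This is a minimal sketch under that assumption; `run_pipeline`, the stage names, and the in-memory `store` are hypothetical, and the embedding stage is a placeholder rather than a real model call.

```python
def chunk(record: dict) -> list:
    # Split on blank lines as a stand-in for "structural boundaries".
    return [p.strip() for p in record["content"].split("\n\n") if p.strip()]


def embed(record: dict) -> list:
    # Placeholder vectors (chunk length only); a real stage would call an embedding model.
    return [[float(len(c))] for c in record["chunks"]]


STAGES = [("chunks", chunk), ("embeddings", embed)]


def run_pipeline(item_id: str, content: str, stages: list, store: dict,
                 force: frozenset = frozenset()) -> list:
    """Run stages in order for one item and return the names of stages that actually ran."""
    record = store.setdefault(item_id, {"content": content})
    ran = []
    for name, stage in stages:
        if name in record and name not in force:
            continue  # checkpoint hit: this stage's output already exists for the item
        record[name] = stage(record)
        ran.append(name)
    return ran
```

With this shape, passing `force=frozenset(["embeddings"])` regenerates embeddings from the stored chunks without re-chunking.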

## Storage layer

One Postgres database, with the `pgvector` extension for embeddings and the built-in `tsvector` type for full-text search.

| Table           | Role                                        | Key indexes                                               |
| --------------- | ------------------------------------------- | --------------------------------------------------------- |
| `graph_nodes`   | Typed entities extracted from content       | HNSW(embedding), GIN(search\_vector), btree(source\_date) |
| `graph_edges`   | Typed relationships with weight + timestamp | btree(source\_id), btree(target\_id), btree(relation)     |
| `rag_chunks`    | Text chunks for retrieval                   | HNSW(embedding), GIN(search\_vector)                      |
| `observations`  | Typed memory facts with importance decay    | HNSW(embedding), GIN(search\_vector), btree(importance)   |
| `connections`   | Per-tenant connector credentials (AES-256)  | btree(tenant\_id)                                         |
| `chat_sessions` | Conversation history + SDK session mapping  | btree(tenant\_id, updated\_at)                            |
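
Because edges are plain rows, multi-hop traversal can be expressed as a recursive CTE. The sketch below runs one against an in-memory SQLite database purely so that it is self-contained; the column subset, the `walk_edges` helper, and the sample rows are assumptions, and a Postgres version would use its driver's own parameter placeholders.

```python
import sqlite3

WALK_SQL = """
WITH RECURSIVE walk(node_id, depth) AS (
    SELECT ?, 0                               -- start node
    UNION
    SELECT e.target_id, w.depth + 1           -- follow outgoing edges one hop
    FROM graph_edges e
    JOIN walk w ON e.source_id = w.node_id
    WHERE e.tenant_id = ? AND w.depth < ?     -- tenant filter and hop limit
)
SELECT node_id, MIN(depth) AS hops
FROM walk
WHERE depth > 0
GROUP BY node_id
ORDER BY hops, node_id
"""


def walk_edges(conn, tenant_id: str, start: str, max_hops: int) -> list:
    """Return (node_id, hops) for every node reachable from start within max_hops."""
    return conn.execute(WALK_SQL, (start, tenant_id, max_hops)).fetchall()


def demo_connection():
    # Tiny illustrative graph; only a subset of the real table's columns is modeled.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE graph_edges (tenant_id TEXT, source_id TEXT, target_id TEXT, relation TEXT)"
    )
    conn.executemany(
        "INSERT INTO graph_edges VALUES (?, ?, ?, ?)",
        [
            ("t1", "thread:2", "thread:1", "replied_to"),
            ("t1", "thread:1", "person:ann", "sent_by"),
            ("t1", "person:ann", "meeting:7", "attended"),
            ("t2", "thread:1", "person:eve", "sent_by"),
        ],
    )
    return conn
```

The hop limit also bounds the recursion if the graph contains cycles, and the tenant filter sits inside the recursive step so a walk never crosses tenants.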

<Warning>
  Every queryable table has `tenant_id` and `project_id` columns. Tenant isolation is enforced at query time — there's no query path that doesn't filter on tenant.
</Warning>
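
A common way to make that guarantee structural is to route every read through a helper that cannot build a query without the tenant predicates. The sketch below is an assumption about how such a helper could look, not Fabric's actual code; `scoped_select`, the table allowlist, and the `%s` placeholder style are illustrative choices.

```python
# Hypothetical allowlist of tables that carry tenant_id and project_id.
TENANT_TABLES = {"graph_nodes", "graph_edges", "rag_chunks", "observations"}


def scoped_select(table: str, tenant_id: str, project_id: str,
                  where: str = "", params: tuple = ()) -> tuple:
    """Build a parameterized SELECT that always filters on tenant and project.

    `where` is a trusted SQL fragment written by the caller; values must go in `params`.
    """
    if table not in TENANT_TABLES:
        raise ValueError(f"not a tenant-scoped table: {table}")
    if not tenant_id or not project_id:
        raise ValueError("tenant_id and project_id are required")
    sql = f"SELECT * FROM {table} WHERE tenant_id = %s AND project_id = %s"
    if where:
        sql += f" AND ({where})"
    return sql, (tenant_id, project_id, *params)
```

Wrapping the caller's condition in parentheses keeps an `OR` inside it from escaping the tenant filter.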

## Query layer

<CardGroup cols={3}>
  <Card title="Chat" icon="message" color="#10a37f">
    A Claude Agent SDK session with a tool surface — `search_knowledge`, `follow_edges`, `query_database`, `http_request`, and more. Streams as SSE with per-token deltas, tool calls, and cost accounting.
  </Card>

  <Card title="REST API" icon="terminal" color="#06b6d4">
    72+ endpoints in 10 modules. FastAPI auto-generates OpenAPI docs at `/docs` (Swagger) and `/redoc`. Bearer token auth; API keys for programmatic access.
  </Card>

  <Card title="Graph Explorer" icon="diagram-project" color="#8b5cf6">
    React SPA that renders `graph_nodes` and `graph_edges` via force-directed layout. Debug extraction, audit relationships, explore what Fabric has learned.
  </Card>
</CardGroup>
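
For readers unfamiliar with SSE, each message on the wire is an optional `event:` line, one or more `data:` lines, and a terminating blank line. The sketch below shows that framing for a chat stream; the event names `delta`, `tool_call`, and `done` and the payload fields are assumptions for illustration, not Fabric's actual event schema.

```python
import json
from typing import Iterable, Iterator


def sse_event(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame: event name, a JSON data line, blank line."""
    # json.dumps emits no raw newlines by default, so the payload fits on one data line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_reply(steps: Iterable[tuple], cost_usd: float) -> Iterator[str]:
    """Yield frames for token deltas and tool calls, then a final frame with the cost."""
    for kind, payload in steps:
        if kind == "token":
            yield sse_event("delta", {"text": payload})
        elif kind == "tool":
            yield sse_event("tool_call", {"name": payload})
    yield sse_event("done", {"cost_usd": cost_usd})
```

A client reads frames until the blank line, dispatches on the event name, and appends each delta's text to the visible reply.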

## Event-driven processing

Heavy work — sync, extraction, embedding, agent execution — runs in Celery workers. The API never blocks on long operations.

* **Sync tasks** fire on Celery Beat schedules configured per connection.
* **Extraction and embedding** chain per item so they can be scaled and retried independently.
* **Agent runs** execute SDK sessions inside worker processes with session persistence on a workspace filesystem.
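
To make the per-connection scheduling concrete, the sketch below builds a dictionary in the shape Celery documents for its `beat_schedule` setting, where each entry names a task, a schedule, and its arguments. The task name `sources.sync_connection`, the `interval_minutes` field, and the `build_beat_schedule` helper are hypothetical; how Fabric actually registers schedules is not shown in this page.

```python
from datetime import timedelta


def build_beat_schedule(connections: list) -> dict:
    """Return one beat_schedule entry per connection, keyed by a unique entry name."""
    schedule = {}
    for conn in connections:
        schedule[f"sync-{conn['id']}"] = {
            "task": "sources.sync_connection",  # hypothetical task name
            "schedule": timedelta(minutes=conn["interval_minutes"]),
            "args": (conn["id"],),              # the task receives the connection ID
        }
    return schedule
```

The result could be assigned to a Celery app's `conf.beat_schedule`; keeping the builder free of Celery imports makes it easy to unit test.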

## Observability

<Info>
  Langfuse traces every LLM call, embedding operation, and tool invocation. Per-query cost surfaces in the chat UI. Structured logs with `tenant_id` + `project_id` context. Prometheus-compatible metrics at `/metrics`.
</Info>
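
As an illustration of structured logs that carry tenant context, the sketch below uses only the standard library: a JSON formatter plus a `LoggerAdapter` that stamps `tenant_id` and `project_id` on every record. The logger name, field set, and `tenant_logger` helper are assumptions; Fabric's real logging setup may differ.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Set by the LoggerAdapter below; None if logged without context.
            "tenant_id": getattr(record, "tenant_id", None),
            "project_id": getattr(record, "project_id", None),
        })


def tenant_logger(stream, tenant_id: str, project_id: str) -> logging.LoggerAdapter:
    """Return a logger that attaches tenant and project IDs to every message."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    base = logging.getLogger("fabric.example")  # hypothetical logger name
    base.handlers = [handler]
    base.setLevel(logging.INFO)
    base.propagate = False
    return logging.LoggerAdapter(base, {"tenant_id": tenant_id, "project_id": project_id})
```

Binding the context once per request or task means individual log calls cannot forget to include it.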

Most competitors are black boxes. Fabric shows its work.

## Deployment

<Tabs>
  <Tab title="Local / Docker Compose">
    ```bash
    docker compose up
    ```

    Brings up Postgres, Redis, API, worker, and frontend. Useful for dev or single-user deployments.
  </Tab>

  <Tab title="AWS ECS Fargate">
    ECS Fargate for API and worker, RDS Postgres, ElastiCache Redis, ALB fronting the API, CloudFront fronting the S3-hosted frontend. Secrets in SSM Parameter Store, injected at container start.

    Infrastructure-as-code via AWS Copilot manifests in `copilot/`. Deployment script is `scripts/deploy.sh`.
  </Tab>
</Tabs>

<CardGroup cols={2}>
  <Card title="How It Works" icon="diagram-project" href="/how-it-works">
    The engineering behind each layer.
  </Card>

  <Card title="Knowledge Graph" icon="share-nodes" href="/knowledge-graph">
    Typed edges, multi-hop traversal, SQL you can run directly.
  </Card>
</CardGroup>
