# How It Works

> The engineering behind Fabric's four layers: typed knowledge graph, fused retrieval via RRF, semantic memory with decay, and first-class database connections.

Fabric stacks four layers that build on each other. Each is a moat on its own. Together they produce **cross-source, time-aware, source-cited answers** that no single layer can produce alone.

## Layer 1 — The Knowledge Graph

As content syncs from your connectors, an extraction pipeline identifies entities and relationships. Entities become typed nodes. Relationships become typed edges with weights and timestamps.

### Edge types extracted today

| Relation       | Meaning                      | Example                                                       |
| -------------- | ---------------------------- | ------------------------------------------------------------- |
| `sent_by`      | Message authored by a person | Slack message → user who posted it                            |
| `replied_to`   | Response relationship        | Email → the email it replies to                               |
| `in_thread`    | Message belongs to a thread  | Email → its parent thread node                                |
| `posted_in`    | Message posted in a channel  | Slack message → `#billing` channel                            |
| `attended`     | Person attended a meeting    | Fireflies transcript → person                                 |
| `organized_by` | Person organized the meeting | Meeting → organizer                                           |
| `participant`  | Person involved in a thread  | Email thread → each participant                               |
| `in_folder`    | File lives in a folder       | Drive file → parent folder                                    |
| `from_domain`  | Person's email domain        | Person → domain node                                          |
| `has_email`    | Person → their email address | Person → `person:email@company.com`                           |

### Why typed edges

<Info>
  A vector store knows `"Cole Smith"` is near `"Project Phoenix"` in embedding space. It doesn't know **why**. The graph knows Cole `attended` the Phoenix kickoff, `sent_by` three emails about the launch, and `replied_to` the legal review thread on April 7.
</Info>

Typed edges turn "find similar text" into "reason about who, when, and why."
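
To make the contrast concrete, here is a minimal sketch of typed edges as plain Python data. The `Edge` class, the node ids, the weights, and the `relations_for` helper are all hypothetical illustrations, not Fabric's actual schema. Edge directions follow the table above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Edge:
    source: str    # node the edge starts from
    relation: str  # edge type from the table above
    target: str    # node the edge points to
    weight: float
    at: date       # when the relationship was observed


# Hypothetical sample data: ids, weights, and the year are made up.
EDGES = [
    Edge("meeting:phoenix-kickoff", "attended", "person:cole", 1.0, date(2025, 4, 2)),
    Edge("email:launch-update", "sent_by", "person:cole", 1.0, date(2025, 4, 3)),
    Edge("email:legal-reply", "sent_by", "person:cole", 1.0, date(2025, 4, 7)),
    Edge("email:legal-reply", "replied_to", "email:legal-review", 1.0, date(2025, 4, 7)),
]


def relations_for(node_id, edges):
    """Return (date, relation, other node) for every edge touching node_id, oldest first."""
    hits = []
    for e in edges:
        if node_id == e.source:
            hits.append((e.at, e.relation, e.target))
        elif node_id == e.target:
            hits.append((e.at, e.relation, e.source))
    return sorted(hits)


for at, relation, other in relations_for("person:cole", EDGES):
    print(at.isoformat(), relation, other)
```

Each row answers who, when, and through which relationship, which a similarity score alone cannot express.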

### Multi-hop traversal

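The reasoning loop walks 2–3 hops out from a seed node via a recursive CTE. The example below stops at two hops and follows edges in both directions:

```sql theme={null}
WITH RECURSIVE reachable AS (
  -- $seed: id of the starting node
  SELECT id, 0 AS depth FROM graph_nodes WHERE id = $seed
  UNION
  SELECT n.id, r.depth + 1
  FROM reachable r
  JOIN graph_edges e ON e.source_id = r.id OR e.target_id = r.id
  JOIN graph_nodes n ON n.id = CASE WHEN e.source_id = r.id
                                    THEN e.target_id ELSE e.source_id END
  WHERE r.depth < 2  -- raise to 3 for a three-hop walk
)
-- A node can be reached at several depths; keep the shortest path to each.
SELECT id, MIN(depth) AS depth FROM reachable GROUP BY id;
```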

The graph lives in Postgres. You can `SELECT` against it, join it to your operational data, and inspect it in any Postgres client.
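
For readers who want to see the traversal logic outside SQL, the sketch below runs the same bounded, undirected walk as the recursive CTE above over an in-memory edge list. The function name `reachable_within` and the sample node ids are hypothetical; this is an illustration of the technique (breadth-first search with a depth limit), not Fabric's code.

```python
from collections import deque


def reachable_within(seed, edges, max_depth=2):
    """Map each node reachable from seed within max_depth hops to its minimum hop count."""
    # Build an undirected adjacency map, mirroring the CTE's
    # "source_id = r.id OR target_id = r.id" join.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:  # first visit is the shortest path in BFS
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


# Hypothetical sample graph: a chain of four edges starting at person:cole.
SAMPLE_EDGES = [
    ("person:cole", "meeting:kickoff"),
    ("meeting:kickoff", "person:dana"),
    ("person:dana", "email:legal"),
    ("email:legal", "thread:legal"),
]

print(reachable_within("person:cole", SAMPLE_EDGES))
```

With the default limit of two hops, `email:legal` and `thread:legal` are three and four hops away and are left out.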

## Layer 2 — Fused Retrieval via Reciprocal Rank Fusion

Every query runs two rankers in parallel over the `graph_nodes` and `observations` tables:

<CardGroup cols={2}>
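  <Card title="Keyword relevance" icon="spell-check">
    Postgres full-text search ranked with `ts_rank_cd` (cover-density ranking) on `tsvector` columns. This is not BM25 in the strict sense, since core Postgres has no BM25 ranking function; it plays the keyword-ranker role, and the `bm25` names in the formula and code on this page refer to it. Title weighted `A`, body weighted `B`. Queries are parsed via `websearch_to_tsquery` for safe handling of punctuation.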
  </Card>

  <Card title="Vector similarity" icon="vector-square">
    pgvector HNSW indexes with cosine distance. Embedding model: OpenAI `text-embedding-3-small` (1536 dimensions).
  </Card>
</CardGroup>

Results fuse by **rank position** via Reciprocal Rank Fusion with `k = 60`:

```
rrf_score = 1 / (60 + vec_rank) + 1 / (60 + bm25_rank)
```
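
A small worked example of the formula, assuming two already-ranked lists of ids (best first). The function name `rrf_fuse` and the document ids are hypothetical. A document missing from one list contributes nothing from that list.

```python
def rrf_fuse(vec_ids, bm25_ids, k=60):
    """Fuse two ranked id lists (best first) by Reciprocal Rank Fusion."""
    scores = {}
    for ranked in (vec_ids, bm25_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list adds 1 / (k + rank); ranks start at 1.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vec = ["doc_a", "doc_b", "doc_c"]    # vector ranker's order
bm25 = ["doc_b", "doc_d", "doc_a"]   # keyword ranker's order

for doc_id, score in rrf_fuse(vec, bm25):
    print(f"{doc_id}  {score:.6f}")
```

Here `doc_b` (ranks 2 and 1) edges out `doc_a` (ranks 1 and 3), and both beat `doc_d` and `doc_c`, which each appear in only one list.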

### Why RRF and not a weighted sum

<Warning>
  Cosine similarity is bounded to `[-1, 1]`, and in practice most matches score around 0.3–0.7. `ts_rank_cd` is unbounded and usually falls in 0.01–0.3. Adding them with fixed weights means **vector dominates almost every query**: keyword matches on rare terms (names, table identifiers, proper nouns) get crowded out.
</Warning>

RRF sidesteps this entirely. It uses each ranker's *opinion* about ordering, not the raw score. Documents that rank well on both lists naturally rise to the top. `k = 60` is the standard constant from Cormack, Clarke & Büttcher (2009); it dampens the weight of top ranks slightly so a single ranker's #1 doesn't automatically win.

<Tip>
  Vespa does it this way. Elasticsearch's latest hybrid does it this way. We do it this way.
</Tip>

### Implementation sketch

<CodeGroup>
  ```python search.py theme={null}
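  async def search_nodes(tenant, project, query, embedding=None, limit=20):
      """Returns graph_nodes ordered by RRF(vec_sim, bm25)."""
      # Assumes `pool` is an asyncpg pool created elsewhere, and that `embedding`
      # is in a form asyncpg can send as `vector` (e.g. via pgvector's asyncpg codec).
      # asyncpg placeholders are positional:
      # $1 = embedding, $2 = query text, $3 = tenant, $4 = project, $5 = limit.
      return await pool.fetch("""
        WITH candidates AS (
          SELECT *,
                 1 - (embedding <=> $1::vector) AS vec_sim,
                 ts_rank_cd(search_vector, websearch_to_tsquery('english', $2))
                   AS bm25
          FROM graph_nodes
          WHERE tenant_id = $3 AND project_id = $4
            AND (
              (embedding IS NOT NULL AND 1 - (embedding <=> $1::vector) > 0.2)
              OR (search_vector @@ websearch_to_tsquery('english', $2))
            )
        ),
        -- Rank each list only over rows that matched that ranker, so a
        -- keyword-only hit gets no vector rank (and vice versa). This also
        -- keeps NULL vec_sim rows from sorting first under ORDER BY ... DESC.
        vec_ranked AS (
          SELECT id, ROW_NUMBER() OVER (ORDER BY vec_sim DESC) AS r
          FROM candidates WHERE vec_sim > 0.2
        ),
        bm25_ranked AS (
          SELECT id, ROW_NUMBER() OVER (ORDER BY bm25 DESC) AS r
          FROM candidates
          WHERE search_vector @@ websearch_to_tsquery('english', $2)
        )
        SELECT c.*,
               COALESCE(1.0/(60 + v.r), 0) + COALESCE(1.0/(60 + b.r), 0) AS rrf
        FROM candidates c
        LEFT JOIN vec_ranked  v ON c.id = v.id
        LEFT JOIN bm25_ranked b ON c.id = b.id
        ORDER BY rrf DESC
        LIMIT $5
      """, embedding, query, tenant, project, limit)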
  ```

  ```sql schema.sql theme={null}
  -- tsvector column for BM25
  ALTER TABLE graph_nodes ADD COLUMN IF NOT EXISTS search_vector TSVECTOR;
  CREATE INDEX IF NOT EXISTS idx_nodes_search ON graph_nodes USING GIN (search_vector);

  -- Title weighted 'A', content weighted 'B'
  CREATE OR REPLACE FUNCTION graph_nodes_search_vector_update() RETURNS trigger AS $$
  BEGIN
      NEW.search_vector :=
          setweight(to_tsvector('english', coalesce(NEW.title, '')),   'A') ||
          setweight(to_tsvector('english', coalesce(NEW.content, '')), 'B');
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER trg_graph_nodes_search_vector
      BEFORE INSERT OR UPDATE ON graph_nodes
      FOR EACH ROW EXECUTE FUNCTION graph_nodes_search_vector_update();
  ```
</CodeGroup>

## Layer 3 — Semantic Memory with Decay

Every user conversation produces observations — typed facts extracted by Claude Haiku from question-answer pairs:

| Type         | Meaning                                          |
| ------------ | ------------------------------------------------ |
| `fact`       | Stable truths about people, systems, preferences |
| `decision`   | Choices made and their rationale                 |
| `commitment` | Things someone said they'd do, with a deadline   |
| `risk`       | Concerns or blockers that were flagged           |
| `insight`    | Analytical conclusions drawn from data           |
| `pattern`    | Recurring behaviors or practices                 |

### Importance math

<Steps>
  <Step title="Initial score: 0.5">
    Every observation starts at importance 0.5.
  </Step>

  <Step title="Strengthened on reference: × 1.1">
    If the observation is pulled into a later conversation, importance multiplies by 1.1 (capped at 1.0).
  </Step>

  <Step title="Decayed when unused: × 0.9">
    For each conversation in which the observation is not referenced, importance multiplies by 0.9.
  </Step>

  <Step title="Pruned below 0.05">
    Observations whose importance falls below 0.05 are removed.
  </Step>
</Steps>
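
The four rules above can be sketched as a single update function. The function and constant names are hypothetical, and the sketch assumes one update per conversation and pruning strictly below 0.05; the source does not say what happens at exactly 0.05.

```python
INITIAL = 0.5       # starting importance
STRENGTHEN = 1.1    # multiplier when referenced
DECAY = 0.9         # multiplier when not referenced
PRUNE_BELOW = 0.05  # assumption: pruned when strictly below this value


def update_importance(importance, referenced):
    """Apply one conversation's update to an observation's importance."""
    if referenced:
        return min(1.0, importance * STRENGTHEN)  # capped at 1.0
    return importance * DECAY


def conversations_until_pruned(importance=INITIAL):
    """Count consecutive unreferenced conversations until importance drops below the threshold."""
    count = 0
    while importance >= PRUNE_BELOW:
        importance = update_importance(importance, referenced=False)
        count += 1
    return count


print(conversations_until_pruned())
```

Under these assumptions, a never-referenced observation survives 21 conversations and is pruned after the 22nd (0.5 × 0.9²² ≈ 0.049), while eight consecutive references take a fresh observation to the 1.0 cap.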

### Co-occurrence edges

When multiple observations are retrieved together enough times, a weighted edge forms between them. Over time, the memory graph encodes not just what Fabric knows but **what knowledge travels together**.
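
A minimal sketch of how such edges could be counted, assuming each retrieval is logged as a list of observation ids. The threshold value, the function name, and the use of the raw count as the edge weight are all assumptions; the source only says "enough times".

```python
from collections import Counter
from itertools import combinations

MIN_CO_RETRIEVALS = 3  # hypothetical threshold


def co_occurrence_edges(retrievals, threshold=MIN_CO_RETRIEVALS):
    """Return {(obs_a, obs_b): count} for pairs retrieved together at least `threshold` times."""
    pair_counts = Counter()
    for retrieved in retrievals:
        # Sort and de-duplicate so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(retrieved)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= threshold}


# Hypothetical retrieval log: one inner list per conversation.
LOG = [
    ["o1", "o2", "o3"],
    ["o1", "o2"],
    ["o2", "o1", "o4"],
    ["o3", "o4"],
]

print(co_occurrence_edges(LOG))
```

Only `o1` and `o2` appear together three times in this log, so they are the only pair that gets an edge.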

### Grounded in the knowledge graph

<Note>
  Every observation points back at the source content — the email thread, the meeting transcript, the Slack message where the fact originated. This is the difference between mem0 (floating memories with no provenance) and Fabric (facts with citations).
</Note>

## Layer 4 — Databases as First-Class Citizens

Fabric connects directly to PostgreSQL and MySQL. Not API wrappers — real connections with schema discovery.

<Steps>
  <Step title="Connect" icon="plug">
    Provide credentials once. Stored encrypted per-tenant with AES-256.
  </Step>

  <Step title="Discover" icon="magnifying-glass">
    Fabric introspects the schema: tables, columns, types, primary keys, foreign keys. The schema becomes queryable context for the agent.
  </Step>

  <Step title="Query" icon="terminal">
    Ask a natural-language question. Fabric generates SQL against your actual schema, executes it via `asyncpg` or `aiomysql`, and returns results in chat with the query visible for auditing.
  </Step>
</Steps>
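
As an illustration of the Discover step, the sketch below renders already-introspected schema metadata into compact text that could be handed to an agent as context. The input shape, the function name `render_schema_context`, and the column types are hypothetical; the source does not describe how Fabric stores or formats the schema.

```python
def render_schema_context(tables):
    """Render introspected tables as text: one line per table, one per foreign key."""
    lines = []
    for table, info in sorted(tables.items()):
        cols = ", ".join(
            f"{name} {col_type}" + (" PK" if name in info["primary_key"] else "")
            for name, col_type in info["columns"]
        )
        lines.append(f"{table}({cols})")
        for column, ref in info.get("foreign_keys", {}).items():
            lines.append(f"  {table}.{column} -> {ref}")
    return "\n".join(lines)


# Hypothetical introspection result; column types are assumed for illustration.
SCHEMA = {
    "public.customers": {
        "columns": [("id", "bigint"), ("name", "text"), ("revenue", "numeric")],
        "primary_key": ["id"],
    },
    "public.support_tickets": {
        "columns": [("id", "bigint"), ("customer_id", "bigint"), ("created_at", "timestamptz")],
        "primary_key": ["id"],
        "foreign_keys": {"customer_id": "public.customers.id"},
    },
}

print(render_schema_context(SCHEMA))
```

Keeping foreign keys explicit in the context is what lets generated SQL join tables on the right columns.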

### Example: cross-source join

> **Q:** Pull the top 10 customers by revenue from `public.customers` who opened a support ticket in the last 7 days, and show me any Slack `#support` threads that mention them.

Fabric generates:

```sql theme={null}
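SELECT c.name, c.revenue
FROM public.customers c
-- EXISTS keeps one row per customer even when they opened several tickets
WHERE EXISTS (
  SELECT 1 FROM public.support_tickets t
  WHERE t.customer_id = c.id
    AND t.created_at > NOW() - INTERVAL '7 days'
)
ORDER BY c.revenue DESC
LIMIT 10;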
```

Fabric executes the query against your Postgres database, then searches the graph for `#support` Slack threads whose content mentions any of those customer names, and returns both in a single result.
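
The final merge step might look like the sketch below, which attaches matching threads to each SQL row. The function name, the row and thread shapes, and the sample data are hypothetical, and the plain substring match is a stand-in for the graph search that Fabric would actually run.

```python
def attach_threads(customers, threads):
    """Pair each customer row with the ids of threads whose text mentions its name."""
    results = []
    for customer in customers:
        needle = customer["name"].lower()
        mentions = [t["id"] for t in threads if needle in t["content"].lower()]
        results.append({**customer, "threads": mentions})
    return results


# Hypothetical SQL result rows and Slack thread nodes.
CUSTOMERS = [
    {"name": "Acme Corp", "revenue": 120000},
    {"name": "Globex", "revenue": 95000},
]
THREADS = [
    {"id": "slack:t1", "content": "Acme Corp is seeing invoice errors again"},
    {"id": "slack:t2", "content": "Reminder: support rota for next week"},
    {"id": "slack:t3", "content": "Follow-up with acme corp on the refund"},
]

for row in attach_threads(CUSTOMERS, THREADS):
    print(row)
```

Customers with no matching threads still appear in the result with an empty list, so the revenue ranking from the SQL query is preserved.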

## Why the layers compound

| Layer                     | Provides                                                  | Can't do alone                        |
| ------------------------- | --------------------------------------------------------- | ------------------------------------- |
| **Knowledge graph**       | Relational reasoning, typed edges, cross-source timelines | Retrieve content to ground an answer  |
| **Fused retrieval (RRF)** | Keyword + semantic precision in one query                 | Relationships between entities        |
| **Semantic memory**       | Accumulated understanding that adapts over time           | Grounding in live data                |
| **Database connections**  | Real operational data in the same reasoning loop          | Structure across unstructured sources |

<Tip>
  Graph alone is a CRM with extra steps. Search alone is Elasticsearch. Memory alone is mem0. Database connections alone is Metabase with a chat wrapper. **The combination is Fabric.**
</Tip>
