Export AI Conversations — Build a Searchable Archive chats

I counted recently. Across ChatGPT, Claude, Hermes Agent, and a couple of other tools I've tried and abandoned, I've accumulated well over 2,000 AI conversations. Some are five-minute throwaways — "what's the syntax for that one SQLite pragma again?" Others are multi-hour deep dives into kernel internals, Docker networking, or Cambridge syllabus design decisions that I genuinely need to reference later.

And I can't find any of them.

Not easily, anyway. Every platform has its own UI, its own search bar with its own limitations, its own export format — or no export at all. My AI chat history is scattered across five proprietary silos, and every day I add more conversations that I'll probably never see again.

I saw the same problem framed directly in an r/n8n post as the "Chat Graveyard" crisis: hundreds of valuable LLM conversations spread across Claude, Gemini, and other accounts, full of architectures, workflows, and implementation notes, but trapped in isolated chat histories.

This post is my answer. I'm going to walk through how I'm fixing this for myself — exporting chats from every platform I use, building a local searchable archive with SQLite full-text search, and extracting actual learnings from my AI conversation history. By the end, you'll have a system that turns your chat graveyard into a searchable second brain.

The Problem Isn't Storage. It's Retrieval.

Most AI platforms do store your history. ChatGPT keeps it in a sidebar. Claude has a conversations list. Hermes Agent has a local session database. The issue isn't that the data is gone — it's that you can't use it.

Try searching for "that conversation where I debugged the Docker networking issue with the custom bridge" across three platforms. You can't. Each platform silo has its own search, and none of them index the content well enough to find old chats by topic, concept, or code snippet. Platform search is designed to answer "what did I talk about last Tuesday?" — not "show me every conversation where I discussed Linux cgroups."

Worse, some platforms don't let you export at all without going through a GDPR data request, and even then you get a JSON blob that's technically complete but practically useless without tooling.

This is a data management problem — and if you're a CS student, it's one that maps directly to your syllabus. Databases. File handling. Search algorithms. Full-text indexing. These aren't abstract concepts. They're the exact tools you use to solve the Chat Graveyard problem.

Step 1: Export Everything

First, get your data out. Every platform is different. Here's what works today.

ChatGPT

OpenAI gives you an export option under Settings → Data Controls → Export Data. You'll get an email with a download link containing a zip file. Inside: conversations.json — a single JSON file with every conversation, every message, every metadata field.

The format is straightforward but verbose. Each conversation is an object with a title, create_time, and a mapping of message nodes. Here's a quick Python script to flatten it into something workable:

import json
from datetime import datetime

with open("conversations.json") as f:
    data = json.load(f)

messages = []
for conv in data:
    title = conv.get("title", "Untitled")
    created = datetime.fromtimestamp(conv.get("create_time", 0))
    for node_id, node in conv.get("mapping", {}).items():
        msg = node.get("message")
        if msg and msg.get("content", {}).get("parts"):
            text = " ".join(msg["content"]["parts"])
            role = msg.get("author", {}).get("role", "unknown")
            messages.append({
                "conversation": title,
                "created": created.isoformat(),
                "role": role,
                "content": text.strip(),
            })

print(f"Extracted {len(messages)} messages from {len(data)} conversations")

This gives you a flat list of messages with their conversation title and timestamp. You can write this to a CSV, pipe it into SQLite, or feed it into whatever tool you're using.

Claude (Anthropic)

Claude's export story is less polished. Anthropic provides a data export request form under account settings, and you'll receive a zip of JSON files — one per conversation. Each file is named by conversation UUID and contains a structured chat log.

The parsing logic is different but the output shape is the same: extract text, attach metadata, flatten to rows.

import json
import glob
from pathlib import Path

messages = []
for filepath in glob.glob("exports/claude-export/*.json"):
    with open(filepath) as f:
        conv = json.load(f)
    title = conv.get("name", Path(filepath).stem)
    created = conv.get("created_at", "")
    for msg in conv.get("chat_messages", []):
        messages.append({
            "conversation": title,
            "created": created,
            "role": msg.get("sender", "unknown"),
            "content": msg.get("text", "").strip(),
        })

print(f"Extracted {len(messages)} messages from {len(glob.glob('exports/claude-export/*.json'))} files")

Hermes Agent (Local)

If you use Hermes Agent, you're in the best position of the three. Hermes stores everything locally in a SQLite session database — no export step needed. The database is already searchable by design. You can query it directly:

import sqlite3
import os

db = sqlite3.connect(os.path.expanduser("~/.hermes/state.db"))
rows = db.execute("""
    SELECT s.title, m.role, m.content, m.timestamp
    FROM messages m
    JOIN sessions s ON m.session_id = s.id
    ORDER BY m.timestamp
""").fetchall()

print(f"Found {len(rows)} messages across sessions")

This is the experience every platform should offer. Local-first, SQLite-backed, yours by default. Most aren't there yet — which is why we're building our own archive.

Other Platforms

GitHub Copilot Chat: Sessions are stored in VS Code's state database. You can pull them from ~/.config/Code/User/globalStorage/github.copilot-chat/ — it's JSON, but the schema changes between releases.
Perplexity, Poe, Pi: Export options vary by platform and account type, and they change over time. Check current account settings before assuming your data is trapped.
Google Gemini: Google Takeout includes Gemini conversations in a structured JSON format. The export is available through your Google Account data settings.

The pattern is the same regardless of platform: extract text, attach metadata (conversation title, timestamp, role), and flatten into rows. Once everything is in the same shape, you can merge it into one archive.

Step 2: Build a Searchable Archive with SQLite FTS5

Now you have flat message lists from each platform. The next step is getting them into a database where you can actually search them — and I don't mean LIKE '%cgroup%'. I mean full-text search.

SQLite has a built-in full-text search engine called FTS5. It's fast, it's included with Python's standard library via the sqlite3 module, and it handles the kinds of queries you actually want to run against your chat history: phrase search, prefix search, boolean operators, and ranked results.

Here's the database schema:

CREATE TABLE conversations (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    platform TEXT NOT NULL,      -- 'chatgpt', 'claude', 'hermes'
    created_at TEXT,
    source_file TEXT              -- original export file path
);

CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER REFERENCES conversations(id),
    role TEXT NOT NULL,           -- 'user', 'assistant', 'system'
    content TEXT NOT NULL,
    created_at TEXT,
    FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);

-- The FTS index lives in a virtual table
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content,
    role,
    content_rowid='id',
    content='messages'
);

The virtual table messages_fts is an external-content FTS5 index over the messages table. SQLite does not automatically keep this kind of index in sync. You either need triggers to update the FTS table when messages change, or you need to rebuild the FTS index after importing.

Here's the full import pipeline — it reads JSON exports from ChatGPT and Claude, normalizes everything into the schema above, and then rebuilds the FTS index after import:

import sqlite3
import json
import glob
from pathlib import Path

db = sqlite3.connect("chat_archive.db")
db.execute("PRAGMA journal_mode=WAL")

# Create schema
db.executescript("""
CREATE TABLE IF NOT EXISTS conversations (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    platform TEXT NOT NULL,
    created_at TEXT,
    source_file TEXT
);

CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER REFERENCES conversations(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT
);

CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
    content,
    role,
    content_rowid='id',
    content='messages'
);
""")

def import_conversation(db, platform, title, created, source, messages):
    """Insert one conversation and all its messages into the archive."""
    cur = db.execute(
        "INSERT INTO conversations (title, platform, created_at, source_file) VALUES (?, ?, ?, ?)",
        (title, platform, created, source)
    )
    conv_id = cur.lastrowid
    for msg in messages:
        db.execute(
            "INSERT INTO messages (conversation_id, role, content, created_at) VALUES (?, ?, ?, ?)",
            (conv_id, msg["role"], msg["content"], msg.get("created", created))
        )
    return conv_id

# Import from ChatGPT export
with open("exports/chatgpt/conversations.json") as f:
    chatgpt_data = json.load(f)

for conv in chatgpt_data:
    msgs = []
    for node in conv.get("mapping", {}).values():
        msg = node.get("message")
        if msg and msg.get("content", {}).get("parts"):
            msgs.append({
                "role": msg.get("author", {}).get("role", "unknown"),
                "content": " ".join(msg["content"]["parts"]).strip(),
            })
    if msgs:
        import_conversation(db, "chatgpt", conv.get("title", "Untitled"),
                          conv.get("create_time", ""), "conversations.json", msgs)

# Import from Claude export
for filepath in glob.glob("exports/claude/*.json"):
    with open(filepath) as f:
        conv = json.load(f)
    msgs = [{"role": m["sender"], "content": m["text"].strip()}
            for m in conv.get("chat_messages", []) if m.get("text")]
    if msgs:
        import_conversation(db, "claude", conv.get("name", Path(filepath).stem),
                          conv.get("created_at", ""), filepath, msgs)
                          
db.execute("INSERT INTO messages_fts(messages_fts) VALUES('rebuild')")

db.commit()
print(f"Archive ready: {db.execute('SELECT COUNT(*) FROM messages').fetchone()[0]} messages indexed")

Now the fun part — actually finding things.

Search Examples

import sqlite3

db = sqlite3.connect("chat_archive.db")

# Simple search: find every message mentioning cgroups
def search(term, limit=10):
    return db.execute("""
        SELECT m.content, m.role, c.title, c.platform,
               snippet(messages_fts, 1, '<mark>', '</mark>', '...', 40) AS context
        FROM messages_fts f
        JOIN messages m ON f.rowid = m.id
        JOIN conversations c ON m.conversation_id = c.id
        WHERE messages_fts MATCH ?
        ORDER BY rank
        LIMIT ?
    """, (term, limit)).fetchall()

# Phrase search: exact phrase matching
results = search('"docker networking"')

# Boolean: find conversations with Docker but not Swarm
results = search('docker NOT swarm')

# Prefix: catch cgroup, cgroups, cgroup2 all at once
results = search('cgroup*')

for row in results:
    print(f"[{row[3]}] {row[2]}: {row[4]}")

The snippet() function returns the matching text with highlighted context — like a mini search engine for your own brain. FTS5's rank column sorts by relevance automatically. On a local SQLite database, simple FTS5 searches over tens of thousands of messages can feel nearly instant.

If you've studied databases for IGCSE or A-Level Computer Science, this is a real-world application of every concept in Chapter 9: tables, primary keys, foreign keys, SQL queries with SELECT/FROM/WHERE/ORDER BY, and indexing for performance. The difference is that instead of a textbook example about a library database, you're building a tool that actually solves a problem you have.

Step 3: Extract Learnings — Don't Just Archive

A searchable archive is better than a Chat Graveyard, but it's still just a search box over raw conversation logs. The real value is turning those conversations into durable knowledge — notes, summaries, and insights you can reference without digging through chat history.

Here's my approach. After importing everything into SQLite, I run a few analysis scripts:

Find Your Most-Discussed Topics

import sqlite3
from collections import Counter

db = sqlite3.connect("chat_archive.db")

# Count conversations by keyword
keywords = ["docker", "linux", "python", "database", "security",
            "kubernetes", "sql", "api", "networking", "testing"]
counts = Counter()

for kw in keywords:
    count = db.execute("""
        SELECT COUNT(DISTINCT m.conversation_id)
        FROM messages_fts f
        JOIN messages m ON f.rowid = m.id
        WHERE messages_fts MATCH ?
    """, (kw,)).fetchone()[0]
    counts[kw] = count

for kw, count in counts.most_common():
    print(f"  {kw}: {count} conversations")

This tells you which topics you've actually been exploring with AI. The results are often surprising — I discovered I'd spent far more time discussing filesystem internals than I would have guessed. That's a signal. Maybe I should write that up.

Identify High-Value Conversations

Not all chats are equal. A five-minute syntax check isn't worth revisiting. A two-hour architecture discussion is. I score conversations by message count, code snippet density, and recency:

def score_conversations(db, min_msgs=10):
    return db.execute("""
        SELECT c.id, c.title, c.platform, COUNT(m.id) as msg_count,
               SUM(CASE WHEN m.content LIKE '%```%' THEN 1 ELSE 0 END) as code_blocks
        FROM conversations c
        JOIN messages m ON m.conversation_id = c.id
        GROUP BY c.id
        HAVING msg_count >= ?
        ORDER BY msg_count DESC
    """, (min_msgs,)).fetchall()

for row in score_conversations(db):
    print(f"[{row[2]}] {row[1]}: {row[3]} msgs, {row[4]} code blocks")

Long conversations with lots of code are usually the ones worth reviewing. Extract the key insights into your actual note-taking system — Obsidian, Notion, a plain markdown file. The chat archive is the source of truth; your notes are the curated surface.

The CS Student Angle: This Is Data Management

If you're studying computer science at IGCSE or A-Level, you're learning about databases, SQL, file handling, and search algorithms in theory. The Chat Graveyard problem is those concepts applied to a real, personal use case.

Here's how this project maps to the Cambridge IGCSE Computer Science syllabus:

Databases (Chapter 9): You design a database with tables, fields, primary keys, and foreign keys. You write SQL queries using SELECT, FROM, WHERE, and ORDER BY.
File handling: You read and parse multiple file formats (JSON, SQLite databases) — exactly the file operations you need for Paper 2 programming tasks.
Data types: You work with text, integers, dates, and booleans across different schema designs.
Search and indexing: FTS5 is a production-grade implementation of the search and indexing concepts discussed in the syllabus. You can see how an inverted index works by querying FTS5's internal tables.

For A-Level students, the project extends further:

Data normalization: Merging exports from multiple platforms into a unified schema forces you to think about data consistency, deduplication, and schema design.
Python scripting: The import pipeline exercises file I/O, JSON parsing, database interaction, and error handling — all assessed in Paper 2 and Paper 4.

I've written about Python file handling for IGCSE and A-Level CS before, and this project is a natural extension. It's a real problem with real code, not a contrived textbook exercise.

Docker Compose: The Archive Stack

If you're running a homelab — and if you're reading this site, there's a decent chance you are — you can containerize the whole thing. Here's a docker-compose.yml that wraps the archive in a lightweight web UI using Datasette:

version: "3.8"
services:
  archive:
    image: python:3.11-slim
    volumes:
      - ./chat_archive.db:/data/chat_archive.db
      - ./exports:/exports:ro
      - ./import.py:/import.py
    working_dir: /data
    command: tail -f /dev/null  # run import.py manually

  datasette:
    image: datasetteproject/datasette:latest
    ports:
      - "8001:8001"
    volumes:
      - ./chat_archive.db:/data/chat_archive.db
    command: datasette /data/chat_archive.db -h 0.0.0.0 -p 8001

Datasette gives you a web-based SQL interface over your archive. You can run saved queries, browse tables, and even publish your (sanitized) archive as a static site. For CS students, it's a way to inspect the database structure visually while you learn SQL.

This fits into the same homelab pattern I've covered in my Proxmox homelab setup guide and my experiments running AI models on old hardware. Self-hosting your own data — whether it's AI models or AI conversations — is the theme.

What I've Learned from Two Thousand Conversations

After building this archive, I ran the topic analysis script and found something I didn't expect: about 40% of my AI conversations were about topics I never wrote about. Kernel internals. Filesystem design. Packet filtering internals. I asked a lot of deep technical questions and never turned the answers into anything durable.

That's the real Chat Graveyard — not the conversations themselves, but the follow-through that never happened. Every chat where I learned something and then closed the tab without writing it down is a wasted interaction.

The archive changes the equation. When I'm about to write a blog post now, I search the archive first. "What have I already discussed about Docker networking? What questions did I have six months ago that I've since answered?" The archive surfaces context I'd forgotten I had.

I also noticed which AI tools I use for what. ChatGPT is my brainstorming partner — lots of short, exploratory conversations. Claude handles the deep technical dives. Hermes Agent automates the repetitive stuff — which makes its conversation logs more like a sysadmin's audit trail than a chat history. Seeing the usage patterns across platforms was worth the import effort alone.

Don't Let Your Chats Die

The Chat Graveyard isn't inevitable. It's a consequence of platforms treating your conversation history as an afterthought — a sidebar feature, not a first-class data type.

Export your chats. Build the archive. Search it. Extract the learnings. If you're a CS student, this is a portfolio project that demonstrates database design, Python scripting, and systems thinking. If you're a developer, it's a second brain you actually own. If you run a homelab, it's one more service you can self-host.

The scripts in this post are a working starting point you can adapt to your own exports. The SQLite schema is simple enough to extend. Start with one platform, get the import working, and build from there. Your future self — the one trying to remember how you solved that obscure Docker networking bug at 2 AM — will thank you.

Related reading: If you're setting up Hermes Agent or other AI tools on your own hardware, I've covered