Add README

2026-04-24 04:05:05 +00:00
parent e5649148f7
commit a717be674a
1 changed files with 105 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,105 @@
+# LLM Trainer
+
+A web-based interface for building fine-tuning datasets and training LLMs on a remote GPU server. The frontend connects to a FastAPI backend that SSHes into your GPU machine, runs the [synthetic-data-kit](https://github.com/meta-llama/synthetic-data-kit) pipeline, and streams live output back to the browser.
+
+## Architecture
+
+```
+Browser (React/Vite)
+    │  REST + WebSocket
+    ▼
+FastAPI Backend (Docker, port 8080)
+    │  SSH
+    ▼
+Remote GPU Server
+    ├── synthetic-data-kit  →  parse / generate / curate / export
+    └── train.py            →  fine-tuning run
+```
+
+Ollama (port 11434 on the GPU server) handles model management: pulling, listing, and deleting models.
+
+## Pipeline stages
+
+| Stage | Directory | Description |
+|-------|-----------|-------------|
+| `input` | `/opt/synthetic/…/data/input` | Raw source documents |
+| `parsed` | `/opt/synthetic/…/data/parsed` | Ingested plain text |
+| `generated` | `/opt/synthetic/…/data/generated` | Raw QA pairs |
+| `curated` | `/opt/synthetic/…/data/curated` | Filtered pairs (quality threshold) |
+| `final` | `/opt/synthetic/…/data/final` | Export-ready JSONL/CSV |
+
+## Getting started
+
+### Prerequisites
+
+- Docker + Docker Compose
+- A remote machine with:
+  - SSH access
+  - `miniconda3` with a `synthetic-data` conda env containing `synthetic-data-kit`
+  - `train.py` at `/opt/synthetic/train.py`
+  - Ollama running on port `11434`
+
+### Run
+
+```bash
+docker compose up --build
+```
+
+| Service | URL |
+|---------|-----|
+| Frontend | http://localhost:3000 |
+| Backend API | http://localhost:8080 |
+| API docs | http://localhost:8080/docs |
+
+The `OLLAMA_URL` environment variable in `docker-compose.yml` defaults to `http://192.168.2.47:11434`. Update it to point to your GPU server.
+
+### Configuration
+
+The pipeline reads its config from `/opt/synthetic/synthetic-data-kit/config.yaml` on the remote server. You can edit it live from the **Config Editor** tab in the UI.
+
+## Project structure
+
+```
+├── backend/
+│   ├── main.py          # FastAPI app — all REST and WebSocket endpoints
+│   ├── pipeline.py      # Command builders for synthetic-data-kit stages
+│   ├── ssh_client.py    # Paramiko SSH manager (connect, stream, upload, shell)
+│   ├── gpu.py           # nvidia-smi GPU stats
+│   ├── requirements.txt
+│   └── Dockerfile
+├── frontend/
+│   ├── src/
+│   │   ├── App.jsx
+│   │   └── components/
+│   │       ├── ConnectionPanel.jsx   # SSH connect / GPU status
+│   │       ├── DocumentManager.jsx   # Upload & browse pipeline files
+│   │       ├── PipelineRunner.jsx    # Run ingest → create → curate → save
+│   │       ├── QAPairViewer.jsx      # Preview generated QA pairs
+│   │       ├── TrainingMonitor.jsx   # Launch training, live log stream
+│   │       ├── ModelManager.jsx      # Pull / delete Ollama models
+│   │       ├── ConfigEditor.jsx      # Edit remote config.yaml
+│   │       └── Terminal.jsx          # Interactive SSH terminal (xterm.js)
+│   └── Dockerfile
+├── packaging/
+│   └── build-deb.sh     # Build a .deb installer
+└── docker-compose.yml
+```
+
+## API reference
+
+Key endpoints (full docs at `/docs`):
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/api/connect` | Open SSH connection |
+| `GET` | `/api/status` | Connection + GPU status |
+| `GET` | `/api/files/{stage}` | List files at a pipeline stage |
+| `POST` | `/api/upload` | Upload a file to the `input` stage |
+| `WS` | `/api/pipeline/ingest` | Stream ingest (parse) output |
+| `WS` | `/api/pipeline/create` | Stream QA pair generation |
+| `WS` | `/api/pipeline/curate` | Stream curation / filtering |
+| `WS` | `/api/pipeline/save` | Stream export to JSONL/CSV |
+| `WS` | `/api/train` | Stream fine-tuning run |
+| `WS` | `/api/terminal` | Interactive SSH shell |
+| `GET` | `/api/models` | List Ollama models |
+| `WS` | `/api/models/pull` | Pull an Ollama model |
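
The endpoint table above can be addressed from a script as well as from the browser. What follows is a minimal sketch, not code from this commit: it only builds URLs, because the README does not document request bodies or WebSocket message formats. The base URL comes from the service table, the stage and step names come from the README's tables, and the helper names (`rest_url`, `ws_url`, `files_url`, `pipeline_ws_url`) are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Base URL from the README's service table; adjust for your deployment.
BACKEND_URL = "http://localhost:8080"

# Stage names from the "Pipeline stages" table.
STAGES = ("input", "parsed", "generated", "curated", "final")

# Pipeline steps that the API table exposes as WebSocket endpoints.
PIPELINE_STEPS = ("ingest", "create", "curate", "save")


def rest_url(path: str, base: str = BACKEND_URL) -> str:
    """Join the backend base URL and an API path such as /api/status."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def ws_url(path: str, base: str = BACKEND_URL) -> str:
    """Like rest_url, but with the scheme switched to ws:// (or wss:// for https)."""
    parts = urlsplit(rest_url(path, base))
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


def files_url(stage: str, base: str = BACKEND_URL) -> str:
    """URL for GET /api/files/{stage}; rejects stages the README does not list."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return rest_url(f"/api/files/{stage}", base)


def pipeline_ws_url(step: str, base: str = BACKEND_URL) -> str:
    """WebSocket URL for /api/pipeline/{step}; rejects steps the README does not list."""
    if step not in PIPELINE_STEPS:
        raise ValueError(f"unknown pipeline step: {step!r}")
    return ws_url(f"/api/pipeline/{step}", base)


if __name__ == "__main__":
    print(files_url("curated"))
    for step in PIPELINE_STEPS:
        print(pipeline_ws_url(step))
```

Validating stage and step names on the client side catches typos before a request is sent. The messages exchanged over each WebSocket are best taken from the generated docs at `/docs` or from `backend/main.py`.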