From a717be674a61b208209446f52bb61d43c9316b0c Mon Sep 17 00:00:00 2001
From: tocmo0nlord
Date: Fri, 24 Apr 2026 04:05:05 +0000
Subject: [PATCH] Add README

---
 README.md | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..cfa5f99
--- /dev/null
+++ b/README.md
@@ -0,0 +1,105 @@
+# LLM Trainer
+
+A web-based interface for building fine-tuning datasets and training LLMs on a remote GPU server. The frontend connects to a FastAPI backend that SSHes into your GPU machine, runs the [synthetic-data-kit](https://github.com/anthropics/synthetic-data-kit) pipeline, and streams live output back to the browser.
+
+## Architecture
+
+```
+Browser (React/Vite)
+        │
+        ▼
+FastAPI Backend (Docker, port 8080)
+        │  REST + WebSocket
+        ▼
+Remote GPU Server (SSH)
+        ├── synthetic-data-kit  →  parse / generate / curate / export
+        └── train.py            →  fine-tuning run
+```
+
+Ollama (port 11434 on the GPU server) is used for model management: pulling, listing, and deleting models.
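
The three model-management operations correspond to endpoints in Ollama's public REST API. The sketch below only builds the requests with the standard library, so it can be inspected without a running server. It is an illustration, not the backend's actual code: the helper names, the `localhost` base URL, and the example model name are assumptions.

```python
import json
import urllib.request

# Assumption: base URL of the Ollama server; replace with your GPU server's address.
OLLAMA_URL = "http://localhost:11434"


def list_models_request(base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the request that lists locally available models (GET /api/tags)."""
    return urllib.request.Request(f"{base_url}/api/tags", method="GET")


def _json_request(url: str, payload: dict, method: str) -> urllib.request.Request:
    """Build a request carrying a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def pull_model_request(model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the request that downloads a model (POST /api/pull).

    Current Ollama documentation names the field "model"; older releases
    used "name", so check the version running on your server.
    """
    return _json_request(f"{base_url}/api/pull", {"model": model}, "POST")


def delete_model_request(model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the request that removes a model (DELETE /api/delete)."""
    return _json_request(f"{base_url}/api/delete", {"model": model}, "DELETE")


def fetch_model_names(base_url: str = OLLAMA_URL) -> list[str]:
    """Send the list request and return model names (needs a reachable server)."""
    with urllib.request.urlopen(list_models_request(base_url)) as resp:
        return [entry["name"] for entry in json.load(resp)["models"]]
```

Calling `fetch_model_names("http://<gpu-server>:11434")` from a machine that can reach the GPU server is a quick way to confirm Ollama is up before starting the stack.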
+
+## Pipeline stages
+
+| Stage | Directory | Description |
+|-------|-----------|-------------|
+| `input` | `/opt/synthetic/…/data/input` | Raw source documents |
+| `parsed` | `/opt/synthetic/…/data/parsed` | Ingested plain text |
+| `generated` | `/opt/synthetic/…/data/generated` | Raw QA pairs |
+| `curated` | `/opt/synthetic/…/data/curated` | Filtered pairs (quality threshold) |
+| `final` | `/opt/synthetic/…/data/final` | Export-ready JSONL/CSV |
+
+## Getting started
+
+### Prerequisites
+
+- Docker + Docker Compose
+- A remote machine with:
+  - SSH access
+  - `miniconda3` with a `synthetic-data` conda env containing `synthetic-data-kit`
+  - `train.py` at `/opt/synthetic/train.py`
+  - Ollama running on port `11434`
+
+### Run
+
+```bash
+docker compose up --build
+```
+
+| Service | URL |
+|---------|-----|
+| Frontend | http://localhost:3000 |
+| Backend API | http://localhost:8080 |
+| API docs | http://localhost:8080/docs |
+
+The `OLLAMA_URL` environment variable in `docker-compose.yml` defaults to `http://192.168.2.47:11434`; update it to point to your GPU server.
+
+### Configuration
+
+The pipeline reads its config from `/opt/synthetic/synthetic-data-kit/config.yaml` on the remote server. You can edit it live from the **Config Editor** tab in the UI.
+
+## Project structure
+
+```
+├── backend/
+│   ├── main.py              # FastAPI app: all REST and WebSocket endpoints
+│   ├── pipeline.py          # Command builders for synthetic-data-kit stages
+│   ├── ssh_client.py        # Paramiko SSH manager (connect, stream, upload, shell)
+│   ├── gpu.py               # nvidia-smi GPU stats
+│   ├── requirements.txt
+│   └── Dockerfile
+├── frontend/
+│   ├── src/
+│   │   ├── App.jsx
+│   │   └── components/
+│   │       ├── ConnectionPanel.jsx   # SSH connect / GPU status
+│   │       ├── DocumentManager.jsx   # Upload & browse pipeline files
+│   │       ├── PipelineRunner.jsx    # Run ingest → create → curate → save
+│   │       ├── QAPairViewer.jsx      # Preview generated QA pairs
+│   │       ├── TrainingMonitor.jsx   # Launch training, live log stream
+│   │       ├── ModelManager.jsx      # Pull / delete Ollama models
+│   │       ├── ConfigEditor.jsx      # Edit remote config.yaml
+│   │       └── Terminal.jsx          # Interactive SSH terminal (xterm.js)
+│   └── Dockerfile
+├── packaging/
+│   └── build-deb.sh         # Build a .deb installer
+└── docker-compose.yml
+```
+
+## API reference
+
+Key endpoints (full docs at `/docs`):
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `POST` | `/api/connect` | Open SSH connection |
+| `GET` | `/api/status` | Connection + GPU status |
+| `GET` | `/api/files/{stage}` | List files at a pipeline stage |
+| `POST` | `/api/upload` | Upload a file to the `input` stage |
+| `WS` | `/api/pipeline/ingest` | Stream ingest (parse) output |
+| `WS` | `/api/pipeline/create` | Stream QA pair generation |
+| `WS` | `/api/pipeline/curate` | Stream curation / filtering |
+| `WS` | `/api/pipeline/save` | Stream export to JSONL/CSV |
+| `WS` | `/api/train` | Stream fine-tuning run |
+| `WS` | `/api/terminal` | Interactive SSH shell |
+| `GET` | `/api/models` | List Ollama models |
+| `WS` | `/api/models/pull` | Pull an Ollama model |
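
The table mixes plain HTTP routes with `WS` routes, so a client has to pick the URL scheme per endpoint. The helper below is a minimal sketch of that mapping, assuming the backend address from the Run section. The function names are hypothetical, and the request and message formats are deliberately left out because only the generated docs at `/docs` define them.

```python
# Stage names from the "Pipeline stages" table.
STAGES = ("input", "parsed", "generated", "curated", "final")

# Assumption: the backend address listed in the Run section.
DEFAULT_BASE = "http://localhost:8080"


def endpoint_url(method: str, path: str, base: str = DEFAULT_BASE) -> str:
    """Return the full URL for an endpoint from the table.

    Rows marked WS use the WebSocket scheme: http becomes ws, https becomes wss.
    """
    base = base.rstrip("/")
    if method.upper() == "WS":
        if base.startswith("https://"):
            base = "wss://" + base[len("https://"):]
        elif base.startswith("http://"):
            base = "ws://" + base[len("http://"):]
    return base + path


def files_url(stage: str, base: str = DEFAULT_BASE) -> str:
    """Return the URL that lists files at one pipeline stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return endpoint_url("GET", f"/api/files/{stage}", base)


print(endpoint_url("WS", "/api/pipeline/ingest"))
print(files_url("curated"))
```

Validating the stage name on the client side gives a clearer error than a failed request when a name is mistyped.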