Remove Freeform and Find from UI. Allow Description to be added to Reviewed job

2026-06-29 13:09:01 +01:00
parent 48f958de6c
commit 04bbbebd5a
10 changed files with 394 additions and 403 deletions
--- a/README.md
+++ b/README.md
@@ -172,6 +172,13 @@ FRONTEND_PORT=3000
 MODEL_NAME=deepseek-ai/DeepSeek-OCR
 HF_HOME=/models

+# OCR model selection (DeepSeek + Ollama)
+ENABLE_DEEPSEEK_LOCAL=true                          # register the local GPU model
+OLLAMA_BASE_URL=http://host.docker.internal:11434   # external Ollama host
+OLLAMA_MODELS=glm-ocr,llama3.2-vision,minicpm-v,qwen2.5vl
+DEFAULT_OCR_MODEL=deepseek-local                    # deepseek-local or ollama:<tag>
+OLLAMA_TIMEOUT=300                                  # per-request timeout (seconds)
+
 # Upload Configuration
 MAX_UPLOAD_SIZE_MB=100  # Maximum file upload size

@@ -186,13 +193,47 @@ CROP_MODE=true         # Enable dynamic cropping for large images
 - `API_HOST`: Backend API host (default: 0.0.0.0)
 - `API_PORT`: Backend API port (default: 8000)
 - `FRONTEND_PORT`: Frontend port (default: 3000)
- `MODEL_NAME`: HuggingFace model identifier
+- `MODEL_NAME`: HuggingFace model identifier for the local DeepSeek-OCR model
 - `HF_HOME`: Model cache directory
+- `ENABLE_DEEPSEEK_LOCAL`: Register the local DeepSeek-OCR model (set `false` for an Ollama-only deployment with no GPU model loaded)
+- `OLLAMA_BASE_URL`: URL of an external Ollama server the backend calls for non-DeepSeek models
+- `OLLAMA_MODELS`: Comma-separated Ollama vision model tags to expose in the UI (pull them on the Ollama host first, e.g. `ollama pull glm-ocr`)
+- `DEFAULT_OCR_MODEL`: Model id selected by default (`deepseek-local` or `ollama:<tag>`)
+- `OLLAMA_TIMEOUT`: Per-request timeout in seconds for Ollama calls
 - `MAX_UPLOAD_SIZE_MB`: Maximum file upload size in megabytes
 - `BASE_SIZE`: Base image processing size (affects memory usage)
 - `IMAGE_SIZE`: Tile size for dynamic cropping
 - `CROP_MODE`: Enable/disable dynamic image cropping

+### Choosing an OCR Model
+
+The **Model** selector (next to the Mode selector) chooses which backend runs the OCR:
+
+- **DeepSeek-OCR (local GPU)** — the default. Loaded lazily on first use. Supports
+  every mode including grounding/bounding-box modes (Find), plus the Advanced
+  Settings (base size, crop mode, etc.).
+- **Ollama models** — any vision model pulled on your Ollama host and listed in
+  `OLLAMA_MODELS` (e.g. `glm-ocr`, `llama3.2-vision`). These run remotely on the
+  Ollama server. They return **plain text only**: bounding boxes are not produced,
+  so grounding modes (Find) and the DeepSeek-specific Advanced Settings are ignored
+  / disabled when an Ollama model is selected.
+
+Setup for Ollama models:
+
+```bash
+# On the machine running Ollama
+ollama pull glm-ocr
+ollama pull llama3.2-vision
+
+# Point the backend at it (in .env), then restart
+OLLAMA_BASE_URL=http://host.docker.internal:11434
+OLLAMA_MODELS=glm-ocr,llama3.2-vision
+```
+
+`GET /api/models` returns the registered models and their capabilities; the UI
+populates the selector from it. The model used for each job is stored on the job
+record (`ocr_model`) and shown in the Browse Jobs view.
+
 ## Tech Stack

 ### Frontend
@@ -377,6 +418,7 @@ For large images, the model uses dynamic cropping:

 **Parameters:**
 - `image` (file, required) - Image file to process (up to 100MB)
+- `model` (string) - OCR model id from `GET /api/models` (default: registry default). Grounding/Advanced settings apply to DeepSeek only.
 - `mode` (string) - OCR mode: `plain_ocr` | `describe` | `find_ref` | `freeform`
 - `prompt` (string) - Custom prompt for freeform mode
 - `grounding` (bool) - Enable bounding boxes (auto-enabled for find_ref)
@@ -416,6 +458,7 @@ Process PDF documents with OCR and export to various formats.

 **Parameters:**
 - `pdf_file` (file, required) - PDF file to process (up to 100MB)
+- `model` (string) - OCR model id from `GET /api/models` (default: registry default)
 - `mode` (string) - OCR mode: `plain_ocr` | `describe` | `find_ref` | `freeform`
 - `prompt` (string) - Custom prompt for freeform mode
 - `output_format` (string) - Output format: `markdown` | `html` | `docx` | `json`