Remove Freeform and Find from UI. Allow Description to be added to Reviewed job

Aaron Roberts
2026-06-29 13:09:01 +01:00
parent 48f958de6c
commit 04bbbebd5a
10 changed files with 394 additions and 403 deletions

@@ -172,6 +172,13 @@ FRONTEND_PORT=3000
MODEL_NAME=deepseek-ai/DeepSeek-OCR
HF_HOME=/models
# OCR model selection (DeepSeek + Ollama)
ENABLE_DEEPSEEK_LOCAL=true # register the local GPU model
OLLAMA_BASE_URL=http://host.docker.internal:11434 # external Ollama host
OLLAMA_MODELS=glm-ocr,llama3.2-vision,minicpm-v,qwen2.5vl
DEFAULT_OCR_MODEL=deepseek-local # deepseek-local or ollama:<tag>
OLLAMA_TIMEOUT=300 # per-request timeout (seconds)
# Upload Configuration
MAX_UPLOAD_SIZE_MB=100 # Maximum file upload size
@@ -186,13 +193,47 @@ CROP_MODE=true # Enable dynamic cropping for large images
- `API_HOST`: Backend API host (default: 0.0.0.0)
- `API_PORT`: Backend API port (default: 8000)
- `FRONTEND_PORT`: Frontend port (default: 3000)
- `MODEL_NAME`: HuggingFace model identifier
- `MODEL_NAME`: HuggingFace model identifier for the local DeepSeek-OCR model
- `HF_HOME`: Model cache directory
- `ENABLE_DEEPSEEK_LOCAL`: Register the local DeepSeek-OCR model (set `false` for an Ollama-only deployment with no GPU model loaded)
- `OLLAMA_BASE_URL`: URL of an external Ollama server the backend calls for non-DeepSeek models
- `OLLAMA_MODELS`: Comma-separated Ollama vision model tags to expose in the UI (pull them on the Ollama host first, e.g. `ollama pull glm-ocr`)
- `DEFAULT_OCR_MODEL`: Model id selected by default (`deepseek-local` or `ollama:<tag>`)
- `OLLAMA_TIMEOUT`: Per-request timeout in seconds for Ollama calls
- `MAX_UPLOAD_SIZE_MB`: Maximum file upload size in megabytes
- `BASE_SIZE`: Base image processing size (affects memory usage)
- `IMAGE_SIZE`: Tile size for dynamic cropping
- `CROP_MODE`: Enable/disable dynamic image cropping
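As a rough illustration of how the model-selection variables above might combine into a list of selectable model ids, here is a minimal sketch. Only the variable names and the `deepseek-local` / `ollama:<tag>` id formats come from this README; the function name `build_model_ids` and the fallback to the first registered model are assumptions, not the project's actual code.

```python
import os


def build_model_ids(env=None):
    """Return (model_ids, default_id) derived from the OCR model variables.

    Hypothetical helper: the real backend may register models differently.
    """
    env = os.environ if env is None else env
    ids = []

    # ENABLE_DEEPSEEK_LOCAL=false describes an Ollama-only deployment.
    if env.get("ENABLE_DEEPSEEK_LOCAL", "true").strip().lower() == "true":
        ids.append("deepseek-local")

    # OLLAMA_MODELS is a comma-separated list of tags; skip empty entries.
    for tag in env.get("OLLAMA_MODELS", "").split(","):
        tag = tag.strip()
        if tag:
            ids.append(f"ollama:{tag}")

    default = env.get("DEFAULT_OCR_MODEL", "deepseek-local").strip()
    # Assumption: if the configured default is not registered, use the first model.
    if default not in ids and ids:
        default = ids[0]
    return ids, default
```

Passing a plain dict instead of `os.environ` makes it easy to try different configurations without touching the real environment.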
### Choosing an OCR Model
The **Model** selector (next to the Mode selector) chooses which backend runs the OCR:
- **DeepSeek-OCR (local GPU)** — the default. Loaded lazily on first use. Supports
every mode including grounding/bounding-box modes (Find), plus the Advanced
Settings (base size, crop mode, etc.).
- **Ollama models** — any vision model pulled on your Ollama host and listed in
`OLLAMA_MODELS` (e.g. `glm-ocr`, `llama3.2-vision`). These run remotely on the
Ollama server. They return **plain text only**: bounding boxes are not produced,
so grounding modes (Find) and the DeepSeek-specific Advanced Settings are ignored
or disabled when an Ollama model is selected.
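The capability difference described above could be expressed as a small gating step before a job is dispatched. This is a sketch only: `resolve_request_options` is a hypothetical name, and it assumes Ollama model ids can be recognized by the `ollama:` prefix and that unsupported options are silently dropped rather than rejected.

```python
def resolve_request_options(model_id, grounding=False, advanced=None):
    """Drop options that the selected model cannot honor.

    Hypothetical helper illustrating the rule in this README: Ollama models
    return plain text only, so grounding and the DeepSeek-specific advanced
    settings have no effect for them.
    """
    if model_id.startswith("ollama:"):
        # No bounding boxes and no DeepSeek-specific tuning for Ollama models.
        return {"grounding": False, "advanced": {}}
    # Local DeepSeek-OCR model: pass the options through unchanged.
    return {"grounding": bool(grounding), "advanced": dict(advanced or {})}
```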
Setup for Ollama models:
```bash
# On the machine running Ollama
ollama pull glm-ocr
ollama pull llama3.2-vision
# Point the backend at it (in .env), then restart
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODELS=glm-ocr,llama3.2-vision
```
`GET /api/models` returns the registered models and their capabilities; the UI
populates the selector from it. The model used for each job is stored on the job
record (`ocr_model`) and shown in the Browse Jobs view.
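To inspect the registry from a script, something like the following should work. It is a sketch under assumptions: the base URL `http://localhost:8000` is inferred from the default `API_PORT`, the helper names are made up for illustration, and the response is simply returned as parsed JSON because its exact shape is not documented in this section.

```python
import json
import urllib.request


def models_url(base_url):
    """Build the URL of the model registry endpoint."""
    return base_url.rstrip("/") + "/api/models"


def fetch_models(base_url="http://localhost:8000", timeout=10):
    """Call GET /api/models and return the decoded JSON body.

    The base URL is an assumption; adjust it to match API_HOST / API_PORT.
    """
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_models()` against a running backend and printing the result is a quick way to confirm which models from `OLLAMA_MODELS` were actually registered.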
## Tech Stack
### Frontend
@@ -377,6 +418,7 @@ For large images, the model uses dynamic cropping:
**Parameters:**
- `image` (file, required) - Image file to process (up to 100MB)
- `model` (string) - OCR model id from `GET /api/models` (default: registry default). Grounding/Advanced settings apply to DeepSeek only.
- `mode` (string) - OCR mode: `plain_ocr` | `describe` | `find_ref` | `freeform`
- `prompt` (string) - Custom prompt for freeform mode
- `grounding` (bool) - Enable bounding boxes (auto-enabled for find_ref)
@@ -416,6 +458,7 @@ Process PDF documents with OCR and export to various formats.
**Parameters:**
- `pdf_file` (file, required) - PDF file to process (up to 100MB)
- `model` (string) - OCR model id from `GET /api/models` (default: registry default)
- `mode` (string) - OCR mode: `plain_ocr` | `describe` | `find_ref` | `freeform`
- `prompt` (string) - Custom prompt for freeform mode
- `output_format` (string) - Output format: `markdown` | `html` | `docx` | `json`
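A client can check the documented enum values before uploading a PDF. The sketch below only assembles the text form fields listed above; the helper name `build_pdf_fields` is hypothetical, and the endpoint path is left out because it is not shown in this excerpt.

```python
# Allowed values as documented in the parameter list above.
MODES = {"plain_ocr", "describe", "find_ref", "freeform"}
OUTPUT_FORMATS = {"markdown", "html", "docx", "json"}


def build_pdf_fields(mode, output_format, model=None, prompt=None):
    """Validate and assemble the non-file form fields for a PDF OCR request.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format!r}")

    fields = {"mode": mode, "output_format": output_format}
    # Omitting `model` lets the backend use its registry default.
    if model is not None:
        fields["model"] = model
    # `prompt` is documented as the custom prompt for freeform mode.
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```

The returned dict can be sent alongside the `pdf_file` upload as multipart form data with whichever HTTP client you prefer.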