Remove Freeform and Find from UI. Allow Description to be added to Reviewed job
45
README.md
@@ -172,6 +172,13 @@ FRONTEND_PORT=3000
 MODEL_NAME=deepseek-ai/DeepSeek-OCR
 HF_HOME=/models

+# OCR model selection (DeepSeek + Ollama)
+ENABLE_DEEPSEEK_LOCAL=true # register the local GPU model
+OLLAMA_BASE_URL=http://host.docker.internal:11434 # external Ollama host
+OLLAMA_MODELS=glm-ocr,llama3.2-vision,minicpm-v,qwen2.5vl
+DEFAULT_OCR_MODEL=deepseek-local # deepseek-local or ollama:<tag>
+OLLAMA_TIMEOUT=300 # per-request timeout (seconds)
+
 # Upload Configuration
 MAX_UPLOAD_SIZE_MB=100 # Maximum file upload size

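Review note: the hunk above adds `OLLAMA_BASE_URL` and `OLLAMA_TIMEOUT` but the README does not say which Ollama endpoint the backend calls. The sketch below assumes Ollama's non-streaming `/api/generate` endpoint with a base64-encoded image; the function names `build_generate_request` and `run_ollama_ocr` are hypothetical and are not taken from this repository.

```python
import base64
import json
import urllib.request


def build_generate_request(base_url, tag, prompt, image_bytes):
    """Build a POST request for Ollama's /api/generate (assumed endpoint)."""
    payload = {
        "model": tag,  # e.g. "glm-ocr", one of the tags in OLLAMA_MODELS
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object, not a stream
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ollama_ocr(base_url, tag, prompt, image_bytes, timeout=300.0):
    """Send the request; `timeout` plays the role of OLLAMA_TIMEOUT (seconds)."""
    req = build_generate_request(base_url, tag, prompt, image_bytes)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # The non-streaming reply carries the generated text under "response".
        return json.load(resp)["response"]
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running Ollama host.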
@@ -186,13 +193,47 @@ CROP_MODE=true # Enable dynamic cropping for large images
 - `API_HOST`: Backend API host (default: 0.0.0.0)
 - `API_PORT`: Backend API port (default: 8000)
 - `FRONTEND_PORT`: Frontend port (default: 3000)
-- `MODEL_NAME`: HuggingFace model identifier
+- `MODEL_NAME`: HuggingFace model identifier for the local DeepSeek-OCR model
 - `HF_HOME`: Model cache directory
+- `ENABLE_DEEPSEEK_LOCAL`: Register the local DeepSeek-OCR model (set `false` for an Ollama-only deployment with no GPU model loaded)
+- `OLLAMA_BASE_URL`: URL of an external Ollama server the backend calls for non-DeepSeek models
+- `OLLAMA_MODELS`: Comma-separated Ollama vision model tags to expose in the UI (pull them on the Ollama host first, e.g. `ollama pull glm-ocr`)
+- `DEFAULT_OCR_MODEL`: Model id selected by default (`deepseek-local` or `ollama:<tag>`)
+- `OLLAMA_TIMEOUT`: Per-request timeout in seconds for Ollama calls
 - `MAX_UPLOAD_SIZE_MB`: Maximum file upload size in megabytes
 - `BASE_SIZE`: Base image processing size (affects memory usage)
 - `IMAGE_SIZE`: Tile size for dynamic cropping
 - `CROP_MODE`: Enable/disable dynamic image cropping

+### Choosing an OCR Model
+
+The **Model** selector (next to the Mode selector) chooses which backend runs the OCR:
+
+- **DeepSeek-OCR (local GPU)** — the default. Loaded lazily on first use. Supports
+every mode including grounding/bounding-box modes (Find), plus the Advanced
+Settings (base size, crop mode, etc.).
+- **Ollama models** — any vision model pulled on your Ollama host and listed in
+`OLLAMA_MODELS` (e.g. `glm-ocr`, `llama3.2-vision`). These run remotely on the
+Ollama server. They return **plain text only**: bounding boxes are not produced,
+so grounding modes (Find) and the DeepSeek-specific Advanced Settings are ignored
+/ disabled when an Ollama model is selected.
+
+Setup for Ollama models:
+
+```bash
+# On the machine running Ollama
+ollama pull glm-ocr
+ollama pull llama3.2-vision
+
+# Point the backend at it (in .env), then restart
+OLLAMA_BASE_URL=http://host.docker.internal:11434
+OLLAMA_MODELS=glm-ocr,llama3.2-vision
+```
+
+`GET /api/models` returns the registered models and their capabilities; the UI
+populates the selector from it. The model used for each job is stored on the job
+record (`ocr_model`) and shown in the Browse Jobs view.
+
 ## Tech Stack

 ### Frontend
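Review note: the variables documented in this hunk imply a small model registry keyed by `deepseek-local` or `ollama:<tag>`. A minimal sketch of how such a registry could be derived from the environment follows. The `OcrModel` class, `build_registry`, `default_model`, the accepted spellings of `true`, and the fallback to the first registered model are all assumptions made for illustration, not the project's code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrModel:
    model_id: str             # "deepseek-local" or "ollama:<tag>"
    backend: str              # "deepseek" or "ollama"
    supports_grounding: bool  # only the local DeepSeek model yields boxes


def _flag(value):
    """Interpret common truthy spellings (assumed; the README only shows true/false)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def build_registry(env=os.environ):
    """Return the models to expose, in the order the UI would list them."""
    models = []
    if _flag(env.get("ENABLE_DEEPSEEK_LOCAL", "true")):
        models.append(OcrModel("deepseek-local", "deepseek", True))
    for tag in env.get("OLLAMA_MODELS", "").split(","):
        tag = tag.strip()
        if tag:  # skip empty entries such as a trailing comma
            models.append(OcrModel(f"ollama:{tag}", "ollama", False))
    return models


def default_model(models, env=os.environ):
    """Pick DEFAULT_OCR_MODEL if it is registered, else the first model."""
    if not models:
        raise ValueError("no OCR models are registered")
    wanted = env.get("DEFAULT_OCR_MODEL", "deepseek-local")
    for model in models:
        if model.model_id == wanted:
            return model
    return models[0]  # assumed fallback; the README does not specify one
```

With `ENABLE_DEEPSEEK_LOCAL=false` the registry holds only `ollama:` entries, which matches the Ollama-only deployment the README describes.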
@@ -377,6 +418,7 @@ For large images, the model uses dynamic cropping:

 **Parameters:**
 - `image` (file, required) - Image file to process (up to 100MB)
+- `model` (string) - OCR model id from `GET /api/models` (default: registry default). Grounding/Advanced settings apply to DeepSeek only.
 - `mode` (string) - OCR mode: `plain_ocr` | `describe` | `find_ref` | `freeform`
 - `prompt` (string) - Custom prompt for freeform mode
 - `grounding` (bool) - Enable bounding boxes (auto-enabled for find_ref)
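Review note: the new `model` parameter interacts with `mode` and `grounding`, since grounding is auto-enabled for `find_ref` but applies to DeepSeek only. The sketch below shows one way a client might assemble the form fields under those rules. The helper name `build_ocr_fields`, the `ollama:` prefix check, and the `"true"`/`"false"` string encoding of the boolean are assumptions, and the endpoint path is deliberately left out because this hunk does not show it.

```python
VALID_MODES = {"plain_ocr", "describe", "find_ref", "freeform"}


def build_ocr_fields(model_id, mode="plain_ocr", prompt="", grounding=False):
    """Return form fields for the image OCR request (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    # Ollama-backed models return plain text only, so never request boxes.
    is_ollama = model_id.startswith("ollama:")
    # find_ref implies grounding even when the flag was not set explicitly.
    wants_boxes = grounding or mode == "find_ref"

    return {
        "model": model_id,
        "mode": mode,
        "prompt": prompt,
        "grounding": "true" if (wants_boxes and not is_ollama) else "false",
    }
```

For example, `build_ocr_fields("ollama:glm-ocr", mode="find_ref")` keeps the requested mode but sends `grounding` as `"false"`, mirroring the README's statement that grounding is ignored for Ollama models.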
@@ -416,6 +458,7 @@ Process PDF documents with OCR and export to various formats.

 **Parameters:**
 - `pdf_file` (file, required) - PDF file to process (up to 100MB)
+- `model` (string) - OCR model id from `GET /api/models` (default: registry default)
 - `mode` (string) - OCR mode: `plain_ocr` | `describe` | `find_ref` | `freeform`
 - `prompt` (string) - Custom prompt for freeform mode
 - `output_format` (string) - Output format: `markdown` | `html` | `docx` | `json`