Adding missed files

Remove Freeform and Find from UI. Allow Description to be added to Reviewed job
Added job review toggle
2026-06-30 12:16:16 +01:00 · 2026-06-29 13:09:01 +01:00 · 2026-06-23 10:43:44 +01:00 · 2026-06-19 23:12:33 +01:00 · 2026-06-19 17:47:53 +01:00 · 2026-06-10 22:04:52 +01:00
25 changed files with 4010 additions and 505 deletions
--- a/.env
+++ b/.env
@@ -0,0 +1,23 @@
+# DeepSeek OCR Application Configuration
+
+# API Configuration
+API_HOST=0.0.0.0
+API_PORT=8000
+
+# Frontend Configuration
+FRONTEND_PORT=3000
+
+# Model Configuration
+MODEL_NAME=deepseek-ai/DeepSeek-OCR
+HF_HOME=/models
+
+# CORS Configuration (comma-separated origins, defaults to http://localhost:3000)
+CORS_ORIGINS=http://localhost:3000
+
+# Upload Configuration
+MAX_UPLOAD_SIZE_MB=100
+
+# Processing Configuration
+BASE_SIZE=1024
+IMAGE_SIZE=640
+CROP_MODE=true
--- a/.env.example
+++ b/.env.example
@@ -11,9 +11,34 @@ FRONTEND_PORT=3000
 MODEL_NAME=deepseek-ai/DeepSeek-OCR
 HF_HOME=/models

+# OCR model selection
+# Register the local DeepSeek-OCR model (set to false for an Ollama-only deployment)
+ENABLE_DEEPSEEK_LOCAL=true
+# External Ollama host the backend should call (no trailing slash)
+OLLAMA_BASE_URL=http://host.docker.internal:11434
+# Comma-separated Ollama vision model tags to surface in the UI.
+# Pull these on the Ollama host first, e.g. `ollama pull glm-ocr`.
+OLLAMA_MODELS=glm-ocr,llama3.2-vision,minicpm-v,qwen2.5vl
+# Default model id selected in the UI (deepseek-local or ollama:<tag>)
+DEFAULT_OCR_MODEL=deepseek-local
+# Per-request timeout (seconds) for Ollama calls
+OLLAMA_TIMEOUT=300
+
+# CORS Configuration (comma-separated origins, defaults to http://localhost:3000)
+CORS_ORIGINS=http://localhost:3000
+
 # Upload Configuration
 MAX_UPLOAD_SIZE_MB=100

+# PostgreSQL Configuration
+POSTGRES_USER=ocr_user
+POSTGRES_PASSWORD=ocr_password
+POSTGRES_DB=ocr_db
+DATABASE_URL=postgresql://ocr_user:ocr_password@postgres:5432/ocr_db
+
+# OCR Image Storage (host path mounted into container)
+OCR_IMAGES_DIR=/data/ocr_images
+
 # Processing Configuration
 BASE_SIZE=1024
 IMAGE_SIZE=640
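Reviewer note: the settings above imply model ids of the form `deepseek-local` and `ollama:<tag>`. The sketch below shows one way such ids could be assembled from the two variables. It is an illustration only: `backend/main.py` is not shown in this diff, and the function name `build_model_ids` is hypothetical.

```python
def build_model_ids(enable_deepseek_local: bool, ollama_models: str) -> list[str]:
    """Return model ids in the `deepseek-local` / `ollama:<tag>` format."""
    ids = []
    if enable_deepseek_local:
        ids.append("deepseek-local")
    # OLLAMA_MODELS is comma-separated; skip blanks and stray whitespace
    for tag in ollama_models.split(","):
        tag = tag.strip()
        if tag:
            ids.append(f"ollama:{tag}")
    return ids


# Values taken from the .env.example defaults above
print(build_model_ids(True, "glm-ocr,llama3.2-vision,minicpm-v,qwen2.5vl"))
```

With `ENABLE_DEEPSEEK_LOCAL=false` the list would contain only the Ollama entries, which matches the "Ollama-only deployment" described in the comment.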
--- a/.gitignore
+++ b/.gitignore
@@ -46,7 +46,7 @@ yarn.lock
 pnpm-lock.yaml

 # Environment
-.env
+#.env
 .env.local
 .env.development.local
 .env.test.local
--- a/21
+++ b/21
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 rdumasia303
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -1,10 +1,54 @@
 # 🚀 DeepSeek OCR - React + FastAPI

-Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend.
+Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend. **Now with PDF processing and multi-format document conversion!**

 ![DeepSeek OCR in Action](assets/multi-bird.png)

-> **Recent Updates (v2.1.1)**
+## ✨ What's New in v2.2.0 - PDF Processing & Document Conversion
+
+We've added powerful PDF processing capabilities based on community feedback! Here's what you can do now:
+
+### 📄 Process Entire PDF Documents
+- Upload PDF files up to 100MB
+- Automatic multi-page OCR processing
+- Real-time progress tracking for large documents
+- Extract text from scanned PDFs or image-based documents
+
+### 🔄 Convert to Multiple Formats
+Export your OCR results in the format you need:
+- **Markdown (.md)** - Clean, structured text perfect for documentation
+- **HTML (.html)** - Styled documents with embedded images and tables
+- **Word (.docx)** - Professional documents with formatting, tables, and images
+- **JSON** - Structured data for programmatic access
+
+### 🖼️ Automatic Image Extraction
+- Detects and extracts images from PDF pages
+- Embeds images in exported documents
+- Preserves image placement and context
+
+### 📐 Formula & Formatting Preservation
+- Maintains mathematical formulas (LaTeX syntax)
+- Preserves tables, headings, and document structure
+- Cleans up special characters while keeping formatting intact
+
+### 🎯 Use Cases
+- **Document Digitization** - Convert scanned PDFs to editable formats
+- **Data Extraction** - Pull structured data from forms and invoices
+- **Content Migration** - Convert PDFs to Markdown for wikis/documentation
+- **Academic Papers** - Extract text and formulas from research papers
+- **Business Documents** - Convert reports to Word for editing
+
+---
+
+> **Latest Updates (v2.2.0)** - November 2025
+> - 🎉 **NEW: PDF Processing** - Upload PDFs and extract text from all pages
+> - 🎉 **NEW: Multi-Format Export** - Convert to Markdown, HTML, DOCX, or JSON
+> - 🎉 **NEW: Automatic Image Extraction** - Extract and preserve images from PDFs
+> - 🎉 **NEW: Progress Tracking** - Real-time progress for multi-page documents
+> - ✅ Dual mode: Image OCR + PDF Processing with format conversion
+> - ✅ Enhanced document processing with formula and formatting preservation
+>
+> **Previous Updates (v2.1.1)**
 > - ✅ Fixed image removal button - now properly clears and allows re-upload
 > - ✅ Fixed multiple bounding boxes parsing - handles `[[x1,y1,x2,y2], [x1,y1,x2,y2]]` format
 > - ✅ Simplified to 4 core working modes for better stability
@@ -37,24 +81,80 @@ Modern OCR web application powered by DeepSeek-OCR with a stunning React fronten
   - **Backend API**: http://localhost:8000 (or your configured API_PORT)
   - **API Docs**: http://localhost:8000/docs

+## 🎓 How to Use
+
+### Processing Images (Single Image OCR)
+
+1. Select **"Image OCR"** mode in the toggle
+2. Upload an image (PNG, JPG, WEBP, etc.)
+3. Choose your OCR mode:
+   - **Plain OCR** - Extract all text
+   - **Describe** - Get image description
+   - **Find** - Locate specific terms
+   - **Freeform** - Use custom prompts
+4. Click **"Analyze Image"**
+5. View results with bounding boxes (if enabled)
+6. Copy or download the extracted text
+
+### Processing PDFs (Multi-Page Documents) - NEW!
+
+1. Select **"PDF Processing"** mode in the toggle
+2. Upload a PDF file (up to 100MB)
+3. Choose your OCR mode (same as above)
+4. Select **output format**:
+   - 📝 **Markdown** - For documentation, wikis, GitHub
+   - 🌐 **HTML** - For web publishing, styled viewing
+   - 📄 **DOCX** - For Word editing, professional documents
+   - 📊 **JSON** - For programmatic access, data extraction
+5. Click **"Process PDF"**
+6. Watch the progress bar as pages are processed
+7. Your file downloads automatically when complete!
+
+### Tips for Best Results
+
+- **For scanned documents**: Use higher DPI (144-300) in advanced settings
+- **For tables**: The model excels at extracting structured data
+- **For formulas**: Mathematical notation is preserved in output
+- **For images in PDFs**: Enable "Extract Images" to include them in output
+- **For large PDFs**: JSON format is fastest, DOCX takes longer due to formatting
+
+### Output Format Comparison
+
+| Format | Best For | Features | File Size |
+|--------|----------|----------|-----------|
+| **Markdown** | Documentation, GitHub, wikis | Clean text, tables, code blocks | Smallest |
+| **HTML** | Web viewing, sharing | Styled output, embedded images, tables | Medium |
+| **DOCX** | Editing, professional docs | Full formatting, images, tables | Largest |
+| **JSON** | Data processing, APIs | Structured data, metadata, page info | Small |
+
 ## Features

-### 4 Core OCR Modes
+### Dual Processing Modes
+#### 📸 **Image OCR** (4 Core Modes)
 - **Plain OCR** - Raw text extraction from any image
 - **Describe** - Generate intelligent image descriptions
 - **Find** - Locate specific terms with visual bounding boxes
 - **Freeform** - Custom prompts for specialized tasks

+#### 📄 **PDF Processing** (NEW!)
+- **Multi-Page Processing** - Process entire PDF documents page by page
+- **Format Conversion** - Export to Markdown, HTML, DOCX, or JSON
+- **Image Extraction** - Automatically extract and preserve embedded images
+- **Formula Preservation** - Maintain mathematical formulas and special formatting
+- **Progress Tracking** - Real-time progress updates for large documents
+
 ### UI Features
 - 🎨 Glass morphism design with animated gradients
-- 🎯 Drag & drop file upload (up to 100MB by default)
-- 🗑️ Easy image removal and re-upload
+- 🎯 Drag & drop file upload (Images up to 10MB, PDFs up to 100MB)
+- 🔄 Easy file removal and re-upload
 - 📦 Grounding box visualization with proper coordinate scaling
 - ✨ Smooth animations (Framer Motion)
-- 📋 Copy/Download results
+- 📋 Copy/Download results in multiple formats
 - 🎛️ Advanced settings dropdown
 - 📝 HTML and Markdown rendering for formatted output
 - 🔍 Multiple bounding box support (handles multiple instances of found terms)
+- 📊 Progress bars for multi-page PDF processing
+- 💾 Direct download for converted documents (MD, HTML, DOCX)

 ## Configuration

@@ -72,6 +172,13 @@ FRONTEND_PORT=3000
 MODEL_NAME=deepseek-ai/DeepSeek-OCR
 HF_HOME=/models

+# OCR model selection (DeepSeek + Ollama)
+ENABLE_DEEPSEEK_LOCAL=true                          # register the local GPU model
+OLLAMA_BASE_URL=http://host.docker.internal:11434   # external Ollama host
+OLLAMA_MODELS=glm-ocr,llama3.2-vision,minicpm-v,qwen2.5vl
+DEFAULT_OCR_MODEL=deepseek-local                    # deepseek-local or ollama:<tag>
+OLLAMA_TIMEOUT=300                                  # per-request timeout (seconds)
+
 # Upload Configuration
 MAX_UPLOAD_SIZE_MB=100  # Maximum file upload size

@@ -86,19 +193,68 @@ CROP_MODE=true         # Enable dynamic cropping for large images
 - `API_HOST`: Backend API host (default: 0.0.0.0)
 - `API_PORT`: Backend API port (default: 8000)
 - `FRONTEND_PORT`: Frontend port (default: 3000)
-- `MODEL_NAME`: HuggingFace model identifier
+- `MODEL_NAME`: HuggingFace model identifier for the local DeepSeek-OCR model
 - `HF_HOME`: Model cache directory
+- `ENABLE_DEEPSEEK_LOCAL`: Register the local DeepSeek-OCR model (set `false` for an Ollama-only deployment with no GPU model loaded)
+- `OLLAMA_BASE_URL`: URL of an external Ollama server the backend calls for non-DeepSeek models
+- `OLLAMA_MODELS`: Comma-separated Ollama vision model tags to expose in the UI (pull them on the Ollama host first, e.g. `ollama pull glm-ocr`)
+- `DEFAULT_OCR_MODEL`: Model id selected by default (`deepseek-local` or `ollama:<tag>`)
+- `OLLAMA_TIMEOUT`: Per-request timeout in seconds for Ollama calls
 - `MAX_UPLOAD_SIZE_MB`: Maximum file upload size in megabytes
 - `BASE_SIZE`: Base image processing size (affects memory usage)
 - `IMAGE_SIZE`: Tile size for dynamic cropping
 - `CROP_MODE`: Enable/disable dynamic image cropping

+### Choosing an OCR Model
+
+The **Model** selector (next to the Mode selector) chooses which backend runs the OCR:
+
+- **DeepSeek-OCR (local GPU)** — the default. Loaded lazily on first use. Supports
+  every mode including grounding/bounding-box modes (Find), plus the Advanced
+  Settings (base size, crop mode, etc.).
+- **Ollama models** — any vision model pulled on your Ollama host and listed in
+  `OLLAMA_MODELS` (e.g. `glm-ocr`, `llama3.2-vision`). These run remotely on the
+  Ollama server. They return **plain text only**: bounding boxes are not produced,
+  so grounding modes (Find) and the DeepSeek-specific Advanced Settings are
+  ignored or disabled when an Ollama model is selected.
+
+Setup for Ollama models:
+
+```bash
+# On the machine running Ollama
+ollama pull glm-ocr
+ollama pull llama3.2-vision
+
+# Point the backend at it (in .env), then restart
+OLLAMA_BASE_URL=http://host.docker.internal:11434
+OLLAMA_MODELS=glm-ocr,llama3.2-vision
+```
+
+`GET /api/models` returns the registered models and their capabilities; the UI
+populates the selector from it. The model used for each job is stored on the job
+record (`ocr_model`) and shown in the Browse Jobs view.
+
 ## Tech Stack

-- **Frontend**: React 18 + Vite 5 + TailwindCSS 3 + Framer Motion 11
-- **Backend**: FastAPI + PyTorch + Transformers 4.46 + DeepSeek-OCR
+### Frontend
+- **Framework**: React 18 + Vite 5
+- **Styling**: TailwindCSS 3 + Custom Glass Morphism
+- **Animations**: Framer Motion 11
+- **HTTP Client**: Axios
+- **File Upload**: React Dropzone
+
+### Backend
+- **API Framework**: FastAPI (async Python web framework)
+- **ML/AI**: PyTorch + Transformers 4.46 + DeepSeek-OCR
+- **PDF Processing**: PyMuPDF (fitz) + img2pdf
+- **Document Conversion**:
+  - python-docx (Word documents)
+  - markdown (Markdown processing)
+  - Custom HTML generator
 - **Configuration**: python-decouple for environment management
-- **Server**: Nginx (reverse proxy)
+
+### Infrastructure
+- **Server**: Nginx (reverse proxy & static file serving)
 - **Container**: Docker + Docker Compose with multi-stage builds
 - **GPU**: NVIDIA CUDA support (tested on RTX 3090, RTX 5090)

@@ -106,19 +262,26 @@ CROP_MODE=true         # Enable dynamic cropping for large images

 ```
 deepseek-ocr/
-├── backend/           # FastAPI backend
-│   ├── main.py
+├── backend/                  # FastAPI backend
+│   ├── main.py              # Main API with OCR and PDF endpoints
+│   ├── pdf_utils.py         # PDF processing utilities (NEW)
+│   ├── format_converter.py  # Document format conversion (NEW)
 │   ├── requirements.txt
 │   └── Dockerfile
-├── frontend/          # React frontend
+├── frontend/                 # React frontend
 │   ├── src/
 │   │   ├── components/
-│   │   ├── App.jsx
+│   │   │   ├── ImageUpload.jsx    # File upload (images & PDFs)
+│   │   │   ├── PDFProcessor.jsx   # PDF processing UI (NEW)
+│   │   │   ├── ModeSelector.jsx
+│   │   │   ├── ResultPanel.jsx
+│   │   │   └── AdvancedSettings.jsx
+│   │   ├── App.jsx           # Main app with dual mode support
 │   │   └── main.jsx
 │   ├── package.json
 │   ├── nginx.conf
 │   └── Dockerfile
-├── models/            # Model cache
+├── models/                   # Model cache
 └── docker-compose.yml
 ```

@@ -255,6 +418,7 @@ For large images, the model uses dynamic cropping:

 **Parameters:**
 - `image` (file, required) - Image file to process (up to 100MB)
+- `model` (string) - OCR model id from `GET /api/models` (default: registry default). Grounding/Advanced settings apply to DeepSeek only.
 - `mode` (string) - OCR mode: `plain_ocr` | `describe` | `find_ref` | `freeform`
 - `prompt` (string) - Custom prompt for freeform mode
 - `grounding` (bool) - Enable bounding boxes (auto-enabled for find_ref)
@@ -288,6 +452,64 @@ For large images, the model uses dynamic cropping:
 - **Supports multiple boxes**: When finding multiple instances, format is `[[x1,y1,x2,y2], [x1,y1,x2,y2], ...]`
 - Frontend automatically displays all boxes overlaid on the image with unique colors

+### POST /api/process-pdf (NEW!)
+
+Process PDF documents with OCR and export to various formats.
+
+**Parameters:**
+- `pdf_file` (file, required) - PDF file to process (up to 100MB)
+- `model` (string) - OCR model id from `GET /api/models` (default: registry default)
+- `mode` (string) - OCR mode: `plain_ocr` | `describe` | `find_ref` | `freeform`
+- `prompt` (string) - Custom prompt for freeform mode
+- `output_format` (string) - Output format: `markdown` | `html` | `docx` | `json`
+- `grounding` (bool) - Enable bounding boxes (default: false)
+- `include_caption` (bool) - Add image descriptions (default: false)
+- `extract_images` (bool) - Extract embedded images from PDF (default: true)
+- `dpi` (int) - PDF rendering resolution (default: 144)
+- `base_size` (int) - Base processing size (default: 1024)
+- `image_size` (int) - Tile size for cropping (default: 640)
+- `crop_mode` (bool) - Enable dynamic cropping (default: true)
+
+**Response Formats:**
+
+**JSON Format** (`output_format=json`):
+```json
+{
+  "success": true,
+  "total_pages": 5,
+  "pages": [
+    {
+      "page_number": 1,
+      "text": "Extracted and cleaned text...",
+      "raw_text": "Raw model output with tags...",
+      "boxes": [{"label": "field", "box": [x1, y1, x2, y2]}],
+      "images": ["base64_encoded_image_data..."],
+      "image_dims": {"w": 1920, "h": 1080}
+    }
+  ],
+  "metadata": {
+    "mode": "plain_ocr",
+    "grounding": false,
+    "extract_images": true,
+    "dpi": 144
+  }
+}
+```
+
+**File Downloads** (`output_format=markdown|html|docx`):
+- Returns the document as a downloadable file
+- Markdown: `.md` file with preserved formatting
+- HTML: `.html` file with embedded styling and images
+- DOCX: `.docx` Word document with tables and formatting
+
+**Features:**
+- 📄 Multi-page processing with progress tracking
+- 🖼️ Automatic image extraction and embedding
+- 📐 Formula and formatting preservation
+- 🎨 Styled HTML output with tables and code blocks
+- 📝 Clean Markdown with proper structure
+- 📋 Professional DOCX with headings and tables
+
 ## Examples

 Here are some example images showcasing different OCR capabilities:
@@ -325,3 +547,8 @@ docker-compose build frontend
 ## License

 This project uses the DeepSeek-OCR model. Refer to the model's license terms.
+
+
+<!-- Small note and direct link to license at the bottom -->
+<!-- MIT License: this repository is licensed under the MIT License. See the full text in the LICENSE file. -->
+Note: Licensed under the MIT License. View the full license: [LICENSE](./LICENSE)
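Reviewer note: the JSON shape documented for `POST /api/process-pdf` (`output_format=json`) can be consumed with the standard library alone. The sketch below joins the per-page `text` fields in page order. It assumes the response matches the README example exactly; the helper name `pages_to_text` and the sample payload are made up for illustration.

```python
import json


def pages_to_text(response_body: str) -> str:
    """Join the cleaned text of every page, ordered by page_number."""
    data = json.loads(response_body)
    if not data.get("success"):
        raise ValueError("PDF processing did not succeed")
    pages = sorted(data["pages"], key=lambda p: p["page_number"])
    return "\n\n".join(f"# Page {p['page_number']}\n{p['text']}" for p in pages)


# Hypothetical payload in the documented shape (pages deliberately out of order)
sample = json.dumps({
    "success": True,
    "total_pages": 2,
    "pages": [
        {"page_number": 2, "text": "second"},
        {"page_number": 1, "text": "first"},
    ],
    "metadata": {"mode": "plain_ocr", "dpi": 144},
})
print(pages_to_text(sample))
```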
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@@ -12,7 +12,7 @@ COPY requirements.txt .
 RUN pip install --upgrade pip && pip install -r requirements.txt

 # Copy backend code
-COPY main.py .
+COPY *.py .

 EXPOSE 8000

--- a/backend/database.py
+++ b/backend/database.py
@@ -0,0 +1,115 @@
+import os
+import psycopg2
+import psycopg2.extras
+from contextlib import contextmanager
+from decouple import config as env_config
+
+DATABASE_URL = env_config(
+    "DATABASE_URL",
+    default="postgresql://ocr_user:ocr_password@postgres:5432/ocr_db"
+)
+
+
+def _get_conn():
+    return psycopg2.connect(DATABASE_URL, cursor_factory=psycopg2.extras.RealDictCursor)
+
+
+def init_db():
+    """Create tables if they don't exist. Called once at startup."""
+    conn = None
+    try:
+        conn = _get_conn()
+        with conn.cursor() as cur:
+            cur.execute("""
+                CREATE TABLE IF NOT EXISTS ocr_jobs (
+                    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+                    author TEXT,
+                    book TEXT,
+                    chapter TEXT,
+                    page TEXT,
+                    submitted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+                    image_path TEXT NOT NULL,
+                    original_filename TEXT,
+                    ocr_text TEXT,
+                    status TEXT NOT NULL DEFAULT 'unreviewed',
+                    reviewed_text TEXT,
+                    reviewer_name TEXT,
+                    reviewed_at TIMESTAMPTZ,
+                    mode TEXT
+                )
+            """)
+            # Indexes for filtering by status and listing newest submissions first
+            cur.execute("""
+                CREATE INDEX IF NOT EXISTS ocr_jobs_status_idx ON ocr_jobs(status)
+            """)
+            cur.execute("""
+                CREATE INDEX IF NOT EXISTS ocr_jobs_submitted_at_idx ON ocr_jobs(submitted_at DESC)
+            """)
+            # Add columns introduced after initial schema (safe to run repeatedly)
+            cur.execute("""
+                ALTER TABLE ocr_jobs
+                ADD COLUMN IF NOT EXISTS describe_text TEXT
+            """)
+            cur.execute("""
+                ALTER TABLE ocr_jobs
+                ADD COLUMN IF NOT EXISTS freeform_text TEXT
+            """)
+            cur.execute("""
+                ALTER TABLE ocr_jobs
+                ADD COLUMN IF NOT EXISTS qdrant_synced_at TIMESTAMPTZ
+            """)
+            cur.execute("""
+                ALTER TABLE ocr_jobs
+                ADD COLUMN IF NOT EXISTS updated_at TIMESTAMPTZ
+            """)
+            # Which OCR model produced this job (e.g. "deepseek-local", "ollama:glm-ocr")
+            cur.execute("""
+                ALTER TABLE ocr_jobs
+                ADD COLUMN IF NOT EXISTS ocr_model TEXT
+            """)
+            # Trigger function: stamp updated_at on every row update
+            cur.execute("""
+                CREATE OR REPLACE FUNCTION set_updated_at()
+                RETURNS TRIGGER AS $$
+                BEGIN
+                    NEW.updated_at = NOW();
+                    RETURN NEW;
+                END;
+                $$ LANGUAGE plpgsql
+            """)
+            cur.execute("""
+                CREATE OR REPLACE TRIGGER ocr_jobs_set_updated_at
+                BEFORE UPDATE ON ocr_jobs
+                FOR EACH ROW EXECUTE FUNCTION set_updated_at()
+            """)
+            # Unique constraint: prevent duplicate (author, chapter, page) submissions.
+            # Applies only when all three fields are non-null.
+            cur.execute("""
+                CREATE UNIQUE INDEX IF NOT EXISTS ocr_jobs_author_chapter_page_unique
+                ON ocr_jobs (author, chapter, page)
+                WHERE author IS NOT NULL AND chapter IS NOT NULL AND page IS NOT NULL
+            """)
+        conn.commit()
+        print("Database initialized.")
+    except Exception as exc:
+        print(f"Database init failed: {exc}")
+        if conn:
+            conn.rollback()
+        raise
+    finally:
+        if conn:
+            conn.close()
+
+
+@contextmanager
+def get_db():
+    """Yield a connection and auto-commit/rollback."""
+    conn = _get_conn()
+    try:
+        yield conn
+        conn.commit()
+    except Exception:
+        conn.rollback()
+        raise
+    finally:
+        conn.close()
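Reviewer note: the partial unique index on `(author, chapter, page)` can be tried out without a PostgreSQL server. The sketch below uses SQLite from the standard library as a stand-in, since it accepts the same `CREATE UNIQUE INDEX ... WHERE` statement; the table is reduced to the three indexed columns, and the helper `try_insert` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ocr_jobs (author TEXT, chapter TEXT, page TEXT)")
conn.execute("""
    CREATE UNIQUE INDEX ocr_jobs_author_chapter_page_unique
    ON ocr_jobs (author, chapter, page)
    WHERE author IS NOT NULL AND chapter IS NOT NULL AND page IS NOT NULL
""")


def try_insert(author, chapter, page) -> bool:
    """Return True if the row was stored, False if the unique index rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ocr_jobs VALUES (?, ?, ?)", (author, chapter, page)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("Austen", "1", "12"),  # first submission of this triple
    try_insert("Austen", "1", "12"),  # same triple again
    try_insert("Austen", "1", None),  # page missing, row is outside the index
    try_insert("Austen", "1", None),  # page missing again
]
print(results)
```

Callers of the real backend would presumably need to catch the equivalent psycopg2 integrity error and turn it into a user-facing "duplicate submission" message; how `main.py` handles that is not visible in this diff.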
--- a/backend/format_converter.py
+++ b/backend/format_converter.py
@@ -0,0 +1,326 @@
+"""
+Document Format Conversion Utilities
+Handles conversion to Markdown, HTML, DOCX while preserving formatting
+"""
+
+import re
+from typing import List, Dict, Any
+from io import BytesIO
+from docx import Document
+from docx.shared import Pt, Inches, RGBColor
+from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
+import markdown
+import base64
+from PIL import Image
+
+
+class DocumentConverter:
+    """Handles conversion of OCR results to various document formats"""
+
+    def __init__(self):
+        self.page_separator = '<--- Page Split --->'
+
+    def to_markdown(self, pages_content: List[Dict[str, Any]], include_images: bool = True) -> str:
+        """
+        Convert OCR results to Markdown format
+
+        Args:
+            pages_content: List of page dictionaries with text and metadata
+            include_images: Whether to include image references
+
+        Returns:
+            Markdown formatted string
+        """
+        md_content = []
+
+        for idx, page in enumerate(pages_content):
+            # Add page header
+            md_content.append(f"# Page {idx + 1}\n")
+
+            text = page.get('text', '')
+
+            # Process and clean the text
+            if include_images and 'images' in page:
+                # Replace image placeholders with actual markdown image syntax
+                for img_idx, img_data in enumerate(page.get('images', [])):
+                    placeholder = f"[IMAGE_{img_idx}]"
+                    img_ref = f"![Image {img_idx + 1}](data:image/jpeg;base64,{img_data})"
+                    text = text.replace(placeholder, img_ref)
+
+            md_content.append(text)
+            md_content.append("\n\n---\n\n")  # Page separator
+
+        return "\n".join(md_content)
+
+    def to_html(self, pages_content: List[Dict[str, Any]], include_images: bool = True) -> str:
+        """
+        Convert OCR results to HTML format
+
+        Args:
+            pages_content: List of page dictionaries with text and metadata
+            include_images: Whether to include images
+
+        Returns:
+            HTML formatted string
+        """
+        html_parts = []
+
+        # HTML header
+        html_parts.append("""
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>OCR Results</title>
+    <style>
+        body {
+            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+            max-width: 900px;
+            margin: 40px auto;
+            padding: 20px;
+            line-height: 1.6;
+            background-color: #f5f5f5;
+        }
+        .page {
+            background: white;
+            padding: 40px;
+            margin-bottom: 30px;
+            box-shadow: 0 2px 8px rgba(0,0,0,0.1);
+            border-radius: 8px;
+        }
+        .page-header {
+            color: #333;
+            border-bottom: 2px solid #4CAF50;
+            padding-bottom: 10px;
+            margin-bottom: 20px;
+        }
+        table {
+            border-collapse: collapse;
+            width: 100%;
+            margin: 20px 0;
+        }
+        th, td {
+            border: 1px solid #ddd;
+            padding: 12px;
+            text-align: left;
+        }
+        th {
+            background-color: #4CAF50;
+            color: white;
+        }
+        tr:nth-child(even) {
+            background-color: #f9f9f9;
+        }
+        img {
+            max-width: 100%;
+            height: auto;
+            margin: 15px 0;
+            border-radius: 4px;
+        }
+        code {
+            background-color: #f4f4f4;
+            padding: 2px 6px;
+            border-radius: 3px;
+            font-family: 'Courier New', monospace;
+        }
+        pre {
+            background-color: #f4f4f4;
+            padding: 15px;
+            border-radius: 5px;
+            overflow-x: auto;
+        }
+    </style>
+</head>
+<body>
+    <h1>DeepSeek OCR Results</h1>
+""")
+
+        # Process each page
+        for idx, page in enumerate(pages_content):
+            html_parts.append('    <div class="page">')
+            html_parts.append(f'        <h2 class="page-header">Page {idx + 1}</h2>')
+
+            text = page.get('text', '')
+
+            # Handle images if present
+            if include_images and 'images' in page:
+                for img_idx, img_data in enumerate(page.get('images', [])):
+                    placeholder = f"[IMAGE_{img_idx}]"
+                    img_tag = f'<img src="data:image/jpeg;base64,{img_data}" alt="Image {img_idx + 1}" />'
+                    text = text.replace(placeholder, img_tag)
+
+            # Convert markdown to HTML if the text appears to be markdown
+            if self._is_markdown(text):
+                html_content = markdown.markdown(text, extensions=['tables', 'fenced_code'])
+            else:
+                # Otherwise, preserve the HTML or wrap in paragraph
+                html_content = text if '<' in text else f'<p>{text.replace(chr(10), "<br>")}</p>'
+
+            html_parts.append(f'        {html_content}')
+            html_parts.append('    </div>')
+
+        # HTML footer
+        html_parts.append("""
+</body>
+</html>
+""")
+
+        return "\n".join(html_parts)
+
+    def to_docx(self, pages_content: List[Dict[str, Any]], include_images: bool = True) -> BytesIO:
+        """
+        Convert OCR results to DOCX format
+
+        Args:
+            pages_content: List of page dictionaries with text and metadata
+            include_images: Whether to include images
+
+        Returns:
+            BytesIO object containing the DOCX file
+        """
+        doc = Document()
+
+        # Set default font
+        style = doc.styles['Normal']
+        font = style.font
+        font.name = 'Calibri'
+        font.size = Pt(11)
+
+        # Add title
+        title = doc.add_heading('DeepSeek OCR Results', 0)
+        title.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
+
+        # Process each page
+        for idx, page in enumerate(pages_content):
+            # Add page heading
+            page_heading = doc.add_heading(f'Page {idx + 1}', level=1)
+            page_heading.alignment = WD_PARAGRAPH_ALIGNMENT.LEFT
+
+            text = page.get('text', '')
+
+            # Handle images
+            if include_images and 'images' in page:
+                for img_idx, img_data in enumerate(page.get('images', [])):
+                    placeholder = f"[IMAGE_{img_idx}]"
+
+                    # Add image to document
+                    try:
+                        img_bytes = base64.b64decode(img_data)
+                        img_stream = BytesIO(img_bytes)
+                        doc.add_picture(img_stream, width=Inches(5))
+                        text = text.replace(placeholder, '')
+                    except Exception as e:
+                        print(f"Error adding image to DOCX: {e}")
+
+            # Process text content
+            self._add_formatted_text_to_doc(doc, text)
+
+            # Add page break (except for last page)
+            if idx < len(pages_content) - 1:
+                doc.add_page_break()
+
+        # Save to BytesIO
+        docx_buffer = BytesIO()
+        doc.save(docx_buffer)
+        docx_buffer.seek(0)
+
+        return docx_buffer
+
+    def _is_markdown(self, text: str) -> bool:
+        """Check if text appears to be markdown formatted"""
+        markdown_patterns = [
+            r'^#+\s',  # Headers
+            r'\*\*.*\*\*',  # Bold
+            r'\*.*\*',  # Italic
+            r'^\*\s',  # Lists
+            r'^\d+\.\s',  # Numbered lists
+            r'\[.*\]\(.*\)',  # Links
+            r'```',  # Code blocks
+        ]
+
+        for pattern in markdown_patterns:
+            if re.search(pattern, text, re.MULTILINE):
+                return True
+        return False
+
+    def _add_formatted_text_to_doc(self, doc: Document, text: str):
+        """
+        Add formatted text to document, preserving structure
+
+        Args:
+            doc: Document object
+            text: Text to add
+        """
+        # Split into paragraphs
+        paragraphs = text.split('\n\n')
+
+        for para in paragraphs:
+            if not para.strip():
+                continue
+
+            # Check for headers
+            if para.startswith('# '):
+                doc.add_heading(para[2:].strip(), level=1)
+            elif para.startswith('## '):
+                doc.add_heading(para[3:].strip(), level=2)
+            elif para.startswith('### '):
+                doc.add_heading(para[4:].strip(), level=3)
+            # Check for tables (simple detection)
+            elif '|' in para and para.count('|') > 2:
+                self._add_table_to_doc(doc, para)
+            # Check for code blocks
+            elif para.startswith('```'):
+                code_text = para.strip('`').strip()
+                p = doc.add_paragraph()
+                run = p.add_run(code_text)
+                run.font.name = 'Courier New'
+                run.font.size = Pt(10)
+            else:
+                # Regular paragraph
+                doc.add_paragraph(para.strip())
+
+    def _add_table_to_doc(self, doc: Document, table_text: str):
+        """
+        Add a table to the document from markdown-style table text
+
+        Args:
+            doc: Document object
+            table_text: Table in markdown format
+        """
+        rows = [row.strip() for row in table_text.split('\n') if row.strip()]
+
+        # Filter out separator rows
+        data_rows = [row for row in rows if not re.match(r'^[\|\s\-:]+$', row)]
+
+        if not data_rows:
+            return
+
+        # Parse table data
+        table_data = []
+        for row in data_rows:
+            # Trim only the outer pipes so empty interior cells keep their column position
+            cells = [cell.strip() for cell in row.strip('|').split('|')]
+            if cells:
+                table_data.append(cells)
+
+        if not table_data:
+            return
+
+        # Create table
+        max_cols = max(len(row) for row in table_data)
+        table = doc.add_table(rows=len(table_data), cols=max_cols)
+        table.style = 'Light Grid Accent 1'
+
+        # Populate table
+        for i, row_data in enumerate(table_data):
+            row = table.rows[i]
+            for j, cell_text in enumerate(row_data):
+                if j < len(row.cells):
+                    row.cells[j].text = cell_text
+
+                    # Make header row bold
+                    if i == 0:
+                        for paragraph in row.cells[j].paragraphs:
+                            for run in paragraph.runs:
+                                run.font.bold = True
--- a/backend/main.py
+++ b/backend/main.py
--- a/backend/pdf_utils.py
+++ b/backend/pdf_utils.py
@@ -0,0 +1,215 @@
+"""
+PDF Processing Utilities for DeepSeek OCR
+Handles PDF to image conversion and batch processing
+"""
+
+import ast
+import io
+import re
+from typing import List, Tuple, Dict, Any
+import fitz  # PyMuPDF
+import img2pdf
+from PIL import Image
+import numpy as np
+
+
+def pdf_to_images_high_quality(pdf_bytes: bytes, dpi: int = 144) -> List[Image.Image]:
+    """
+    Convert PDF pages to high-quality PIL images
+
+    Args:
+        pdf_bytes: PDF file as bytes
+        dpi: Resolution for rendering (default: 144)
+
+    Returns:
+        List of PIL Image objects, one per page
+    """
+    images = []
+
+    # Open PDF from bytes
+    pdf_document = fitz.open(stream=pdf_bytes, filetype="pdf")
+
+    # Calculate zoom factor from DPI
+    zoom = dpi / 72.0
+    matrix = fitz.Matrix(zoom, zoom)
+
+    # Process each page
+    for page_num in range(pdf_document.page_count):
+        page = pdf_document[page_num]
+
+        # Render page to pixmap
+        pixmap = page.get_pixmap(matrix=matrix, alpha=False)
+
+        # Allow reasonably large images (200 megapixels) but not decompression bombs
+        Image.MAX_IMAGE_PIXELS = 200_000_000
+
+        # Convert to PIL Image
+        img_data = pixmap.tobytes("png")
+        img = Image.open(io.BytesIO(img_data))
+
+        # Ensure RGB mode
+        if img.mode in ('RGBA', 'LA'):
+            background = Image.new('RGB', img.size, (255, 255, 255))
+            background.paste(img, mask=img.split()[-1])  # alpha is the last band in both RGBA and LA
+            img = background
+        elif img.mode != 'RGB':
+            img = img.convert('RGB')
+
+        images.append(img)
+
+    pdf_document.close()
+    return images
+
+
+def images_to_pdf(pil_images: List[Image.Image]) -> bytes:
+    """
+    Convert list of PIL images to PDF bytes
+
+    Args:
+        pil_images: List of PIL Image objects
+
+    Returns:
+        PDF file as bytes
+    """
+    if not pil_images:
+        return b''
+
+    image_bytes_list = []
+
+    for img in pil_images:
+        # Ensure RGB mode
+        if img.mode != 'RGB':
+            img = img.convert('RGB')
+
+        # Convert to JPEG bytes
+        img_buffer = io.BytesIO()
+        img.save(img_buffer, format='JPEG', quality=95)
+        img_bytes = img_buffer.getvalue()
+        image_bytes_list.append(img_bytes)
+
+    # Convert to PDF
+    pdf_bytes = img2pdf.convert(image_bytes_list)
+    return pdf_bytes
+
+
+def extract_ref_patterns(text: str) -> Tuple[List[Tuple], List[str], List[str]]:
+    """
+    Extract reference patterns from OCR output
+
+    Args:
+        text: OCR output text with reference tags
+
+    Returns:
+        Tuple of (all_matches, image_matches, other_matches)
+    """
+    pattern = r'(<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>)'
+    matches = re.findall(pattern, text, re.DOTALL)
+
+    matches_image = []
+    matches_other = []
+
+    for match in matches:
+        if '<|ref|>image<|/ref|>' in match[0]:
+            matches_image.append(match[0])
+        else:
+            matches_other.append(match[0])
+
+    return matches, matches_image, matches_other
+
+
+def parse_coordinates(ref_text: Tuple, image_width: int, image_height: int) -> Dict[str, Any]:
+    """
+    Parse coordinates from reference text
+
+    Args:
+        ref_text: Tuple of (full_match, label, coordinates)
+        image_width: Image width in pixels
+        image_height: Image height in pixels
+
+    Returns:
+        Dictionary with label and scaled coordinates, or None if the coordinates cannot be parsed
+    """
+    try:
+        label_type = ref_text[1]
+        cor_list = ast.literal_eval(ref_text[2])
+
+        # Scale coordinates from 0-999 to actual pixels
+        scaled_boxes = []
+        for points in cor_list:
+            x1, y1, x2, y2 = points
+            scaled_box = [
+                int(x1 / 999 * image_width),
+                int(y1 / 999 * image_height),
+                int(x2 / 999 * image_width),
+                int(y2 / 999 * image_height)
+            ]
+            scaled_boxes.append(scaled_box)
+
+        return {
+            'label': label_type,
+            'boxes': scaled_boxes
+        }
+    except Exception as e:
+        print(f"Error parsing coordinates: {e}")
+        return None
+
+
+def crop_images_from_refs(image: Image.Image, refs: List[Tuple]) -> List[Image.Image]:
+    """
+    Crop images based on reference bounding boxes
+
+    Args:
+        image: Source PIL Image
+        refs: List of reference tuples
+
+    Returns:
+        List of cropped PIL Images
+    """
+    cropped_images = []
+    image_width, image_height = image.size
+
+    for ref in refs:
+        coord_data = parse_coordinates(ref, image_width, image_height)
+        if coord_data and coord_data['label'] == 'image':
+            for box in coord_data['boxes']:
+                x1, y1, x2, y2 = box
+                try:
+                    cropped = image.crop((x1, y1, x2, y2))
+                    cropped_images.append(cropped)
+                except Exception as e:
+                    print(f"Error cropping image: {e}")
+                    continue
+
+    return cropped_images
+
+
+def clean_markdown_content(content: str, image_refs: List[str], other_refs: List[str]) -> str:
+    """
+    Clean markdown content by removing reference tags
+
+    Args:
+        content: Raw OCR output with tags
+        image_refs: List of image reference tags
+        other_refs: List of other reference tags
+
+    Returns:
+        Cleaned markdown content
+    """
+    cleaned = content
+
+    # Remove image reference tags (will be replaced with markdown images)
+    for ref in image_refs:
+        cleaned = cleaned.replace(ref, '')
+
+    # Remove other reference tags and clean up formatting
+    for ref in other_refs:
+        cleaned = cleaned.replace(ref, '')
+
+    # Clean up LaTeX and formatting
+    cleaned = (cleaned
+               .replace('\\coloneqq', ':=')
+               .replace('\\eqqcolon', '=:'))
+    # Collapse any run of three or more newlines into a single blank line
+    cleaned = re.sub(r'\n{3,}', '\n\n', cleaned)
+
+    return cleaned
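Reviewer note: as a rough illustration of how the helpers in pdf_utils.py fit together, the standalone sketch below pulls one reference tag out of a made-up OCR string and scales its 0-999 coordinates to pixels. The sample string, the 2000x1000 page size, and the helper name `scale_box` are assumptions for illustration only and are not part of this change.

```python
import ast
import re

# Same tag layout that extract_ref_patterns() searches for.
REF_PATTERN = r'(<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>)'


def scale_box(box, width, height):
    """Scale one [x1, y1, x2, y2] box from the 0-999 grid to pixels (hypothetical helper)."""
    x1, y1, x2, y2 = box
    return [
        int(x1 / 999 * width),
        int(y1 / 999 * height),
        int(x2 / 999 * width),
        int(y2 / 999 * height),
    ]


# Made-up OCR output containing a single image reference.
sample = "Intro <|ref|>image<|/ref|><|det|>[[100, 200, 300, 400]]<|/det|> outro"

matches = re.findall(REF_PATTERN, sample, re.DOTALL)
full_match, label, coords = matches[0]

# ast.literal_eval only accepts Python literals, so code embedded in `coords` cannot run.
boxes = [scale_box(b, 2000, 1000) for b in ast.literal_eval(coords)]

# Removing the full tag mirrors what clean_markdown_content() does with each reference.
cleaned = sample.replace(full_match, "")

print(label, boxes)
```

Because `int()` truncates, a coordinate of 999 may land one pixel short of the image edge; that is usually harmless for cropping but worth knowing when comparing boxes.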
--- a/backend/providers.py
+++ b/backend/providers.py
@@ -0,0 +1,489 @@
+"""
+OCR provider abstraction.
+
+Each provider knows how to turn an image + a semantic OCR request (mode, prompt,
+options) into raw model text. DeepSeek-specific prompt tokens and grounding-box
+parsing live here too so the FastAPI routes stay model-agnostic.
+
+Two providers ship today:
+  - DeepSeekLocalProvider  -> the local HF transformers DeepSeek-OCR model (GPU)
+  - OllamaProvider         -> any vision model served by an external Ollama host
+
+The registry is built from environment variables at startup (see build_registry()).
+"""
+
+import os
+import re
+import base64
+import tempfile
+import shutil
+from abc import ABC, abstractmethod
+from typing import List, Dict, Any, Optional
+
+from decouple import config as env_config
+
+# httpx is only needed when an Ollama model is actually used; treat it as an
+# optional import so the backend can run DeepSeek-only without it installed.
+try:
+    import httpx
+except ImportError:  # pragma: no cover - exercised only when httpx is missing
+    httpx = None
+
+
+# =============================================================================
+# Prompt builders
+# =============================================================================
+def build_prompt(
+    mode: str,
+    user_prompt: str,
+    grounding: bool,
+    find_term: Optional[str],
+    schema: Optional[str],
+    include_caption: bool,
+) -> str:
+    """Build the DeepSeek-OCR prompt (with its special tokens) based on mode."""
+    parts: List[str] = ["<image>"]
+    mode_requires_grounding = mode in GROUNDING_MODES  # module-level set defined below
+    if grounding or mode_requires_grounding:
+        parts.append("<|grounding|>")
+
+    parts.append(_instruction_for_mode(mode, user_prompt, find_term, schema, include_caption))
+    return "\n".join(parts)
+
+
+def build_ollama_prompt(
+    mode: str,
+    user_prompt: str,
+    find_term: Optional[str],
+    schema: Optional[str],
+    include_caption: bool,
+) -> str:
+    """Build a plain natural-language prompt for a generic vision model.
+
+    No DeepSeek grounding tokens — Ollama vision models receive the image
+    separately and respond in plain text.
+    """
+    if mode == "plain_ocr":
+        instruction = (
+            "Transcribe all of the text in this image exactly as it appears, "
+            "preserving line breaks and reading order. Output only the transcribed "
+            "text with no commentary."
+        )
+    elif mode == "markdown":
+        instruction = (
+            "Convert this document image to clean GitHub-flavored Markdown, "
+            "preserving headings, lists, and tables. Output only the Markdown."
+        )
+    elif mode == "tables_csv":
+        instruction = (
+            "Extract every table in this image and output CSV only. Use commas with "
+            "minimal quoting. If there are multiple tables, separate them with a line "
+            "containing '---'. Output only the CSV."
+        )
+    elif mode == "tables_md":
+        instruction = (
+            "Extract every table in this image as GitHub-flavored Markdown tables. "
+            "Output only the tables."
+        )
+    elif mode == "kv_json":
+        schema_text = schema.strip() if schema else "{}"
+        instruction = (
+            "Extract the key fields from this image and return strict JSON only "
+            f"(no prose). Use this schema, filling in the values: {schema_text}"
+        )
+    elif mode == "figure_chart":
+        instruction = (
+            "Parse the figure in this image. First extract any numeric series as a "
+            "two-column table (x,y). Then add a line containing '---' followed by a "
+            "two-sentence summary of the chart."
+        )
+    elif mode == "find_ref":
+        key = (find_term or "").strip() or "Total"
+        instruction = (
+            f"Find every occurrence of '{key}' in this image and quote the surrounding "
+            "text for each match. If it does not appear, say so."
+        )
+    elif mode == "layout_map":
+        instruction = (
+            'Identify the layout blocks in this image and return a JSON array of '
+            'objects {"type": one of ["title","paragraph","table","figure"]}. '
+            "Do not include the text content."
+        )
+    elif mode == "pii_redact":
+        instruction = (
+            "Find all emails, phone numbers, postal addresses, and IBANs in this image. "
+            'Return a JSON array of objects {"label", "text"}.'
+        )
+    elif mode == "multilingual":
+        instruction = (
+            "Transcribe all of the text in this image exactly, detecting the language "
+            "automatically and preserving the original script. Output only the text."
+        )
+    elif mode == "describe":
+        instruction = "Describe this image, focusing on the key visible elements."
+    elif mode == "freeform":
+        instruction = user_prompt.strip() if user_prompt else "Transcribe the text in this image."
+    else:
+        instruction = "Transcribe the text in this image."
+
+    if include_caption and mode != "describe":
+        instruction += "\nThen add a one-paragraph description of the image."
+
+    return instruction
+
+
+def _instruction_for_mode(
+    mode: str,
+    user_prompt: str,
+    find_term: Optional[str],
+    schema: Optional[str],
+    include_caption: bool,
+) -> str:
+    """The DeepSeek instruction text (without the <image>/<|grounding|> prefix tokens)."""
+    if mode == "plain_ocr":
+        instruction = "Free OCR."
+    elif mode == "markdown":
+        instruction = "Convert the document to markdown."
+    elif mode == "tables_csv":
+        instruction = (
+            "Extract every table and output CSV only. "
+            "Use commas, minimal quoting. If multiple tables, separate with a line containing '---'."
+        )
+    elif mode == "tables_md":
+        instruction = "Extract every table as GitHub-flavored Markdown tables. Output only the tables."
+    elif mode == "kv_json":
+        schema_text = schema.strip() if schema else "{}"
+        instruction = (
+            "Extract key fields and return strict JSON only. "
+            f"Use this schema (fill the values): {schema_text}"
+        )
+    elif mode == "figure_chart":
+        instruction = (
+            "Parse the figure. First extract any numeric series as a two-column table (x,y). "
+            "Then summarize the chart in 2 sentences. Output the table, then a line '---', then the summary."
+        )
+    elif mode == "find_ref":
+        key = (find_term or "").strip() or "Total"
+        instruction = f"Locate <|ref|>{key}<|/ref|> in the image."
+    elif mode == "layout_map":
+        instruction = (
+            'Return a JSON array of blocks with fields {"type":["title","paragraph","table","figure"],'
+            '"box":[x1,y1,x2,y2]}. Do not include any text content.'
+        )
+    elif mode == "pii_redact":
+        instruction = (
+            'Find all occurrences of emails, phone numbers, postal addresses, and IBANs. '
+            'Return a JSON array of objects {label, text, box:[x1,y1,x2,y2]}.'
+        )
+    elif mode == "multilingual":
+        instruction = "Free OCR. Detect the language automatically and output in the same script."
+    elif mode == "describe":
+        instruction = "Describe this image. Focus on visible key elements."
+    elif mode == "freeform":
+        instruction = user_prompt.strip() if user_prompt else "OCR this image."
+    else:
+        instruction = "OCR this image."
+
+    if include_caption and mode != "describe":
+        instruction = instruction + "\nThen add a one-paragraph description of the image."
+
+    return instruction
+
+
+# =============================================================================
+# Grounding parser (DeepSeek-specific; no-op on plain text)
+# =============================================================================
+DET_BLOCK = re.compile(
+    r"<\|ref\|>(?P<label>.*?)<\|/ref\|>\s*<\|det\|>\s*(?P<coords>\[.*?\])\s*<\|/det\|>",
+    re.DOTALL,
+)
+
+
+def clean_grounding_text(text: str) -> str:
+    """Remove grounding tags from text for display, keeping labels."""
+    cleaned = re.sub(
+        r"<\|ref\|>(.*?)<\|/ref\|>\s*<\|det\|>\s*\[.*?\]\s*<\|/det\|>",
+        r"\1",
+        text,
+        flags=re.DOTALL,
+    )
+    cleaned = re.sub(r"<\|grounding\|>", "", cleaned)
+    return cleaned.strip()
+
+
+def parse_detections(text: str, image_width: int, image_height: int) -> List[Dict[str, Any]]:
+    """Parse grounding boxes from text and scale 0-999 normalized coords to pixels."""
+    boxes: List[Dict[str, Any]] = []
+    for m in DET_BLOCK.finditer(text or ""):
+        label = m.group("label").strip()
+        coords_str = m.group("coords").strip()
+
+        try:
+            import ast
+
+            parsed = ast.literal_eval(coords_str)
+
+            if (
+                isinstance(parsed, list)
+                and len(parsed) == 4
+                and all(isinstance(n, (int, float)) for n in parsed)
+            ):
+                box_coords = [parsed]
+            elif isinstance(parsed, list):
+                box_coords = parsed
+            else:
+                raise ValueError("Unsupported coords structure")
+
+            for box in box_coords:
+                if isinstance(box, (list, tuple)) and len(box) >= 4:
+                    x1 = int(float(box[0]) / 999 * image_width)
+                    y1 = int(float(box[1]) / 999 * image_height)
+                    x2 = int(float(box[2]) / 999 * image_width)
+                    y2 = int(float(box[3]) / 999 * image_height)
+                    boxes.append({"label": label, "box": [x1, y1, x2, y2]})
+        except Exception as e:
+            print(f"❌ Grounding parse failed: {e}")
+            continue
+
+    return boxes
+
+
+# =============================================================================
+# Providers
+# =============================================================================
+GROUNDING_MODES = {"find_ref", "layout_map", "pii_redact"}
+
+
+class ProviderError(Exception):
+    """Raised when a provider cannot fulfil a request (e.g. backend unreachable)."""
+
+
+class OCRProvider(ABC):
+    """Turns an image + OCR request into raw model text."""
+
+    id: str
+    label: str
+    capabilities: Dict[str, Any]
+
+    @abstractmethod
+    def run(
+        self,
+        image_path: str,
+        *,
+        mode: str,
+        prompt: str,
+        grounding: bool,
+        find_term: Optional[str],
+        schema: Optional[str],
+        include_caption: bool,
+        options: Dict[str, Any],
+    ) -> str:
+        """Return the raw text output of the model for this image/request."""
+
+    def info(self) -> Dict[str, Any]:
+        return {"id": self.id, "label": self.label, "capabilities": self.capabilities}
+
+
+class DeepSeekLocalProvider(OCRProvider):
+    """Local HF transformers DeepSeek-OCR model. Loaded lazily on first use."""
+
+    def __init__(self):
+        self.id = "deepseek-local"
+        self.label = "DeepSeek-OCR (local GPU)"
+        self.capabilities = {"grounding": True, "advanced_settings": True}
+        self._model = None
+        self._tokenizer = None
+
+    @property
+    def loaded(self) -> bool:
+        return self._model is not None and self._tokenizer is not None
+
+    def _ensure_loaded(self):
+        if self.loaded:
+            return
+
+        # Heavy imports kept local so an Ollama-only deployment never needs torch.
+        import torch
+        from transformers import AutoModel, AutoTokenizer
+
+        os.environ.pop("TRANSFORMERS_CACHE", None)
+        model_name = env_config("MODEL_NAME", default="deepseek-ai/DeepSeek-OCR")
+        hf_home = env_config("HF_HOME", default="/models")
+        os.makedirs(hf_home, exist_ok=True)
+
+        print(f"🚀 Loading {model_name}...")
+        tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+        model = AutoModel.from_pretrained(
+            model_name,
+            trust_remote_code=True,
+            use_safetensors=True,
+            attn_implementation="eager",
+            torch_dtype=torch.bfloat16,
+        ).eval().to("cuda")
+
+        try:
+            if getattr(tokenizer, "pad_token_id", None) is None and getattr(tokenizer, "eos_token_id", None) is not None:
+                tokenizer.pad_token = tokenizer.eos_token
+            if getattr(model.config, "pad_token_id", None) is None and getattr(tokenizer, "pad_token_id", None) is not None:
+                model.config.pad_token_id = tokenizer.pad_token_id
+        except Exception:
+            pass
+
+        self._model = model
+        self._tokenizer = tokenizer
+        print("✅ DeepSeek-OCR loaded and ready!")
+
+    def run(self, image_path, *, mode, prompt, grounding, find_term, schema, include_caption, options):
+        self._ensure_loaded()
+
+        prompt_text = build_prompt(
+            mode=mode,
+            user_prompt=prompt,
+            grounding=grounding,
+            find_term=find_term,
+            schema=schema,
+            include_caption=include_caption,
+        )
+
+        out_dir = tempfile.mkdtemp(prefix="dsocr_")
+        try:
+            res = self._model.infer(
+                self._tokenizer,
+                prompt=prompt_text,
+                image_file=image_path,
+                output_path=out_dir,
+                base_size=int(options.get("base_size", 1024)),
+                image_size=int(options.get("image_size", 640)),
+                crop_mode=bool(options.get("crop_mode", True)),
+                save_results=False,
+                test_compress=bool(options.get("test_compress", False)),
+                eval_mode=True,
+            )
+
+            if isinstance(res, str):
+                text = res.strip()
+            elif isinstance(res, dict) and "text" in res:
+                text = str(res["text"]).strip()
+            elif isinstance(res, (list, tuple)):
+                text = "\n".join(map(str, res)).strip()
+            else:
+                text = ""
+
+            if not text:
+                mmd = os.path.join(out_dir, "result.mmd")
+                if os.path.exists(mmd):
+                    with open(mmd, "r", encoding="utf-8") as fh:
+                        text = fh.read().strip()
+            return text
+        finally:
+            shutil.rmtree(out_dir, ignore_errors=True)
+
+
+class OllamaProvider(OCRProvider):
+    """A single vision model served by an external Ollama host."""
+
+    def __init__(self, tag: str, base_url: str, label: Optional[str] = None):
+        self.tag = tag
+        self.base_url = base_url.rstrip("/")
+        self.id = f"ollama:{tag}"
+        self.label = label or f"{tag} (Ollama)"
+        # Generic vision models don't emit DeepSeek grounding tokens.
+        self.capabilities = {"grounding": False, "advanced_settings": False}
+
+    def run(self, image_path, *, mode, prompt, grounding, find_term, schema, include_caption, options):
+        if httpx is None:
+            raise ProviderError("httpx is not installed; cannot reach Ollama.")
+
+        prompt_text = build_ollama_prompt(
+            mode=mode,
+            user_prompt=prompt,
+            find_term=find_term,
+            schema=schema,
+            include_caption=include_caption,
+        )
+
+        with open(image_path, "rb") as f:
+            img_b64 = base64.b64encode(f.read()).decode("utf-8")
+
+        payload = {
+            "model": self.tag,
+            "prompt": prompt_text,
+            "images": [img_b64],
+            "stream": False,
+        }
+
+        timeout = env_config("OLLAMA_TIMEOUT", default=300.0, cast=float)
+        try:
+            resp = httpx.post(f"{self.base_url}/api/generate", json=payload, timeout=timeout)
+            resp.raise_for_status()
+            data = resp.json()
+        except httpx.HTTPStatusError as e:
+            detail = ""
+            try:
+                detail = e.response.json().get("error", "")
+            except Exception:
+                detail = e.response.text[:200]
+            raise ProviderError(f"Ollama returned {e.response.status_code}: {detail}") from e
+        except httpx.HTTPError as e:
+            raise ProviderError(f"Could not reach Ollama at {self.base_url}: {e}") from e
+
+        return (data.get("response") or "").strip()
+
+
+# =============================================================================
+# Registry
+# =============================================================================
+class ModelRegistry:
+    def __init__(self, providers: List[OCRProvider], default_id: str):
+        self._providers: Dict[str, OCRProvider] = {p.id: p for p in providers}
+        # Fall back to the first registered provider if the configured default is gone.
+        self.default_id = default_id if default_id in self._providers else (
+            next(iter(self._providers), None)
+        )
+
+    def get(self, model_id: Optional[str]) -> OCRProvider:
+        chosen = model_id or self.default_id
+        provider = self._providers.get(chosen)
+        if provider is None:
+            raise ProviderError(f"Unknown model '{chosen}'.")
+        return provider
+
+    def list_models(self) -> List[Dict[str, Any]]:
+        out = []
+        for p in self._providers.values():
+            entry = p.info()
+            entry["default"] = (p.id == self.default_id)
+            out.append(entry)
+        return out
+
+
+def build_registry() -> ModelRegistry:
+    """Build the provider registry from environment variables.
+
+    Env:
+      ENABLE_DEEPSEEK_LOCAL  - register the local DeepSeek-OCR model (default: true)
+      OLLAMA_BASE_URL        - Ollama host (default: http://host.docker.internal:11434)
+      OLLAMA_MODELS          - comma-separated tags to surface (e.g. "glm-ocr,llama3.2-vision")
+      DEFAULT_OCR_MODEL      - id to select by default (default: deepseek-local)
+    """
+    providers: List[OCRProvider] = []
+
+    enable_deepseek = env_config("ENABLE_DEEPSEEK_LOCAL", default="true").strip().lower() in {"1", "true", "yes"}
+    if enable_deepseek:
+        providers.append(DeepSeekLocalProvider())
+
+    base_url = env_config("OLLAMA_BASE_URL", default="http://host.docker.internal:11434")
+    raw_tags = env_config("OLLAMA_MODELS", default="")
+    tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
+    for tag in tags:
+        providers.append(OllamaProvider(tag=tag, base_url=base_url))
+
+    default_id = env_config("DEFAULT_OCR_MODEL", default="deepseek-local")
+    if not providers:
+        # Defensive: nothing configured. Register DeepSeek so the app still starts.
+        providers.append(DeepSeekLocalProvider())
+        default_id = "deepseek-local"
+
+    registry = ModelRegistry(providers, default_id)
+    print(f"🧠 OCR models registered: {[p.id for p in providers]} (default: {registry.default_id})")
+    return registry
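Reviewer note: the registry logic above comes down to two small steps, turning `OLLAMA_MODELS` into provider ids and choosing a default. The sketch below mirrors those steps in isolation so they can be checked without torch, httpx, or python-decouple. The function names `parse_ollama_tags` and `resolve_default` are hypothetical and do not exist in providers.py.

```python
from typing import List, Optional


def parse_ollama_tags(raw_tags: str) -> List[str]:
    """Split a comma-separated OLLAMA_MODELS value into 'ollama:<tag>' ids."""
    tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    return [f"ollama:{tag}" for tag in tags]


def resolve_default(provider_ids: List[str], configured: str) -> Optional[str]:
    """Keep the configured default if it is registered, else use the first id."""
    if configured in provider_ids:
        return configured
    return next(iter(provider_ids), None)


# Blank entries and stray spaces in the env value are ignored.
ids = ["deepseek-local"] + parse_ollama_tags("glm-ocr, llama3.2-vision,,")

print(ids)
print(resolve_default(ids, "ollama:glm-ocr"))   # configured default is registered
print(resolve_default(ids, "ollama:missing"))   # falls back to the first provider
```

One consequence of the fallback is that a typo in `DEFAULT_OCR_MODEL` does not fail at startup; the first registered provider silently becomes the default, so the startup log line is the place to confirm which model was selected.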
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -11,3 +11,9 @@ pillow
 safetensors
 torch
 python-decouple>=3.8
+PyMuPDF>=1.23.0
+img2pdf>=0.5.0
+python-docx>=1.1.0
+markdown>=3.5.0
+psycopg2-binary>=2.9.0
+httpx>=0.27.0
--- a/backend/test_security.py
+++ b/backend/test_security.py
@@ -0,0 +1,150 @@
+"""
+Security regression tests for the eval() RCE vulnerability (OX Security disclosure).
+
+The vulnerability allowed arbitrary code execution via crafted OCR output
+that was passed to eval() in parse_coordinates(). The fix uses ast.literal_eval()
+which only allows literal data structures.
+
+This test is self-contained and does not require backend dependencies.
+
+Run: python test_security.py
+"""
+
+import ast
+
+
+def parse_coordinates(ref_text, image_width, image_height):
+    """
+    Minimal reproduction of pdf_utils.parse_coordinates using the patched code.
+    This mirrors the fixed version that uses ast.literal_eval() instead of eval().
+    """
+    try:
+        label_type = ref_text[1]
+        cor_list = ast.literal_eval(ref_text[2])
+
+        scaled_boxes = []
+        for points in cor_list:
+            x1, y1, x2, y2 = points
+            scaled_box = [
+                int(x1 / 999 * image_width),
+                int(y1 / 999 * image_height),
+                int(x2 / 999 * image_width),
+                int(y2 / 999 * image_height)
+            ]
+            scaled_boxes.append(scaled_box)
+
+        return {
+            'label': label_type,
+            'boxes': scaled_boxes
+        }
+    except Exception as e:
+        print(f"  [Blocked] {type(e).__name__}: {e}")
+        return None
+
+
+def test_legitimate_coordinates():
+    """Verify that normal coordinate parsing still works."""
+    ref_text = ("full_match", "text", "[[312, 339, 480, 681]]")
+    result = parse_coordinates(ref_text, 1000, 1000)
+
+    assert result is not None, "Legitimate coordinates should parse successfully"
+    assert result['label'] == 'text'
+    assert len(result['boxes']) == 1
+    print("PASS: Legitimate coordinates parse correctly")
+
+
+def test_multiple_boxes():
+    """Verify multiple bounding boxes still work."""
+    ref_text = ("full_match", "image", "[[100, 200, 300, 400], [500, 600, 700, 800]]")
+    result = parse_coordinates(ref_text, 1000, 1000)
+
+    assert result is not None, "Multiple boxes should parse successfully"
+    assert len(result['boxes']) == 2
+    print("PASS: Multiple bounding boxes parse correctly")
+
+
+def test_rce_blocked_import_os():
+    """The original exploit: __import__('os').system('...') must be blocked."""
+    malicious = "__import__('os').system('echo HACKED')"
+    ref_text = ("full_match", "exploit", malicious)
+    result = parse_coordinates(ref_text, 1000, 1000)
+
+    assert result is None, "Code execution payload should be rejected"
+    print("PASS: __import__('os').system() payload is blocked")
+
+
+def test_rce_blocked_exec():
+    """exec() based payloads must be blocked."""
+    malicious = "exec('import os; os.system(\"echo HACKED\")')"
+    ref_text = ("full_match", "exploit", malicious)
+    result = parse_coordinates(ref_text, 1000, 1000)
+
+    assert result is None, "exec() payload should be rejected"
+    print("PASS: exec() payload is blocked")
+
+
+def test_rce_blocked_eval():
+    """Nested eval() payloads must be blocked."""
+    malicious = "eval('__import__(\"os\").popen(\"id\").read()')"
+    ref_text = ("full_match", "exploit", malicious)
+    result = parse_coordinates(ref_text, 1000, 1000)
+
+    assert result is None, "Nested eval() payload should be rejected"
+    print("PASS: Nested eval() payload is blocked")
+
+
+def test_rce_blocked_lambda():
+    """Lambda-based payloads must be blocked."""
+    malicious = "(lambda: __import__('os').system('echo HACKED'))()"
+    ref_text = ("full_match", "exploit", malicious)
+    result = parse_coordinates(ref_text, 1000, 1000)
+
+    assert result is None, "Lambda payload should be rejected"
+    print("PASS: Lambda payload is blocked")
+
+
+def test_rce_blocked_comprehension():
+    """List comprehension code execution must be blocked."""
+    malicious = "[__import__('os').system('echo HACKED') for x in [1]]"
+    ref_text = ("full_match", "exploit", malicious)
+    result = parse_coordinates(ref_text, 1000, 1000)
+
+    assert result is None, "List comprehension payload should be rejected"
+    print("PASS: List comprehension payload is blocked")
+
+
+if __name__ == "__main__":
+    print("=" * 60)
+    print("Security Regression Tests (OX Security RCE disclosure)")
+    print("=" * 60)
+    print()
+
+    tests = [
+        test_legitimate_coordinates,
+        test_multiple_boxes,
+        test_rce_blocked_import_os,
+        test_rce_blocked_exec,
+        test_rce_blocked_eval,
+        test_rce_blocked_lambda,
+        test_rce_blocked_comprehension,
+    ]
+
+    passed = 0
+    failed = 0
+    for test in tests:
+        try:
+            test()
+            passed += 1
+        except AssertionError as e:
+            print(f"FAIL: {test.__name__}: {e}")
+            failed += 1
+        except Exception as e:
+            print(f"ERROR: {test.__name__}: {e}")
+            failed += 1
+
+    print()
+    print(f"Results: {passed} passed, {failed} failed out of {len(tests)} tests")
+    if failed == 0:
+        print("All security tests passed - the tested RCE payloads are blocked.")
+    else:
+        raise SystemExit("WARNING: Some tests failed!")  # exit status 1 so CI notices
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,4 +1,19 @@
 services:
+  postgres:
+    image: postgres:16-alpine
+    container_name: deepseek-ocr-postgres
+    environment:
+      POSTGRES_USER: ${POSTGRES_USER:-ocr_user}
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-ocr_password}
+      POSTGRES_DB: ${POSTGRES_DB:-ocr_db}
+    volumes:
+      - postgres_data:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-ocr_user} -d ${POSTGRES_DB:-ocr_db}"]
+      interval: 5s
+      timeout: 5s
+      retries: 10
+
  backend:
    build: ./backend
    container_name: deepseek-ocr-backend
@@ -10,8 +25,23 @@ services:
      API_HOST: ${API_HOST:-0.0.0.0}
      API_PORT: ${API_PORT:-8000}
      MAX_UPLOAD_SIZE_MB: ${MAX_UPLOAD_SIZE_MB:-100}
+      DATABASE_URL: ${DATABASE_URL:-postgresql://ocr_user:ocr_password@postgres:5432/ocr_db}
+      OCR_IMAGES_DIR: ${OCR_IMAGES_DIR:-/data/ocr_images}
+      ENABLE_DEEPSEEK_LOCAL: ${ENABLE_DEEPSEEK_LOCAL:-true}
+      OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-http://host.docker.internal:11434}
+      OLLAMA_MODELS: ${OLLAMA_MODELS:-}
+      DEFAULT_OCR_MODEL: ${DEFAULT_OCR_MODEL:-deepseek-local}
+      OLLAMA_TIMEOUT: ${OLLAMA_TIMEOUT:-300}
+    # Lets the container reach an Ollama server running on the Docker host
+    # (works out of the box on Docker Desktop; required for Linux engines).
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
    volumes:
      - ./models:/models
+      - ./ocr_images:/data/ocr_images
+    depends_on:
+      postgres:
+        condition: service_healthy
    deploy:
      resources:
        reservations:
@@ -22,8 +52,6 @@ services:
    shm_size: "4g"
    ports:
      - "${API_PORT:-8000}:${API_PORT:-8000}"
-    networks:
-      - ocr-network

  frontend:
    build: ./frontend
@@ -32,9 +60,10 @@ services:
      - "${FRONTEND_PORT:-3000}:80"
    depends_on:
      - backend
-    networks:
-      - ocr-network
+
+volumes:
+  postgres_data:

 networks:
-  ocr-network:
-    driver: bridge
+  default:
+    name: rw-research
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -10,6 +10,7 @@
  },
  "dependencies": {
    "axios": "^1.6.5",
+    "dompurify": "^3.3.3",
    "framer-motion": "^11.0.0",
    "lucide-react": "^0.344.0",
    "react": "^18.3.1",
--- a/frontend/src/App.jsx
+++ b/frontend/src/App.jsx
@@ -1,66 +1,118 @@
-import { useState, useCallback } from 'react'
+import { useState, useCallback, useEffect } from 'react'
+import { useSuggestions } from './hooks/useSuggestions'
+import { useModels } from './hooks/useModels'
 import { motion, AnimatePresence } from 'framer-motion'
-import { Sparkles, Zap, Loader2 } from 'lucide-react'
+import {
+  Sparkles, Zap, Loader2, Settings, Image as ImageIcon, FileText,
+  Layers, ChevronLeft, CheckCircle2, Database,
+} from 'lucide-react'
 import ImageUpload from './components/ImageUpload'
 import ModeSelector from './components/ModeSelector'
+import ModelSelector from './components/ModelSelector'
 import ResultPanel from './components/ResultPanel'
+import AdvancedSettings from './components/AdvancedSettings'
+import PDFProcessor from './components/PDFProcessor'
+import MetadataForm from './components/MetadataForm'
+import JobsPanel from './components/JobsPanel'
 import axios from 'axios'

 const API_BASE = import.meta.env.VITE_API_URL || '/api'

+const INPUT_CLASS =
+  'w-full bg-white/5 border border-white/10 rounded-lg px-3 py-2 text-sm text-gray-200 ' +
+  'placeholder-gray-600 focus:outline-none focus:border-purple-500/50 transition-colors'
+
 function App() {
+  const [view, setView] = useState('new_job')
+
+  // OCR state
+  const { models, loading: modelsLoading } = useModels()
+  const [model, setModel] = useState(null)
  const [mode, setMode] = useState('plain_ocr')
+  const [fileType, setFileType] = useState('image')
  const [image, setImage] = useState(null)
  const [imagePreview, setImagePreview] = useState(null)
  const [result, setResult] = useState(null)
  const [loading, setLoading] = useState(false)
  const [error, setError] = useState(null)
-  
-  // Form state
+  const [showAdvanced, setShowAdvanced] = useState(false)
+  const [includeCaption, setIncludeCaption] = useState(false)
+
  const [prompt, setPrompt] = useState('')
  const [findTerm, setFindTerm] = useState('')
  const [advancedSettings, setAdvancedSettings] = useState({
-    base_size: 1024,
-    image_size: 640,
-    crop_mode: true,
-    test_compress: false
+    base_size: 1024, image_size: 640, crop_mode: true, test_compress: false,
  })

+  const suggestions = useSuggestions()
+
+  const [metadata, setMetadata] = useState({ author: '', book: '', chapter: '', page: '' })
+  // Results accumulated per committable mode, e.g. { plain_ocr: 'text', describe: 'text' }
+  const [modeResults, setModeResults] = useState({})
+  const [editedResults, setEditedResults] = useState({})
+  const [activeResultMode, setActiveResultMode] = useState(null)
+  const [commitLoading, setCommitLoading] = useState(false)
+  const [commitResult, setCommitResult] = useState(null)
+
+  // Modes that produce editable text output and can be committed to the DB
+  const COMMITTABLE_MODES = new Set(['plain_ocr', 'describe'])
+  const MODE_LABELS = { plain_ocr: 'OCR Text', describe: 'Description' }
+
+  // Pick the default model once the list loads
+  useEffect(() => {
+    if (!model && models.length > 0) {
+      setModel((models.find(m => m.default) || models[0]).id)
+    }
+  }, [models, model])
+
+  // Show the full-screen result view once at least one committable mode has a result
+  const showResultView = view === 'new_job' && Object.keys(modeResults).length > 0
+
+  const handleFileTypeChange = useCallback((newType) => {
+    setImage(null)
+    if (typeof imagePreview === 'string') URL.revokeObjectURL(imagePreview)
+    setImagePreview(null)
+    setError(null)
+    setResult(null)
+    setFileType(newType)
+  }, [imagePreview])
+
  const handleImageSelect = useCallback((file) => {
    if (file === null) {
-      // Clear everything when removing image
      setImage(null)
-      if (imagePreview) {
-        URL.revokeObjectURL(imagePreview)
-      }
+      if (imagePreview && fileType === 'image') URL.revokeObjectURL(imagePreview)
      setImagePreview(null)
      setError(null)
      setResult(null)
+      setModeResults({})
+      setEditedResults({})
+      setActiveResultMode(null)
+      setCommitResult(null)
    } else {
      setImage(file)
-      setImagePreview(URL.createObjectURL(file))
+      setImagePreview(fileType === 'image' ? URL.createObjectURL(file) : file)
      setError(null)
      setResult(null)
+      setModeResults({})
+      setEditedResults({})
+      setActiveResultMode(null)
+      setCommitResult(null)
    }
-  }, [imagePreview])
+  }, [imagePreview, fileType])

  const handleSubmit = async () => {
-    if (!image) {
-      setError('Please upload an image first')
-      return
-    }
-
+    if (!image) { setError('Please upload an image first'); return }
    setLoading(true)
    setError(null)
-
+    setCommitResult(null)
    try {
      const formData = new FormData()
      formData.append('image', image)
+      if (model) formData.append('model', model)
      formData.append('mode', mode)
      formData.append('prompt', prompt)
-      // Enable grounding only for find mode
      formData.append('grounding', mode === 'find_ref')
-      formData.append('include_caption', false)
+      formData.append('include_caption', includeCaption)
      formData.append('find_term', findTerm)
      formData.append('schema', '')
      formData.append('base_size', advancedSettings.base_size)
@@ -69,12 +121,16 @@ function App() {
      formData.append('test_compress', advancedSettings.test_compress)

      const response = await axios.post(`${API_BASE}/ocr`, formData, {
-        headers: {
-          'Content-Type': 'multipart/form-data',
-        },
+        headers: { 'Content-Type': 'multipart/form-data' },
      })
-
      setResult(response.data)
+      if (COMMITTABLE_MODES.has(mode)) {
+        const text = response.data.text || ''
+        setModeResults(prev => ({ ...prev, [mode]: text }))
+        setEditedResults(prev => ({ ...prev, [mode]: text }))
+        setActiveResultMode(mode)
+      }
+      setCommitResult(null)
    } catch (err) {
      setError(err.response?.data?.detail || err.message || 'An error occurred')
    } finally {
@@ -82,31 +138,61 @@ function App() {
    }
  }

-  const handleCopy = useCallback(() => {
-    if (result?.text) {
-      navigator.clipboard.writeText(result.text)
+  const handleNewAnalysis = () => {
+    setResult(null)
+    setModeResults({})
+    setEditedResults({})
+    setActiveResultMode(null)
+    setCommitResult(null)
+  }
+
+  const handleCommitJob = useCallback(async () => {
+    if (!image) return
+    setCommitLoading(true)
+    setCommitResult(null)
+    try {
+      const formData = new FormData()
+      formData.append('image', image)
+      formData.append('author', metadata.author)
+      formData.append('book', metadata.book)
+      formData.append('chapter', metadata.chapter)
+      formData.append('page', metadata.page)
+      formData.append('ocr_text', editedResults.plain_ocr || '')
+      formData.append('describe_text', editedResults.describe || '')
+      formData.append('freeform_text', editedResults.freeform || '')
+      formData.append('mode', mode)
+      if (model) formData.append('ocr_model', model)
+
+      const response = await axios.post(`${API_BASE}/jobs`, formData, {
+        headers: { 'Content-Type': 'multipart/form-data' },
+      })
+      setCommitResult({ success: true, job: response.data })
+    } catch (err) {
+      setCommitResult({ success: false, error: err.response?.data?.detail || err.message })
+    } finally {
+      setCommitLoading(false)
    }
-  }, [result])
+  }, [image, editedResults, metadata, mode, model])
+
+  const handleCopy = useCallback(() => {
+    const text = (activeResultMode && editedResults[activeResultMode]) || result?.text
+    if (text) navigator.clipboard.writeText(text)
+  }, [activeResultMode, editedResults, result])

  const handleDownload = useCallback(() => {
-    if (!result?.text) return
-    
-    const extensions = {
-      plain_ocr: 'txt',
-      describe: 'txt',
-      find_ref: 'txt',
-      freeform: 'txt',
-    }
-    
-    const ext = extensions[mode] || 'txt'
-    const blob = new Blob([result.text], { type: 'text/plain' })
+    const text = (activeResultMode && editedResults[activeResultMode]) || result?.text
+    if (!text) return
+    const ext = { plain_ocr: 'txt', describe: 'txt', find_ref: 'txt', freeform: 'txt' }[mode] || 'txt'
+    const blob = new Blob([text], { type: 'text/plain' })
    const url = URL.createObjectURL(blob)
    const a = document.createElement('a')
    a.href = url
    a.download = `deepseek-ocr-result.${ext}`
    a.click()
    URL.revokeObjectURL(url)
-  }, [result, mode])
+  }, [activeResultMode, editedResults, result, mode])
+
+  const metaField = (key) => (e) => setMetadata(m => ({ ...m, [key]: e.target.value }))

  return (
    <div className="min-h-screen relative overflow-hidden">
@@ -116,27 +202,13 @@ function App() {
        <div className="absolute inset-0 bg-[url('data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iNjAiIGhlaWdodD0iNjAiIHZpZXdCb3g9IjAgMCA2MCA2MCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48ZyBmaWxsPSJub25lIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjxwYXRoIGQ9Ik0zNiAxOGMzLjMxIDAgNiAyLjY5IDYgNnMtMi42OSA2LTYgNi02LTIuNjktNi02IDIuNjktNiA2LTZ6TTI0IDZjMy4zMSAwIDYgMi42OSA2IDZzLTIuNjkgNi02IDYtNi0yLjY5LTYtNiAyLjY5LTYgNi02ek00OCAzNmMzLjMxIDAgNiAyLjY5IDYgNnMtMi42OSA2LTYgNi02LTIuNjktNi02IDIuNjktNiA2LTZ6IiBzdHJva2U9InJnYmEoMTQ3LCA1MSwgMjM0LCAwLjEpIiBzdHJva2Utd2lkdGg9IjIiLz48L2c+PC9zdmc+')] opacity-30" />
        <motion.div
          className="absolute top-20 left-20 w-96 h-96 bg-purple-500/10 rounded-full blur-3xl"
-          animate={{
-            scale: [1, 1.2, 1],
-            opacity: [0.3, 0.5, 0.3],
-          }}
-          transition={{
-            duration: 8,
-            repeat: Infinity,
-            ease: "easeInOut"
-          }}
+          animate={{ scale: [1, 1.2, 1], opacity: [0.3, 0.5, 0.3] }}
+          transition={{ duration: 8, repeat: Infinity, ease: 'easeInOut' }}
        />
        <motion.div
          className="absolute bottom-20 right-20 w-96 h-96 bg-cyan-500/10 rounded-full blur-3xl"
-          animate={{
-            scale: [1.2, 1, 1.2],
-            opacity: [0.5, 0.3, 0.5],
-          }}
-          transition={{
-            duration: 8,
-            repeat: Infinity,
-            ease: "easeInOut"
-          }}
+          animate={{ scale: [1.2, 1, 1.2], opacity: [0.5, 0.3, 0.5] }}
+          transition={{ duration: 8, repeat: Infinity, ease: 'easeInOut' }}
        />
      </div>

@@ -144,11 +216,7 @@ function App() {
      <header className="sticky top-0 z-50 glass border-b border-white/10">
        <div className="max-w-7xl mx-auto px-6 py-4">
          <div className="flex items-center justify-between">
-            <motion.div 
-              className="flex items-center gap-3"
-              initial={{ opacity: 0, x: -20 }}
-              animate={{ opacity: 1, x: 0 }}
-            >
+            <motion.div className="flex items-center gap-3" initial={{ opacity: 0, x: -20 }} animate={{ opacity: 1, x: 0 }}>
              <div className="relative">
                <div className="absolute inset-0 bg-gradient-to-r from-purple-500 to-cyan-500 rounded-xl blur-lg opacity-75" />
                <div className="relative bg-gradient-to-br from-purple-600 to-cyan-500 p-2 rounded-xl">
@@ -160,97 +228,353 @@ function App() {
                <p className="text-xs text-gray-400">Next-Gen Vision AI</p>
              </div>
            </motion.div>
+
+            <nav className="flex gap-2">
+              {showResultView && (
+                <motion.button
+                  onClick={handleNewAnalysis}
+                  className="flex items-center gap-2 px-4 py-2 rounded-xl text-sm font-medium glass text-gray-400 hover:bg-white/5 transition-all"
+                  whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+                >
+                  <ChevronLeft className="w-4 h-4" />
+                  New Analysis
+                </motion.button>
+              )}
+              <motion.button
+                onClick={() => setView('new_job')}
+                className={`flex items-center gap-2 px-4 py-2 rounded-xl text-sm font-medium transition-all ${view === 'new_job' ? 'bg-gradient-to-r from-purple-600 to-cyan-600 text-white' : 'glass text-gray-400 hover:bg-white/5'}`}
+                whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+              >
+                <Zap className="w-4 h-4" />
+                New Job
+              </motion.button>
+              <motion.button
+                onClick={() => setView('jobs')}
+                className={`flex items-center gap-2 px-4 py-2 rounded-xl text-sm font-medium transition-all ${view === 'jobs' ? 'bg-gradient-to-r from-purple-600 to-cyan-600 text-white' : 'glass text-gray-400 hover:bg-white/5'}`}
+                whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+              >
+                <Layers className="w-4 h-4" />
+                Browse Jobs
+              </motion.button>
+            </nav>
          </div>
        </div>
      </header>

      {/* Main Content */}
-      <main className="max-w-7xl mx-auto px-6 py-8">
-        <div className="grid lg:grid-cols-2 gap-6">
-          {/* Left Panel - Upload & Controls */}
-          <motion.div
-            initial={{ opacity: 0, y: 20 }}
-            animate={{ opacity: 1, y: 0 }}
-            transition={{ delay: 0.1 }}
-            className="space-y-6"
-          >
-            {/* Mode Selector with integrated inputs */}
-            <ModeSelector 
-              mode={mode} 
-              onModeChange={setMode}
-              prompt={prompt}
-              onPromptChange={setPrompt}
-              findTerm={findTerm}
-              onFindTermChange={setFindTerm}
-            />
+      <main className="max-w-7xl mx-auto px-6 py-6">
+        <AnimatePresence>

-            {/* Image Upload */}
-            <ImageUpload 
-              onImageSelect={handleImageSelect}
-              preview={imagePreview}
-            />
-
-            {/* Action Button */}
-            <motion.button
-              onClick={handleSubmit}
-              disabled={!image || loading}
-              className={`w-full relative overflow-hidden rounded-2xl p-[2px] ${
-                !image || loading ? 'opacity-50 cursor-not-allowed' : ''
-              }`}
-              whileHover={!loading && image ? { scale: 1.02 } : {}}
-              whileTap={!loading && image ? { scale: 0.98 } : {}}
+          {/* ── Full-screen OCR result view ── */}
+          {showResultView ? (
+            <motion.div
+              key="ocr_result"
+              initial={{ opacity: 0, y: 20 }}
+              animate={{ opacity: 1, y: 0 }}
+              exit={{ opacity: 0, y: -20 }}
+              className="flex flex-col gap-4"
            >
-              <div className="absolute inset-0 bg-gradient-to-r from-purple-600 via-pink-600 to-cyan-600 animate-gradient" />
-              <div className="relative bg-dark-100 px-8 py-4 rounded-2xl flex items-center justify-center gap-3">
-                {loading ? (
-                  <>
-                    <Loader2 className="w-5 h-5 animate-spin" />
-                    <span className="font-semibold">Processing Magic...</span>
-                  </>
-                ) : (
-                  <>
-                    <Zap className="w-5 h-5" />
-                    <span className="font-semibold">Analyze Image</span>
-                  </>
-                )}
+              {/* Run additional modes */}
+              <div className="glass p-4 rounded-2xl flex-shrink-0">
+                <div className="mb-3">
+                  <ModelSelector
+                    models={models} value={model} onChange={setModel} loading={modelsLoading}
+                  />
+                </div>
+                <ModeSelector mode={mode} onModeChange={setMode} />
+                <div className="flex items-center gap-3 mt-3">
+                  <motion.button
+                    onClick={handleSubmit}
+                    disabled={loading}
+                    className={`flex items-center gap-2 px-5 py-2 rounded-xl font-medium text-sm transition-all ${loading ? 'opacity-50 cursor-not-allowed bg-white/5' : 'bg-gradient-to-r from-purple-600 to-cyan-600'}`}
+                    whileHover={!loading ? { scale: 1.02 } : {}}
+                    whileTap={!loading ? { scale: 0.98 } : {}}
+                  >
+                    {loading
+                      ? <><Loader2 className="w-4 h-4 animate-spin" /> Processing...</>
+                      : <><Zap className="w-4 h-4" /> Analyze</>}
+                  </motion.button>
+                  {error && <p className="text-sm text-red-400">{error}</p>}
+                </div>
              </div>
-            </motion.button>

-            {error && (
-              <motion.div
-                initial={{ opacity: 0, y: -10 }}
-                animate={{ opacity: 1, y: 0 }}
-                className="glass p-4 rounded-2xl border-red-500/50 bg-red-500/10"
-              >
-                <p className="text-sm text-red-400">{error}</p>
-              </motion.div>
-            )}
-          </motion.div>
+              {/* Image + Text */}
+              <div className="grid gap-6" style={{ gridTemplateColumns: '1fr 1fr', height: '130vh' }}>
+                {imagePreview && typeof imagePreview === 'string' ? (
+                  <div className="glass rounded-2xl overflow-hidden flex items-center justify-center bg-black/20 h-full">
+                    <img
+                      src={imagePreview}
+                      alt="Source"
+                      className="w-full h-full object-contain"
+                    />
+                  </div>
+                ) : (
+                  <div className="glass rounded-2xl flex items-center justify-center h-full">
+                    <p className="text-gray-500 text-sm">No preview</p>
+                  </div>
+                )}
+                <div className="glass rounded-2xl p-4 flex flex-col h-full">
+                  {/* Mode tabs — only shown when multiple modes have results */}
+                  {Object.keys(modeResults).length > 1 && (
+                    <div className="flex gap-1 mb-3 flex-shrink-0">
+                      {Object.keys(modeResults).map(m => (
+                        <button
+                          key={m}
+                          onClick={() => setActiveResultMode(m)}
+                          className={`px-3 py-1 rounded-lg text-xs font-medium transition-colors ${
+                            activeResultMode === m
+                              ? 'bg-purple-600 text-white'
+                              : 'bg-white/5 text-gray-400 hover:bg-white/10'
+                          }`}
+                        >
+                          {MODE_LABELS[m] || m}
+                        </button>
+                      ))}
+                    </div>
+                  )}
+                  <p className="text-xs text-gray-400 mb-2 flex-shrink-0">
+                    {MODE_LABELS[activeResultMode] || 'Result'}
+                    <span className="text-purple-400 ml-1">(edit before committing)</span>
+                  </p>
+                  {loading && COMMITTABLE_MODES.has(mode) ? (
+                    <div className="flex-1 flex items-center justify-center">
+                      <Loader2 className="w-8 h-8 animate-spin text-purple-400" />
+                    </div>
+                  ) : (
+                    <textarea
+                      value={activeResultMode ? (editedResults[activeResultMode] ?? '') : ''}
+                      onChange={e => setEditedResults(prev => ({ ...prev, [activeResultMode]: e.target.value }))}
+                      className="flex-1 w-full bg-transparent text-sm text-gray-200 font-mono resize-none focus:outline-none min-h-0"
+                      placeholder="Run a mode to see results here..."
+                    />
+                  )}
+                </div>
+              </div>

-          {/* Right Panel - Results */}
-          <motion.div
-            initial={{ opacity: 0, y: 20 }}
-            animate={{ opacity: 1, y: 0 }}
-            transition={{ delay: 0.2 }}
-          >
-            <ResultPanel 
-              result={result}
-              loading={loading}
-              imagePreview={imagePreview}
-              onCopy={handleCopy}
-              onDownload={handleDownload}
-            />
-          </motion.div>
-        </div>
+              {/* Metadata row */}
+              <div className="glass p-4 rounded-2xl flex-shrink-0">
+                <datalist id="rv-authors">
+                  {suggestions.authors.map(a => <option key={a} value={a} />)}
+                </datalist>
+                <datalist id="rv-books">
+                  {(suggestions.books || []).map(b => <option key={b} value={b} />)}
+                </datalist>
+                <datalist id="rv-chapters">
+                  {suggestions.chapters.map(c => <option key={c} value={c} />)}
+                </datalist>
+                <div className="grid grid-cols-4 gap-4">
+                  {[
+                    { key: 'author',  label: 'Author',  placeholder: 'Author name',  list: 'rv-authors' },
+                    { key: 'book',    label: 'Book',    placeholder: 'Book title',    list: 'rv-books' },
+                    { key: 'chapter', label: 'Chapter', placeholder: 'Chapter',       list: 'rv-chapters' },
+                    { key: 'page',    label: 'Page',    placeholder: 'Page number',   list: undefined },
+                  ].map(({ key, label, placeholder, list }) => (
+                    <div key={key}>
+                      <label className="text-xs text-gray-400 mb-1 block">{label}</label>
+                      <input
+                        type="text"
+                        list={list}
+                        value={metadata[key]}
+                        onChange={metaField(key)}
+                        placeholder={placeholder}
+                        className={INPUT_CLASS}
+                      />
+                    </div>
+                  ))}
+                </div>
+              </div>
+
+              {/* Commit row */}
+              <div className="flex items-center gap-4 flex-shrink-0">
+                <AnimatePresence>
+                  {commitResult?.success && (
+                    <motion.div
+                      initial={{ opacity: 0, x: -10 }} animate={{ opacity: 1, x: 0 }} exit={{ opacity: 0 }}
+                      className="flex-1 glass p-3 rounded-xl bg-green-500/10 border border-green-500/20"
+                    >
+                      <p className="text-xs text-green-400">
+                        Job saved &mdash; ID: <span className="font-mono">{commitResult.job?.id}</span>
+                      </p>
+                    </motion.div>
+                  )}
+                  {commitResult && !commitResult.success && (
+                    <motion.div
+                      initial={{ opacity: 0, x: -10 }} animate={{ opacity: 1, x: 0 }} exit={{ opacity: 0 }}
+                      className="flex-1 glass p-3 rounded-xl bg-red-500/10 border border-red-500/20"
+                    >
+                      <p className="text-xs text-red-400">{commitResult.error}</p>
+                    </motion.div>
+                  )}
+                </AnimatePresence>
+                <motion.button
+                  onClick={handleCommitJob}
+                  disabled={commitLoading || commitResult?.success}
+                  className={`flex items-center gap-2 px-6 py-3 rounded-xl font-medium text-sm transition-all flex-shrink-0 ${
+                    commitLoading || commitResult?.success
+                      ? 'opacity-50 cursor-not-allowed bg-white/5'
+                      : 'bg-gradient-to-r from-blue-600 to-indigo-600 hover:from-blue-500 hover:to-indigo-500'
+                  }`}
+                  whileHover={!commitLoading && !commitResult?.success ? { scale: 1.02 } : {}}
+                  whileTap={!commitLoading && !commitResult?.success ? { scale: 0.98 } : {}}
+                >
+                  {commitLoading ? (
+                    <><Loader2 className="w-4 h-4 animate-spin" /> Committing...</>
+                  ) : commitResult?.success ? (
+                    <><CheckCircle2 className="w-4 h-4" /> Committed</>
+                  ) : (
+                    <><Database className="w-4 h-4" /> Commit Job</>
+                  )}
+                </motion.button>
+              </div>
+            </motion.div>
+
+          ) : view === 'jobs' ? (
+            <motion.div
+              key="jobs"
+              initial={{ opacity: 0, y: 20 }}
+              animate={{ opacity: 1, y: 0 }}
+              exit={{ opacity: 0, y: -20 }}
+            >
+              <JobsPanel />
+            </motion.div>
+
+          ) : (
+            /* ── Upload / Controls layout ── */
+            <motion.div
+              key="new_job"
+              initial={{ opacity: 0, y: 20 }}
+              animate={{ opacity: 1, y: 0 }}
+              exit={{ opacity: 0, y: -20 }}
+            >
+              <div className="grid lg:grid-cols-2 gap-6">
+                {/* Left Panel */}
+                <motion.div
+                  initial={{ opacity: 0, y: 20 }}
+                  animate={{ opacity: 1, y: 0 }}
+                  transition={{ delay: 0.1 }}
+                  className="space-y-6"
+                >
+                  {/* File Type Toggle */}
+                  <div className="glass p-4 rounded-2xl">
+                    <div className="grid grid-cols-2 gap-2">
+                      <motion.button
+                        onClick={() => handleFileTypeChange('image')}
+                        className={`p-3 rounded-xl text-sm font-medium transition-all flex items-center justify-center gap-2 ${fileType === 'image' ? 'bg-gradient-to-r from-purple-600 to-cyan-600 text-white' : 'glass text-gray-400 hover:bg-white/5'}`}
+                        whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+                      >
+                        <ImageIcon className="w-4 h-4" /> Image OCR
+                      </motion.button>
+                      <motion.button
+                        onClick={() => handleFileTypeChange('pdf')}
+                        className={`p-3 rounded-xl text-sm font-medium transition-all flex items-center justify-center gap-2 ${fileType === 'pdf' ? 'bg-gradient-to-r from-purple-600 to-cyan-600 text-white' : 'glass text-gray-400 hover:bg-white/5'}`}
+                        whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+                      >
+                        <FileText className="w-4 h-4" /> PDF Processing
+                      </motion.button>
+                    </div>
+                  </div>
+
+                  <MetadataForm metadata={metadata} onChange={setMetadata} suggestions={suggestions} />
+
+                  <ModelSelector
+                    models={models} value={model} onChange={setModel} loading={modelsLoading}
+                  />
+
+                  <ModeSelector mode={mode} onModeChange={setMode} />
+
+                  <ImageUpload onImageSelect={handleImageSelect} preview={imagePreview} fileType={fileType} />
+
+                  <motion.button
+                    onClick={() => setShowAdvanced(!showAdvanced)}
+                    className="w-full glass px-4 py-3 rounded-2xl flex items-center justify-between hover:bg-white/5 transition-colors"
+                    whileHover={{ scale: 1.01 }} whileTap={{ scale: 0.99 }}
+                  >
+                    <div className="flex items-center gap-2">
+                      <Settings className="w-4 h-4 text-purple-400" />
+                      <span className="text-sm font-medium text-gray-300">Advanced Settings</span>
+                    </div>
+                    <motion.div animate={{ rotate: showAdvanced ? 180 : 0 }} transition={{ duration: 0.3 }}>
+                      <svg className="w-4 h-4 text-gray-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 9l-7 7-7-7" />
+                      </svg>
+                    </motion.div>
+                  </motion.button>
+
+                  <AnimatePresence>
+                    {showAdvanced && (
+                      <AdvancedSettings
+                        settings={advancedSettings} onSettingsChange={setAdvancedSettings}
+                        includeCaption={includeCaption} onIncludeCaptionChange={setIncludeCaption}
+                      />
+                    )}
+                  </AnimatePresence>
+
+                  {fileType === 'pdf' ? (
+                    <PDFProcessor
+                      pdfFile={image} mode={mode} prompt={prompt} model={model}
+                      advancedSettings={advancedSettings} includeCaption={includeCaption}
+                    />
+                  ) : (
+                    <>
+                      <motion.button
+                        onClick={handleSubmit}
+                        disabled={!image || loading}
+                        className={`w-full relative overflow-hidden rounded-2xl p-[2px] ${!image || loading ? 'opacity-50 cursor-not-allowed' : ''}`}
+                        whileHover={!loading && image ? { scale: 1.02 } : {}}
+                        whileTap={!loading && image ? { scale: 0.98 } : {}}
+                      >
+                        <div className="absolute inset-0 bg-gradient-to-r from-purple-600 via-pink-600 to-cyan-600 animate-gradient" />
+                        <div className="relative bg-dark-100 px-8 py-4 rounded-2xl flex items-center justify-center gap-3">
+                          {loading ? (
+                            <><Loader2 className="w-5 h-5 animate-spin" /><span className="font-semibold">Processing Magic...</span></>
+                          ) : (
+                            <><Zap className="w-5 h-5" /><span className="font-semibold">Analyze Image</span></>
+                          )}
+                        </div>
+                      </motion.button>
+
+                      {error && (
+                        <motion.div
+                          initial={{ opacity: 0, y: -10 }} animate={{ opacity: 1, y: 0 }}
+                          className="glass p-4 rounded-2xl border-red-500/50 bg-red-500/10"
+                        >
+                          <p className="text-sm text-red-400">{error}</p>
+                        </motion.div>
+                      )}
+                    </>
+                  )}
+                </motion.div>
+
+                {/* Right Panel - Results (non-plain_ocr modes or loading) */}
+                <motion.div
+                  initial={{ opacity: 0, y: 20 }}
+                  animate={{ opacity: 1, y: 0 }}
+                  transition={{ delay: 0.2 }}
+                >
+                  <ResultPanel
+                    result={result}
+                    loading={loading}
+                    imagePreview={imagePreview}
+                    onCopy={handleCopy}
+                    onDownload={handleDownload}
+                  />
+                </motion.div>
+              </div>
+            </motion.div>
+          )}
+        </AnimatePresence>
      </main>

      {/* Footer */}
      <footer className="mt-20 border-t border-white/10 glass">
-        <div className="max-w-7xl mx-auto px-6 py-8 text-center">
+        <div className="max-w-7xl mx-auto px-6 py-8 text-center space-y-2">
          <p className="text-sm text-gray-400">
-            Powered by <span className="gradient-text font-semibold">DeepSeek-OCR</span> • 
+            Powered by <span className="gradient-text font-semibold">DeepSeek-OCR</span> &bull;
            Built with <span className="text-pink-400">♥</span> using React + FastAPI
          </p>
+          <p className="text-xs text-gray-500">
+            Thanks to <a href="https://github.com/p-xiexin" target="_blank" rel="noopener noreferrer" className="text-purple-400 hover:text-purple-300 transition-colors">@p-xiexin</a> for the clipboard paste idea!
+          </p>
        </div>
      </footer>
    </div>
--- a/frontend/src/components/ImageUpload.jsx
+++ b/frontend/src/components/ImageUpload.jsx
@@ -1,18 +1,54 @@
-import { useCallback } from 'react'
+import { useCallback, useEffect } from 'react'
 import { motion } from 'framer-motion'
 import { useDropzone } from 'react-dropzone'
-import { Upload, Image as ImageIcon, X } from 'lucide-react'
+import { Upload, Image as ImageIcon, X, FileText, Clipboard } from 'lucide-react'

-export default function ImageUpload({ onImageSelect, preview }) {
+export default function ImageUpload({ onImageSelect, preview, fileType = 'image' }) {
  const onDrop = useCallback((acceptedFiles) => {
    if (acceptedFiles?.[0]) {
      onImageSelect(acceptedFiles[0])
    }
  }, [onImageSelect])

+  const isPDF = fileType === 'pdf'
+
+  // Handle clipboard paste
+  useEffect(() => {
+    // Only enable paste for images, not PDFs
+    if (isPDF) return
+
+    const handlePaste = (e) => {
+      const items = e.clipboardData?.items
+      if (!items) return
+
+      for (let i = 0; i < items.length; i++) {
+        const item = items[i]
+        
+        if (item.type.startsWith('image/')) {
+          e.preventDefault()
+          const blob = item.getAsFile()
+          
+          if (blob) {
+            // Create a File object with a proper name
+            const file = new File([blob], `pasted-image-${Date.now()}.png`, {
+              type: blob.type,
+            })
+            onImageSelect(file)
+          }
+          break
+        }
+      }
+    }
+
+    document.addEventListener('paste', handlePaste)
+    return () => document.removeEventListener('paste', handlePaste)
+  }, [onImageSelect, isPDF])
+
  const { getRootProps, getInputProps, isDragActive } = useDropzone({
    onDrop,
-    accept: {
+    accept: isPDF ? {
+      'application/pdf': ['.pdf']
+    } : {
      'image/*': ['.png', '.jpg', '.jpeg', '.webp', '.gif', '.bmp']
    },
    multiple: false
@@ -21,8 +57,14 @@ export default function ImageUpload({ onImageSelect, preview }) {
  return (
    <div className="glass p-6 rounded-2xl space-y-4">
      <div className="flex items-center justify-between">
-        <h3 className="font-semibold text-gray-200">Upload Image</h3>
-        <ImageIcon className="w-5 h-5 text-purple-400" />
+        <h3 className="font-semibold text-gray-200">
+          {isPDF ? 'Upload PDF' : 'Upload Image'}
+        </h3>
+        {isPDF ? (
+          <FileText className="w-5 h-5 text-purple-400" />
+        ) : (
+          <ImageIcon className="w-5 h-5 text-purple-400" />
+        )}
      </div>

      {!preview ? (
@@ -59,11 +101,25 @@ export default function ImageUpload({ onImageSelect, preview }) {
            
            <div>
              <p className="text-lg font-medium text-gray-200">
-                {isDragActive ? 'Drop it like it\'s hot! 🔥' : 'Drag & drop your image'}
+                {isDragActive
+                  ? 'Drop it like it\'s hot! 🔥'
+                  : isPDF
+                    ? 'Drag & drop your PDF'
+                    : 'Drag & drop your image'
+                }
              </p>
              <p className="text-sm text-gray-400 mt-1">
-                or click to browse • PNG, JPG, WEBP up to 10MB
+                {isPDF
+                  ? 'or click to browse • PDF files up to 100MB'
+                  : 'or click to browse • PNG, JPG, WEBP up to 10MB'
+                }
              </p>
+              {!isPDF && (
+                <p className="text-xs text-purple-400 mt-2 flex items-center justify-center gap-1.5">
+                  <Clipboard className="w-3.5 h-3.5" />
+                  <span>Press Ctrl+V to paste from clipboard</span>
+                </p>
+              )}
            </div>
          </div>
        </motion.div>
@@ -73,11 +129,21 @@ export default function ImageUpload({ onImageSelect, preview }) {
          animate={{ opacity: 1, scale: 1 }}
          className="relative group rounded-2xl overflow-hidden"
        >
-          <img 
-            src={preview} 
-            alt="Preview" 
-            className="w-full rounded-2xl border border-white/10"
-          />
+          {isPDF ? (
+            <div className="flex items-center justify-center p-12 bg-white/5 border border-white/10 rounded-2xl">
+              <div className="text-center">
+                <FileText className="w-16 h-16 mx-auto mb-3 text-purple-400" />
+                <p className="text-sm text-gray-300 font-medium">PDF Ready</p>
+                <p className="text-xs text-gray-500 mt-1">{preview?.name || 'Document loaded'}</p>
+              </div>
+            </div>
+          ) : (
+            <img
+              src={preview}
+              alt="Preview"
+              className="w-full rounded-2xl border border-white/10"
+            />
+          )}
          <div className="absolute top-3 right-3 flex gap-2">
            <motion.button
              onClick={(e) => {
@@ -87,7 +153,7 @@ export default function ImageUpload({ onImageSelect, preview }) {
              className="bg-red-500/90 backdrop-blur-sm px-3 py-2 rounded-full opacity-100 hover:bg-red-600 transition-colors flex items-center gap-2 shadow-lg"
              whileHover={{ scale: 1.05 }}
              whileTap={{ scale: 0.95 }}
-              title="Remove image"
+              title={isPDF ? "Remove PDF" : "Remove image"}
            >
              <X className="w-4 h-4" />
              <span className="text-sm font-medium">Remove</span>
--- a/frontend/src/components/JobsPanel.jsx
+++ b/frontend/src/components/JobsPanel.jsx
@@ -0,0 +1,665 @@
+import { useState, useEffect, useCallback } from 'react'
+import { useSuggestions } from '../hooks/useSuggestions'
+import { useModels } from '../hooks/useModels'
+import { motion, AnimatePresence } from 'framer-motion'
+import {
+  Search, ChevronLeft, ChevronRight, CheckCircle2, Clock,
+  FileText, Loader2, Save, RefreshCw, Trash2, Sparkles,
+} from 'lucide-react'
+import axios from 'axios'
+
+const API_BASE = import.meta.env.VITE_API_URL || '/api'
+
+const INPUT_CLASS =
+  'w-full bg-white/5 border border-white/10 rounded-lg px-3 py-2 text-sm text-gray-200 ' +
+  'placeholder-gray-600 focus:outline-none focus:border-purple-500/50 transition-colors'
+
+const STATUS_COLORS = {
+  unreviewed: 'text-amber-400 bg-amber-400/10 border-amber-400/30',
+  reviewed:   'text-green-400 bg-green-400/10 border-green-400/30',
+}
+
+function StatusBadge({ status }) {
+  const Icon = status === 'reviewed' ? CheckCircle2 : Clock
+  return (
+    <span className={`inline-flex items-center gap-1 px-2 py-0.5 rounded-full text-xs border ${STATUS_COLORS[status] || 'text-gray-400'}`}>
+      <Icon className="w-3 h-3" />
+      {status}
+    </span>
+  )
+}
+
+// ─────────────────────────────────────────────────────────────
+// Full-screen Job Detail
+// ─────────────────────────────────────────────────────────────
+function JobDetail({ jobId, onClose, onReviewed, onDeleted, suggestions = {} }) {
+  const { models } = useModels()
+  const [job, setJob] = useState(null)
+  const [loading, setLoading] = useState(true)
+  const [error, setError] = useState(null)
+
+  const [describeModel, setDescribeModel] = useState('')
+  const [generatingDescribe, setGeneratingDescribe] = useState(false)
+
+  const [editedText, setEditedText]         = useState('')
+  const [editDescribeText, setEditDescribeText] = useState('')
+  const [editFreeformText, setEditFreeformText] = useState('')
+  const [activeTab, setActiveTab]           = useState('ocr')
+  const [editAuthor, setEditAuthor]         = useState('')
+  const [editBook, setEditBook]             = useState('')
+  const [editChapter, setEditChapter]       = useState('')
+  const [editPage, setEditPage]             = useState('')
+  const [reviewerName, setReviewerName]     = useState('')
+
+  const [submitting, setSubmitting] = useState(false)
+  const [saveResult, setSaveResult] = useState(null)
+  const [confirmDelete, setConfirmDelete] = useState(false)
+  const [deleting, setDeleting] = useState(false)
+  const [togglingStatus, setTogglingStatus] = useState(false)
+
+  useEffect(() => {
+    let cancelled = false
+    setLoading(true)
+    setError(null)
+    setSaveResult(null)
+
+    axios.get(`${API_BASE}/jobs/${jobId}`)
+      .then(res => {
+        if (!cancelled) {
+          const d = res.data
+          setJob(d)
+          setEditedText(d.reviewed_text ?? d.ocr_text ?? '')
+          setEditDescribeText(d.describe_text ?? '')
+          setEditFreeformText(d.freeform_text ?? '')
+          setEditAuthor(d.author || '')
+          setEditBook(d.book || '')
+          setEditChapter(d.chapter || '')
+          setEditPage(d.page || '')
+          setReviewerName(d.reviewer_name || '')
+          // Default to the OCR tab when there's OCR text, otherwise Description
+          if (d.reviewed_text || d.ocr_text) setActiveTab('ocr')
+          else setActiveTab('describe')
+        }
+      })
+      .catch(err => {
+        if (!cancelled) setError(err.response?.data?.detail || err.message)
+      })
+      .finally(() => { if (!cancelled) setLoading(false) })
+
+    return () => { cancelled = true }
+  }, [jobId])
+
+  // Once the job has loaded, default the Describe model to the job's original model (if available) or the registry default
+  useEffect(() => {
+    if (!describeModel && job && models.length > 0) {
+      const def = models.find(m => m.default) || models[0]
+      const fromJob = job?.ocr_model && models.some(m => m.id === job.ocr_model) ? job.ocr_model : null
+      setDescribeModel(fromJob || def.id)
+    }
+  }, [models, job, describeModel])
+
+  const handleGenerateDescribe = async () => {
+    setGeneratingDescribe(true)
+    setSaveResult(null)
+    try {
+      const res = await axios.post(`${API_BASE}/jobs/${jobId}/describe`, {
+        model: describeModel || null,
+      })
+      setJob(res.data)
+      setEditDescribeText(res.data.describe_text || '')
+      onReviewed(res.data)
+    } catch (err) {
+      setSaveResult({ success: false, error: err.response?.data?.detail || err.message })
+    } finally {
+      setGeneratingDescribe(false)
+    }
+  }
+
+  const handleSave = async () => {
+    if (!reviewerName.trim()) {
+      setSaveResult({ success: false, error: 'Reviewer name is required.' })
+      return
+    }
+    setSubmitting(true)
+    setSaveResult(null)
+    try {
+      const res = await axios.put(`${API_BASE}/jobs/${jobId}/review`, {
+        reviewed_text: editedText,
+        reviewer_name: reviewerName.trim(),
+        author: editAuthor,
+        book: editBook,
+        chapter: editChapter,
+        page: editPage,
+        describe_text: editDescribeText || null,
+        freeform_text: editFreeformText || null,
+      })
+      setJob(res.data)
+      setSaveResult({ success: true })
+      onReviewed(res.data)
+    } catch (err) {
+      setSaveResult({ success: false, error: err.response?.data?.detail || err.message })
+    } finally {
+      setSubmitting(false)
+    }
+  }
+
+  const handleToggleStatus = async () => {
+    // Marking reviewed accepts BOTH the reviewed document text and the description,
+    // so it goes through the full review save (not a status-only flip).
+    if (!isReviewed) {
+      setTogglingStatus(true)
+      try {
+        await handleSave()
+      } finally {
+        setTogglingStatus(false)
+      }
+      return
+    }
+
+    // Reverting to unreviewed preserves the saved reviewed text and description.
+    setTogglingStatus(true)
+    setSaveResult(null)
+    try {
+      const res = await axios.put(`${API_BASE}/jobs/${jobId}/status`, {
+        status: 'unreviewed',
+        reviewer_name: reviewerName.trim() || null,
+      })
+      setJob(res.data)
+      setReviewerName(res.data.reviewer_name || '')
+      onReviewed(res.data)
+    } catch (err) {
+      setSaveResult({ success: false, error: err.response?.data?.detail || err.message })
+    } finally {
+      setTogglingStatus(false)
+    }
+  }
+
+  const handleDelete = async () => {
+    setDeleting(true)
+    try {
+      await axios.delete(`${API_BASE}/jobs/${jobId}`)
+      onDeleted(jobId)
+    } catch (err) {
+      setSaveResult({ success: false, error: err.response?.data?.detail || err.message })
+      setConfirmDelete(false)
+    } finally {
+      setDeleting(false)
+    }
+  }
+
+  const isReviewed = job?.status === 'reviewed'
+
+  return (
+    <motion.div
+      key={jobId}
+      initial={{ opacity: 0, y: 20 }}
+      animate={{ opacity: 1, y: 0 }}
+      exit={{ opacity: 0, y: -20 }}
+      className="flex flex-col gap-4"
+    >
+      {/* Top bar */}
+      <div className="flex items-center gap-4 flex-shrink-0">
+        <motion.button
+          onClick={onClose}
+          className="flex items-center gap-2 glass glass-hover px-4 py-2 rounded-xl text-sm text-gray-300"
+          whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+        >
+          <ChevronLeft className="w-4 h-4" />
+          Back to results
+        </motion.button>
+        {job && (
+          <>
+            <StatusBadge status={job.status} />
+            <motion.button
+              onClick={handleToggleStatus}
+              disabled={togglingStatus}
+              title={isReviewed ? 'Revert to unreviewed' : 'Mark as reviewed'}
+              className={`flex items-center gap-1 px-3 py-1.5 rounded-lg text-xs font-medium transition-colors disabled:opacity-50 ${
+                isReviewed
+                  ? 'glass glass-hover text-amber-400 hover:bg-amber-500/10'
+                  : 'glass glass-hover text-green-400 hover:bg-green-500/10'
+              }`}
+              whileHover={!togglingStatus ? { scale: 1.02 } : {}}
+              whileTap={!togglingStatus ? { scale: 0.98 } : {}}
+            >
+              {togglingStatus ? (
+                <Loader2 className="w-3.5 h-3.5 animate-spin" />
+              ) : isReviewed ? (
+                <Clock className="w-3.5 h-3.5" />
+              ) : (
+                <CheckCircle2 className="w-3.5 h-3.5" />
+              )}
+              {isReviewed ? 'Mark Unreviewed' : 'Mark Reviewed'}
+            </motion.button>
+            <span className="text-xs text-gray-500 font-mono hidden sm:block">{job.id}</span>
+          </>
+        )}
+        <div className="ml-auto flex items-center gap-2">
+          {confirmDelete ? (
+            <>
+              <span className="text-xs text-red-400">Delete this job permanently?</span>
+              <motion.button
+                onClick={handleDelete}
+                disabled={deleting}
+                className="flex items-center gap-1 px-3 py-2 rounded-xl text-sm font-medium bg-red-600 hover:bg-red-500 disabled:opacity-50"
+                whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+              >
+                {deleting ? <Loader2 className="w-4 h-4 animate-spin" /> : <Trash2 className="w-4 h-4" />}
+                Confirm
+              </motion.button>
+              <motion.button
+                onClick={() => setConfirmDelete(false)}
+                className="px-3 py-2 rounded-xl text-sm glass glass-hover text-gray-300"
+                whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+              >
+                Cancel
+              </motion.button>
+            </>
+          ) : (
+            <motion.button
+              onClick={() => setConfirmDelete(true)}
+              className="flex items-center gap-2 px-3 py-2 rounded-xl text-sm glass glass-hover text-red-400 hover:bg-red-500/10"
+              whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+            >
+              <Trash2 className="w-4 h-4" />
+              Delete
+            </motion.button>
+          )}
+        </div>
+      </div>
+
+      {loading && (
+        <div className="flex-1 flex items-center justify-center">
+          <Loader2 className="w-8 h-8 animate-spin text-purple-400" />
+        </div>
+      )}
+
+      {error && (
+        <div className="glass p-4 rounded-xl border-red-500/30 bg-red-500/10 flex-shrink-0">
+          <p className="text-sm text-red-400">{error}</p>
+        </div>
+      )}
+
+      {job && !loading && (
+        <>
+          {/* Image + Text */}
+          <div className="grid gap-6" style={{ gridTemplateColumns: '1fr 1fr', height: '130vh' }}>
+            <div className="glass rounded-2xl overflow-hidden flex items-center justify-center bg-black/20 h-full">
+              <img
+                src={`${API_BASE}/jobs/${job.id}/image`}
+                alt="Job source"
+                className="w-full h-full object-contain"
+                onError={e => { e.target.style.display = 'none' }}
+              />
+            </div>
+            <div className="glass rounded-2xl p-4 flex flex-col h-full">
+              {/* Tabs: the OCR tab appears only when the job has OCR or reviewed text; Description is always available */}
+              {(() => {
+                const tabs = [
+                  (job.ocr_text || job.reviewed_text) ? { id: 'ocr', label: 'OCR Text' } : null,
+                  { id: 'describe', label: 'Description' },
+                ].filter(Boolean)
+                return tabs.length > 1 ? (
+                  <div className="flex gap-1 mb-3 flex-shrink-0">
+                    {tabs.map(t => (
+                      <button
+                        key={t.id}
+                        onClick={() => setActiveTab(t.id)}
+                        className={`px-3 py-1 rounded-lg text-xs font-medium transition-colors ${
+                          activeTab === t.id
+                            ? 'bg-purple-600 text-white'
+                            : 'bg-white/5 text-gray-400 hover:bg-white/10'
+                        }`}
+                      >
+                        {t.label}
+                      </button>
+                    ))}
+                  </div>
+                ) : null
+              })()}
+
+              <p className="text-xs text-gray-400 mb-2 flex-shrink-0">
+                {{ ocr: isReviewed ? 'Reviewed Text' : 'OCR Text', describe: 'Description' }[activeTab]}
+                <span className="text-purple-400 ml-1">(editable)</span>
+              </p>
+
+              {activeTab === 'ocr' && (
+                <>
+                  <textarea
+                    value={editedText}
+                    onChange={e => setEditedText(e.target.value)}
+                    className="flex-1 w-full bg-transparent text-sm text-gray-200 font-mono resize-none focus:outline-none min-h-0"
+                    placeholder="OCR text..."
+                  />
+                  {isReviewed && job.ocr_text && (
+                    <details className="flex-shrink-0 mt-2 border-t border-white/10 pt-2">
+                      <summary className="cursor-pointer text-xs text-gray-500 hover:text-gray-400 transition-colors">
+                        Original OCR Text
+                      </summary>
+                      <pre className="text-xs text-gray-600 whitespace-pre-wrap font-mono mt-1 max-h-28 overflow-y-auto">
+                        {job.ocr_text}
+                      </pre>
+                    </details>
+                  )}
+                </>
+              )}
+              {activeTab === 'describe' && (
+                <>
+                  <div className="flex items-center gap-2 mb-2 flex-shrink-0">
+                    <select
+                      value={describeModel}
+                      onChange={e => setDescribeModel(e.target.value)}
+                      disabled={generatingDescribe || models.length === 0}
+                      className="bg-white/5 border border-white/10 rounded-lg px-2 py-1.5 text-xs text-gray-200 focus:outline-none focus:border-purple-500/50"
+                    >
+                      {models.length === 0 && <option value="">No models</option>}
+                      {models.map(m => (
+                        <option key={m.id} value={m.id}>{m.label}{m.default ? ' (default)' : ''}</option>
+                      ))}
+                    </select>
+                    <motion.button
+                      onClick={handleGenerateDescribe}
+                      disabled={generatingDescribe || !describeModel}
+                      className={`flex items-center gap-1.5 px-3 py-1.5 rounded-lg text-xs font-medium transition-all ${
+                        generatingDescribe || !describeModel
+                          ? 'opacity-50 cursor-not-allowed bg-white/5'
+                          : 'bg-gradient-to-r from-violet-600 to-purple-600 hover:from-violet-500 hover:to-purple-500'
+                      }`}
+                      whileHover={!generatingDescribe && describeModel ? { scale: 1.02 } : {}}
+                      whileTap={!generatingDescribe && describeModel ? { scale: 0.98 } : {}}
+                      title="Run Describe on this job's image and save it"
+                    >
+                      {generatingDescribe
+                        ? <><Loader2 className="w-3.5 h-3.5 animate-spin" /> Generating…</>
+                        : <><Sparkles className="w-3.5 h-3.5" /> Generate Description</>}
+                    </motion.button>
+                  </div>
+                  <textarea
+                    value={editDescribeText}
+                    onChange={e => setEditDescribeText(e.target.value)}
+                    className="flex-1 w-full bg-transparent text-sm text-gray-200 font-mono resize-none focus:outline-none min-h-0"
+                    placeholder="No description yet — pick a model and click Generate Description, or type one here."
+                  />
+                </>
+              )}
+            </div>
+          </div>
+
+          {/* Metadata + reviewer row */}
+          <div className="glass p-4 rounded-2xl flex-shrink-0">
+            <datalist id="jd-authors">
+              {(suggestions.authors || []).map(a => <option key={a} value={a} />)}
+            </datalist>
+            <datalist id="jd-books">
+              {(suggestions.books || []).map(b => <option key={b} value={b} />)}
+            </datalist>
+            <datalist id="jd-chapters">
+              {(suggestions.chapters || []).map(c => <option key={c} value={c} />)}
+            </datalist>
+            <datalist id="jd-reviewers">
+              {(suggestions.reviewers || []).map(r => <option key={r} value={r} />)}
+            </datalist>
+            <div className="grid grid-cols-6 gap-4">
+              <div>
+                <label className="text-xs text-gray-400 mb-1 block">Author</label>
+                <input type="text" list="jd-authors" value={editAuthor} onChange={e => setEditAuthor(e.target.value)} placeholder="Author" className={INPUT_CLASS} />
+              </div>
+              <div>
+                <label className="text-xs text-gray-400 mb-1 block">Book</label>
+                <input type="text" list="jd-books" value={editBook} onChange={e => setEditBook(e.target.value)} placeholder="Book title" className={INPUT_CLASS} />
+              </div>
+              <div>
+                <label className="text-xs text-gray-400 mb-1 block">Chapter</label>
+                <input type="text" list="jd-chapters" value={editChapter} onChange={e => setEditChapter(e.target.value)} placeholder="Chapter" className={INPUT_CLASS} />
+              </div>
+              <div>
+                <label className="text-xs text-gray-400 mb-1 block">Page</label>
+                <input type="text" value={editPage} onChange={e => setEditPage(e.target.value)} placeholder="Page" className={INPUT_CLASS} />
+              </div>
+              <div>
+                <label className="text-xs text-gray-400 mb-1 block">Reviewer</label>
+                <input type="text" list="jd-reviewers" value={reviewerName} onChange={e => setReviewerName(e.target.value)} placeholder="Your name" className={INPUT_CLASS} />
+              </div>
+              <div className="flex flex-col justify-end">
+                <motion.button
+                  onClick={handleSave}
+                  disabled={submitting || !reviewerName.trim()}
+                  className={`w-full flex items-center justify-center gap-2 px-4 py-2 rounded-lg font-medium text-sm transition-all ${
+                    submitting || !reviewerName.trim()
+                      ? 'opacity-50 cursor-not-allowed bg-white/5'
+                      : isReviewed
+                        ? 'bg-gradient-to-r from-blue-600 to-indigo-600 hover:from-blue-500 hover:to-indigo-500'
+                        : 'bg-gradient-to-r from-green-600 to-emerald-600 hover:from-green-500 hover:to-emerald-500'
+                  }`}
+                  whileHover={!submitting && reviewerName.trim() ? { scale: 1.02 } : {}}
+                  whileTap={!submitting && reviewerName.trim() ? { scale: 0.98 } : {}}
+                >
+                  {submitting ? (
+                    <><Loader2 className="w-4 h-4 animate-spin" /> Saving...</>
+                  ) : isReviewed ? (
+                    <><Save className="w-4 h-4" /> Save Changes</>
+                  ) : (
+                    <><CheckCircle2 className="w-4 h-4" /> Mark Reviewed</>
+                  )}
+                </motion.button>
+              </div>
+            </div>
+
+            {!isReviewed && (
+              <p className="text-xs text-gray-500 mt-2">
+                Marking reviewed accepts both the reviewed document text and the description.
+              </p>
+            )}
+
+            {saveResult && (
+              <motion.div
+                initial={{ opacity: 0, y: -4 }} animate={{ opacity: 1, y: 0 }}
+                className={`mt-3 p-2 rounded-lg text-xs ${saveResult.success ? 'bg-green-500/10 text-green-400' : 'bg-red-500/10 text-red-400'}`}
+              >
+                {saveResult.success
+                  ? (isReviewed ? 'Changes saved!' : 'Job marked as reviewed!')
+                  : saveResult.error}
+              </motion.div>
+            )}
+
+            {/* Read-only info row */}
+            <div className="flex gap-6 mt-3 pt-3 border-t border-white/10">
+              {job.submitted_at && (
+                <span className="text-xs text-gray-500">Submitted: {new Date(job.submitted_at).toLocaleString()}</span>
+              )}
+              {isReviewed && job.reviewed_at && (
+                <span className="text-xs text-gray-500">Last reviewed: {new Date(job.reviewed_at).toLocaleString()}</span>
+              )}
+              {job.mode && <span className="text-xs text-gray-500">Mode: {job.mode}</span>}
+              {job.ocr_model && <span className="text-xs text-gray-500">Model: {job.ocr_model}</span>}
+            </div>
+          </div>
+        </>
+      )}
+    </motion.div>
+  )
+}
+
+// ─────────────────────────────────────────────────────────────
+// Search / List view
+// ─────────────────────────────────────────────────────────────
+export default function JobsPanel() {
+  const suggestions = useSuggestions()
+  const [search, setSearch] = useState('')
+  const [filterStatus, setFilterStatus] = useState('')
+  const [filterAuthor, setFilterAuthor] = useState('')
+  const [filterBook, setFilterBook] = useState('')
+  const [jobs, setJobs] = useState([])
+  const [total, setTotal] = useState(0)
+  const [page, setPage] = useState(0)
+  const [loading, setLoading] = useState(false)
+  const [error, setError] = useState(null)
+  const [selectedJobId, setSelectedJobId] = useState(null)
+
+  const LIMIT = 20
+
+  const fetchJobs = useCallback(async (pageNum = 0) => {
+    setLoading(true)
+    setError(null)
+    try {
+      const params = new URLSearchParams()
+      if (search.trim()) params.set('search', search.trim())
+      if (filterStatus) params.set('status', filterStatus)
+      if (filterAuthor.trim()) params.set('author', filterAuthor.trim())
+      if (filterBook.trim()) params.set('book', filterBook.trim())
+      params.set('limit', LIMIT)
+      params.set('offset', pageNum * LIMIT)
+
+      const res = await axios.get(`${API_BASE}/jobs?${params}`)
+      setJobs(res.data.jobs)
+      setTotal(res.data.total)
+      setPage(pageNum)
+    } catch (err) {
+      setError(err.response?.data?.detail || err.message)
+    } finally {
+      setLoading(false)
+    }
+  }, [search, filterStatus, filterAuthor, filterBook])
+
+  useEffect(() => { fetchJobs(0) }, []) // eslint-disable-line react-hooks/exhaustive-deps
+
+  const handleReviewed = (updatedJob) => {
+    setJobs(prev => prev.map(j => j.id === updatedJob.id ? { ...j, ...updatedJob } : j))
+  }
+
+  const totalPages = Math.ceil(total / LIMIT)
+
+  // When a job is selected show full-screen detail
+  if (selectedJobId) {
+    return (
+      <AnimatePresence mode="wait">
+        <JobDetail
+          key={selectedJobId}
+          jobId={selectedJobId}
+          onClose={() => setSelectedJobId(null)}
+          onReviewed={handleReviewed}
+          onDeleted={(id) => {
+            setJobs(prev => prev.filter(j => j.id !== id))
+            setTotal(prev => Math.max(0, prev - 1))
+            setSelectedJobId(null)
+          }}
+          suggestions={suggestions}
+        />
+      </AnimatePresence>
+    )
+  }
+
+  return (
+    <motion.div
+      key="job_list"
+      initial={{ opacity: 0, y: 20 }}
+      animate={{ opacity: 1, y: 0 }}
+      exit={{ opacity: 0, y: -20 }}
+      className="space-y-4"
+    >
+      {/* Search form */}
+      <div className="glass p-4 rounded-2xl space-y-3">
+        <form onSubmit={e => { e.preventDefault(); fetchJobs(0) }} className="flex gap-2">
+          <input
+            type="text"
+            value={search}
+            onChange={e => setSearch(e.target.value)}
+            placeholder="Search all fields..."
+            className={`${INPUT_CLASS} flex-1`}
+          />
+          <motion.button
+            type="submit"
+            className="flex items-center gap-2 px-4 py-2 rounded-lg bg-gradient-to-r from-purple-600 to-cyan-600 text-sm font-medium"
+            whileHover={{ scale: 1.02 }} whileTap={{ scale: 0.98 }}
+          >
+            <Search className="w-4 h-4" /> Search
+          </motion.button>
+        </form>
+
+        <datalist id="jp-authors">
+          {(suggestions.authors || []).map(a => <option key={a} value={a} />)}
+        </datalist>
+        <datalist id="jp-books">
+          {(suggestions.books || []).map(b => <option key={b} value={b} />)}
+        </datalist>
+        <div className="grid grid-cols-3 gap-2">
+          <select value={filterStatus} onChange={e => setFilterStatus(e.target.value)} className={INPUT_CLASS}>
+            <option value="">All statuses</option>
+            <option value="unreviewed">Unreviewed</option>
+            <option value="reviewed">Reviewed</option>
+          </select>
+          <input type="text" list="jp-authors" value={filterAuthor} onChange={e => setFilterAuthor(e.target.value)} placeholder="Author..." className={INPUT_CLASS} />
+          <input type="text" list="jp-books" value={filterBook} onChange={e => setFilterBook(e.target.value)} placeholder="Book..." className={INPUT_CLASS} />
+        </div>
+
+        <div className="flex items-center justify-between">
+          <span className="text-xs text-gray-500">{total} job{total !== 1 ? 's' : ''} found</span>
+          <button onClick={() => fetchJobs(page)} className="flex items-center gap-1 text-xs text-gray-400 hover:text-gray-200 transition-colors">
+            <RefreshCw className="w-3 h-3" /> Refresh
+          </button>
+        </div>
+      </div>
+
+      {loading && <div className="flex justify-center py-8"><Loader2 className="w-6 h-6 animate-spin text-purple-400" /></div>}
+
+      {error && (
+        <div className="glass p-4 rounded-xl border-red-500/30 bg-red-500/10">
+          <p className="text-sm text-red-400">{error}</p>
+        </div>
+      )}
+
+      {!loading && !error && jobs.length === 0 && (
+        <div className="glass p-8 rounded-2xl text-center">
+          <FileText className="w-10 h-10 mx-auto mb-3 text-gray-600" />
+          <p className="text-gray-400">No jobs found</p>
+          <p className="text-xs text-gray-500 mt-1">Commit your first OCR job from the New Job tab</p>
+        </div>
+      )}
+
+      {/* Results grid */}
+      <div className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-3 xl:grid-cols-4 gap-3">
+        <AnimatePresence>
+          {jobs.map(job => (
+            <motion.button
+              key={job.id}
+              onClick={() => setSelectedJobId(job.id)}
+              className="text-left glass p-4 rounded-xl border border-white/5 hover:border-white/20 hover:bg-white/5 transition-all"
+              initial={{ opacity: 0, y: 10 }}
+              animate={{ opacity: 1, y: 0 }}
+              exit={{ opacity: 0 }}
+              whileHover={{ scale: 1.02 }}
+              whileTap={{ scale: 0.98 }}
+              layout
+            >
+              <div className="flex items-start justify-between gap-2 mb-2">
+                <StatusBadge status={job.status} />
+              </div>
+              {job.book && <p className="text-sm font-medium text-gray-200 truncate">{job.book}</p>}
+              <div className="flex items-center gap-2 mt-0.5">
+                {job.chapter && <span className="text-xs text-gray-500">Ch. {job.chapter}</span>}
+                {job.page && <span className="text-xs text-gray-500">p. {job.page}</span>}
+              </div>
+              {job.author && <p className="text-xs text-gray-400 mt-1">{job.author}</p>}
+              <div className="flex items-center justify-between mt-2">
+                <p className="text-xs text-gray-600 font-mono">{new Date(job.submitted_at).toLocaleDateString()}</p>
+                {job.ocr_model && <span className="text-[10px] text-gray-500 truncate ml-2">{job.ocr_model}</span>}
+              </div>
+            </motion.button>
+          ))}
+        </AnimatePresence>
+      </div>
+
+      {totalPages > 1 && (
+        <div className="flex items-center justify-center gap-3">
+          <button onClick={() => fetchJobs(page - 1)} disabled={page === 0} className="glass glass-hover p-2 rounded-lg disabled:opacity-30">
+            <ChevronLeft className="w-4 h-4" />
+          </button>
+          <span className="text-sm text-gray-400">Page {page + 1} of {totalPages}</span>
+          <button onClick={() => fetchJobs(page + 1)} disabled={page >= totalPages - 1} className="glass glass-hover p-2 rounded-lg disabled:opacity-30">
+            <ChevronRight className="w-4 h-4" />
+          </button>
+        </div>
+      )}
+    </motion.div>
+  )
+}
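Review note on `fetchJobs` in JobsPanel.jsx: the query-string and offset arithmetic could be pulled into a pure helper so it can be checked without rendering the component. The following is only a sketch under assumptions: `buildJobsQuery` and `pageCount` are hypothetical names that do not exist in this change, and the parameter names (`search`, `status`, `author`, `book`, `limit`, `offset`) are taken from what the component already sends.

```javascript
// Hypothetical helper (not part of this diff): mirrors how fetchJobs builds its query.
// Empty or whitespace-only filters are omitted, as in the component.
function buildJobsQuery({ search = '', status = '', author = '', book = '' }, pageNum = 0, limit = 20) {
  const params = new URLSearchParams()
  if (search.trim()) params.set('search', search.trim())
  if (status) params.set('status', status)
  if (author.trim()) params.set('author', author.trim())
  if (book.trim()) params.set('book', book.trim())
  params.set('limit', String(limit))
  params.set('offset', String(pageNum * limit)) // zero-based page index
  return params.toString()
}

// Same page-count rule as the component: Math.ceil(total / LIMIT).
function pageCount(total, limit = 20) {
  return Math.ceil(total / limit)
}

console.log(buildJobsQuery({ search: ' dune ', status: 'reviewed' }, 2))
// prints "search=dune&status=reviewed&limit=20&offset=40"
```

Keeping the helper free of React state would also make it easier to see that the Refresh button and the pager reuse the current filter values rather than the last submitted ones.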
--- a/frontend/src/components/MetadataForm.jsx
+++ b/frontend/src/components/MetadataForm.jsx
@@ -0,0 +1,77 @@
+import { BookOpen } from 'lucide-react'
+
+export default function MetadataForm({ metadata, onChange, suggestions = {} }) {
+  const { author, book, chapter, page } = metadata
+  const { authors = [], books = [], chapters = [] } = suggestions
+
+  const field = (key) => (e) => onChange({ ...metadata, [key]: e.target.value })
+
+  const inputClass =
+    'w-full bg-white/5 border border-white/10 rounded-lg px-3 py-2 text-sm text-gray-200 ' +
+    'placeholder-gray-600 focus:outline-none focus:border-purple-500/50 transition-colors'
+
+  return (
+    <div className="glass p-4 rounded-2xl space-y-3">
+      <div className="flex items-center gap-2">
+        <BookOpen className="w-4 h-4 text-purple-400" />
+        <h3 className="text-sm font-medium text-gray-300">Job Metadata</h3>
+      </div>
+
+      <datalist id="mf-authors">
+        {authors.map(a => <option key={a} value={a} />)}
+      </datalist>
+      <datalist id="mf-books">
+        {books.map(b => <option key={b} value={b} />)}
+      </datalist>
+      <datalist id="mf-chapters">
+        {chapters.map(c => <option key={c} value={c} />)}
+      </datalist>
+
+      <div className="grid grid-cols-2 gap-3">
+        <div>
+          <label className="text-xs text-gray-400 mb-1 block">Author</label>
+          <input
+            type="text"
+            list="mf-authors"
+            value={author}
+            onChange={field('author')}
+            placeholder="Author name"
+            className={inputClass}
+          />
+        </div>
+        <div>
+          <label className="text-xs text-gray-400 mb-1 block">Book</label>
+          <input
+            type="text"
+            list="mf-books"
+            value={book}
+            onChange={field('book')}
+            placeholder="Book title"
+            className={inputClass}
+          />
+        </div>
+        <div>
+          <label className="text-xs text-gray-400 mb-1 block">Chapter</label>
+          <input
+            type="text"
+            list="mf-chapters"
+            value={chapter}
+            onChange={field('chapter')}
+            placeholder="Chapter"
+            className={inputClass}
+          />
+        </div>
+        <div>
+          <label className="text-xs text-gray-400 mb-1 block">Page</label>
+          <input
+            type="text"
+            value={page}
+            onChange={field('page')}
+            placeholder="Page number"
+            className={inputClass}
+          />
+        </div>
+      </div>
+    </div>
+  )
+}
--- a/frontend/src/components/ModeSelector.jsx
+++ b/frontend/src/components/ModeSelector.jsx
@@ -1,41 +1,30 @@
 import { motion } from 'framer-motion'
-import { FileText, Eye, Search, Wand2 } from 'lucide-react'
+import { FileText, Eye } from 'lucide-react'

 const modes = [
-  { id: 'plain_ocr', name: 'Plain OCR', icon: FileText, color: 'from-blue-500 to-cyan-500', desc: 'Extract raw text', needsInput: false },
-  { id: 'describe', name: 'Describe', icon: Eye, color: 'from-violet-500 to-purple-500', desc: 'Image description', needsInput: false },
-  { id: 'find_ref', name: 'Find', icon: Search, color: 'from-yellow-500 to-orange-500', desc: 'Locate specific terms', needsInput: 'findTerm' },
-  { id: 'freeform', name: 'Freeform', icon: Wand2, color: 'from-fuchsia-500 to-pink-500', desc: 'Custom prompt', needsInput: 'prompt' },
+  { id: 'plain_ocr', name: 'Plain OCR', icon: FileText, color: 'from-blue-500 to-cyan-500', desc: 'Extract raw text' },
+  { id: 'describe', name: 'Describe', icon: Eye, color: 'from-violet-500 to-purple-500', desc: 'Image description' },
 ]

-export default function ModeSelector({ 
-  mode, 
-  onModeChange, 
-  prompt, 
-  onPromptChange,
-  findTerm,
-  onFindTermChange
-}) {
-  const selectedMode = modes.find(m => m.id === mode)
-  const needsInput = selectedMode?.needsInput
-
+export default function ModeSelector({ mode, onModeChange }) {
  return (
    <div className="glass p-4 rounded-2xl space-y-3">
      <h3 className="text-sm font-semibold text-gray-200">Mode</h3>

-      <div className="grid grid-cols-4 gap-2">
+      <div className="grid grid-cols-2 gap-2">
        {modes.map((m) => {
          const Icon = m.icon
          const isSelected = mode === m.id
-          
+
          return (
            <motion.button
              key={m.id}
              onClick={() => onModeChange(m.id)}
+              title={m.desc}
              className={`
                relative p-2 rounded-xl text-center transition-all
-                ${isSelected 
-                  ? 'glass border-white/20 shadow-lg' 
+                ${isSelected
+                  ? 'glass border-white/20 shadow-lg'
                  : 'bg-white/5 border border-white/10 hover:border-white/20'
                }
              `}
@@ -49,12 +38,12 @@ export default function ModeSelector({
                  transition={{ type: "spring", bounce: 0.2, duration: 0.6 }}
                />
              )}
-              
+
              <div className="relative space-y-1">
                <div className={`
                  w-8 h-8 mx-auto rounded-lg flex items-center justify-center
-                  ${isSelected 
-                    ? `bg-gradient-to-br ${m.color}` 
+                  ${isSelected
+                    ? `bg-gradient-to-br ${m.color}`
                    : 'bg-white/10'
                  }
                `}>
@@ -68,38 +57,6 @@ export default function ModeSelector({
          )
        })}
      </div>
-
-      {needsInput === 'findTerm' && (
-        <motion.div
-          initial={{ opacity: 0, height: 0 }}
-          animate={{ opacity: 1, height: 'auto' }}
-          exit={{ opacity: 0, height: 0 }}
-        >
-          <input
-            type="text"
-            value={findTerm}
-            onChange={(e) => onFindTermChange(e.target.value)}
-            placeholder="Enter term to find (e.g., Total, Invoice #)"
-            className="w-full bg-white/5 border border-white/10 rounded-xl px-3 py-2 text-sm focus:outline-none focus:border-purple-500 transition-colors"
-          />
-        </motion.div>
-      )}
-
-      {needsInput === 'prompt' && (
-        <motion.div
-          initial={{ opacity: 0, height: 0 }}
-          animate={{ opacity: 1, height: 'auto' }}
-          exit={{ opacity: 0, height: 0 }}
-        >
-          <textarea
-            value={prompt}
-            onChange={(e) => onPromptChange(e.target.value)}
-            placeholder="Enter your custom prompt..."
-            className="w-full bg-white/5 border border-white/10 rounded-xl px-3 py-2 text-sm focus:outline-none focus:border-purple-500 transition-colors resize-none"
-            rows={2}
-          />
-        </motion.div>
-      )}
    </div>
  )
 }
--- a/frontend/src/components/ModelSelector.jsx
+++ b/frontend/src/components/ModelSelector.jsx
@@ -0,0 +1,33 @@
+import { Cpu } from 'lucide-react'
+
+const SELECT_CLASS =
+  'w-full bg-white/5 border border-white/10 rounded-lg px-3 py-2 text-sm text-gray-200 ' +
+  'focus:outline-none focus:border-purple-500/50 transition-colors'
+
+// Dropdown to pick which OCR model runs the analysis.
+// `models` comes from the useModels() hook; `value` is the selected model id.
+export default function ModelSelector({ models, value, onChange, loading }) {
+  return (
+    <div className="glass p-4 rounded-2xl space-y-3">
+      <div className="flex items-center gap-2">
+        <Cpu className="w-4 h-4 text-purple-400" />
+        <h3 className="text-sm font-semibold text-gray-200">Model</h3>
+      </div>
+
+      <select
+        value={value || ''}
+        onChange={e => onChange(e.target.value)}
+        disabled={loading || models.length === 0}
+        className={SELECT_CLASS}
+      >
+        {loading && <option value="">Loading models…</option>}
+        {!loading && models.length === 0 && <option value="">No models available</option>}
+        {models.map(m => (
+          <option key={m.id} value={m.id}>
+            {m.label}{m.default ? ' (default)' : ''}
+          </option>
+        ))}
+      </select>
+    </div>
+  )
+}
--- a/frontend/src/components/PDFProcessor.jsx
+++ b/frontend/src/components/PDFProcessor.jsx
@@ -0,0 +1,234 @@
+import { useState, useCallback } from 'react'
+import { motion, AnimatePresence } from 'framer-motion'
+import { FileText, Download, Loader2, CheckCircle2, AlertCircle } from 'lucide-react'
+import axios from 'axios'
+
+const API_BASE = import.meta.env.VITE_API_URL || '/api'
+
+function PDFProcessor({ pdfFile, mode, prompt, model, advancedSettings, includeCaption }) {
+  const [processing, setProcessing] = useState(false)
+  const [progress, setProgress] = useState(0)
+  const [result, setResult] = useState(null)
+  const [error, setError] = useState(null)
+  const [outputFormat, setOutputFormat] = useState('markdown')
+
+  const formats = [
+    { value: 'markdown', label: 'Markdown', ext: 'md', icon: '📝' },
+    { value: 'html', label: 'HTML', ext: 'html', icon: '🌐' },
+    { value: 'docx', label: 'Word', ext: 'docx', icon: '📄' },
+    { value: 'json', label: 'JSON', ext: 'json', icon: '📊' }
+  ]
+
+  const handleProcess = useCallback(async () => {
+    if (!pdfFile) return
+
+    setProcessing(true)
+    setError(null)
+    setProgress(0)
+
+    try {
+      const formData = new FormData()
+      formData.append('pdf_file', pdfFile)
+      if (model) formData.append('model', model)
+      formData.append('mode', mode)
+      formData.append('prompt', prompt)
+      formData.append('output_format', outputFormat)
+      formData.append('grounding', mode === 'find_ref')
+      formData.append('include_caption', includeCaption)
+      formData.append('extract_images', true)
+      formData.append('dpi', 144)
+      formData.append('base_size', advancedSettings.base_size)
+      formData.append('image_size', advancedSettings.image_size)
+      formData.append('crop_mode', advancedSettings.crop_mode)
+
+      const response = await axios.post(`${API_BASE}/process-pdf`, formData, {
+        headers: {
+          'Content-Type': 'multipart/form-data',
+        },
+        responseType: outputFormat === 'json' ? 'json' : 'blob',
+        onUploadProgress: (progressEvent) => {
+          const percentCompleted = progressEvent.total ? Math.round((progressEvent.loaded * 100) / progressEvent.total) : 0
+          setProgress(percentCompleted)
+        }
+      })
+
+      if (outputFormat === 'json') {
+        setResult(response.data)
+      } else {
+        // For file downloads (markdown, html, docx)
+        const format = formats.find(f => f.value === outputFormat)
+        const blob = new Blob([response.data], {
+          type: response.headers['content-type']
+        })
+        const url = URL.createObjectURL(blob)
+        const a = document.createElement('a')
+        a.href = url
+        a.download = `ocr_result.${format.ext}`
+        a.click()
+        URL.revokeObjectURL(url)
+
+        setResult({
+          success: true,
+          message: `Document downloaded as ${format.label}`,
+          format: outputFormat
+        })
+      }
+
+      setProgress(100)
+    } catch (err) {
+      console.error('PDF processing error:', err)
+      setError(err.response?.data?.detail || err.message || 'Failed to process PDF')
+    } finally {
+      setProcessing(false)
+    }
+  }, [pdfFile, mode, prompt, model, outputFormat, includeCaption, advancedSettings])
+
+  const handleDownloadJSON = useCallback(() => {
+    if (!result || outputFormat !== 'json') return
+
+    const blob = new Blob([JSON.stringify(result, null, 2)], { type: 'application/json' })
+    const url = URL.createObjectURL(blob)
+    const a = document.createElement('a')
+    a.href = url
+    a.download = 'ocr_result.json'
+    a.click()
+    URL.revokeObjectURL(url)
+  }, [result, outputFormat])
+
+  return (
+    <div className="space-y-4">
+      {/* Format Selector */}
+      <div className="glass p-6 rounded-2xl space-y-3">
+        <label className="block text-sm font-medium text-gray-300 mb-3">
+          Output Format
+        </label>
+        <div className="grid grid-cols-2 gap-2">
+          {formats.map((format) => (
+            <motion.button
+              key={format.value}
+              onClick={() => setOutputFormat(format.value)}
+              className={`p-3 rounded-xl text-sm font-medium transition-all ${
+                outputFormat === format.value
+                  ? 'bg-gradient-to-r from-purple-600 to-cyan-600 text-white'
+                  : 'glass text-gray-400 hover:bg-white/5'
+              }`}
+              whileHover={{ scale: 1.02 }}
+              whileTap={{ scale: 0.98 }}
+            >
+              <span className="mr-2">{format.icon}</span>
+              {format.label}
+            </motion.button>
+          ))}
+        </div>
+      </div>
+
+      {/* Process Button */}
+      <motion.button
+        onClick={handleProcess}
+        disabled={!pdfFile || processing}
+        className={`w-full relative overflow-hidden rounded-2xl p-[2px] ${
+          !pdfFile || processing ? 'opacity-50 cursor-not-allowed' : ''
+        }`}
+        whileHover={!processing && pdfFile ? { scale: 1.02 } : {}}
+        whileTap={!processing && pdfFile ? { scale: 0.98 } : {}}
+      >
+        <div className="absolute inset-0 bg-gradient-to-r from-purple-600 via-pink-600 to-cyan-600 animate-gradient" />
+        <div className="relative bg-dark-100 px-8 py-4 rounded-2xl flex items-center justify-center gap-3">
+          {processing ? (
+            <>
+              <Loader2 className="w-5 h-5 animate-spin" />
+              <span className="font-semibold">Processing PDF...</span>
+            </>
+          ) : (
+            <>
+              <FileText className="w-5 h-5" />
+              <span className="font-semibold">Process PDF</span>
+            </>
+          )}
+        </div>
+      </motion.button>
+
+      {/* Progress Bar */}
+      <AnimatePresence>
+        {processing && progress > 0 && (
+          <motion.div
+            initial={{ opacity: 0, height: 0 }}
+            animate={{ opacity: 1, height: 'auto' }}
+            exit={{ opacity: 0, height: 0 }}
+            className="glass p-4 rounded-2xl"
+          >
+            <div className="flex items-center justify-between mb-2">
+              <span className="text-sm text-gray-400">Processing...</span>
+              <span className="text-sm font-medium text-purple-400">{progress}%</span>
+            </div>
+            <div className="h-2 bg-dark-200 rounded-full overflow-hidden">
+              <motion.div
+                className="h-full bg-gradient-to-r from-purple-600 to-cyan-600"
+                initial={{ width: 0 }}
+                animate={{ width: `${progress}%` }}
+                transition={{ duration: 0.3 }}
+              />
+            </div>
+          </motion.div>
+        )}
+      </AnimatePresence>
+
+      {/* Error Display */}
+      <AnimatePresence>
+        {error && (
+          <motion.div
+            initial={{ opacity: 0, y: -10 }}
+            animate={{ opacity: 1, y: 0 }}
+            exit={{ opacity: 0, y: -10 }}
+            className="glass p-4 rounded-2xl border-red-500/50 bg-red-500/10 flex items-start gap-3"
+          >
+            <AlertCircle className="w-5 h-5 text-red-400 flex-shrink-0 mt-0.5" />
+            <div>
+              <p className="text-sm font-medium text-red-400">Processing Failed</p>
+              <p className="text-xs text-red-300 mt-1">{error}</p>
+            </div>
+          </motion.div>
+        )}
+      </AnimatePresence>
+
+      {/* Success Display */}
+      <AnimatePresence>
+        {result && !error && (
+          <motion.div
+            initial={{ opacity: 0, y: -10 }}
+            animate={{ opacity: 1, y: 0 }}
+            exit={{ opacity: 0, y: -10 }}
+            className="glass p-6 rounded-2xl border-green-500/50 bg-green-500/10"
+          >
+            <div className="flex items-start gap-3">
+              <CheckCircle2 className="w-5 h-5 text-green-400 flex-shrink-0 mt-0.5" />
+              <div className="flex-1">
+                <p className="text-sm font-medium text-green-400">
+                  {result.message || 'PDF processed successfully!'}
+                </p>
+                {outputFormat === 'json' && result.pages && (
+                  <div className="mt-3 space-y-2">
+                    <p className="text-xs text-gray-400">
+                      Processed {result.total_pages} page{result.total_pages !== 1 ? 's' : ''}
+                    </p>
+                    <motion.button
+                      onClick={handleDownloadJSON}
+                      className="glass px-4 py-2 rounded-xl text-sm font-medium hover:bg-white/5 transition-colors flex items-center gap-2"
+                      whileHover={{ scale: 1.02 }}
+                      whileTap={{ scale: 0.98 }}
+                    >
+                      <Download className="w-4 h-4" />
+                      Download JSON
+                    </motion.button>
+                  </div>
+                )}
+              </div>
+            </div>
+          </motion.div>
+        )}
+      </AnimatePresence>
+    </div>
+  )
+}
+
+export default PDFProcessor
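Review note on the progress bar in PDFProcessor.jsx: `onUploadProgress` reports upload bytes only, so the bar reflects the upload and not the server-side OCR work, and `progressEvent.total` can be missing when the length is not computable. One possible way to keep the displayed percentage well defined is sketched below. `uploadPercent` is a hypothetical helper, not something this change contains, and returning `null` for an unknown total is an assumption about how the UI would want to treat that case.

```javascript
// Hypothetical helper (not part of this diff): converts a progress event
// into an integer percentage clamped to the range 0..100.
// Returns null when the total is unknown, so the caller can hide the bar
// or show an indeterminate state instead of rendering "NaN%".
function uploadPercent({ loaded, total }) {
  if (!Number.isFinite(loaded) || !Number.isFinite(total) || total <= 0) return null
  return Math.min(100, Math.max(0, Math.round((loaded * 100) / total)))
}

console.log(uploadPercent({ loaded: 50, total: 200 })) // prints 25
console.log(uploadPercent({ loaded: 10 }))             // prints null
```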
--- a/frontend/src/components/ResultPanel.jsx
+++ b/frontend/src/components/ResultPanel.jsx
@@ -2,6 +2,7 @@ import { useEffect, useRef, useState, useCallback } from 'react'
 import { motion, AnimatePresence } from 'framer-motion'
 import { Copy, Download, Sparkles, Loader2, CheckCircle2, ChevronDown } from 'lucide-react'
 import ReactMarkdown from 'react-markdown'
+import DOMPurify from 'dompurify'

 export default function ResultPanel({ result, loading, imagePreview, onCopy, onDownload }) {
  const canvasRef = useRef(null)
@@ -204,20 +205,20 @@ export default function ResultPanel({ result, loading, imagePreview, onCopy, onD
            exit={{ opacity: 0, y: -20 }}
            className="space-y-4"
          >
-            {/* Preview with boxes */}
+            {/* Preview with boxes (grounding modes) */}
            {imagePreview && result.boxes && result.boxes.length > 0 && (
              <div className="relative rounded-xl overflow-hidden border border-white/10 bg-black">
-                <img 
+                <img
                  ref={imgRef}
-                  src={imagePreview} 
-                  alt="Result" 
-                  className="w-full block" 
+                  src={imagePreview}
+                  alt="Result"
+                  className="w-full block"
                  onLoad={() => {
                    console.log('🖼️ Image loaded, triggering draw')
                    setImageLoaded(true)
                  }}
                />
-                <canvas 
+                <canvas
                  ref={canvasRef}
                  className="absolute top-0 left-0 w-full h-full pointer-events-none"
                  style={{ display: 'block' }}
@@ -225,15 +226,13 @@ export default function ResultPanel({ result, loading, imagePreview, onCopy, onD
              </div>
            )}

-            {/* Text result */}
+            {/* Rendered text result */}
            <div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-96 overflow-y-auto">
              {isHTML ? (
-                <div 
+                <div
                  className="prose prose-invert prose-sm max-w-none"
-                  dangerouslySetInnerHTML={{ __html: result.text }}
-                  style={{
-                    color: '#e5e7eb',
-                  }}
+                  dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(result.text) }}
+                  style={{ color: '#e5e7eb' }}
                />
              ) : isMarkdown ? (
                <div className="prose prose-invert prose-sm max-w-none">
--- a/frontend/src/hooks/useModels.js
+++ b/frontend/src/hooks/useModels.js
@@ -0,0 +1,24 @@
+import { useState, useEffect } from 'react'
+
+const API_BASE = import.meta.env.VITE_API_URL || '/api'
+
+// Fetches the OCR models available for selection. Returns { models, loading }.
+// Each model: { id, label, capabilities: { grounding, advanced_settings }, default }
+export function useModels() {
+  const [models, setModels] = useState([])
+  const [loading, setLoading] = useState(true)
+
+  useEffect(() => {
+    let cancelled = false
+    fetch(`${API_BASE}/models`)
+      .then(r => (r.ok ? r.json() : null))
+      .then(data => {
+        if (!cancelled && data?.models) setModels(data.models)
+      })
+      .catch(() => {})
+      .finally(() => { if (!cancelled) setLoading(false) })
+    return () => { cancelled = true }
+  }, [])
+
+  return { models, loading }
+}
--- a/frontend/src/hooks/useSuggestions.js
+++ b/frontend/src/hooks/useSuggestions.js
@@ -0,0 +1,16 @@
+import { useState, useEffect } from 'react'
+
+const API_BASE = import.meta.env.VITE_API_URL || '/api'
+
+export function useSuggestions() {
+  const [suggestions, setSuggestions] = useState({ authors: [], books: [], chapters: [], reviewers: [] })
+
+  useEffect(() => {
+    fetch(`${API_BASE}/jobs/suggestions`)
+      .then(r => r.ok ? r.json() : null)
+      .then(data => { if (data) setSuggestions(data) })
+      .catch(() => {})
+  }, [])
+
+  return suggestions
+}
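Review note on useSuggestions.js: the hook replaces its default state with whatever the endpoint returns, so a response that omits a key would leave consumers such as `suggestions.authors.map(...)` in JobsPanel.jsx calling `.map` on `undefined`. A minimal sketch of a defensive merge follows. `normalizeSuggestions` and `EMPTY_SUGGESTIONS` are hypothetical names, and the key list is assumed from the hook's initial state rather than from the backend.

```javascript
// Keys assumed from the hook's initial state; the backend response shape is not shown in this diff.
const EMPTY_SUGGESTIONS = { authors: [], books: [], chapters: [], reviewers: [] }

// Hypothetical helper (not part of this diff): guarantees every expected key is an array.
function normalizeSuggestions(data) {
  const out = {}
  for (const key of Object.keys(EMPTY_SUGGESTIONS)) {
    out[key] = Array.isArray(data?.[key]) ? data[key] : []
  }
  return out
}

console.log(normalizeSuggestions({ authors: ['Austen'] }))
// prints { authors: [ 'Austen' ], books: [], chapters: [], reviewers: [] }
```

Under this sketch the hook would call `setSuggestions(normalizeSuggestions(data))`, which would make the per-call `|| []` guards in the components unnecessary.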
Author	SHA1	Message	Date
Aaron Roberts	02185bef46	Adding missed files	2026-06-30 12:16:16 +01:00
Aaron Roberts	04bbbebd5a	Remove Freeform and Find from UI. Allow Description to be added to Reviewed job	2026-06-29 13:09:01 +01:00
Aaron Roberts	48f958de6c	Added job review toggle	2026-06-23 10:43:44 +01:00
Aaron Roberts	91c134faa7	Add updated_at column and trigger for Qdrant re-sync detection Adds updated_at TIMESTAMPTZ to ocr_jobs, stamped automatically by a BEFORE UPDATE trigger. The sync process can use updated_at > qdrant_synced_at to detect jobs that need re-ingestion after edits or reviews. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-19 23:12:33 +01:00
Aaron Roberts	38ac36b18e	Add qdrant_synced_at column	2026-06-19 17:47:53 +01:00
Aaron Roberts	ab19725e0b	Remove AnimatePresence mode=wait to fix blank screen on view transitions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-10 22:04:52 +01:00
Aaron Roberts	a511db78cb	Fix blank screen on Analyze; add mode selector to result view showResultView now only activates after results exist (not during loading), preventing AnimatePresence from blanking the screen mid-transition. Adds a mode selector + Analyze button at the top of the result view so additional modes can be run without leaving the page. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-10 21:55:23 +01:00
Aaron Roberts	07b2f2b6bc	Fix stale editedOcrText reference in handleDownload dependency array Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-10 21:44:36 +01:00
Aaron Roberts	ae0ac3af59	Store all mode results (OCR, Describe, Freeform) in a single job record - DB: add describe_text and freeform_text columns (ALTER TABLE IF NOT EXISTS) - Backend: commit and review endpoints accept/persist all three text fields - App: accumulate results per mode in state; tabs appear when >1 mode run; all results sent on Commit Job - JobDetail: tabbed text panel shows whichever fields are populated, all editable Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-10 12:28:01 +01:00
Aaron Roberts	4ab87d2e6f	Extend commit workflow to Describe and Freeform modes All text-output modes (plain_ocr, describe, freeform) now show the full-screen editable result view with metadata fields and Commit Job button. The textarea label reflects the active mode. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-10 10:38:27 +01:00
Aaron Roberts	cc5ce0c6be	Fix suggestions fetch using wrong API base URL Fallback was http://localhost:8000/api instead of /api, causing silent failure in containerized deployments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:37:13 +01:00
Aaron Roberts	02e3099388	Add delete job functionality with confirmation step Adds DELETE /api/jobs/{id} endpoint (removes DB record and image file), and a two-step Delete / Confirm button on the review page that returns to the job list on success. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:33:46 +01:00
Aaron Roberts	dc5a1a4ff5	Add book title to autocomplete suggestions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:29:14 +01:00
Aaron Roberts	5ea18d76d6	Add autocomplete suggestions for Author, Chapter, and Reviewer fields Adds a GET /api/jobs/suggestions endpoint that returns distinct values for author, chapter, and reviewer_name from the database, and wires them into HTML datalist elements on the New Job, result view, and Browse Jobs pages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:24:49 +01:00
Aaron Roberts	1d15b5f0c1	Add unique constraint to prevent duplicate (author, chapter, page) submissions Adds a PostgreSQL partial unique index on (author, chapter, page) where all three fields are non-null, and returns HTTP 409 when a duplicate is detected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:19:54 +01:00
Aaron Roberts	cb704a2f27	Double image/text section height to 130vh Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:13:11 +01:00
Aaron Roberts	3ca40a2255	Revert to 50/50 image/text column split Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:10:51 +01:00
Aaron Roberts	6f86f872a9	Make image display significantly taller Give the image+text row an explicit 65vh height instead of flex-1 inside a viewport-locked container. Remove the overall height constraint so metadata and commit rows sit naturally below with scroll if needed. Image and textarea containers now use h-full to fill the fixed row height. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:10:39 +01:00
Aaron Roberts	7381ecd12e	Increase image display size to 60% of the split layout Change image/text column ratio from 50/50 to 60/40 (3fr 2fr) on both the New Job result view and the Browse Jobs detail view. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:05:09 +01:00
Aaron Roberts	247a5e4b0e	Full-screen side-by-side layout for New Job and Browse Jobs New Job (plain_ocr): - After OCR completes, the entire main area becomes a flex-column view pinned to viewport height: image and editable textarea side by side at top (filling available space), metadata fields in a compact row below, Commit Job button at the bottom - "New Analysis" button in the header returns to the upload view - ResultPanel reverted to simple rendered-output only (no commit logic) Browse Jobs: - Selecting a job replaces the search list with a full-screen detail view using the same layout: image | editable textarea on top, all metadata fields + Reviewer name + action button in a single row below - "Back to results" button returns to the search/list grid - Search results now display as a responsive card grid Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 17:57:11 +01:00
Aaron Roberts	9356ba6d1b	Side-by-side image/text layout and editable metadata on review New Job page: - OCR result now shows source image and editable textarea side by side - Grounding-box overlay preview moved into the non-commit branch Browse Jobs / Review page: - JobDetail uses a 2-column layout: image + read-only info on left, all editable fields on right - Author, book, chapter, and page are now editable inputs (not read-only) - Text textarea is always editable (for both unreviewed and reviewed jobs) - Reviewer name pre-filled for reviewed jobs; button becomes "Save Changes" - Outer grid changed to 1/3 list + 2/3 detail for more review space Backend: - PUT /api/jobs/{id}/review now accepts and saves author, book, chapter, page alongside reviewed_text and reviewer_name Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 17:38:36 +01:00
Aaron Roberts	da7957d7d5	Fix commit job and OCR text editing - OCR text is now shown in an editable textarea (plain_ocr mode) so users can correct it before committing - editedOcrText state tracks edits; commit job sends the edited value instead of the original result.text - Remove silent early-return guard that blocked commit when text was empty - Copy and download also use the edited text Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 17:11:49 +01:00
Aaron Roberts	fd747e6c23	Add job tracking with PostgreSQL, image storage, and review workflow - Add PostgreSQL service to docker-compose with health check and postgres_data volume - Mount ./ocr_images as bind volume for persistent image storage - Add backend/database.py with schema init and get_db() context manager - Add 5 new API endpoints: POST /api/jobs, GET /api/jobs (search), GET /api/jobs/{id}, GET /api/jobs/{id}/image, PUT /api/jobs/{id}/review - Jobs are saved with author/book/chapter/page metadata, auto UUID, and submitted_at timestamp - Jobs start as 'unreviewed'; review captures edited text, reviewer name, and reviewed_at - Add MetadataForm.jsx (author/book/chapter/page inputs) to the New Job panel - Add JobsPanel.jsx with search/filter, paginated list, and detail pane with review form - Add "Commit Job" button to ResultPanel (plain_ocr mode only) with success/error feedback - Add "New Job" / "Browse Jobs" navigation to the app header Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 16:48:12 +01:00
Aaron Roberts	68147eb97c	.env	2026-06-09 15:10:25 +01:00
Aaron Roberts	ba313ee808	stack.env	2026-06-09 15:06:02 +01:00
Aaron Roberts	bd19e09630	Adding .env for portainer	2026-06-09 14:15:34 +01:00
Ray Dumasia	3dac0741b1	Fix RCE vulnerability and harden security - Replace eval() with ast.literal_eval() in pdf_utils.py to fix unauthenticated remote code execution via crafted PDF uploads (reported by OX Security) - Sanitize HTML output with DOMPurify to prevent XSS - Restrict CORS origins (configurable via CORS_ORIGINS env var) - Suppress raw exception details in API error responses - Cap Image.MAX_IMAGE_PIXELS to prevent decompression bomb DoS - Add security regression test suite Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 09:01:52 +01:00
Ray Dumasia	e24f064042	Add CTRL-V support as suggested by @p-xiexin	2025-11-15 23:32:33 +00:00
rdumasia303	e82cd2abf0	Merge pull request #22 from rdumasia303/claude/add-pdf-support-016ikhUYeakWY2dah4X9STAX Claude/add pdf support 016ikh u yeak wy2dah4 x9 stax	2025-11-15 23:00:51 +00:00
rdumasia303	7b7d368c94	Update latest updates section to November 2025	2025-11-15 22:58:28 +00:00
Claude	efa2bd265b	Enhance README with comprehensive PDF processing documentation - Add prominent "What's New" section highlighting v2.2.0 features - Add detailed "How to Use" guide for both Image OCR and PDF Processing - Include output format comparison table - Add use cases and tips for best results - Expand tech stack section with new dependencies - Better structure with clear sections for new users	2025-11-15 22:55:43 +00:00
Claude	e33e9be75a	Fix Dockerfile to copy all Python files including pdf_utils and format_converter	2025-11-15 14:38:54 +00:00
Claude	e578276d3e	Add PDF processing and multi-format document conversion Features added: - PDF to image conversion with configurable DPI - Multi-page PDF processing with OCR - Export to Markdown, HTML, DOCX, and JSON formats - Automatic image extraction from PDFs - Formula and formatting preservation - Real-time progress tracking for multi-page documents Backend changes: - New /api/process-pdf endpoint for PDF processing - pdf_utils.py: PDF conversion and image extraction utilities - format_converter.py: Document format conversion (MD, HTML, DOCX) - Updated dependencies: PyMuPDF, img2pdf, python-docx, markdown Frontend changes: - File type toggle (Image OCR / PDF Processing) - PDFProcessor component with format selection - Updated ImageUpload to support both images and PDFs - Progress bars for multi-page processing - Download options for converted documents Documentation: - Updated README with PDF processing features - Added API documentation for /api/process-pdf endpoint - Added format conversion examples	2025-11-15 14:25:09 +00:00
rdumasia303	5ba45f7db2	Update README.md with new content	2025-10-23 01:14:24 +01:00
rdumasia303	fd063c0e71	Add MIT License to the project	2025-10-23 01:06:22 +01:00
rdumasia303	0fb5760b11	Merge pull request #11 from dnnspaul/main Fix incorrect OCR instructions + show advanced settings	2025-10-22 23:52:30 +01:00
Dennis Paul	23bbd1fc8d	show advanced settings toggle	2025-10-23 00:05:24 +02:00
Dennis Paul	225655d02c	(#10) Fix incorrect OCR instruction	2025-10-23 00:05:00 +02:00
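
Note on commit 3dac0741b1: the message describes replacing eval() with ast.literal_eval() in pdf_utils.py. The actual code is not shown in this log, so the following is only a minimal sketch of that kind of change. The helper name parse_literal and the sample input strings are hypothetical, chosen for illustration.

```python
import ast


def parse_literal(text: str):
    """Parse a Python literal from untrusted text without executing code.

    Hypothetical helper: ast.literal_eval accepts only literals (numbers,
    strings, lists, tuples, dicts, sets, booleans, None) and raises on
    anything else, such as function calls or attribute access.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a plain literal: reject instead of evaluating it.
        return None


# A well-formed literal, e.g. a list of box coordinates, parses normally.
coords = parse_literal("[[10, 20, 110, 220]]")

# An expression that eval() would have executed is rejected.
blocked = parse_literal("__import__('os').system('echo pwned')")
```

With eval(), the second string would run a shell command. With ast.literal_eval() it raises ValueError, which the sketch turns into None.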