rdumasia303 a7aeeaf109 Update GPU model from RTX 3090 to RTX 5090
Updated GPU model in requirements and README.
2025-10-21 09:40:44 +01:00
2025-10-21 01:32:09 +01:00
2025-10-21 01:32:09 +01:00
2025-10-21 01:32:09 +01:00
2025-10-21 01:32:09 +01:00

🚀 DeepSeek OCR - React + FastAPI

Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend.

Note

: This was a quickly vibe-coded project to test out DeepSeek-OCR! It basically works quite nice on an RTX 5090. The "Find" mode grounding boxes aren't quite working yet - probably my fault in not interpreting the dimensions correctly, but the core OCR functionality is pretty nice so far.

Quick Start

docker compose up --build

Then open:

Features

4 OCR Modes

  • Plain OCR - Raw text extraction
  • Describe - Generate image descriptions
  • Find - Locate specific terms (grounding boxes WIP)
  • Freeform - Custom prompts for anything

UI Features

  • 🎨 Glass morphism design with animated gradients
  • 🎯 Drag & drop file upload
  • 📦 Grounding box visualization (WIP - dimensions need fixing)
  • Smooth animations (Framer Motion)
  • 📋 Copy/Download results
  • 🎛️ Advanced settings dropdown
  • 📝 Markdown rendering for formatted output

Tech Stack

  • Frontend: React 18 + Vite 5 + TailwindCSS 3 + Framer Motion 11
  • Backend: FastAPI + PyTorch + Transformers 4.46 + DeepSeek-OCR
  • Server: Nginx (reverse proxy)
  • Container: Docker + Docker Compose with multi-stage builds
  • GPU: NVIDIA CUDA support (tested on RTX 5090)

Project Structure

deepseek-ocr/
├── backend/           # FastAPI backend
│   ├── main.py
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/          # React frontend
│   ├── src/
│   │   ├── components/
│   │   ├── App.jsx
│   │   └── main.jsx
│   ├── package.json
│   ├── nginx.conf
│   └── Dockerfile
├── models/            # Model cache
└── docker-compose.yml

Development

Backend

cd backend
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Frontend

cd frontend
npm install
npm run dev

Requirements

  • Docker & Docker Compose
  • NVIDIA GPU with CUDA support (tested on RTX 5090)
  • nvidia-docker runtime
  • ~8-12GB VRAM for model

Known Issues

  • 📦 Find mode grounding boxes: Not rendering correctly - likely dimension scaling issue in the canvas overlay logic. Boxes are detected and returned by the backend, but the frontend visualization needs work.

API Usage

POST /api/ocr

Parameters:

  • image (file, required)
  • mode (string): plain_ocr | describe | find_ref | freeform
  • prompt (string): Custom prompt for freeform mode
  • grounding (bool): Enable bounding boxes (auto-enabled for find_ref)
  • find_term (string): Term to locate in find_ref mode
  • base_size (int): Base processing size (default: 1024)
  • image_size (int): Image size (default: 640)
  • crop_mode (bool): Enable crop mode (default: true)

Response:

{
  "success": true,
  "text": "Extracted text...",
  "boxes": [{"label": "field", "box": [x1, y1, x2, y2]}],
  "image_dims": {"w": 1920, "h": 1080},
  "metadata": {...}
}

Troubleshooting

GPU not detected

nvidia-smi
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Port conflicts

sudo lsof -i :3000
sudo lsof -i :8000

Frontend build issues

cd frontend
rm -rf node_modules package-lock.json
docker-compose build frontend

License

This project uses the DeepSeek-OCR model. Refer to the model's license terms.

Description
No description provided
Readme MIT 3.8 MiB
Languages
JavaScript 55.1%
Python 42.3%
CSS 0.9%
TypeScript 0.7%
Dockerfile 0.6%
Other 0.4%