aec04f6eb4e3d07fa524f9d56986f64d8c5d4753
🚀 DeepSeek OCR - React + FastAPI
Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend.
Note
: This was a quickly vibe-coded project to test out DeepSeek-OCR! It basically works quite nice on an RTX 5090. The "Find" mode grounding boxes aren't quite working yet - probably my fault in not interpreting the dimensions correctly, but the core OCR functionality is pretty nice so far.
Quick Start
docker compose up --build
Then open:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
Features
4 OCR Modes
- Plain OCR - Raw text extraction
- Describe - Generate image descriptions
- Find - Locate specific terms (grounding boxes WIP)
- Freeform - Custom prompts for anything
UI Features
- 🎨 Glass morphism design with animated gradients
- 🎯 Drag & drop file upload
- 📦 Grounding box visualization (WIP - dimensions need fixing)
- ✨ Smooth animations (Framer Motion)
- 📋 Copy/Download results
- 🎛️ Advanced settings dropdown
- 📝 Markdown rendering for formatted output
Tech Stack
- Frontend: React 18 + Vite 5 + TailwindCSS 3 + Framer Motion 11
- Backend: FastAPI + PyTorch + Transformers 4.46 + DeepSeek-OCR
- Server: Nginx (reverse proxy)
- Container: Docker + Docker Compose with multi-stage builds
- GPU: NVIDIA CUDA support (tested on RTX 3090)
Project Structure
deepseek-ocr/
├── backend/ # FastAPI backend
│ ├── main.py
│ ├── requirements.txt
│ └── Dockerfile
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/
│ │ ├── App.jsx
│ │ └── main.jsx
│ ├── package.json
│ ├── nginx.conf
│ └── Dockerfile
├── models/ # Model cache
└── docker-compose.yml
Development
Backend
cd backend
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000
Frontend
cd frontend
npm install
npm run dev
Requirements
- Docker & Docker Compose
- NVIDIA GPU with CUDA support (tested on RTX 3090)
- nvidia-docker runtime
- ~8-12GB VRAM for model
Known Issues
- 📦 Find mode grounding boxes: Not rendering correctly - likely dimension scaling issue in the canvas overlay logic. Boxes are detected and returned by the backend, but the frontend visualization needs work.
API Usage
POST /api/ocr
Parameters:
image(file, required)mode(string): plain_ocr | describe | find_ref | freeformprompt(string): Custom prompt for freeform modegrounding(bool): Enable bounding boxes (auto-enabled for find_ref)find_term(string): Term to locate in find_ref modebase_size(int): Base processing size (default: 1024)image_size(int): Image size (default: 640)crop_mode(bool): Enable crop mode (default: true)
Response:
{
"success": true,
"text": "Extracted text...",
"boxes": [{"label": "field", "box": [x1, y1, x2, y2]}],
"image_dims": {"w": 1920, "h": 1080},
"metadata": {...}
}
Troubleshooting
GPU not detected
nvidia-smi
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Port conflicts
sudo lsof -i :3000
sudo lsof -i :8000
Frontend build issues
cd frontend
rm -rf node_modules package-lock.json
docker-compose build frontend
License
This project uses the DeepSeek-OCR model. Refer to the model's license terms.
Languages
JavaScript
55.1%
Python
42.3%
CSS
0.9%
TypeScript
0.7%
Dockerfile
0.6%
Other
0.4%