Files
rw-deepseek-ocr/README.md
rdumasia303 a7aeeaf109 Update GPU model from RTX 3090 to RTX 5090
Updated GPU model in requirements and README.
2025-10-21 09:40:44 +01:00

139 lines
3.6 KiB
Markdown

# 🚀 DeepSeek OCR - React + FastAPI
Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend.
> **Note**: This was a quickly vibe-coded project to test out DeepSeek-OCR! It basically works quite nice on an RTX 5090. The "Find" mode grounding boxes aren't quite working yet - probably my fault in not interpreting the dimensions correctly, but the core OCR functionality is pretty nice so far.
## Quick Start
```bash
docker compose up --build
```
Then open:
- **Frontend**: http://localhost:3000
- **Backend API**: http://localhost:8000
- **API Docs**: http://localhost:8000/docs
## Features
### 4 OCR Modes
- **Plain OCR** - Raw text extraction
- **Describe** - Generate image descriptions
- **Find** - Locate specific terms (grounding boxes WIP)
- **Freeform** - Custom prompts for anything
### UI Features
- 🎨 Glass morphism design with animated gradients
- 🎯 Drag & drop file upload
- 📦 Grounding box visualization (WIP - dimensions need fixing)
- ✨ Smooth animations (Framer Motion)
- 📋 Copy/Download results
- 🎛️ Advanced settings dropdown
- 📝 Markdown rendering for formatted output
## Tech Stack
- **Frontend**: React 18 + Vite 5 + TailwindCSS 3 + Framer Motion 11
- **Backend**: FastAPI + PyTorch + Transformers 4.46 + DeepSeek-OCR
- **Server**: Nginx (reverse proxy)
- **Container**: Docker + Docker Compose with multi-stage builds
- **GPU**: NVIDIA CUDA support (tested on RTX 5090)
## Project Structure
```
deepseek-ocr/
├── backend/ # FastAPI backend
│ ├── main.py
│ ├── requirements.txt
│ └── Dockerfile
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/
│ │ ├── App.jsx
│ │ └── main.jsx
│ ├── package.json
│ ├── nginx.conf
│ └── Dockerfile
├── models/ # Model cache
└── docker-compose.yml
```
## Development
### Backend
```bash
cd backend
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```
### Frontend
```bash
cd frontend
npm install
npm run dev
```
## Requirements
- Docker & Docker Compose
- NVIDIA GPU with CUDA support (tested on RTX 5090)
- nvidia-docker runtime
- ~8-12GB VRAM for model
## Known Issues
- 📦 **Find mode grounding boxes**: Not rendering correctly - likely dimension scaling issue in the canvas overlay logic. Boxes are detected and returned by the backend, but the frontend visualization needs work.
## API Usage
### POST /api/ocr
**Parameters:**
- `image` (file, required)
- `mode` (string): plain_ocr | describe | find_ref | freeform
- `prompt` (string): Custom prompt for freeform mode
- `grounding` (bool): Enable bounding boxes (auto-enabled for find_ref)
- `find_term` (string): Term to locate in find_ref mode
- `base_size` (int): Base processing size (default: 1024)
- `image_size` (int): Image size (default: 640)
- `crop_mode` (bool): Enable crop mode (default: true)
**Response:**
```json
{
"success": true,
"text": "Extracted text...",
"boxes": [{"label": "field", "box": [x1, y1, x2, y2]}],
"image_dims": {"w": 1920, "h": 1080},
"metadata": {...}
}
```
## Troubleshooting
### GPU not detected
```bash
nvidia-smi
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
### Port conflicts
```bash
sudo lsof -i :3000
sudo lsof -i :8000
```
### Frontend build issues
```bash
cd frontend
rm -rf node_modules package-lock.json
docker-compose build frontend
```
## License
This project uses the DeepSeek-OCR model. Refer to the model's license terms.