Enhance README with comprehensive PDF processing documentation

- Add prominent "What's New" section highlighting v2.2.0 features
- Add detailed "How to Use" guide for both Image OCR and PDF Processing
- Include output format comparison table
- Add use cases and tips for best results
- Expand tech stack section with new dependencies
- Better structure with clear sections for new users
Claude
2025-11-15 22:55:43 +00:00
parent e33e9be75a
commit efa2bd265b

107
README.md

@@ -1,10 +1,46 @@
# 🚀 DeepSeek OCR - React + FastAPI
Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend. **Now with PDF processing and multi-format document conversion!**
![DeepSeek OCR in Action](assets/multi-bird.png)
## ✨ What's New in v2.2.0 - PDF Processing & Document Conversion
We've added powerful PDF processing capabilities based on community feedback! Here's what you can do now:
### 📄 Process Entire PDF Documents
- Upload PDF files up to 100MB
- Automatic multi-page OCR processing
- Real-time progress tracking for large documents
- Extract text from scanned PDFs or image-based documents
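A minimal sketch of a client-side pre-check for the upload limit described above. The function name `check_pdf_upload` is hypothetical and not part of this project, and reading "100MB" as 100 MiB is an assumption; the `%PDF-` header check relies only on the standard PDF file signature.

```python
from pathlib import Path

# Assumption: the documented "100MB" limit is interpreted as 100 MiB.
MAX_PDF_BYTES = 100 * 1024 * 1024
# PDF files start with this signature.
PDF_MAGIC = b"%PDF-"


def check_pdf_upload(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} does not exist"]

    problems = []
    size = file.stat().st_size
    if size > MAX_PDF_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_PDF_BYTES} byte limit")

    # Read only the first few bytes to confirm the PDF signature.
    with file.open("rb") as handle:
        if handle.read(len(PDF_MAGIC)) != PDF_MAGIC:
            problems.append("file does not start with the %PDF- header")
    return problems
```

Running a check like this before uploading avoids waiting on a large transfer that the server would reject anyway.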
### 🔄 Convert to Multiple Formats
Export your OCR results in the format you need:
- **Markdown (.md)** - Clean, structured text perfect for documentation
- **HTML (.html)** - Styled documents with embedded images and tables
- **Word (.docx)** - Professional documents with formatting, tables, and images
- **JSON** - Structured data for programmatic access
### 🖼️ Automatic Image Extraction
- Detects and extracts images from PDF pages
- Embeds images in exported documents
- Preserves image placement and context
### 📐 Formula & Formatting Preservation
- Maintains mathematical formulas (LaTeX syntax)
- Preserves tables, headings, and document structure
- Cleans up special characters while keeping formatting intact
### 🎯 Use Cases
- **Document Digitization** - Convert scanned PDFs to editable formats
- **Data Extraction** - Pull structured data from forms and invoices
- **Content Migration** - Convert PDFs to Markdown for wikis/documentation
- **Academic Papers** - Extract text and formulas from research papers
- **Business Documents** - Convert reports to Word for editing
---
> **Latest Updates (v2.2.0)** - December 2024
> - 🎉 **NEW: PDF Processing** - Upload PDFs and extract text from all pages
> - 🎉 **NEW: Multi-Format Export** - Convert to Markdown, HTML, DOCX, or JSON
> - 🎉 **NEW: Automatic Image Extraction** - Extract and preserve images from PDFs
@@ -45,6 +81,52 @@ Modern OCR web application powered by DeepSeek-OCR with a stunning React fronten
- **Backend API**: http://localhost:8000 (or your configured API_PORT)
- **API Docs**: http://localhost:8000/docs
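The API docs page is the authoritative list of endpoints. As a hedged sketch, the same information can be read programmatically, assuming the backend keeps FastAPI's default schema location (`/openapi.json`); if the project overrides or disables it, this will not work. The helper names `list_routes` and `fetch_spec` are illustrative only.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "options", "head"}


def list_routes(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes, key=lambda route: (route[1], route[0]))


def fetch_spec(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema; assumes FastAPI's default /openapi.json route."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


# Usage, with the backend running:
#   for method, path in list_routes(fetch_spec()):
#       print(method, path)
```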
## 🎓 How to Use
### Processing Images (Single Image OCR)
1. Select **"Image OCR"** mode in the toggle
2. Upload an image (PNG, JPG, WEBP, etc.)
3. Choose your OCR mode:
   - **Plain OCR** - Extract all text
   - **Describe** - Get image description
   - **Find** - Locate specific terms
   - **Freeform** - Use custom prompts
4. Click **"Analyze Image"**
5. View results with bounding boxes (if enabled)
6. Copy or download the extracted text
### Processing PDFs (Multi-Page Documents) - NEW!
1. Select **"PDF Processing"** mode in the toggle
2. Upload a PDF file (up to 100MB)
3. Choose your OCR mode (same as above)
4. Select **output format**:
   - 📝 **Markdown** - For documentation, wikis, GitHub
   - 🌐 **HTML** - For web publishing, styled viewing
   - 📄 **DOCX** - For Word editing, professional documents
   - 📊 **JSON** - For programmatic access, data extraction
5. Click **"Process PDF"**
6. Watch the progress bar as pages are processed
7. Your file downloads automatically when complete!
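The page-by-page flow in the steps above can be sketched as a generator that reports progress after each page. This is an illustration only, not the project's implementation: `process_pages` and the `ocr_page` callable are hypothetical stand-ins for whatever the backend actually runs per page.

```python
from typing import Callable, Iterable, Iterator, Tuple


def process_pages(
    pages: Iterable[bytes],
    total: int,
    ocr_page: Callable[[bytes], str],
) -> Iterator[Tuple[int, float, str]]:
    """Yield (page_number, percent_done, text) after each page is processed."""
    for number, page in enumerate(pages, start=1):
        text = ocr_page(page)  # hypothetical per-page OCR call
        yield number, round(100 * number / total, 1), text


# Demo with fake page data and a stand-in OCR function.
fake_pages = [b"page one", b"page two", b"page three", b"page four"]
for number, percent, text in process_pages(
    fake_pages, len(fake_pages), lambda page: page.decode().upper()
):
    print(f"page {number}: {percent}% done")
```

Yielding after every page is what lets a progress bar update while a large document is still being processed, rather than only at the end.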
### Tips for Best Results
- **For scanned documents**: Use higher DPI (144-300) in advanced settings
- **For tables**: The model excels at extracting structured data
- **For formulas**: Mathematical notation is preserved in output
- **For images in PDFs**: Enable "Extract Images" to include them in output
- **For large PDFs**: JSON is the fastest format; DOCX takes longer because of the extra formatting work
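To make the DPI tip concrete: PDF page sizes are measured in points (72 per inch), so the rendered image grows linearly with DPI in each dimension. The sketch below only does that arithmetic; `render_size` is an illustrative helper, not a setting or function from this project.

```python
POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points


def render_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


A4 = (595, 842)  # A4 page size in points, rounded
for dpi in (72, 144, 300):
    print(dpi, render_size(*A4, dpi))
# 72 (595, 842)
# 144 (1190, 1684)
# 300 (2479, 3508)
```

Going from 144 to 300 DPI roughly quadruples the pixel count per page, which is worth keeping in mind for large PDFs.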
### Output Format Comparison
| Format | Best For | Features | File Size |
|--------|----------|----------|-----------|
| **Markdown** | Documentation, GitHub, wikis | Clean text, tables, code blocks | Smallest |
| **HTML** | Web viewing, sharing | Styled output, embedded images, tables | Medium |
| **DOCX** | Editing, professional docs | Full formatting, images, tables | Largest |
| **JSON** | Data processing, APIs | Structured data, metadata, page info | Small |
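As a rough illustration of the difference between the two lightest formats in the table, the sketch below serializes the same per-page text both ways. The field names (`source`, `page_count`, `pages`, `page`, `text`) and the "Page N" headings are assumptions made for this example; the actual export layout may differ.

```python
import json


def to_json(pages: list[str], source_name: str) -> str:
    """Structured output: one record per page plus basic metadata."""
    document = {
        "source": source_name,
        "page_count": len(pages),
        "pages": [
            {"page": number, "text": text}
            for number, text in enumerate(pages, start=1)
        ],
    }
    return json.dumps(document, ensure_ascii=False, indent=2)


def to_markdown(pages: list[str], source_name: str) -> str:
    """Readable output: a title followed by one section per page."""
    parts = [f"# {source_name}"]
    for number, text in enumerate(pages, start=1):
        parts.append(f"## Page {number}\n\n{text.strip()}")
    return "\n\n".join(parts) + "\n"
```

JSON keeps page boundaries and metadata easy to consume from code, while Markdown favors a document that reads well as-is.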
## Features
### Dual Processing Modes
@@ -113,10 +195,25 @@ CROP_MODE=true # Enable dynamic cropping for large images
## Tech Stack
### Frontend
- **Framework**: React 18 + Vite 5
- **Styling**: TailwindCSS 3 + Custom Glass Morphism
- **Animations**: Framer Motion 11
- **HTTP Client**: Axios
- **File Upload**: React Dropzone
### Backend
- **API Framework**: FastAPI (async Python web framework)
- **ML/AI**: PyTorch + Transformers 4.46 + DeepSeek-OCR
- **PDF Processing**: PyMuPDF (fitz) + img2pdf
- **Document Conversion**:
  - python-docx (Word documents)
  - markdown (Markdown processing)
  - Custom HTML generator
- **Configuration**: python-decouple for environment management
### Infrastructure
- **Server**: Nginx (reverse proxy & static file serving)
- **Container**: Docker + Docker Compose with multi-stage builds
- **GPU**: NVIDIA CUDA support (tested on RTX 3090, RTX 5090)