Enhance README with comprehensive PDF processing documentation

- Add prominent "What's New" section highlighting v2.2.0 features
- Add detailed "How to Use" guide for both Image OCR and PDF Processing
- Include output format comparison table
- Add use cases and tips for best results
- Expand tech stack section with new dependencies
- Better structure with clear sections for new users
2025-11-15 22:55:43 +00:00
parent e33e9be75a
commit efa2bd265b
1 changed file with 102 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -1,10 +1,46 @@
 # 🚀 DeepSeek OCR - React + FastAPI
-Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend.
+Modern OCR web application powered by DeepSeek-OCR with a stunning React frontend and FastAPI backend. **Now with PDF processing and multi-format document conversion!**
 ![DeepSeek OCR in Action](assets/multi-bird.png)
-> **Recent Updates (v2.2.0)**
+## ✨ What's New in v2.2.0 - PDF Processing & Document Conversion
 We've added powerful PDF processing capabilities based on community feedback! Here's what you can do now:
 ### 📄 Process Entire PDF Documents
 - Upload PDF files up to 100MB
 - Automatic multi-page OCR processing
 - Real-time progress tracking for large documents
 - Extract text from scanned PDFs or image-based documents
 ### 🔄 Convert to Multiple Formats
 Export your OCR results in the format you need:
 - **Markdown (.md)** - Clean, structured text perfect for documentation
 - **HTML (.html)** - Styled documents with embedded images and tables
 - **Word (.docx)** - Professional documents with formatting, tables, and images
 - **JSON** - Structured data for programmatic access
 ### 🖼️ Automatic Image Extraction
 - Detects and extracts images from PDF pages
 - Embeds images in exported documents
 - Preserves image placement and context
 ### 📐 Formula & Formatting Preservation
 - Maintains mathematical formulas (LaTeX syntax)
 - Preserves tables, headings, and document structure
 - Cleans up special characters while keeping formatting intact
 ### 🎯 Use Cases
 - **Document Digitization** - Convert scanned PDFs to editable formats
 - **Data Extraction** - Pull structured data from forms and invoices
 - **Content Migration** - Convert PDFs to Markdown for wikis/documentation
 - **Academic Papers** - Extract text and formulas from research papers
 - **Business Documents** - Convert reports to Word for editing
 ---
 > **Latest Updates (v2.2.0)** - December 2024
 > - 🎉 **NEW: PDF Processing** - Upload PDFs and extract text from all pages
 > - 🎉 **NEW: Multi-Format Export** - Convert to Markdown, HTML, DOCX, or JSON
 > - 🎉 **NEW: Automatic Image Extraction** - Extract and preserve images from PDFs
@@ -45,6 +81,52 @@ Modern OCR web application powered by DeepSeek-OCR with a stunning React fronten
   - **Backend API**: http://localhost:8000 (or your configured API_PORT)
   - **API Docs**: http://localhost:8000/docs
 ## 🎓 How to Use
 ### Processing Images (Single Image OCR)
 1. Select **"Image OCR"** mode in the toggle
 2. Upload an image (PNG, JPG, WEBP, etc.)
 3. Choose your OCR mode:
   - **Plain OCR** - Extract all text
   - **Describe** - Get image description
   - **Find** - Locate specific terms
   - **Freeform** - Use custom prompts
 4. Click **"Analyze Image"**
 5. View results with bounding boxes (if enabled)
 6. Copy or download the extracted text
 ### Processing PDFs (Multi-Page Documents) - NEW!
 1. Select **"PDF Processing"** mode in the toggle
 2. Upload a PDF file (up to 100MB)
 3. Choose your OCR mode (same as above)
 4. Select **output format**:
   - 📝 **Markdown** - For documentation, wikis, GitHub
   - 🌐 **HTML** - For web publishing, styled viewing
   - 📄 **DOCX** - For Word editing, professional documents
   - 📊 **JSON** - For programmatic access, data extraction
 5. Click **"Process PDF"**
 6. Watch the progress bar as pages are processed
 7. Your file downloads automatically when complete!
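The upload constraints in the steps above (a PDF file, at most 100MB, one of four output formats) can be checked before a request is sent. The following is a minimal sketch, assuming "100MB" means 100 × 1024 × 1024 bytes; the function name and messages are hypothetical and are not taken from the project's code.

```python
from pathlib import Path

# Assumption: "100MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_PDF_BYTES = 100 * 1024 * 1024
# The four output formats listed in the steps above.
OUTPUT_FORMATS = {"markdown", "html", "docx", "json"}


def validate_pdf_request(filename: str, size_bytes: int, output_format: str) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if Path(filename).suffix.lower() != ".pdf":
        problems.append("file must have a .pdf extension")
    if size_bytes > MAX_PDF_BYTES:
        problems.append("file exceeds the 100MB limit")
    if output_format.lower() not in OUTPUT_FORMATS:
        problems.append(f"unknown output format: {output_format}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.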
 ### Tips for Best Results
 - **For scanned documents**: Use higher DPI (144-300) in advanced settings
 - **For tables**: The model excels at extracting structured data
 - **For formulas**: Mathematical notation is preserved in output
 - **For images in PDFs**: Enable "Extract Images" to include them in output
 - **For large PDFs**: JSON format is fastest; DOCX takes longer due to formatting
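To see why the DPI setting matters for scanned documents: PDF page dimensions are expressed in points (1/72 inch), so rendering at a given DPI scales each page by `dpi / 72`. The sketch below is illustrative arithmetic only; the function name is hypothetical, and how the app actually rasterizes pages is not shown in this README.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page of the given point size rendered at `dpi`."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(rendered_size(612, 792, 144))  # (1224, 1584)
print(rendered_size(612, 792, 300))  # (2550, 3300)
```

Pixel count grows with the square of the DPI, so a 300 DPI render carries roughly four times as many pixels as a 144 DPI render of the same page. Higher settings can help with small print but are likely to cost more memory and processing time.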
 ### Output Format Comparison
 | Format | Best For | Features | File Size |
 |--------|----------|----------|-----------|
 | **Markdown** | Documentation, GitHub, wikis | Clean text, tables, code blocks | Smallest |
 | **HTML** | Web viewing, sharing | Styled output, embedded images, tables | Medium |
 | **DOCX** | Editing, professional docs | Full formatting, images, tables | Largest |
 | **JSON** | Data processing, APIs | Structured data, metadata, page info | Small |
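For scripts that handle the downloaded files, the four formats in the table map onto standard file extensions and MIME types. This is a small sketch: the dictionary and helper names are hypothetical, and the output naming pattern is an assumption rather than the app's documented behavior.

```python
from pathlib import Path

# Format name -> (file extension, standard MIME type)
FORMATS = {
    "markdown": (".md", "text/markdown"),
    "html": (".html", "text/html"),
    "docx": (
        ".docx",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ),
    "json": (".json", "application/json"),
}


def output_name(pdf_name: str, output_format: str) -> str:
    """Derive an output filename by swapping the PDF extension (assumed naming)."""
    ext, _mime = FORMATS[output_format]
    return Path(pdf_name).with_suffix(ext).name


print(output_name("paper.pdf", "markdown"))  # paper.md
```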
 ## Features
 ### Dual Processing Modes
@@ -113,10 +195,25 @@ CROP_MODE=true         # Enable dynamic cropping for large images
 ## Tech Stack
- **Frontend**: React 18 + Vite 5 + TailwindCSS 3 + Framer Motion 11
+### Frontend
- **Backend**: FastAPI + PyTorch + Transformers 4.46 + DeepSeek-OCR
+- **Framework**: React 18 + Vite 5
 - **Styling**: TailwindCSS 3 + Custom Glass Morphism
 - **Animations**: Framer Motion 11
 - **HTTP Client**: Axios
 - **File Upload**: React Dropzone
 ### Backend
 - **API Framework**: FastAPI (async Python web framework)
 - **ML/AI**: PyTorch + Transformers 4.46 + DeepSeek-OCR
 - **PDF Processing**: PyMuPDF (fitz) + img2pdf
 - **Document Conversion**:
  - python-docx (Word documents)
  - markdown (Markdown processing)
  - Custom HTML generator
 - **Configuration**: python-decouple for environment management
- **Server**: Nginx (reverse proxy)
+
 ### Infrastructure
 - **Server**: Nginx (reverse proxy & static file serving)
 - **Container**: Docker + Docker Compose with multi-stage builds
 - **GPU**: NVIDIA CUDA support (tested on RTX 3090, RTX 5090)