rw-deepseek-ocr

Author	SHA1	Message	Date
Aaron Roberts	1d15b5f0c1	Add unique constraint to prevent duplicate (author, chapter, page) submissions Adds a PostgreSQL partial unique index on (author, chapter, page) where all three fields are non-null, and returns HTTP 409 when a duplicate is detected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:19:54 +01:00
Aaron Roberts	cb704a2f27	Double image/text section height to 130vh Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:13:11 +01:00
Aaron Roberts	3ca40a2255	Revert to 50/50 image/text column split Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:10:51 +01:00
Aaron Roberts	6f86f872a9	Make image display significantly taller Give the image+text row an explicit 65vh height instead of flex-1 inside a viewport-locked container. Remove the overall height constraint so metadata and commit rows sit naturally below with scroll if needed. Image and textarea containers now use h-full to fill the fixed row height. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:10:39 +01:00
Aaron Roberts	7381ecd12e	Increase image display size to 60% of the split layout Change image/text column ratio from 50/50 to 60/40 (3fr 2fr) on both the New Job result view and the Browse Jobs detail view. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 18:05:09 +01:00
Aaron Roberts	247a5e4b0e	Full-screen side-by-side layout for New Job and Browse Jobs New Job (plain_ocr): - After OCR completes, the entire main area becomes a flex-column view pinned to viewport height: image and editable textarea side by side at top (filling available space), metadata fields in a compact row below, Commit Job button at the bottom - "New Analysis" button in the header returns to the upload view - ResultPanel reverted to simple rendered-output only (no commit logic) Browse Jobs: - Selecting a job replaces the search list with a full-screen detail view using the same layout: image | editable textarea on top, all metadata fields + Reviewer name + action button in a single row below - "Back to results" button returns to the search/list grid - Search results now display as a responsive card grid Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 17:57:11 +01:00
Aaron Roberts	9356ba6d1b	Side-by-side image/text layout and editable metadata on review New Job page: - OCR result now shows source image and editable textarea side by side - Grounding-box overlay preview moved into the non-commit branch Browse Jobs / Review page: - JobDetail uses a 2-column layout: image + read-only info on left, all editable fields on right - Author, book, chapter, and page are now editable inputs (not read-only) - Text textarea is always editable (for both unreviewed and reviewed jobs) - Reviewer name pre-filled for reviewed jobs; button becomes "Save Changes" - Outer grid changed to 1/3 list + 2/3 detail for more review space Backend: - PUT /api/jobs/{id}/review now accepts and saves author, book, chapter, page alongside reviewed_text and reviewer_name Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 17:38:36 +01:00
Aaron Roberts	da7957d7d5	Fix commit job and OCR text editing - OCR text is now shown in an editable textarea (plain_ocr mode) so users can correct it before committing - editedOcrText state tracks edits; commit job sends the edited value instead of the original result.text - Remove silent early-return guard that blocked commit when text was empty - Copy and download also use the edited text Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 17:11:49 +01:00
Aaron Roberts	fd747e6c23	Add job tracking with PostgreSQL, image storage, and review workflow - Add PostgreSQL service to docker-compose with health check and postgres_data volume - Mount ./ocr_images as bind volume for persistent image storage - Add backend/database.py with schema init and get_db() context manager - Add 5 new API endpoints: POST /api/jobs, GET /api/jobs (search), GET /api/jobs/{id}, GET /api/jobs/{id}/image, PUT /api/jobs/{id}/review - Jobs are saved with author/book/chapter/page metadata, auto UUID, and submitted_at timestamp - Jobs start as 'unreviewed'; review captures edited text, reviewer name, and reviewed_at - Add MetadataForm.jsx (author/book/chapter/page inputs) to the New Job panel - Add JobsPanel.jsx with search/filter, paginated list, and detail pane with review form - Add "Commit Job" button to ResultPanel (plain_ocr mode only) with success/error feedback - Add "New Job" / "Browse Jobs" navigation to the app header Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-09 16:48:12 +01:00
Aaron Roberts	68147eb97c	.env	2026-06-09 15:10:25 +01:00
Aaron Roberts	ba313ee808	stack.env	2026-06-09 15:06:02 +01:00
Aaron Roberts	bd19e09630	Adding .env for portainer	2026-06-09 14:15:34 +01:00
Ray Dumasia	3dac0741b1	Fix RCE vulnerability and harden security - Replace eval() with ast.literal_eval() in pdf_utils.py to fix unauthenticated remote code execution via crafted PDF uploads (reported by OX Security) - Sanitize HTML output with DOMPurify to prevent XSS - Restrict CORS origins (configurable via CORS_ORIGINS env var) - Suppress raw exception details in API error responses - Cap Image.MAX_IMAGE_PIXELS to prevent decompression bomb DoS - Add security regression test suite Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 09:01:52 +01:00
Ray Dumasia	e24f064042	Add CTRL-V support as suggested by @p-xiexin	2025-11-15 23:32:33 +00:00
rdumasia303	e82cd2abf0	Merge pull request #22 from rdumasia303/claude/add-pdf-support-016ikhUYeakWY2dah4X9STAX Claude/add pdf support 016ikh u yeak wy2dah4 x9 stax	2025-11-15 23:00:51 +00:00
rdumasia303	7b7d368c94	Update latest updates section to November 2025	2025-11-15 22:58:28 +00:00
Claude	efa2bd265b	Enhance README with comprehensive PDF processing documentation - Add prominent "What's New" section highlighting v2.2.0 features - Add detailed "How to Use" guide for both Image OCR and PDF Processing - Include output format comparison table - Add use cases and tips for best results - Expand tech stack section with new dependencies - Better structure with clear sections for new users	2025-11-15 22:55:43 +00:00
Claude	e33e9be75a	Fix Dockerfile to copy all Python files including pdf_utils and format_converter	2025-11-15 14:38:54 +00:00
Claude	e578276d3e	Add PDF processing and multi-format document conversion Features added: - PDF to image conversion with configurable DPI - Multi-page PDF processing with OCR - Export to Markdown, HTML, DOCX, and JSON formats - Automatic image extraction from PDFs - Formula and formatting preservation - Real-time progress tracking for multi-page documents Backend changes: - New /api/process-pdf endpoint for PDF processing - pdf_utils.py: PDF conversion and image extraction utilities - format_converter.py: Document format conversion (MD, HTML, DOCX) - Updated dependencies: PyMuPDF, img2pdf, python-docx, markdown Frontend changes: - File type toggle (Image OCR / PDF Processing) - PDFProcessor component with format selection - Updated ImageUpload to support both images and PDFs - Progress bars for multi-page processing - Download options for converted documents Documentation: - Updated README with PDF processing features - Added API documentation for /api/process-pdf endpoint - Added format conversion examples	2025-11-15 14:25:09 +00:00
rdumasia303	5ba45f7db2	Update README.md with new content	2025-10-23 01:14:24 +01:00
rdumasia303	fd063c0e71	Add MIT License to the project	2025-10-23 01:06:22 +01:00
rdumasia303	0fb5760b11	Merge pull request #11 from dnnspaul/main Fix incorrect OCR instructions + show advanced settings	2025-10-22 23:52:30 +01:00
Dennis Paul	23bbd1fc8d	show advanced settings toggle	2025-10-23 00:05:24 +02:00
Dennis Paul	225655d02c	(#10) Fix incorrect OCR instruction	2025-10-23 00:05:00 +02:00
rdumasia303	f28320a23d	Revise GPU recommendations in README Updated hardware recommendations for GPU support.	2025-10-21 22:08:12 +01:00
Ray Dumasia	525fc1d248	Add assets to readme	2025-10-21 21:49:57 +01:00
Ray Dumasia	3efc4da7ff	Add in .env.example for setting ports, fix upload limit, fix bounding box, can now dismiss previous image, change markdown expectation to HTML - not MD. updated README with nvidia driver/container instructions	2025-10-21 21:35:17 +01:00
Ray Dumasia	e02338436b	Add in .env.example for setting ports, fix upload limit, fix bounding box, can now dismiss previous image, change markdown expectation to HTML - not MD. updated README with nvidia driver/container instructions	2025-10-21 21:33:13 +01:00
rdumasia303	a7aeeaf109	Update GPU model from RTX 3090 to RTX 5090 Updated GPU model in requirements and README.	2025-10-21 09:40:44 +01:00
Ray Dumasia	aec04f6eb4	Initial commit	2025-10-21 01:32:09 +01:00
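
Commit 1d15b5f0c1 describes a partial unique index on (author, chapter, page) that applies only when all three fields are non-null, with duplicates reported as HTTP 409. The sketch below illustrates that idea only; it is not the repository's code. It assumes an in-memory SQLite database as a stand-in for PostgreSQL (both accept this `CREATE UNIQUE INDEX ... WHERE` form), and the table name `jobs`, the index name, and the helpers `make_db` and `insert_job` are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a partial unique index (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, author TEXT, chapter TEXT, page TEXT)"
    )
    # The index only covers rows where all three fields are present,
    # so rows with missing metadata are never rejected by it.
    conn.execute(
        "CREATE UNIQUE INDEX uq_jobs_author_chapter_page "
        "ON jobs (author, chapter, page) "
        "WHERE author IS NOT NULL AND chapter IS NOT NULL AND page IS NOT NULL"
    )
    return conn


def insert_job(conn: sqlite3.Connection, author, chapter, page) -> int:
    """Insert a job and return the HTTP status an API handler might send back."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO jobs (author, chapter, page) VALUES (?, ?, ?)",
                (author, chapter, page),
            )
    except sqlite3.IntegrityError:
        return 409  # duplicate (author, chapter, page)
    return 201
```

With a real PostgreSQL driver the exception type would differ (a unique-violation error rather than `sqlite3.IntegrityError`), but the handler shape stays the same: attempt the insert, and translate the constraint failure into a 409 response.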

30 Commits
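
Commit 3dac0741b1 replaces `eval()` with `ast.literal_eval()` in pdf_utils.py to close a remote code execution hole. The log does not show what that file parses, so the following is only a minimal sketch of why the swap helps: `ast.literal_eval()` accepts Python literals (numbers, strings, lists, tuples, dicts and similar) and refuses anything that would call code. The helper name `parse_literal` and the sample inputs are hypothetical.

```python
import ast


def parse_literal(text: str):
    """Parse a Python literal from untrusted text; return None if it is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        # Function calls, attribute access, names, and malformed input all land here
        # instead of being executed, which is what eval() would have done.
        return None
```

A coordinate list such as `"[10, 20, 30, 40]"` parses to a normal list, while a payload such as `"__import__('os').getcwd()"` is rejected and yields `None` rather than running.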