PQ PDF Tools Secure document utilities for everyday workflows.

🔎 OCR PDF — Optical Character Recognition

Extract text from scanned PDFs, image-based documents, and photographed pages. Powered by the Tesseract 5 LSTM neural network engine. Output as extracted text, a searchable PDF (with an invisible text layer added), or both.

No ads. No tracking. No data sold. Ever.
🔎
About OCR PDF
This tool runs the Tesseract 5 LSTM neural network OCR engine on your PDF. Each page is rendered to a high-resolution image, and the recognised text is then extracted from that image. It works best for scanned documents, photographed pages, and image-based PDFs. For PDFs that already have a selectable text layer, use Extract Text instead: it is faster and more accurate for those files.
⏱️
Processing Time
OCR is compute-intensive. Expect 2–8 seconds per page depending on DPI and page complexity. A 10-page document at 200 DPI typically completes in under 60 seconds. Processing runs entirely server-side, and your files are deleted as soon as the download begins.
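As a rough illustration of the arithmetic above, the per-page range can be turned into a whole-job estimate. This is only a sketch: the function name estimate_ocr_seconds is hypothetical, and the 2 and 8 second bounds are the figures quoted above, not measurements.

```python
def estimate_ocr_seconds(pages, low_per_page=2.0, high_per_page=8.0):
    """Return a (best case, worst case) estimate in seconds for an OCR job.

    The default bounds are the 2-8 seconds per page quoted in the text.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low_per_page, pages * high_per_page


best, worst = estimate_ocr_seconds(10)
print(f"10 pages: {best:.0f} to {worst:.0f} seconds")
# prints "10 pages: 20 to 80 seconds"
```

The "under 60 seconds" figure for a 10-page job at 200 DPI sits inside this range; pages at the slow end of the scale would push a job past it.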
Scanned PDFs, image-based PDFs, photographed documents — up to 50 MB
Scan Resolution (DPI)
Page Segmentation Mode
Auto works for most documents. Use Sparse Text for forms, receipts, or mixed-content pages.
Output Format
Pages to OCR
🧠
Tesseract 5 LSTM Engine
State-of-the-art neural network OCR — trained on millions of document samples for high character recognition accuracy.
📄
Searchable PDF Output
Adds an invisible text layer to your scanned images — the original appearance is preserved while text becomes copyable and searchable.
🔬
150 / 200 / 300 DPI Control
Match rendering resolution to your scan quality. 200 DPI is the recommended balance; 300 DPI maximises accuracy for small or faded text.
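To make the DPI trade-off concrete, the sketch below computes the pixel size of a rendered page. It relies on the standard PDF unit (1 point = 1/72 inch); the function name rendered_size and the choice of a US Letter page are illustrative assumptions, not details of this tool.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def rendered_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a PDF page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches).
for dpi in (150, 200, 300):
    width_px, height_px = rendered_size(612, 792, dpi)
    print(f"{dpi} DPI: {width_px} x {height_px} px")
```

Pixel count grows with the square of the DPI, so a 300 DPI render holds 2.25 times as many pixels as a 200 DPI render of the same page. That is one reason higher settings take longer to process.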
📐
4 Page Segmentation Modes
Auto, single column, single block, and sparse text — choose how Tesseract reads your page layout for better results on forms, receipts, and columns.
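For readers who run Tesseract themselves, the four modes correspond naturally to values of its --psm command-line option. The mapping below is an assumption about how such a tool might be wired, not a description of this service; the dictionary keys and the function build_tesseract_command are hypothetical names. The --oem 1 option selects the LSTM engine, and the trailing txt and pdf config names request plain text and searchable PDF output.

```python
# Assumed mapping from the four modes named above to Tesseract --psm values.
PSM = {
    "auto": 3,            # fully automatic page segmentation, no OSD
    "single_column": 4,   # single column of text of variable sizes
    "single_block": 6,    # single uniform block of text
    "sparse_text": 11,    # find as much text as possible, in no particular order
}


def build_tesseract_command(image_path, output_base, mode="auto", dpi=200, lang="eng"):
    """Build an argument list for the tesseract CLI (does not run it)."""
    if mode not in PSM:
        raise ValueError(f"unknown segmentation mode: {mode!r}")
    return [
        "tesseract", image_path, output_base,
        "--oem", "1",               # LSTM engine only
        "--psm", str(PSM[mode]),
        "--dpi", str(dpi),
        "-l", lang,
        "txt", "pdf",               # write output_base.txt and output_base.pdf
    ]


print(" ".join(build_tesseract_command("page-1.png", "page-1", mode="sparse_text")))
```

The list form can be passed directly to subprocess.run without shell quoting concerns.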
👁️
In-Browser Text Preview
Read the extracted text directly in the results panel without downloading — see immediately whether OCR succeeded before saving the file.
📊
Confidence Score & Word Count
Every job returns the average of Tesseract's per-word confidence scores across all pages, plus a word count and character count, so you can judge how well OCR performed.
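One way such statistics could be computed is from Tesseract's TSV output, where word-level rows have level 5 and carry a conf value. The sketch below is illustrative only: the sample data is invented, the function name summarise is hypothetical, and counting characters as the letters inside recognised words (excluding whitespace) is an assumption about the definition.

```python
import csv
import io

# Invented sample in Tesseract's TSV column layout (two pages, three words).
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t1700\t2200\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t100\t100\t80\t30\t96.0\tInvoice\n"
    "5\t1\t1\t1\t1\t2\t190\t100\t60\t30\t84.0\ttotal\n"
    "5\t2\t1\t1\t1\t1\t100\t100\t50\t30\t90.0\tpaid\n"
)


def summarise(tsv_text):
    """Average word confidence, word count and character count from TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    # Keep only word-level rows (level 5) that contain non-blank text.
    words = [
        (float(row["conf"]), row["text"])
        for row in reader
        if row["level"] == "5" and (row["text"] or "").strip()
    ]
    if not words:
        return {"mean_confidence": 0.0, "word_count": 0, "char_count": 0}
    return {
        "mean_confidence": sum(conf for conf, _ in words) / len(words),
        "word_count": len(words),
        "char_count": sum(len(text) for _, text in words),
    }


print(summarise(SAMPLE_TSV))
# {'mean_confidence': 90.0, 'word_count': 3, 'char_count': 16}
```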
✂️
Custom Page Ranges
Target specific pages (e.g. 1–3, 5, 8–12) rather than the entire document — saves time on long scanned books where you only need certain pages.
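A page specification like the one above can be expanded into individual page numbers with a small parser. This is a minimal sketch under stated assumptions: the function name parse_page_ranges is hypothetical, both hyphens and en dashes are accepted as range separators, and out-of-range pages are rejected rather than silently clipped.

```python
import re

# One page number, optionally followed by a hyphen or en dash and a second one.
_RANGE = re.compile(r"(\d+)(?:\s*[-\u2013]\s*(\d+))?")


def parse_page_ranges(spec, page_count):
    """Expand a spec such as '1-3, 5, 8-12' into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        match = _RANGE.fullmatch(part)
        if match is None:
            raise ValueError(f"invalid page range: {part!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"page range out of bounds: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_ranges("1\u20133, 5, 8\u201312", page_count=20))
# [1, 2, 3, 5, 8, 9, 10, 11, 12]
```

Collecting pages in a set means overlapping ranges such as "1-3, 2-4" are processed only once per page.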
📚
Up to 100 Pages Per Job
Processing runs entirely server-side, so browser memory limits do not apply. Pages are handled one at a time to prevent disk exhaustion on large documents.
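The one-page-at-a-time approach can be sketched as a generator that keeps at most one rendered image on disk. Everything here is an assumption for illustration: MAX_PAGES mirrors the 100-page limit stated above, and render_page and recognise are hypothetical callbacks standing in for the real rendering and OCR steps.

```python
import tempfile
from pathlib import Path

MAX_PAGES = 100  # per-job limit quoted in the text


def ocr_pages(page_numbers, render_page, recognise):
    """Yield (page_number, text), holding one page image on disk at a time.

    render_page(n, path) must write page n as an image to path;
    recognise(path) must return the text found in that image.
    """
    if len(page_numbers) > MAX_PAGES:
        raise ValueError(f"at most {MAX_PAGES} pages per job")
    for n in page_numbers:
        # The temporary directory, and the image in it, is removed as soon
        # as the page has been recognised, before the next page is rendered.
        with tempfile.TemporaryDirectory() as tmp:
            image_path = Path(tmp) / f"page-{n}.png"
            render_page(n, image_path)
            text = recognise(image_path)
        yield n, text
```

Because each image is deleted before the next one is rendered, peak disk use stays at roughly one page image regardless of document length.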
🔒
Zero Retention
Your file and all OCR output are deleted from the server immediately after the download begins — nothing is stored, logged, or retained.