The Complete Guide

PQ PDF — Every Tool, Explained

45 free PDF tools. No accounts. No ads. No file retention. Everything runs on our servers — your files are deleted the moment your download starts.

45 Free Tools

47 Scan Engines

31 PQC Algorithms

0 Files Stored

50 MB Max File Size

✅ Zero retention ✅ No accounts ✅ No tracking ✅ No third-party cloud ✅ All engines local 🤖 Self-hosted AI — no OpenAI/Anthropic/Google

Most online PDF tools share a structural problem: they are built around cloud storage. A file uploaded to add a watermark travels to a third-party processing service, sits in object storage, passes through analytics pipelines, and is subject to retention policies that are vague at best.

PQ PDF is different. Every operation creates one isolated temporary directory, runs entirely inside it, streams the result back to your browser, and deletes the directory — while the download is still in flight. There is no retention window because there is no buffer.
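The create, run, stream, delete lifecycle can be sketched roughly as follows. This is an illustration in Python, not the site's own server code; `stream_and_delete` and the `process` callable are hypothetical names, and the real pipeline may order things differently.

```python
import shutil
import tempfile
from pathlib import Path


def stream_and_delete(data: bytes, process, chunk_size: int = 64 * 1024):
    """Yield the processed file in chunks, then remove the working directory.

    `process` is a hypothetical callable taking (input_path, output_path).
    """
    workdir = Path(tempfile.mkdtemp(prefix="job-"))  # one isolated directory per operation
    try:
        src = workdir / "input.pdf"
        dst = workdir / "output.pdf"
        src.write_bytes(data)
        process(src, dst)  # every intermediate file stays inside workdir
        with dst.open("rb") as fh:
            while chunk := fh.read(chunk_size):
                yield chunk  # chunks are handed to the caller as they are read
    finally:
        # Runs once the consumer has taken the last chunk, or when the
        # generator is closed early (for example on a client disconnect).
        shutil.rmtree(workdir, ignore_errors=True)
```

Because the cleanup sits in the generator's `finally` block, nothing has to run on a schedule: the directory is gone as soon as the stream ends.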

Four specific gaps drove this project: no free tool offered a genuine zero-retention guarantee anywhere in the stack; no free tool ran multi-engine threat analysis on PDFs; no tool offered post-quantum cryptography in document workflows; and no tool was transparent about which engines were actually running. Every operation here is described in the exact pipeline terms of the code.

Why privacy-first matters for PDFs

🗄️

Zero file retention

Your file is deleted while the download streams. cleanup() is called immediately after readfile() — not on a schedule, not on the next request. There is no temp-file cleanup job because nothing is left to clean.

🚫

No third-party cloud — including AI

All 45 tools run on pqpdf.com's own servers. Ghostscript, LibreOffice, Tesseract, PyMuPDF, ClamAV — every engine runs locally. The AI features (forensic report, document analysis, redaction suggestions, change analysis) run on our own self-hosted Qwen 2.5 1.5B LLM via llama.cpp — no OpenAI, no Anthropic, no Google, no third-party AI API of any kind. No file data is ever sent outside our infrastructure.

🔒

Strict CSP — no unsafe-inline

Every page uses a per-request nonce-based Content Security Policy. No inline scripts, no unsafe-eval. All event handlers are registered via addEventListener() in external JS files. Including this page.
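A per-request nonce policy of this kind might be generated as in the sketch below. The directive list is an assumption chosen for illustration (the site's actual header is not reproduced here), and `build_csp`, `new_nonce`, and the `/js/app.js` path are hypothetical names.

```python
import secrets


def new_nonce() -> str:
    """Generate a fresh, unguessable nonce for a single response."""
    return secrets.token_urlsafe(16)


def build_csp(nonce: str) -> str:
    """Return a Content-Security-Policy value that only allows nonce-tagged scripts."""
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{nonce}'",  # inline and eval keywords are deliberately absent
        "object-src 'none'",
        "base-uri 'self'",
    ]
    return "; ".join(directives)


# One nonce per response: the same value goes into the header and the script tag.
nonce = new_nonce()
header = build_csp(nonce)
script_tag = f'<script nonce="{nonce}" src="/js/app.js"></script>'
```

A script tag without the matching nonce is refused by the browser, which is why event handlers have to be attached from external files rather than written as inline attributes.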

👁️

No tracking, ever

No analytics pixels. No advertising networks. No social-media trackers. Server access logs (IP, timestamp, path) are retained for 30 days for abuse prevention only, then permanently deleted.

⚡

Post-quantum ready

The Protect PDF tool includes 31 post-quantum algorithms (NIST ML-KEM-1024, HQC, FN-DSA, and hybrid modes) running entirely client-side in your browser. The server receives only the encrypted bundle — your plaintext never crosses the network.

🧪

No accounts required

Every tool works without registration. There are no user accounts, no email addresses collected, no passwords stored. Rate limiting uses session cookies that exist only in your browser and are never transmitted to or stored on the server.

How PQ PDF is funded

PQ PDF's public tools are free and will remain free.

The platform is funded through enterprise on-premise deployments and support contracts for organisations that need to run the full system inside their own infrastructure — air-gapped networks, regulated environments, high-volume internal pipelines.

This model means the public service operates without ads, tracking, or data monetisation. Revenue does not depend on user data, so there is no structural incentive to collect any.

If you need to process sensitive or high-volume documents internally, see Enterprise deployment.

What the 45 tools cover

Six groups covering every common PDF workflow — including a dedicated PDF Editor tab.

⚙️

Core Manipulation — 12 tools

Merge, split, compress, rotate, reorder, delete pages, extract pages, flatten, repair, grayscale, N-up imposition, and auto-crop & deskew.

Explore core tools →

📄

Format Conversion — 13 tools

Convert between PDF and Word, Excel, PowerPoint, HTML, Images, Markdown, PDF/A, and PDF/X. In both directions.

Explore convert tools →

🛡️

Security & Privacy — 7 tools

47-engine PDF forensics scanner, 23-engine Office document forensics scanner (Word · Excel · PowerPoint · Outlook · Access · Visio), AES-256 + PQC encryption, unlock, permanent redaction, watermarking, and PAdES-compliant signing.

All 47 engines explained →

Security research: 7,800-PDF study →

Case study: 16,971 DOJ Epstein PDFs →

Explore security tools →

✏️

Annotate & Inspect — 11 tools

Form fill, PDF diff, OCR, accessibility audit, font inspection, colour profiling, table extraction, and more.

Explore annotation tools →

🖊️

PDF Editor — 24+ tools

Full visual canvas editor: annotations, AcroForm builder, deep text & image editing, linked text reflow frames, vector object editing, and professional prepress marks. All applied server-side via PyMuPDF.

Explore the editor →

🔄

Automation — 1 tool

Chain multiple operations into a named workflow. Save, load, append, and export pipelines as JSON. Run on one or many PDFs in one click.

Explore automation →
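As a rough picture of what a saved pipeline could look like, the sketch below models a named workflow that can be appended to, exported as JSON, and loaded back. The schema, the `Workflow` class, and the option names are all hypothetical; the 15-step cap mirrors the limit quoted in the desktop comparison table on this page.

```python
import json
from dataclasses import asdict, dataclass, field

MAX_STEPS = 15  # assumption: same cap as the "up to 15 operations" figure on this page


@dataclass
class Workflow:
    name: str
    steps: list = field(default_factory=list)

    def append(self, operation: str, **options) -> "Workflow":
        """Add one operation to the end of the chain and return self for chaining."""
        if len(self.steps) >= MAX_STEPS:
            raise ValueError(f"a workflow holds at most {MAX_STEPS} steps")
        self.steps.append({"operation": operation, "options": options})
        return self

    def to_json(self) -> str:
        """Export the workflow as a JSON document."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Workflow":
        """Rebuild a workflow from previously exported JSON."""
        raw = json.loads(text)
        return cls(name=raw["name"], steps=list(raw["steps"]))


# Hypothetical operation and option names, for illustration only.
wf = Workflow("archive-scan").append("compress", preset="screen").append("grayscale")
restored = Workflow.from_json(wf.to_json())
```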

🔧

How it works

Temp-dir isolation, CSP nonces, rate limiting, file validation, zero-retention architecture, and the full engine stack — all documented.

Read the architecture →

📊

How PQ PDF compares

Side-by-side comparison with Adobe, Smallpdf, iLovePDF, PDF24, and Sejda — tools, limits, privacy, and pricing verified from official sources.

See the comparison →

How PQ PDF compares

Verified from official privacy policies, terms of service, and tool pages as of March 2026. Only publicly documented claims are listed.

Feature	PQ PDF	Adobe Acrobat Online	Smallpdf	iLovePDF	PDF24	Sejda
File retention after processing	✔ Deleted during download `cleanup()` called inside `send_file()` — no retention window of any kind	Not disclosed Files deleted "after processing" — no specific time given on any public page	1 hour Automatically deleted 1 hour after processing (stated on tool pages)	2 hours Documented in Terms of Service §9.3 and the public FAQ	1 hour Stated in Privacy Policy and Terms of Use (pdf24.org)	2 hours Stated on tool pages and in the Sejda Privacy Policy
Free tier — what's included
Core / Organise	✔ 12 tools Merge · Split · Compress (5 presets + custom DPI + live preview) · Rotate · Extract Pages · Delete Pages · Reorder · Repair · Flatten · Grayscale / B&W · N-up / Imposition · Auto-crop & Deskew (per-page interactive editor)	✔ 7 tools Free Adobe ID required for most Merge · Split · Rotate · Delete pages · Reorder pages · Add page numbers · Add watermark	✔ 7 tools 2 tasks/day cap Merge · Split · Compress · Rotate · Delete pages · Reorder pages · Extract pages	✔ 6 tools Per-task caps apply Merge (25 files, 100 MB) · Split · Remove pages · Extract pages · Rotate · Compress	✔ 6 tools All free · no caps · ad-supported Merge · Split · Compress · Rotate · Delete pages · Reorder pages	✔ 13+ tools 3 tasks/hr rate limit Merge (4 variants: std, specific pages, alternate, resize) · Split (5 variants: pages, text, bookmarks, size, extract) · Rotate · Delete pages · Crop · Repair · Flatten · Header & footer · Bates numbering · Reverse PDF
Convert	✔ 13 tools PDF ↔ Word · Excel · PowerPoint · HTML · Images · Markdown (pymupdf4llm) · PDF/A (1b/2b/3b) · PDF/X (X-1a/X-3/X-4)	Partial PDF → Office is paid Free (Adobe ID): Word / Excel / PPT / HTML / JPG → PDF · PDF → JPG (limited quality) Paid only: PDF → Word · Excel · PPT · PDF/A · OCR	Partial PDF → Office is Pro only Free: Word / Excel / PPT / HTML / JPG → PDF · PDF → JPG Pro only: PDF → Word · Excel · PPT · OCR · PDF/A	✔ 10 tools File size caps per tool → PDF (5): Word · Excel · PPT · HTML · JPG (20 files, 100 MB each) PDF → (5): Word (10 MB) · Excel (10 MB) · PPT (10 MB) · JPG (100 MB) · HTML (10 MB)	✔ 10 tools All free · no caps · ad-supported PDF ↔ Word · PDF ↔ Excel · PDF ↔ PowerPoint · Image ↔ PDF · HTML → PDF	✔ 8 tools 3 tasks/hr rate limit PDF → (5): Word · Excel · PPT · JPG · HTML → PDF (3): Word · Excel · JPG
Security & Encryption	✔ 6 tools 47-engine PDF forensics scanner — multi-axis (malware/exploit · integrity tampering · content-integrity / semantic-determinism) — (structural · dynamic sandbox · ML+SHAP · XFA FormCalc · action dependency graph · OCG cloaking · Unicode/invisible text · trailer chain forensics · codec exploit params · entropy topology · image stego · compliance fraud · JS behavioral emulation · font CharString emulator · XRef integrity graph · V/AP divergence & AI-ingestion detection (glyph remapping · OCR poisoning · /Alt & /ActualText injection) · local threat intelligence 6.4M+ indicators · MITRE ATT&CK · signature forensics · phishing · campaign attribution) · AES-256 + 31-algorithm PQC encrypt · Unlock · Permanent redact · Watermark · PAdES-B sign	✔ 2 tools free Most security features are paid Free (Adobe ID): Protect (encrypt) · Unlock Paid only: Redaction · Advanced watermark	✔ 3 tools 2 tasks/day cap Protect (encrypt) · Unlock · Watermark Pro only: Redact · PDF/A compliance	✔ 5 tools Per-task caps apply Protect (encrypt) · Unlock · E-sign (request) · Validate signature · Redact PDF No AES-256 PQC encryption, no threat scanning	✔ 4 tools All free · no caps · ad-supported Protect (encrypt) · Unlock (decrypt) · Sign PDF · Compare PDFs	✔ 5 tools 3 tasks/hr rate limit Protect (encrypt) · Unlock · Sign PDF (request) · Validate signature · Redact PDF
Edit, Annotate & Inspect	✔ 11 tools + full editor Visual editor (24+ tools: annotations, AcroForm builder, deep text editing, image object editing, linked text reflow frames, vector object editing, prepress marks) · Fill forms · Compare / diff · Tesseract 5 LSTM OCR (searchable PDF + confidence) · Bookmarks editor · WCAG 2.1 accessibility checker · Font inspector · Colour / CMYK inspector · Tables → JSON · Extract text · PDF info	Partial Advanced edit, OCR & compare are paid Free: Add comments · Fill & Sign · Basic text edit Paid only: Advanced editing · OCR · Compare PDFs · Accessibility checker · Bates numbering	✔ 4 tools 2 tasks/day cap Edit PDF · Add page numbers · Fill forms · Flatten Pro only: OCR · Redact · AI chat / summarise	✔ 6 tools Per-task caps apply Edit PDF (50 MB) · Add page numbers · Fill forms · Sign PDF · OCR PDF · PDF Scanner Premium: AI tools, API, batch	✔ 5 tools All free · no caps · ad-supported Edit PDF · Add page numbers · Fill forms · OCR PDF · PDF/A conversion	✔ 9 tools 3 tasks/hr rate limit Edit PDF · Add text / image / shapes / links · Whiteout · Edit hyperlinks · Add page numbers · OCR PDF · PDF Scanner · Optimise for web · HTML → PDF
Automation / Workflow	✔ Workflow builder + REST API Visual builder — chain, save, append, export as JSON; run on multiple PDFs in one job; fully free, no account needed REST API at api.pqpdf.com — 83 operations, API-key auth, IP whitelisting, stateful sessions; Development tab →	✘ None free Acrobat Actions (macro-style) — paid Acrobat Pro only; no visual builder	✘ None Batch via API — Pro only; no visual workflow builder	✘ None Batch — Premium & API only; no visual builder	✘ None Tools run individually; no workflow builder	✘ None Tools run individually; no workflow builder
Send for e-signature (multi-party)	✔ Free — no caps Up to 10 signers · sequential or parallel order · unique secure link per signer · PAdES-B cryptographic option (sender can enforce it) · multi-area placement — add any number of Sign Here and Initial Here boxes per signer per page · full field-rule locking — requester locks/pre-selects all 8 signer controls: signature method, ink colour, stroke thickness, date, time, page placement, PAdES-B crypto, certificate source · areas shown as clickable Sign Here / Initial Here targets on signer’s page — signer clicks each to confirm; counter tracks progress; submit locked until all confirmed · persistent page navigator for multi-page documents · date stamp · add/remove signers from tracking page · 24-hr TTL · no account needed	✔ 3 requests/month free Request e-sig from others via free Adobe ID; unlimited via Acrobat Sign (paid)	✔ Limited free E-sign requests included in free 2-tasks/day tier; Pro removes cap	✔ Free with caps E-sign — request sigs from others; 1-file per-task cap applies	✔ Free Request-signature workflow; free, no cap, ad-supported	✔ Free with rate limit E-sign — request sigs from others; 3 tasks/hr cap applies
PDF Scanner (camera to PDF)	✔ Free — no caps Browser camera or photo upload · real-time edge detection · OpenCV perspective correction · CLAHE & B&W enhancement · Tesseract 5 OCR · multi-page · no app install · zero retention	✔ Adobe Scan (app required) Dedicated free mobile app; scan to searchable PDF with OCR — requires install	✘ Not available	✔ Yes — free with caps PDF Scanner web tool; mobile camera to PDF with OCR; per-task file cap applies	✔ PDF24 app (app required) Mobile app includes scan-to-PDF; free, ad-supported — requires install	✔ Yes — rate limited PDF Scanner tool; 3 tasks/hr cap applies
Free tier limits	No account · No task caps · No daily limits · No ads · No upsells 50 MB / file · 200 MB total per request	2 GB / file Most tools need free Adobe ID; aggressive upgrade prompt after 1–2 uses of most tools; full access requires paid plan	5 GB / file 2 tasks/day hard cap — upgrade prompt after that; batch & API are Pro-only	200 MB / task Per-tool file count & size caps; tighter on PDF → Office (10 MB); Premium for batch & larger files	500 MB / file No task cap, no daily limit; all tools free with advertising; Premium removes ads	50 MB / file 3 tasks/hr · 50 pages/task · 30 files/hr; Paid plan removes all rate limits
Malware & threat scanning	✔ 47 independent engines Static heuristics · dynamic Linux namespace sandbox · ML anomaly detection · XFA FormCalc parser · PDF action dependency graph · OCG layer cloaking · Unicode/invisible text · trailer chain forensics · codec exploit params · entropy topology · image steganography · compliance fraud · JS behavioral emulation · font CharString emulator · XRef integrity graph · six-parser differential · JS AST deobfuscation · AcroForm forensics · signature forensics · campaign attribution · weighted correlation engine	✘ No malware scanning CSAM content check only (Adobe Terms §2.2(C)); no PDF threat analysis disclosed	✘ None disclosed	✘ Explicitly none "We won't check, copy or analyze your files in any way" — iLovePDF FAQ	✘ None disclosed	✘ None disclosed
Post-quantum encryption	✔ 31 algorithms — client-side NIST ML-KEM-1024/768/512, HQC-128/192/256, FN-DSA variants, and hybrid modes via `@noble/post-quantum`; server never sees plaintext	✘ Not available Not disclosed on any public page	✘ Not available	✘ Not available	✘ Not available	✘ Not available
Processing on own servers	✔ All engines run locally No file data sent to any third-party service — every engine runs on pqpdf.com's own server	Third party — not named "Trusted cloud infrastructure providers and CDNs" (Adobe Terms) — providers not disclosed	Not disclosed Privacy policy pages returned errors during verification; subprocessors not published	Third party — not named "Leading cloud data storage provider" cited on Security page — name not disclosed	EU servers confirmed; provider not named "All servers within the EU" — PDF24 Privacy Policy (geek software GmbH, Berlin)	DigitalOcean, Cloudflare, Fastly All three named as infrastructure providers in the Sejda Privacy Policy
AI analysis — self-hosted LLM	🤖 Qwen 2.5 1.5B — self-hosted 4 AI features — all self-hosted on own hardware, zero third-party AI calls: 🤖 AI Forensic Report (synthesises all 47 engine outputs → verdict, confidence, executive summary, key findings, MITRE techniques, recommended actions, false_positive_note; MALICIOUS/CLEAN auto-labels for ML retraining), 🤖 AI Document Analysis (type classification with confidence, language, entities incl. locations, topics, reading level), 🤖 AI Redaction Suggestions (PII pattern proposals across 13 categories with example + reason), 🤖 AI Change Analysis (significance rating, change type, plain-English summary, per-change details array, recommendation) — Qwen 2.5 1.5B Instruct via llama.cpp, ~13 t/s on Ryzen 5 3550H. No OpenAI, no Anthropic, no Google. Your document text never leaves our infrastructure.	⚠️ Adobe Sensei (cloud AI) Adobe AI routes content through Adobe's own cloud AI pipeline — separate from PDF processing infrastructure	✘ None disclosed	✘ None disclosed	✘ None disclosed	✘ None disclosed
Processing engines disclosed	✔ All engines named Ghostscript, Poppler, LibreOffice, Tesseract 5, PyMuPDF, YARA, ClamAV, PeePDF, pikepdf, Acorn, scikit-learn, LightGBM, imagehash — every tool documented	✘ None disclosed Described as proprietary; no library or engine names on any public page	✘ None disclosed	✘ None disclosed	✘ None disclosed	Partial Tesseract named for the desktop app OCR feature only; web-side engines not disclosed
Max upload — free tier	50 MB / file 200 MB total per request across all files	2 GB Stated on the Compress tool page (acrobat.adobe.com)	Not stated publicly Pricing page not accessible; per-tool limits not published on accessible pages	200 MB / task Varies by tool — lower limits on conversions (e.g. 15 MB for some); ilovepdf.com	500 MB / file Stated on tool pages at tools.pdf24.org	50 MB / file 200 MB / task; 3 tasks/hour; page cap 50 pages/task (sejda.com)
Open-source engines only	✔ 100% open source Ghostscript, Poppler, LibreOffice, Tesseract 5, PyMuPDF, YARA, ClamAV, PeePDF, pikepdf, Acorn, scikit-learn, LightGBM, imagehash — every engine is named open-source software	✘ No Proprietary Adobe processing pipeline; specific engines not disclosed on any public page	✘ Not disclosed	✘ Not disclosed	✘ Not disclosed	Partial Tesseract named for desktop OCR only; web processing pipeline not disclosed
Cryptographic signing standard	✔ PAdES-B (ETSI EN 319 142-1) Incremental CMS/PKCS#7 via pyhanko 0.34; verifiable in Adobe Reader's Signatures panel. Draw/type/upload modes also available with embedded RSA-2048 cert	Acrobat standard signature Adobe Sign product available; standard acrobat.adobe.com e-sign. PAdES compliance not documented on public pages	Basic e-signature Sign tool available; signing standard not disclosed on accessible pages	Basic e-signature Sign tool available; signing standard not disclosed on accessible pages	Basic e-signature Sign tool available; signing standard not disclosed on accessible pages	Basic e-signature Sign tool available; signing standard not disclosed on accessible pages
Advertising / monetisation model	✔ No ads, no upsells No advertising, no tracking pixels, no in-tool upgrade prompts, no affiliate links — tool is self-funded	Freemium — subscription upsells Free basic use; persistent prompts to upgrade to paid Acrobat plan	Freemium — subscription upsells Free tier with task limits; upgrade prompts throughout the tool flow	Freemium — subscription upsells Free tier with limits; Premium plan promoted within tools	Free with ads Web app displays advertising on the free tier; Premium plan available for ad-free experience	Freemium — task-limited Free tier capped at 3 tasks/hour; paid plan promoted in tool UI

ℹ️ All competitor claims verified from official sources as of March 2026: Adobe Terms · iLovePDF Terms §9.3 · iLovePDF FAQ · PDF24 Privacy Policy · Sejda Privacy Policy · Smallpdf retention confirmed from tool pages (privacy policy pages unavailable at time of research). PQ PDF claims are derived from api.php, _tool_head.php, and tool source files on this server.

vs. Top-tier Desktop PDF Editors

PQ PDF's editor has grown beyond the online-tools category. This table compares it against the leading paid desktop PDF editors, all of which require software installation and cost between $79 and $240, billed either annually or as a one-time perpetual license. Features verified from official documentation, help centres, and product forums, April 2026.

Capability	PQ PDF	Adobe Acrobat Pro	Foxit PDF Editor Pro	Nitro PDF Pro	PDF-XChange Editor Plus	PDFelement Pro
Cost & access
Price	✔ Free — $0 No subscription, no account, no trial expiry, no usage caps	$240 / yr $19.99/month billed annually (acrobat.adobe.com pricing, April 2026)	$130 / yr $129.99/year annual plan (foxit.com/shopping, April 2026)	$160 perpetual One-time license (gonitro.com, April 2026)	$79 perpetual One-time license + 1 yr maintenance (pdf-xchange.com, April 2026)	$80 / yr $79.99/year annual plan; $129.99 perpetual option (pdf.wondershare.com, April 2026)
Installation required	✔ None — runs in any browser Chrome, Firefox, Safari, Edge — any device, any OS, no extensions needed	✘ Windows / Mac desktop app	✘ Windows / Mac desktop app	✘ Windows desktop app	✘ Windows desktop app	✘ Windows / Mac desktop app
Editing capabilities
Deep content text editing	✔ Yes Edit embedded text span-by-span; auto-shrink on overflow; re-renders affected page immediately	✔ Yes Edit Content tool; paragraph & line editing in place (Adobe help, Edit PDF)	✔ Yes Edit Text & Images panel; font, size, colour, line spacing (foxit.com)	✔ Yes Edit Text mode; paragraph editing with reflow (help.gonitro.com)	✔ Yes Content Editing tab; rich text properties (help.pdf-xchange.com)	✔ Yes Edit PDF panel; line and paragraph editing (pdf.wondershare.com)
Cross-block text reflow (linked text frames across pages)	✔ Yes Draw linked thread_frame annotations; text flows across frames and pages automatically; binary word-bisect distributes text server-side via insert_textbox()	✘ Not available in Acrobat Acrobat's "Reflow" is an accessibility reading view only. Linked text frame threading is an InDesign feature — not present in Acrobat Pro (Adobe help, April 2026)	✘ Not documented No linked text frame feature found in Foxit help centre or product pages (April 2026)	✘ Not documented Not found in Nitro user guide or community forums (help.gonitro.com, April 2026)	✘ Not documented Not found in PDF-XChange help centre (help.pdf-xchange.com, April 2026)	✘ Not documented Not found in PDFelement documentation (pdf.wondershare.com, April 2026)
Image object editing (replace / move / delete embedded images)	✔ Yes Replace, move/resize, delete. Side panel with thumbnail list; canvas drag handles; affected page re-renders immediately	✔ Yes Edit Content → select image → replace, move, delete (Adobe Acrobat Pro, Edit PDF)	✔ Yes Edit Text & Images panel; replace, move, resize, delete (foxit.com help)	✔ Yes Edit Image mode; move, resize, replace, delete (help.gonitro.com)	⚠️ Partial Select Content; move/resize supported; replace via separate workflow (help.pdf-xchange.com)	⚠️ Partial Edit PDF → image selection; move and resize; replace requires re-insert (pdf.wondershare.com)
Vector object editing (reposition, recolour, delete native PDF paths)	✔ Yes Exposes all native PDF paths via page.get_drawings(); move, recolour (stroke), or delete each path; server redraws via shape.draw_line/bezier/rect	✔ Yes Edit Content → select vector objects; move, scale, recolour via properties panel (Adobe Acrobat Pro)	✔ Yes Edit Object mode; move, scale, recolour vector shapes (foxit.com, Edit PDF objects)	⚠️ Limited Basic object selection; move/resize available; dedicated path recolour not documented (help.gonitro.com)	✔ Yes Content Editing; select and modify vector shapes; stroke/fill colour properties (help.pdf-xchange.com)	⚠️ Limited Edit PDF → object selection; move/resize; dedicated native path recolour not documented (pdf.wondershare.com)
Prepress & print production
Prepress tools (bleed expansion, crop marks, registration marks, CMYK bar)	✔ Full prepress suite Configurable bleed (mm), crop mark length & gap, registration circles with crosshairs, 5-swatch CMYK colour bar, bleed area canvas preview, all applied server-side via PyMuPDF	✔ Yes — Print Production panel Add Printer Marks dialog: trim marks, registration marks, colour bars, page info; Bleed specification; Output Preview (Adobe Acrobat Pro, Print Production)	✔ Yes — confirmed in 2026 release Bleed marks at BleedBox corners, cut marks per page, overprint preview enabled in v2026.1.0.36452 (Foxit help + UpdateStar, April 2026)	✘ Not available Only basic crop/resize of pages. No crop mark, registration mark, or CMYK bar generation documented (Nitro community forum thread on bleed/trim box display, April 2026)	⚠️ Page boxes only Can set BleedBox, TrimBox, CropBox, ArtBox via UI — but cannot automatically generate crop marks or registration marks. Forum post confirms marks must be added via overlay workaround (forum.pdf-xchange.com, April 2026)	✘ Not documented No prepress mark generation feature found in PDFelement documentation or feature lists (pdf.wondershare.com, April 2026)
PDF page boxes (TrimBox, BleedBox, CropBox, ArtBox)	✔ All 4 — set on apply page.set_trimbox / set_bleedbox / set_cropbox / set_artbox written server-side via PyMuPDF	✔ All 4 Set via Print Production → Set Page Boxes (Adobe Acrobat Pro)	✔ All 4 Page Properties → Page Boxes panel (Foxit PDF Editor)	⚠️ Partial CropBox adjustment via Crop tool; other boxes display-only per community forum (community.gonitro.com)	✔ All 4 Page Boxes editor; full read/write for all four box types (help.pdf-xchange.com, V9 manual)	⚠️ Crop only CropBox adjustable; TrimBox/BleedBox/ArtBox not independently settable (pdf.wondershare.com)
Security & compliance
PDF forensics & threat scanning	✔ 47-engine suite — unique Multi-axis verdict (malware/exploit · integrity tampering · content-integrity / semantic-determinism · structure): static heuristics · dynamic sandbox · ML/SHAP anomaly · XFA FormCalc parser · JS AST emulator · font CharString emulator · XRef integrity graph · image stego · entropy topology · signature forgery / shadow-document detection · content-integrity / AI-ingestion detection (V/AP value-vs-appearance divergence · font glyph remapping · OCR text-layer poisoning · /Alt & /ActualText prompt injection) · MITRE ATT&CK attribution · local threat intelligence (6.4M+ indicators) · AI forensic report (Qwen 2.5 1.5B) — no comparable feature exists in any other PDF tool	✘ No threat scanning Acrobat has "Sanitize Document" (removes active content) and "Remove Hidden Information" — these are redaction/cleaning tools, not forensic analysers. No multi-engine threat scanner confirmed (Adobe help, April 2026)	✘ Not available No PDF threat analysis or forensics feature documented (foxit.com, April 2026)	✘ Not available	✘ Not available	✘ Not available
Active content inspector (JS, embedded actions — view & remove)	✔ Yes — side panel Lists all JavaScript, embedded actions, launch actions, URI actions, and form actions with selective removal	⚠️ Sanitize / Remove Hidden Info Remove Hidden Information strips active content in bulk; no item-by-item inspection panel (Adobe help, Acrobat Pro)	⚠️ Document Examine Examine Document scans for hidden data and some active content; selective removal limited (foxit.com)	✘ Not documented No active content inspector found in Nitro user guide (help.gonitro.com, April 2026)	✘ Not documented	✘ Not documented
AcroForm builder + JavaScript actions	✔ Full — 7 field types, 6 JS slots Text · CheckBox · RadioButton · ListBox · ComboBox · Signature · PushButton. JS slots: Validate, Calculate, Format, Keystroke, Focus, Blur. 13 built-in templates. Scripts stored as native PDF AA dictionaries.	✔ Full Complete AcroForm creation with all field types; full Acrobat JavaScript API including event.value, this.getField(), calculate, validate, format (Adobe Acrobat Pro, Forms)	✔ Full AcroForm designer; all major field types; JavaScript actions fully supported — edit/debug JS, AA dictionary entries (foxit.com/blog, Foxit developers, April 2026)	⚠️ Basic Form fields supported; JavaScript calculations documented in user guide but advanced JS API coverage is limited compared to Acrobat (help.gonitro.com, Nitro Pro Editing Text)	⚠️ Partial AcroForm fields supported; JavaScript available but not all AA slots fully exposed through the UI (help.pdf-xchange.com)	✘ No JavaScript actions Form field creation available; JavaScript AA action slots not documented as a feature in PDFelement (pdf.wondershare.com, April 2026)
Post-quantum encryption	✔ 31 algorithms — unique NIST ML-KEM-1024/768/512, HQC-128/192/256, FN-DSA variants, hybrid modes — client-side via @noble/post-quantum; server never sees plaintext	✘ Not available	✘ Not available	✘ Not available	✘ Not available	✘ Not available
OCR (searchable PDF)	✔ Tesseract 5 LSTM — 30+ languages 3 DPI modes, 4 PSM modes, confidence score, live text preview, up to 100 pages	✔ Yes — Acrobat OCR engine	✔ Yes — Foxit OCR engine	✔ Yes	✔ Yes — Enhanced OCR engine (Plus) The Plus edition includes a more accurate enhanced OCR engine (pdf-xchange.com)	✔ Yes
PDF/A & PDF/X export	✔ Both — PDF/A 1b/2b/3b & PDF/X 1a/3/4	✔ Both — PDF/A and PDF/X Full support for all PDF/A and PDF/X variants (Adobe Acrobat Pro)	⚠️ PDF/A — PDF/X partial PDF/A conversion confirmed; PDF/X export documented for some standards but not all variants (foxit.com)	⚠️ PDF/A — PDF/X limited PDF/A conversion available; PDF/X limited support (gonitro.com)	✔ Both PDF/A (1b, 2b, 2u, 3b) and PDF/X (1a, 3, 4) confirmed (help.pdf-xchange.com)	⚠️ PDF/A only PDF/A conversion available; PDF/X not documented in feature list (pdf.wondershare.com)
Workflow & AI
Automation / workflow builder	✔ Visual workflow builder — free Chain up to 15 PDF operations; save/load/export pipelines as JSON; run on one or many files in one click; no account required	✔ Acrobat Actions Record and replay multi-step actions; batch process folders; saved as .sequ files (Adobe Acrobat Pro)	✔ Batch Processing Batch processing for OCR, watermark, convert, redact, header/footer across multiple files (foxit.com/blog)	✘ Not available No visual workflow builder or batch action chain found in Nitro Pro (help.gonitro.com, April 2026)	⚠️ PDF-Tools (separate product) 81 preset tools and custom workflows via the PDF-Tools add-on; Editor itself cannot batch process (forum.pdf-xchange.com, April 2026)	✘ Not available No workflow builder or batch chaining found in PDFelement (pdf.wondershare.com, April 2026)
AI — document analysis & forensics	🤖 Self-hosted Qwen 2.5 1.5B — 4 AI features AI Forensic Report · AI Document Analysis · AI Redaction Suggestions · AI Change Analysis — all run on our own server, zero third-party AI API calls, no file data sent to cloud AI	⚠️ Adobe Sensei AI — cloud AI Assistant (paid add-on or included at higher tiers) — routes content through Adobe cloud AI pipeline	⚠️ Foxit AI Assistant — cloud $49.99/yr add-on; powered by third-party cloud AI (foxit.com/shopping)	✘ Not available	✘ Not available	⚠️ Wondershare AI — cloud AI features included with usage quotas; powered by Wondershare cloud AI pipeline
Privacy
Zero server-side file retention	✔ Yes — deleted during download `cleanup()` called inside `send_file()` — no retention window of any kind on any server	Desktop — local only PDF never leaves your machine for core editing; cloud features (Adobe cloud storage, Adobe AI) may store or transmit content	Desktop — local only PDF never leaves machine for core editing; Foxit cloud features and AI Assistant transmit content	Desktop — local only Core editing is fully local; no cloud component required for standard use (gonitro.com)	Desktop — local only Entirely local processing; no cloud subscription required (pdf-xchange.com)	Desktop — local only PDF never leaves machine for core editing; Wondershare cloud AI features transmit content

ℹ️ Desktop editor claims verified from official sources, April 2026: Adobe Acrobat pricing · Adobe reflow = accessibility view only · Foxit pricing · Foxit 2026 prepress features · Nitro pricing · Nitro bleed/trim box limitations · PDF-XChange pricing · PDF-XChange crop marks forum · PDFelement pricing. PQ PDF claims verified from source files on this server.

12 tools for everyday PDF manipulation. All output files are produced server-side, while the pre-scans for Repair and Flatten run in your browser; nothing is rasterised unless explicitly requested.

📎

Merge PDFs

Combine up to 20 PDFs into one (200 MB total). Drag thumbnails to reorder before merging. Real-time upload progress percentage.

Try it →

✂️

Split PDF

Split by every page, a fixed interval, custom page ranges, or interactive cut-point selection. Output is a ZIP of individual PDFs.

Try it →

🗜️

Compress PDF

Five quality presets plus custom DPI slider (50–600). Optional metadata stripping, linearisation, and stream recompression. Live before/after split-canvas preview from page 1. Shows size reduction after download.

Try it →

🔄

Rotate Pages

Rotate all, odd, even, or a custom range. Supports 90°/180°/270° and arbitrary decimal angles. Live canvas preview of page 1.

Try it →

📃

Extract Pages

Click a thumbnail grid to select the pages you want to keep. Selections auto-compress to ranges (e.g. 1–3, 5, 7–9).

Try it →

🗑️

Delete Pages

Click the thumbnail grid to mark pages for removal. Everything else is kept.

Try it →

🔀

Reorder Pages

Drag-and-drop page thumbnails to rearrange, then export the reordered PDF.

Try it →

🔧

Repair PDF

Reconstructs corrupted or malformed PDFs via Ghostscript. On upload, PDF.js diagnoses the file client-side — checking the header, xref table, and content streams — and shows red error badges or a green "readable" confirmation before any server work runs.

Try it →

📋

Flatten PDF

Permanently bakes form fields, annotations, and layers into the page content. Client-side pre-scan shows exactly what will be flattened — field counts, annotation types, layer names — with a green "already flat" badge if nothing is found.

Try it →

🎨

Grayscale / B&W

Convert a colour PDF to grayscale or pure black-and-white. Live before/after split-canvas preview: colour on the left, grayscale simulation on the right.

Try it →

📋

N-up / Imposition

Arrange multiple PDF pages on each output sheet: 2-up, 4-up, 6-up, 8-up, 9-up, or booklet (pages re-ordered for saddle-stitch binding). Uses PyMuPDF show_pdf_page() — vector output, no rasterisation. Page size and orientation selectable.

Try it →

📐

Auto-crop & Deskew

Remove excess white margins and correct page rotation. Three modes: crop only, fix rotation only, or both. Features a per-page interactive crop editor — see the deep dive below.

Try it →

Most deskew tools apply a single global correction. This one gives you a per-page interactive editor before anything is sent to the server.

How auto-detection works

After upload, each page is rendered via PDF.js. Text extraction and PyMuPDF's bounding-box analysis across text blocks, vector paths, and raster images together detect the tight content boundary. A 20 pt safety margin is added. The result is drawn as a draggable crop box over the rendered page.
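The boundary calculation described above can be sketched in a few lines. This is an illustration only, not the tool's code: the function name `content_crop_box`, the plain-tuple rectangles, and the clamping to the page are assumptions; only the 20 pt margin comes from the description.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def content_crop_box(boxes: Iterable[Rect], page: Rect, margin: float = 20.0) -> Rect:
    """Union all content boxes, pad by `margin`, and clamp to the page."""
    items = list(boxes)
    if not items:
        return page  # blank page: keep the full page (assumed behaviour)
    x0 = min(b[0] for b in items) - margin
    y0 = min(b[1] for b in items) - margin
    x1 = max(b[2] for b in items) + margin
    y1 = max(b[3] for b in items) + margin
    px0, py0, px1, py1 = page
    # Never let the padded box grow beyond the page itself.
    return (max(x0, px0), max(y0, py0), min(x1, px1), min(y1, py1))


page = (0.0, 0.0, 595.0, 842.0)
boxes = [
    (72.0, 90.0, 520.0, 300.0),   # a text block
    (72.0, 320.0, 400.0, 700.0),  # an image
    (10.0, 800.0, 60.0, 830.0),   # a footer mark near the page edge
]
crop = content_crop_box(boxes, page)
print(crop)
```

In a real pipeline the input boxes would come from the layout analysis; here they are hard-coded so the arithmetic is easy to follow.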

The interactive crop editor

↕

8 drag handles 4 corners + 4 edge midpoints. Resize the keep area in any direction.

✋

Pan by dragging inside the box Move the crop region without resizing it.

🔁

Apply to all pages Normalises the current crop as proportional fractions and applies it to every page in the document — useful when all pages have consistent margins.

🔄

Reset page Re-runs auto-detection on the current page, discarding your manual adjustment.

What gets sent to the server

Per-page overrides are sent as a JSON array of {page, x0, y0, x1, y1} in PDF display-space points. Pages without manual overrides continue to use server-side auto-detection. The rotation fix bakes the /Rotate flag into the content stream so output pages have rotation=0 in all viewers — aspect ratio and coordinate mapping are preserved for 90°/180°/270° pages via an offset target-rect approach.
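As a rough sketch of how "Apply to all pages" and the override payload fit together: the crop is stored as fractions of the page, then scaled to each page's own size. The helper names, the 1-based page numbering, and the two-decimal rounding are assumptions; only the `{page, x0, y0, x1, y1}` shape comes from the description above.

```python
import json


def to_fractions(crop, page_size):
    """Express a crop box as fractions of the page width and height."""
    x0, y0, x1, y1 = crop
    width, height = page_size
    return (x0 / width, y0 / height, x1 / width, y1 / height)


def apply_to_all(crop, source_size, page_sizes):
    """Scale one page's crop onto every page and return override dicts."""
    fx0, fy0, fx1, fy1 = to_fractions(crop, source_size)
    overrides = []
    for number, (width, height) in enumerate(page_sizes, start=1):
        overrides.append({
            "page": number,  # 1-based numbering is an assumption
            "x0": round(fx0 * width, 2),
            "y0": round(fy0 * height, 2),
            "x1": round(fx1 * width, 2),
            "y1": round(fy1 * height, 2),
        })
    return overrides


# Crop drawn on an A4 page, applied to an A4 page and a page twice as large.
overrides = apply_to_all((50, 100, 545, 742), (595, 842), [(595, 842), (1190, 1684)])
payload = json.dumps(overrides)
print(payload)
```

Storing fractions rather than absolute points is what lets one manual adjustment carry over to pages of different sizes.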

Compress PDF controls image DPI downsampling via Ghostscript and optionally applies stream-level recompression via qpdf. A live split-canvas preview renders page 1 as soon as you upload — original on the left, simulated compression on the right — so you can see the visual impact before committing to a download.

Quality presets

📱

Screen — 72 DPI Optimised for on-screen reading. Minimum file size. Suitable for email attachments where print quality is not required.

📖

eBook — 150 DPI Recommended preset. Balanced quality and file size — images remain sharp on screen and in basic print. Best choice for most documents.

🖨️

Printer — 300 DPI High-quality output suitable for desktop printing. Noticeably larger than eBook but retains photographic detail at print scale.

🎨

Prepress — 300 DPI with colour profiles preserved Maximum colour fidelity. Preserves embedded ICC profiles and applies Ghostscript's /prepress quality settings. Use for documents destined for a print shop or colour-critical workflows.

🎯

Custom — 50–600 DPI (10 DPI steps) Slider-controlled DPI for precise control. Useful when you know the exact output requirement — for example, 120 DPI for a mobile-only document or 600 DPI for archival-quality images.

Advanced options

🗑️

Strip metadata Removes the PDF Info dictionary — title, author, subject, keywords, creator, producer, and creation/modification dates. Useful when distributing documents publicly and you want to remove any authoring footprint.

⚡

Web-optimise (linearise) Restructures the PDF so the first page is available before the full file downloads — enabling browsers to render it progressively. Adds minor overhead to file size but significantly improves perceived load time for large documents hosted online.

📦

Recompress streams (qpdf) Runs qpdf's maximum Flate (Deflate) compression across all internal PDF data streams after Ghostscript finishes. Adds 2–10% additional savings on top of image resampling. Most effective on documents with many uncompressed object streams.
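For readers who want to reproduce something similar locally, the presets and advanced options map naturally onto standard Ghostscript and qpdf switches. The sketch below only builds argument lists and does not run anything. The exact command lines PQ PDF uses are not published here, so the flag combination, the function names, and the choice of base preset for custom DPI are assumptions; metadata stripping is left out.

```python
from typing import List, Optional

# Preset name -> (Ghostscript PDFSETTINGS value, image DPI)
PRESETS = {
    "screen": ("/screen", 72),
    "ebook": ("/ebook", 150),
    "printer": ("/printer", 300),
    "prepress": ("/prepress", 300),
}


def ghostscript_args(src: str, dst: str, preset: str = "ebook",
                     custom_dpi: Optional[int] = None) -> List[str]:
    """Build a Ghostscript pdfwrite command for image downsampling."""
    setting, dpi = PRESETS[preset]
    if custom_dpi is not None:
        # The slider range and step come from the description above.
        if not 50 <= custom_dpi <= 600 or custom_dpi % 10:
            raise ValueError("custom DPI must be 50-600 in steps of 10")
        dpi = custom_dpi
    return [
        "gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-dPDFSETTINGS={setting}",
        "-dDownsampleColorImages=true", f"-dColorImageResolution={dpi}",
        "-dDownsampleGrayImages=true", f"-dGrayImageResolution={dpi}",
        f"-sOutputFile={dst}", src,
    ]


def qpdf_args(src: str, dst: str, linearize: bool = False,
              recompress: bool = False) -> List[str]:
    """Build an optional qpdf post-processing command."""
    args = ["qpdf"]
    if linearize:
        args.append("--linearize")
    if recompress:
        args += ["--recompress-flate", "--compression-level=9",
                 "--object-streams=generate"]
    return args + [src, dst]


print(" ".join(ghostscript_args("in.pdf", "stage1.pdf", custom_dpi=120)))
print(" ".join(qpdf_args("stage1.pdf", "out.pdf", linearize=True, recompress=True)))
```

Running the two commands in sequence mirrors the order described above: Ghostscript resamples images first, then qpdf restructures and recompresses the result.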

What gets compressed — and what doesn't

DPI settings affect raster images embedded in the PDF. Photographs, scanned pages, and screenshots will see the largest reductions — typically 40–90%. Vector content and text are not affected by DPI; they are resolution-independent and remain identical across all presets. A text-only document will see minimal reduction regardless of preset; in that case, enabling stream recompression and metadata stripping provides the most benefit.

Split-canvas preview

As soon as a file is uploaded, page 1 is rendered at full resolution in the browser via PDF.js. The canvas is split vertically — left half shows the original, right half simulates the compressed output at the selected DPI. The preview updates live as you switch presets. This shows the visual quality trade-off before any server processing occurs.

Split PDF supports four distinct split strategies. All modes use Poppler at the binary level — pages are extracted without re-rendering, so fonts, images, and text layers are preserved exactly as they appear in the source.

Split modes

📄

Every page Produces one PDF per page, packaged as a ZIP. A 40-page document becomes 40 individual PDFs. Useful for splitting scanned documents into individual records or for batch processing single-page files.

🔢

Every N pages (interval) Specify a chunk size (e.g. 10) and the document divides into equal-sized pieces — with a smaller final chunk if the total is not divisible. A 45-page document split every 10 produces four 10-page PDFs and one 5-page PDF.

✂️

Interactive cut points After upload, all pages render as thumbnails. Scissors (✂) icons appear between each pair of pages. Click a scissors icon to mark a split point — click again to remove it. Multiple split points produce multiple PDFs. A counter shows how many pieces the document will become before you submit.

📋

Custom page ranges Enter comma-separated ranges (e.g. 1-3, 5, 7-9). Each range becomes a separate PDF. Pages not covered by any range are discarded. Useful for extracting specific named sections from a longer document.

Output

Every Page, Interval, and Interactive modes output a ZIP archive. Custom ranges output a single PDF when one range is specified, or a ZIP for multiple ranges. No re-encoding occurs — output pages are binary-identical to the source pages.
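The three non-trivial split strategies reduce to computing a list of inclusive page ranges. The following is a minimal sketch of that step, with hypothetical function names; it says nothing about how the pages are then extracted.

```python
import re


def every_n(total_pages, chunk_size):
    """Interval mode: equal chunks with a smaller final chunk if needed."""
    return [(start, min(start + chunk_size - 1, total_pages))
            for start in range(1, total_pages + 1, chunk_size)]


def from_cut_points(total_pages, cuts):
    """Interactive mode: each cut is the page number a split falls after."""
    pieces, start = [], 1
    for cut in sorted(set(cuts)):
        if 1 <= cut < total_pages:
            pieces.append((start, cut))
            start = cut + 1
    pieces.append((start, total_pages))
    return pieces


def parse_ranges(spec, total_pages):
    """Custom mode: parse a spec such as '1-3, 5, 7-9' into ranges."""
    ranges = []
    for part in spec.split(","):
        match = re.fullmatch(r"\s*(\d+)\s*(?:[-–]\s*(\d+))?\s*", part)
        if not match:
            raise ValueError(f"bad range: {part!r}")
        first = int(match.group(1))
        last = int(match.group(2) or first)
        if not 1 <= first <= last <= total_pages:
            raise ValueError(f"range out of bounds: {part!r}")
        ranges.append((first, last))
    return ranges


print(every_n(45, 10))
print(from_cut_points(10, [3, 7]))
print(parse_ranges("1-3, 5, 7-9", 10))
```

The first call reproduces the 45-page example above: four 10-page pieces and one 5-page piece.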

N-up imposition places multiple source pages onto each output sheet. Unlike tools that rasterise pages before imposing them, this tool uses PyMuPDF's show_pdf_page() — pages are placed as live PDF content. Text remains selectable and searchable in the output; images are not re-compressed.

Layouts

📄

2-up (2×1) Two source pages side by side on one landscape sheet.

📋

4-up (2×2) Four source pages in a 2-column, 2-row grid.

📋

6-up (2×3) Six source pages in a 2-column, 3-row grid.

📋

8-up (2×4) Eight source pages in a 2-column, 4-row grid.

📋

9-up (3×3) Nine source pages in a 3-column, 3-row grid.

📖

Booklet (saddle-stitch) Pages are re-ordered and paired for saddle-stitch binding — when folded and stapled, they read in correct sequence. A 16-page document becomes 4 sheets: sheet 1 has pages 16&1 on the outside and 2&15 on the inside, and so on. Output is 2-up on landscape sheets, ready for duplex printing and folding.
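The saddle-stitch ordering can be worked out with two pointers moving inward from the first and last page. This is a sketch of the standard technique rather than the tool's own code; padding the page count up to a multiple of four with blanks (shown as `None`) is an assumption.

```python
def booklet_sheets(page_count):
    """Return one (outside, inside) pair of page pairs per physical sheet."""
    padded = -(-page_count // 4) * 4  # round up to a multiple of 4
    pages = [p if p <= page_count else None for p in range(1, padded + 1)]
    sheets = []
    lo, hi = 0, padded - 1
    while lo < hi:
        outside = (pages[hi], pages[lo])         # e.g. 16 and 1
        inside = (pages[lo + 1], pages[hi - 1])  # e.g. 2 and 15
        sheets.append((outside, inside))
        lo += 2
        hi -= 2
    return sheets


for number, (outside, inside) in enumerate(booklet_sheets(16), start=1):
    print(f"sheet {number}: outside {outside}, inside {inside}")
```

For 16 pages this yields four sheets, starting with 16 and 1 on the outside and 2 and 15 on the inside, matching the example above.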

Output options

Output page size is selectable (A4, Letter, Legal, A3, or original). Orientation (portrait/landscape) is configurable independently. Pages are auto-scaled to fit each cell while preserving aspect ratio.
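The "fit each cell while preserving aspect ratio" step is plain geometry. The sketch below computes a grid cell and the centred rectangle a source page would occupy inside it; a rectangle like this could then serve as the target for a vector placement call. The function names and the row-major cell ordering are assumptions.

```python
def cell_rect(index, cols, rows, sheet_w, sheet_h):
    """Rectangle of the index-th cell, counting left to right, top to bottom."""
    cell_w, cell_h = sheet_w / cols, sheet_h / rows
    col, row = index % cols, index // cols
    x0, y0 = col * cell_w, row * cell_h
    return (x0, y0, x0 + cell_w, y0 + cell_h)


def fit_in_cell(page_w, page_h, cell):
    """Scale a page uniformly to fit the cell and centre it."""
    x0, y0, x1, y1 = cell
    scale = min((x1 - x0) / page_w, (y1 - y0) / page_h)
    width, height = page_w * scale, page_h * scale
    left = x0 + ((x1 - x0) - width) / 2
    top = y0 + ((y1 - y0) - height) / 2
    return (left, top, left + width, top + height)


# Fourth cell (index 3) of a 2x2 grid on an 800 x 600 sheet.
cell = cell_rect(3, 2, 2, 800, 600)
print(cell, fit_in_cell(200, 100, cell))
```

Taking the smaller of the two scale factors is what guarantees the page never overflows its cell in either direction.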

13 tools converting between PDF and common document, spreadsheet, presentation, image, and web formats. All conversions run via locally-installed open-source engines — LibreOffice, Ghostscript, PyMuPDF, ImageMagick.

PDF to other formats

📝

PDF → Word

Export to .docx, .odt, .rtf, or .txt via LibreOffice. A format fidelity indicator shows star ratings (out of 4) for each output format before you convert.

Try it →

📊

PDF → Excel

Extracts tables to .xlsx via LibreOffice. Best suited to PDFs where table structure is preserved in the source document.

Try it →

🖼️

PDF → Images

Renders pages to PNG or JPEG at 72–600 DPI. Select all pages or a custom range. JPEG quality slider available. Live DPI preview: page 1 is rendered in a canvas at the selected DPI immediately on upload, showing actual pixel dimensions before processing. Download as ZIP.

Try it →

🗂️

PDF → PDF/A

Convert to PDF/A-1b, PDF/A-2b, or PDF/A-3b for long-term archival (ISO 19005). Fonts are embedded, transparency is flattened, colour profiles are attached.

Try it →

🖨️

PDF → PDF/X

Convert to print-industry PDF/X (X-1a, X-3, or X-4) via Ghostscript with CMYK colour conversion, /prepress quality, and configurable render intent. All fonts embedded, colour data print-shop compliant.

Try it →

💽

PDF → PowerPoint

Each page is rendered at 150 DPI via PyMuPDF and placed as a full-bleed image on its own slide using python-pptx. Slide dimensions match the original page aspect ratio.

Try it →

🌐

PDF → HTML

Converts pages to a styled HTML document using PyMuPDF page.get_text("html"), preserving font, size, and positioned text spans. Produces a single self-contained .html file with print-friendly styling.

Try it →

📄

PDF → Markdown

Uses pymupdf4llm — the AI/LLM-optimised layout analysis engine built on PyMuPDF 1.27 + ONNX. Detects headings, paragraphs, tables, code blocks, and list structures. Produces clean .md ideal for RAG pipelines and LLM ingestion.

Try it →

Other formats to PDF

📝

Word → PDF

Convert .doc / .docx / .odt / .rtf / .txt via LibreOffice. A fidelity indicator shows expected quality for the file type you upload.

Try it →

📊

Excel → PDF

Convert .xls / .xlsx / .ods / .csv via LibreOffice. A sheet selector fetches sheet names from the uploaded file so you can choose which sheets to convert.

Try it →

💽

PowerPoint → PDF

Convert .ppt / .pptx / .odp via LibreOffice. A slide selector fetches slide titles from the uploaded file so you can choose which slides to include.

Try it →

🖼️

Images → PDF

Pack JPEG / PNG / WebP / BMP / TIFF / GIF images into a single PDF via ImageMagick. Drag thumbnails to reorder before generating.

Try it →

🌐

HTML → PDF

Upload a .html / .htm file or enter any public URL. Converted via Playwright/Chromium — full Chromium rendering engine captures modern CSS, web fonts, lazy-loaded images, and JavaScript-rendered content. Page size, orientation, and margins are configurable.

Try it →

Most PDF-to-text converters flatten the document into a single stream of characters, destroying the layout information that makes it useful — headings become plain lines, tables become scrambled text, multi-column layouts interleave content from adjacent columns. pymupdf4llm analyses the document layout before extracting text.

Engine: pymupdf4llm + ONNX

pymupdf4llm is the AI/LLM-optimised extraction layer built on PyMuPDF 1.27 with an ONNX inference backend. It analyses bounding-box positions, font sizes, column boundaries, and text flow to infer document structure before generating Markdown — the same technique used by state-of-the-art document AI pipelines.

Structural elements detected

# Headings — H1–H4 inferred from font size hierarchy 📝 Paragraphs — reflowed, not line-broken 📊 Tables — GitHub-flavored pipe syntax 💻 Code blocks — monospace font detection, triple-backtick fencing • Bullet and numbered lists 📰 Multi-column layout handling

LLM and RAG use cases

🧠

RAG pipelines Chunk structured Markdown by heading for higher-quality retrieval than chunking raw PDF text. LangChain and LlamaIndex consume Markdown natively with heading-aware splitters.

💬

Direct LLM ingestion Structured Markdown preserves the relationships between headings, sub-sections, and tables that a flat text dump destroys — reducing hallucination rates when models reason over document content.

📚

Documentation and knowledge bases Convert internal PDF documentation to Markdown for version control, wiki import, or static site generation.

When it works best: PDFs with a native text layer (not scanned). For scanned documents, run OCR first to generate a searchable PDF, then convert to Markdown.
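To illustrate the heading-based chunking mentioned under RAG pipelines, here is a small standard-library sketch that splits converted Markdown at headings up to a chosen level. It is not the splitter used by LangChain or LlamaIndex; the function name, the chunk dictionary shape, and the `max_level` parameter are all hypothetical.

```python
import re

FENCE = "`" * 3  # code-fence marker, built this way to keep the example self-contained


def chunk_by_heading(markdown, max_level=2):
    """Split Markdown into chunks that each start at a heading <= max_level."""
    chunks, title, lines = [], "", []
    in_fence = False

    def flush():
        # Emit the current chunk if it has a heading or any non-blank text.
        if title or any(line.strip() for line in lines):
            chunks.append({"heading": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore '#' lines inside code blocks
        match = None if in_fence else re.match(r"^(#{1,6})\s+(.*)$", line)
        if match and len(match.group(1)) <= max_level:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


sample = "# Title\nintro\n## A\ntext a\n### A1\nmore\n## B\ntext b"
for chunk in chunk_by_heading(sample):
    print(chunk["heading"], "->", repr(chunk["text"]))
```

Deeper headings (here `### A1`) stay inside their parent chunk, which keeps sub-sections together with the context they belong to.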

Convert PDF pages to individual image files using Poppler's pdftoppm renderer. A live preview shows exactly what the output will look like at the selected DPI — including the actual output pixel dimensions — before any server processing starts.

DPI options

📱

72 DPI — Screen Smallest files. Suitable for thumbnails, web previews, or email where exact reproduction is not needed.

🖥️

96 DPI — Standard screen Common web resolution. Matches the historical CSS reference pixel for 1:1 display on most monitors.

📸

150 DPI — Good quality (default) Recommended for most uses. Sharp enough for document review, form scanning, and presentation slides.

🖨️

300 DPI — Print quality Full print resolution. Use when images will be printed or when fine detail — small text, thin rules — must be preserved. ZIP files will be proportionally larger.

Output formats

PNG — Lossless. All detail preserved. Recommended for technical documents, forms, and anything with sharp edges or small text. JPEG — Lossy with a configurable quality slider (50–100%, default 85%). Considerably smaller than PNG, even at quality 80 and above. Recommended for photo-heavy PDFs or when file size is critical.

Live DPI preview with file size estimate

As soon as a file is uploaded, page 1 is rendered in the browser at the selected DPI. The size estimate card shows the actual output pixel dimensions and an estimated file size per page — for example, "1240 × 1754 px per page — ~1.5 MB per page". PNG estimates use ~0.7 bytes/pixel (lossless document content). JPEG estimates scale with the quality slider: at quality 85 the multiplier is ~0.14 bytes/pixel — considerably smaller than PNG. Changing the DPI, format, or quality slider re-runs the estimate immediately so you know exactly how large the ZIP will be before any server processing starts.
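The estimate itself is simple arithmetic: page size in points divided by 72 gives inches, multiplied by DPI gives pixels, multiplied by a bytes-per-pixel factor gives size. The sketch below uses the two multipliers quoted above; how the JPEG factor scales away from quality 85 is not stated, so the linear scaling here is an assumption, as is the function name.

```python
def estimate_output(page_w_pt, page_h_pt, dpi, fmt="png", quality=85):
    """Return (width_px, height_px, estimated_bytes) for one rendered page."""
    width = round(page_w_pt / 72 * dpi)   # 72 points per inch
    height = round(page_h_pt / 72 * dpi)
    if fmt == "png":
        bytes_per_pixel = 0.7
    else:
        # 0.14 at quality 85 is from the text; linear scaling is assumed.
        bytes_per_pixel = 0.14 * quality / 85
    return width, height, width * height * bytes_per_pixel


# A4 page (595.28 x 841.89 pt) at 150 DPI.
w, h, size = estimate_output(595.28, 841.89, 150)
print(f"{w} x {h} px per page, ~{size / 1_000_000:.1f} MB per page")
```

For an A4 page at 150 DPI this gives 1240 × 1754 px and roughly 1.5 MB as PNG, which matches the example card text.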

Page selection and output

All pages or a custom range (e.g. 1-3, 5, 7-10). Output is a ZIP archive containing one image per page, named sequentially — page-001.png, page-002.png, etc.

PDF/X is the ISO standard for print exchange (ISO 15930). It constrains the PDF feature set to what commercial presses can reproduce reliably: all fonts embedded, no unmanaged RGB (PDF/X-1a permits no RGB at all), and transparency flattened in the older variants (X-1a and X-3). This tool converts via Ghostscript's /prepress quality settings with configurable render intent.

PDF/X standards

🖨️

PDF/X-3 (recommended) ICC colour management allowed. RGB content is converted to DeviceCMYK using the selected render intent. Widest acceptance among print shops — the default for most commercial print submissions.

🖨️

PDF/X-1a CMYK and spot colours only — no ICC profiles, no RGB content of any kind. The strictest standard. Required by some newspaper and magazine publishers for predictable ink coverage.

🖨️

PDF/X-4 Extends PDF/X-3 to allow live transparency and layers. Modern print workflows that support PDF/X-4 handle transparency natively — preserving edge quality on gradients and drop-shadows without flattening artefacts.

Render intent — controls RGB → CMYK mapping

📋

Relative Colorimetric (default) Standard press intent. Clips out-of-gamut colours to the nearest reproducible value. White-point adapted to the output profile. Best for most business documents.

📸

Perceptual Compresses the entire colour gamut to fit within CMYK, preserving relative colour relationships. Out-of-gamut colours are not clipped — the whole image shifts slightly to maintain harmony. Best for photographs.

📊

Saturation Prioritises vivid, saturated colours over accuracy. Best for business graphics, charts, and presentations where colour impact matters more than fidelity.

🧪

Absolute Colorimetric No white-point adaptation — reproduces colours exactly as defined in the source profile, including paper-white simulation. Used for proofing and colour matching against a specific reference.

What the conversion does

All RGB images are converted to DeviceCMYK. All fonts are embedded and subsetted. Transparency is flattened (PDF/X-1a and X-3). Ghostscript's /prepress quality settings are applied. The resulting file meets the ISO 15930 constraint set for the selected variant and is accepted by commercial RIP workflows.

Nine tools covering PDF, Office, and universal file forensics & analysis, encryption, decryption, permanent content removal, watermarking, and cryptographic signing. All run server-side on local engines — nothing is sent to a third-party service.

🔬

PDF Forensics Scanner

Forensic analysis across 47 independent engines — structural, behavioural, provenance, ML anomaly detection with SHAP, local threat intelligence (URLhaus · MalwareBazaar · ThreatFox — 6.4M+ indicators, no external APIs), AcroForm field forensics, PDF signature forensics, phishing detection, embedded file analysis, and TLSH campaign attribution. MITRE ATT&CK mapping on every indicator. Results across 24 analysis tabs including 🤖 AI Forensic Report — Qwen 2.5 1.5B Instruct synthesises all 47 engine outputs into a structured verdict, with semantic context from live engine data: actual phishing phrases, JavaScript call targets, embedded payload strings, FormCalc code, and SHAP feature explanations fed directly to the model. Verdict is exec-vector-aware (high score with no execution vector caps at LIKELY_CLEAN). MALICIOUS verdict auto-labels the record as 'malicious'; CLEAN/LIKELY_CLEAN as 'benign'; SUSPICIOUS is not labeled (ambiguous). Triggers ML retrain at threshold — no user input needed. 9-mode sanitize: flatten to images, strip active content, remove JavaScript, remove embedded files, remove XFA, remove rich media, normalize structure, flatten forms, or strip metadata. The most technically deep tool on the site — see the deep dive.

Try it →

🗂️

Office Document Forensics Scanner

Forensic analysis of Word, Excel, PowerPoint, Outlook, Access, and Visio files across 23 independent engines — container integrity, VBA macro extraction (olevba · mraptor · pcodedmp), Excel 4.0 XLM/DDE chain analysis, OLE compound structure inspection, IOC extraction (URLs · IPs · domains · registry keys · base64 payloads), ClamAV antivirus, YARA rule engine, offline threat intelligence, isolation chamber detonation (unshare + strace), entropy anomaly detection, OOXML relationship forensics (remote template injection), metadata provenance, NLP social engineering classifier, intelligent cross-engine correlation, and AI forensic report (Qwen 2.5 · MITRE ATT&CK · verdict · confidence). 4-mode sanitize. Zero data retention.

Try it →

🔬

Universal File Forensics Scanner

Forensic analysis of all file types — images, audio, video, archives, executables, scripts, databases, fonts, certificates, and network captures — across 23 independent engines: file identification (magic bytes, MIME, polyglot detection), entropy & compression anomaly, metadata forensics (EXIF/GPS/ID3), IOC & string extraction (URLs, IPs, domains, Base64, reverse shell patterns), binary artifact analysis + XOR brute-force deobfuscation (255-key single-byte XOR decode over 512 KB — surfaces hidden C2 URLs, shellcode loaders, and obfuscated API names invisible to raw string extraction), PE executable analysis (imports, sections, anti-debug, overlay), ELF binary analysis (rootkit indicators, dangerous syscalls), archive inspection (zip bombs, path traversal, double-extension), image forensics & steganography (LSB chi-square, SVG JavaScript injection, PNG chunk abuse), script & code analysis (reverse shells, AMSI bypass, PHP webshells, obfuscation), watermark detection (EXIF copyright fields, ID3/XMP/IPTC tags, alpha channel overlay extraction, OCR burned-in text), six-layer isolation chamber detonation (strace syscalls · ltrace library calls · in-memory YARA dump analysis for PE/ELF/shellcode/Meterpreter/CobaltStrike payloads that never touch disk · fake DNS+HTTP network capture inside namespace · CPU/VM fingerprint masking via /proc/cpuinfo bind-mount · LibreOffice headless with macro security disabled for .doc/.docm/.xls/.xlsm/.xlsb/.ppt/.pptm/.odt · Playwright + Chromium with interaction simulation for HTML/SVG · faketime clock freeze at 2023-06-15), Windows execution layer (Wine 9.0) (.exe/.dll/.msi/.bat/.cmd/.ps1/.vbs/.vbe/.hta detonation in Linux namespace with same six-layer instrumentation · registry persistence detection across Run/RunOnce/Winlogon/Services keys · anti-VM/debugger evasion detection · suspicious process spawn tracking · post-run registry diff), real Windows micro-VM detonation (KVM/QEMU Windows 10 — genuine Windows kernel · 
PowerShell-monitored 30 s execution · process spawn tracking with 19 suspicious-process flags · netstat network connection capture · registry persistence key diffing · completely isolated, no VM network — triggers only on CRITICAL/HIGH risk samples), ClamAV, YARA (20 rules), offline threat intelligence, campaign intelligence (named cluster detection — e.g. PHANTOM-KRAKEN-07 · malware family classification: CobaltStrike/Emotet/QakBot/Meterpreter/Ransomware/Mirai/AsyncRAT/RedLine/Cryptominer/Shellcode · 90-day activity trend tracking · D3 force-directed intelligence graph · Campaign Dashboard at /tools/campaigns), intelligent correlation, and AI forensic report (Qwen 2.5 · MITRE ATT&CK · verdict · confidence). Zero data retention.

Try it →

🔬

File Fingerprint Comparator

Upload two PDF or Office documents to compare their structural fingerprints and security profiles side by side. Both files are scanned in parallel through all forensic engines, then diffed across 25+ security features — encrypted status, ClamAV, YARA matches, threat intel hit, macro presence, IOC counts, risk score, sandbox behaviour, and more — to produce a similarity score, variant verdict, and a differences-first comparison table. Useful for detecting malware variants, comparing suspicious attachments, or verifying document integrity. Supports cross-format comparisons (e.g. PDF vs Word). Zero data retention.

Try it →

🛡️

Protect PDF

Two modes: AES-256-CBC server-side with granular permissions, or client-side post-quantum encryption with 31 algorithms. In PQC mode the server never sees your plaintext. See the deep dive.

Try it →

🔓

Unlock PDF

Remove password protection (owner password required). Detects encryption type client-side before upload — shows AES-256 or PQC badge. PQC bundles (.pqcpdf) are auto-detected and routed to the quantum-safe decryption panel.

Try it →

⬛

Redact PDF

Two modes: text-pattern redaction (multi-pattern list, case sensitivity, whole-word matching) or mouse-drawn region redaction on a canvas preview. Redaction is permanent — content is erased server-side, not just covered. Includes 🤖 AI Redaction Suggestions — Qwen 2.5 1.5B analyses extracted text and proposes redaction patterns by PII category (names, emails, IDs, financial data, and more) with one-click add to the redaction list.

Try it →

💧

Add Watermark

Stamp text watermarks with 8-position placement, opacity, rotation, font size, font style, and hex colour. Apply to all, odd, even, or custom page ranges. Live canvas preview updates in real time as you adjust settings.

Try it →

✍️

Sign PDF & PAdES

Four signature modes: draw, type, upload image, or invisible PAdES cryptographic signature. All modes support RSA-2048 certificates — auto-generated or your own .p12. See the deep dive.

Try it →

PDF is the most abused document format for delivering malware. This forensics scanner runs 47 independent engines covering every investigative dimension — byte-level signatures, structural integrity, sliding-window entropy, provenance analysis, dynamic behavioural tracing, machine learning anomaly detection with SHAP explanations (IsolationForest + RandomForest + LightGBM), multi-parser differential analysis across six independent parsers, fully offline threat intelligence (URLhaus · MalwareBazaar · ThreatFox · FeodoTracker · OpenPhish — 6.4M+ indicators, zero external API calls), PDF digital signature forensics, phishing detection, AcroForm field forensics (JS triggers on field events, SubmitForm exfiltration targets, hidden fields, password fields, /AA hooks, calc-order chain exploitation), embedded file analysis (magic-byte classification, VBA macro detection, full ZIP archive content listing, nested PDF detection, PowerShell content analysis), PDF 2.0 (ISO 32000-2) structure analysis (Associated Files /AF, unencrypted-wrapper / encrypted-payload detection, document-part hierarchy, tagged-PDF namespaces), and TLSH + pHash + JS-fingerprint campaign attribution. Every indicator is tagged with MITRE ATT&CK technique IDs. 
Results are presented across 24 analysis tabs: Summary, Threats, Score, a per-engine two-panel browser (click any of the 47 engines for its full findings + structure fields), URLs, Streams, ML/SHAP, Sandbox, Threat Intel, MITRE, Differential Parsing, Polyglot, Phishing, Embedded Files, Signature Forensics, Revision History, Annotations, Metadata, XFA FormCalc, Action Graph, Deep Forensics (engines 34–43), 🤖 AI Forensic Report (Qwen 2.5 1.5B Instruct synthesises all 47 engine outputs into threat verdict, confidence rating, executive summary, key findings, MITRE technique grid, and recommended actions — fully local, structured JSON output, ~15–25 s on CPU), Raw JSON, and a Raw Forensics view showing decoded stream content, JavaScript sources, all indicator contexts, and the complete structure dump.

File bytes never leave the server — no hash or data is sent to any external service at any point. Results are forensic-grade: each indicator is documented with engine source, severity, and contextual explanation.

File size limit: 10 MB. Threat intelligence research (MalwareBazaar corpus, HP Wolf Security telemetry, Contagio malware archive) consistently shows that real-world malicious PDFs are typically under 5 MB: exploit-kit payloads average 200 KB–1 MB, phishing lures 300 KB–4 MB, and dropper PDFs run up to 8 MB. The 10 MB cap covers every known threat class, with 2× headroom over the typical 5 MB ceiling. Scanning larger files requires enterprise deployment.

After a scan, a 9-mode sanitize panel appears. Basic: Flatten to Images (PyMuPDF raster rebuild — maximum safety, destroys all active content) · Strip Active Content (Ghostscript -dSAFER — moderate safety, text usually retained). Advanced — Surgical Cleaning: Remove JavaScript (/JS /AA nullified, layout preserved) · Remove Embedded Files (all /EmbeddedFile attachments) · Remove XFA Forms (/XFA definitions) · Remove Rich Media (/RichMedia /Movie /Sound) · Normalize Structure (qpdf rebuild — collapses incremental updates, disables object streams, decodes filter chains) · Flatten Forms (PyMuPDF bake() renders AcroForm widgets to static content) · Strip Metadata (/Info + XMP stream). All modes produce a new file; the original is never modified.

The 47 engines

Engine 1 Structure Validator Validates fundamental file structure before any content analysis: %PDF- header position (flagged if beyond byte offset 1,024), %%EOF marker count (>2 indicates incremental update stacking or exploit layering), xref table depth (>3 flagged), obfuscation codec count (ASCIIHexDecode / ASCII85Decode / LZWDecode >3 flagged on non-image streams — image XObjects are excluded since they legitimately use these codecs as standard output from PDF generators such as ReportLab and Ghostscript), and excessive filter chains (>120 /Filter entries). Proportional incremental injection: flags if the final revision adds >10 new objects compared to prior revisions — a disproportionately large final update is a strong indicator of post-signing payload injection. Linearized first-page object override detection — a set-difference check (new OIDs = incremental − baseline) catches added objects but silently passes redefined objects carrying the same ID. This engine computes the set intersection (redefined = baseline ∩ incremental) to catch object substitution. For linearized PDFs the /Linearized dictionary's /O field names the Page 1 primary object used by renderers for fast first-page display. If /O is in the redefined set and the incremental bytes contain /JavaScript, /AA, or /OpenAction, severity is Critical — renderers fast-pathing via the hint table display the injected content on first render without re-evaluating the override (MITRE T1036 + T1027). PDF 2.0 (ISO 32000-2) structures — records the /DPartRoot document-part hierarchy (§14.12) and tagged-PDF /Namespaces (§14.7.4); neutral structure, the latter part of the accessibility/semantic layer reality-drift attacks target. Collects: PDF version, linearised flag, binary comment presence.
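A few of Engine 1's checks can be approximated on raw bytes to show the difference between the set-difference and set-intersection tests. This is a simplified sketch, not the engine: it treats everything after the second-to-last %%EOF as the final revision (which is not accurate for linearized files without updates), finds objects with a regular expression that can also match inside binary streams, and uses hypothetical names throughout.

```python
import re

OBJ_RE = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")
ACTIVE_KEYS = (b"/JavaScript", b"/AA", b"/OpenAction")


def object_ids(data):
    """Collect object numbers from 'N G obj' headers found in raw bytes."""
    return {int(m.group(1)) for m in OBJ_RE.finditer(data)}


def structure_report(data):
    header_at = data.find(b"%PDF-")
    eofs = [m.end() for m in re.finditer(rb"%%EOF", data)]
    # Split the file into "prior revisions" and "final revision".
    split = eofs[-2] if len(eofs) >= 2 else len(data)
    baseline, incremental = data[:split], data[split:]
    base_ids, inc_ids = object_ids(baseline), object_ids(incremental)
    return {
        "header_too_late": header_at == -1 or header_at > 1024,
        "eof_count": len(eofs),
        "stacked_updates": len(eofs) > 2,
        "added": sorted(inc_ids - base_ids),      # set difference: new objects
        "redefined": sorted(inc_ids & base_ids),  # set intersection: overrides
        "active_keys_in_update": any(k in incremental for k in ACTIVE_KEYS),
    }


sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< >>\nendobj\ntrailer\n%%EOF\n"
    b"2 0 obj\n<< /OpenAction 5 0 R >>\nendobj\n"
    b"5 0 obj\n<< /S /JavaScript >>\nendobj\n%%EOF\n"
)
report = structure_report(sample)
print(report)
```

In the sample, object 5 shows up under `added`, while the silently replaced object 2 only appears under `redefined`, which is the case a difference-only check would miss.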

Engine 2 Raw Pattern Scanner Scans raw file bytes for 45+ known-malicious byte sequences in six categories — JavaScript execution: /JavaScript, /JS, /Launch, /OpenAction, /AA — remote & form actions: /GoToR, /SubmitForm, /ImportData, /Rendition, /Hide — embedded & rich content: /EmbeddedFile, /RichMedia, /XFA, /AcroForm — obfuscation: /ObjStm, /JBIG2Decode, /ASCIIHexDecode — dangerous JS APIs: unescape(), eval(), String.fromCharCode, collab.getIcon (CVE-2009-0927), util.printf (CVE-2008-2992), media.newPlayer (CVE-2009-4324), Collab.collectEmailInfo (CVE-2007-5659) — shellcode: %u9090 (Unicode NOP sled), %u4141, %u0c0c%u0c0c heap-fill patterns. Evasion patterns: /Trans with JavaScript (page-transition trigger used to execute JS while evading action-based detection); /OpenAction hidden inside an AcroForm /DR indirect reference (indirect variant bypasses naive dictionary-key scanners). Each match records a context snippet (20 bytes before, 60 bytes after) for the Threats tab.
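The raw-byte scan with context capture described for Engine 2 is straightforward to sketch. The pattern list below is a small subset chosen for illustration, and taking the 60 trailing bytes from the end of the match (rather than its start) is an assumption.

```python
PATTERNS = [
    b"/JavaScript", b"/JS", b"/Launch", b"/OpenAction", b"/AA",
    b"eval(", b"unescape(", b"%u9090",
]


def scan_raw(data, patterns=PATTERNS, before=20, after=60):
    """Find every pattern occurrence and keep a context snippet around it."""
    hits = []
    for pattern in patterns:
        start = 0
        while (pos := data.find(pattern, start)) != -1:
            end = pos + len(pattern)
            hits.append({
                "pattern": pattern.decode("ascii"),
                "offset": pos,
                "context": data[max(0, pos - before):end + after],
            })
            start = pos + 1  # continue after this hit to catch repeats
    return sorted(hits, key=lambda hit: hit["offset"])


data = b"x" * 30 + b"/OpenAction" + b"y" * 100
hits = scan_raw(data)
print(hits[0]["pattern"], hits[0]["offset"], len(hits[0]["context"]))
```

Because the scan works on undecoded file bytes, it cannot see keywords inside compressed streams; that gap is what the stream decompressor in Engine 3 covers.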

Engine 3 Stream Decompressor & Content Inspector Opens every object in the xref graph (up to 6,000 objects) via PyMuPDF and decompresses each stream via doc.xref_stream(xref) — catching JavaScript and shellcode hidden inside compressed objects that raw-byte scanners miss entirely. Calculates entropy using 512-byte sliding windows; any window exceeding 7.6 bits/byte on non-image streams flags encrypted, packed, or obfuscated payloads (detects shellcode splices that average out in whole-stream analysis). Decompression bomb detection flags streams with >500:1 compression ratio; image XObjects (/Subtype /Image, DCT, JPX, CCITT, JBIG2) are excluded from both the entropy check and the decompression bomb check — uniform-fill or solid-colour images legitimately achieve extreme compression ratios at near-zero entropy. Scans decompressed content for 14 JS/shellcode signatures. Returns up to 40 streams with xref number, entropy, type, and matched patterns.
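The two numeric checks in Engine 3, sliding-window entropy and compression ratio, can be sketched with the standard library. The 512-byte window, the 7.6 bits/byte threshold, and the 500:1 ratio come from the description; the 256-byte step, the function names, and the capped-decompression approach are assumptions made for this example.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(chunk):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum(n / total * math.log2(n / total) for n in Counter(chunk).values())


def max_window_entropy(data, window=512, step=256):
    """Highest entropy seen in any window sliding across the data."""
    if len(data) <= window:
        return shannon_entropy(data)
    return max(shannon_entropy(data[i:i + window])
               for i in range(0, len(data) - window + 1, step))


def exceeds_ratio(compressed, limit=500):
    """True if zlib data inflates to more than `limit` times its size."""
    cap = len(compressed) * limit + 1
    # Cap the output so a real decompression bomb is never fully inflated.
    inflated = zlib.decompressobj().decompress(compressed, cap)
    return len(inflated) >= cap


# A uniform 512-byte splice hidden inside otherwise repetitive content.
stream = b"A" * 4096 + bytes(range(256)) * 2 + b"A" * 4096
print(round(shannon_entropy(stream), 2), max_window_entropy(stream))
print(exceeds_ratio(zlib.compress(b"\0" * 1_000_000)))
```

The sample shows why windows matter: the whole-stream entropy stays low because the filler dominates, while the window that lands on the splice reaches the maximum of 8 bits per byte.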

Engine 4 Object Graph Traversal Maps parent/child object relationships across the xref graph and flags abnormal nesting depth, circular references, and shadow object trees.

Engine 5 URL Extractor URL extraction from all object streams — flags known malicious domains and suspicious URL patterns. Detects data: URI schemes (data:text/html, data:application/*) that deliver payloads without network requests, bypassing URL-reputation filters. Also detects hex-encoded URLs in JavaScript (\x68\x74\x74\x70 = "http") used to hide C2 addresses from static scanners.

Engines 6–9 Metadata / Font / CVE / Stats Engine 6 (Metadata Analyzer): Extracts and cross-validates all PDF Info dictionary and XMP metadata fields. Creation vs. modification timestamp delta analysis: a gap of 0–5 seconds between CreationDate and ModDate indicates scripted, automated document generation, a common characteristic of malware factory pipelines. Engine 7 (Font Analyzer): Checks for unusual font names, encoding flags, and embedding status. Font objects are a common exploit carrier: malformed font tables trigger heap corruption in viewer rendering engines (e.g. CVE-2010-2883, Type1C font vulnerabilities). JBIG2 exploit detection follows indirect /FontFile* references: some exploits store the JBIG2-filtered stream on a separate object pointed to by the font dict rather than embedding the filter directly, and both forms are caught. Engine 8 (CVE Pattern Matcher): Byte-level CVE signature matching against known exploit patterns for CVE-2009-0658 (JBIG2), CVE-2009-4324 (media.newPlayer JS via /OpenAction), CVE-2010-2883 (font), and other historically weaponised PDF CVEs. Engine 9 (Structural Statistics): Object-to-page ratio heuristic: >50 objects per page is anomalous and flags potential exploit payload inflation. Zero-page detection: a PDF with 0 pages is treated as a pure exploit payload with no legitimate document content (critical severity).
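Engine 6's timestamp delta check might look roughly like this in Python. The sketch parses only the leading D:YYYYMMDDHHmmSS portion of a PDF date string and assumes both dates carry the same timezone offset. The function names are hypothetical.

```python
import re
from datetime import datetime

_PDF_DATE = re.compile(r"D:(\d{14})")


def parse_pdf_date(value: str):
    """Parse the D:YYYYMMDDHHmmSS prefix; the timezone suffix is ignored."""
    match = _PDF_DATE.match(value)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def looks_scripted(creation_date: str, mod_date: str, max_gap_seconds=5):
    """True when ModDate follows CreationDate by 0 to max_gap_seconds."""
    created = parse_pdf_date(creation_date)
    modified = parse_pdf_date(mod_date)
    if created is None or modified is None:
        return False  # unparseable dates are left to other checks
    gap = (modified - created).total_seconds()
    return 0 <= gap <= max_gap_seconds
```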

Engine 10 ExifTool Forensics Deep EXIF/XMP metadata forensics via ExifTool 12. Detects metadata inconsistencies, hidden authoring tool footprints, GPS data, and fields that conflict with PDF structure — useful for provenance analysis.

Engine 11 — qpdf Structural Integrity qpdf binary-level structural analysis — detects object stream corruption, incorrect xref table offsets, overlapping object definitions, and linearisation anomalies that indicate deliberate file manipulation. PDF 2.0 unencrypted-wrapper / encrypted-payload detection (ISO 32000-2 §7.6.7) — flags documents where a clear cover page carries an /AF file whose /AFRelationship is /EncryptedPayload (optionally with a /Collection wrapper view); the real content is sealed inside an encrypted attachment no static engine can read, graded on the tampering axis as a deliberate content-hiding construct.

Engine 12 — Signatures YARA Rule Matching (YARA 4.5) 24 custom YARA rules targeting PDF-specific exploit patterns, obfuscated JavaScript payloads, embedded binary signatures, CVE-specific byte patterns (CVE-2009-0658, CVE-2008-2992, CVE-2010-1240, CVE-2018-4990, CVE-2021 XFA, CVE-2024-41869 UAF, CVE-2024-45112 type confusion), PowerShell stager patterns, Cobalt Strike beacon signatures, and multi-stage dropper structures. External .yar rule files are loaded from a configured rules directory. Rules cover patterns not caught by byte-string matching alone.

Engine 13 — Deep Object PeePDF + pikepdf Analysis PeePDF (v0.4) deep object analysis — decodes compressed object streams (/ObjStm), reconstructs the internal object graph, and analyses suspicious cross-references, duplicate object definitions, and object version stacking. Supplemented by pikepdf (a modern libqpdf-based Python parser) which independently extracts the JavaScript Names tree, counts embedded file attachments, detects per-page /AA triggers, and provides a second independent indicator set. Crash/timeout behaviour of each parser is tracked separately.

Engine 14 — Sandbox Dynamic Behavioural Sandbox — 6 Renderers The PDF is rendered through six independent engines — Ghostscript, MuPDF, Poppler, LibreOffice Draw, Chromium PDFium, and pdf.js/Node — each inside an isolated Linux namespace via unshare --net --pid --mount with all syscalls captured by strace. The network namespace makes any connect() or sendto() syscall definitively malicious — there is no legitimate reason for a PDF renderer to initiate network contact in an isolated namespace. Detects: outbound C2 beacons, anonymous executable memory mappings (shellcode staging), unauthorised process spawning (code execution), filesystem escape attempts, DNS lookups, and fork-bomb patterns. PDFium (Playwright/Chromium) covers the Chrome browser attack surface — where most users now open PDFs. pdf.js/Node covers the Firefox/Mozilla rendering engine. LibreOffice Draw exposes OLE macro and embedded content paths. When all renderers complete without triggering, a confirmed clean result is explicitly surfaced so analysts know the sandbox ran successfully.

Engine 15 — Signatures ClamAV 1.4+ ClamAV signature scanning against 700,000+ malware signatures via local clamdscan daemon. The clamav user is a member of the www-data group so the daemon reads upload files directly — no --fdpass needed, no fallback to the slow single-process scanner. The only engine that makes external calls — and only for signature database updates via clamav.net, never for file analysis.

Engine 16 — ML ML Intelligence Engine Extracts a 38-dimensional feature vector from all preceding engine outputs. Applies four models: IsolationForest (unsupervised anomaly detection — works from scan 1, no labelled data required), RandomForest classifier (supervised — activates at ≥10 labelled samples; bootstrap pseudo-labeling supplements the set when below threshold but ≥1 malicious label exists), LightGBM (gradient-boosted ensemble with class-imbalance weighting, RF+LightGBM scores are averaged), and Bayesian contextual scoring. SHAP explanations use TreeExplainer for RandomForest/LightGBM and KernelExplainer (nsamples=50) for IsolationForest. Model drift detection warns when models have not been retrained in >30 days. Feature vectors and auto-inferred labels are persisted to PostgreSQL. Models retrain every 30 minutes via cron. No file content, filename, hash, or PII stored.

Engine 17 — Differential Multi-Parser Comparison (6 parsers, 8 dimensions) Runs MuPDF (mutool), Poppler (pdfinfo/pdfdetach), Ghostscript, qpdf, pdfminer, and Node.js pdf.js across 8 structural dimensions simultaneously: page count, object count, JavaScript presence, PDF version, encryption status, AcroForm presence, embedded file count, and OpenAction. Seven distinct discrepancy checks (Critical/High/Medium) flag hidden objects, shadow object trees, or deliberate parser-confusion exploits. Page delta scoring is weighted by magnitude (up to +70/critical for >50 page delta). A hard 30-second SIGALRM wraps the engine; pdfminer runs in a subprocess with timeout 6 for guaranteed hard-kill.

Engine 18 Polyglot / Binary Detector Two-layer polyglot detection. File-level polyglot — checks whether a recognised format magic signature (FF D8 FF JPEG, PK\x03\x04 ZIP, PNG, GIF, Gzip, OLE, RIFF) appears in the bytes before the %PDF- header. ISO 32000 §7.5.2 NOTE 1 allows arbitrary bytes before %PDF-; attackers use JPEG+PDF and ZIP+PDF polyglots to bypass format-based routing in email security gateways — the gateway classifies the file as a JPEG image and skips PDF scanning entirely. Stream-level polyglot — scans every stream (raw and decompressed) for embedded executable magic: ZIP, Windows PE (MZ + validated PE header via e_lfanew pointer), Linux ELF (with class-byte validation), Mach-O, Java class, OLE/CFBF, RAR, 7-Zip, embedded PostScript, HTML/XHTML, WebAssembly (\x00asm), and Python bytecode. Mid-stream scanning at non-zero offsets catches payloads prefixed by junk bytes. JAR files are detected via ZIP + META-INF/MANIFEST.MF.
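The file-level half of the polyglot check can be sketched with nothing but byte matching. The magic values below are the standard signatures for each format. The 1 MB header search limit, the choice to test only the very start of the file, and the function name are assumptions for illustration.

```python
# Standard file signatures for the formats named above.
MAGIC_PREFIXES = {
    "JPEG": b"\xff\xd8\xff",
    "ZIP": b"PK\x03\x04",
    "PNG": b"\x89PNG\r\n\x1a\n",
    "GIF": b"GIF8",
    "Gzip": b"\x1f\x8b",
    "OLE": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
    "RIFF": b"RIFF",
}


def file_level_polyglot(data: bytes, search_limit=1024 * 1024):
    """Return the format whose magic precedes %PDF-, or None."""
    header_at = data.find(b"%PDF-", 0, search_limit)
    if header_at <= 0:
        return None  # no header found, or nothing precedes it
    prefix = data[:header_at]
    for name, magic in MAGIC_PREFIXES.items():
        if prefix.startswith(magic):  # assumption: magic sits at byte 0
            return name
    return None
```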

Engine 19 — AST JavaScript AST Deobfuscator (Acorn) JavaScript extracted from /JS literals and keyword-bearing compressed streams (with Unicode \uXXXX pre-processing) is parsed into an AST via Acorn (Node.js, ECMAScript 2022) and walked for obfuscation constructs invisible to pattern-matching: eval() chains, String.fromCharCode() arrays (shellcode staging), unescape() decode pipelines, large numeric arrays (heap spray), new Function() dynamic construction, atob()/btoa() base64 decode chains, and property accessor obfuscation — including the split-string concatenation technique (window["ev"+"al"]) used to evade static keyword detection. Performs 6 iterative deobfuscation passes (each pass feeds its output into the next) to unravel multi-layer obfuscation chains. Also detects anti-sandbox patterns (app.platform, screen.width, navigator.*) and executes multi-stage eval chains in a Node.js VM sandbox to decode obfuscated payloads statically hidden from pattern-matching.

Engine 20 — TI Threat Intelligence (URLhaus · MalwareBazaar · ThreatFox — local, no external APIs) Queries four local PostgreSQL databases — no external API calls per scan, no rate limits, sub-millisecond lookups. URLhaus hashes (5M+ SHA-256 malware payload hashes), URLhaus URLs (70K+ malicious URLs, refreshed every 30 min), MalwareBazaar (1M+ confirmed malware samples with family labels), ThreatFox IOCs (176K+ hashes, URLs, and domains with malware families). All four feeds are downloaded in bulk and kept current by cron. SHA-256 hash matches are treated as definitive: they raise a critical indicator, auto-label the scan as malicious, and feed ML retraining. Domain-level matches (URL / C2 host lookups) raise a high indicator but do not auto-label — major trusted hosting platforms (GitHub, Google, Microsoft, Dropbox, etc.) are allowlisted to prevent false positives from PDFs that legitimately link to those domains. Every indicator is mapped to a MITRE ATT&CK technique ID.

Engine 21 — SigForensics PDF Signature Forensics (pyhanko) Deep forensics on PDF digital signatures across six dimensions. ByteRange coverage integrity — per ISO 32000 §12.8.1, offsets are measured from the %PDF- header, not absolute byte 0 (files may carry bytes before %PDF-): o1 must be 0 (coverage starts at %PDF-), both segments within file bounds, inner gap between segments must contain only the /Contents blob (extra bytes are an unsigned injection zone), and o2+l2 must reach at least the %%EOF marker. Shadow document detection vs full-save rewrite detection — when o2+l2 < %%EOF, the engine inspects the unsigned trailing region: execution vectors (/JavaScript, /Launch) indicate a shadow attack (CVE-2019-14980 class); xref/trailer/startxref structure without active content indicates a full-save rewrite — a PDF viewer rebuilt the entire file with new byte offsets, invalidating all existing signatures while leaving the visual signature appearance intact (this is the pattern produced by DocuSign and similar tools when a "Save As" copy is created). Both are flagged high with the distinction explicitly reported. /Contents structural validation — all-zero blobs, sub-32-byte blobs, and missing DER SEQUENCE header (0x30) all indicate a structurally signed but cryptographically empty document. SubFilter deprecation — adbe.pkcs7.sha1 (SHA-1 collision risk), adbe.x509.rsa_sha1 (no certificate chain), and unknown SubFilters (accepted without validation by permissive readers). Weak digest algorithm detection — MD5 and SHA-1 enable chosen-prefix collision attacks applicable to shadow document creation. Post-signature object injection also detected — execution vectors added in incremental updates after signing.
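The four ByteRange coverage rules can be expressed as plain arithmetic. In this sketch all offsets are relative to the %PDF- header, file_size is the byte count from that header to the end of the file, eof_end is the offset just past the final %%EOF marker, and contents_len is the length of the /Contents token. These parameter names and the finding codes are assumptions, not pyhanko's API.

```python
def check_byte_range(byte_range, file_size, eof_end, contents_len):
    """Return a list of coverage findings for [o1, l1, o2, l2]."""
    o1, l1, o2, l2 = byte_range
    findings = []

    if o1 != 0:
        findings.append("start_not_at_header")  # coverage must start at %PDF-
    if min(o1, l1, o2, l2) < 0 or o1 + l1 > file_size or o2 + l2 > file_size:
        findings.append("out_of_bounds")        # a segment leaves the file
    if o2 - (o1 + l1) != contents_len:
        findings.append("gap_mismatch")         # gap holds more than /Contents
    if o2 + l2 < eof_end:
        findings.append("unsigned_tail")        # bytes after the signed range
    return findings
```

An unsigned_tail finding is where the shadow-attack versus full-save-rewrite distinction would then be made, by inspecting what the trailing bytes contain.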

Engine 22 — Phishing Phishing Detection (urgency · brand impersonation · credential harvesting · QR codes) Multi-vector phishing analysis: 30+ urgency/deception phrases; brand impersonation keywords (Microsoft, Apple, PayPal, DocuSign, Adobe, DHL, IRS, and others); AcroForm credential harvesting — SubmitForm action + password-type field detection; QR code extraction and decoding via zbarimg with suspicious domain scoring. High urgency phrase density combined with brand impersonation scores as high-confidence phishing.

Engine 23 — EmbeddedFiles Embedded File Analysis (pdfdetach · magic bytes · VBA macros · ZIP content listing · nested PDF detection) Uses pdfdetach (Poppler) to extract every embedded file attachment. Inspects each for magic bytes: Windows PE (MZ), Linux ELF (\x7fELF), OLE/CFBF (\xd0\xcf), OOXML archives, script files (.bat, .ps1, .vbs, .sh), RAR, 7-Zip. Detects VBA macros in OOXML Office attachments (vbaProject.bin). Non-OOXML ZIP archives have their full contents listed (up to 50 entries) and are scanned for dangerous files (.exe, .dll, .ps1, .vbs, etc.) — flagged Critical when dropper files are present. Nested PDFs (embedded PDF documents) are detected and flagged — nested PDFs can carry independent malicious payloads processed outside outer-document defences. PowerShell .ps1 content analysis: embedded scripts are scanned for high-risk patterns including Invoke-Expression, DownloadString, and -ExecutionPolicy Bypass — all common stager and downloader primitives. Extracts readable strings from executables to surface suspicious API calls or IP addresses. A PDF carrying a PE executable is a confirmed dropper — scored critical.

Engine 24 — Campaign Campaign Attribution (TLSH fuzzy hash) Three similarity fingerprints are computed and compared against confirmed-malicious history in PostgreSQL: TLSH (full-PDF locality-sensitive hash — score <30 = near-identical, <100 = same campaign), pHash (perceptual hash of each page thumbnail via imagehash — hamming distance ≤8 = visual match, detects rebranded or re-formatted copies), and JS fingerprint (MD5 of sorted, normalised JavaScript fragments — catches code-reuse across campaigns). Self-matches are excluded: TLSH distance=0 (identical content, same file previously labeled malicious) is skipped to prevent a file from being flagged solely because it was scanned before. Campaign name is surfaced from MalwareBazaar family labels when a cluster match is found. Falls back to structural fingerprint for small files or when TLSH is unavailable.
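Two pieces of Engine 24 are simple enough to sketch with the standard library: the JS fingerprint and the TLSH distance bands. The normalisation used here (collapsing whitespace) is an assumption, since the exact rules are not given above, and the function names are hypothetical.

```python
import hashlib
import re


def js_fingerprint(fragments):
    """MD5 over sorted, whitespace-normalised JavaScript fragments."""
    collapsed = (re.sub(r"\s+", " ", fragment).strip() for fragment in fragments)
    normalised = sorted(text for text in collapsed if text)
    return hashlib.md5("\n".join(normalised).encode("utf-8")).hexdigest()


def classify_tlsh_distance(distance: int) -> str:
    """Map a TLSH distance onto the bands described above."""
    if distance == 0:
        return "self-match (skipped)"
    if distance < 30:
        return "near-identical"
    if distance < 100:
        return "same campaign"
    return "no match"
```

Sorting before hashing makes the fingerprint independent of the order in which fragments were extracted, so two documents that reuse the same scripts in a different object order still match.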

Engine 25 — AcroForm AcroForm Field Forensics Deep analysis of interactive form fields across all pages via PyMuPDF widget enumeration. Detects JavaScript on field objects (/A and /AA dictionaries: JS fires on focus, blur, keystroke, validate, or calculate events, invisible during static review but executing in any Acrobat-compatible viewer); hidden NoExport fields (present in submitted data but not displayed to the user); password-type fields (credential harvesting indicators); SubmitForm exfiltration targets (the URL(s) to which all form field data is POSTed); /AA additional-action JS triggers on field objects (a secondary execution vector independent of /OpenAction); and calculation order (/CO) exploitation, where adversaries reorder field calculations to chain JS evaluations across fields, enabling multi-step payload staging hidden entirely within form arithmetic. Value / Appearance Stream (V/AP) divergence detection runs five sub-checks without rendering: (1) /NeedAppearances true (ISO 32000 §12.7.2) flags that AP streams are stale and will be regenerated from /V by the viewer, meaning the signed bytes cover a different appearance than what is displayed (critical when a digital signature is present); (2) checkbox and radio button /V vs /AS key comparison, a rendering-independent, zero-false-positive check for the displayed state disagreeing with the stored data value; (3) text, listbox, and combobox field AP stream text extraction with font encoding remap: decompresses the /AP /N content stream, resolves any /Encoding /Differences table in the AP font so raw byte codes are translated to their rendered glyphs before comparison (a font mapping byte 0x31 to glyph /nine renders “1” as “9”, which is invisible to plain byte comparison), then compares the rendered string to /V, with listbox multi-select arrays joined (catches “I agree to $1,000” displayed as “I agree to $10”, or a dropdown rendering Option A while /V holds Option B); (4) image-based AP stream detection: when the AP stream invokes an image XObject via Do with no text operators, /V cannot be compared without image recognition, so the field is flagged [HIGH] for manual review; (5) blank AP stream: AP exists but draws no content, making the field value invisible to the viewer while it remains in the file bytes and covered by any signature. All enumerated field values (/V for every non-signature widget) are collected into a field-value map and passed to Engine 41 (JS Behavioral Emulation) so that doc.getField() returns real file values during JavaScript execution. Results feed into the Correlation Engine.

Engine 26 — RevHistory Document Revision History Splits the PDF at each %%EOF boundary and extracts per-revision metadata: author, producer, modification date, and new/modified/deleted object counts for each incremental update. Detects author identity changes between revisions, execution vectors injected (/JavaScript, /Launch, /EmbeddedFile, /OpenAction) after the original document was created, and large late-stage object injections in the final revision — the structural signature of automated exploit staging. Injection depth (revision number) is recorded for each vector. Results feed into the Correlation Engine.

Engine 27 — Annotations Annotation Forensics Enumerates every /Annot object across all pages and forensically analyses each action dictionary. Detects dangerous URI schemes (javascript:, data:, file://, vbscript:); JavaScript action triggers on annotation interaction; /Launch actions that spawn arbitrary programs; GoToR remote links that open external files; and SubmitForm actions that exfiltrate form data to external servers. Also inspects the /T (author/title) field of every annotation for XSS payloads — matching the CVE-2025-70401 attack vector in which PDF viewers pass the annotation author string through DOM reconciliation into innerHTML without sanitisation, executing injected scripts on every component re-render. Checks 15 patterns including <script>, onerror=, <svg><foreignObject> (the bypass used in the disclosed PoC), javascript:, and percent/unicode-encoded variants; handles both literal and hex-encoded /T values. Annotation-borne payloads are completely invisible to scanners that only analyse raw bytes or page content streams. Results feed into the Correlation Engine.

Engine 28 — NamedTree Named Tree Analysis Catalogues the full PDF action infrastructure: Named JavaScript Registry (/Names /JavaScript subtree: persistent JS objects callable by name from any action); /AA Additional Actions count (event-driven triggers on page open/close, print, save, field events); /OpenAction type classification (JavaScript, Launch, GoToR, URI, GoTo); /Perms cryptographic permission restrictions; and UR3 usage-rights signatures used to exploit extended viewer features. Deep DocMDP forensics: parses the /P permission level from /TransformParams (1 = no changes, 2 = form fill-ins, 3 = annotations and form fill-ins, the most exploitable); flags missing or out-of-spec /P values (some validators treat missing /P as maximally permissive); checks /SigFlags AppendOnly bit; validates /ByteRange and /Reference array presence; detects incremental updates that violate MDP constraints (JavaScript prohibited under all /P levels); and flags multiple /DocMDP entries (validator confusion attack). FieldMDP per-signature field lock (ISO 32000 §12.8.2.4, "FieldMDP"): a distinct transform from DocMDP that locks specific named form fields per approval signature rather than applying document-level constraints. Parses /TransformMethod /FieldMDP, /Action (Include = lock only named fields; Exclude = lock all except named), and the /Fields array. Flags: Include with empty /Fields (locks zero fields despite appearing to certify); Exclude with named fields (those fields are explicitly not locked, allowing post-signing modification of attacker-controlled fields); and incremental updates containing /Widget or /AcroForm modifications after a FieldMDP signature. Validators differ on whether they check field names against the locked set (Acrobat does; pdf.js may not), making this a cross-viewer exploitation gap. 
PDF 2.0 Associated Files (/AF) (ISO 32000-2 §7.11.4) — enumerates the document- and page-level /AF arrays and records each filespec's /AFRelationship type (Source, Data, EncryptedPayload, Alternative, Supplement); a modern attachment surface a legacy /EF-only walk misses, with the attached streams analysed by the Embedded File engine. Results feed into the Correlation Engine.

Engine 29 — ContentStream Content Stream Forensics Inspects all decompressed content streams for dangerous PostScript execution operators: exec (dynamic code execution), run (file execution; matched only as the sequence ") run", i.e. a closing string parenthesis followed by run, so an explicit filename string argument is required and the English word "run" appearing in page content does not cause false positives), token (string-to-code eval), setpagedevice (PostScript-to-system passthrough that bridges to the PostScript interpreter from PDF context), and def. Also detects ICC color profile abuse: malformed /ICCBased profiles of anomalous size exploit heap buffer overflows (CVE-2021-21017 class). Flags content bombs: non-image streams exceeding 5 MB that may exhaust parser memory or conceal oversized payloads (image XObjects are excluded because large raster data is expected). Results feed into the Correlation Engine.

Engine 30 — ObjStm Object Stream Analysis PDF 1.5+ allows multiple objects to be compressed together in a single /ObjStm stream. Scanners that only search raw bytes will miss any object inside a compressed container. This engine decompresses every /ObjStm and re-scans the decompressed content for JavaScript, /Launch actions, /EmbeddedFile references, and high-entropy payloads (entropy >7.5 bits) that suggest encrypted content hidden inside compressed object bundles. Complements the Stream Inspector (Engine 3) with object-container-specific forensics. Results feed into the Correlation Engine.

Engine 31 — TokObf PDF Token Obfuscation Detector Decodes all PDF name token hex-escape sequences (/J#61vaScript → /JavaScript) and checks decoded names against a dangerous-keyword list: JavaScript, Launch, OpenAction, EmbeddedFile, AA, URI, SubmitForm, ImportData, GoToR, RichMedia, and others. Counts total hex-encoded name tokens, dangerous-keyword obfuscations, and unique obfuscated forms. Also detects whitespace-split keyword injection: byte sequences like /Java\nScript or /Lau\tnch in the raw byte stream that evade simple string scanners. Detection requires at least one actual whitespace character inside the keyword (a zero-width match would flag every normal /JavaScript token). Scans outside compressed stream bodies for formfeed byte injection (0x0C) and null bytes in the PDF header region. Stream bodies are excluded since FlateDecode binary data naturally contains these bytes; both are classic evasion markers when found in the structural token layer. Excessive hex-encoded name tokens are flagged at a threshold of >500 tokens (benign PDF generators such as ReportLab routinely hex-encode colour names and resource keys, so only counts far beyond normal generator output are reported, at low severity). Every obfuscated dangerous keyword triggers a Critical indicator. Results feed into the Correlation Engine.
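Name-token hex decoding is straightforward to illustrate. The sketch below decodes #xx escapes and reports names that only match the dangerous-keyword list after decoding. The keyword set is the subset listed above, and the regular expressions and function names are assumptions. Unlike the engine, this sketch does not exclude stream bodies.

```python
import re

DANGEROUS = {"JavaScript", "Launch", "OpenAction", "EmbeddedFile", "AA",
             "URI", "SubmitForm", "ImportData", "GoToR", "RichMedia"}

# A name runs from "/" to the next whitespace or PDF delimiter character.
_NAME = re.compile(rb"/([^\s/<>\[\]\(\)\{\}%]+)")
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> str:
    """Replace every #xx escape with the byte it encodes."""
    decoded = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def obfuscated_dangerous_names(data: bytes):
    """Return (raw, decoded) pairs for hex-escaped dangerous names."""
    hits = []
    for match in _NAME.finditer(data):
        raw = match.group(1)
        if b"#" in raw and decode_name(raw) in DANGEROUS:
            hits.append((raw.decode("latin-1"), decode_name(raw)))
    return hits
```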

Engine 32 — XFA XFA FormCalc Parser Extracts and decompresses the XFA (XML Forms Architecture) data stream — an XML-based form description that supports an embedded scripting language called FormCalc. Detects auto-execute initialise/ready events and openURL / submit calls that silently exfiltrate data or fetch remote resources on form load. Flags exec() calls that pass strings to a FormCalc eval-style function and JavaScript snippets embedded within the XFA XML wrapper — a technique that bypasses AcroForm-specific scanners. Results feed into the Correlation Engine.

Engine 33 — ActGraph PDF Action Dependency Graph Constructs a directed graph of the complete PDF action chain: every /Next action pointer is followed to map the full execution sequence. Detects circular action cycles (infinite loops); deep chains exceeding 10 hops (overflows parser stack depth in hardened viewers); high fan-in nodes — single action objects referenced from many triggers simultaneously (covert shared-execution points); and sleeper nodes — actions present in the graph but unreachable from the nominal entry points, planted for deferred detonation via a separate trigger. The graph is serialised and available for raw forensic inspection. Results feed into the Correlation Engine.
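The four graph findings (cycles, deep chains, high fan-in, sleepers) can be sketched over a plain adjacency mapping. Here next_edges maps an action identifier to the actions its /Next entry points to. That representation, the function name, and the path-based traversal are assumptions for illustration, and the traversal is not tuned for very large graphs.

```python
from collections import Counter


def analyse_action_graph(next_edges, entry_points, max_hops=10):
    """Walk /Next chains from the entry points and summarise the graph."""
    nodes = set(next_edges)
    fan_in = Counter()
    for targets in next_edges.values():
        for target in targets:
            nodes.add(target)
            fan_in[target] += 1  # number of actions pointing at this one

    reachable, has_cycle, deepest = set(), False, 0
    stack = [(entry, (entry,)) for entry in entry_points]
    while stack:
        node, path = stack.pop()
        reachable.add(node)
        deepest = max(deepest, len(path) - 1)  # hops from the entry point
        for nxt in next_edges.get(node, ()):
            if nxt in path:
                has_cycle = True               # chain loops back on itself
            else:
                stack.append((nxt, path + (nxt,)))

    return {
        "has_cycle": has_cycle,
        "deep_chain": deepest > max_hops,
        "sleepers": nodes - reachable,         # present but never reached
        "fan_in": dict(fan_in),
    }
```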

Engine 34 — OCG OCG Layer Cloaking Enumerates every Optional Content Group (/OCG) layer defined in the /OCProperties dictionary. Detects layers configured as never-visible (display-state forced off in all circumstances) — a technique for hiding malicious content from visual review; screen/print divergence (content visible on screen but suppressed in print, or vice versa — used in watermarking and DLP-evasion attacks); and hidden clickable links inside invisible layers, which are fully interactive in Acrobat despite being visually absent. Results feed into the Correlation Engine.

Engine 35 — Unicode Unicode & Invisible Text Forensics Scans for Unicode bidirectional control characters (U+202E RLO, U+200F RLM, U+202D LRO, U+200E LRM, U+2066–U+2069 isolate markers) in text streams and document strings, the class of injection behind Trojan Source (CVE-2021-42574) and filename-spoofing attacks. Detects rendering mode 3 (invisible text, used by Trojan-Source-style hidden content and some phishing kits to embed a machine-readable payload over visible decoy text) and rendering mode 7 (clip mode, a more advanced form of invisibility). Invisible text detection works by directly parsing all content streams for the PDF Tr (text rendering mode) operator, because PyMuPDF's span flags field encodes font flags (bold/italic/serif) rather than the rendering mode and cannot be used for this purpose. Flags homograph domains using Cyrillic/Greek/Armenian lookalike characters (confusable with ASCII). Results feed into the Correlation Engine.
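Both checks in Engine 35 reduce to simple scans. The sketch below looks for the listed bidirectional controls in decoded text and for Tr operators with mode 3 or 7 in a decompressed content stream. The function names and the regular expression are assumptions, and a real parser would also need to skip Tr-like byte runs that occur inside string literals.

```python
import re

BIDI_CONTROLS = {
    "\u202e": "RLO", "\u202d": "LRO", "\u200f": "RLM", "\u200e": "LRM",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

# A single digit 0-7 followed by the Tr operator, e.g. b"3 Tr".
_TR_OPERATOR = re.compile(rb"(?<![\d.])([0-7])\s+Tr\b")


def find_bidi_controls(text: str):
    """Return (index, short name) for each bidi control character."""
    return [(i, BIDI_CONTROLS[ch]) for i, ch in enumerate(text)
            if ch in BIDI_CONTROLS]


def invisible_render_modes(content_stream: bytes):
    """Return the Tr modes in the stream that hide text (3 or 7)."""
    modes = [int(m.group(1)) for m in _TR_OPERATOR.finditer(content_stream)]
    return [mode for mode in modes if mode in (3, 7)]
```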

Engine 36 — Trailer Trailer Chain Forensics Walks the raw trailer chain via /Prev byte-offset pointers without relying on any PDF library's repair logic. For each trailer, records the /ID array pair, the /Root reference, and the /Prev offset, building a chronological chain of all incremental updates. Detects Document ID mutation across updates (both entries of the /ID array should be stable after creation — mutation is a structural anomaly); /Root reference swaps between trailer versions (the Shadow Document Attack — a signed PDF whose signed version and visible version have different catalog roots); and malformed /Prev pointers that would confuse incremental-update-aware parsers. Results feed into the Correlation Engine.

Engine 37 — Codec Codec Exploit Parameter Validation Audits every compressed stream's filter parameters for known exploit patterns. CCITTFaxDecode: validates Columns and Rows against the stream length — out-of-bounds values trigger heap overflows in multiple decoders. JBIG2Decode: checks for a /JBIG2Globals reference (required for CVE-2009-0658 / Pwn2Own 2009 Adobe Reader exploit). DCTDecode: validates that the declared stream length is plausible for the claimed image dimensions. Multi-filter chains: flags streams using 3+ stacked decoders (a classic technique to slow forensic analysis and trigger parser differential vulnerabilities — each decoder in the chain may parse the preceding output differently). Results feed into the Correlation Engine.

Engine 38 — Entropy Physical Entropy Topology Computes per-256-byte sliding-window Shannon entropy across the raw file bytes, producing a high-resolution entropy map with structural awareness. Detects post-EOF high-entropy regions — encrypted payloads appended after the last %%EOF marker (invisible to all structure-respecting parsers); entropy cliffs — sudden sharp transitions between low-entropy and high-entropy regions that indicate injection boundaries; header entropy anomalies — unexpected compression or encryption in the first 256 bytes of the file; and under-entropy in compressed streams — near-zero entropy (<1.5 bits) in a compressed region that should be random (consistent with a decompression bomb). Image XObjects (/Subtype /Image) are excluded from the under-entropy check — solid-colour or uniform-fill images produce near-zero entropy in their compressed stream by design and are not suspicious. Uses the PDF's object offset table to partition the entropy map into structural regions (header, objects, streams, trailer, post-EOF). Results feed into the Correlation Engine.

Engine 39 — Stego Image Steganography & Tracking Beacons Extracts all embedded images (JPEG, PNG, BMP) via PyMuPDF and applies statistical steganalysis. LSB chi-square analysis: computes a chi-square statistic on the least-significant bits of each colour channel — a score above threshold indicates non-random LSB distribution consistent with LSB steganography (SteghideJPEG, OpenStego, etc.). Tracking beacons: flags 1×1 or sub-10px images that are HTTP/HTTPS URIs (invisible tracker pixels that phone home when the PDF is opened in a connected viewer). JPEG EXIF anomalies: parses EXIF metadata from all extracted JPEG images and flags maker notes, GPS tags, and unusual tag combinations that may fingerprint the author or embed covert data in EXIF fields. Results feed into the Correlation Engine.

Engine 40 — PDFA PDF/A Compliance Fraud Detector Checks whether a PDF claims PDF/A conformance (pdfaid:conformance and pdfaid:part XMP metadata) and, if so, validates that the document actually conforms to the declared standard. PDF/A forbids JavaScript, embedded executables, non-embedded fonts, encryption, and external references — all of which are attack vectors. Detecting a PDF that claims PDF/A but contains active content is a reliable indicator of a document engineered to bypass DLP systems and email gateways that whitelist PDF/A. Also checks for conformance level mismatch (e.g. claiming PDF/A-1a but using features only in PDF/A-2). Results feed into the Correlation Engine.

Engine 41 — JSEmul JavaScript Behavioral Emulation Executes extracted JavaScript in a sandboxed Node.js vm context with a full stub of the Acrobat JavaScript API — app, this, event, util, console, Doc, Field, and others. Intercepts and records all calls to dangerous methods: app.launchURL(), this.submitForm(), app.openDoc(), app.execMenuItem(), util.printd(). doc.getField(name) returns the actual /V value collected by Engine 25 for every non-signature field — not a hardcoded empty string — so conditional exploitation chains such as if (doc.getField('status').value == 'approved') { app.launchURL(c2) } are correctly evaluated; SUBMIT_FORM events capture the real field content rather than an empty string. doc.numFields reflects the true field count. Detects obfuscated eval() and string-concatenation assembly of dangerous payloads at runtime. Records the full call log: function name, argument list, and execution timestamp. Six-pass multi-eval resolution unwraps nested deobfuscation chains. Results feed into the Correlation Engine.

Engine 42 — Font Font CharString Emulator Decrypts and emulates Type 1 font CharString programs using the eexec and charstring decryption algorithms. The Type 1 CharString format is a stack-based bytecode language, and the emulator flags three dangerous constructs: seac (standard encoding accented character, which composes a glyph from two other glyphs, enabling recursive execution that overflows the call stack in vulnerable renderers and was used in exploits targeting Adobe Reader ≤9); excessive stack depth (CharString programs that push ≥200 values onto the stack, triggering stack exhaustion in strict interpreters); and abnormal subroutine depth (recursion deeper than 10 levels in the subr/globalsubr call chain). Flags obfuscated font binaries with unusually high entropy in the eexec-encrypted region. Results feed into the Correlation Engine.
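The eexec and charstring decryption step uses the cipher published in the Adobe Type 1 font format: a 16-bit rolling key with constants 52845 and 22719, an initial key of 55665 for eexec and 4330 for charstrings, and a discarded prefix that is 4 bytes by default. The sketch below assumes binary (not ASCII-hex) input and the default prefix length. The encrypt helper exists only so the round trip can be demonstrated, and the function names are hypothetical.

```python
EEXEC_KEY = 55665
CHARSTRING_KEY = 4330
_C1, _C2 = 52845, 22719


def type1_decrypt(cipher: bytes, key: int, skip: int = 4) -> bytes:
    """Decrypt Type 1 eexec or charstring data, dropping the random prefix."""
    r = key
    out = bytearray()
    for c in cipher:
        out.append(c ^ (r >> 8))
        r = ((c + r) * _C1 + _C2) & 0xFFFF  # key evolves with the ciphertext
    return bytes(out[skip:])


def type1_encrypt(plain: bytes, key: int, prefix: bytes = b"\x00" * 4) -> bytes:
    """Inverse of type1_decrypt, used here only for demonstration."""
    r = key
    out = bytearray()
    for p in prefix + plain:
        c = p ^ (r >> 8)
        out.append(c)
        r = ((c + r) * _C1 + _C2) & 0xFFFF
    return bytes(out)
```

Once the bytes are decrypted, an emulator can walk the CharString operators and track stack and subroutine depth.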

Engine 43 — XRef XRef Integrity Graph Builds a complete cross-reference graph by parsing both traditional XRef tables and compressed XRef streams (/XRef objects, PDF 1.5+). Cross-references every declared object against actual byte positions in the file. Detects phantom objects — entries in the XRef table that point to byte offsets with no valid object header; orphan sleepers — objects present at valid byte offsets but absent from every XRef table (reachable only through raw parsing, not through standard readers); free-entry exploitation — free-list entries (f type) whose generation numbers deviate from standard increments (a technique for hiding objects that become reachable after a use-after-free in the parser); and object length fraud — stream objects whose declared /Length diverges from the actual byte count between stream markers. Reachability BFS starts from doc.pdf_catalog() — the authoritative PDF Catalog xref returned by the parser — rather than assuming OID 1 is always the root (which produces large false-positive orphan lists in non-standard PDFs). Orphaned Action objects are classified by subtype: execution subtypes (JavaScript, Launch, GoToR, ImportData, SubmitForm, GoToE) are flagged as dangerous; navigational subtypes (URI, GoTo, Named, Sound, Movie) are treated as benign and not flagged. Results feed into the Correlation Engine.
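The reachability step can be illustrated on a toy object graph. This sketch assumes the references have already been extracted into a plain dictionary; `find_orphans` and `dangerous_orphan_actions` are hypothetical names, and the catalog ID is passed in rather than obtained from a parser.

```python
from collections import deque

# Action subtypes the text classifies as execution vectors
EXECUTION_ACTIONS = {"JavaScript", "Launch", "GoToR", "ImportData", "SubmitForm", "GoToE"}


def find_orphans(refs: dict, catalog: int) -> set:
    """Return object IDs present in `refs` but unreachable from the catalog (BFS)."""
    seen = {catalog}
    queue = deque([catalog])
    while queue:
        oid = queue.popleft()
        for child in refs.get(oid, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return set(refs) - seen


def dangerous_orphan_actions(orphans: set, action_subtypes: dict) -> set:
    """Keep only orphaned objects whose action subtype is an execution vector."""
    return {oid for oid in orphans if action_subtypes.get(oid) in EXECUTION_ACTIONS}
```

Starting the search at the real catalog rather than at object 1 is what keeps the orphan list short: in the test graph below the catalog is object 5, so objects 1 to 3 are correctly treated as reachable.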

Engine 47 — Correlation Correlation Engine Cross-references the findings of all prior engines and adds weighted bonus points (35–110) for dangerous combinations. 60+ compound patterns are defined, grouped as follows.
DocMDP compound patterns: DocMDP bypass + JavaScript (+100, critical — certified document with execution vector); DocMDP bypass + weak algorithm (+105, critical — collision-assisted forgery); DocMDP P=3 + JavaScript (+70 — annotation-triggered execution under certification); multiple DocMDP transforms + signature (+65 — validator confusion spoofing); ByteRange not-from-zero + MDP (+90 — header left unsigned under certification); all-zero or sub-32-byte /Contents (+95 — structurally signed but cryptographically empty).
FieldMDP compound patterns: FieldMDP Include+empty-fields + active content (+80 — locking-nothing certification with live payload); FieldMDP Exclude bypass + incremental form modification (+75 — selectively unlocked field modified post-signing); FieldMDP + JavaScript in incremental update (+85 — execution vector under per-field certification).
V/AP divergence compound patterns: /NeedAppearances true + digital signature (+90, critical — signed bytes cover stale AP, viewer renders regenerated content not in signed bytes); V/AP mismatch + digital signature (+85, critical — displayed state and certified value structurally differ within the same signed byte range); /NeedAppearances + active form content (JS or SubmitForm) (+65 — displayed values may differ from what is executed or exfiltrated); /NeedAppearances + DocMDP constraint violation (+95, critical — uncertified modification made visible via viewer-regenerated appearance).
Signature invalidation patterns: full-save rewrite + active content (+70 — cryptographically unsigned document with live payload); ByteRange gap + JavaScript (+80 — shadow attack confirmed).
Polyglot patterns: file-level polyglot + active content (+75 — JPEG/ZIP header preceding %PDF- with execution vector).
Classic: JavaScript + /OpenAction + high entropy = +100 bonus; JavaScript + /Launch = +75 bonus.
Cross-engine: YARA heap-spray + JS, PeePDF vuln + JS, qpdf structural damage + active content, ExifTool exploit-kit fingerprint + execution.
Dynamic sandbox: live network beacon + JS, runtime shellcode + heap spray, dynamic shell spawn + trigger.
Form patterns: AcroForm JS field + SubmitForm exfiltration target, /AA keystroke trigger + credential field, calc-order chain + JS payload.
Other patterns: token obfuscation + JS keyword, annotation JS trigger + auto-exec, post-signature revision injection + execution vector, object stream concealment + active content, named JS registry + OpenAction, XFA exec + auto-fire, action cycle + JS node, OCG hidden link + JS, trailer /Root swap + execution, codec OOB + active content, post-EOF entropy + execution, steganography + exfiltration target, PDF/A claim fraud + active content, JS emulation live call + obfuscated eval, font seac OOB + JS, XRef phantom object + orphan sleeper.
Threat intelligence: TI + sandbox + YARA triple-confirmation; TI domain match + active content (a domain matching threat intelligence combined with JavaScript raises a high-confidence indicator).
The multi-engine JS confirmation bonus is amplified when 3+ engines confirm JS. The final score is capped at 999.

Engine 48 — AI Synthesis 🤖 AI Forensic Report After all 47 forensic analysis engines complete, a Qwen 2.5 1.5B Instruct Q4_K_M LLM synthesises the structured scan output into a human-readable forensic report. The model runs on dedicated private hardware (Ryzen 5 3550H · 12 GB RAM · llama.cpp CPU-only) over an encrypted WireGuard tunnel — no OpenAI, no Anthropic, no Google, no third-party AI call of any kind. Your document data never leaves pqpdf.com infrastructure. Input to the model is a compact JSON object (~250–350 tokens) containing: risk score, critical/high indicator signal names, MITRE technique IDs, sandbox hit flag, structural stats, and threat intelligence match status — never raw binary PDF bytes. Output is a structured JSON object with seven fields: threat_verdict (MALICIOUS / SUSPICIOUS / LIKELY_CLEAN / CLEAN), confidence (HIGH / MEDIUM / LOW), executive_summary (one-sentence plain-English verdict), key_findings (array of {signal, severity, mitre_id} objects), observed_techniques (array of MITRE ATT&CK {id, name} pairs drawn only from IDs present in the scan), recommended_actions (array of strings), and false_positive_note (null or string). All enum fields (verdict, confidence, severity) are validated and normalised server-side — a fuzzy-match fallback corrects any model drift. Inference configuration: temperature 0.1 (near-deterministic), max 220 output tokens, json_object response format. Typical latency: ~15–25 s (CPU-only inference at ~13 tokens/s, no GPU required). Results appear in the dedicated 🤖 AI Forensic Report tab and as a compact verdict widget on the Summary tab.
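The server-side enum normalisation with a fuzzy-match fallback could look roughly like the following. It is a sketch using Python's standard `difflib`; the 0.6 cutoff, the function name `normalise_enum`, and the choice of default values are assumptions, not the production settings.

```python
import difflib

VERDICTS = ("MALICIOUS", "SUSPICIOUS", "LIKELY_CLEAN", "CLEAN")
CONFIDENCE = ("HIGH", "MEDIUM", "LOW")


def normalise_enum(value, allowed, default):
    """Map model output onto one of `allowed`, tolerating case, spacing and typos."""
    cleaned = str(value).strip().upper().replace(" ", "_").replace("-", "_")
    if cleaned in allowed:
        return cleaned
    # Fuzzy fallback: pick the closest allowed value if it is similar enough
    close = difflib.get_close_matches(cleaned, allowed, n=1, cutoff=0.6)
    return close[0] if close else default
```

Validating after generation like this means a small model can drift slightly in spelling without the report UI ever receiving a value outside the documented enums.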

Risk scoring

Each indicator contributes base points multiplied by min(occurrence_count, 3) — capped at 3 occurrences per finding type to prevent artificial inflation from a single pattern appearing many times. The Correlation Engine adds weighted bonus points on top for dangerous combinations.

Risk level	Base points per occurrence
Critical	50
High	25
Medium	10
Low	3

Verdict	Threat Score
Clean	0
Low	1–29
Suspicious	30–149
High Risk	150–349
Dangerous	350–999
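The scoring rule and verdict bands in this section can be written out as a short worked example. The sketch ignores the separate deception and structural axes and treats the correlation bonus as a single number supplied by the caller; the function names are illustrative.

```python
BASE_POINTS = {"critical": 50, "high": 25, "medium": 10, "low": 3}
MAX_OCCURRENCES = 3   # occurrences counted per finding type
MAX_SCORE = 999       # final score cap

# (lower bound, verdict), checked from the highest band down
BANDS = [(350, "Dangerous"), (150, "High Risk"), (30, "Suspicious"), (1, "Low"), (0, "Clean")]


def threat_score(findings, correlation_bonus=0):
    """findings: iterable of (risk_level, occurrence_count) pairs."""
    base = sum(BASE_POINTS[level] * min(count, MAX_OCCURRENCES) for level, count in findings)
    return min(base + correlation_bonus, MAX_SCORE)


def verdict(score):
    """Return the verdict band for a non-negative score."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise ValueError("score must be non-negative")
```

For example, a critical indicator seen five times counts as 50 × 3 = 150 points, so together with one low indicator the score is 153 and the verdict is High Risk.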

The headline score is the Threat Score (exploit + integrity-tampering). Findings are classified onto four forensic axes — exploit, tampering, deception (content/semantic-determinism: V/AP divergence, glyph remapping, OCR poisoning), and neutral structural/informational — and a confirmed deception finding grades the verdict on its own axis even at threat score zero. Deception and structural scores are reported separately and never inflate the malware verdict.

Forensic Console

During the scan a live terminal-style event log streams timestamped events to the browser — upload confirmation, per-engine START/DONE lines, and the final risk verdict. Section dividers separate Upload, Engines, and Results phases. The console can be collapsed or cleared without affecting the scan.

Result banner and risk levels

When all 47 engines complete, a full-width banner appears at the top of the Summary tab showing the risk level, a driver chip (Malware / Integrity / Content Integrity), an axis-appropriate explanation, and a Threat Score meter (0–999) with Threat / Deception / Structural sub-scores beneath it:

✅ Clean — threat 0, green
🟡 Low — threat 1–29, yellow
🟠 Suspicious — threat 30–149, orange
⚠️ High Risk — threat 150–349, red
🔴 Dangerous — threat 350–999, dark red

The verdict is driven by the Threat Score (exploit + integrity-tampering); a confirmed content-deception finding (V/AP divergence, glyph remapping, OCR poisoning) grades the verdict on its own axis even at threat 0.

Statistics grid — 15 fields

Below the banner a 15-cell grid shows key structural stats at a glance. Four cells are clickable and jump directly to the relevant tab. Cells turn red when values exceed safe thresholds:

Pages · Objects · File Size · PDF Version · Encrypted · Embedded Files (clickable → Embedded tab, red if > 0) · Form Fields · Annotations · Links · %%EOF Markers (red if > 2) · XRef Tables (red if > 3) · Total Streams (clickable → Streams tab) · High-Entropy Streams (red if > 0) · URLs Found (clickable → URLs tab, red if > 0) · Threats Found (genuine exploit/tampering/deception findings; clickable → Threats tab, red if > 0) · Observations (neutral structural notes, not threats)

Scan report — 24 tabs

Results are rendered across 24 tabs. Each tab is independently navigable. Dynamic badges on several tabs update live (threat count, ML %, MITRE technique count, phishing signal score, embedded file count).

📊 Summary — risk banner + score meter, 15-cell stats grid, engines-completed pill strip (✓ {name} for all 47 that ran), ML probability bar + SHAP feature bars + false-positive/confirm-threat feedback buttons
⚠️ Threats — all indicators grouped Critical → High → Medium → Low, each card shows risk badge, engine label, count pill, key, description, byte-context snippet
📈 Score — score gauge (0–999), per-engine contribution bars, full per-indicator table (engine / indicator / risk / base pts / count / total pts)
⚙️ Engines — two-panel browser: sidebar (47 engines, status dot, findings pill), right panel shows full indicator cards, engine-specific data (stream table ③, URL list ⑤, SHAP bars ⑯, differential table ⑰, certificate chain ㉑, correlation bonuses + Per-Engine Indicator Counts + Final Risk Assessment ㉕)
🌐 URLs — all unique HTTP/HTTPS URLs from raw bytes and decompressed streams, per-URL copy button
📦 Streams — top 40 streams: XRef# · type · decompressed size · Shannon entropy bar (red if > 7.2) · status (OK / High Entropy / Patterns Found) · matched patterns. Suspicious rows amber, high-entropy rows orange.
🧠 ML — malicious probability bar, SHAP bar chart (red=malicious / green=benign per feature), feature importance bars, false-positive / confirm-threat feedback buttons (trains next model update)
🔬 Sandbox — 7-cell metrics grid (Behavioral Score · Network Attempts · Exec Attempts · Process Forks · FS Escape · Anon Exec Memory · Timeout, cells red at critical thresholds), renderer list, threat indicators, matched YARA rules
🌍 Threat Intel — confirmed-malware banner (if SHA-256 matches), per-database results (URLhaus · MalwareBazaar · ThreatFox · FeodoTracker · OpenPhish), domain-level TI matches, campaign attribution (TLSH · pHash · JS fingerprint), similar malicious samples with similarity %
🎯 MITRE — ATT&CK technique IDs mapped from indicators, grouped by tactic, indicator rows per technique
🧬 Parsing — 6 parser cards (MuPDF · Poppler · Ghostscript · qpdf · pdfminer · pdf.js), per-dimension comparison (pages · objects · JS · encryption · AcroForm · embedded files · linearised · OpenAction), mismatch severity badges
🧬 Polyglot — Engine ⑱ magic-byte hits (type + risk badge) + Engine ⑲ JS AST deobfuscation findings (eval · fromCharCode · unescape · large numeric arrays · new Function)
🎣 Phishing — signal score meter, urgency phrase tags, brand keyword tags, credential-harvesting detection, QR code decodes, OCR-extracted text from images
📎 Embedded — per-file cards: magic-byte type · size · VBA macro detection · ZIP content listing (50 entries, dangerous-extension flags) · PE import table · suspicious strings · nested PDF detection
✍️ Signature — signature count; DocMDP /P permission level (1/2/3) with bypass flag; FieldMDP per-field lock (Action · Fields · bypass detection); ByteRange coverage integrity (%PDF-header start check, bounds check, inner-gap analysis, %%EOF coverage check, shadow-attack vs full-save-rewrite distinction); /Contents structural validation (all-zero placeholder, DER header); SubFilter deprecation flags; post-signing revision diff; per-certificate cards (subject · issuer · dates · algorithm · self-signed · expired)
📜 History — per-revision timeline (engine 26): %%EOF count, per-revision author/producer/date, new/modified/deleted object counts per update, execution vectors injected post-creation, large late-stage injection alerts
📌 Annotations — per-page annotation cards (engine 27): type, action dictionary, dangerous URI scheme flags, JS/Launch/GoToR/SubmitForm action detection, risk badge per annotation
📋 XFA — XFA FormCalc findings (engine 32): auto-execute events, openURL/submit calls, exec() calls, embedded JavaScript in XFA XML
🗺️ Action Graph — PDF action dependency graph (engine 33): full action chain visualisation, cycle detection, deep chain alerts, fan-in maximisation nodes, sleeper/orphan action nodes
🧪 Deep Forensics — findings from engines 34–43: OCG layer cloaking (engine 34) · Unicode/invisible text (35) · trailer chain forensics (36) · codec exploit parameters (37) · physical entropy topology with post-EOF detection (38) · image steganography & tracking beacons (39) · PDF/A compliance fraud (40) · JS behavioral emulation call log (41) · font CharString emulator findings (42) · XRef integrity graph anomalies (43)
🤖 AI Forensic Report — Qwen 2.5 1.5B Instruct (self-hosted, no third-party AI) synthesises all 47 engine outputs into: threat verdict (MALICIOUS / SUSPICIOUS / LIKELY_CLEAN / CLEAN) · confidence rating · executive summary · key findings table with MITRE technique IDs and severity badges · observed MITRE ATT&CK technique grid · recommended actions · false-positive note. Structured JSON output, ~15–25 s CPU inference, near-deterministic (temperature 0.1). Compact AI verdict widget also shown inline on the Summary tab.
🏷️ Metadata — document metadata KV table · structure info KV table · full 47-engine structure dump
📋 Raw JSON — complete scan result JSON with syntax highlighting (strings · keys · booleans · nulls · numbers) and one-click copy
🔍 Raw Forensics — JS source code from streams · JS AST deobfuscation contexts · decoded stream content (3 KB preview) · every indicator context snippet · complete sorted KV dump from all 47 engines
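The Shannon entropy figure shown on the Streams tab is measured in bits per byte, with 8.0 as the theoretical maximum. A minimal sketch of the calculation follows; the 7.2 threshold comes from the tab description, while the function names are illustrative.

```python
import math
from collections import Counter

HIGH_ENTROPY_BITS = 7.2   # threshold above which the Streams tab bar turns red


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def is_high_entropy(data: bytes) -> bool:
    return shannon_entropy(data) > HIGH_ENTROPY_BITS
```

Compressed or encrypted payloads sit close to 8.0, which is why a decompressed stream that still scores above 7.2 deserves a second look.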

Sanitize panel

After every scan (including clean results) a 9-mode sanitize panel appears below the result. Selecting a method sends the session token to the server, produces a new file, and reveals a Download Sanitized PDF button and a Scan the Sanitized File button to re-run the full 47-engine scan on the cleaned output. The original file is never modified.

ML data policy

The ML engine stores a 38-dimensional feature vector per scan (structural statistics: byte counts, entropy values, object type flags, parser discrepancy counts, sandbox syscall anomalies). No file content, no filename, no hash, no IP address, and no PII is stored. Feature vectors are used to retrain the IsolationForest, RandomForest, and LightGBM models every 30 minutes, and model drift detection raises a report if the models have not been retrained in more than 30 days. The feature vectors are retained indefinitely; because they contain no personal data, they are not subject to GDPR Article 17. Full details on the Security page.

Standard mode — AES-256-CBC

Password is transmitted over TLS, used to encrypt via Ghostscript with AES-256-CBC, and never stored. Granular permission flags are configurable: print, copy, modify, annotate, form fill, accessibility, and assembly.

PQC mode — client-side quantum-safe encryption

In PQC mode the encryption happens in your browser before the file is uploaded. Key generation uses @noble/post-quantum — a local JavaScript library. The server receives only the encrypted .pqcpdf bundle. Your plaintext file never crosses the network unencrypted.

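Why this matters: classical public-key cryptography (RSA and elliptic-curve key exchange such as X25519) is vulnerable to Shor's algorithm on a sufficiently powerful quantum computer. AES-256 itself is only weakened by Grover's algorithm and is still considered quantum-resistant, so the exposure lies in how the encryption key is established. NIST standardised its first post-quantum key encapsulation mechanism, ML-KEM (FIPS 203), in 2024. PQ PDF is among the very few free online PDF tools that implement it.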

Available algorithms (31 total, 29 quantum-resistant)

Organised by category. NIST = NIST-standardised primitive.

Classical

X25519 / Ed25519 / AES-256-GCM

Core PQ

Hybrid — Classical + Post-Quantum
Post-Quantum — NIST Standardised
ML-KEM-1024 — Pure KEM

Multi-Layer

Multi-Algorithm — Triple Layer
Multi-KEM — Classical + PQ KEM
Multi-KEM Triple — 3× KEM Redundancy
Quad-Layer — 4-Layer Redundancy
Lattice + Code — Mathematical Diversity
PQ3-Stack — Forward Secrecy

HQC — Code-Based (NIST 2025)

HQC-128 — 128-bit security
HQC-192 — 192-bit security
HQC-256 — 256-bit security

FN-DSA (Falcon) — Lattice Signatures

FN-DSA 512 Compact — 666B sigs
FN-DSA 1024 High-Security — 1.3KB sigs
FN-DSA Floating-Point Hardened
FN-DSA Dual Signature Redundancy
FN-DSA Transition Stack — Hybrid TLS
FN-DSA + ZK Stack — Privacy-First

Max Secure

PQ Lightweight — Embedded / IoT
Pure PQ — High Assurance
Hybrid Transition — NIST 5 + Classical
Stateless — Hash-Based / Firmware
Crypto-Agile Stack — Runtime Switching
PQC + ZK Stack — Zero-Knowledge

Experimental

Quantum-Inspired Lattice Fusion
Post-ZK Homomorphic Stack
Quantum-Resistant Consensus
Entropy-Orchestrated PQ Stack
AI-Synthesized Crypto-Agile

Primitives used: Key encapsulation — ML-KEM-1024 (FIPS 203), HQC (NIST 2025 backup KEM), X25519. Signatures — ML-DSA-87 (FIPS 204), FN-DSA/Falcon (FIPS 206), SLH-DSA/SPHINCS+ (FIPS 205), Ed25519. Symmetric — AES-256-GCM, ChaCha20-Poly1305, Ascon-128a (NIST LWC). All key generation runs in your browser via @noble/post-quantum.

Signature modes

✏️

Draw Freehand on a canvas with full touch support. The drawn signature is composited onto the PDF page at your chosen position and size.

🔤

Type Your name is rendered as a signature image via ImageMagick using the DejaVu-Sans-Oblique font, a slanted sans-serif face.

📷

Upload Use your own PNG or JPEG as the signature image. Transparency is preserved.

🔐

PAdES / Crypto Only An invisible cryptographic signature — no image drawn on the page. Verifiable in Adobe Reader's Signatures panel. Compliant with PAdES-B (ETSI EN 319 142-1) via pyhanko 0.34. The signature is written as an incremental update — the original content stream is never modified.

Visual placement controls (Draw / Type / Upload modes)

First / last / all / custom page selector. Two placement modes:

Snap grid — 3×3 position grid (left/center/right × top/middle/bottom) for one-click alignment.
Free placement — drag the signature to any position on the page. Coordinates are transmitted as fractional page offsets (pos_x_pct, pos_y_pct, range 0.0–1.0) and applied with sub-point precision regardless of page dimensions.

A size slider (40–300 pt) sets the signature size. A live placement preview composites the signature image onto a rendered page 1 canvas in real time as position and size are adjusted.
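Because the offsets are fractions of the page, the same values work on any page size. The conversion to points might look like the sketch below. It assumes the fractions locate the top-left corner of the signature measured from the top-left of the page, and that the height follows from an image aspect ratio; both are assumptions for illustration, as are the function and parameter names.

```python
def signature_rect(page_w, page_h, pos_x_pct, pos_y_pct, width_pt, aspect=0.4):
    """Return (x0, y0, x1, y1) in points for a signature of `width_pt` points.

    `aspect` is height / width of the signature image (hypothetical parameter).
    """
    # Clamp the fractions to the documented 0.0-1.0 range
    x_pct = min(max(pos_x_pct, 0.0), 1.0)
    y_pct = min(max(pos_y_pct, 0.0), 1.0)
    height_pt = width_pt * aspect
    # Scale to page dimensions, then keep the whole box on the page
    x0 = min(x_pct * page_w, page_w - width_pt)
    y0 = min(y_pct * page_h, page_h - height_pt)
    return (x0, y0, x0 + width_pt, y0 + height_pt)
```

On an A4 page (595 × 842 pt), offsets of 0.5 and 0.5 with a 100 pt signature place its top-left corner at (297.5, 421).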

Date stamp — an optional date string (up to 30 characters) can be rendered in small text directly below the signature image. Accepts any alphanumeric format, separators, and common date punctuation.

Certificate options

All modes embed a cryptographic digital signature. Certificate source is either an auto-generated ephemeral RSA-2048 self-signed certificate (created per-request, never stored) or a user-supplied .p12 / .pfx file. Signer name (required), email, reason, and location metadata are embedded in the CMS/PKCS#7 signature block.

Note: /tools/pades.php 301-redirects to /tools/sign.php?tab=pades — existing links and bookmarks continue to work.

Workflow

The initiator uploads a PDF, adds up to 10 signers (name + optional email), and chooses a signing order. The server creates an ephemeral workspace (/tmp/esign_{32hex}/, mode 0700) and generates a unique 256-bit secure token per signer. Each token produces a signing URL that can be shared directly — no account is required on either side. The initiator's tracking page polls status every 5 seconds and provides a download link once all signers have completed.
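The workspace and token scheme described above can be approximated with Python's standard library. This is a sketch, not the server code: the function names are hypothetical, and only the directory naming pattern, the 0700 mode, the 256-bit token size, and the 10-signer limit come from the text.

```python
import os
import secrets
import tempfile

MAX_SIGNERS = 10


def create_workspace(base_dir=None):
    """Create esign_{32hex} with owner-only permissions and return its path."""
    base_dir = base_dir or tempfile.gettempdir()
    path = os.path.join(base_dir, f"esign_{secrets.token_hex(16)}")  # 16 bytes = 32 hex chars
    os.mkdir(path, mode=0o700)
    return path


def issue_signer_tokens(names):
    """Return one record per signer with an independent 256-bit token."""
    if not 1 <= len(names) <= MAX_SIGNERS:
        raise ValueError("a workflow needs between 1 and 10 signers")
    # token_hex(32) draws 32 random bytes (256 bits) from the OS CSPRNG
    return [{"name": name, "token": secrets.token_hex(32)} for name in names]
```

Since each token is drawn independently, knowing one signer's URL reveals nothing about another's.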

Signing order

➡

Sequential (chain) Each signer's completed PDF becomes the next signer's input. The output accumulates all signatures in order. Signer 2 cannot sign until Signer 1 has completed.

⇆

Parallel (all-at-once) All signers receive their link simultaneously and sign independently. The server merges signatures when all parties complete.

Signature placement

Each signer sees a page-1 thumbnail and can place their signature using the same three input modes as the solo Sign PDF tool (draw canvas, typed name, uploaded image). Placement supports the full 3×3 snap grid and free drag-and-drop positioning via fractional page coordinates (pos_x_pct, pos_y_pct). An optional date stamp can be rendered below the signature image.

Cryptographic enforcement — `require_crypto`

When the document creator enables require_crypto at creation time, signers who attempt to submit without enabling the PAdES-B cryptographic layer receive an error response: "A PAdES-B cryptographic signature is required for this document." This lets initiators mandate that every signature in the workflow is cryptographically verifiable in Adobe Reader's Signatures panel — not just a visual stamp. The certificate source is the signer's own .p12/.pfx or an auto-generated ephemeral RSA-2048 self-signed certificate created per request and never stored.

Workflow management (from the tracking page)

Add signer — append a new signer to an in-progress workflow; a fresh token and signing URL are generated immediately.
Remove signer — remove a signer who has not yet signed; their token is invalidated.
Cancel request — terminate the entire workflow; all tokens are invalidated and the workspace is scheduled for cleanup.
Return URL / copy link — the initiator can copy a resume link to return to the tracking page from any device.

Storage & retention

All state is stored in the ephemeral temp directory: no database writes, no cloud storage. The workspace has a 24-hour TTL; it is purged on expiry and at create-time cleanup, and the final signed PDF is never stored beyond the TTL window. The e-sign workflow is the one case where a file has to persist between requests, because it must wait for every signer; outside that window the same zero-retention rule applies as for all other tools.

Watermark renders directly to the PDF content stream via PyMuPDF — not as a separate annotation layer. The text is permanently embedded; it cannot be removed by deleting an annotation. A live canvas preview composites your watermark text over page 1 in real time as you adjust any setting.

Placement positions (8)

↗

Diagonal (full page) — default
The watermark spans the full page at 45° from bottom-left to top-right. The most common choice — visually unambiguous that the document is marked.

◎

Center
Horizontal text centred on the page. Prominent without the angle.

📍

Top-Left / Top-Right / Bottom-Left / Bottom-Right
Corner placements. Useful for company name, document classification, or "DRAFT" in a corner without obscuring the main content area.

↕

Header / Footer
Full-width centred text at the top or bottom of the page. Suitable for document titles, classification banners, or page footers.

Style controls

💧

Opacity — 5% to 100% (default 30%)
Lower values produce a subtle ghost watermark that does not obscure content. Higher values produce an opaque stamp. The live preview renders the exact opacity as PyMuPDF will apply it.

🔤

Font size — 12 pt to 96 pt (default 44 pt)
The preview updates immediately so you can confirm the text fits without truncation at the selected size.

📝

Font style — Bold, Regular, Italic, Bold Italic
Bold is the default for legibility at lower opacities. Italic suits signature-style watermarks.

🎨

Colour — hex picker (default #cccccc)
Any hex colour. Common choices: #cccccc neutral grey, #ff0000 red for CONFIDENTIAL, #0000ff blue for DRAFT.

Page targeting

Apply to All pages, Odd pages (recto-only in duplex documents), Even pages, or a custom range (comma-separated, e.g. 1-3, 5, 8-10).
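A custom range such as 1-3, 5, 8-10 expands to an explicit page list before the watermark is applied. The parser below is a minimal sketch of that expansion; the function names and the choice to reject out-of-bounds pages with an error are assumptions.

```python
def parse_page_range(spec: str, page_count: int) -> list:
    """Expand '1-3, 5, 8-10' into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_s, dash, end_s = part.partition("-")
        start = int(start_s)
        end = int(end_s) if dash else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def select_pages(mode: str, page_count: int, spec: str = "") -> list:
    """Resolve the four targeting modes to a list of page numbers."""
    if mode == "all":
        return list(range(1, page_count + 1))
    if mode == "odd":
        return list(range(1, page_count + 1, 2))
    if mode == "even":
        return list(range(2, page_count + 1, 2))
    if mode == "custom":
        return parse_page_range(spec, page_count)
    raise ValueError(f"unknown mode: {mode!r}")
```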

Redaction is not the same as drawing a black box over text. A black rectangle drawn on top of text leaves the original text in the PDF file — it can be selected, copied, and searched by anyone who removes or moves the rectangle. Genuine redaction removes the underlying content from the PDF's data structures. This tool uses PyMuPDF's native redaction API, which permanently erases content at the structural level.

How it works: page.add_redact_annot() marks regions, then page.apply_redactions() removes the content from the page's content streams — text, images, and vector graphics within the region are erased, not covered.

Mode 1 — Text pattern redaction

Enter search patterns and the tool finds every matching text occurrence across the document and permanently removes it.

📋

Multi-pattern list
Add multiple patterns in one job — names, ID numbers, phone numbers, email addresses. All occurrences of all patterns are redacted in a single pass.

🔤

Case-sensitive matching
Toggle on to distinguish "CONFIDENTIAL" from "confidential". Off by default — matches any case variant.

🔍

Whole-word matching
When enabled, "John" will not match "Johnson". Prevents partial-word false positives in names and technical terms.
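The interaction of the three options above is easiest to see as a single compiled matcher. The sketch uses Python's `re` module on plain strings; it illustrates the matching semantics only and is not how the tool locates text on a PDF page. The name `build_matcher` is hypothetical.

```python
import re


def build_matcher(patterns, case_sensitive=False, whole_word=False):
    """Compile a list of literal patterns into one regular expression."""
    literals = [re.escape(p) for p in patterns if p]
    if not literals:
        raise ValueError("at least one non-empty pattern is required")
    # Longest first, so 'John Smith' wins over 'John' when both are listed
    body = "|".join(sorted(literals, key=len, reverse=True))
    if whole_word:
        # Reject matches that touch a word character on either side
        body = rf"(?<!\w)(?:{body})(?!\w)"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)
```

With whole-word matching on, `build_matcher(["John"], whole_word=True)` finds "John" and "john" in a sentence but skips "Johnson".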

Mode 2 — Canvas region redaction

Draw rectangular redaction areas directly on a rendered preview of each PDF page.

📷

Click-and-drag to draw regions
Each region is drawn as a rectangle on the canvas. Multiple regions per page. Coordinates are captured in PDF display-space points and sent to the server for precise structural erasure.

📄

Multi-page navigation
Navigate through all pages and mark regions on each. A per-page region list shows how many areas are marked.

🗑️

Clear page
Remove all regions from the current page without affecting others.

Fill colour and page targeting

Black fill (standard) produces the visible redaction box. White fill is invisible on white backgrounds — useful when removing content without leaving a visible mark, such as stripping header metadata. Page targeting: All pages, Odd, Even, or a custom range.

11 tools for editing, filling, comparing, reading, and inspecting PDF documents. From a full visual editor to a font-embedding checker to table extraction.

✏️

Edit PDF

Full canvas editor: 24+ annotation & prepress tools, AcroForm builder, linked text reflow, vector object editing, and deep content editing. All edits flattened server-side.

Full editor guide → Try it →

📝

Fill PDF Form

Detect and fill all interactive AcroForm fields — text inputs, checkboxes, radio buttons, dropdowns, and list boxes. Values are written server-side via PyMuPDF. Optional flatten-after-fill bakes values into static content.

Try it →

🔍

Compare PDFs

Visual pixel-level diff of two PDFs. Configurable DPI (72–300) and sensitivity. Side-by-side previews render immediately when files are selected. Output is a highlighted diff PDF with change regions marked. Includes 🤖 AI Change Analysis — Qwen 2.5 1.5B classifies change significance (MAJOR/MODERATE/MINOR/NONE) and change type, and returns a plain-English change_summary, a details array (per-change breakdown), and a recommendation.

Try it →

📄

Extract Text

Export all text to .txt with optional layout preservation, text encoding selection, and custom page range. Includes 🤖 AI Document Analysis — Qwen 2.5 1.5B classifies document type (13 categories) with a classification_confidence value, and reports language, key entities (people, organisations, locations, dates, amounts), topics, and reading level.

Try it →

ℹ️

PDF Info

Full metadata inspection: title, author, subject, keywords, creator, producer, page count, dimensions, PDF version, encryption status, form type, tagged flag, fast web view, permission flags, and creation/modification dates. Shows a canvas preview of page 1 alongside the data.

Try it →

🔎

OCR PDF

Optical character recognition for scanned and image-based PDFs via Tesseract 5 LSTM. Three output formats, DPI control, four page segmentation modes, up to 100 pages per job. Returns OCR confidence score, word count, character count, and a live text preview tab. See the deep dive.

Try it →

🔖

Outline / Bookmarks

Load a PDF's existing table of contents. Add, rename, reorder, delete, and set the level (1–4) of each entry. Each row has a page-number input validated against the actual page count. Reads and writes via PyMuPDF get_toc() / set_toc().

Try it →

♿

Accessibility Checker

WCAG 2.1 / PDF/UA compliance audit via PyMuPDF. 8 checks: document title (2.4.2), language metadata (3.1.1), tagged structure (PDF/UA §7.1), image alt-text (1.1.1), reading order (1.3.2), font embedding (PDF/UA §7.21), bookmark navigation (2.4.5), and page-size consistency. Returns pass/fail with WCAG criterion references and overall A–F grade.

Try it →

🔤

Font Inspector

Lists every font across every page: name, type (Type1, TrueType, CIDFont, etc.), encoding, embedded status, subset flag (presence of + prefix in BaseFont name), and the pages each font appears on. Non-embedded fonts flagged in red — critical for print and PDF/UA compliance.

Try it →

🎨

Colour Inspector

Comprehensive colour audit across all PDF content — raster images, vector paths, shapes, and text. Detects DeviceRGB, DeviceCMYK, DeviceGray, Spot, ICC, Lab, and more. Flags overprint, transparency, and Total Ink Coverage over 300%. Ghostscript inkcov gives structured per-page CMYK percentages.

Try it →

📊

Tables to JSON

Extracts all tables from a PDF using pdfplumber with lines_strict strategy (explicit table borders from PDF path operators), falling back to text-position heuristics. First row becomes column headers. Output: {table_count, page_count, tables:[{id, page, rows, cols, headers, data}]}.

Try it →

The PDF Editor has grown into its own dedicated section. It now covers 24+ tools including annotation, form building, deep content text editing, image object editing, linked text reflow frames, vector object editing, and prepress marks. See the full PDF Editor guide →

Engine

Tesseract 5 with the LSTM neural network engine (OEM mode 1). The LSTM engine significantly outperforms the older pattern-matching engine on low-quality scans, handwriting, and non-standard fonts.

Output formats

📄

Plain text (.txt) All recognised text extracted as a flat text file.

🔍

Searchable PDF Original page images preserved with an invisible text layer overlaid. The document becomes copyable and searchable in any PDF viewer.

📦

Both (ZIP) A ZIP containing the .txt and the searchable PDF together.

Controls

DPI: 150 / 200 / 300. Higher DPI improves accuracy on dense text but increases processing time. Page segmentation modes (PSM): auto-detect, single column, single block, and sparse text — important for forms and tables where the default auto-detect makes wrong assumptions. Custom page range: up to 100 pages per job.

What comes back

Along with the output file, the response includes an OCR confidence score (per-word Tesseract TSV confidence averaged across all pages), word count, and character count. A live text preview tab in the browser lets you read extracted text without downloading the file.
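The confidence score can be derived from Tesseract's TSV output, in which word rows carry `level` 5 and a `conf` value, while structural rows carry `conf` -1. The sketch below assumes the TSV is already available as a string; the function name and the decision to count characters without whitespace are assumptions.

```python
import csv
import io


def ocr_stats(tsv_text: str) -> dict:
    """Average word confidence, word count and character count from Tesseract TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    confidences, words = [], []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row.get("conf") or -1)
        # Only level-5 rows are words; conf -1 marks page/block/line rows
        if row.get("level") == "5" and conf >= 0 and text:
            confidences.append(conf)
            words.append(text)
    mean = sum(confidences) / len(confidences) if confidences else 0.0
    return {
        "confidence": round(mean, 1),
        "word_count": len(words),
        "char_count": sum(len(w) for w in words),
    }
```

For a multi-page job the TSV for every page would be concatenated, or the per-word values pooled, before averaging.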

Compare PDFs performs a page-by-page pixel-level diff between two documents. Because it operates on rendered pixels rather than the text layer, it works equally on text-based PDFs and scanned documents — any visual change is detected, including font substitutions, layout shifts, and image replacements that text-diff tools would miss.

Resolution

Both documents are rendered at the selected DPI before comparison. Higher DPI catches smaller visual differences but increases processing time and output file size.

⚡

100 DPI — Fast
Suitable for detecting large-scale changes: paragraph additions, section moves, image replacements.

⚖️

150 DPI — Balanced (default)
Catches most meaningful changes including single-word edits, font changes, and minor layout shifts.

🔍

200 DPI — Detailed
Detects subtle rendering differences, anti-aliasing changes, and minor typographic adjustments. Use when documents are visually similar and small changes are critical.

Sensitivity threshold

Controls the minimum per-pixel difference required to flag a change. Lower values catch more (including compression artefacts); higher values ignore minor differences.

📊

Low (threshold: 5): Detects nearly every pixel difference. Use when comparing documents known to be visually identical and you want to confirm that with precision.

⚖️

Medium (threshold: 15, default): Ignores minor rendering differences and JPEG artefacts. Flags meaningful content changes. The right choice for most document review workflows.

🔎

High (threshold: 30): Only flags substantial changes. Useful when comparing a scanned document against a digital version where scanner noise would otherwise produce false positives across the whole page.

Change map colour coding

🟥

Red — only in Document A (original): Content that has been removed or replaced in the revised version.

🟩

Green — only in Document B (revised): Content that has been added or changed in the revised version.

□

Gray — unchanged: Identical content in both documents, rendered at reduced opacity so changed regions stand out.
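The threshold and colour coding above can be sketched on plain grayscale values. This is a simplified model, not the tool's actual comparison code: it assumes 8-bit grayscale pixels (0 = black, 255 = white) and assumes that "only in A" means the pixel is darker in A than in B. Both function names are illustrative.

```python
def classify_pixel(a: int, b: int, threshold: int = 15) -> str:
    """Classify one grayscale pixel pair from documents A and B."""
    if abs(a - b) < threshold:
        return "gray"          # unchanged within tolerance
    # Assumption: darker in A than in B means ink present only in A.
    return "red" if a < b else "green"

def change_map(page_a, page_b, threshold=15):
    """Build a per-pixel change map for two equally sized pages."""
    return [
        [classify_pixel(a, b, threshold) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(page_a, page_b)
    ]

# One-row example: identical pixel, removed ink, faint rendering noise.
page_a = [[255, 0, 250]]
page_b = [[255, 255, 240]]
medium = change_map(page_a, page_b)               # default threshold 15
low = change_map(page_a, page_b, threshold=5)     # "Low" preset
```

With the Medium threshold the 10-level difference in the last pixel is ignored; with the Low threshold it is flagged, which is the trade-off the presets describe.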

Preview and output

Side-by-side canvas previews of both documents render immediately when each file is selected — no upload required for the preview. Output is a single diff PDF with change regions overlaid on every compared page pair.

On upload, the tool reads the PDF's AcroForm dictionary and generates a matching input form in the browser — one input per field, typed to match the field's widget type. Fill the form in the browser, then submit: PyMuPDF writes the values server-side and returns the filled PDF.

Supported AcroForm field types

📝 Text field — single and multi-line
✅ Checkbox — true / false toggle
🔘 Radio button group
📋 Combo box / drop-down
📋 List box — single or multi-select

Flatten-after-fill

When the flatten option is enabled, field values are baked into the page content stream after writing — the output PDF has no interactive form layer. The filled values appear as static text. This is the correct format for archiving, printing, or sharing a completed form — interactive fields in a shared PDF can otherwise be re-edited by any recipient.

No-fields detection

If the uploaded PDF has no AcroForm dictionary, the tool shows a "No interactive fields found" notice immediately rather than presenting an empty form. For PDFs that need form fields added, use the Edit PDF tool's form builder.

Font Inspector

Enumerates every font used across every page. For each font the report shows:

Font name
Type — Type1, TrueType, CIDFont, OpenType, etc.
Encoding
Embedded — Yes / No (non-embedded flagged red)
Subset — six-letter tag followed by + at the start of the BaseFont name (e.g. ABCDEF+Helvetica)
Pages — list of pages where this font appears

Why non-embedded fonts fail print: When a font is not embedded, the viewer or RIP must substitute it. Substitution changes glyph widths, reflows text, and breaks any layout that depends on exact positioning. PDF/X and PDF/UA both require every font to be embedded (a subset is sufficient). Non-embedded fonts are flagged in red.

Subset embedding: A six-letter tag followed by + at the start of the BaseFont name means only the glyphs actually used in the document are included, which reduces file size while still satisfying the embedding requirement of PDF/X and PDF/UA.
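Subset detection comes down to a pattern match on the BaseFont name: in PDF, a subset font name starts with a tag of six uppercase letters followed by a plus sign. A minimal sketch (the function name is illustrative and not taken from the inspector's code):

```python
import re

# Six uppercase letters, a plus sign, then the real font name.
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")

def split_subset_tag(base_font: str) -> tuple[bool, str]:
    """Return (is_subset, font name without the subset tag)."""
    match = SUBSET_TAG.match(base_font)
    if match:
        return True, match.group(2)
    return False, base_font

subset = split_subset_tag("ABCDEF+Helvetica")
full = split_subset_tag("Helvetica")
```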

Colour Inspector

Audits colour space usage for print-readiness using five detection layers — covering every type of PDF colour content:

🖼️

Raster images (PyMuPDF extract_image()) Checks every embedded image's colour space via component count: 1 = DeviceGray, 3 = DeviceRGB, 4 = DeviceCMYK. RGB images are flagged — commercial presses expect CMYK, and RGB requires conversion during RIP processing, which can produce unexpected colour shifts.

📐

Vector drawings (PyMuPDF get_drawings()) PyMuPDF preserves the original colour space in drawing colour tuples: 1-component = Gray, 3-component = RGB, 4-component = CMYK. Catches all filled and stroked paths, shapes, and borders.

📝

Content-stream operator analysis Tokenises the raw PDF content stream to detect colour operators: rg/RG (DeviceRGB), k/K (DeviceCMYK), g/G (DeviceGray), and cs/CS for named colour spaces. This layer catches text colours and inline images that neither image extraction nor drawing analysis would detect.

🎨

Resource dictionary traversal Follows the page → Resources → ColorSpace/ExtGState reference chain in the raw PDF object tree to detect Separation (spot), DeviceN, ICCBased, Lab, CalRGB, and CalGray colour spaces, plus overprint (/OP true) and transparency (/ca, /CA, /BM) flags.

🖨️

Ghostscript ink coverage + Total Ink Coverage (TIC) Runs Ghostscript's inkcov device to compute per-page C/M/Y/K percentages. Calculates Total Ink Coverage (TIC = C+M+Y+K) per page and flags any page over 300% — a common press limit beyond which wet ink can cause trapping, drying, and registration problems.

The overall verdict — Print-ready (CMYK only, no RGB) or Requires conversion — is shown at the top of the report alongside the per-page breakdown and structured ink coverage table.
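The TIC calculation can be illustrated with a small parser for the text that Ghostscript's inkcov device prints, one line of four coverage fractions per page followed by `CMYK OK`. The sample values below are invented, and the function name and tuple layout are assumptions made for the example.

```python
import re

INKCOV_LINE = re.compile(
    r"^\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+CMYK OK\s*$"
)

def total_ink_coverage(inkcov_output: str, limit_pct: float = 300.0):
    """Return one (page, tic_percent, over_limit) tuple per page."""
    results = []
    for line in inkcov_output.splitlines():
        match = INKCOV_LINE.match(line)
        if not match:
            continue               # skip "Page N" and other banner lines
        # inkcov reports fractions of the page area; convert to percent.
        c, m, y, k = (float(v) * 100 for v in match.groups())
        tic = c + m + y + k
        results.append((len(results) + 1, round(tic, 2), tic > limit_pct))
    return results

sample = """Page 1
 0.10000  0.20000  0.30000  0.05000 CMYK OK
Page 2
 0.90000  0.85000  0.80000  0.70000 CMYK OK
"""
report = total_ink_coverage(sample)
```

Here the second page sums to 325% and is flagged, while the first stays well under the 300% limit.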

PDFs do not store tables as data structures — they store text characters at absolute positions and path objects that may or may not form visible borders. pdfplumber reconstructs table structure from these primitives using two strategies in sequence.

Detection strategies

□

Strategy 1 — lines_strict (explicit borders) Detects tables by finding horizontal and vertical line segments drawn by PDF path operators (l, re commands in the content stream). If the PDF was generated from software that draws explicit table borders — Word, Excel, LibreOffice, InDesign — this strategy reliably reconstructs cell boundaries. Applied first; if no tables are found, the fallback runs.

📝

Strategy 2 — Text-position heuristics (fallback) For borderless tables (where structure is implied by text alignment rather than drawn lines), pdfplumber infers columns and rows from the statistical distribution of text bounding boxes. Works on tables from PDF export pipelines that omit explicit borders.

JSON output schema

Output is a single .json file. The first row of each detected table is treated as column headers; subsequent rows become an array of objects keyed by those headers. Multiple tables per page are each represented as separate entries.

table_count — total number of tables found
page_count — total pages in the document
tables[].id — sequential table number
tables[].page — page the table appears on
tables[].rows / .cols — dimensions
tables[].headers — array of column header strings
tables[].data — array of row objects keyed by header
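To make the schema concrete, here is a sketch that turns already-extracted rows into that JSON shape. The table detection itself is left out; the input format (a list of page number and row lists) is a stand-in, and whether `rows` counts the header row is an assumption.

```python
import json

def tables_to_json(page_count, tables):
    """tables: list of (page_number, rows); rows[0] holds the headers."""
    entries = []
    for index, (page, rows) in enumerate(tables, start=1):
        headers = [str(h) for h in rows[0]]
        # Each following row becomes an object keyed by the headers.
        data = [dict(zip(headers, row)) for row in rows[1:]]
        entries.append({
            "id": index,
            "page": page,
            "rows": len(rows),      # assumption: header row is counted
            "cols": len(headers),
            "headers": headers,
            "data": data,
        })
    document = {
        "table_count": len(entries),
        "page_count": page_count,
        "tables": entries,
    }
    return json.dumps(document, indent=2)

output = tables_to_json(3, [(2, [["Item", "Qty"], ["Bolt", "4"], ["Nut", "9"]])])
```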

Limitations

Scanned PDFs (image-based, no text layer) are not supported; run OCR first, then extract tables. Tables spanning multiple pages are detected as separate tables. Merged cells are not preserved: their content is flattened into the plain row/column grid.

A full-featured canvas editor for PDF annotation, content editing, form building, linked text reflow, vector object editing, and professional prepress preparation. All changes are applied and permanently flattened server-side via PyMuPDF.

Annotation Tools

Draw, mark up, and insert content directly onto PDF pages. All annotations are stored as structured JSON client-side and applied server-side on export.

🔤

Text & Callout

Click-to-place text labels and draggable multi-line text boxes. Callout tool adds a speech-bubble anchor point. Font family, size, colour, bold, italic, and alignment all configurable.

🖊️

Freehand & Eraser

Smooth ink path with configurable stroke width, colour, opacity, and dash style. Eraser paints opaque white strokes to visually cover content.

✒️

Lines & Arrows

Straight lines and directed arrows with arrowheads. Stroke width, colour, and dash style (solid / dashed / dotted / dash·dot).

▭

Shapes

Rectangle and Ellipse with independent stroke and fill colours. Polygon (click to set vertices, double-click to close) and Polyline (click to add segments).

🟡

Markup

Highlight (semi-transparent), Whiteout/Cover (solid white), Strikethrough, and Underline. All drag-to-define and fully repositionable.

📌

Sticky Notes & Hyperlinks

Sticky notes with configurable comment text and colour. Hyperlinks — drag to define a clickable area, supports URL or internal page-number links.

📏

Measure & Calibrate

Draw a line between two points to measure distance. Calibrate Measure sets a known reference distance and unit (px / mm / cm / in / pt) to scale all measurements.

■

Redact

Draw solid-black burn boxes. Right-click → Burn as Redaction permanently removes the content in that region from the PDF content stream on apply, rather than merely covering it.

🖼️

Insert Image

File picker for PNG, JPEG, GIF, WebP. Click and drag to position on the canvas. Added as a floating image annotation on top of the page content.

✍️

Signature (4 modes)

Draw (HiDPI canvas, mouse & touch) · Type (script font preview) · Upload image (PNG/JPEG) · PAdES invisible cryptographic signature (RSA-2048/SHA-256, pyhanko). Optional self-signed or uploaded .p12 certificate.

📷

Stamps & QR Codes

12 preset stamps (DRAFT, APPROVED, REJECTED, CONFIDENTIAL, TOP SECRET, VOID, COPY, FINAL, REVISED, REVIEW, NOT APPROVED, PAID) plus custom text stamps. QR codes in Small/Medium/Large sizes.

AcroForm Builder

Draw interactive PDF form fields directly onto the canvas. Fields are written as native AcroForm widgets — compatible with Adobe Reader, Acrobat, and standards-compliant viewers.

📝

7 Field Types

Text · CheckBox · RadioButton · ListBox · ComboBox · Signature field · PushButton. Each has field name, tooltip, Required/Read-only flags, font size, and colour.

💻

JavaScript Actions

6 action slots per field: Validate · Calculate · Format · Keystroke · Focus · Blur. 13 built-in templates (currency format, sum fields, date format, email validation, submit form, etc.). Scripts embedded as native PDF AA dictionaries.

Deep Content Editing

Edit what is already embedded in the PDF — not just annotations floating on top.

Document → Edit Text Content… opens a side panel. Unlike annotation overlays, this edits the actual text in the PDF content stream — block by block, span by span.

Workflow

Load text blocks: get_text('dict') returns every span with its bbox, font, size, and colour.

Pixel-precise overlay: A contenteditable div is placed over the canvas for each block. Hover reveals the border; focus makes the block opaque for reading.

Edit and apply: add_redact_annot blanks the original bbox, apply_redactions burns it, then insert_textbox re-inserts the new text with the original font, size, and colour.

Auto-shrink: If new text overflows the block bbox, the backend reduces font size in steps (up to 5 attempts, minimum 6 pt) until the text fits.
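The auto-shrink step can be sketched as a bounded retry loop. The step size of 1 pt, the exact meaning of "attempt", and the `fits` callback (which stands in for an overflow check around text insertion) are all assumptions; only the limits of 5 attempts and 6 pt come from the description above.

```python
from typing import Callable, Optional

def shrink_to_fit(start_size: float, fits: Callable[[float], bool],
                  step: float = 1.0, max_attempts: int = 5,
                  min_size: float = 6.0) -> Optional[float]:
    """Return the first font size that fits, or None if none does."""
    size = start_size
    for _ in range(max_attempts):
        if fits(size):
            return size
        if size <= min_size:
            break                      # already at the floor, give up
        size = max(min_size, size - step)
    return None

# Hypothetical block that only has room for text at 10 pt or smaller.
chosen = shrink_to_fit(12, lambda s: s <= 10)
too_long = shrink_to_fit(12, lambda s: s <= 5)
```

Returning `None` lets the caller decide what to do when the text cannot fit even at the smallest allowed size.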

Document → Edit Images… lists every image embedded in the PDF content stream with thumbnail previews and canvas handle overlays. Select any image to act on it.

📤

Replace: Upload a new PNG, JPEG, GIF, WebP, or TIFF. Placed at the same rect via page.replace_image(xref), with a redact+insert fallback.

🗑️

Delete: White-fills the image rect with add_redact_annot + apply_redactions(images=PDF_REDACT_IMAGE_REMOVE).

⇹️

Move / Resize: Drag the canvas handle or enter X/Y/W/H values. Extracts original bytes, redacts old rect, re-inserts at new rect.

Advanced Editing

Three professional capabilities added for complex document workflows.

Prepress → Draw Reflow Frame lets you draw linked text frames that overflow automatically from frame to frame, across the same page or across different pages, much like text threading in InDesign.

How it works

Draw frames: Each frame gets a numbered blue badge showing its position in the thread chain.

Link frames: Click Link to Another Frame in the side panel, then click a second frame (on any page) to join the chain. Frames renumber automatically.

Enter text: All text lives on the root frame (order 0). The server distributes it across the chain using insert_textbox() with binary word-bisect overflow detection — fitting the maximum word count into each frame before passing overflow to the next.

Per-frame style: font family, size, colour, bold, italic, and alignment.
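The word-bisect idea can be shown without any PDF library: binary-search the largest number of leading words that still fits a frame, place them, and hand the remainder to the next frame. In this sketch each frame is represented by a hypothetical `fits(text)` predicate, here a simple character budget; in the real pipeline that check would come from attempting the text insertion. It also assumes that if n words fit, any smaller count fits too.

```python
from typing import Callable, List

def max_fitting_words(words: List[str], fits: Callable[[str], bool]) -> int:
    """Binary-search the largest n such that the first n words fit."""
    lo, hi = 0, len(words)
    while lo < hi:
        mid = (lo + hi + 1) // 2       # upper midpoint avoids an endless loop
        if fits(" ".join(words[:mid])):
            lo = mid
        else:
            hi = mid - 1
    return lo

def distribute(text: str, frames: List[Callable[[str], bool]]) -> List[str]:
    """Fill each frame in chain order and pass the overflow along."""
    words, placed = text.split(), []
    for fits in frames:
        n = max_fitting_words(words, fits)
        placed.append(" ".join(words[:n]))
        words = words[n:]
    return placed

# Stand-in capacity checks: frame 1 holds 11 characters, frame 2 holds 20.
frames = [lambda s: len(s) <= 11, lambda s: len(s) <= 20]
result = distribute("alpha beta gamma delta", frames)
```

Bisection needs only about log2(n) fit checks per frame instead of one per word, which matters when each check is a trial text insertion.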

Document → Vector Object Editor exposes the native PDF vector paths drawn directly in the page content stream — not annotation overlays — for repositioning, recolouring, and deletion.

Workflow

🔍

Load vectors: page.get_drawings() returns every path with colour, fill, width, and bounding box. Listed in a side panel with colour swatches.

⇹️

Move: Drag or enter X/Y offset. Server covers the original with a white rect, then redraws every sub-path (lines, beziers, rects) at the offset coordinates.

🎨

Recolour: Pick a new stroke colour. Server covers the original and redraws with the new colour — fill is preserved.

🗑️

Delete: The path's rect is covered with a white draw_rect via page.new_shape().

Prepress → Prepress Settings opens a modal for professional print preparation. A live canvas preview shows marks without modifying the document; they are applied permanently on export.

Bleed & Marks

✂️

Bleed expansion: Configurable bleed margin in mm. page.set_mediabox() grows all four edges by bleedMm × 72 / 25.4 pt on apply.

✜️

Crop / Trim marks: Four pairs of corner lines at the trim-box boundary, extending into the bleed gutter. Length and gap both configurable.

◎

Registration marks: Circles with crosshairs in each bleed gutter for CMYK plate alignment on press.

🎨

CMYK colour bar: Five filled swatches (Cyan · Magenta · Yellow · Black · 50% Grey) drawn in the bottom bleed strip.

PDF Page Boxes

Prepress settings can write all four standard PDF page-box metadata entries on apply:

TrimBox — page.set_trimbox() — original content area
BleedBox — page.set_bleedbox() — expanded MediaBox including bleed
CropBox — page.set_cropbox() — matches TrimBox (printer default view)
ArtBox — page.set_artbox() — matches TrimBox
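The bleed arithmetic is a unit conversion followed by a rectangle expansion: one inch is 25.4 mm and 72 pt, so a margin in millimetres is multiplied by 72 / 25.4. A small sketch with plain tuples in place of PDF rectangles (the function name and the A4-sized example are illustrative):

```python
MM_TO_PT = 72 / 25.4   # 1 inch = 25.4 mm = 72 pt

def expand_for_bleed(trim_box, bleed_mm):
    """Grow (x0, y0, x1, y1) in points by the bleed margin on every edge."""
    bleed_pt = bleed_mm * MM_TO_PT
    x0, y0, x1, y1 = trim_box
    return (x0 - bleed_pt, y0 - bleed_pt, x1 + bleed_pt, y1 + bleed_pt)

# Roughly A4 in points, with a 3 mm bleed (about 8.5 pt per edge).
trim = (0.0, 0.0, 595.0, 842.0)
media = expand_for_bleed(trim, 3.0)
```

The original rectangle is what TrimBox, CropBox, and ArtBox would keep, while the expanded one corresponds to the MediaBox and BleedBox.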

Page & Document Operations

📄

Page Organisation

Add blank page at end · Insert before/after · Duplicate · Delete · Rotate 90° CW/CCW. Drag-to-reorder pages in the thumbnail sidebar. Right-click context menu per thumbnail.

🔖

Bookmarks & Layout

Bookmark editor builds a navigable table of contents via set_toc(). Page numbers and headers/footers configurable with position, font, size, and colour — applied to all pages server-side.

📤

Export Options

PDF · PDF/A · PDF/X · Word · Plain Text · HTML · Markdown · Excel · PowerPoint · PNG ZIP. Each export applies current annotations first. Options: compress, flatten, grayscale, password protection.

🛡️

Security & Inspection

Unlock password-protected PDFs. Active Content Inspector scans for JavaScript and embedded actions with selective removal. Forensics scan for structural analysis. Font Inspector and Accessibility Checker available in-editor.

↩️

Session & History

Per-page unlimited undo/redo (Ctrl+Z / Ctrl+Y). Auto-save to IndexedDB at configurable intervals (15 s – 5 min). 30-minute inactivity session timer with 5-minute warning. Draft persistence across page reloads.

⌨️

Keyboard Shortcuts

V (select) · T (text) · D (draw) · L (line) · A (arrow) · R (rect) · E (ellipse) · F (toggle fill) · H (highlight) · N (sticky) · ←/→ (page nav) · Ctrl+Z/Y (undo/redo) · Del (delete selected) · Ctrl+F (find) · Ctrl+G (jump to page).

How Edits Are Applied

All annotation data — positions, colours, text, field definitions, reflow chains, vector edits, and prepress settings — is collected client-side as structured JSON and POSTed to the edit-apply endpoint alongside the original PDF.

The server runs a four-stage pipeline in Python via PyMuPDF (two pre-passes, a main pass, and a post-pass):

Pre-pass 1 — vector edits: page.get_drawings() is consulted; moved/recoloured/deleted paths are covered with white and redrawn.
Pre-pass 2 — text reflow: thread_frame chains are grouped, sorted, and text distributed across frames with insert_textbox().
Main pass — all other annotations: Text, shapes, images, signatures, stamps, QR codes, form fields, redactions, page numbers, headers/footers, and page operations.
Post-pass — prepress marks: MediaBox expanded for bleed, crop marks/registration circles/CMYK swatches drawn, page-box metadata written.

The output PDF has all annotations permanently baked into the page content stream. All Python runs inside the four-layer sandbox: prlimit → AppArmor → Linux namespaces → pqpdf-sandbox tmpfs.

The Workflow Builder chains multiple PDF operations into a single automated pipeline. Build once, run on any PDF.

How it works

Add steps from the step picker, configure per-step parameters, and drag to reorder. Upload one or more PDFs and run the full pipeline in one click. Each step processes the output of the previous step.

Supported pipeline steps

🔄

Rotate: Rotate all / odd / even / custom range by 90°/180°/270°.

🗜️

Compress: Any of the five quality presets.

💧

Watermark: Text watermark with all placement, opacity, and style parameters.

🛡️

Protect: AES-256 password protection with permission flags.

🔓

Unlock: Remove password protection.

🎨

Grayscale: Convert to grayscale or black-and-white.

📋

Flatten: Bake all form fields and annotations into page content.

🔧

Repair: Reconstruct corrupted or malformed PDFs.

📃

Extract Pages: Keep only the selected pages from the document.

🗑️

Delete Pages: Remove selected pages from the document.

🔀

Reorder Pages: Rearrange pages into a custom order.

🗂️

Convert to PDF/A: Archive-compliant conversion (PDF/A-1b, 2b, or 3b).

✍️

Sign: Three modes: typed visual only; typed visual + digital certificate; digital-only with auto self-signed RSA-2048 cert.

⬛

Redact: Permanent text-pattern removal with case-sensitive option and black or white fill.

✂️

Split every N pages Terminal step — outputs a ZIP of equal-sized PDF chunks. Useful for batch-scanned documents or splitting large reports into individual sections.

Saving and composing workflows

💾

Save named workflows to localStorage Saved locally in your browser — no server, no account, no sync.

📥

Load vs. Append Load replaces the current workflow. + Append joins a saved workflow onto the end of the current one — letting you compose complex pipelines from saved building blocks.

📤

Export / import as JSON Share workflows with colleagues or version-control them alongside your documents.
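Conceptually a workflow is an ordered list of steps, each consuming the previous step's output, and that list serialises naturally to JSON. The sketch below uses a made-up step format (`op` plus `params`) and toy steps that only tag the bytes; the real workflow file format and step implementations are not documented here.

```python
import json
from functools import reduce

# Hypothetical step registry: each step maps (pdf_bytes, params) to new bytes.
STEPS = {
    "rotate": lambda pdf, p: pdf + f"|rotate:{p['angle']}".encode(),
    "grayscale": lambda pdf, p: pdf + b"|grayscale",
}

def run_workflow(pdf: bytes, workflow: list) -> bytes:
    """Feed the output of each step into the next one."""
    return reduce(
        lambda data, step: STEPS[step["op"]](data, step.get("params", {})),
        workflow,
        pdf,
    )

saved = [{"op": "rotate", "params": {"angle": 90}}]
appended = saved + [{"op": "grayscale"}]        # "+ Append" joins pipelines
restored = json.loads(json.dumps(appended))     # export / import round trip
output = run_workflow(b"%PDF", restored)
```

Because the pipeline is plain data, appending one saved workflow to another is list concatenation, and the JSON round trip loses nothing.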

⚙️

Workflow Builder

Chain 15 operations, save named pipelines to localStorage, export/import as JSON, and run on multiple PDFs in a single job.

Everything for building on top of PQ PDF — a complete REST API and a live system health & benchmark dashboard tracking all processing engines in real time.

REST API

◈

83 operations — one endpoint

Every tool on this site is exposed as a named operation via POST https://api.pqpdf.com/v1/{operation}. Merge, split, compress, convert, redact, OCR, sign, scan — the full suite, scriptable.

⚡

API-key auth + IP whitelisting

Keys carry per-key rate limits (hourly & daily), optional CIDR IP whitelists, and a human-readable label. SHA-256 hashed at rest — the plaintext is shown once at creation and never stored.

🔄

Stateful multi-step sessions

Form fill, PDF editing, and async security scanning use a session token workflow: initialise → operate → apply/download. Tokens are scoped to your X-Session-Id header and expire automatically.

⚶

Zero retention — same guarantee as the web UI

API-submitted files are processed in an isolated temp directory and deleted immediately after the response is streamed. No file is written to a persistent store. Usage metadata (timestamp, operation, file size) is logged for 30 days; file content is never logged.

🛡️

TLS 1.3, same transport security as the web UI

All API traffic runs through pqcrypta-proxy — TLS 1.3 only, ECDHE key exchange, ChaCha20-Poly1305 / AES-256-GCM cipher suites, HSTS, OCSP stapling, and certificate transparency logs.
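The key-handling model described above (plaintext shown once, only a SHA-256 digest stored, optional CIDR whitelist) can be sketched with the standard library. Function names and the key length are assumptions; the `pqpdf_` prefix is taken from the example key format used in this guide.

```python
import hashlib
import hmac
import ipaddress
import secrets

def create_key():
    """Return (plaintext shown once, SHA-256 hex digest to store)."""
    plaintext = "pqpdf_" + secrets.token_urlsafe(32)
    return plaintext, hashlib.sha256(plaintext.encode()).hexdigest()

def key_allowed(presented: str, stored_hash: str,
                client_ip: str, cidrs: list) -> bool:
    """Check the presented key against the stored hash and the whitelist."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    if not hmac.compare_digest(digest, stored_hash):
        return False                   # constant-time comparison failed
    if not cidrs:
        return True                    # no whitelist configured for this key
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)

plaintext, stored = create_key()
inside = key_allowed(plaintext, stored, "10.1.2.3", ["10.0.0.0/8"])
outside = key_allowed(plaintext, stored, "192.168.1.1", ["10.0.0.0/8"])
```

Storing only the digest means a leaked key table does not directly reveal usable keys.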

Operation categories

📎

Core manipulation: Merge, split, compress, rotate, extract/delete/reorder pages, N-up imposition, auto-crop & deskew, flatten, grayscale, repair.

🔄

Convert: PDF ↔ Word, Excel, PowerPoint, HTML, Markdown, Images, PDF/A, PDF/X. Office → PDF, Images → PDF, HTML → PDF.

🛡️

Security & privacy: Forensic scanner (async, 47 engines + AI report), sanitise, protect, unlock, redact, watermark, sign/PAdES, e-signature.

✏️

Content & annotation: Full visual editor (stateful), form fill (stateful), compare, extract text, OCR, PDF info, bookmarks, tables → JSON, accessibility checker, font inspector, colour inspector.

⚙️

Key management: Create, list, revoke API keys; add/remove IP whitelist entries — all programmatically.

Quick example

# Compress a PDF via the REST API (HTTP/3 required)
curl -s --http3-only -X POST https://api.pqpdf.com/v1/compress \
  -H "X-API-Key: pqpdf_your_key_here" \
  -H "X-Session-Id: $(uuidgen)" \
  -F "file=@large.pdf" \
  -F "quality=balanced" \
  --output compressed.pdf

◈

REST API Reference

Full endpoint reference — every operation, every parameter, auth setup, code examples in curl and Python, error codes, and rate-limit details.

🏢

On-Premise / Enterprise

Deploy the full API stack inside your own infrastructure — no rate limits, no file size caps, no external calls. PHI-safe, air-gapped deployments supported.

System Health & Benchmarks

A live health dashboard tracks every PDF processing tool on an automated 15-minute cron cycle. Results are stored in PostgreSQL; the dashboard streams the latest session over a JSON partial endpoint.

●

Per-tool status monitoring

Each of the 45+ processing tools is exercised with a real test file every 15 minutes. Status is classified as healthy, degraded, timeout, or error. The dashboard shows the current session's result alongside HTTP status codes and error messages for any failures.

⏱

7-day rolling benchmarks

Processing latency is tracked per tool — avg, P50, and P95 response times — aggregated into daily buckets. The benchmarks tab shows 7-day trends alongside uptime percentage, giving a picture of both speed and reliability over time.

⚡

Live nav widget

The hamburger navigation menu fetches /health/?partial=health on open and displays a colour-coded dot — green (≥ 95% healthy), amber, or red — with a live tool count. No polling; the fetch fires once per menu open.
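The benchmark figures above are ordinary summary statistics. The sketch below computes avg, P50, and P95 with the nearest-rank method and a healthy percentage for the status dot. The percentile method and the sample numbers are assumptions; the dashboard may interpolate differently.

```python
import math

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

def summarise(latencies_ms, statuses):
    """Summarise one tool's latency samples and one session's statuses."""
    ordered = sorted(latencies_ms)
    healthy = sum(1 for s in statuses if s == "healthy")
    return {
        "avg": sum(ordered) / len(ordered),
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "healthy_pct": 100 * healthy / len(statuses),
    }

# Invented samples: nine fast runs, one slow outlier, one timed-out tool.
stats = summarise([120, 80, 100, 900, 110, 90, 95, 105, 115, 85],
                  ["healthy"] * 19 + ["timeout"])
is_green = stats["healthy_pct"] >= 95
```

The single 900 ms outlier pulls the average to 180 ms while the median stays at 100 ms, which is why the dashboard reports P50 and P95 alongside the mean.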

●

Health Dashboard

Live status of all PDF processing tools — per-tool health, HTTP status, error messages, and session timestamp. Updates every 15 minutes.

⏱

Benchmarks

7-day rolling performance data — avg, P50, and P95 latency per tool with uptime percentage. Accessible from the Benchmarks tab on the dashboard.

How every request is handled — from upload to download to deletion.

Request lifecycle

1 Upload — File arrives over HTTPS/TLS 1.3+ with HSTS preload. Magic-byte check: first 4 bytes must be %PDF for PDF operations. Size checked: 50 MB per file, 200 MB total.

2 Isolation — A private temp directory is created: sys_get_temp_dir() . '/pdftool_' . bin2hex(random_bytes(12)) with permissions 0700. No other process can access it.

3 Processing — The appropriate engine runs inside the temp directory, wrapped in a four-layer process isolation sandbox (see below). Every shell command receives paths via escapeshellarg(). A 120-second timeout wraps every external process. At most 4 heavy jobs run simultaneously.

4 Stream — readfile($path) begins streaming the output to your browser over the existing HTTP connection.

5 Delete — cleanup() is called immediately after readfile() returns. The temp directory and all its contents are deleted while your download is still in flight. There is no retention window.
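The five steps can be condensed into a short sketch. The site's own code is PHP (sys_get_temp_dir(), readfile(), cleanup()); this is a Python rendering of the same flow for illustration only, with `process` and `send` as hypothetical stand-ins for the engine and the HTTP response stream.

```python
import os
import shutil
import tempfile

def handle_job(upload: bytes, process, send) -> str:
    """Validate, isolate, process, stream, delete. Returns the job dir path."""
    if upload[:4] != b"%PDF":
        raise ValueError("not a PDF")              # step 1: magic-byte check
    job_dir = os.path.join(tempfile.gettempdir(),
                           "pdftool_" + os.urandom(12).hex())
    os.mkdir(job_dir, 0o700)                       # step 2: private directory
    try:
        src = os.path.join(job_dir, "input.pdf")
        with open(src, "wb") as fh:
            fh.write(upload)
        out_path = process(src, job_dir)           # step 3: run the engine
        with open(out_path, "rb") as fh:
            send(fh.read())                        # step 4: stream the result
    finally:
        shutil.rmtree(job_dir, ignore_errors=True) # step 5: delete everything
    return job_dir

def copy_engine(src, job_dir):
    """Toy engine for the example: copies the input to an output file."""
    out = os.path.join(job_dir, "out.pdf")
    shutil.copy(src, out)
    return out

sent = []
finished_dir = handle_job(b"%PDF-1.7 demo", copy_engine, sent.append)
```

Putting the deletion in a `finally` block means the directory is also removed when processing fails, not only on the success path.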

Security controls

🛡️

Strict Content Security Policy

Every page generates two fresh random nonces per request. script-src allows only 'nonce-{ext}' and 'nonce-{inline}'. No unsafe-inline, no unsafe-eval. style-src 'self' — no inline styles anywhere in any HTML, including this page.
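A per-request nonce pair can be generated as shown below. This is a sketch of the idea only: the nonce length, the base64 encoding, and the exact header layout are assumptions, and the real policy presumably carries more directives than the two shown.

```python
import base64
import secrets

def build_csp():
    """Return (header value, external-script nonce, inline-script nonce)."""
    ext = base64.b64encode(secrets.token_bytes(16)).decode()
    inline = base64.b64encode(secrets.token_bytes(16)).decode()
    header = (
        f"script-src 'nonce-{ext}' 'nonce-{inline}'; "
        "style-src 'self'"
    )
    return header, ext, inline

header, ext_nonce, inline_nonce = build_csp()
```

Each response would then stamp `ext_nonce` on its external script tags and `inline_nonce` on its inline ones; a script without a matching nonce is refused by the browser.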

⏱️

Rate Limiting & Concurrency Control

Two independent rate-limiting layers run on every request. Session-based: 10 operations per 5-minute sliding window per browser session. IP-based: 30 operations per 5-minute window per source IP — generous enough for shared NAT networks but still bounds individual abusers. Both are backed by Redis with filesystem fallback so limits are always enforced. Polling and keepalive operations (edit-ping, pdf-scan-poll, esign-status) are exempt from both limits to avoid blocking live progress UIs. Returns HTTP 429 when either limit is exceeded.

A third layer enforces server-wide concurrency: at most 4 heavy operations execute simultaneously. When that limit is reached the server returns HTTP 503 ("Server is busy — please try again shortly"). Lightweight operations and status polls are exempt. All limit breaches are recorded as structured security events.
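A sliding-window limit like the session rule (10 operations per 5 minutes) can be sketched in a few lines. This version keeps its state in process memory purely for illustration, whereas the text above describes Redis with a filesystem fallback; the class name is hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float):
        self.limit, self.window_s = limit, window_s
        self.hits = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and q[0] <= now - self.window_s:
            q.popleft()                # drop hits that left the window
        if len(q) >= self.limit:
            return False               # caller responds with HTTP 429
        q.append(now)
        return True

session_limit = SlidingWindowLimiter(limit=10, window_s=300)
burst = [session_limit.allow("session-1", now=t) for t in range(11)]
```

The IP-based layer would be a second instance with `limit=30`, keyed by source address, and a request passes only if both instances allow it.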

📁

File Validation

Two-step validation before any processing: (1) magic-byte check: the first 4 bytes must be %PDF; (2) secondary structural parse via pdfinfo: the file must parse as a PDF with a valid cross-reference table. Both checks must pass; a file that starts with %PDF but contains no valid PDF structure is rejected. Repeated failures within a session are counted, and three consecutive failures trigger a security event log entry. MIME type validated against allowlist. No user-controlled string reaches the shell without escapeshellarg(). Page range inputs are validated against /^\d+$/ before any integer conversion.
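The two cheapest checks, the magic bytes and the digits-only page number, are easy to illustrate. The sketch is in Python rather than the site's PHP, the range check against the page count is an added assumption, and the pdfinfo structural parse is not reproduced.

```python
import re

# Digits only. fullmatch is used because Python's "$" would also accept
# a trailing newline; re.ASCII keeps \d to the characters 0-9.
PAGE_NUMBER = re.compile(r"\d+", re.ASCII)

def has_pdf_magic(data: bytes) -> bool:
    """True when the first four bytes are %PDF."""
    return data[:4] == b"%PDF"

def parse_page(value: str, page_count: int) -> int:
    """Validate the raw string first, convert to int second."""
    if not PAGE_NUMBER.fullmatch(value):
        raise ValueError("malformed page number")
    page = int(value)
    if not 1 <= page <= page_count:    # assumption: pages are 1-based
        raise ValueError("page out of range")
    return page

page = parse_page("12", page_count=20)
```

Validating before conversion means strings such as `1;rm` or `12\n` are rejected outright instead of being silently coerced.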

🛡️

Four-Layer Process Isolation Sandbox

Every invocation of a heavy external tool — Ghostscript, Python, LibreOffice, Playwright, ImageMagick — passes through a mandatory four-layer sandbox chain. The architecture is sandbox-by-default: new tools are sandboxed automatically; an explicit opt-out is required to exempt a tool (only four read-only helpers are exempt: pdfinfo, qpdf, pdfseparate, pdftotext).

Layer 1 — prlimit: kernel-enforced resource caps applied before any process image loads: 1.5 GB virtual memory (RLIMIT_AS), 512 MB max file write (RLIMIT_FSIZE), 256 processes (RLIMIT_NPROC), 512 open file descriptors (RLIMIT_NOFILE).

Layer 2 — AppArmor aa-exec: transitions the process into the pqpdf-unshare mandatory-access-control profile. Required on Ubuntu 24.04+ where user namespace creation is gated behind the AppArmor userns permission. The profile grants only what unshare and the sandbox script need; all other filesystem writes are denied.

Layer 3 — unshare (Linux namespaces): creates isolated kernel namespaces. --user --map-root-user: the process sees itself as root inside the namespace but holds no privileges on the host. --net: private network stack with no usable interfaces; the tool cannot reach the internet or any internal service, so outbound connect() calls fail. --pid --fork: isolated PID tree; child processes cannot escape to the host. --ipc: private shared memory and message queues. --mount: private mount namespace so bind-mounts are invisible to the host.

Layer 4 — pqpdf-sandbox script: runs inside the new namespaces, mounts a 512 MB tmpfs as scratch space so all I/O happens in-memory and vanishes when the namespace exits, bind-mounts the job directory into the scratch tmpfs, applies a CPU time limit via ulimit -t (enforced after the PID namespace fork, avoiding a kernel sigprocmask conflict), then execs the real tool binary. No shell remains after exec.

SANDBOX_MIN_LEVEL = 'full' in production: if any layer is unavailable, the operation fails rather than running unsandboxed. Where a lower minimum level is configured, degraded execution is always logged as a security event.
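The layering can be pictured as one nested command line, with each layer exec-ing the next. The sketch below only assembles such an argument list; it does not run anything. The prlimit, aa-exec, and unshare options are standard ones, but the script path, the way pqpdf-sandbox receives the tool command, and the reading of 1.5 GB as 1536 MiB are assumptions.

```python
def sandbox_argv(tool_argv, sandbox_script="/usr/local/bin/pqpdf-sandbox"):
    """Wrap a tool command in the four layers, outermost first."""
    return [
        # Layer 1: kernel-enforced resource caps
        "prlimit",
        f"--as={1536 * 1024 * 1024}",
        f"--fsize={512 * 1024 * 1024}",
        "--nproc=256",
        "--nofile=512",
        "--",
        # Layer 2: AppArmor profile transition
        "aa-exec", "-p", "pqpdf-unshare", "--",
        # Layer 3: fresh user, network, PID, IPC, and mount namespaces
        "unshare", "--user", "--map-root-user", "--net",
        "--pid", "--fork", "--ipc", "--mount", "--",
        # Layer 4: sandbox script (hypothetical path), which execs the tool
        sandbox_script, *tool_argv,
    ]

argv = sandbox_argv(["gs", "-q", "-dNOPAUSE"])
```

Passing the result as a list to a process launcher, rather than joining it into a shell string, keeps file names from being interpreted by a shell.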

🔐

Transport Security

HTTP/3 over QUIC v1 (RFC 9000) — primary protocol. TLS 1.3 only; TLS 1.0, 1.1, and 1.2 disabled — no downgrade possible. Key exchange uses X25519MLKEM768 hybrid post-quantum cryptography (NIST FIPS 203). Cipher suite: TLS_AES_256_GCM_SHA384. Certificate: Let's Encrypt ECDSA + SHA-384, CT-logged. HSTS preload eligible (max-age=31536000; includeSubDomains; preload).

📋

Security Event Logging

Security-relevant events are written as structured NDJSON to /var/log/pqpdf/security.ndjson — one event per line, ingestible by Elasticsearch, Loki, Datadog, or jq. Events logged: invalid HTTP method, unknown operation, session rate limit breach, IP rate limit breach, concurrency limit reached, file size exceeded, total upload size exceeded, repeated PDF validation failures (threshold: 3 consecutive), malformed page range input. Every entry carries a hashed session token (first 12 hex chars of sha256(session_id()) — stable but cannot be used to hijack the session), IP address, operation name, and sanitised user-agent string. Falls back to error_log() if the log file is not writable so no event is silently dropped. A live Security Dashboard at /security-dashboard.php presents aggregated telemetry — event timeline, activity heatmap, top source IPs, and a filterable event log table with CSV/JSON export. The dashboard is token-gated via the PQPDF_DASHBOARD_TOKEN environment variable.
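One NDJSON event with a hashed session token might be built as follows. The field names, the timestamp format, and the user-agent sanitising rule are illustrative assumptions; only the 12-hex-character SHA-256 prefix and the one-JSON-object-per-line format come from the description above.

```python
import hashlib
import json
from datetime import datetime, timezone

def security_event(event: str, session_id: str, ip: str,
                   operation: str, user_agent: str) -> str:
    """Build one NDJSON line for the security log."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        # First 12 hex chars of the SHA-256 digest: stable per session,
        # but not usable as the session identifier itself.
        "session": hashlib.sha256(session_id.encode()).hexdigest()[:12],
        "ip": ip,
        "operation": operation,
        # Keep printable ASCII only and cap the length.
        "user_agent": "".join(
            c for c in user_agent if 32 <= ord(c) < 127
        )[:200],
    }
    return json.dumps(record, separators=(",", ":"))

line = security_event("ip_rate_limit", "abc123", "203.0.113.7",
                      "compress", "curl/8.5\n")
```

Because every record is a single line of JSON, the file can be filtered directly, for example with `jq 'select(.event == "ip_rate_limit")'`.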

🚫

Spam & Bot Protection

The contact form layers four independent defences: (1) AI behavioural verification — client-side analysis of interaction patterns before the submit button is enabled; (2) honeypot fields — two hidden inputs invisible to humans are sent with every submission; any non-empty value causes the server to reject the request via SpamException; (3) server-side spam pattern matching — pharmaceutical keywords, excessive capitalisation, disposable email domains, and common bot phrases; (4) IP-based rate limit — maximum 5 submissions per hour per IP, enforced in PostgreSQL before any email is dispatched.

Engine stack

All engines run locally. No file data is ever sent to a third-party service.

Engine	Used by	External calls?
Ghostscript	Compress, watermark, rotate, protect, unlock, flatten, grayscale, repair, PDF/X	None
Poppler	Merge, split, extract text, to-images, PDF info	None
qpdf	Protect/unlock, structural analysis (scanner)	None
LibreOffice	All Office ↔ PDF conversions (Word, Excel, PowerPoint, ODT, ODS, ODP)	None
Playwright / Chromium	HTML → PDF (URL and file modes, JavaScript rendering, lazy-load, web fonts)	None (sandboxed)
ImageMagick	Images → PDF, typed signature rendering	None
Tesseract 5 LSTM	OCR PDF	None
PyMuPDF 1.27	Edit, fill, nup, deskew, outline, a11y, font/colour inspect, PDF info, scanner engines 1–9	None
pymupdf4llm	PDF → Markdown	None
python-pptx	PDF → PowerPoint	None
pdfplumber	Tables to JSON	None
pyhanko 0.34	PAdES / Sign PDF (incremental CMS/PKCS#7)	None
endesive	Visual + crypto sign modes	None
ExifTool 12	Scanner engine 10	None
YARA 4.5	Scanner engine 12 (24 custom rules + external .yar support)	None
ClamAV 1.4+	Scanner engine 15 (700k+ signatures)	Signature updates only (clamav.net)
PeePDF 0.4	Scanner engine 13	None
prlimit + AppArmor aa-exec + unshare + pqpdf-sandbox	Four-layer process isolation sandbox — wraps every heavy tool invocation; also used explicitly by Scanner engine 14 for the dynamic behavioral sandbox with strace syscall tracing	None (network namespace isolates all tools)
pikepdf	Scanner engine 13 (supplemental PDF parser — JS Names tree, EmbeddedFiles, per-page AA)	None
scikit-learn + LightGBM	Scanner engine 16 (IsolationForest + RandomForest + LightGBM ensemble, model drift detection)	None
Acorn (Node.js)	Scanner engine 19 (JS AST deobfuscation, ECMAScript 2022, 6 iterative deobfuscation passes)	None
imagehash	Scanner engines 24, 39 (pHash perceptual similarity for campaign attribution · LSB chi-square steganalysis and tracking beacon detection)	None
Node.js vm	Scanner engine 41 (JS behavioral emulation — sandboxed Acrobat API stub, runtime call interception: app.launchURL, this.submitForm, app.openDoc)	None
python-tlsh	Scanner engine 24 (TLSH locality-sensitive hash for campaign clustering)	None
@noble/post-quantum	Protect PDF — PQC mode (runs in browser)	None

Full technical details — temp-dir lifecycle, TLS configuration, CSP nonce implementation, ML data policy, and vulnerability reporting contact: legal/security.php

PQ PDF runs behind PQCrypta Proxy — a Rust-based QUIC proxy (built on quinn) that provides HTTP/3, WebTransport, and post-quantum hybrid TLS at the network layer. Every connection uses TLS 1.3 with X25519MLKEM768 hybrid key exchange — the same algorithm now deployed by Chrome, Firefox, and Cloudflare. TLS 1.0, 1.1, and 1.2 are disabled entirely.

HTTP/3 Primary protocol

QUIC v1 RFC 9000

TLS 1.3 Only — 1.0/1.1/1.2 off

X25519MLKEM768 PQ hybrid key exchange

48 ms TLS handshake

3 ms TTFB

0.00% Packet loss

Post-quantum hybrid key exchange

X25519MLKEM768 combines a classical algorithm with a post-quantum algorithm. Both must be broken simultaneously for the key exchange to be compromised.

🧬

ML-KEM-768 NIST FIPS 203

The post-quantum half. ML-KEM-768 (formerly Kyber-768) is a lattice-based key encapsulation mechanism standardised by NIST in August 2024 as FIPS 203. It targets NIST security category 3, roughly the strength of AES-192, and no known classical or quantum attack recovers the shared secret at practical cost.

🔑

X25519 Classical ECDH

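The classical half. X25519 (Curve25519 Diffie-Hellman) is one of the most widely deployed and heavily analysed elliptic-curve key exchanges in production use. Its design makes constant-time implementations straightforward, which removes the usual timing side-channels. It is secure against all known classical attacks but, like every elliptic-curve scheme, not against a large quantum computer, which is why it is paired with ML-KEM-768.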

🧷

Hybrid binding

The final session key is derived from the outputs of both components, so an adversary must break both X25519 and ML-KEM-768 to recover it. If either algorithm is later found broken, the other still protects the connection.
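The binding described above can be sketched in a few lines. This is a minimal illustration, not the proxy's code: it assumes the two 32-byte shared secrets are concatenated (ML-KEM first, as we read the IETF hybrid key-exchange draft) and run through HKDF-Extract with SHA-384. The zero salt is a simplification of the TLS 1.3 key schedule, and `hybrid_secret` is a hypothetical helper name.

```python
import hashlib
import hmac
import os

HASH = hashlib.sha384  # matches the SHA-384 hash of TLS_AES_256_GCM_SHA384


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, HASH).digest()


def hybrid_secret(mlkem_ss: bytes, x25519_ss: bytes) -> bytes:
    """Combine both shared secrets into one key-schedule input.

    Assumption: plain concatenation, ML-KEM secret first, with a zero
    salt standing in for the salt the real TLS 1.3 key schedule derives.
    """
    if len(mlkem_ss) != 32 or len(x25519_ss) != 32:
        raise ValueError("both shared secrets must be 32 bytes")
    zero_salt = bytes(HASH().digest_size)
    return hkdf_extract(zero_salt, mlkem_ss + x25519_ss)


# Placeholder secrets: a real handshake obtains these from ML-KEM-768
# decapsulation and from the X25519 Diffie-Hellman function.
mlkem_ss = os.urandom(32)
x25519_ss = os.urandom(32)
secret = hybrid_secret(mlkem_ss, x25519_ss)
print(len(secret))  # 48 bytes for SHA-384
```

Because both secrets feed the same extraction step, an attacker who learns only one of them still faces 256 bits of unknown input.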

⌛

"Harvest now, decrypt later" protection

Nation-state actors are widely believed to archive encrypted traffic today, intending to decrypt it once a sufficiently powerful quantum computer exists. X25519MLKEM768 is designed to make those archives useless: recovering the session key from captured traffic would require breaking ML-KEM-768 as well as X25519, and no known quantum algorithm breaks ML-KEM-768.

🧪

Is your site post-quantum ready?

Check whether your server negotiates X25519MLKEM768 or another PQC hybrid key exchange — and whether TLS 1.2 is still reachable.

Test PQC readiness →

Protocol stack

pqpdf.com — clients negotiate the highest protocol they support via ALPN and Alt-Svc; HTTP/2 is available as a fallback. api.pqpdf.com — HTTP/3 only. The proxy does not advertise h2 in its TCP ALPN for this host; HTTP/2 clients are downgraded to HTTP/1.1 and rejected with 426 Upgrade Required. All connections enforce TLS 1.3 — there is no downgrade path to TLS 1.2.

Protocol	Transport	TLS	ALPN	Multiplexing	HOL blocking
HTTP/3 (primary)	QUIC v1 (UDP)	TLS 1.3 (QUIC built-in)	`h3`	✔ Stream-level	✔ Eliminated
WebTransport	QUIC v1 (UDP)	TLS 1.3 (QUIC built-in)	`h3`	✔ Bidirectional streams + datagrams	✔ Eliminated
HTTP/2 (pqpdf.com only)	TCP	TLS 1.3	`h2`	✔ Stream-level	⚠️ TCP-level HOL remains
HTTP/1.1 (rejected with 426)	TCP	TLS 1.3	`http/1.1`	✘ None	⚠️ Request + TCP HOL

⚡

Does your site support HTTP/3, QUIC, and WebTransport?

The PQCrypta scanner checks HTTP/3 negotiation, QUIC version, WebTransport availability, Alt-Svc headers, and 0-RTT configuration.

Scan HTTP/3 & QUIC →

QUIC v1 — security hardening

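RFC 9000 defines several anti-abuse mechanisms, and all of them are enabled. The cards below also cover related TLS and congestion-control settings, including two features that are deliberately off or not yet available.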

🚫

0-RTT disabled Secure

Zero-RTT session resumption is intentionally off. While 0-RTT reduces latency for repeat connections, it opens a replay window where an adversary can re-submit captured early data. Disabling it eliminates this class of attack. All connections use full 1-RTT handshakes.

📝

Address validation & Retry tokens

Before allocating connection state, the proxy issues a Retry packet carrying a server-generated token. The client must echo this token in its next Initial packet, proving it can receive traffic at the claimed source address. This prevents IP spoofing and connection-state exhaustion attacks.
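One common way to build such a token is to authenticate the client address and an issue time with a server-side key, so the server needs no per-client state. The sketch below illustrates that idea only. The token layout, the 10-second lifetime, and the function names are assumptions, not the format PQCrypta Proxy or quinn uses. Real Retry tokens also bind the original destination connection ID.

```python
import hashlib
import hmac
import struct

TOKEN_LIFETIME_S = 10  # hypothetical validity window


def make_retry_token(key: bytes, client_addr: str, now: int) -> bytes:
    """Token = 8-byte issue time || HMAC-SHA256(key, issue time || address)."""
    issued = struct.pack(">Q", now)
    tag = hmac.new(key, issued + client_addr.encode(), hashlib.sha256).digest()
    return issued + tag


def validate_retry_token(key: bytes, client_addr: str, token: bytes, now: int) -> bool:
    """Accept only an unexpired token that was issued to this exact address."""
    if len(token) != 8 + 32:
        return False
    issued, tag = token[:8], token[8:]
    expected = hmac.new(key, issued + client_addr.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False
    age = now - struct.unpack(">Q", issued)[0]
    return 0 <= age <= TOKEN_LIFETIME_S


key = b"\x01" * 32  # demo key; a server would use a random secret
token = make_retry_token(key, "203.0.113.7:50000", now=1_000)
print(validate_retry_token(key, "203.0.113.7:50000", token, now=1_003))   # True
print(validate_retry_token(key, "198.51.100.9:50000", token, now=1_003))  # False
```

A spoofing sender never sees the Retry packet, so it cannot present a token that validates for the address it claims.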

📡

Anti-amplification limit RFC 9000 §8.1

The server sends no more than 3× the bytes received from an unvalidated client address. This prevents the QUIC handshake from being weaponised as a UDP amplification vector — a significant concern for protocols that can send large responses to small initial packets.
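The 3× rule amounts to simple bookkeeping per unvalidated path. The following sketch shows that accounting under our own naming; `UnvalidatedPath` and its methods are hypothetical and do not mirror any particular QUIC library.

```python
AMPLIFICATION_FACTOR = 3  # RFC 9000 §8.1 limit for unvalidated addresses


class UnvalidatedPath:
    """Tracks bytes exchanged with a client whose address is not yet validated."""

    def __init__(self) -> None:
        self.bytes_received = 0
        self.bytes_sent = 0

    def on_receive(self, n: int) -> None:
        self.bytes_received += n

    def send_budget(self) -> int:
        """Bytes the server may still send before it must wait for the client."""
        return max(0, AMPLIFICATION_FACTOR * self.bytes_received - self.bytes_sent)

    def try_send(self, n: int) -> bool:
        """Record the send and return True only if it fits the budget."""
        if n > self.send_budget():
            return False
        self.bytes_sent += n
        return True


path = UnvalidatedPath()
path.on_receive(1_200)        # one padded client Initial
print(path.send_budget())     # 3600
print(path.try_send(3_000))   # True
print(path.try_send(1_000))   # False, only 600 bytes remain
```

Once the address is validated, the limit no longer applies and normal congestion control takes over.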

🔄

Stateless reset

If the proxy loses connection state (e.g. after a restart), it answers packets it can no longer associate with a connection with a Stateless Reset, so the client terminates cleanly instead of retransmitting into a dead session indefinitely.

🗺️

Connection migration

QUIC connections are identified by a Connection ID, not the IP/port 4-tuple. A client that switches from Wi-Fi to mobile data mid-upload continues the same logical connection without starting over. Connection migration is enabled with address re-validation on path change.

🧪

GREASE RFC 8701

The proxy sends Generate Random Extensions And Sustain Extensibility values in TLS extension slots. This prevents middleboxes and TLS stacks from hardcoding assumptions about which extension IDs are valid — keeping the protocol extensible as new standards are adopted.

⚡

CUBIC congestion control RFC 9438

QUIC-native loss recovery (RFC 9002) with CUBIC congestion control. Initial congestion window: 12,000 bytes. Path MTU Discovery (PMTUD) is enabled: the proxy probes for the optimal UDP payload size (measured MTU 1,452 bytes, UDP MTU 1,200 bytes, datagram payload 1,162 bytes).
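The 12,000-byte figure is consistent with the initial window recommended in RFC 9002 §7.2 for a 1,200-byte datagram size: ten datagrams, capped at the larger of 14,720 bytes or two datagrams. The sketch below reproduces that arithmetic. Whether PQCrypta Proxy derives its window this way is an assumption, and `initial_window` is a hypothetical helper name.

```python
def initial_window(max_datagram_size: int) -> int:
    """Recommended initial congestion window from RFC 9002 §7.2, in bytes."""
    return min(10 * max_datagram_size, max(14_720, 2 * max_datagram_size))


print(initial_window(1_200))        # 12000, i.e. ten 1,200-byte datagrams
print(round(12_000 / 1_452, 2))     # 8.26 packets if each filled the 1,452-byte MTU
```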

❌

ECH — not yet supported

Encrypted Client Hello (ECH) would encrypt the SNI field, hiding the target hostname from network observers. It is not yet supported by PQCrypta Proxy. TLS 1.3 encrypts the handshake messages that follow the ServerHello, including the certificate, but the ClientHello and its SNI remain visible to on-path observers.

WebTransport

Available on port 443 (path /) and port 4433. WebTransport runs over HTTP/3 and exposes QUIC streams and unreliable datagrams directly to browser code.

📦

Multiplexed streams

Unlike WebSockets, which carry all messages over a single ordered TCP byte stream, WebTransport streams are independent QUIC streams with no head-of-line blocking between them. Multiple large file operations can transfer in parallel, and one slow stream does not stall the others.

⚡

Unreliable datagrams

Beyond streams, WebTransport supports fire-and-forget datagrams (max 1,162 bytes each). Ideal for low-latency signals — live progress events, cancellation, real-time preview requests — where retransmitting stale data would add unnecessary latency.

🔐

Inherits QUIC security

Every WebTransport session shares the underlying QUIC connection's TLS 1.3 encryption, X25519MLKEM768 key exchange, address validation, and anti-amplification hardening. No separate security layer to configure.

TLS 1.3 — cipher & certificate

Parameter	Value	Notes
Cipher suite	`TLS_AES_256_GCM_SHA384`	AES-256-GCM authenticated encryption; 256-bit key; SHA-384 transcript hash
Key exchange	`X25519MLKEM768`	Hybrid: X25519 classical ECDH + ML-KEM-768 post-quantum (NIST FIPS 203)
Signature	`ecdsa-with-SHA384`	Certificate signed with ECDSA + SHA-384; P-384 curve
Certificate issuer	Let's Encrypt E8	Free public CA; certificate transparency logged; 90-day auto-renewal
ALPN	`h3`	HTTP/3 negotiated via TLS ALPN extension
TLS versions	TLS 1.3 only	TLS 1.0, 1.1, and 1.2 explicitly disabled — no downgrade possible
GREASE	Enabled	Random extension values injected to prevent middlebox ossification (RFC 8701)
Handshake RTTs	1-RTT only	0-RTT disabled; full handshake on every new connection — no replay window
ECH	Not yet supported	SNI remains visible to on-path observers; Encrypted Client Hello support is planned

Connection & performance metrics

Measured by PQCrypta scanner against pqpdf.com, March 2026.

Metric	Value	Notes
TLS handshake	48 ms	Full 1-RTT QUIC Initial + Handshake packet exchange
TTFB	3 ms	Time to first byte from proxy after handshake
RTT	< 1 ms	Sub-millisecond measured round-trip time (reported as 0 ms by the scanner)
Packet loss	0.00%	0 of 16 packets lost during scan
Congestion control	CUBIC	RFC 9002 QUIC loss recovery + CUBIC CC algorithm
Initial CWND	12,000 bytes	10 datagrams of 1,200 bytes (about 8 at the 1,452-byte MTU) before ACK feedback is required
Max stream data	12,000 bytes	Initial per-stream flow control window
MTU / UDP MTU	1,452 / 1,200 bytes	PMTUD enabled; max datagram payload 1,162 bytes
Idle timeout	20 s	Server-initiated close after 20 s of inactivity
Proxy processing	2.64 ms	PQCProxy internal duration (`Server-Timing: proxy;dur=2.64`)

HTTP/3 response headers

Headers sent on every HTTP/3 response that convey transport metadata, observability, and client hint negotiation.

Header	Value / Purpose
`Alt-Svc`	`h3=":443"; ma=86400, h3=":4434"; ma=86400` — advertises HTTP/3 on ports 443 and 4434; browsers cache for 24 hours
`Server-Timing`	`proxy;dur=2.64;desc="PQCProxy Processing", quic;desc="QUIC v1"`
`Priority`	`u=3` — RFC 9218 Extensible Prioritisation Scheme; urgency 3 (default)
`Accept-CH`	DPR · Viewport-Width · Width · ECT · RTT · Downlink · Sec-CH-UA-Platform · Sec-CH-UA-Mobile — client hints for adaptive responses
`NEL`	Network Error Logging configured — browsers report transport failures to the Report-To endpoint
`Report-To`	Reporting API endpoint for NEL, CSP violation, and COOP violation reports
103 Early Hints	Supported — server can push `Link: preload` hints before the full response is ready
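To see what a client takes from the `Alt-Svc` value above, the header can be split into its alternatives. This is a simplified sketch that handles the value shown in the table; it ignores the host part and the special `clear` value, and it does not cope with commas inside quoted strings. `parse_alt_svc` is a hypothetical helper name.

```python
def parse_alt_svc(value: str) -> list[tuple[str, int, int]]:
    """Return (protocol, port, max_age_seconds) for each advertised alternative."""
    entries = []
    for alternative in value.split(","):
        parts = [p.strip() for p in alternative.split(";")]
        protocol, _, authority = parts[0].partition("=")
        _host, _, port = authority.strip('"').rpartition(":")
        max_age = 86_400  # RFC 7838 default of 24 hours when "ma" is absent
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "ma":
                max_age = int(val.strip('"'))
        entries.append((protocol, int(port), max_age))
    return entries


header = 'h3=":443"; ma=86400, h3=":4434"; ma=86400'
print(parse_alt_svc(header))
# [('h3', 443, 86400), ('h3', 4434, 86400)]
```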

Implementation

PQCrypta Proxy v0.2.1 is a purpose-built Rust proxy using the quinn library, a widely used Rust QUIC implementation. It is rated A++ / HTTP/3 Ultimate by the PQCrypta scanner with 95% confidence: "Post-Quantum ready, Rust QUIC (quinn), HTTP/3 RFC 9114, Standard port (443)".
