Architecture Summary
- ✅ Zero retention: temp directory deleted while the download is still in flight.
- ✅ No third-party cloud: all processing runs on pqpdf.com servers only.
- ✅ TLS 1.3 only: cipher TLS_AES_256_GCM_SHA384, HSTS 2-year preload, no TLS 1.2 or below.
- ✅ HTTP/3 + QUIC v1 + WebTransport: A++ "HTTP/3 Ultimate" rating from the PQCrypta HTTP/3 & WebTransport Scanner, the only free, URL-based tool that probes a remote server specifically for WebTransport support; QUIC address validation, retry, anti-amplification. Standard response headers independently rated A+ by securityheaders.com.
- ✅ Post-Quantum TLS key exchange: X25519MLKEM768 hybrid KEM negotiated where supported.
- ✅ Proxy WAF: pqcrypta-proxy enforces rate limiting, JA3/JA4 fingerprint blocking, circuit-breaker protection, and geo-blocking in front of the application.
- ✅ Strict CSP with per-request nonces: no un-nonced inline scripts, no eval.
- ✅ ML data contains no PII: feature vectors only, no file content.
1. Temp-Directory Lifecycle
1.1 Creation
Every API request that processes a file creates one isolated temporary directory before any engine is invoked:
$tmp = sys_get_temp_dir() . '/pdftool_' . bin2hex(random_bytes(12));
The suffix is 24 hexadecimal characters derived from 12 bytes of
random_bytes() (CSPRNG). Directory permissions are set to 0700
(owner read/write/execute only), so no process running as a different
unprivileged user can list or access files inside the directory.
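As an illustration only, the same creation step can be sketched in Python (the production code is the PHP line above). The function name create_request_dir is hypothetical; secrets.token_hex(12) stands in for bin2hex(random_bytes(12)).

```python
import os
import secrets
import stat
import tempfile


def create_request_dir(prefix: str = "pdftool_") -> str:
    """Create an owner-only working directory with a 24-hex-character suffix."""
    # 12 CSPRNG bytes render as 24 hexadecimal characters.
    suffix = secrets.token_hex(12)
    path = os.path.join(tempfile.gettempdir(), prefix + suffix)
    # mkdir raises if the path already exists, so a colliding name is never reused.
    os.mkdir(path, mode=0o700)
    # The umask can only clear bits; chmod makes the 0700 mode explicit.
    os.chmod(path, 0o700)
    return path


# Usage: create a directory, read back its permission bits, then remove it.
request_dir = create_request_dir()
mode = stat.S_IMODE(os.stat(request_dir).st_mode)
os.rmdir(request_dir)
```

Using mkdir rather than a "create if missing" call means a name collision surfaces as an error instead of silently sharing a directory between two requests.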
1.2 Isolation
All uploaded files are written only inside this directory. Shell commands receive file paths
via escapeshellarg() — no user-controlled string ever reaches the shell
interpreter unescaped. Every external command is wrapped with a
timeout 120 prefix; any process that exceeds 120 seconds is killed
unconditionally by the OS.
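The quoting and timeout wrapping described above can be sketched as follows. This is a Python analogue, not the PHP implementation: shlex.quote plays the role of escapeshellarg(), and build_engine_command is a hypothetical helper name.

```python
import shlex


def build_engine_command(engine: str, args: list[str], timeout_s: int = 120) -> str:
    """Return a shell command line with a timeout prefix and quoted arguments."""
    # Every argument is quoted individually, so shell metacharacters in a
    # file path are passed through as literal characters.
    quoted = " ".join(shlex.quote(a) for a in args)
    return f"timeout {timeout_s} {shlex.quote(engine)} {quoted}"


# Usage: a hostile file name stays a single argument after shell parsing.
hostile = "/tmp/pdftool_ab/in'; rm -rf / #.pdf"
command = build_engine_command("qpdf", ["--check", hostile])
parsed = shlex.split(command)
```

Round-tripping the command through shlex.split is a quick way to confirm that the shell would see exactly the arguments that were intended.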
1.3 Deletion
Output is delivered via send_file():
- readfile($output_path): begins streaming bytes to the browser.
- cleanup($cleanup_paths): called immediately after readfile() returns, performing rm -rf on the temp directory.
- exit: terminates the PHP process.
The temp directory is deleted while the HTTP response is still in flight.
There is no post-download cleanup job, no garbage-collection cron, and no retention window.
If a request fails before send_file() is reached (e.g. a processing error),
the error handler also calls cleanup() before returning the error JSON.
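A minimal sketch of the stream-then-delete pattern, written in Python for illustration. The names mirror the PHP functions described above, but the body is an assumption about how such a routine could be structured, not a copy of the production code.

```python
import io
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO


def send_file(output_path: Path, cleanup_dir: Path, out: BinaryIO) -> int:
    """Copy the result to the response stream, then remove the temp directory."""
    sent = 0
    try:
        with output_path.open("rb") as src:
            while chunk := src.read(64 * 1024):
                out.write(chunk)
                sent += len(chunk)
    finally:
        # Runs after a complete copy and also when the copy raises
        # (for example on a broken pipe), so the directory never outlives the call.
        shutil.rmtree(cleanup_dir, ignore_errors=True)
    return sent


# Usage: stream a small file into an in-memory buffer.
work = Path(tempfile.mkdtemp(prefix="pdftool_"))
(work / "out.pdf").write_bytes(b"%PDF-1.7 test")
response = io.BytesIO()
sent = send_file(work / "out.pdf", work, response)
```

Placing the deletion in a finally block covers both the success path and the error path with one statement, which matches the two cleanup call sites described above.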
1.4 Failure modes
| Scenario | Outcome |
|---|---|
| Processing engine error | cleanup() called in error handler; JSON error returned; temp dir deleted. |
| Shell command timeout (120 s) | Command killed by OS; PHP receives non-zero exit code; error handler triggers cleanup. |
| Client disconnects mid-download | PHP detects broken pipe; readfile() returns; cleanup() still executes. |
| PHP fatal error (OOM, etc.) | No register_shutdown_function() handler is currently registered for cleanup; in this edge case the temp dir persists until the OS reclaims it under the system's /tmp cleanup policy (typically at reboot or via systemd-tmpfiles). |
2. Transport Security
2.1 Protocol stack
All connections to pqpdf.com are served by pqcrypta-proxy (a Rust/Quinn reverse proxy) fronting the PHP application. The full protocol stack is:
| Protocol | Status | Standard |
|---|---|---|
| HTTP/3 over QUIC | ✅ Supported (negotiated by default) | RFC 9114 / RFC 9000 QUIC v1 |
| HTTP/2 | ✅ Supported (fallback) | RFC 9113 (obsoletes RFC 7540) |
| HTTP/1.1 | ✅ Supported (legacy fallback) | RFC 9112 (obsoletes RFC 7230) |
| WebTransport | ✅ Supported on port 443, path /webtransport | draft-ietf-webtrans-http3 |
| TLS 1.3 | ✅ Only version accepted | RFC 8446 |
| TLS 1.2 and below | ❌ Disabled | — |
2.2 TLS configuration
Only TLS 1.3 is accepted. The negotiated cipher suite is
TLS_AES_256_GCM_SHA384. HTTP/3 availability is advertised through the
Alt-Svc response header, and h3 is then selected via ALPN during the QUIC handshake:
Alt-Svc: h3=":443"; ma=86400, h3=":4434"; ma=86400
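One way to check the TLS 1.3-only claim from the client side is a context that refuses older versions. The sketch below uses Python's standard ssl module; the function names are illustrative. It connects over TCP, so it exercises the HTTP/2 and HTTP/1.1 listener rather than QUIC.

```python
import socket
import ssl


def tls13_only_context() -> ssl.SSLContext:
    """Client context that refuses to negotiate anything below TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def probe(host: str, port: int = 443, timeout: float = 5.0) -> tuple[str, str]:
    """Return (protocol version, cipher suite name) negotiated with host."""
    ctx = tls13_only_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # cipher() returns (name, protocol, secret_bits); keep the name only.
            return tls.version(), tls.cipher()[0]
```

Calling probe("pqpdf.com") requires network access; if the server is configured as described in this section, the handshake should complete at TLS 1.3 and report the negotiated suite.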
2.3 Post-Quantum hybrid key exchange
Where the client supports it, the proxy negotiates X25519MLKEM768, a hybrid
key exchange combining classical X25519 ECDH with ML-KEM-768 (derived from
CRYSTALS-Kyber and standardised as NIST FIPS 203). This provides forward
secrecy that is resistant to both classical and quantum adversaries. The
x-quantum-resistant response header lists the PQC algorithms in use, and
x-security-level summarises the posture:
x-quantum-resistant: ML-KEM-1024, ML-DSA-87, X25519MLKEM768
x-security-level: Post-Quantum Ready
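To make the cost of the hybrid concrete, the handshake sizes can be worked out from the component sizes. The sketch below assumes the hybrid group simply concatenates the ML-KEM-768 and X25519 values, as the IETF hybrid key-exchange design for TLS 1.3 specifies; the constants are the published parameter sizes, and hybrid_sizes is a hypothetical helper.

```python
# Sizes in bytes. ML-KEM-768 values come from FIPS 203; X25519 values from RFC 7748.
MLKEM768_ENCAPS_KEY = 1184
MLKEM768_CIPHERTEXT = 1088
MLKEM768_SHARED_SECRET = 32
X25519_PUBLIC_KEY = 32
X25519_SHARED_SECRET = 32


def hybrid_sizes() -> dict[str, int]:
    """Byte lengths of the concatenated X25519MLKEM768 handshake values."""
    return {
        # Client sends an ML-KEM encapsulation key plus an X25519 public key.
        "client_key_share": MLKEM768_ENCAPS_KEY + X25519_PUBLIC_KEY,
        # Server answers with an ML-KEM ciphertext plus its X25519 public key.
        "server_key_share": MLKEM768_CIPHERTEXT + X25519_PUBLIC_KEY,
        # Both shared secrets feed the TLS 1.3 key schedule together.
        "shared_secret": MLKEM768_SHARED_SECRET + X25519_SHARED_SECRET,
    }


sizes = hybrid_sizes()
```

The client share grows from 32 bytes for plain X25519 to 1216 bytes, which is why the hybrid is negotiated only where the client offers it.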
2.4 QUIC security properties
The QUIC implementation (pqcrypta-proxy, Rust/Quinn) enforces:
- Address Validation — tokens verified on every new connection.
- Retry packets — enabled to prevent amplification on initial handshake.
- Anti-amplification limit — 3× limit on data sent before path validation.
- Stateless Reset — supported for clean connection teardown without state.
- 0-RTT disabled — replay-attack protection; all connections use 1-RTT.
- Connection migration — supported for mobile network transitions.
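The anti-amplification rule in the list above reduces to simple bookkeeping: before the client address is validated, the server may send at most three times the bytes it has received (RFC 9000, Section 8). A minimal sketch follows; the class name is hypothetical and this is not the Quinn implementation.

```python
from dataclasses import dataclass

AMPLIFICATION_FACTOR = 3  # RFC 9000, Section 8: at most 3x the bytes received


@dataclass
class UnvalidatedPath:
    bytes_received: int = 0
    bytes_sent: int = 0

    def on_receive(self, n: int) -> None:
        self.bytes_received += n

    def budget(self) -> int:
        """Bytes the server may still send before the address is validated."""
        return AMPLIFICATION_FACTOR * self.bytes_received - self.bytes_sent

    def try_send(self, n: int) -> bool:
        """Record the send and return True only if it fits within the budget."""
        if n > self.budget():
            return False
        self.bytes_sent += n
        return True


# Usage: a 1200-byte Initial packet allows up to 3600 bytes in reply.
path = UnvalidatedPath()
path.on_receive(1200)
initial_budget = path.budget()
```

Because the budget depends only on bytes actually received, a spoofed source address cannot be used to reflect more than three times the attacker's own traffic.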
2.5 HSTS
Every response includes:
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
This enforces HTTPS for 2 years across pqpdf.com and all subdomains. The domain is submitted to the browser HSTS preload list; browsers that ship the preloaded entry never make a plain-HTTP request, even on a first visit.
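The 2-year figure follows directly from the max-age value. A short parsing sketch, with parse_hsts as a hypothetical helper, makes the arithmetic explicit:

```python
def parse_hsts(value: str) -> dict[str, object]:
    """Parse a Strict-Transport-Security header value into its directives."""
    result: dict[str, object] = {
        "max_age": None,
        "include_subdomains": False,
        "preload": False,
    }
    for part in value.split(";"):
        token = part.strip().lower()  # directive names are case-insensitive
        if token.startswith("max-age="):
            result["max_age"] = int(token.split("=", 1)[1].strip('"'))
        elif token == "includesubdomains":
            result["include_subdomains"] = True
        elif token == "preload":
            result["preload"] = True
    return result


HEADER = "max-age=63072000; includeSubDomains; preload"
policy = parse_hsts(HEADER)
days = policy["max_age"] / 86400  # 63,072,000 s / 86,400 s per day = 730 days
```

730 days is two 365-day years.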
2.6 Certificate
The TLS certificate is issued by a public CA and renewed automatically. Certificate transparency logs are public. Application-layer certificate pinning is not enforced; browsers validate via the standard CA chain.
2.7 Response security headers
Every response from pqpdf.com carries the full security header stack listed in the table that follows. The standard HTTP response headers achieve an A+ rating on securityheaders.com, which tests HTTP response headers only and does not scan QUIC or WebTransport. The full HTTP/3, QUIC v1, WebTransport, and PQC hybrid KEM stack achieves an A++ "HTTP/3 Ultimate" rating on the PQCrypta HTTP/3 & WebTransport Scanner, the only free, URL-based tool that probes remote servers for WebTransport support.
| Header | Value (summarised) | Purpose |
|---|---|---|
| Content-Security-Policy | Per-request nonces; default-src 'self'; no unsafe-inline or unsafe-eval | XSS mitigation; script-injection prevention |
| Strict-Transport-Security | max-age=63072000; includeSubDomains; preload | 2-year HTTPS enforcement; preload-list eligible |
| X-Frame-Options | DENY | Blocks clickjacking via iframe embedding |
| X-Content-Type-Options | nosniff | Prevents MIME-type sniffing attacks |
| Referrer-Policy | strict-origin-when-cross-origin | Limits referrer leakage on cross-origin navigation |
| Permissions-Policy | camera=(), microphone=(), geolocation=(), interest-cohort=(), fullscreen=(self), payment=() | Disables browser APIs not required; opts out of FLoC/Topics |
| Cross-Origin-Resource-Policy | cross-origin | Allows cross-origin fetch of resources (e.g. images in other pages) |
| Cross-Origin-Opener-Policy | unsafe-none | Window opener access; set to unsafe-none for tool popup compatibility |
| Cross-Origin-Embedder-Policy | unsafe-none | Allows cross-origin resources without CORS/CORP requirement |
| X-Permitted-Cross-Domain-Policies | none | Blocks Flash/PDF cross-domain policy files |
| X-Download-Options | noopen | Prevents IE from executing downloaded files in the browser context |
| X-DNS-Prefetch-Control | off | Disables speculative DNS prefetching to reduce information leakage |
| x-quantum-resistant | ML-KEM-1024, ML-DSA-87, X25519MLKEM768 | Signals active post-quantum algorithms to clients and scanners |
| x-security-level | Post-Quantum Ready | Operational security posture indicator |
| NEL | max_age=86400; include_subdomains=true | Network Error Logging: browser reports network failures to our collector |
| Report-To | Endpoint: api.pqcrypta.com/reports | Reporting API endpoint for NEL and CSP violation reports |
| x-ratelimit-limit / x-ratelimit-remaining | Limit: 50 requests per window | Transparent rate-limit status communicated to clients |
3. Content Security Policy Detail
Every tool page and legal page generates two random nonces per request:
- ext_script_nonce: applied to <script src="..."> tags loading external JS files from /js/.
- inline_script_nonce: applied to any <script> block containing inline initialisation code.
Both nonces are 16 bytes of random_bytes() encoded as base64 — 128 bits
of entropy, unpredictable per request. The CSP is set to:
default-src 'self';
script-src 'self' 'nonce-{ext}' 'nonce-{inline}';
style-src 'self';
img-src 'self' data: blob: https: http:;
font-src 'self' data:;
connect-src 'self' https://api.pqpdf.com;
worker-src 'self' blob:;
child-src 'none';
object-src 'none';
frame-src 'none';
upgrade-insecure-requests;
There is no unsafe-inline, no unsafe-eval, and no wildcard
source. worker-src 'self' blob: is permitted to allow PDF.js and
Web Workers used by client-side tool rendering; blob: worker scripts are
generated entirely from trusted local modules. connect-src permits
calls to api.pqpdf.com for tool operations. All JavaScript event
handlers are registered via addEventListener() in external ES modules;
no onclick, onload, or other HTML event attributes
exist in any page.
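The nonce generation and header assembly described in this section can be sketched as follows. This is a Python illustration of the technique, not the PHP page code; new_nonce and build_csp are hypothetical names, and the directive list is copied from the policy shown above.

```python
import base64
import secrets


def new_nonce() -> str:
    """16 CSPRNG bytes (128 bits), base64-encoded for use as a CSP nonce."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def build_csp(ext_nonce: str, inline_nonce: str) -> str:
    """Assemble the policy with two per-request nonces."""
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{ext_nonce}' 'nonce-{inline_nonce}'",
        "style-src 'self'",
        "img-src 'self' data: blob: https: http:",
        "font-src 'self' data:",
        "connect-src 'self' https://api.pqpdf.com",
        "worker-src 'self' blob:",
        "child-src 'none'",
        "object-src 'none'",
        "frame-src 'none'",
        "upgrade-insecure-requests",
    ]
    return "; ".join(directives) + ";"


# Usage: generate fresh nonces for one request and build its header value.
ext_script_nonce = new_nonce()
inline_script_nonce = new_nonce()
csp_header = build_csp(ext_script_nonce, inline_script_nonce)
```

The same nonce strings must also be written into the nonce attribute of the matching script tags; a script whose nonce is missing or stale is blocked by the browser.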
4. ML Intelligence Engine — Data Policy
4.1 What is stored
When the PDF Forensics Scanner processes a file, the ML Intelligence Engine extracts a 38-dimensional feature vector from the aggregated outputs of the 44 detection engines and stores it in PostgreSQL. The 38 features are entirely structural and statistical measurements:
- Counts of suspicious byte patterns, object types, and stream anomalies.
- Shannon entropy values for non-image streams.
- Boolean flags for JavaScript presence, encryption status, AcroForm, XFA, polyglot signatures, etc.
- ClamAV match flag, YARA rule match count, ExifTool anomaly count.
- Differential parser discrepancy counts across 8 dimensions.
- Dynamic sandbox syscall anomaly score and strace indicator count.
- Composite risk score and ML probability from the current model run.
What is not stored: file content, file name, file hash, IP address, user agent, session identifier, or any personally identifiable information. The feature vector cannot be used to reconstruct or identify the original file.
4.2 How it is used
Stored feature vectors are used exclusively to train and retrain the ML models:
- IsolationForest (unsupervised anomaly detection) — active from scan 1; detects statistical outliers without requiring labeled data.
- RandomForest classifier (supervised) — activates once 50 or more feature vectors carry a user-submitted feedback label.
A cron job runs ml/train.py every 30 minutes to refit both models on the
accumulated dataset. Trained model artefacts are written to ml/models/ and
loaded into the next scan request.
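The activation rule for the two models can be expressed as a small decision function. This sketch covers only the gating logic from this section; it does not reproduce ml/train.py, and models_to_fit is a hypothetical name.

```python
from typing import Optional, Sequence

MIN_LABELED_FOR_CLASSIFIER = 50  # threshold stated in Section 4.2


def models_to_fit(labels: Sequence[Optional[str]]) -> list[str]:
    """Decide which models a training run should refit.

    `labels` holds one entry per stored feature vector: "malicious",
    "benign", or None when no feedback was submitted.
    """
    if not labels:
        return []  # nothing stored yet, nothing to train
    plan = ["IsolationForest"]  # unsupervised: needs no labels
    labeled = sum(1 for label in labels if label is not None)
    if labeled >= MIN_LABELED_FOR_CLASSIFIER:
        plan.append("RandomForest")  # supervised: needs labeled rows
    return plan


# Usage: 49 labeled rows are not yet enough to activate the classifier.
plan_before = models_to_fit(["benign"] * 49 + [None] * 10)
plan_after = models_to_fit(["benign"] * 25 + ["malicious"] * 25)
```

Counting only rows that carry a label means unlabeled scans keep improving the anomaly detector without affecting when the classifier switches on.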
4.3 Feedback labels
The scan report UI presents a feedback panel where you can voluntarily label a result as malicious or benign. Submitting feedback writes the label to the label column of the corresponding feature-vector row in the same PostgreSQL table; no separate record is created. Feedback is entirely optional and anonymous; it cannot be linked to you or the file you scanned.
4.4 Retention
Feature vectors and feedback labels are retained indefinitely to support continuous model improvement. No time-based purge is currently implemented. Because records contain no PII and cannot identify individuals or files, they are not subject to erasure obligations under GDPR Article 17. If you have a specific concern, contact us via the contact form.
5. Rate Limiting, WAF, and Abuse Prevention
5.1 Proxy-layer defences (pqcrypta-proxy)
All traffic to pqpdf.com passes through pqcrypta-proxy, a custom Rust reverse proxy with the following security layers operating before any request reaches the PHP application:
- Rate limiting: per-IP sliding-window rate limiter. The x-ratelimit-limit and x-ratelimit-remaining response headers communicate the current window to clients. Requests exceeding the limit receive HTTP 429.
- WAF security blocklist: blocks known malicious IP ranges, Tor exit nodes, and addresses with recent abuse records. Blocked requests receive HTTP 403 with header x-waf-block: 1; they are counted separately from legitimate request failures in metrics.
- WAF bot/scanner blocklist: JA3 and JA4 TLS fingerprint matching to identify and block automated scanners, credential stuffers, and vulnerability probes. Operates at the TLS handshake layer before HTTP is parsed.
- Geo-blocking: configurable country-level access controls.
- Circuit breaker: if the upstream PHP application becomes unresponsive or error rates spike, the circuit breaker opens and returns a service-unavailable response rather than queueing connections, protecting the backend from cascade failure.
- QUIC anti-amplification: 3× limit on data sent before path validation to prevent UDP amplification attacks (see Section 2.4).
5.2 Application-layer rate limiting
In addition to proxy-layer controls, the PHP application enforces a session-based
sliding-window counter of 10 operations per 5-minute window per
session. When exceeded, the API returns HTTP 429 with a plain-text
message; no silent failure occurs.
The following polling and keepalive operations are explicitly exempt from
application-layer rate limiting to avoid blocking live progress UIs:
edit-page, edit-ping, edit-qr-generate,
pdf-scan-poll. Exempt operations call session_write_close()
immediately to release the session lock so concurrent poll requests do not queue.
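A sliding-window counter with an exemption list can be sketched as follows, assuming one limiter instance per session. The class is a Python illustration with hypothetical names; the limit, window, and exempt operation names are the ones given above.

```python
import time
from collections import deque
from typing import Optional

EXEMPT_OPS = {"edit-page", "edit-ping", "edit-qr-generate", "pdf-scan-poll"}


class SlidingWindowLimiter:
    """Allow at most `limit` operations in any `window_s`-second span."""

    def __init__(self, limit: int = 10, window_s: float = 300.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: deque[float] = deque()

    def allow(self, op: str, now: Optional[float] = None) -> bool:
        if op in EXEMPT_OPS:
            return True  # polling and keepalive calls are never counted
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window_s:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False  # caller responds with HTTP 429
        self._hits.append(now)
        return True


# Usage: ten operations one second apart fill the window.
limiter = SlidingWindowLimiter()
first_ten = [limiter.allow("merge", now=float(i)) for i in range(10)]
```

Unlike a fixed-window counter, the sliding window never resets all at once, so a burst straddling a window boundary cannot exceed the limit.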
6. File Validation
Before any processing engine is invoked, uploaded files pass through three checks in
api.php:
- Size limits: MAX_FILE_SIZE = 52,428,800 bytes (50 MB) per file; MAX_TOTAL_SIZE = 209,715,200 bytes (200 MB) across all files in a single request. Exceeded limits return HTTP 413.
- Magic-byte check: for PDF operations, the first 4 bytes of the uploaded file are read with fread($fh, 4) and compared against '%PDF'. Files that fail this check are rejected before any engine runs. Office and image uploads use extension and MIME checks instead.
- MIME type: PDF uploads check ['application/pdf', 'application/x-pdf']. Office uploads check against a list of allowed extensions (docx, doc, ppt, pptx, xls, xlsx, odt, odp, ods).
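The three checks can be sketched in a few lines. This is a Python illustration, not the api.php code: the function names are hypothetical, and treating a file of exactly the maximum size as acceptable is an assumption.

```python
MAX_FILE_SIZE = 50 * 1024 * 1024    # 52,428,800 bytes
MAX_TOTAL_SIZE = 200 * 1024 * 1024  # 209,715,200 bytes
PDF_MAGIC = b"%PDF"
PDF_MIME_TYPES = {"application/pdf", "application/x-pdf"}


def check_sizes(sizes: list[int]) -> int:
    """Return the HTTP status the size checks would produce (200 or 413)."""
    if any(size > MAX_FILE_SIZE for size in sizes):
        return 413  # a single file is too large
    if sum(sizes) > MAX_TOTAL_SIZE:
        return 413  # the request as a whole is too large
    return 200


def looks_like_pdf(head: bytes, mime_type: str) -> bool:
    """Check the first four bytes and the declared MIME type of an upload."""
    return head[:4] == PDF_MAGIC and mime_type in PDF_MIME_TYPES


# Usage: five files at the per-file maximum exceed the 200 MB request total.
status = check_sizes([MAX_FILE_SIZE] * 5)
```

The magic-byte comparison catches files that carry a PDF extension or MIME type but a different container format, such as a ZIP archive starting with "PK".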
7. Post-Quantum Encryption (Protect PDF)
The Protect PDF tool offers two encryption modes:
- Standard (AES-256-CBC): password-based encryption applied server-side by Ghostscript / qpdf. The password is transmitted over TLS and used only during the request; it is never stored.
- PQC mode: key generation and wrapping happen entirely in the browser using the @noble/post-quantum library before the file is uploaded. 31 post-quantum algorithms are supported, including NIST-standardised ML-KEM-1024, HQC-128/192/256, FN-DSA variants, and hybrid modes. The server receives only an already-encrypted bundle (.pqcpdf); the plaintext file never travels over the network in PQC mode.
PQC key material is generated client-side and never sent to the server. The server has no access to PQC private keys.
8. Third-Party Dependencies
All server-side processing engines are open-source software running locally on the pqpdf.com server. No file data is forwarded to any external API or third-party cloud service during processing. The full engine list:
| Engine | Purpose | External network calls |
|---|---|---|
| Ghostscript | Compress, watermark, rotate, protect, flatten, grayscale, repair | None |
| Poppler (pdfunite, pdftoppm, pdftotext, pdfinfo) | Merge, split, extract text, to-images, info | None |
| qpdf | Protect / unlock, structural analysis | None |
| LibreOffice | Office ↔ PDF conversion | None |
| ImageMagick | Images → PDF | None |
| Tesseract 5 | OCR | None |
| PyMuPDF / Python | Scan engines ①–⑩, edit/fill apply | None |
| ExifTool 12 | EXIF/XMP metadata analysis (Engine ⑪) | None |
| YARA 4.5 | Rule-based pattern matching (Engine ⑬) | None |
| ClamAV 1.4+ | Signature scanning 700k+ sigs (Engine ⑰) | Signature updates only (clamav.net) |
| PeePDF 0.4 | Deep object analysis (Engine ⑭) | None |
| strace + unshare | Dynamic behavioral sandbox (Engine ⑮) | None (network namespace isolated) |
| scikit-learn | ML anomaly detection + classifier (Engine ⑱) | None |
| acorn (Node.js) | JavaScript AST deobfuscation (Engine ⑴) | None |
ClamAV signature database updates are the only outbound network calls made by processing
engines. These pull signature definition files from clamav.net; no file
content is transmitted.
9. Reporting a Security Vulnerability
If you discover a security vulnerability in PQ PDF, please report it responsibly:
- Email contact@pqcrypta.com with a description of the issue.
- Include steps to reproduce, affected endpoint or page, and your assessment of impact.
- We aim to acknowledge reports within 48 hours.
- We will work to remediate confirmed issues promptly and credit researchers who follow responsible disclosure.
- Please allow reasonable time for a fix before public disclosure.
10. Enterprise On-Premise Security
10.1 Architecture
Enterprise on-premise deployments run the identical stack described in this document — the same PHP application, the same Python processing scripts, the same sandboxing model, and the same Apache security headers — within the organisation's own infrastructure. No component phones home to pqpdf.com or any PQ PDF server. All processing, logging, and ML training occurs entirely on the organisation's own machines.
10.2 Security Responsibility Model
For on-premise deployments the security responsibilities are divided as follows:
| Responsibility | PQ PDF (software supplier) | Licensed organisation |
|---|---|---|
| Application security (code, sandboxing, CSP, input validation) | ✓ Provided in software | Keep deployment updated |
| Operating system hardening and patching | — | ✓ Organisation |
| Network perimeter (firewall, WAF, DDoS protection) | — | ✓ Organisation |
| TLS certificate management | Scripts provided (certbot / self-signed) | ✓ Organisation owns and renews |
| PostgreSQL access control and encryption at rest | Schema and grants provided | ✓ Organisation |
| Redis security (auth, bind address, persistence) | Default bind 127.0.0.1 in deploy script | ✓ Organisation |
| Log retention and access control | Logrotate config provided (90 days) | ✓ Organisation sets policy |
| Security updates to PQ PDF application | ✓ Published with release notes | ✓ Organisation applies updates |
| Vulnerability disclosure response | ✓ 48 h acknowledgement target | Notify PQ PDF of deployment-specific findings |
10.3 Supplied Security Configuration
The enterprise distribution includes the following security-related configuration out of the box:
- deploy.sh: installs and configures AppArmor profiles for the sandbox binary, sets tmp/ permissions (0750), configures PHP-FPM with open_basedir restrictions, and disables unnecessary PHP functions.
- install.sh: generates an Apache vhost with TLS 1.3, HSTS preloading, OCSP stapling, and the full security headers stack (CSP, X-Frame-Options, Referrer-Policy, Permissions-Policy).
- setup-proxy.sh: configures pqcrypta-proxy with rate limiting, JA3/JA4 scanner fingerprint blocking, geo-blocking, and circuit-breaker protection.
- pqpdf-sandbox: wraps processing commands with Linux namespaces (unshare), dropping network access, restricting filesystem mounts, and enforcing resource limits.
Organisations should review and adapt these configurations to their own security baseline before going live, particularly firewall rules, rate-limit thresholds, and log retention periods.
10.4 Compliance Frameworks
PQ PDF's on-premise architecture is designed to be compatible with common compliance frameworks, including:
- GDPR / UK GDPR — zero file retention, no third-party data transfer, local ML training, configurable log retention.
- HIPAA — no PHI leaves the organisation's infrastructure; all processing is local; audit logging via Apache access logs; no BAA required with PQ PDF as software supplier (not a business associate).
- ISO 27001 / SOC 2 — security controls documentation, input validation, least-privilege process model, dependency inventory in PREREQUISITES.md.
- Cyber Essentials — TLS encryption in transit, strict CSP, no unnecessary open ports, regular dependency updates.
PQ PDF does not hold any certifications on behalf of customer deployments. Organisations must conduct their own assessment and certification against the relevant framework.
10.5 Security Reviews and Compliance Enquiries
For penetration testing authorisation, compliance questionnaires, security architecture reviews, or to request deployment-specific documentation, use the contact form or email contact@pqcrypta.com with the subject line Enterprise / Security Review.