Privacy & Security

Architecture Summary

✅ Zero retention: each request's temp directory is deleted while the download is still in flight.
✅ No third-party cloud: all processing runs on pqpdf.com servers only.
✅ TLS 1.3 only: cipher TLS_AES_256_GCM_SHA384, 2-year HSTS with preload, TLS 1.2 and below disabled.
✅ HTTP/3 + QUIC v1 + WebTransport: rated A++ ("HTTP/3 Ultimate") by the PQCrypta HTTP/3 & WebTransport Scanner, the only free, URL-based tool that probes a remote server specifically for WebTransport support. QUIC address validation, Retry, and anti-amplification are enabled. Standard response headers are independently rated A+ by securityheaders.com.
✅ Post-quantum TLS key exchange: X25519MLKEM768 hybrid KEM, negotiated when the client supports it.
✅ Proxy WAF: pqcrypta-proxy enforces rate limiting, JA3/JA4 fingerprint blocking, circuit-breaker protection, and geo-blocking in front of the application.
✅ Strict CSP with per-request nonces: no 'unsafe-inline' and no 'unsafe-eval'; the only inline scripts that run are those carrying the request's nonce.
✅ ML data contains no PII: feature vectors only, no file content.

1. Temp-Directory Lifecycle

1.1 Creation

Every API request that processes a file creates one isolated temporary directory before any engine is invoked:

$tmp = sys_get_temp_dir() . '/pdftool_' . bin2hex(random_bytes(12));

The suffix is 24 hexadecimal characters derived from 12 bytes of random_bytes() (a CSPRNG), so the path cannot be guessed. Directory permissions are set to 0700 (owner read/write/execute only), so no other non-root user account on the server can list or access files inside the directory.
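For comparison, the same creation step can be sketched in Python with the standard library. This is an illustrative equivalent, not the PHP code pqpdf runs. The helper name make_request_dir is hypothetical; the pdftool_ prefix and the 12-byte random suffix follow the description above.

```python
import os
import secrets
import tempfile


def make_request_dir() -> str:
    """Create an isolated per-request directory (hypothetical helper).

    Mirrors the PHP one-liner: system temp dir + 'pdftool_' + 24 hex chars
    derived from 12 CSPRNG bytes.
    """
    path = os.path.join(tempfile.gettempdir(), "pdftool_" + secrets.token_hex(12))
    # os.mkdir raises FileExistsError instead of reusing an existing path,
    # so a (vanishingly unlikely) name collision never shares a directory.
    os.mkdir(path, 0o700)
    # The mkdir mode is filtered by the process umask; chmod makes 0700 explicit.
    os.chmod(path, 0o700)
    return path
```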

1.2 Isolation

All uploaded files are written only inside this directory. Shell commands receive file paths via escapeshellarg(), so no user-controlled string ever reaches the shell interpreter unescaped. Every external command is prefixed with timeout 120: a command still running after 120 seconds is terminated by timeout(1) (with SIGTERM unless a different signal is configured), and timeout then exits with status 124.
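Here is a Python sketch of the same two safeguards, assuming the shell-string approach described above. shlex.quote() plays the role of PHP's escapeshellarg(), and the coreutils timeout prefix bounds run time. build_command and run are hypothetical helper names.

```python
import shlex
import subprocess

TIMEOUT_SECONDS = 120  # matches the "timeout 120" prefix described above


def build_command(program: str, *args: str) -> str:
    """Build a shell command line with every argument quoted.

    shlex.quote() wraps each argument in single quotes when needed, so
    shell metacharacters such as ';' or '$(...)' are passed literally.
    """
    quoted = " ".join(shlex.quote(a) for a in (program, *args))
    return f"timeout {TIMEOUT_SECONDS} {quoted}"


def run(program: str, *args: str) -> int:
    """Run the command through /bin/sh and return its exit status.

    coreutils timeout exits with status 124 when the time limit is hit.
    """
    result = subprocess.run(build_command(program, *args), shell=True)
    return result.returncode
```

In new Python code, passing an argument list to subprocess.run([...], timeout=120) avoids the shell entirely, and Python kills the child itself when the timeout expires. The string form is shown only to mirror the escapeshellarg() pattern.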

1.3 Deletion

Output is delivered via send_file():

readfile($output_path): copies the file's bytes to PHP's output stream, returning once every byte has been handed off for delivery.
cleanup($cleanup_paths): called immediately after readfile() returns; performs rm -rf on the temp directory.
exit: ends the PHP script.

The temp directory is deleted while the HTTP response is still in flight. There is no post-download cleanup job, no garbage-collection cron, and no retention window. If a request fails before send_file() is reached (e.g. a processing error), the error handler also calls cleanup() before returning the error JSON.

1.4 Failure modes

Scenario	Outcome
Processing engine error	`cleanup()` called in error handler; JSON error returned; temp dir deleted.
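Shell command timeout (120 s)	timeout(1) terminates the command and exits with status 124; PHP sees the non-zero exit code; the error handler triggers cleanup.
Client disconnects mid-download	PHP detects the broken pipe; `readfile()` returns and `cleanup()` still executes. This holds only if the script is not aborted on disconnect (`ignore_user_abort` enabled); under PHP's default setting the script stops when the disconnect is detected and, with no shutdown function registered, `cleanup()` is skipped.
PHP fatal error (OOM, etc.)	No cleanup function is currently registered with `register_shutdown_function()`, so `cleanup()` does not run; the OS eventually reclaims the temp dir according to the system's `/tmp` cleanup policy (typically at reboot or via `systemd-tmpfiles`).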
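The ordering inside send_file() can be sketched in Python as below. Python has no readfile(), so the sketch copies the file in chunks. What it illustrates is that a finally block runs cleanup on normal completion, on a client disconnect (BrokenPipeError), and on an error raised while streaming. Like the PHP flow, it cannot help when the process is killed outright (for example by the kernel OOM killer). Function and parameter names are hypothetical.

```python
import shutil
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # bytes copied per write


def send_file(output_path: str, tmp_dir: str, out: BinaryIO) -> None:
    """Stream output_path to `out`, then always delete tmp_dir.

    The file handle is closed by the with-block before the finally clause
    removes the directory tree (the rm -rf equivalent).
    """
    try:
        with open(output_path, "rb") as fh:
            while chunk := fh.read(CHUNK_SIZE):
                out.write(chunk)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
```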

2. Transport Security

2.1 Protocol stack

All connections to pqpdf.com are served by pqcrypta-proxy (a Rust/Quinn reverse proxy) fronting the PHP application. The full protocol stack is:

Protocol	Status	Standard
HTTP/3 over QUIC	✅ Supported; preferred, advertised via Alt-Svc	RFC 9114 / RFC 9000 QUIC v1
HTTP/2	✅ Supported; fallback	RFC 9113 (obsoletes RFC 7540)
HTTP/1.1	✅ Supported; legacy fallback	RFC 9112 (obsoletes RFC 7230)
WebTransport	✅ Supported on port 443, path `/webtransport`	draft-ietf-webtrans-http3
TLS 1.3	✅ Only version accepted	RFC 8446
TLS 1.2 and below	❌ Disabled	—

2.2 TLS configuration

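Only TLS 1.3 is accepted, and the negotiated cipher suite is TLS_AES_256_GCM_SHA384. HTTP/3 availability is advertised to clients with the Alt-Svc response header; a client that switches to QUIC then negotiates the h3 ALPN token during the handshake: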

Alt-Svc: h3=":443"; ma=86400, h3=":4434"; ma=86400
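A client learns about these HTTP/3 endpoints by parsing the Alt-Svc value (RFC 7838). The Python sketch below is a simplified parser for the h3=":port"; ma=N form shown above. It ignores the special value clear and commas inside quoted strings, and parse_alt_svc is a hypothetical name, not the proxy's code.

```python
from __future__ import annotations

DEFAULT_MAX_AGE = 86400  # RFC 7838: ma defaults to 24 hours when omitted


def parse_alt_svc(header: str) -> list[tuple[str, str, int]]:
    """Return (protocol-id, alt-authority, max-age seconds) per alternative."""
    entries = []
    for alternative in header.split(","):
        parts = [p.strip() for p in alternative.split(";")]
        proto, _, authority = parts[0].partition("=")
        max_age = DEFAULT_MAX_AGE
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "ma":
                max_age = int(value)
        entries.append((proto.strip(), authority.strip().strip('"'), max_age))
    return entries
```

A browser that receives this header may open a QUIC connection to the same host on UDP 443 (or 4434), negotiate ALPN h3, and cache that alternative for ma seconds.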

2.3 Post-Quantum hybrid key exchange

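Where the client supports it, the proxy negotiates X25519MLKEM768, a hybrid key exchange that combines classical X25519 ECDH with ML-KEM-768 (NIST FIPS 203, standardised from CRYSTALS-Kyber). The session keys stay secret as long as either component remains unbroken, so recorded traffic is protected against both classical and future quantum adversaries; server authentication still relies on the certificate's classical signature. The x-quantum-resistant response header lists the post-quantum algorithms in use across the service: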

x-quantum-resistant: ML-KEM-1024, ML-DSA-87, X25519MLKEM768
x-security-level: Post-Quantum Ready
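The wire layout of X25519MLKEM768 is defined by the IETF TLS working group's hybrid ECDHE-MLKEM draft (draft-ietf-tls-ecdhe-mlkem). The Python sketch below shows only that layout. It assumes the raw byte strings come from real ML-KEM and X25519 implementations, which are not included, and it is not the proxy's code. The ML-KEM component comes first in both key shares and in the combined secret.

```python
from __future__ import annotations

# Component sizes in bytes: FIPS 203 (ML-KEM-768) and RFC 7748 (X25519).
MLKEM768_EK = 1184  # encapsulation (public) key
MLKEM768_CT = 1088  # ciphertext
MLKEM768_SS = 32    # shared secret
X25519_LEN = 32     # X25519 public key and shared secret

X25519MLKEM768 = 0x11EC  # TLS NamedGroup codepoint (4588)


def split_client_share(share: bytes) -> tuple[bytes, bytes]:
    """Client key_share = ML-KEM-768 encapsulation key || X25519 public key."""
    if len(share) != MLKEM768_EK + X25519_LEN:
        raise ValueError("bad client key_share length")
    return share[:MLKEM768_EK], share[MLKEM768_EK:]


def split_server_share(share: bytes) -> tuple[bytes, bytes]:
    """Server key_share = ML-KEM-768 ciphertext || X25519 public key."""
    if len(share) != MLKEM768_CT + X25519_LEN:
        raise ValueError("bad server key_share length")
    return share[:MLKEM768_CT], share[MLKEM768_CT:]


def combine_secrets(mlkem_ss: bytes, x25519_ss: bytes) -> bytes:
    """64-byte hybrid secret fed into the TLS 1.3 key schedule (ML-KEM first)."""
    if len(mlkem_ss) != MLKEM768_SS or len(x25519_ss) != X25519_LEN:
        raise ValueError("bad shared secret length")
    return mlkem_ss + x25519_ss
```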

2.4 QUIC security properties

The QUIC implementation (pqcrypta-proxy, Rust/Quinn) enforces:

Address Validation — tokens verified on every new connection.
Retry packets — enabled to prevent amplification on initial handshake.
Anti-amplification limit — 3× limit on data sent before path validation.
Stateless Reset — supported for clean connection teardown without state.
0-RTT disabled — replay-attack protection; all connections use 1-RTT.
Connection migration — supported for mobile network transitions.
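The 3× anti-amplification rule is simple bookkeeping, sketched below in Python. The class name is hypothetical and this is not Quinn's internal implementation. Because RFC 9000 requires clients to pad Initial datagrams to at least 1200 bytes, a single client Initial lets an unvalidated server send up to 3600 bytes.

```python
class AmplificationBudget:
    """Anti-amplification accounting for one not-yet-validated client address.

    RFC 9000 section 8.1: before the client's address is validated, a server
    may send at most three times the bytes it has received from that address.
    """

    FACTOR = 3

    def __init__(self) -> None:
        self.received = 0
        self.sent = 0
        self.validated = False

    def on_receive(self, nbytes: int) -> None:
        self.received += nbytes

    def can_send(self, nbytes: int) -> bool:
        if self.validated:
            return True  # the limit no longer applies once validated
        return self.sent + nbytes <= self.FACTOR * self.received

    def on_send(self, nbytes: int) -> None:
        if not self.can_send(nbytes):
            raise RuntimeError("anti-amplification limit reached; wait for client data")
        self.sent += nbytes
```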

2.5 HSTS

Every response includes:

Strict-Transport-Security: max-age=63072000; includeSubDomains; preload

This enforces HTTPS for 2 years (63,072,000 seconds) across pqpdf.com and all subdomains. The domain has been submitted to the browser HSTS preload list; once it is included in a browser's built-in list, first-time visitors using that browser never make a plain-HTTP request.
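The header-level preload criteria published at hstspreload.org (a max-age of at least one year, includeSubDomains, and preload) can be checked with a short Python sketch. It does not test the other submission requirements, such as a valid certificate and an HTTP-to-HTTPS redirect, and the function names are hypothetical.

```python
from __future__ import annotations

PRELOAD_MIN_MAX_AGE = 31_536_000  # one year in seconds


def parse_hsts(value: str) -> dict[str, str | None]:
    """Split an HSTS header into {lower-cased directive: value or None}."""
    directives = {}
    for part in value.split(";"):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def meets_preload_header_rules(value: str) -> bool:
    d = parse_hsts(value)
    try:
        max_age = int(d.get("max-age") or 0)
    except ValueError:
        return False
    return (max_age >= PRELOAD_MIN_MAX_AGE
            and "includesubdomains" in d
            and "preload" in d)
```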

2.6 Certificate

The TLS certificate is issued by a public CA and renewed automatically. Certificate transparency logs are public. Application-layer certificate pinning is not enforced; browsers validate via the standard CA chain.

2.7 Response security headers

Every response from pqpdf.com carries the full security header stack listed below. The standard HTTP response headers achieve an A+ rating on securityheaders.com, which tests HTTP response headers only and does not scan QUIC or WebTransport. The full HTTP/3, QUIC v1, WebTransport, and PQC hybrid KEM stack achieves an A++ "HTTP/3 Ultimate" rating on the PQCrypta HTTP/3 & WebTransport Scanner, the only free, URL-based tool that probes remote servers for WebTransport support:

Header	Value (summarised)	Purpose
`Content-Security-Policy`	Per-request nonces; `default-src 'self'`; no `unsafe-inline` or `unsafe-eval`	XSS mitigation; script-injection prevention
`Strict-Transport-Security`	`max-age=63072000; includeSubDomains; preload`	2-year HTTPS enforcement; preload-list eligible
`X-Frame-Options`	`DENY`	Blocks clickjacking via iframe embedding
`X-Content-Type-Options`	`nosniff`	Prevents MIME-type sniffing attacks
`Referrer-Policy`	`strict-origin-when-cross-origin`	Limits referrer leakage on cross-origin navigation
`Permissions-Policy`	`camera=(), microphone=(), geolocation=(), interest-cohort=(), fullscreen=(self), payment=()`	Disables browser APIs not required; opts out of FLoC/Topics
`Cross-Origin-Resource-Policy`	`cross-origin`	Allows cross-origin fetch of resources (e.g. images in other pages)
`Cross-Origin-Opener-Policy`	`unsafe-none`	Window opener access; set to unsafe-none for tool popup compatibility
`Cross-Origin-Embedder-Policy`	`unsafe-none`	Allows cross-origin resources without CORS/CORP requirement
`X-Permitted-Cross-Domain-Policies`	`none`	Blocks Flash/PDF cross-domain policy files
`X-Download-Options`	`noopen`	Prevents IE from executing downloaded files in the browser context
`X-DNS-Prefetch-Control`	`off`	Disables speculative DNS prefetching to reduce information leakage
`x-quantum-resistant`	`ML-KEM-1024, ML-DSA-87, X25519MLKEM768`	Signals active post-quantum algorithms to clients and scanners
`x-security-level`	`Post-Quantum Ready`	Operational security posture indicator
`NEL`	`max_age=86400; include_subdomains=true`	Network Error Logging — browser reports network failures to our collector
`Report-To`	Endpoint: `api.pqcrypta.com/reports`	Reporting API endpoint for NEL and CSP violation reports
`x-ratelimit-limit` / `x-ratelimit-remaining`	Limit: 50 requests per window	Transparent rate-limit status communicated to clients
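To verify a deployment against the fixed-value rows of this table, a minimal Python sketch can compare a fetched response's headers. It assumes the headers are already in a dict; the EXPECTED subset and the audit_headers name are illustrative. Content-Security-Policy is deliberately left out because its nonces change on every request.

```python
from __future__ import annotations

# Fixed-value rows from the table above (names lower-cased).
EXPECTED = {
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "x-frame-options": "DENY",
    "x-content-type-options": "nosniff",
    "referrer-policy": "strict-origin-when-cross-origin",
    "x-permitted-cross-domain-policies": "none",
    "cross-origin-opener-policy": "unsafe-none",
}


def audit_headers(headers: dict[str, str]) -> list[str]:
    """Return a list of problems; names and values compared case-insensitively."""
    got = {name.lower(): value.strip() for name, value in headers.items()}
    problems = []
    for name, want in EXPECTED.items():
        if name not in got:
            problems.append(f"missing {name}")
        elif got[name].lower() != want.lower():
            problems.append(f"{name}: expected {want!r}, got {got[name]!r}")
    return problems
```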

3. Content Security Policy Detail

Every tool page and legal page generates two random nonces per request:

ext_script_nonce — applied to <script src="..."> tags loading external JS files from /js/.
inline_script_nonce — applied to any <script> block containing inline initialisation code.

Both nonces are 16 bytes of random_bytes() encoded as base64 — 128 bits of entropy, unpredictable per request. The CSP is set to:

default-src 'self';
script-src 'self' 'nonce-{ext}' 'nonce-{inline}';
style-src 'self';
img-src 'self' data: blob: https: http:;
font-src 'self' data:;
connect-src 'self' https://api.pqpdf.com;
worker-src 'self' blob:;
child-src 'none';
object-src 'none';
frame-src 'none';
upgrade-insecure-requests;

There is no 'unsafe-inline', no 'unsafe-eval', and no * wildcard source. Two directives are deliberately broad: img-src accepts images from any https: or http: origin (upgrade-insecure-requests rewrites http: image URLs to https: before they are fetched), and worker-src 'self' blob: allows PDF.js and the Web Workers used by client-side tool rendering; blob: worker scripts are generated entirely from trusted local modules. connect-src permits calls to api.pqpdf.com for tool operations. All JavaScript event handlers are registered via addEventListener() in external ES modules; no page contains onclick, onload, or any other HTML event attribute.
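Per-request nonce generation and header assembly can be sketched in Python as follows. It mirrors the policy text above but is not the site's PHP code, and new_nonce and build_csp are hypothetical names. For the browser to run a script, the same nonce value must also appear in that tag's nonce attribute.

```python
import base64
import secrets


def new_nonce() -> str:
    """16 CSPRNG bytes, base64-encoded: 128 bits, 24 characters."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def build_csp(ext_nonce: str, inline_nonce: str) -> str:
    """Assemble the Content-Security-Policy value for one response."""
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{ext_nonce}' 'nonce-{inline_nonce}'",
        "style-src 'self'",
        "img-src 'self' data: blob: https: http:",
        "font-src 'self' data:",
        "connect-src 'self' https://api.pqpdf.com",
        "worker-src 'self' blob:",
        "child-src 'none'",
        "object-src 'none'",
        "frame-src 'none'",
        "upgrade-insecure-requests",
    ]
    return "; ".join(directives)
```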

4. ML Intelligence Engine — Data Policy

4.1 What is stored

When the PDF Forensics Scanner processes a file, the ML Intelligence Engine extracts a 38-dimensional feature vector from the aggregated outputs of the 44 detection engines and stores it in PostgreSQL. The 38 features are entirely structural and statistical measurements:

Counts of suspicious byte patterns, object types, and stream anomalies.
Shannon entropy values for non-image streams.
Boolean flags for JavaScript presence, encryption status, AcroForm, XFA, polyglot signatures, etc.
ClamAV match flag, YARA rule match count, ExifTool anomaly count.
Differential parser discrepancy counts across 8 dimensions.
Dynamic sandbox syscall anomaly score and strace indicator count.
Composite risk score and ML probability from the current model run.

What is not stored: file content, file name, file hash, IP address, user agent, session identifier, or any personally identifiable information. The feature vector cannot be used to reconstruct or identify the original file.
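The entropy feature can be computed as below. Whether the engine uses exactly this estimator (bits per byte over the decoded stream bytes) is an assumption of this sketch. Compressed or encrypted data scores close to 8, plain text noticeably lower, which is why unusually high entropy in a non-image stream is a useful anomaly signal.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((count / n) * math.log2(count / n) for count in Counter(data).values())
```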

4.2 How it is used

Stored feature vectors are used exclusively to train and retrain the ML models:

IsolationForest (unsupervised anomaly detection) — active from scan 1; detects statistical outliers without requiring labeled data.
RandomForest classifier (supervised) — activates once 50 or more feature vectors carry a user-submitted feedback label.

A cron job runs ml/train.py every 30 minutes to refit both models on the accumulated dataset. Trained model artefacts are written to ml/models/ and loaded into the next scan request.
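The activation rule can be expressed as a small Python function. It is a sketch of the rule as stated above, not ml/train.py itself; in scikit-learn the two models correspond to sklearn.ensemble.IsolationForest and sklearn.ensemble.RandomForestClassifier, and the fitting step is omitted.

```python
from __future__ import annotations

RF_MIN_LABELLED = 50  # RandomForest activates at 50 feedback-labelled vectors


def models_to_fit(labels: list[str | None]) -> list[str]:
    """Decide which models a training run should (re)fit.

    `labels` holds one entry per stored feature vector: "malicious",
    "benign", or None when no feedback was submitted.
    """
    labelled = sum(1 for label in labels if label is not None)
    models = ["IsolationForest"]  # unsupervised: needs no labels
    if labelled >= RF_MIN_LABELLED:
        models.append("RandomForestClassifier")  # supervised
    return models
```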

4.3 Feedback labels

The scan report UI presents a feedback panel where you can voluntarily label a result as malicious or benign. Submitting feedback sets the label column on the corresponding feature-vector row in the same PostgreSQL table; no separate record is created. Feedback is entirely optional and anonymous; it cannot be linked to you or to the file you scanned.

4.4 Retention

Feature vectors and feedback labels are retained indefinitely to support continuous model improvement. No time-based purge is currently implemented. Because records contain no PII and cannot identify individuals or files, they are not subject to erasure obligations under GDPR Article 17. If you have a specific concern, contact us via the contact form.

5. Rate Limiting, WAF, and Abuse Prevention

5.1 Proxy-layer defences (pqcrypta-proxy)

All traffic to pqpdf.com passes through pqcrypta-proxy, a custom Rust reverse proxy with the following security layers operating before any request reaches the PHP application:

Rate limiting — per-IP sliding-window rate limiter. The x-ratelimit-limit and x-ratelimit-remaining response headers communicate the current window to clients. Requests exceeding the limit receive HTTP 429.
WAF — security blocklist — blocks known malicious IP ranges, Tor exit nodes, and addresses with recent abuse records. Blocked requests receive HTTP 403 with header x-waf-block: 1; they are counted separately from legitimate request failures in metrics.
WAF — bot/scanner blocklist — JA3 and JA4 TLS fingerprint matching to identify and block automated scanners, credential stuffers, and vulnerability probes. Operates at the TLS handshake layer before HTTP is parsed.
Geo-blocking — configurable country-level access controls.
Circuit breaker — if the upstream PHP application becomes unresponsive or error rates spike, the circuit breaker opens and returns a service-unavailable response rather than queueing connections, protecting the backend from cascade failure.
QUIC anti-amplification — 3× limit on data sent before path validation to prevent UDP amplification attacks (see Section 2.4).
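A JA3 fingerprint is the MD5 hash of five ClientHello fields (TLS version, cipher suites, extensions, supported groups, point formats). Each field is written as decimal values joined by '-', the fields are joined by ',', and GREASE values (RFC 8701) are dropped. A Python sketch of that computation follows. It assumes the ClientHello has already been parsed, it is not pqcrypta-proxy's Rust code, the blocklist contents are hypothetical, and JA4's different format is not shown.

```python
import hashlib
from typing import Iterable

# RFC 8701 GREASE values: 0x0A0A, 0x1A1A, ..., 0xFAFA. JA3 skips them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}

# Hypothetical blocklist of known scanner fingerprints (MD5 hex digests).
BLOCKED_JA3 = set()


def _field(values: Iterable[int]) -> str:
    return "-".join(str(v) for v in values if v not in GREASE)


def ja3(version, ciphers, extensions, groups, point_formats):
    """Return (JA3 string, JA3 MD5 hex digest) for a parsed ClientHello."""
    text = ",".join([str(version), _field(ciphers), _field(extensions),
                     _field(groups), _field(point_formats)])
    digest = hashlib.md5(text.encode("ascii"), usedforsecurity=False).hexdigest()
    return text, digest


def is_blocked(digest: str) -> bool:
    return digest in BLOCKED_JA3
```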

5.2 Application-layer rate limiting (web UI)

In addition to proxy-layer controls, the PHP application enforces a session-based sliding-window counter of 10 operations per 5-minute window per session. When exceeded, the API returns HTTP 429 with a plain-text message; no silent failure occurs.

The following polling and keepalive operations are explicitly exempt from application-layer rate limiting to avoid blocking live progress UIs: edit-page, edit-ping, edit-qr-generate, pdf-scan-poll. Exempt operations call session_write_close() immediately to release the session lock so concurrent poll requests do not queue.
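A sliding-window limiter with the exemptions above can be sketched in Python as a per-session log of timestamps. The real counter lives in the PHP session; this in-memory version, the SessionRateLimiter name, and the injectable clock are assumptions for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable

WINDOW_SECONDS = 300  # 5-minute window
MAX_OPS = 10          # counted operations allowed per window per session
EXEMPT_OPS = {"edit-page", "edit-ping", "edit-qr-generate", "pdf-scan-poll"}


class SessionRateLimiter:
    """Sliding-window log: remember each counted operation's timestamp."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.log = defaultdict(deque)  # session_id -> deque of timestamps

    def allow(self, session_id: str, operation: str) -> bool:
        if operation in EXEMPT_OPS:
            return True  # polling/keepalive is never counted
        now = self.clock()
        stamps = self.log[session_id]
        while stamps and now - stamps[0] >= WINDOW_SECONDS:
            stamps.popleft()  # drop operations that have left the window
        if len(stamps) >= MAX_OPS:
            return False  # caller answers HTTP 429 with a plain-text message
        stamps.append(now)
        return True
```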

5.3 REST API rate limiting (api.pqpdf.com)

The external REST API at api.pqpdf.com enforces a separate, database-backed rate-limit scheme per API key:

Default hourly limit: 100 requests per key per UTC calendar hour, tracked in a PostgreSQL counter table with UPSERT semantics. Exceeded limits return HTTP 429.
Default daily limit: 500 requests per key per UTC calendar day.
Both limits are configurable at key creation time and are enforced before the request is proxied to the processing engine, so no file upload processing occurs for rate-limited requests.
Polling and keepalive operations (edit-ping, edit-page, pdf-scan-poll, esign-status, esign-preview) are exempt to avoid inflating counters during long-running stateful workflows.
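Here is a self-contained sketch of the hourly counter that uses Python's sqlite3 module, so it runs without a database server. PostgreSQL accepts the same INSERT ... ON CONFLICT ... DO UPDATE statement (with $1-style placeholders, and RETURNING to read the new count in one round trip). Table and column names are hypothetical; the daily limit works the same way with a per-day bucket.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS api_hourly_usage (
    key_id        INTEGER NOT NULL,
    hour_start    TEXT    NOT NULL,
    request_count INTEGER NOT NULL,
    PRIMARY KEY (key_id, hour_start)
)
"""

UPSERT = """
INSERT INTO api_hourly_usage (key_id, hour_start, request_count)
VALUES (?, ?, 1)
ON CONFLICT (key_id, hour_start)
DO UPDATE SET request_count = request_count + 1
"""


def hour_bucket(now: datetime) -> str:
    """UTC calendar hour, e.g. '2025-01-01T10:00Z'."""
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z")


def check_and_count(conn: sqlite3.Connection, key_id: int, limit: int, now: datetime) -> bool:
    """Increment this hour's counter; False means the caller returns HTTP 429."""
    bucket = hour_bucket(now)
    with conn:  # one transaction: increment, then read back
        conn.execute(UPSERT, (key_id, bucket))
        (count,) = conn.execute(
            "SELECT request_count FROM api_hourly_usage WHERE key_id = ? AND hour_start = ?",
            (key_id, bucket),
        ).fetchone()
    return count <= limit
```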

6. File Validation

Before any processing engine is invoked, uploaded files pass through three checks in api.php:

Size limits: MAX_FILE_SIZE = 52,428,800 bytes (50 MiB) per file; MAX_TOTAL_SIZE = 209,715,200 bytes (200 MiB) across all files in a single request. Exceeded limits return HTTP 413.
Magic-byte check: for PDF operations, the first 4 bytes of the uploaded file are read with fread($fh, 4) and compared against '%PDF'. Files that fail this check are rejected before any engine runs. Office and image uploads use extension and MIME checks instead.
MIME type: PDF uploads are checked against ['application/pdf', 'application/x-pdf']. Office uploads are checked against a list of allowed extensions (docx, doc, ppt, pptx, xls, xlsx, odt, odp, ods).
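The PDF checks can be sketched in Python as follows. This is not api.php; the function and exception names are hypothetical, and returning HTTP 400 for type or magic-byte failures is an assumption, since only the 413 status is documented above.

```python
from __future__ import annotations

import os

MAX_FILE_SIZE = 52_428_800    # 50 MiB per file
MAX_TOTAL_SIZE = 209_715_200  # 200 MiB per request
PDF_MIME_TYPES = {"application/pdf", "application/x-pdf"}


class UploadRejected(Exception):
    def __init__(self, status: int, reason: str) -> None:
        super().__init__(reason)
        self.status = status


def validate_pdf_uploads(files: list[tuple[str, str]]) -> None:
    """Check (path, mime_type) pairs before any engine runs.

    Raises UploadRejected: 413 as documented; 400 is an assumption.
    """
    sizes = [os.path.getsize(path) for path, _ in files]
    if any(size > MAX_FILE_SIZE for size in sizes) or sum(sizes) > MAX_TOTAL_SIZE:
        raise UploadRejected(413, "upload too large")
    for path, mime in files:
        if mime not in PDF_MIME_TYPES:
            raise UploadRejected(400, f"unexpected MIME type {mime!r}")
        with open(path, "rb") as fh:
            if fh.read(4) != b"%PDF":  # same 4-byte check as fread($fh, 4)
                raise UploadRejected(400, "missing %PDF header")
```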

7. Post-Quantum Encryption (Protect PDF)

The Protect PDF tool offers two encryption modes:

Standard (AES-256-CBC): password-based encryption applied server-side by Ghostscript / qpdf. The password is transmitted over TLS and used only during the request; it is never stored.
PQC mode: key generation and wrapping happen entirely in the browser, using the @noble/post-quantum library, before the file is uploaded. 31 post-quantum algorithms are supported, including ML-KEM-1024 (NIST FIPS 203), HQC-128/192/256 (selected by NIST for standardisation), FN-DSA variants (Falcon, being standardised by NIST as FIPS 206), and hybrid modes. The server receives only an already-encrypted bundle (.pqcpdf); the plaintext file never travels over the network in PQC mode.

PQC key material is generated client-side and never sent to the server. The server has no access to PQC private keys.

8. Third-Party Dependencies

All server-side processing engines are open-source software running locally on the pqpdf.com server. No file data is forwarded to any external API or third-party cloud service during processing. The full engine list:

Engine	Purpose	External network calls
Ghostscript	Compress, watermark, rotate, protect, flatten, grayscale, repair	None
Poppler (pdfunite, pdftoppm, pdftotext, pdfinfo)	Merge, split, extract text, to-images, info	None
qpdf	Protect / unlock, structural analysis	None
LibreOffice	Office ↔ PDF conversion	None
ImageMagick	Images → PDF	None
Tesseract 5	OCR	None
PyMuPDF / Python	Scan engines ①–⑩, edit/fill apply	None
ExifTool 12	EXIF/XMP metadata analysis (Engine ⑪)	None
YARA 4.5	Rule-based pattern matching (Engine ⑬)	None
ClamAV 1.4+	Signature scanning 700k+ sigs (Engine ⑰)	Signature updates only (clamav.net)
PeePDF 0.4	Deep object analysis (Engine ⑭)	None
strace + unshare	Dynamic behavioral sandbox (Engine ⑮)	None (network namespace isolated)
scikit-learn	ML anomaly detection + classifier (Engine ⑱)	None
acorn (Node.js)	JavaScript AST deobfuscation (Engine ⑴)	None

ClamAV signature database updates are the only outbound network calls made by processing engines. These pull signature definition files from clamav.net; no file content is transmitted.

9. REST API Security (api.pqpdf.com)

9.1 Authentication

The REST API uses API-key authentication exclusively. Keys are passed in the X-API-Key HTTP request header. Raw key values are never stored — at key creation time the raw key is hashed with SHA-256 and only the hex digest is written to the database. Authentication computes the SHA-256 hash of the presented key and performs a constant-time equality check against the stored digest. Keys follow the format pqpdf_<48 hex chars> (192 bits of entropy from rand::thread_rng() in the Rust gateway).
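The key lifecycle described above maps directly onto Python's standard library. This sketch is an analogue of the Rust gateway's logic, with secrets.token_hex() standing in for rand::thread_rng(); the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def generate_api_key() -> tuple[str, str]:
    """Return (raw key shown once to the user, SHA-256 hex digest to store)."""
    raw = "pqpdf_" + secrets.token_hex(24)  # 24 bytes = 48 hex chars = 192 bits
    return raw, hashlib.sha256(raw.encode()).hexdigest()


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_digest)
```

A single fast hash such as SHA-256 is adequate here because the keys are 192-bit random values; slow, salted password hashes matter only for low-entropy secrets that an attacker could brute-force.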

9.2 IP Whitelisting

Each API key can have zero or more IPv4/IPv6 addresses or CIDR ranges attached as an IP whitelist. When any entries are present, requests from IPs that match none of them are rejected with HTTP 403, even when the key itself is valid. IP matching uses the PostgreSQL INET type and its network containment operators: a /32 (IPv4) or /128 (IPv6) host entry matches a single address, and a wider CIDR range matches any address within it.
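The containment test behaves like PostgreSQL's INET <<= operator (is contained by or equals). Below is a Python sketch using the standard ipaddress module. ip_allowed is a hypothetical name, and a bare address is treated as a single-host entry (/32 or /128).

```python
from __future__ import annotations

import ipaddress


def ip_allowed(client_ip: str, whitelist: list[str]) -> bool:
    """True if the key has no whitelist or client_ip falls in any entry."""
    if not whitelist:
        return True  # no entries: no IP restriction for this key
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa).
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in whitelist)
```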

9.3 Transport

All traffic to api.pqpdf.com is routed through the same pqcrypta-proxy instance as the web UI, inheriting the same TLS 1.3-only, HSTS, post-quantum hybrid key exchange, and WAF protections described in Sections 2 and 5. The API does not accept plain HTTP connections.

9.4 Process Isolation

Requests authenticated by the REST API are proxied to the same api.php processing endpoint used by the web UI. All four layers of process isolation (prlimit resource limits, AppArmor mandatory access control, Linux user namespaces, and per-request tmpfs temporary directories) apply identically to API requests. No isolation layer is bypassed for programmatic access.

9.5 Session Binding

Stateful operations (PDF Editor, Form Fill, Outline, eSign, async security scan) use the X-Session-Id header to bind requests to a PHP session. For each (key_id, session_id) pair, the Rust gateway keeps a dedicated reqwest client whose in-memory cookie jar holds the PHP session cookie. This state lives only in the gateway process, is evicted after 30 minutes of inactivity, and is never written to disk.
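The gateway's bookkeeping can be sketched in Python as an idle-expiring map keyed by (key_id, session_id), with an http.cookiejar.CookieJar standing in for the reqwest client and its cookie store. Class and method names are hypothetical. The sketch evicts lazily on each access, and it makes no claim about how the gateway schedules eviction.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Hashable


class IdleExpiringSessions:
    """Per-(key_id, session_id) state, evicted after an idle timeout."""

    def __init__(self, ttl_seconds: float = 1800.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds  # 30 minutes of inactivity
        self.clock = clock
        self._entries = {}  # (key_id, session_id) -> (last_used, state)

    def evict_idle(self) -> None:
        now = self.clock()
        stale = [k for k, (last, _) in self._entries.items() if now - last >= self.ttl]
        for k in stale:
            del self._entries[k]

    def get_or_create(self, key_id: Hashable, session_id: str,
                      factory: Callable[[], Any]) -> Any:
        self.evict_idle()
        k = (key_id, session_id)
        state = self._entries[k][1] if k in self._entries else factory()
        self._entries[k] = (self.clock(), state)  # refresh last-used time
        return state
```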

9.6 Usage Logging

Every non-rate-limited API request produces an asynchronous, non-blocking insert into pqpdf_api_usage containing: key ID, operation name, client IP (INET), HTTP status code, duration in milliseconds, and response size. No file content, file name, or file hash is included in usage logs. Usage records are retained for 30 days.

10. Reporting a Security Vulnerability

If you discover a security vulnerability in PQ PDF, please report it responsibly:

Email contact@pqcrypta.com with a description of the issue.
Include steps to reproduce, affected endpoint or page, and your assessment of impact.
We aim to acknowledge reports within 48 hours.
We will work to remediate confirmed issues promptly and credit researchers who follow responsible disclosure.
Please allow reasonable time for a fix before public disclosure.

11. Enterprise On-Premise Security

11.1 Architecture

Enterprise on-premise deployments run the identical stack described in this document — the same PHP application, the same Python processing scripts, the same sandboxing model, and the same Apache security headers — within the organisation's own infrastructure. No component phones home to pqpdf.com or any PQ PDF server. All processing, logging, and ML training occurs entirely on the organisation's own machines.

11.2 Security Responsibility Model

For on-premise deployments the security responsibilities are divided as follows:

Responsibility	PQ PDF (software supplier)	Licensed organisation
Application security (code, sandboxing, CSP, input validation)	✓ Provided in software	Keep deployment updated
Operating system hardening and patching	—	✓ Organisation
Network perimeter (firewall, WAF, DDoS protection)	—	✓ Organisation
TLS certificate management	Scripts provided (certbot / self-signed)	✓ Organisation owns and renews
PostgreSQL access control and encryption at rest	Schema and grants provided	✓ Organisation
Redis security (auth, bind address, persistence)	Default bind 127.0.0.1 in deploy script	✓ Organisation
Log retention and access control	Logrotate config provided (90 days)	✓ Organisation sets policy
Security updates to PQ PDF application	✓ Published with release notes	✓ Organisation applies updates
Vulnerability disclosure response	✓ 48 h acknowledgement target	Notify PQ PDF of deployment-specific findings

11.3 Supplied Security Configuration

The enterprise distribution includes the following security-related configuration out of the box:

deploy.sh — installs and configures AppArmor profiles for the sandbox binary, sets tmp/ permissions (0750), configures PHP-FPM with open_basedir restrictions, and disables unnecessary PHP functions.
install.sh — generates an Apache vhost with TLS 1.3, HSTS preloading, OCSP stapling, and the full security headers stack (CSP, X-Frame-Options, Referrer-Policy, Permissions-Policy).
setup-proxy.sh — configures pqcrypta-proxy with rate limiting, JA3/JA4 scanner fingerprint blocking, geo-blocking, and circuit-breaker protection.
pqpdf-sandbox — wraps processing commands with Linux namespaces (unshare), dropping network access, restricting filesystem mounts, and enforcing resource limits.

Organisations should review and adapt these configurations to their own security baseline before going live, particularly firewall rules, rate-limit thresholds, and log retention periods.

11.4 Compliance Frameworks

PQ PDF's on-premise architecture is designed to be compatible with common compliance frameworks, including:

GDPR / UK GDPR — zero file retention, no third-party data transfer, local ML training, configurable log retention.
HIPAA — no PHI leaves the organisation's infrastructure; all processing is local; audit logging via Apache access logs; no BAA required with PQ PDF as software supplier (not a business associate).
ISO 27001 / SOC 2 — security controls documentation, input validation, least-privilege process model, dependency inventory in PREREQUISITES.md.
Cyber Essentials — TLS encryption in transit, strict CSP, no unnecessary open ports, regular dependency updates.

PQ PDF does not hold any certifications on behalf of customer deployments. Organisations must conduct their own assessment and certification against the relevant framework.

11.5 Security Reviews and Compliance Enquiries

For penetration testing authorisation, compliance questionnaires, security architecture reviews, or to request deployment-specific documentation, use the contact form or email contact@pqcrypta.com with the subject line Enterprise / Security Review.