
🗂️ Office Document Forensics Scanner

No ads. No tracking. No data sold. Ever.
Deep forensic analysis of Word, Excel, PowerPoint, Outlook, Access, and Visio files across 23 independent engines: container integrity, encryption detection, metadata provenance, OOXML relationship forensics (remote template injection), embedded payload detection (PE/ELF/scripts), VBA macro extraction using olevba · mraptor · pcodedmp, Excel 4.0 XLM/DDE chain analysis, OLE compound structure inspection, IOC extraction (URLs · IPs · domains · registry keys · base64 payloads), ClamAV antivirus, YARA rule engine, offline threat intelligence (URLhaus · MalwareBazaar · ThreatFox · FeodoTracker), LibreOffice behavioural rendering, isolation chamber detonation (unshare + strace), entropy & compression anomaly detection, OPC rule validation, OOXML schema validation, font & theme forensics, MIME/transport forensics, digital signature forensics, NLP social engineering classifier (regex + LLM), intelligent cross-engine correlation, and AI forensic report (Qwen 2.5 · MITRE ATT&CK · verdict · confidence). Results are presented across 27 analysis tabs. Zero data retention: the file is deleted immediately after analysis.
Multi-Engine Forensic Architecture (23 engines)
container · crypto · metadata · relationships · embedded · VBA · XLM/DDE · OLE2 · IOC · ClamAV · YARA · threat intel · LibreOffice · sandbox · entropy · OPC · schema · fonts · MIME · signatures · NLP · correlation · AI
Container & Format ID
Validates ZIP (OOXML) or OLE2 container structure, checks part relationships, content-type map anomalies, embedded file signatures, and computes SHA-256 / MD5 / SHA-1 hashes for threat intelligence correlation.
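As a rough illustration of the first step described above, the sketch below identifies the container type from its magic bytes and computes the three hashes. The function names are hypothetical and this is not the scanner's own code; it only shows the general technique using the Python standard library.

```python
import hashlib

ZIP_MAGIC = b"PK\x03\x04"                          # OOXML packages are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 compound file header


def identify_container(data: bytes) -> str:
    """Classify a file as 'zip', 'ole2', or 'unknown' from its leading bytes."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def file_hashes(data: bytes) -> dict:
    """Return hex digests usable as lookup keys in a threat intelligence index."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("sha256", "md5", "sha1")}
```

A magic-byte check only says what the file claims to be; the structural validation described above still has to confirm that the container is internally consistent.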
Encryption Detection
Detects encryption indicators, weak cipher modes (RC4 40-bit legacy), CryptoAPI structure anomalies, and password-hash artifacts. Encrypted documents can hide malicious macros from static scanners.
Metadata Forensics
Extracts core document properties: author, last-saved-by, revision count, creation and modification timestamps, company, application version. Detects anti-forensic metadata tampering, mismatched authorship, and anomalous revision history patterns.
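For OOXML files, the core properties listed above live in the docProps/core.xml part. The following is a minimal sketch of reading them, assuming the part has already been extracted as bytes; the function name and the single tampering heuristic (modified earlier than created) are illustrative assumptions, not the scanner's actual rules.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def core_properties(core_xml: bytes) -> dict:
    """Extract a few core properties and flag one simple timestamp anomaly."""
    root = ET.fromstring(core_xml)

    def text(path):
        node = root.find(path, NS)
        return node.text if node is not None else None

    props = {
        "creator": text("dc:creator"),
        "last_modified_by": text("cp:lastModifiedBy"),
        "revision": text("cp:revision"),
        "created": text("dcterms:created"),
        "modified": text("dcterms:modified"),
    }
    # Assumption: both timestamps use the same UTC format (e.g. 2024-05-01T10:00:00Z),
    # in which case plain string comparison orders them correctly.
    props["modified_before_created"] = bool(
        props["created"] and props["modified"]
        and props["modified"] < props["created"]
    )
    return props
```

The standard-library parser is used here for brevity; a scanner handling hostile input would likely prefer a hardened XML parser.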
Relationship Forensics
Parses all OOXML .rels files to detect external relationships. Flags remote template injection (attachedTemplate), suspicious external links, IP-based URLs, UNC paths, and smb:// targets — all common initial-access techniques.
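The relationship checks above can be approximated with the standard library alone. This sketch walks every .rels part in the package and flags external targets; the function name, flag labels, and the IP-URL pattern are assumptions made for illustration, and a production scanner would likely use a hardened XML parser.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IP_URL = re.compile(r"^[a-z]+://\d{1,3}(?:\.\d{1,3}){3}", re.I)  # e.g. http://10.0.0.5/...


def external_relationships(data: bytes) -> list:
    """List relationships with TargetMode="External" and tag the suspicious ones."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") != "External":
                    continue
                target = rel.get("Target", "")
                flags = []
                if rel.get("Type", "").endswith("/attachedTemplate"):
                    flags.append("remote_template")
                if IP_URL.match(target):
                    flags.append("ip_url")
                if target.startswith("\\\\") or target.lower().startswith("smb://"):
                    flags.append("unc_or_smb")
                findings.append({"part": name, "target": target, "flags": flags})
    return findings
```

An external attachedTemplate relationship is the signature of remote template injection: Word fetches the template on open, so the macro payload never has to be stored in the document itself.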
Embedded Payload Detection
Scans document streams for embedded PE executables (MZ/PE headers), ELF binaries, shell scripts, nested archives, OLE Package objects (dropper technique), and PowerShell / certutil / mshta / regsvr32 LOLBin invocations.
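To make the MZ/PE header check concrete, here is a minimal sketch that looks for a DOS header whose e_lfanew field (at offset 0x3C) points at a valid "PE\0\0" signature. The function name is hypothetical, and the real engine presumably covers ELF, scripts, and archives as well; this shows only the PE case.

```python
import struct


def find_embedded_pe(data: bytes) -> list:
    """Return offsets of 'MZ' headers whose e_lfanew points at a PE signature."""
    offsets = []
    start = 0
    while (pos := data.find(b"MZ", start)) != -1:
        start = pos + 1
        if pos + 0x40 > len(data):
            break  # not enough bytes left for a full DOS header
        # e_lfanew: little-endian 32-bit offset of the PE header, relative to 'MZ'
        (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
        pe_at = pos + e_lfanew
        if data[pe_at:pe_at + 4] == b"PE\x00\x00":
            offsets.append(pos)
    return offsets
```

Validating the e_lfanew pointer, rather than matching the two bytes "MZ" alone, avoids most false positives in compressed or binary streams.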
VBA Macro Analysis
Runs olevba (JSON output, with a text fallback), mraptor, and pcodedmp. Extracts VBA source, compiled p-code (which still executes when the source has been stripped), auto-exec triggers, suspicious API calls (Shell, CreateObject, URLDownloadToFile), obfuscation indicators, and embedded IOCs.
XLM / DDE Analysis
Parses Excel 4.0 (XLM) macro sheets and DDE field references — a legacy attack vector still active in modern campaigns. Extracts formula chains and EXEC() / CALL() execution paths invisible to standard VBA scanners.
OLE2 Structure Analyzer
Deep inspection of OLE compound document streams: storage tree reconstruction, directory entry enumeration, embedded PE / OLE objects, CLSID identification, sector chain integrity, and exploit-pattern byte sequences inside individual storage streams.
IOC & String Extraction
Extracts all Indicators of Compromise from raw bytes and decoded streams: HTTP/HTTPS/FTP URLs, IPv4 addresses, domain names, email addresses, Windows file paths, registry key references, PowerShell fragments, Base64-encoded payloads, and embedded hash strings.
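A small sketch of regex-based IOC extraction from raw bytes follows. It covers only three of the indicator types listed above (URLs, IPv4 addresses, long Base64 runs); the patterns, the 40-character Base64 minimum, and the function name are illustrative assumptions rather than the scanner's actual rules.

```python
import re

URL_RE = re.compile(rb"(?:https?|ftp)://[^\s\"'<>]+", re.I)
IPV4_RE = re.compile(
    rb"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# Runs of at least 40 Base64 characters, with optional padding.
B64_RE = re.compile(
    rb"(?:[A-Za-z0-9+/]{4}){10,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def extract_iocs(data: bytes) -> dict:
    """Return de-duplicated, sorted indicator strings found in the raw bytes."""
    def uniq(pattern):
        return sorted({m.decode("ascii", "replace") for m in pattern.findall(data)})

    return {"urls": uniq(URL_RE), "ipv4": uniq(IPV4_RE), "base64": uniq(B64_RE)}
```

Running the same patterns over decoded streams as well as raw bytes matters because OOXML parts are compressed and will not match in their stored form.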
ClamAV Antivirus
Runs the document through the ClamAV signature database — over 8 million malware signatures including Office macro exploits, macro droppers, and known weaponised document families. Detects encrypted documents and macro-carrying files.
YARA Rule Engine
Runs 12 curated YARA rules compiled at runtime: VBA auto-exec dropper, heavy obfuscation, XLM shell execution, DDE injection, template injection, shellcode NOP sled, LOLBin references, process injection, embedded PE, encoded PowerShell, suspicious CLSID, and external connection patterns.
Threat Intelligence
Correlates file hashes (SHA-256, MD5, SHA-1) and extracted IOCs against an offline threat intelligence database: URLhaus malicious URLs, MalwareBazaar hash index, ThreatFox IOC feed, and FeodoTracker C2 IP list.
LibreOffice Behavioural
Renders the document in LibreOffice headless mode under a timeout. Captures macro-load attempts logged to stderr, rendering failures that indicate corrupt/exploit structure, and documents that trigger errors only when opened — behavioural signals invisible to static analysis.
Isolation Chamber Detonation
Opens the document inside a fully isolated Linux namespace (unshare --net --pid --fork --mount --ipc) while strace monitors every syscall. Detects network beacon attempts, process spawning, suspicious file writes, and LOLBin execution that only manifest at open-time.
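The command line implied by this description could be assembled roughly as follows. The unshare flags are the ones quoted above; the strace options (-f to follow child processes, -o to write the trace to a file) and the use of soffice in headless mode as the opener are assumptions, as is the function name. Creating these namespaces normally requires root or an additional user namespace.

```python
def detonation_command(doc_path: str, trace_path: str) -> list:
    """Build an argv list that opens a document in isolated namespaces under strace."""
    return [
        # New network, PID, mount, and IPC namespaces: no network or host processes visible.
        "unshare", "--net", "--pid", "--fork", "--mount", "--ipc",
        # Follow child processes and write every syscall to trace_path.
        "strace", "-f", "-o", trace_path,
        # Assumed opener: LibreOffice headless conversion forces a full document load.
        "soffice", "--headless", "--convert-to", "pdf", doc_path,
    ]
```

The argv list can be passed to subprocess.run with a timeout; the resulting trace file is then searched for calls such as connect, execve, and unexpected file writes.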
Entropy & Compression Anomaly
Computes Shannon entropy (bits/byte) for every stream and XML part inside OOXML and OLE2 containers. Flags encrypted or compressed blobs (≥7.2 bits/byte), unexpectedly high-entropy XML parts (≥6.5), and regions that evade static pattern matching through encoding. Images, fonts, and legitimate compressed streams are automatically exempted.
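Shannon entropy in bits per byte is H = -Σ p(b) · log2 p(b) over the byte values b that occur in the stream, ranging from 0 (a single repeated byte) to 8 (uniformly random). The sketch below applies the two thresholds quoted above; the function names and the extension-based XML test are assumptions, and the exemption logic for images and fonts is omitted.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def classify_stream(name: str, data: bytes) -> str:
    """Apply the thresholds from the text: 6.5 for XML parts, 7.2 for other streams."""
    h = shannon_entropy(data)
    if name.lower().endswith((".xml", ".rels", ".vml")):
        return "high_entropy_xml" if h >= 6.5 else "ok"
    return "encrypted_or_compressed" if h >= 7.2 else "ok"
```

XML gets the lower threshold because well-formed markup is highly repetitive, so an XML part approaching random-looking entropy is already unusual.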
OPC Rule Engine
Validates Open Packaging Conventions (ECMA-376 Part 2) structural rules: [Content_Types].xml existence and parse integrity, all declared parts present in ZIP, no duplicate part names, well-formed .rels files, no path traversal in internal targets, and suspicious external targets (IP addresses, UNC paths, smb://, file://). Malformed OPC is a reliable indicator of intentional weaponisation or parser confusion attacks.
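A few of these structural rules are simple enough to sketch with the standard library. The example below checks for the presence and parse integrity of [Content_Types].xml, duplicate part names (compared case-insensitively), and traversal sequences in ZIP entry names. Note that the last check inspects entry names rather than relationship targets, which is a simplification; the function name and message strings are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def opc_violations(data: bytes) -> list:
    """Return a list of human-readable OPC structural problems found in a package."""
    issues = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        if "[Content_Types].xml" not in names:
            issues.append("missing [Content_Types].xml")
        else:
            try:
                ET.fromstring(zf.read("[Content_Types].xml"))
            except ET.ParseError:
                issues.append("unparseable [Content_Types].xml")
        # Part names that differ only by case count as duplicates here.
        lowered = [n.lower() for n in names]
        if len(set(lowered)) != len(lowered):
            issues.append("duplicate part names")
        for n in names:
            if n.startswith("/") or ".." in n.split("/"):
                issues.append(f"path traversal in part name: {n}")
    return issues
```

An empty result means only that these particular rules passed, not that the package is fully conformant.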
OOXML Schema Validator
Checks every XML and VML part for: well-formedness violations that parsers silently recover from, XXE injection attempts (SYSTEM/PUBLIC DOCTYPE declarations), embedded null bytes used to terminate string comparisons, and oversized CDATA blobs (>50 KB) in unexpected document parts — a steganographic payload carrier technique.
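Three of these checks can be run on the raw bytes of a part before any XML parser touches it, which avoids triggering the very behaviour being looked for. The sketch below is illustrative: the function name and finding labels are assumptions, the 50 KB limit comes from the text, and the null-byte test assumes UTF-8 parts (UTF-16 XML legitimately contains zero bytes).

```python
import re

# DOCTYPE declarations that reference SYSTEM or PUBLIC identifiers (XXE indicator).
DOCTYPE_EXTERNAL = re.compile(rb"<!DOCTYPE[^>]*\b(?:SYSTEM|PUBLIC)\b", re.I)
CDATA_BLOCK = re.compile(rb"<!\[CDATA\[(.*?)\]\]>", re.S)
MAX_CDATA = 50 * 1024  # 50 KB threshold quoted in the text


def xml_part_findings(raw: bytes) -> list:
    """Scan one XML/VML part's raw bytes for the indicators described above."""
    findings = []
    if DOCTYPE_EXTERNAL.search(raw):
        findings.append("external_doctype")
    if b"\x00" in raw:
        findings.append("null_byte")
    if any(len(body) > MAX_CDATA for body in CDATA_BLOCK.findall(raw)):
        findings.append("oversized_cdata")
    return findings
```

Detecting silently recovered well-formedness errors is not covered here; that needs a strict parse whose result is compared against a lenient one.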
Font & Theme Forensics
Inspects embedded font files (TTF/OTF/EOT/WOFF), theme XML for suspicious content (remote URLs, UNC paths, LOLBin references), custom document properties with encoded payloads, custom XML data islands (hidden structured data stores), and external data connections (xl/connections.xml) — channels used to exfiltrate data or stage secondary payloads.
MIME / Transport Forensics
Deep analysis of .eml and .msg email files: sender/Reply-To domain mismatch (spoofing indicator), SPF/DKIM/DMARC authentication failure headers, social engineering subject patterns, executable attachment detection (PE, scripts, LNK, HTA), embedded URL extraction, X-Mailer fingerprinting, and MSG binary parsing via extract-msg with raw fallback.
Digital Signature Forensics
Extracts and analyses digital signatures from OOXML (_xmlsignatures/) and OLE2 (\x05DigitalSignature stream). Identifies signer identity, timestamp, and signing algorithm. Detects weak algorithms (SHA-1/MD5) and unsigned VBA macros inside a signed document structure, a known bypass that makes untrusted macros appear trusted to enterprise security controls.
NLP Social Engineering Classifier
Two-tier detection of social engineering language in document text. Tier 1: fast regex matching across 5 categories — urgency, impersonation (CEO/IT/government), financial fraud (wire transfer, gift cards), credential harvesting, and security alert spoofing. Tier 2: Qwen 2.5 LLM classifies intent, technique, and target persona when regex patterns fire — providing explainable AI-based social engineering detection.
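Tier 1 can be pictured as one compiled pattern per category. The sketch below uses the five category names from the description, but the phrases inside each pattern, the dictionary name, and the function name are invented for illustration; the scanner's real pattern set is not published here.

```python
import re

# Hypothetical example phrases; one pattern per category named in the text.
TIER1_PATTERNS = {
    "urgency": re.compile(
        r"\b(?:urgent|immediately|within 24 hours|act now)\b", re.I),
    "impersonation": re.compile(
        r"\b(?:ceo|it (?:support|department)|help ?desk|tax authority)\b", re.I),
    "financial_fraud": re.compile(
        r"\b(?:wire transfer|gift cards?|invoice overdue)\b", re.I),
    "credential_harvesting": re.compile(
        r"\b(?:verify your (?:account|password)|log ?in to confirm)\b", re.I),
    "security_alert": re.compile(
        r"\b(?:enable (?:content|editing|macros)|account (?:suspended|compromised))\b", re.I),
}


def tier1_categories(text: str) -> list:
    """Return the sorted names of every category whose pattern matches the text."""
    return sorted(name for name, pat in TIER1_PATTERNS.items() if pat.search(text))
```

In the two-tier design described above, a non-empty result is what triggers the slower LLM pass, so clean documents never incur the model cost.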
Intelligent Correlation Engine
Cross-engine signal aggregation that identifies attack chains no single engine can see alone. Applies 10 correlation rules: DROPPER_CHAIN, C2_BEACON, TEMPLATE_INJECTION, ENCRYPTED_PAYLOAD, CREDENTIAL_THEFT_UNC, LIVING_OFF_LAND, TARGETED_ATTACK, SIGNATURE_BYPASS, HIGH_CONFIDENCE_MALWARE, and PARSER_CONFUSION — each with MITRE ATT&CK mapping and confidence scoring. Runs after all structural engines, before AI.
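One simple way to model such correlation is to treat each rule as a set of signals that must all be present. In the sketch below the rule names are taken from the list above, but the signal names and the required combinations are hypothetical, and confidence scoring and ATT&CK mapping are left out.

```python
# Rule names come from the text; the signal names on the right are invented
# placeholders for findings that individual engines might emit.
CORRELATION_RULES = {
    "DROPPER_CHAIN": {"vba_autoexec", "embedded_pe"},
    "TEMPLATE_INJECTION": {"external_relationship", "remote_template"},
    "SIGNATURE_BYPASS": {"signed_container", "unsigned_vba"},
}


def correlate(signals: set) -> list:
    """Return the sorted names of rules whose required signals are all present."""
    return sorted(
        rule for rule, required in CORRELATION_RULES.items()
        if required <= signals  # subset test: every required signal was observed
    )
```

For example, an auto-exec macro alone or an embedded executable alone is only suspicious; together they describe a dropper, which is the kind of chain no single engine reports by itself.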
AI Forensic Report
Qwen 2.5 3B analyses all engine findings and produces a structured verdict: MALICIOUS / SUSPICIOUS / LIKELY_BENIGN / CLEAN with confidence level, executive summary, attack chain narrative, MITRE ATT&CK techniques, and recommended actions. Runs last, after all other engines complete.
Supported: .docx .docm .doc .xlsx .xlsm .xlsb .xls .pptx .pptm .ppt .rtf .one .vsdx .vsdm .msg .eml .ics .mdb .accdb
🏢 Free scanner: 10 MB limit — covers 99% of real-world malicious Office documents (most weaponised docs are under 2 MB). Need to scan larger files? Enterprise deployment removes all size limits.
🧹 Sanitize Document: remove active content · produce clean output · original unchanged
📄 Convert to PDF (Max Safety): Renders via LibreOffice, destroying all macros, VBA, XLM, OLE objects, and active content with certainty. Produces a static PDF.
✂️ Strip Macros (Surgical): Removes all VBA and XLM macros while preserving document content, formatting, and structure.
🏷️ Strip Metadata (Surgical): Removes author, revision history, last-saved-by, company, and all custom properties from the document.
🔄 Convert to OOXML (Surgical): Converts legacy OLE2 formats (doc/xls/ppt) to modern OOXML, eliminating the OLE exploit surface while preserving content.