①
Container & Format ID
Validates ZIP (OOXML) or OLE2 container structure, checks part relationships, content-type map anomalies, embedded file signatures, and computes SHA-256 / MD5 / SHA-1 hashes for threat intelligence correlation.
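As a rough illustration of the container identification and hashing step, the following Python sketch checks the leading magic bytes and computes the three digests. The function names are hypothetical and this is not the tool's actual implementation; it only shows the general idea.

```python
import hashlib

# Well-known magic bytes: ZIP local file header and OLE2 compound file header.
ZIP_MAGIC = b"PK\x03\x04"
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def identify_container(data: bytes) -> str:
    """Return 'zip', 'ole2', or 'unknown' based on the first bytes."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def file_hashes(data: bytes) -> dict:
    """Compute the digests used for threat intelligence lookups."""
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("sha256", "md5", "sha1")
    }
```

A ZIP signature alone does not prove the file is OOXML; a fuller check would also confirm that the archive contains [Content_Types].xml.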
Detects encryption indicators, weak cipher modes (RC4 40-bit legacy), CryptoAPI structure anomalies, and password-hash artifacts. Encrypted documents can hide malicious macros from static scanners.
Extracts core document properties: author, last-saved-by, revision count, creation and modification timestamps, company, application version. Detects anti-forensic metadata tampering, mismatched authorship, and anomalous revision history patterns.
④
Relationship Forensics
Parses all OOXML .rels files to detect external relationships. Flags remote template injection (attachedTemplate), suspicious external links, IP-based URLs, UNC paths, and smb:// targets — all common initial-access techniques.
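A minimal sketch of how external relationships might be pulled out of a single .rels part, assuming the part has already been read from the archive as a string. The function name and the returned field names are illustrative, not taken from the tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_relationships(rels_xml: str) -> list:
    """List relationships whose TargetMode is External, with simple flags."""
    root = ET.fromstring(rels_xml)
    findings = []
    for rel in root.iter(REL_NS + "Relationship"):
        if rel.get("TargetMode") != "External":
            continue
        target = rel.get("Target", "")
        rel_type = rel.get("Type", "")
        findings.append({
            "target": target,
            "type": rel_type,
            # attachedTemplate pointing outside the package is the classic
            # remote template injection pattern.
            "template_injection": rel_type.endswith("/attachedTemplate"),
            "unc_or_smb": target.startswith(("\\\\", "smb://")),
        })
    return findings
```

For untrusted input, a hardened parser such as defusedxml may be a safer choice than the standard library parser used here.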
⑤
Embedded Payload Detection
Scans document streams for embedded PE executables (MZ/PE headers), ELF binaries, shell scripts, nested archives, OLE Package objects (dropper technique), and PowerShell / certutil / mshta / regsvr32 LOLBin invocations.
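The MZ/PE header check can be sketched as follows: every "MZ" candidate is confirmed by following the e_lfanew field at offset 0x3C of the DOS header to a "PE\0\0" signature. This is an illustrative sketch with a hypothetical function name, covering only the PE case.

```python
import struct


def find_embedded_pe(data: bytes) -> list:
    """Return offsets of 'MZ' headers whose e_lfanew points at 'PE\\0\\0'."""
    offsets = []
    pos = data.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew is a little-endian
        # 32-bit value stored at offset 0x3C.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_at = pos + e_lfanew
            if data[pe_at:pe_at + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return offsets
```

Validating the PE signature rather than matching "MZ" alone cuts down on false positives, since the two-byte sequence occurs frequently in arbitrary binary data.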
Runs olevba (JSON output with a text fallback), mraptor, and pcodedmp. Extracts VBA source, compiled p-code (which executes even when the source has been stripped), auto-exec triggers, suspicious APIs (Shell, CreateObject, URLDownloadToFile), obfuscation indicators, and embedded IOCs.
Parses Excel 4.0 (XLM) macro sheets and DDE field references — a legacy attack vector still active in modern campaigns. Extracts formula chains and EXEC() / CALL() execution paths invisible to standard VBA scanners.
⑧
OLE2 Structure Analyzer
Deep inspection of OLE compound document streams: storage tree reconstruction, directory entry enumeration, embedded PE / OLE objects, CLSID identification, sector chain integrity, and exploit-pattern byte sequences inside individual storage streams.
⑨
IOC & String Extraction
Extracts all Indicators of Compromise from raw bytes and decoded streams: HTTP/HTTPS/FTP URLs, IPv4 addresses, domain names, email addresses, Windows file paths, registry key references, PowerShell fragments, Base64-encoded payloads, and embedded hash strings.
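To illustrate extraction from raw bytes, the sketch below covers just two of the listed indicator types, URLs and IPv4 addresses. The patterns and the function name are assumptions made for this example; a production extractor would need more patterns and more careful validation.

```python
import re

# Byte-level patterns so the scan works on raw, undecoded streams.
URL_RE = re.compile(rb"(?:https?|ftp)://[^\s\"'<>\x00]+", re.IGNORECASE)
IPV4_RE = re.compile(
    rb"(?<![\d.])"
    rb"(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}"
    rb"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
    rb"(?![\d.])"
)


def extract_iocs(data: bytes) -> dict:
    """Return de-duplicated, sorted URLs and IPv4 addresses found in data."""
    urls = {m.decode("ascii", "replace") for m in URL_RE.findall(data)}
    ips = {m.decode("ascii") for m in IPV4_RE.findall(data)}
    return {"urls": sorted(urls), "ipv4": sorted(ips)}
```

The lookbehind and lookahead around the IPv4 pattern keep it from matching inside longer digit runs such as version strings or out-of-range octets.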
Scans the document against the ClamAV signature database (over 8 million malware signatures, including Office macro exploits, macro droppers, and known weaponised document families). Also flags encrypted documents and macro-carrying files.
Runs 12 curated YARA rules compiled at runtime: VBA auto-exec dropper, heavy obfuscation, XLM shell execution, DDE injection, template injection, shellcode NOP sled, LOLBin references, process injection, embedded PE, encoded PowerShell, suspicious CLSID, and external connection patterns.
Correlates file hashes (SHA-256, MD5, SHA-1) and extracted IOCs against an offline threat intelligence database: URLhaus malicious URLs, MalwareBazaar hash index, ThreatFox IOC feed, and FeodoTracker C2 IP list.
⑬
LibreOffice Behavioural
Renders the document in LibreOffice headless mode under a timeout. Captures macro-load attempts logged to stderr, rendering failures that indicate corrupt/exploit structure, and documents that trigger errors only when opened — behavioural signals invisible to static analysis.
⑭
Isolation Chamber Detonation
Opens the document inside a fully isolated Linux namespace (unshare --net --pid --fork --mount --ipc) while strace monitors every syscall. Detects network beacon attempts, process spawning, suspicious file writes, and LOLBin execution that only manifest at open-time.
⑮
Entropy & Compression Anomaly
Computes Shannon entropy (bits/byte) for every stream and XML part inside OOXML and OLE2 containers. Flags encrypted or compressed blobs (≥7.2 bits/byte), unexpectedly high-entropy XML parts (≥6.5), and regions that evade static pattern matching through encoding. Images, fonts, and legitimate compressed streams are automatically exempted.
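The entropy measure and the two thresholds quoted above can be sketched in a few lines. The function names and label strings are illustrative, and the exemption logic for images, fonts, and legitimately compressed streams is left out.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def classify(data: bytes, is_xml: bool) -> str:
    """Apply the thresholds from the text: 7.2 for blobs, 6.5 for XML parts."""
    h = shannon_entropy(data)
    if h >= 7.2:
        return "high-entropy-blob"
    if is_xml and h >= 6.5:
        return "high-entropy-xml"
    return "ok"
```

A uniformly distributed byte stream reaches the maximum of 8 bits/byte, which is why well-encrypted or well-compressed data sits close to that ceiling while ordinary XML text stays far below it.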
Validates Open Packaging Conventions (ECMA-376 Part 2) structural rules: [Content_Types].xml exists and parses cleanly, all declared parts are present in the ZIP, no duplicate part names, well-formed .rels files, no path traversal in internal targets, and no suspicious external targets (IP addresses, UNC paths, smb://, file://). Malformed OPC is a reliable indicator of intentional weaponisation or parser confusion attacks.
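A few of these package-level checks can be sketched with the standard zipfile module. This example covers only the presence of [Content_Types].xml, duplicate part names, and path traversal in entry names; the function name and finding strings are hypothetical, and part names are compared case-insensitively on the assumption that OPC treats them as equivalent.

```python
import io
import zipfile


def check_opc(data: bytes) -> list:
    """Run a small subset of OPC structural checks on a ZIP package."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()

    if "[Content_Types].xml" not in names:
        findings.append("missing [Content_Types].xml")

    seen = set()
    for name in names:
        key = name.lower()  # case-insensitive part name comparison
        if key in seen:
            findings.append(f"duplicate part name: {name}")
        seen.add(key)
        # Absolute paths or '..' segments have no place in a package.
        if name.startswith("/") or ".." in name.split("/"):
            findings.append(f"path traversal in part name: {name}")
    return findings
```

An empty result only means these three checks passed; the remaining rules, such as declared parts being present and .rels files being well-formed, would need their own passes.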
⑰
OOXML Schema Validator
Checks every XML and VML part for: well-formedness violations that parsers silently recover from, XXE injection attempts (SYSTEM/PUBLIC DOCTYPE declarations), embedded null bytes used to terminate string comparisons, and oversized CDATA blobs (>50 KB) in unexpected document parts — a steganographic payload carrier technique.
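Three of these checks lend themselves to a byte-level pre-scan before any XML parser touches the part. The sketch below is illustrative only: the function name and finding labels are made up, and it assumes UTF-8 encoded parts, since UTF-16 XML legitimately contains null bytes.

```python
import re

# DOCTYPE or ENTITY declarations that carry an external identifier.
EXTERNAL_ID_RE = re.compile(
    rb"<!(?:DOCTYPE|ENTITY)\b[^>]*?\b(?:SYSTEM|PUBLIC)\b"
)
CDATA_RE = re.compile(rb"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)
MAX_CDATA = 50 * 1024  # 50 KB threshold from the text


def scan_xml_part(data: bytes) -> list:
    """Flag external identifiers, null bytes, and oversized CDATA blocks."""
    findings = []
    if EXTERNAL_ID_RE.search(data):
        findings.append("doctype-external-id")
    if b"\x00" in data:
        findings.append("null-byte")
    if any(len(body) > MAX_CDATA for body in CDATA_RE.findall(data)):
        findings.append("oversized-cdata")
    return findings
```

Scanning raw bytes first means a hostile DOCTYPE is reported without ever being handed to a parser that might try to resolve it.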
⑱
Font & Theme Forensics
Inspects embedded font files (TTF/OTF/EOT/WOFF), theme XML for suspicious content (remote URLs, UNC paths, LOLBin references), custom document properties with encoded payloads, custom XML data islands (hidden structured data stores), and external data connections (xl/connections.xml) — channels used to exfiltrate data or stage secondary payloads.
⑲
MIME / Transport Forensics
Deep analysis of .eml and .msg email files: sender/Reply-To domain mismatch (spoofing indicator), SPF/DKIM/DMARC authentication failure headers, social engineering subject patterns, executable attachment detection (PE, scripts, LNK, HTA), embedded URL extraction, X-Mailer fingerprinting, and MSG binary parsing via extract-msg with raw fallback.
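The sender/Reply-To mismatch check is simple to sketch with the standard email package. The helper names are hypothetical, and a real check would probably compare registrable domains rather than exact host names.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(header_value) -> str:
    """Extract the lower-cased domain from an address header, or ''."""
    addr = parseaddr(header_value or "")[1]
    return addr.rpartition("@")[2].lower()


def reply_to_mismatch(raw_eml: str) -> bool:
    """True when From and Reply-To are both present with different domains."""
    msg = message_from_string(raw_eml)
    from_dom = domain_of(msg.get("From"))
    reply_dom = domain_of(msg.get("Reply-To"))
    return bool(from_dom and reply_dom and from_dom != reply_dom)
```

A mismatch on its own is only a weak signal, since mailing lists and ticketing systems also set a different Reply-To, so it is best weighed alongside the authentication headers.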
⑳
Digital Signature Forensics
Extracts and analyses digital signatures from OOXML (_xmlsignatures/) and OLE2 (\x05DigitalSignature stream). Identifies signer identity, timestamp, and signing algorithm. Detects weak algorithms (SHA-1/MD5) and unsigned VBA macros inside a signed document structure, a known bypass that makes untrusted macros appear trusted to enterprise security controls.
㉑
NLP Social Engineering Classifier
Two-tier detection of social engineering language in document text. Tier 1: fast regex matching across 5 categories: urgency, impersonation (CEO/IT/government), financial fraud (wire transfer, gift cards), credential harvesting, and security alert spoofing. Tier 2: when a regex pattern fires, a Qwen 2.5 LLM classifies intent, technique, and target persona, which makes the resulting detection explainable.
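Tier 1 can be pictured as a small table of compiled patterns, one per category. The patterns below are invented for illustration and cover only three of the five categories; the tool's actual patterns are not shown in this text.

```python
import re

# Hypothetical example patterns, one per category.
PATTERNS = {
    "urgency": re.compile(
        r"\b(urgent|immediately|within 24 hours)\b", re.IGNORECASE),
    "financial_fraud": re.compile(
        r"\b(wire transfer|gift cards?)\b", re.IGNORECASE),
    "credential_harvesting": re.compile(
        r"\b(verify your (account|password))\b", re.IGNORECASE),
}


def tier1(text: str) -> list:
    """Return the sorted names of categories whose pattern matches."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```

Only when this list is non-empty would the slower Tier 2 classification need to run, which keeps the LLM out of the path for the large majority of benign documents.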
㉒
Intelligent Correlation Engine
Cross-engine signal aggregation that identifies attack chains no single engine can see. Applies 10 correlation rules: DROPPER_CHAIN, C2_BEACON, TEMPLATE_INJECTION, ENCRYPTED_PAYLOAD, CREDENTIAL_THEFT_UNC, LIVING_OFF_LAND, TARGETED_ATTACK, SIGNATURE_BYPASS, HIGH_CONFIDENCE_MALWARE, and PARSER_CONFUSION, each with MITRE ATT&CK mapping and confidence scoring. Runs after all structural engines and before the AI verdict engine.
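One simple way to model such rules is as sets of signals that must all be present. In the sketch below the rule names come from the list above, but the required signal names are entirely hypothetical, and the ATT&CK mapping and confidence scoring are omitted.

```python
# Rule names are from the text; the signal names are invented for
# illustration and do not reflect the tool's real rule definitions.
RULES = {
    "TEMPLATE_INJECTION": {"external_template_rel", "remote_target"},
    "DROPPER_CHAIN": {"auto_exec_macro", "embedded_pe"},
    "SIGNATURE_BYPASS": {"signed_container", "unsigned_vba"},
}


def correlate(signals) -> list:
    """Return the sorted names of rules whose required signals are all set."""
    present = set(signals)
    return sorted(
        rule for rule, required in RULES.items() if required <= present
    )
```

For example, an auto-exec macro finding from one engine and an embedded PE finding from another would together satisfy the DROPPER_CHAIN entry in this sketch, even though neither engine reports a chain on its own.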
Qwen 2.5 3B analyses all engine findings and produces a structured verdict: MALICIOUS / SUSPICIOUS / LIKELY_BENIGN / CLEAN with confidence level, executive summary, attack chain narrative, MITRE ATT&CK techniques, and recommended actions. Runs last, after all other engines complete.