Summary
Avoid using eval() when traversing HTML.
html.escape only sanitises characters for HTML output; it does not protect against code injection when the escaped string is later fed to eval(). The safe approach is to treat the node identifier as data, not as executable code, and use a lookup table or a proper parser instead of eval.
Root Cause
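- eval() executes any Python expression contained in the string.
- html.escape converts <, >, &, " and ' to HTML entities, but a string that contains none of those characters comes back unchanged and is still a valid Python expression (e.g., a → a).
- An attacker does not need those characters. __import__('os').system('rm -rf /') would have its quotes rewritten to &#x27; and fail to parse, but a quote-free equivalent such as __import__(chr(111)+chr(115)).system(chr(105)+chr(100)) passes escape unchanged and gets executed.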
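The behaviour described above can be checked directly. The following is a minimal sketch, assuming escape is html.escape from the standard library; both payload strings are hypothetical, and os.getcwd() stands in for a harmful call so the snippet is safe to run.

```python
import os
from html import escape

# Hypothetical payloads for illustration only.
quoted = "__import__('os').getcwd()"
quote_free = "__import__(chr(111)+chr(115)).getcwd()"  # chr(111)+chr(115) == 'os'

# The single quotes are rewritten to &#x27;, so this string is altered.
quoted_survives = escape(quoted) == quoted

# None of <, >, &, " or ' appear, so escape() returns the string unchanged.
quote_free_survives = escape(quote_free) == quote_free

# eval() runs the untouched payload and returns the working directory.
result = eval(escape(quote_free))

print(quoted_survives, quote_free_survives, result == os.getcwd())
```

Escaping only changes which payloads work; it does not stop eval() from executing the ones that contain no special characters.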
Why This Happens in Real Systems
- Developers treat identifiers or configuration values as code snippets.
- The convenience of eval seems attractive for dynamic dispatch, but the boundary between data and code becomes blurred.
- Lack of input validation and reliance on “HTML escaping” create a false sense of security.
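Dynamic dispatch rarely needs eval. One common alternative is a table that maps a string key to a callable, so the input only selects among functions that already exist. This is a sketch under stated assumptions: fetch_page, skip_page, HANDLERS and dispatch are hypothetical names, not part of the original scraper.

```python
# Hypothetical handlers; the names are illustrative only.
def fetch_page(node_id):
    return f"fetched {node_id}"


def skip_page(node_id):
    return f"skipped {node_id}"


# The input string is only ever used as a dictionary key, never as code.
HANDLERS = {"fetch": fetch_page, "skip": skip_page}


def dispatch(action, node_id):
    handler = HANDLERS.get(action)
    if handler is None:
        # Unknown keys are rejected instead of being evaluated.
        raise ValueError(f"unknown action: {action!r}")
    return handler(node_id)


print(dispatch("fetch", "a"))
```

A payload such as __import__(chr(111)+chr(115)) passed as the action is simply a key that is not in the table, so it raises ValueError.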
Real-World Impact
- Remote code execution (RCE) on the server running the scraper.
- Privilege escalation if the process runs with elevated rights.
- Data exfiltration or destruction of logs, backups, or other assets.
- Legal and compliance violations when compromised systems process user‑generated URLs.
Example or Code
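# Unsafe version (original)
from html import escape
from bs4 import BeautifulSoup

def discover_nodes(node, leaf_nodes=[]):
    node_value = eval(escape(node))  # <-- dangerous: runs whatever expression node contains
    soup = BeautifulSoup(node_value, 'lxml')
    ...

# Safe version using a dictionary lookup
# Each value holds that page's HTML; the <a href="..."> links are elided here.
pages = {
    'root': ' ',
    'a': '',
    'b': '',
    'c': '',
    'd': '',
}

def discover_nodes(node, leaf_nodes=None):
    # Create the accumulator per call instead of sharing a mutable default.
    if leaf_nodes is None:
        leaf_nodes = []
    # The identifier is used only as a key; unknown nodes map to an empty page.
    node_value = pages.get(node, '')
    soup = BeautifulSoup(node_value, 'lxml')
    links = soup.find_all('a')
    if not links:
        # No outgoing links: this node is a leaf.
        leaf_nodes.append(node)
        return leaf_nodes
    for link in links:
        discover_nodes(link['href'], leaf_nodes)
    return leaf_nodes

print(discover_nodes('root'))  # ['a', 'd', 'c'] once the pages contain their links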
How Senior Engineers Fix It
- Eliminate eval: Replace it with a deterministic data structure (dict, DB, cache).
- Validate all external inputs against a whitelist of allowed node identifiers.
- Use typing and static analysis tools to catch unsafe dynamic execution.
- Log and monitor any unexpected identifiers before processing.
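- If dynamic evaluation is truly required, restrict it to literals with ast.literal_eval, which parses Python literals and never executes code. It is a restricted parser rather than a sandbox, so running arbitrary expressions would still need real isolation.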
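The whitelist, logging, and literal-only points above can be sketched together. This is one possible shape, not a definitive implementation: ALLOWED_NODES, validate_node and parse_literal are hypothetical names, and the allowed set simply reuses the node identifiers from the example pages.

```python
import ast
import logging

logger = logging.getLogger(__name__)

# Assumption: the only valid identifiers are the keys of the example pages dict.
ALLOWED_NODES = frozenset({"root", "a", "b", "c", "d"})


def validate_node(node_id):
    """Return node_id if it is whitelisted; otherwise log it and raise."""
    if node_id not in ALLOWED_NODES:
        logger.warning("rejected unexpected node identifier: %r", node_id)
        raise ValueError("unknown node identifier")
    return node_id


def parse_literal(text):
    """Parse a Python literal without executing code; return None on failure."""
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError):
        # Function calls, attribute access and names are not literals.
        logger.warning("rejected non-literal input: %r", text)
        return None


print(validate_node("a"))
print(parse_literal("['a', 'b']"))
print(parse_literal("__import__('os').getcwd()"))
```

ast.literal_eval accepts strings, numbers, tuples, lists, dicts, sets, booleans and None, so the call expression in the last line is refused rather than run.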
Why Juniors Miss It
- They often conflate escaping for HTML with escaping for code execution.
- Limited exposure to threat modeling leads to trusting escape as a universal sanitizer.
- The convenience of eval hides its risks, and junior developers may not be aware of the difference between data and code.
- Lack of mentorship on secure coding patterns and code‑review feedback.