Summary
The lxml Python package is vulnerable to mutation XSS (mXSS) because its HTML cleaner relies on a parser that does not imitate browser parsing. As a result, the sanitizer and the user’s browser can interpret the same markup differently, which can lead to arbitrary HTML/JavaScript execution in the rendered page.
Product
lxml versions 1.2 through 4.6.1
Impact
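Applications that rely on lxml to sanitize untrusted HTML may pass through markup that executes attacker-controlled JavaScript in the browser, so the sanitizer does not fulfill its purpose.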
Steps to reproduce
>>> from lxml.html.clean import clean_html
Expected result:
<svg><style><img src=x onerror=alert(1)></style></svg>
And:
<noscript><style><a title="</noscript><img src=x onerror=alert(1)>"></style></noscript>
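The parser mismatch behind the first payload can be illustrated with a minimal sketch. It uses Python's standard-library html.parser purely as a stand-in for an HTML parser that treats the contents of <style> as raw text; it is not the parser lxml uses, and the names TagRecorder and summarize_parse are hypothetical. A browser, by contrast, parses <style> inside <svg> as foreign content, so the <img> that appears as harmless text here would become a real element there.

```python
from html.parser import HTMLParser

# First payload from the advisory.
PAYLOAD = "<svg><style><img src=x onerror=alert(1)></style></svg>"


class TagRecorder(HTMLParser):
    """Hypothetical helper: records start tags and text seen by the parser."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def summarize_parse(markup):
    """Return (start tags, concatenated text) as seen by html.parser."""
    recorder = TagRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.tags, "".join(recorder.text)


if __name__ == "__main__":
    tags, text = summarize_parse(PAYLOAD)
    # The <img> never shows up as a tag; it is reported as text inside <style>.
    print("start tags:", tags)
    print("text:", text)
```

A sanitizer that sees the <img> only as text has no element or onerror attribute to remove, which is why the payload can survive cleaning unchanged.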
Remediation
Update the lxml dependency to version 4.6.2 or later.
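A minimal sketch of how a project might check whether its installed lxml falls in the affected range stated in this advisory. The helper names (parse_version, is_affected, installed_lxml_is_affected) are hypothetical, and the version parsing is deliberately simplified: it reads only the leading numeric components and ignores pre-release suffixes.

```python
from importlib import metadata

# Range taken from this advisory: 1.2 through 4.6.1 affected, fixed in 4.6.2.
FIRST_AFFECTED = (1, 2)
FIRST_FIXED = (4, 6, 2)


def parse_version(text):
    """Turn '4.6.1' into (4, 6, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            # A suffix such as 'b1' ends the numeric portion.
            break
    return tuple(parts)


def is_affected(version_text):
    """True if the version lies in the affected range from the advisory."""
    version = parse_version(version_text)
    return FIRST_AFFECTED <= version < FIRST_FIXED


def installed_lxml_is_affected():
    """Return True/False for the installed lxml, or None if it is not installed."""
    try:
        return is_affected(metadata.version("lxml"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed lxml affected:", installed_lxml_is_affected())
```

For real dependency auditing, a dedicated version-parsing library or a software composition analysis tool is likely a better fit than this hand-rolled comparison.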
Credit
This issue was discovered and reported by Checkmarx SCA Security Researcher Yaniv Nizry.