26

I have read on different sites about different ways to bypass XSS sanitizers, like double-encoding special characters such as /<>:"', or employing different encoding schemes.

However, most of these sites do not explain why these attacks may work and in which cases. For instance, I know modern browsers URL-encode special characters, while cURL doesn't; thus you can't create a PoC using cURL (or Burp Proxy).

Can someone give a very detailed explanation of how browsers encode and handle input data (both POST and GET), and how a typical (PHP) web application handles it? From my little experience with PHP (while I was at university), I was just accessing parameters like $_POST["query"] without even thinking about encoding, security practices, etc.

  • 2
    cURL won't automatically URL-encode, but nothing stops you from doing it manually. When you access the parameters on the server side, they are already decoded in most cases. Commented May 8, 2018 at 17:54
  • The difference between browsers and cURL is a separate issue from how a double encoding attack works.
    – Anders
    Commented May 8, 2018 at 18:54
  • alf.nu/alert1 Play the game, be amazed. If you can get at least to level 10, you'll probably realize how terrifyingly hard XSS protection is :) Many of the tasks require you to exploit double encoding, and it's pretty scary.
    – Luaan
    Commented May 9, 2018 at 7:57

1 Answer

35

Let's look at this example payload (A), encoded once (B) and twice (C):

A. <script> alert(1) </script>
B. %3Cscript%3E alert(1) %3C%2Fscript%3E
C. %253Cscript%253E alert(1) %253C%252Fscript%253E
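These transformations can be reproduced with PHP's urlencode()/urldecode(). (A minor difference: urlencode() also escapes the parentheses and would turn spaces into +, so the spaces are omitted here; the principle is identical.)

```php
<?php
$a = '<script>alert(1)</script>';   // payload A
$b = urlencode($a);                 // encoded once (B)
$c = urlencode($b);                 // encoded twice (C): each % becomes %25

echo $b, "\n";  // %3Cscript%3Ealert%281%29%3C%2Fscript%3E
echo $c, "\n";  // %253Cscript%253Ealert%25281%2529%253C%252Fscript%253E

// Two rounds of decoding recover the original payload.
var_dump(urldecode(urldecode($c)) === $a);  // bool(true)
```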

Double encoding can be used to bypass XSS filters when different parts of the application make different assumptions about whether a variable is encoded or not. For instance, consider the following vulnerable code:

$input = htmlentities($_GET["query"]); // filter runs after PHP's automatic URL decode
echo urldecode($input);                // a second URL decode after the filter

This code would block the payload if it were just single-encoded (as in B). PHP URL-decodes your GET variables for you by default (turning B into A), so < and > would be passed to htmlentities, which neutralizes them. However, if you instead send C, it would be URL-decoded to B, which passes through htmlentities unchanged. Since it is URL-decoded again before it is echoed, it turns into the dangerous payload A.
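The bypass can be traced step by step. In this sketch, the first urldecode() stands in for the automatic decoding PHP applies to $_GET, so the script is self-contained:

```php
<?php
$c = '%253Cscript%253E';           // "<script>" encoded twice (as in C)

$get      = urldecode($c);         // PHP's automatic decode: %3Cscript%3E (B)
$filtered = htmlentities($get);    // no < or > present yet, passes unchanged
$output   = urldecode($filtered);  // second decode: <script> (A) -- XSS!

var_dump($output === '<script>'); // bool(true)
```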

So the bug here is that there is another layer of URL decoding after the XSS filter. When the two lines are next to each other like this, the problem is quite obvious. But these two operations can live in separate modules, making the bug hard to detect. Since it's hard to keep track of which strings are URL-encoded, it is tempting to just throw in an extra decoding to be sure - after all, it usually doesn't affect unencoded data.

The PHP manual actually warns about this:

Warning: The superglobals $_GET and $_REQUEST are already decoded. Using urldecode() on an element in $_GET or $_REQUEST could have unexpected and dangerous results.

In my opinion the manual is not cautious enough here - decoding any untrusted data after filtering for XSS is dangerous, no matter where it comes from. Be extremely careful with modifying your data after you have filtered it!
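A minimal sketch of the safer pattern: trust PHP's automatic decoding of $_GET, escape once as the very last step before output, and apply no transformations afterwards. (htmlspecialchars is used here instead of htmlentities; either works for neutralizing < and >.)

```php
<?php
// $_GET["query"] is already URL-decoded by PHP; do not decode it again.
$query = $_GET["query"] ?? '';

// Escape at output time, as the final step. ENT_QUOTES also escapes
// single quotes, covering single-quoted attribute contexts.
echo htmlspecialchars($query, ENT_QUOTES, 'UTF-8');
```

With this ordering, a double-encoded payload like C simply ends up displayed as the literal text %3Cscript%3E... instead of being executed.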

For more reading, see OWASP.

  • Clearly simply decode all data in a loop until it reaches a steady state, then sanitize, then repeat until that reaches a steady state. ;)
    – Yakk
    Commented May 8, 2018 at 20:17
  • 18
    @Yakk TODO: find an encoding where this process does not terminate :-)
    – Bergi
    Commented May 8, 2018 at 20:39
  • or just don't include % in your character whitelist Commented May 9, 2018 at 3:28
  • 2
    @Bergi I feel like this might be a good challenge for CodeGolf.SE Commented May 9, 2018 at 9:40
  • Since URL decoding will either leave the string unchanged or decrease its length, there is no such payload. But I like @Bergi's instincts. :-)
    – Anders
    Commented May 9, 2018 at 9:44
