HTMLPurifier Truncating Output

I ran into an interesting problem today: HTMLPurifier (http://htmlpurifier.org/) was truncating all of my output except the part between the PHP tags. I had not seen this behavior before.

Here was my view:

<div class='groom_log_content'>
  <fieldset class='border-fields'>
    <legend class='bold'><?php echo $groom['content_name']; ?></legend>
    <p class='editable_textarea fix_space' id='<?php echo $groom['content_id']; ?>'><?php echo $this->cleaner->purify($content); ?></p>
  </fieldset>
</div>

The content passed into this view was user-submitted text from a support chat that looked something like this:

- The customer posts their MySQL connection string in chat with obvious errors and the agent tells them it looks correct.
5:17:52am Name: <html>
<head>
<title>Connecting to MySQL with PHP</title>
</head>
<body>
<?php
$db_host = 'localhost';
$db_user = 'user';
$db_pass = 'pass';
$conn = mysql_connect('host', 'user', 'pass', 'db');
if(! $conn )
{
die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully';
mysql_close($conn);
?>
</body>
</html>
5:18:18am Name: is that the correct information to input so i can locate the database
5:19:13am Name: That looks to be correct.
Suggestion: Look at this line, it is the important one: $conn = mysql_connect('host', 'user', 'pass', 'db');
Though you cannot diagnose the code, you could immediately correct two issues.
You must look for a valid cPanel database, database username, and address.
Here, neither the database name nor the database username would be simply 'user', and
if this is connecting to a database locally you would use localhost instead of the IP address.

When this input was passed to the purifier, the returned content was:

&lt;?php
$db_host = 'localhost';
$db_user = 'user';
$db_pass = 'pass';
$conn = mysql_connect('host', 'user', 'pass', 'db');
if(! $conn )
{
die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully';
mysql_close($conn);
?&gt;

This was peculiar because I had never seen HTMLPurifier strip the content before and after a block of PHP code, so I started my investigation there. After experimenting with htmlspecialchars() and stripping out the PHP code altogether, I determined that the problem was the HTML document tags at the beginning and end of the pasted code.

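HTMLPurifier is designed to clean HTML fragments, not whole documents, so it does not support the <html>, <head>, and <body> tags. When the input contains a <body> element, it treats the input as a full document and, by default (the Core.ConvertDocumentToFragment setting), keeps only what is inside the body. That is why everything before <body> and after </body> was dropped, the rest of the chat transcript included.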
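
An alternative that may be worth trying is to turn off the document-to-fragment conversion in the purifier's configuration instead of editing the input. The sketch below assumes the stock HTMLPurifier classes are loaded directly; the include path and the sample $content are placeholders, and the $this->cleaner wrapper used in the view may expose configuration differently.

```php
<?php
// Assumption: the HTMLPurifier library is installed and this path resolves.
require_once 'HTMLPurifier.auto.php';

// Placeholder input: chat text wrapped around a pasted HTML document.
$content = "before <html><body>inside</body></html> after";

$config = HTMLPurifier_Config::createDefault();
// Ask the purifier not to reduce a full document to the contents of <body>.
$config->set('Core.ConvertDocumentToFragment', false);

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($content);
```

With the conversion disabled, the unsupported document tags should simply be removed like any other disallowed element, but this is worth confirming against the installed version before relying on it.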

To resolve this, I stripped those tags out before passing the content to the purifier:

// Remove document-level tags (opening and closing, any letter case)
// so the purifier treats the input as a plain fragment.
$document_tags = array(
    '<html>', '</html>', '<head>', '</head>',
    '<title>', '</title>', '<body>', '</body>',
);
$content = str_ireplace($document_tags, '', $content);
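
The literal replacements above only match tags written exactly that way, so a tag carrying attributes, such as <body class="x">, would slip through. A pattern-based variant is sketched below; strip_document_tags() is a hypothetical helper name, not part of HTMLPurifier.

```php
<?php
// Hypothetical helper: removes opening and closing <html>, <head>, <title>,
// and <body> tags, with or without attributes, in any letter case.
// The text between the tags is left in place.
function strip_document_tags($content)
{
    // \b keeps the pattern from matching longer names such as <header>.
    return preg_replace('#</?(?:html|head|title|body)\b[^>]*>#i', '', $content);
}
```

It would be called just before purifying, for example $content = strip_document_tags($content);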

 

 
