Updated: 11/14/2007


This class can be used to remove unwanted tags and data from an HTML document. It takes a string containing the HTML document to clean and parses it assuming a given character set encoding.

The class can perform several types of clean-up operations, such as:

- Removing style definitions
- Removing tags or attributes based on white lists or blacklists
- Using the HTML Tidy extension to clean the document, format the output as XHTML, and drop proprietary attributes from Microsoft Word HTML documents
- Dropping empty paragraphs
- Removing needless white space
- Filling empty table cells
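
To make the white list idea concrete, here is a minimal sketch in Python of two of the operations listed above: removing style definitions and dropping tags or attributes that are not on a white list. It is not this class's code or API. The names `WhitelistCleaner`, `clean_html`, `ALLOWED_TAGS` and `ALLOWED_ATTRS` are hypothetical and exist only for this illustration, and the white lists themselves are arbitrary examples.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical white lists, chosen only for this illustration.
ALLOWED_TAGS = {"p", "b", "i", "a", "table", "tr", "td"}
ALLOWED_ATTRS = {"a": {"href"}}  # attributes kept per tag; all others are dropped


class WhitelistCleaner(HTMLParser):
    """Re-emits only white-listed tags and attributes, and skips <style> blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a <style> element

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still kept
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in allowed
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag == "style":
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside <style> is discarded; all other text is re-escaped.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html):
    """Return html with non-white-listed tags, attributes and styles removed."""
    parser = WhitelistCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = '<p style="color:red">Hi <b>there</b></p><style>p{color:red}</style><span>ok</span>'
    print(clean_html(dirty))  # prints: <p>Hi <b>there</b></p>ok
```

A white list is used here rather than a blacklist because anything not explicitly allowed is removed by default, which tends to be the safer choice when the input may contain unknown or proprietary markup. A blacklist variant would invert the membership tests.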