Metadata-Version: 1.1
Name: collective.soupstrainer
Version: 2.2
Summary: Clean up HTML using BeautifulSoup and filter rules.
Home-page: https://github.com/collective/collective.soupstrainer
Author: Florian Schulze
Author-email: florian.schulze@gmx.net
License: GPLv2+
Description: collective.soupstrainer
        =======================
        
        
        HTML from some source, be it user input or data gathered by scraping,
        often needs to be cleaned up. The SoupStrainer class in
        collective.soupstrainer makes this easy. It uses beautifulsoup4 to parse
        and clean up HTML. The constructor of the class takes four arguments.
        
        exclusions
            This is a list of tuples with two items each. The first item is a list of
            tag names, the second item is a list of attributes. If the list of
            attributes is empty, then each tag in the first list is completely
            removed from the passed-in HTML. If the list of tags is empty, then each
            listed attribute is removed from every tag. If both tags and attributes
            are listed, then the attributes are removed only from matching tags.
        
        style_whitelist
            This is a white list of CSS styles allowed in 'style' attributes. All
            other styles are removed.
        
        class_blacklist
            This is a black list for CSS classes. Each matching class is removed from
            'class' attributes.
        
        parser
            This is the parser used by beautifulsoup4 when the strainer is called with
            a string. It must be an installed parser for beautifulsoup4 and defaults
            to ``html.parser``.
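        
        The three rule shapes accepted by ``exclusions`` can be hard to picture from
        prose alone. The following is a minimal sketch that only restates them; the
        helper ``describe_rule`` is hypothetical and not part of
        collective.soupstrainer, and treating a rule with two empty lists as a no-op
        is an assumption, since the description above does not cover that case.
        
        ```python
        # Hypothetical helper, NOT part of collective.soupstrainer. It only
        # restates, in words, what each (tags, attributes) rule means.
        def describe_rule(rule):
            tags, attrs = rule
            if tags and not attrs:
                return "remove tags %s entirely" % ", ".join(tags)
            if attrs and not tags:
                return "remove attributes %s from every tag" % ", ".join(attrs)
            if tags and attrs:
                return "remove attributes %s from tags %s" % (
                    ", ".join(attrs), ", ".join(tags))
            # Assumption: a rule with two empty lists does nothing.
            return "no effect"
        
        
        # A list of two-item tuples, in the shape the constructor expects.
        exclusions = [
            (["script", "style"], []),  # tags only
            ([], ["onclick"]),          # attributes only
            (["a"], ["target"]),        # both
        ]
        
        descriptions = [describe_rule(rule) for rule in exclusions]
        for line in descriptions:
            print(line)
        ```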
        
        An instance of the SoupStrainer class can be called directly with one
        argument. The argument can either be a string, in which case it is parsed
        internally by beautifulsoup4 and the result is a unicode string (``str`` in
        Python 3), or a parsed HTML tree created by beautifulsoup4, in which case
        the tree is modified in place and returned.
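        
        The two calling styles might look roughly like the sketch below. It assumes
        that the class can be imported from the ``collective.soupstrainer`` package
        itself and that the constructor accepts the four arguments as keywords under
        the names listed above; neither is confirmed by this description. The sample
        rules, the class name ``MsoNormal`` and the markup are made up for
        illustration.
        
        ```python
        from bs4 import BeautifulSoup
        # Assumption: SoupStrainer is importable from the package root.
        from collective.soupstrainer import SoupStrainer
        
        # Assumption: the keyword names match the argument names documented above.
        strainer = SoupStrainer(
            exclusions=[(["script"], []), ([], ["onclick"])],
            style_whitelist=["text-align"],
            class_blacklist=["MsoNormal"],
            parser="html.parser",
        )
        
        html = (
            '<p class="MsoNormal" onclick="x()" style="color: red">Hi</p>'
            "<script>x()</script>"
        )
        
        # Called with a string: parsed internally, a string comes back.
        cleaned = strainer(html)
        print(cleaned)
        
        # Called with a parsed tree: the tree is modified in place and returned.
        soup = BeautifulSoup(html, "html.parser")
        result = strainer(soup)
        print(result)
        ```
        
        Passing an already parsed tree is useful when the HTML is processed further
        with beautifulsoup4 afterwards, since it avoids a second parse.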
        
        Changelog
        =========
        
        2.2 (2021-03-25)
        ----------------
        
        - Do not stop after the first replace of a tag which is to be excluded.
          (`#8 <https://github.com/collective/collective.soupstrainer/issues/8>`_)
        
        - Add support for Python 3.8 and 3.9.
        
        
        2.1 (2019-02-06)
        ----------------
        
        - Add support for Python 3 and PyPy.
        
        
        2.0 (2017-10-19)
        ----------------
        
        Backwards incompatible changes
        ++++++++++++++++++++++++++++++
        
        * Update to beautifulsoup4.
        
        * Add a parameter ``parser`` to ``SoupStrainer`` which specifies the parser
          used by beautifulsoup4.
        
        
        1.0 (2008-11-14)
        ----------------
        
        * Initial release
        
        
Keywords: html beautifulsoup clean filter rules
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Filters
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Utilities
