Monday, May 02, 2005

The Sgmlop Parser/Tokenizer ::: www.effbot.org

The sgmlop module is a fast replacement for the regular expression-based parsers used in the sgmllib/htmllib and xmllib modules. A single module supports both SGML and XML.

The sgmlop parser is tolerant, and happily accepts XML-like data that is not well-formed. If you need strictness, use another parser. sgmlop is an excellent choice for applications that read human-authored content and want to be fairly tolerant, and also for applications that read machine-generated XML in situations where it's safe to trade standards compliance for speed.

The current release is about 6 times faster than the original re-based implementation provided with Python 1.5, when using an xmllib/sgmllib-style interface. When using sgmlop directly, it can be more than 30 times faster.

And Python's xmlrpclib uses it as a speedup when it's available!
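
The "use it if available" pattern mentioned above can be sketched roughly as follows. This is only an illustration, not the actual xmlrpclib code: the `EventCollector` class and the `parse` helper are hypothetical names made up for this example, and the fallback to expat is an assumption chosen so the sketch runs even when sgmlop is not installed. The sgmlop calls used (`XMLParser()`, `register()`, `feed()`, `close()`) and the target method names (`finish_starttag`, `finish_endtag`, `handle_data`) follow the xmllib/sgmllib-style interface the module is known for.

```python
import xml.parsers.expat

try:
    import sgmlop  # optional C accelerator; may not be installed
    if not hasattr(sgmlop, "XMLParser"):
        raise ImportError
except ImportError:
    sgmlop = None  # fall back to a parser from the standard library


class EventCollector:
    """Hypothetical target object that just records parse events."""

    def __init__(self):
        self.events = []

    def finish_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def finish_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse(text, target):
    """Feed text to sgmlop if present, otherwise to expat (assumed fallback)."""
    if sgmlop is not None:
        parser = sgmlop.XMLParser()
        parser.register(target)  # sgmlop calls the target's handler methods
        parser.feed(text)
        parser.close()
    else:
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = target.finish_starttag
        parser.EndElementHandler = target.finish_endtag
        parser.CharacterDataHandler = target.handle_data
        parser.Parse(text, True)
    return target


collector = parse("<a><b>hi</b></a>", EventCollector())
for event in collector.events:
    print(event)
```

Keep in mind that the two branches are not equivalent on bad input: expat is strict and raises an error on data that is not well-formed, while sgmlop, as described above, is tolerant.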