MobileRead Forums - Calibre

lxml.etree._utf8 crash

This is probably a question for Kovid.

I'm getting a trap in lxml.etree._utf8 with the message "ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters".
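That message means lxml was handed a string containing a character XML 1.0 does not allow, such as a NULL byte or a C0 control character like U+000B. A minimal stdlib sketch for pinpointing the offending character, based on the XML 1.0 Char production (lxml's own check may differ in edge cases). The function names are hypothetical and are not part of calibre or lxml:

```python
def is_xml_char(ch):
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # everything else in the BMP below surrogates
        or 0xE000 <= cp <= 0xFFFD        # BMP above surrogates, minus U+FFFE/U+FFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )


def find_bad_chars(text):
    """List (index, 'U+XXXX') for every character XML 1.0 does not allow."""
    return [(i, "U+%04X" % ord(ch)) for i, ch in enumerate(text) if not is_xml_char(ch)]


print(find_bad_chars("Breaking\x0bNews"))
```

Running it on a feed title or summary shows exactly which code point would trip lxml.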

With recursions = 0 and simultaneous_downloads = 1, ebook-convert crashes with the following traceback:

Code:

Python function terminated unexpectedly
  All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\ebooks\conversion\cli.py", line 325, in main
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 979, in run
  File "site-packages\calibre\customize\conversion.py", line 208, in __call__
  File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 105, in convert
  File "site-packages\calibre\web\feeds\news.py", line 881, in download
  File "site-packages\calibre\web\feeds\news.py", line 1130, in build_index
  File "site-packages\calibre\web\feeds\news.py", line 974, in feed2index
  File "site-packages\calibre\web\feeds\templates.py", line 43, in generate
  File "site-packages\calibre\web\feeds\templates.py", line 177, in _generate
  File "site-packages\lxml\builder.py", line 222, in __call__
  File "site-packages\lxml\builder.py", line 185, in add_text
  File "lxml.etree.pyx", line 916, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:36134)
  File "apihelpers.pxi", line 721, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:17141)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

With recursions set to 1 and simultaneous_downloads left at the default, ebook-convert doesn't crash, but the following traceback does appear, indicating that a subprocess of the main ebook-convert process crashed:

Code:

Parsing feed_1/article_4/index.html as HTML
HTML 5 parsing failed, falling back to older parsers
Traceback (most recent call last):
  File "site-packages\calibre\ebooks\oeb\parse_utils.py", line 259, in parse_html
  File "site-packages\calibre\ebooks\oeb\parse_utils.py", line 86, in html5_parse
  File "site-packages\html5lib\html5parser.py", line 38, in parse
  File "site-packages\html5lib\html5parser.py", line 211, in parse
  File "site-packages\html5lib\html5parser.py", line 111, in _parse
  File "site-packages\html5lib\html5parser.py", line 179, in mainLoop
  File "site-packages\html5lib\html5parser.py", line 447, in processStartTag
  File "site-packages\html5lib\html5parser.py", line 725, in startTagMeta
  File "site-packages\html5lib\treebuilders\_base.py", line 259, in insertElementNormal
  File "site-packages\html5lib\treebuilders\etree_lxml.py", line 219, in _setAttributes
  File "site-packages\html5lib\treebuilders\etree_lxml.py", line 189, in __init__
  File "lxml.etree.pyx", line 2145, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:46818)
  File "apihelpers.pxi", line 563, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:15781)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22211)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

In that case, feed_1/article_4/index.html is sitting in the debug-pipeline directories looking happy as a clam, so I'm not sure what is going on here.
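The second traceback fails in html5lib's startTagMeta while setting an attribute, so the culprit is probably a control character inside a meta tag's attribute value. Most editors don't display such characters, which would explain why the file looks fine. A small stdlib sketch that reports the line and column of each forbidden C0 control character in a saved page. The script and its names are hypothetical, and it assumes the file is UTF-8 (pass another encoding if not):

```python
import sys

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
FORBIDDEN_C0 = {cp for cp in range(0x20) if cp not in (0x9, 0xA, 0xD)}


def scan_file(path, encoding="utf-8"):
    """Yield (line, column, 'U+XXXX') for each forbidden C0 control character."""
    # Iterating a text file splits lines only on \n, \r and \r\n, unlike
    # str.splitlines(), which would also break on \x0b and \x0c and hide them.
    with open(path, encoding=encoding, errors="replace", newline="") as f:
        for lineno, line in enumerate(f, start=1):
            for col, ch in enumerate(line, start=1):
                if ord(ch) in FORBIDDEN_C0:
                    yield lineno, col, "U+%04X" % ord(ch)


if __name__ == "__main__" and len(sys.argv) > 1:
    for hit in scan_file(sys.argv[1]):
        print("line %d, col %d: %s" % hit)
```

Usage: python scan_controls.py feed_1/article_4/index.html (the script name is only an example).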

I've looked at the calibre source at http://bazaar.launchpad.net/~kovid/calibre/trunk/files, and the line numbers in the tracebacks don't seem to line up, so I'm at a loss here.

My question: what is causing this and could calibre be made a little more bulletproof here?
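One way calibre, or an individual recipe, could be more defensive is to strip such characters from feed titles, descriptions and attribute values before they reach lxml. A minimal sketch, assuming that silently dropping the characters is acceptable. xml_safe is a hypothetical helper, not an existing calibre function, and this is not how calibre actually handles the problem:

```python
import re

# Characters outside the XML 1.0 Char production: C0 controls other than
# tab/LF/CR, UTF-16 surrogates, and the non-characters U+FFFE and U+FFFF.
_XML_INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def xml_safe(text, replacement=""):
    """Return text with XML-invalid characters replaced (removed by default)."""
    if text is None:
        return None
    return _XML_INVALID.sub(replacement, text)


print(xml_safe("Breaking\x0bNews"))
```

Because these characters are never legal in XML, the same substitution could also be applied to a whole downloaded page before parsing, not only to individual strings.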
