I am trying to convert a number of Microsoft Word documents (.doc or .docx files) to e-books. I have tried saving the documents as .RTF files, and also as "filtered HTML" files. With either format, converting to EPUB (using Calibre 0.8.45) generates a readable file, but with significant errors.
The errors seem to be related to changes that are applied to the major headings. In the original HTML file, a heading looks like:
<h1><a name="_Toc244191671"></a>New York Bumper Stickers</h1>
However, in the HTML file in the Debug\Parsed folder, the heading has been changed to:
<h1><a name="<i>Toc244191671></a>New York Bumper Stickers</h1>
As a result:
- The paragraphs that follow the <h1> appear in the large font of the <h1>
- The name value from the <a> tag appears as visible text
- The text of the <h1> itself is not visible
In other cases, a very similar heading changes from:
<h1><a name="_Toc244191667"></a><a name="_Toc104266959">You know youre from </a>Jersey when </h1>
to:
<h1 style="margin-top:1em;margin-bottom:1em;"><a name="<i>Toc244191667></a><a name=">Toc104266959>You know youre from </a>Jersey when </h1>
In this case, the text of the anchor is visible, but, because the </h1> is not corrupted, the subsequent text is properly formatted.
Can anybody tell me why some of the HTML tags are being corrupted? Also, why is the text:
<i>
being inserted before the text of the anchor name attribute?
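In both examples the leading underscore of the anchor name is what gets replaced, so one possible workaround is to rename the Word-generated anchors before handing the HTML to Calibre. The following is only a sketch under that assumption: it does not explain why the corruption happens, the function name `rename_toc_anchors` is made up for illustration, and it only handles quoted attribute values of the form shown above. It rewrites both `name="_Toc..."` and `href="#_Toc..."` so that any table-of-contents links keep pointing at the renamed anchors.

```python
import re
import sys

# Matches name="_Toc123" and href="#_Toc123" (quoted attribute values only).
# Assumption: the leading underscore is what triggers the corruption.
_TOC_ATTR = re.compile(r'\b(name|href)="(#?)_Toc(\d+)"', re.IGNORECASE)


def rename_toc_anchors(html: str) -> str:
    """Drop the leading underscore from Word's _Toc anchor names and links."""
    return _TOC_ATTR.sub(r'\1="\2Toc\3"', html)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python rename_toc.py input.htm output.htm
    # Assumption: the filtered HTML is saved as windows-1252; adjust if needed.
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, encoding="cp1252") as f:
        cleaned = rename_toc_anchors(f.read())
    with open(dst, "w", encoding="cp1252") as f:
        f.write(cleaned)
```

If the headings in Debug\Parsed come out intact after this change, that would point to the underscore as the trigger; if they are still corrupted, the cause lies elsewhere.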