Format Your Own Damned Book Part IV -- Converting A Word Processor Document To XHTML

As an example of how to convert a word processor document to an XHTML document usable in an e-book, I created a Word document with a chapter heading, a paragraph, and a block quotation. I then saved it as HTML, Filtered using the Save As... menu option. I used the Filtered option because the other HTML option gives you Word-specific tags that are not legal HTML. A web browser would ignore these, but they are not allowed in an EPUB.

When you look at this file with Notepad you'll see this (portions removed for clarity):


<html>

<head>
<meta http-equiv=Content-Type content="text/html; charset=unicode">
<meta name=Generator content="Microsoft Word 14 (filtered)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin-top:0in;
margin-right:0in;
margin-bottom:10.0pt;
margin-left:0in;
line-height:115%;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
h1
{mso-style-link:"Heading 1 Char";
margin-top:24.0pt;
margin-right:0in;
margin-bottom:0in;
margin-left:0in;
margin-bottom:.0001pt;
line-height:115%;
font-size:14.0pt;
font-family:"Cambria","serif";
font-weight:bold;}

. . . LOTS more of these Styles . . .
	
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
-->
</style>

<meta name=created content="20161019;203949908339038">
<meta name=changed content="20161019;204919743246354">
</head>

<body lang=EN-US>

<div class=WordSection1>

<h1><span style='font-size:18.0pt;line-height:115%;
font-family:"Arial","sans-serif"'>Chapter One</span></h1>

<p class=MsoNormal>AMONG the priceless teachings that may
be found in the great Hindu poem of the Mahabharata, there is none
so rare and precious as this — "The Lord's Song." Since it
fell from the divine lips of Shri Krishna on the field of battle, and
stilled the surging emotions of his disciple and friend, how many
troubled hearts has it quieted and strengthened, how many weary
souls has it led to Him! It is meant to lift the aspirant from the lower
levels of renunciation, where objects are renounced, to the loftier heights
where desires are dead, and where the Yogi dwells in ceaseless
contemplation, while his body and mind are actively employed in
discharging the duties that fall to his lot in life. That the spiritual man
need not be a recluse, that union with the divine Life may be achieved
and maintained in the midst of worldly affairs, that the obstacles to
that union lie not outside us but within us — such is the central
lesson of the BHAGAVAD-GITA. </p>

<blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'>

<p class=MsoQuote>The Blessed Lord said: </p>

</blockquote>

<blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'>

<p class=MsoQuote>Whence hath this dejection befallen thee
in this perilous strait, ignoble, heaven-closing, infamous, O Arjuna? </p>

</blockquote>

<blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'>

<p class=MsoQuote>Yield not to impotence, Partha! it doth not
befit thee, Shake off this paltry faintheartedness! Stand up,
Parantapa!</p>

</blockquote>

</div>

</body>

</html>


You need to clean this up before you can use it in an EPUB. The first step is to get rid of the entire <head> section. It contains styles, and although styles can be used in EPUBs, we're going to replace these with something much simpler, so the original styles need to go.

The other things that we want to remove are the attributes, which are all in the format name=value. In the excerpt below these are lang=EN-US, class=WordSection1, and the style attribute on the <span> tag:

<body lang=EN-US>

<div class=WordSection1>

<h1><span style='font-size:18.0pt;line-height:115%;font-family:"Arial","sans-serif"'>Chapter
One</span></h1>

Once their attributes are gone, the <div> and <span> tags don't do anything, so they may be removed too.

Removing all of these gives us this:


<html>
<body>
<h1>Chapter One</h1>

<p>AMONG the priceless teachings that may be found in the
great Hindu poem of the Mahabharata, there is none so rare and
precious as this — "The Lord's Song." Since it fell from
the divine lips of Shri Krishna on the field of battle, and stilled the
surging emotions of his disciple and friend, how many troubled
hearts has it quieted and strengthened, how many weary souls has
it led to Him! It is meant to lift the aspirant from the lower levels of
renunciation, where objects are renounced, to the loftier heights
where desires are dead, and where the Yogi dwells in ceaseless
contemplation, while his body and mind are actively employed in
discharging the duties that fall to his lot in life. That the spiritual
man need not be a recluse, that union with the divine Life may
be achieved and maintained in the midst of worldly affairs,
that the obstacles to that union lie not outside us but within us
— such is the central lesson of the BHAGAVAD-GITA. </p>

<blockquote>
<p>The Blessed Lord said: </p>
</blockquote>

<blockquote>
<p>Whence hath this dejection befallen thee in this perilous
strait, ignoble, heaven-closing, infamous, O Arjuna? </p>
</blockquote>

<blockquote>
<p>Yield not to impotence, Partha! it doth not befit thee,
Shake off this paltry faintheartedness! Stand up,
Parantapa!</p>
</blockquote>

</body>
</html>
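
The hand cleanup above can also be scripted. What follows is only a minimal sketch in Python, not the method used in this series. The names WordHtmlCleaner and clean_word_html are made up for illustration, and the script uses nothing beyond the html.parser and html modules from Python's standard library. It assumes the same three rules that were applied by hand: throw away the whole <head> section, throw away every name=value attribute, and throw away the <div> and <span> tags while keeping the text inside them.

```python
from html import escape
from html.parser import HTMLParser

# Sections dropped together with everything inside them.
SKIP_WITH_CONTENT = {"head"}
# Wrapper tags that are dropped while their text is kept.
UNWRAP = {"div", "span"}


class WordHtmlCleaner(HTMLParser):
    """Hypothetical helper that applies the cleanup rules from this post."""

    def __init__(self):
        super().__init__()
        self.out = []        # pieces of the cleaned document
        self.skip_depth = 0  # greater than 0 while inside <head>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag not in UNWRAP:
            # attrs is ignored on purpose: every attribute is discarded
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_WITH_CONTENT:
            self.skip_depth -= 1
        elif self.skip_depth == 0 and tag not in UNWRAP:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> stay self-closing.
        if self.skip_depth == 0 and tag not in UNWRAP:
            self.out.append(f"<{tag}/>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # The parser decodes entities, so encode &, < and > again.
            self.out.append(escape(data, quote=False))


def clean_word_html(source):
    """Return the filtered-HTML text with Word's clutter removed."""
    cleaner = WordHtmlCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)


# A cut-down stand-in for the Word output shown earlier.
sample = """<html>
<head>
<meta name=Generator content="Microsoft Word 14 (filtered)">
<style>
<!--
h1 {font-size:14.0pt;}
-->
</style>
</head>
<body lang=EN-US>
<div class=WordSection1>
<h1><span style='font-size:18.0pt'>Chapter One</span></h1>
<p class=MsoNormal>A paragraph.</p>
</div>
</body>
</html>"""

print(clean_word_html(sample))
```

The script leaves blank lines where the removed tags used to be and does not rewrap long lines, so the result will not match the hand-edited version character for character. It also drops HTML comments, and it only knows about the cases that appear in this post, so check the output by eye before going further.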


This document contains only the most basic HTML tags and no style information, so it is now ready to be imported into an EPUB using Sigil. We will demonstrate this in the next installment.
Published on October 20, 2016 08:54


Bhakta Jim's Bhagavatam Class

Bhakta Jim