Tuesday, January 17, 2006

 

Some notes on Tex and HTML

1.
Latex2Html
http://www.latex2html.org/

Its stated aim is ``Bringing high quality documents to the Web''. However, it is not easy to change the style of the generated pages.

2.
Embedding equations in HTML

From http://fauskes.net/nb/htmleqI/ and http://fauskes.net/nb/htmleqII/

As a PhD student I need tools for typesetting mathematical equations. However, when writing for the web, I seldom use equations. This is mostly because I don't need them, but also because the lack of support for mathematics in HTML makes writing equations a nuisance.
I'm currently working on a simple CMS for my web site. It is based on XHTML and PHP on the server, with Python used as a transformation tool and to glue the various components together. Python gives incredible flexibility, so why not include a way to easily embed and display mathematical equations in my web pages?
In a two-part article I will show you how to embed mathematics in your XHTML documents and generate beautiful equations. In the first part I will discuss various aspects concerning browsers and markup. In part two I will describe a practical solution to the problem, using Python, LaTeX and a handy little program called dvipng.
Investigating our options
For rendering mathematics in a browser there are three main options:
HTML and CSS
MathML
Images
The first option is a good choice for simple equations. HTML has the <sub> and <sup> tags for subscripts and superscripts. However, writing fractions, Greek letters and other mathematical symbols is difficult.
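To show how far plain <sub> and <sup> tags get you, here is a minimal sketch of a converter. The name simple_tex_to_html is hypothetical and not part of the article. It handles only ^ or _ followed by a single character or a {braced group}. Anything else, such as \pi or \frac, passes through untouched, which is exactly the limitation described above.

```python
import html
import re

# Hypothetical helper: ^x or ^{...} becomes <sup>, _x or _{...} becomes <sub>.
_SCRIPT = re.compile(r'([\^_])(?:\{([^{}]*)\}|(\w))')

def simple_tex_to_html(expr):
    """Translate simple TeX-style super/subscripts into HTML <sup>/<sub> markup."""
    expr = html.escape(expr, quote=False)  # escape &, <, > before adding tags

    def repl(match):
        tag = 'sup' if match.group(1) == '^' else 'sub'
        body = match.group(2) if match.group(2) is not None else match.group(3)
        return '<%s>%s</%s>' % (tag, body, tag)

    return _SCRIPT.sub(repl, expr)

print(simple_tex_to_html('y = x^2 + a_1'))      # y = x<sup>2</sup> + a<sub>1</sub>
print(simple_tex_to_html(r'e^{i\pi} + 1 = 0'))  # \pi is left untouched
```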
MathML
According to the W3C, MathML is:
MathML is intended to facilitate the use and re-use of mathematical and scientific content on the Web, and for other applications such as computer algebra systems, print typesetting, and voice synthesis. MathML can be used to encode both the presentation of mathematical notation for high-quality visual display, and mathematical content, for applications where the semantics plays more of a key role such as scientific software or voice synthesis.
This sounds great. However, MathML is a low-level specification based on XML, and writing equations in MathML by hand is a lot of work. Of more concern is the fact that, at the time of writing, most browsers cannot render MathML without downloading additional plug-ins.
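To get a feel for the verbosity, the sketch below uses Python's ElementTree to build presentation MathML for y^2=x^2+\alpha^2. The helper names are illustrative only. The element names (math, mrow, msup, mi, mn, mo) are standard presentation MathML.

```python
import xml.etree.ElementTree as et

MATHML_NS = 'http://www.w3.org/1998/Math/MathML'

def msup(base, exponent):
    """Return an <msup> element: identifier `base` raised to number `exponent`."""
    sup = et.Element('msup')
    et.SubElement(sup, 'mi').text = base
    et.SubElement(sup, 'mn').text = exponent
    return sup

def build_example():
    """Build MathML for y^2 = x^2 + alpha^2."""
    math = et.Element('math', xmlns=MATHML_NS)
    row = et.SubElement(math, 'mrow')
    row.append(msup('y', '2'))
    et.SubElement(row, 'mo').text = '='
    row.append(msup('x', '2'))
    et.SubElement(row, 'mo').text = '+'
    row.append(msup('\u03b1', '2'))  # Greek small letter alpha
    return math

markup = et.tostring(build_example(), encoding='unicode')
print(markup)
```

The printed markup is many times longer than the sixteen characters of LaTeX it replaces.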
Images
The safest way of displaying complex mathematics on the web is by using images. You first generate the equations in some program and then save them as images for inclusion in your HTML document. It is not an optimal solution, but at the present time it seems to be the best choice. A good example of this method is the MathWorld site.
Markup
LaTeX has a simple and powerful syntax for writing mathematics. The TeX engine also produces aesthetically pleasing equations, so LaTeX is an obvious choice for writing mathematics. LaTeX also has an XML-friendly syntax, so LaTeX code can easily be included directly in the XHTML document.
To produce a displayed equation, I want to write something as simple as:
<div class="eq">
y = \int_0^\infty \gamma^2 \cos(x) dx
</div>
Or, for an inline equation: bla bla <span class="eq">y^2=x^2+\alpha^2</span> bla bla

This is the second part of my article on how to easily embed mathematics in XHTML documents using LaTeX. In part one I discussed various aspects concerning browsers and markup. In part two I'll get technical and show you how I have implemented a solution in my CMS.
Prerequisites
I have used the following software:
A LaTeX distribution. LaTeX is included in most Linux/Unix distributions. For Windows users I recommend MiKTeX, which has a very useful package manager that makes it easy to add utilities and LaTeX packages.
dvipng for converting DVI files to PNG or GIF. The program is included with the MiKTeX distribution.
Python for gluing it all together. I also use the excellent XML library ElementTree from effbot.org.
If you prefer, you can easily implement the techniques described in this article using your favorite language/tool. It should not be that difficult.
Note that I use XHTML for writing my documents. This allows me to process my documents as XML, which greatly simplifies the processing.
Outline of the process
In part one I decided to embed LaTeX code between ordinary <div> and <span> tags. The equations must then be extracted from the document and replaced by images. The process can be summarized in these steps:
Read source document and extract equations.
Generate equations and save them as images.
Insert the generated images into the document.
Publish the document.
I'll dwell a bit on items one and two, but I'll let the source code do most of the talking.
Extracting the equations
Extracting the equations is quite easy with a proper XML/XHTML library. Python has good support for HTML and XML in the standard library, but I'm a bit lazy and prefer to use a higher-level library like ElementTree. With ElementTree, extracting the equations can be done in a few lines of code:
# ElementTree has shipped with Python as xml.etree since Python 2.5
import xml.etree.ElementTree as et
source_filename = "test.xhtml"
# parse document
xhtmltree = et.parse(source_filename)
# find all elements with attribute class='eq'
eqs = [element for element in xhtmltree.iter()
       if element.get('class', '') == 'eq']
# equations are now available in the eqs[..].text variable
ElementTree also handles the encoding issues for us, and we can easily change the markup.
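The extraction snippet reads test.xhtml from disk. Below is a self-contained variant that parses an inline sample instead, so the result can be inspected directly. One detail worth knowing: in an XHTML file that declares xmlns="http://www.w3.org/1999/xhtml", ElementTree reports tags as '{http://www.w3.org/1999/xhtml}div' rather than 'div'. The hypothetical local_name helper strips that prefix. The class attribute itself is not namespaced, so the class='eq' test still works.

```python
import xml.etree.ElementTree as et

SAMPLE = r'''<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div class="eq">y = \int_0^\infty \gamma^2 \cos(x) dx</div>
<p>An inline equation <span class="eq">y^2=x^2+\alpha^2</span> here.</p>
</body>
</html>'''

def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit('}', 1)[-1]

def extract_equations(root):
    """Return (tag, latex) pairs, in document order, for elements with class="eq"."""
    return [(local_name(el.tag), el.text)
            for el in root.iter()
            if el.get('class', '') == 'eq']

eqs = extract_equations(et.fromstring(SAMPLE))
for tag, latex in eqs:
    print(tag, latex)
```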
Generating the equations
We now have a list of elements containing LaTeX code. To render the equations, we need to create a LaTeX document and compile it. The next problem is to save the equations as images. This is where dvipng comes to the rescue:
This program makes PNG and/or GIF graphics from DVI files as obtained from TeX and its relatives. It produces high-quality images while its internals are tuned for speed. It supports PK, VF, PostScript, and TrueType fonts, color and PostScript inclusion.
Dvipng is a command-line utility with many options for tuning the output; see its documentation for a full list of features. It basically saves each page of a DVI document as an image, which means that we should put each equation on a page of its own.
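The one-equation-per-page idea can be isolated in a small function. This is a sketch only. The function name and the ('inline' or 'display', latex) input format are assumptions made for illustration. It wraps inline equations in $...$ and displayed ones in \[...\], and ends each one with \newpage, so that page N of the DVI file holds equation N.

```python
PREAMBLE = r'''\documentclass{article}
\usepackage{amsmath}
\pagestyle{empty}
\begin{document}
'''

def build_tex_document(equations):
    """Return LaTeX source with one equation per page.

    `equations` is a list of (kind, latex) pairs; kind is 'inline' or 'display'.
    """
    parts = [PREAMBLE]
    for kind, latex in equations:
        if kind == 'inline':
            parts.append('$%s$\n' % latex)
        else:
            parts.append('\\[\n%s\n\\]\n' % latex)
        parts.append('\\newpage\n')  # force the next equation onto a new page
    parts.append('\\end{document}\n')
    return ''.join(parts)

tex = build_tex_document([('display', r'y = \int_0^\infty \gamma^2 \cos(x) dx'),
                          ('inline', r'y^2=x^2+\alpha^2')])
print(tex)
```

\pagestyle{empty} matters here. Without it, every image would also contain a page number.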
The final code
Below is the listing of the final program. It's also available for download as eqhtml.py.
r"""A simple tool for embedding LaTeX in XHTML documents.

This script lets you embed LaTeX code between <div> and <span> tags. Example:

<div class="eq">
y = \int_0^\infty \gamma^2 \cos(x) dx
</div>

<p>An inline equation <span class="eq">y^2=x^2+\alpha^2</span> here.</p>

The script extracts the equations, creates a temporary LaTeX document,
compiles it, saves the equations as images and replaces the original markup
with images.

Usage:
    python eqhtml.py source dest

Process source and save result in dest. Note that no error checking is
performed.
"""
import os
import sys
import xml.etree.ElementTree as et  # ElementTree ships with Python as xml.etree

# Include your favourite LaTeX packages and commands here
tex_preamble = r'''
\documentclass{article}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{bm}
\newcommand{\mx}[1]{\mathbf{\bm{#1}}} % Matrix command
\newcommand{\vc}[1]{\mathbf{\bm{#1}}} % Vector command
\newcommand{\T}{\text{T}} % Transpose
\pagestyle{empty}
\begin{document}
'''

imgpath = ''  # path to generated equations, e.g. 'img/'

# get source and dest filenames from command line
sourcefn = sys.argv[1]
destfn = sys.argv[2]
sourcefn_base = os.path.splitext(os.path.basename(sourcefn))[0]

# change working directory to the same as source's
cwd = os.getcwd()
os.chdir(os.path.abspath(os.path.dirname(sourcefn)))
sourcefn = os.path.basename(sourcefn)
texfn = sourcefn_base + '.tex'
print("Processing %s" % sourcefn)

# load and parse source document
xhtmltree = et.parse(sourcefn)

# find all elements with attribute class='eq'
eqs = [element for element in xhtmltree.iter()
       if element.get('class', '') == 'eq']

# create a LaTeX document and insert equations, one per page
with open(texfn, 'w') as f:
    f.write(tex_preamble)
    counter = 1
    for eq in eqs:
        # strip any '{namespace}' prefix so namespaced XHTML also works
        if eq.tag.rsplit('}', 1)[-1] == 'span':  # inline equation
            f.write("$%s$ \n \\newpage \n" % eq.text)
        else:
            f.write("\\[\n%s \n\\] \n \\newpage \n" % eq.text)
        # remove the LaTeX code from the document tree and replace
        # it with an image
        eq.text = None
        imgname = "%seq%s%i.png" % (imgpath, sourcefn_base, counter)
        et.SubElement(eq, 'img', src=imgname, alt='')
        counter += 1
    # end LaTeX document
    f.write('\\end{document}\n')

# compile LaTeX document. A DVI file is created
os.system('latex %s' % texfn)
# Run dvipng on the generated DVI file. Use tight bounding box.
# Magnification is set to 1200
cmd = ("dvipng -T tight -x 1200 -z 9 -bg transparent "
       "-o %seq%s%%d.png %s" % (imgpath, sourcefn_base, sourcefn_base))
os.system(cmd)

# Remove temporary files
for ext in ('.tex', '.log', '.aux', '.dvi'):
    os.remove(sourcefn_base + ext)
os.chdir(cwd)

# Write processed source document to dest
xhtmltree.write(destfn)
print("Done.")
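The program runs latex and dvipng through os.system and silently ignores failures. Below is a sketch of the same two steps using subprocess with argument lists. The function names are hypothetical. The dvipng options are the ones the script already uses (-T tight, -x 1200, -z 9, -bg transparent, -o with %d for the page number). -interaction=nonstopmode keeps latex from stopping at an interactive prompt when an equation has an error.

```python
import subprocess

def dvipng_command(base, imgpath='', magnification=1200):
    """Return the dvipng argument list that turns base.dvi into one PNG per page.

    dvipng replaces %d in the -o pattern with the page number, so equation N
    ends up in <imgpath>eq<base>N.png, matching the img src names in the script.
    """
    return ['dvipng', '-T', 'tight', '-x', str(magnification), '-z', '9',
            '-bg', 'transparent', '-o', '%seq%s%%d.png' % (imgpath, base), base]

def render(base, imgpath=''):
    """Compile base.tex with latex, then convert the DVI pages to PNG images.

    check=True raises subprocess.CalledProcessError if either tool fails.
    Requires latex and dvipng on the PATH.
    """
    subprocess.run(['latex', '-interaction=nonstopmode', base + '.tex'], check=True)
    subprocess.run(dvipng_command(base, imgpath), check=True)
```

Passing a list instead of a single command string also avoids shell quoting problems if a file name contains spaces.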
Update: A. M. Kuchling has written a Movable Type plugin, called mt-math, for writing equations in weblog entries. The work is derived from my code. He has added some interesting features, such as embedding the images directly in the HTML using the data: URL scheme.
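The data: URL idea fits in a few lines of Python. This is not mt-math's code, only a sketch, and the function names are hypothetical. It base64-encodes the PNG bytes and prefixes the media type. The result can then go into the img element's src attribute instead of a file name.

```python
import base64

def png_data_uri(png_bytes):
    """Return a data: URI that embeds the given PNG bytes as base64."""
    return 'data:image/png;base64,' + base64.b64encode(png_bytes).decode('ascii')

def inline_image(path):
    """Read a PNG file from disk and return it as a data: URI."""
    with open(path, 'rb') as f:
        return png_data_uri(f.read())
```

In the eqhtml.py loop, this would mean calling inline_image on each generated file after dvipng has run, rather than using imgname directly. Very large equations produce long attributes, so this suits small inline images best.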
Display issues
A disadvantage of using bitmaps is that they don't scale with your document's text size. If you find the generated equations too large or too small, you can tweak the images with dvipng's -x magnification setting. The inline equations may also look a bit out of place. This can be fixed by adjusting the vertical-align property of the image with CSS. This is how I style my equations:
/* Center block equations */
div.eq {text-align:center;}
/* Align inline equations with the parent's content area */
span.eq img {vertical-align:text-bottom;}
Concluding remarks
That's it. I can now have fancy equations in my web pages. The technique can also be extended to embed arbitrary LaTeX code, which makes it possible to include EPS graphics and other (La)TeX goodies.

3.
MimeTeX
http://www.forkosh.com/mimetex.html

MimeTeX, licensed under the GPL, lets you easily embed LaTeX math in your HTML pages.
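MimeTeX is typically installed as a CGI program, and the LaTeX expression goes in the query string of an <img> URL. Below is a sketch that builds such a tag. The /cgi-bin/mimetex.cgi path is an assumption, so point it at wherever the CGI lives on your server. The percent-encoding also assumes your MimeTeX build decodes URL escapes; check the MimeTeX documentation for how it expects the expression to be escaped.

```python
import html
from urllib.parse import quote

def mimetex_img(latex, cgi_url='/cgi-bin/mimetex.cgi'):
    """Return an <img> tag asking a MimeTeX CGI (assumed location) to render `latex`."""
    src = '%s?%s' % (cgi_url, quote(latex))  # percent-encode ^, =, +, \ and spaces
    return '<img src="%s" alt="%s" />' % (src, html.escape(latex))

print(mimetex_img(r'y^2=x^2+\alpha^2'))
```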

4.
TTH: the TEX to HTML translator
http://hutchinson.belmont.ma.us/tth/


