Web Site Content Management - Publish to PDF

It is quite complex to go directly from an HTML website to PDF, so I am proposing to do this in stages: website HTML to XHTML, then XHTML to XSL-FO, then XSL-FO to PDF.

I also considered loading the website into Word or OpenOffice and then converting to XSL-FO, although this would require more manual intervention. The native OpenOffice format is an XML file, which could be converted to XSL-FO using an XSLT script.

Website HTML to XHTML

This feature requires Kontent to read each HTML file being managed (every HTML file under the current directory) and to convert its content as described here:

http://www22.brinkster.com/beeandnee/techzone/articles/htmltoxhtml.asp

It then writes the file to disc as XHTML.
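The linked article is not reproduced here, so the following is only a minimal sketch of the kind of rewriting involved. It assumes the commonly cited HTML to XHTML rules: lower-case element and attribute names, quoted attribute values, and self-closed empty elements. The names `XhtmlWriter`, `to_xhtml_fragment` and `VOID_ELEMENTS` are hypothetical, and Python's standard `html.parser` stands in for the program's own SAX-based reader.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a small sample of elements that HTML leaves unclosed.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class XhtmlWriter(HTMLParser):
    """Re-emit parsed HTML using XHTML tag syntax (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _open_tag(self, tag, attrs, closer):
        # HTMLParser has already lower-cased tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            # A bare attribute such as "checked" becomes checked="checked".
            value = name if value is None else value
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, " />" if tag in VOID_ELEMENTS else ">")

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        # Void elements were already self-closed when they were opened.
        if tag not in VOID_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.out.append(escape(data, quote=False))


def to_xhtml_fragment(html_text):
    """Return html_text with XHTML-style tags; comments and doctype are dropped."""
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    return "".join(writer.out)
```

For example, `to_xhtml_fragment('<P ALIGN=center>Hi<BR>there</P>')` gives `<p align="center">Hi<br />there</p>`. The sketch does not add the XHTML doctype or namespace, and it does not repair missing closing tags.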

The program already has code which reads HTML (using SAX) to get the title for the index generator. So this would require a new tab, similar to the others, that works as described here.

If it is too hard to handle CSS that could be left for a future stage.

If a <p> is found, the program needs to check for a closing </p>; if none is found, one needs to be inserted before the next formatting tag.
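The paragraph rule above can be sketched as follows. This is only an illustration under assumptions: the set of tags that end an open paragraph (`BLOCK_TAGS`) is my guess at what "the next format" should cover, the names `ParagraphCloser` and `close_paragraphs` are hypothetical, and Python's `html.parser` is used in place of the program's SAX reader.

```python
from html.parser import HTMLParser

# Assumption: the start or end of any of these tags ends an open paragraph.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li",
              "table", "tr", "td", "th", "div", "pre", "blockquote", "hr",
              "body", "html"}


class ParagraphCloser(HTMLParser):
    """Copy HTML through unchanged, adding any missing </p> (hypothetical)."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_paragraph = False

    def close_open_paragraph(self):
        if self.in_paragraph:
            self.out.append("</p>")
            self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.close_open_paragraph()
        self.out.append(self.get_starttag_text())
        if tag == "p":
            self.in_paragraph = True

    def handle_startendtag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.close_open_paragraph()
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.close_open_paragraph()
        # An explicit </p> was emitted above; a stray one is dropped.
        if tag != "p":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def close_paragraphs(html_text):
    parser = ParagraphCloser()
    parser.feed(html_text)
    parser.close()
    parser.close_open_paragraph()  # a paragraph still open at end of input
    return "".join(parser.out)
```

With this sketch, `close_paragraphs("<p>one<p>two<h2>T</h2>")` gives `<p>one</p><p>two</p><h2>T</h2>`. Comments and the doctype are not copied through, so a real implementation would need handlers for those as well.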

The new files should be put in a new directory tree with the same structure as the original files. All html (and htm) files are converted to xhtml, other files such as .gif and .jpeg are copied into the new directory tree unchanged.

node.convertToXhtml
nodeDir.convertToXhtml
nodeHTML.convertToXhtml
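The three method names above suggest that each node type handles its own conversion. As a rough illustration of the same directory walk, here is a sketch using a single plain function instead of node classes. The function name `convert_tree`, the `.xhtml` extension, UTF-8 encoding, and the `convert_html` callback are all assumptions, and the destination is assumed to lie outside the source tree.

```python
import shutil
from pathlib import Path

HTML_SUFFIXES = {".html", ".htm"}


def convert_tree(src_root, dst_root, convert_html):
    """Mirror src_root under dst_root, converting HTML files on the way.

    convert_html is any function taking HTML text and returning XHTML text.
    dst_root must not be inside src_root, or the walk would revisit it.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    for src in sorted(src_root.rglob("*")):
        dst = dst_root / src.relative_to(src_root)
        if src.is_dir():
            # Recreate the directory so the new tree has the same structure.
            dst.mkdir(parents=True, exist_ok=True)
        elif src.suffix.lower() in HTML_SUFFIXES:
            dst.parent.mkdir(parents=True, exist_ok=True)
            text = src.read_text(encoding="utf-8")
            dst.with_suffix(".xhtml").write_text(convert_html(text),
                                                 encoding="utf-8")
        else:
            # Images and other files are copied unchanged.
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
```

A call might look like `convert_tree("site", "site_xhtml", my_converter)`, where `my_converter` is whatever HTML to XHTML routine is in use.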

XHTML to XSL-FO

Here is an XSLT script to do the conversion: http://www.antennahouse.com/XSLsample/XSLsample.htm

This article explains the issues: http://www-106.ibm.com/developerworks/library/x-xslfo2app/
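To give an idea of what this stage has to produce, here is a small sketch that builds a minimal XSL-FO document from well-formed XHTML. It is not the linked stylesheet and it does not use XSLT (Python's standard library has none); it only maps headings and paragraphs to `fo:block` elements. The function name `xhtml_to_fo`, the A4 page size and the font sizes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Assumption: illustrative font sizes for the few elements handled here.
FONT_SIZES = {"h1": "18pt", "h2": "14pt", "p": "10pt"}


def fo(tag):
    """Return tag qualified with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def xhtml_to_fo(xhtml_text):
    """Turn the headings and paragraphs of an XHTML page into XSL-FO."""
    doc = ET.fromstring(xhtml_text)

    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4",
        "page-height": "297mm",
        "page-width": "210mm",
    })
    ET.SubElement(page, fo("region-body"))

    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"),
                         {"flow-name": "xsl-region-body"})

    for element in doc.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip any namespace
        if local_name in FONT_SIZES:
            block = ET.SubElement(flow, fo("block"),
                                  {"font-size": FONT_SIZES[local_name]})
            # Inline markup such as <b> is flattened to plain text here.
            block.text = "".join(element.itertext())

    return ET.tostring(root, encoding="unicode")
```

Everything a real stylesheet has to deal with (tables, lists, images, links, CSS) is left out, which is where most of the difficulty lies.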

XSL-FO to PDF

See http://xml.apache.org/fop/
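FOP can be run from the command line with an XSL-FO input file and a PDF output file. The sketch below wraps that call; it assumes the `fop` launcher script is on the PATH, and the function names `fop_command` and `render_pdf` are hypothetical.

```python
import subprocess


def fop_command(fo_path, pdf_path, executable="fop"):
    """Build the command line: fop -fo <input.fo> -pdf <output.pdf>."""
    return [executable, "-fo", str(fo_path), "-pdf", str(pdf_path)]


def render_pdf(fo_path, pdf_path, executable="fop"):
    """Run FOP and raise CalledProcessError if it reports a failure."""
    subprocess.run(fop_command(fo_path, pdf_path, executable), check=True)
```

For example, `render_pdf("site.fo", "site.pdf")` would run `fop -fo site.fo -pdf site.pdf`.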

 



 


Copyright (c) 1998-2023 Martin John Baker - All rights reserved - privacy policy.