Command line usage
HtmlCleaner can be called from the command line with the following syntax:
    java -jar htmlcleanerXX.jar src = <url | file> [incharset = <charset>]
        [dest = <file>] [outcharset = <charset>]
        [taginfofile = <file>] [options...]
where options include:
    outputtype = simple | compact | browser-compact | pretty
    advancedxmlescape = true | false
    usecdata = true | false
    specialentities = true | false
    unicodechars = true | false
    omitunknowntags = true | false
    treatunknowntagsascontent = true | false
    omitdeprtags = true | false
    treatdeprtagsascontent = true | false
    omitcomments = true | false
    omitxmldecl = true | false
    omitdoctypedecl = true | false
    omithtmlenvelope = true | false
    useemptyelementtags = true | false
    allowmultiwordattributes = true | false
    allowhtmlinsideattributes = true | false
    ignoreqe = true | false
    namespacesaware = true | false
    hyphenreplacement = <string value>
    prunetags = <string value>
    booleanatts = self | empty | true
    nodebyxpath = <xpath expression>
    t:<sourcetagX>[=<desttag>[,<preserveatts>]]
    t:<sourcetagX>.<destattrY>[=<template>]
Note: so that URLs can be distinguished from files,
a URL must begin with http:// or https://
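As a minimal sketch of the syntax above, the script below assembles two invocations, one with a local file as src and one with a URL. The jar name, file names and URL are placeholders. The script assumes that each name=value pair is passed as a single argument with no spaces around the equals sign. The commands are printed rather than executed; to run one, call java directly with the same arguments.

```shell
#!/bin/sh
# Placeholder names (not taken from the HtmlCleaner distribution): adjust to your setup.
JAR="htmlcleanerXX.jar"

# Print a java invocation for the given name=value parameters.
# Assumption: each parameter is one argument, with no spaces around '='.
htmlcleaner_cmd() {
    echo java -jar "$JAR" "$@"
}

# Local file: the src value does not start with http:// or https://.
FILE_CMD=$(htmlcleaner_cmd src=page.html dest=page.xml outputtype=pretty omitcomments=true)

# URL: the src value starts with https://, so it is treated as a URL.
URL_CMD=$(htmlcleaner_cmd src=https://example.com/ incharset=UTF-8 dest=example.xml)

echo "$FILE_CMD"
echo "$URL_CMD"
```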
The optional parameter taginfofile is the path to an XML file that describes all
tags and their dependencies. It is used in the cleaning process instead of the default tag info set.
See the description file of the default tag info set for reference.
Transformation parameters are prefixed with "t:". The transformations given in
the example would be written on the command line as:
t:cfoutput t:c:block=div,false t:font=span,true t:font.size t:font.face t:font.style=${style};font-family=${face};font-size=${size};
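When these parameters are typed into a POSIX shell, the last one needs quoting: unquoted, ${style} would be expanded as a shell variable and each ; would end the command. The following is a sketch only, assuming a POSIX shell and placeholder jar and file names. It prints the arguments one per line instead of running the cleaner.

```shell
#!/bin/sh
# Placeholder jar and file names; the t: parameters are the ones listed above.
JAR="htmlcleanerXX.jar"

# Single quotes keep ${style}, ${face}, ${size} and the ';' separators literal,
# so the template reaches HtmlCleaner unchanged.
STYLE_RULE='t:font.style=${style};font-family=${face};font-size=${size};'

set -- src=page.html dest=page.xml \
    t:cfoutput t:c:block=div,false t:font=span,true \
    t:font.size t:font.face "$STYLE_RULE"

# Print one argument per line to show what the JVM would receive.
# To run the cleaner instead, use: java -jar "$JAR" "$@"
printf '%s\n' java -jar "$JAR" "$@"
```

Double quotes would not be enough here, because the shell still expands ${...} inside them.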

