Command line usage

HtmlCleaner can be called from the command line with the following syntax:

    java -jar htmlcleanerXX.jar src = <url | file> [incharset = <charset>] 
                                [dest = <file>] [outcharset = <charset>] 
                                [taginfofile = <file>] [options...]
    
where options include:

    outputtype = simple | compact | browser-compact | pretty | 
                 htmlsimple | htmlcompact | htmlpretty
    advancedxmlescape = true | false
    transrescharstoncr = true | false
    usecdata = true | false
    specialentities = true | false
    transspecialentitiestoncr = true | false
    unicodechars = true | false
    omitunknowntags = true | false
    treatunknowntagsascontent = true | false
    omitdeprtags = true | false
    treatdeprtagsascontent = true | false
    omitcomments = true | false
    omitxmldecl = true | false
    omitdoctypedecl = true | false
    useemptyelementtags = true | false
    allowmultiwordattributes = true | false
    allowhtmlinsideattributes = true | false
    ignoreqe = true | false
    namespacesaware = true | false
    hyphenreplacement = <string value>
    prunetags = <string value>
    booleanatts = self | empty | true
    nodebyxpath = <xpath expression>
    omitenvelope = true | false
    t:<sourcetagX>[=<desttag>[,<preserveatts>]]
    t:<sourcetagX>.<destattrY>[=<template>]
    
Note: to distinguish URLs from files, URLs must begin with http:// or https://
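
As a hedged illustration of the note above, the shell sketch below mirrors the stated prefix rule and then prints two sample command lines, one per kind of source. The helper name classify_src, the host www.example.com, and the file names are placeholders, not part of HtmlCleaner. The sketch assumes that each name=value pair is passed as a single argument with no spaces around the equals sign, and that the prefix is matched case-sensitively; this text specifies neither.

```shell
# Sketch of the rule in the note: only an http:// or https:// prefix marks a URL.
# classify_src is a hypothetical helper, not part of HtmlCleaner.
classify_src() {
  case "$1" in
    http://*|https://*) echo url ;;
    *)                  echo file ;;
  esac
}

classify_src http://www.example.com/   # prints "url"
classify_src page.html                 # prints "file"
classify_src www.example.com           # prints "file": no scheme prefix

# Dry run: echo prints each command instead of running it; remove "echo" to execute.
# htmlcleanerXX.jar is the placeholder jar name from the syntax above.
echo java -jar htmlcleanerXX.jar src=http://www.example.com/ dest=example.xml outputtype=pretty
echo java -jar htmlcleanerXX.jar src=page.html incharset=UTF-8 dest=page.xml omitcomments=true
```

Under this rule, a source written without a scheme, such as www.example.com, would be looked up as a local file.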

The optional parameter taginfofile is the path to an XML file that describes all tags and tag dependencies. It is used during cleaning instead of the default tag info set. See the description file of the default tag info set for reference.

Transformation parameters are prefixed with "t:". The transformations given in the example would be written on the command line as:

    t:cfoutput t:c:block=div,false t:font=span,true t:font.size t:font.face 
    t:font.style=${style};font-family=${face};font-size=${size};
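
When the transformation arguments above are typed into a POSIX-style shell, the last one needs quoting, because the shell would otherwise expand ${style}, ${face} and ${size} itself and treat each semicolon as the end of the command. The following is a minimal sketch of one way to do that, assuming a POSIX shell; input.html and output.xml are placeholder file names, and other shells (for example the Windows command prompt) have different quoting rules.

```shell
# Single quotes stop the shell from expanding ${style}, ${face} and ${size}
# and from treating ';' as a command separator.
set -- \
  't:cfoutput' \
  't:c:block=div,false' \
  't:font=span,true' \
  't:font.size' \
  't:font.face' \
  't:font.style=${style};font-family=${face};font-size=${size};'

# Dry run: print the command with the argument values java would receive.
# input.html and output.xml are placeholder names; remove "echo" to execute.
echo java -jar htmlcleanerXX.jar src=input.html dest=output.xml "$@"
```

The echoed line shows the unquoted values, so it is meant for inspection only; pasting it back into a shell would lose the quoting.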