Command line usage
HtmlCleaner can be called from the command line with the following syntax:
java -jar htmlcleanerXX.jar [src = <url | file>] [incharset = <charset>] [dest = <file>] [outcharset = <charset>] [taginfofile = <file>] [options...]
where options include:
outputtype = simple | compact | browser-compact | pretty | htmlsimple | htmlcompact | htmlpretty
advancedxmlescape = true | false
transrescharstoncr = true | false
usecdata = true | false
usecdatafor = ["script,style"]
specialentities = true | false
transspecialentitiestoncr = true | false
unicodechars = true | false
omitunknowntags = true | false
treatunknowntagsascontent = true | false
omitdeprtags = true | false
treatdeprtagsascontent = true | false
omitcomments = true | false
omitxmldecl = true | false
omitdoctypedecl = true | false
useemptyelementtags = true | false
allowmultiwordattributes = true | false
allowhtmlinsideattributes = true | false
ignoreqe = true | false
namespacesaware = true | false
hyphenreplacement = <string value>
prunetags = <string value>
booleanatts = self | empty | true
nodebyxpath = <xpath expression>
omitenvelope = true | false
allowinvalidattributenames = true | false
invalidattributenameprefix [""]
t:<sourcetagX>[=<desttag>[,<preserveatts>]]
t:<sourcetagX>.<destattrY>[=<template>]
Note: to distinguish URLs from files, URLs must begin with http:// or https://
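As a rough illustration of the syntax above, the following sketch wraps a typical invocation in a shell function. It assumes each parameter is passed as a single name=value token without surrounding spaces. The jar name is borrowed from the stdin example on this page, the file names are hypothetical, and JAVA_CMD is a helper variable introduced here (not part of HtmlCleaner) so the command can be dry-run with echo.

```shell
# Assumed names: adjust the jar to match your installed version.
JAVA_CMD="${JAVA_CMD:-java}"
JAR="${JAR:-htmlcleaner-2.11.jar}"

# clean_file INPUT OUTPUT
# INPUT may be a file path or a URL starting with http:// or https://
clean_file() {
  "$JAVA_CMD" -jar "$JAR" "src=$1" "dest=$2" outputtype=pretty omitcomments=true
}

# Example (hypothetical file names):
#   clean_file page.html page.xml
# Dry run that only prints the arguments:
#   JAVA_CMD=echo clean_file page.html page.xml
```

Quoting "src=$1" and "dest=$2" keeps paths containing spaces together as one argument.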
Pipelines and stdin
As of version 2.11, the src parameter is optional, as you can instead send data directly from stdin. For example:
curl http://google.com | java -jar htmlcleaner-2.11.jar > cleaned.html
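The same pattern can be wrapped in a small filter function. This is only a sketch: it assumes that other options (such as outputtype) may still be given when src is omitted, which the text above suggests but does not show explicitly. JAVA_CMD is a helper variable introduced here for dry runs, and the file names in the comments are hypothetical.

```shell
# Assumed names: adjust the jar to match your installed version.
JAVA_CMD="${JAVA_CMD:-java}"
JAR="${JAR:-htmlcleaner-2.11.jar}"

# clean_stdin [options...]
# Reads HTML from stdin and writes the cleaned result to stdout;
# neither src nor dest is passed.
clean_stdin() {
  "$JAVA_CMD" -jar "$JAR" "$@"
}

# Example (hypothetical file names):
#   clean_stdin outputtype=pretty < page.html > cleaned.xml
```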
TagInfo providers
The optional taginfofile parameter is the path to an XML file that describes all tags and tag dependencies. It is used in the cleaning process instead of the default tag info set.
See the description file of the default tag info set for reference.
Quiet Mode
As of version 2.9, you can also supply the --quiet option to reduce the amount of log output HtmlCleaner produces.
Transformations
Transformation parameters are prefixed with "t:". The transformations given in the example would be written on the command line as:
t:cfoutput t:c:block=div,false t:font=span,true t:font.size t:font.face t:font.style=${style};font-family=${face};font-size=${size};
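When these parameters are typed into a POSIX shell, the ${style}, ${face} and ${size} placeholders would be expanded by the shell (usually to empty strings) unless they are quoted. A minimal sketch that passes them through literally using single quotes follows. The jar name and the src/dest file names are assumptions, and JAVA_CMD is a helper variable introduced here for dry runs.

```shell
# Assumed names: adjust the jar to match your installed version.
JAVA_CMD="${JAVA_CMD:-java}"
JAR="${JAR:-htmlcleaner-2.11.jar}"

# clean_with_transforms INPUT OUTPUT
clean_with_transforms() {
  # Single quotes stop the shell from expanding ${...} and from
  # treating ';' as a command separator.
  "$JAVA_CMD" -jar "$JAR" "src=$1" "dest=$2" \
    't:cfoutput' \
    't:c:block=div,false' \
    't:font=span,true' \
    't:font.size' \
    't:font.face' \
    't:font.style=${style};font-family=${face};font-size=${size};'
}

# Dry run that prints the arguments HtmlCleaner would receive:
#   JAVA_CMD=echo clean_with_transforms in.html out.xml
```

With double quotes or no quotes, the last parameter would reach HtmlCleaner with the placeholders already replaced by the shell, so single quotes are the safer choice here.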