PageMixer Documents > Tutorial > Mixing with PageMixer > Trim token
This section explains how to eliminate an HTML tag element from an HTML page with the PageMixer framework.
The explanation uses a filter that eliminates "<br>" tags from an HTML page.
For example, given the input

before line break.<br>
after line break.

the filtered page is displayed on a single line:

before line break. after line break.
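The transformation above can be pictured without the framework. The following sketch splits an HTML string into tag and text tokens with a regular expression, drops every tag whose name is "br", and joins the remaining tokens. It is a minimal sketch under stated assumptions: the class BrTrimIdea and its regex tokenizer are hypothetical and are not how PageMixer tokenizes a page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not PageMixer code: trim "br" tag tokens from HTML. */
class BrTrimIdea {
    // A token is either a whole tag ("<...>") or a run of text without '<'.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^<]+");
    // Captures the name of a start or end tag, e.g. "br" in "<br>" or "BR" in "</BR>".
    private static final Pattern TAG_NAME = Pattern.compile("^</?([A-Za-z][A-Za-z0-9]*)");

    /** Splits HTML into tag tokens and text tokens, in document order. */
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Returns the lower-cased tag name, or null if the token is text. */
    static String nameOf(String token) {
        Matcher m = TAG_NAME.matcher(token);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }

    /** Keeps every token except tags named "br". */
    static String trimBr(String html) {
        StringBuilder out = new StringBuilder();
        for (String token : tokenize(html)) {
            if (!"br".equals(nameOf(token))) {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimBr("before line break.<br>\nafter line break."));
    }
}
```

The newline left between the two sentences is ordinary whitespace in HTML, which is why a browser shows the result on one line.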
In other words, the filter trims the tokens named "br" from the token sequence.
The class diagram for this section is shown below; classes that you must define are colored, and the others are already defined.
The object diagram for this section is shown below.
The sequence diagram for this section is shown below.
This tutorial uses abbreviated class names; their full names are listed below.
Notation | Full name |
---|---|
Consumer | jp.ne.dti.lares.foozy.pagemixer.mixer.Consumer |
ConsumerContext | jp.ne.dti.lares.foozy.pagemixer.mixer.ConsumerContext |
HTMLSymbolSet | jp.ne.dti.lares.foozy.pagemixer.HTMLSymbolSet |
Symbol | jp.ne.dti.lares.foozy.pagemixer.Symbol |
TokenEditFilter | jp.ne.dti.lares.foozy.pagemixer.mixer.TokenEditFilter |
TokenWatcher | jp.ne.dti.lares.foozy.pagemixer.mixer.TokenWatcher |

Notation | Full name |
---|---|
Bootstrap.Default | pagemixer.filter.Bootstrap.Default |
BrTrimFilter | pagemixer.filter.BrTrimFilter |
TokenWatcher
In the PageMixer framework, "processing something" and "finding targets of that" are treated as separate functions. For single-token filtering, they are implemented by "TokenEditFilter" and "TokenWatcher" respectively.
So, first, create a TokenWatcher object that tells TokenEditFilter which tokens in the sequence should be trimmed (= "finding targets of that").
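The division between "finding" and "processing" can be sketched with plain java.util.function types: a Predicate plays the watcher role and a Function plays the edit role, so one watcher can be paired with different edits. WatchAndEdit and its apply method are hypothetical stand-ins, not PageMixer's TokenWatcher or TokenEditFilter API, and tokens are plain strings here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of keeping "finding" and "processing" apart. */
class WatchAndEdit {
    /**
     * Applies edit to every token the watcher selects and passes all other
     * tokens through unchanged. An edit returns the replacement tokens;
     * returning an empty list discards the token.
     */
    static List<String> apply(List<String> tokens,
                              Predicate<String> watcher,
                              Function<String, List<String>> edit) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (watcher.test(token)) {
                out.addAll(edit.apply(token));   // "processing something"
            } else {
                out.add(token);                  // not a target: pass it on
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("before line break.", "<br>", "after line break.");
        // "finding targets of that": select br tags only
        Predicate<String> isBr = token -> token.equalsIgnoreCase("<br>");

        // The same watcher combined with two different edits.
        List<String> trimmed = apply(tokens, isBr, token -> List.of());
        List<String> replaced = apply(tokens, isBr, token -> List.of(" "));
        System.out.println(trimmed);
        System.out.println(replaced);
    }
}
```

Only the edit changes between the two calls; the watcher that decides which tokens are targets is reused as-is.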
In this case, to trim all tokens named "br", "TokenWatcher.Name" is used: as its name suggests, it is a TokenWatcher that watches the name of each token.
new TokenWatcher.Name(HTMLSymbolSet.SET.BR)
To create an instance of the TokenWatcher.Name class, you specify a "Symbol" on construction; in the example above, "HTMLSymbolSet" is used to get that Symbol.
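A watcher that selects tokens by name might look like the hypothetical sketch below. Token, Watcher and Name are simplified stand-ins assumed for illustration; in particular, this Name takes a String, whereas PageMixer's TokenWatcher.Name takes a Symbol.

```java
/** Hypothetical sketch of a name-based watcher; not PageMixer code. */
class NameWatcherSketch {
    /** Stand-in token: tag name (null for a text token) plus its raw text. */
    static final class Token {
        final String name;
        final String text;

        Token(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    /** Stand-in for the "finding targets" role. */
    interface Watcher {
        boolean watch(Token token);
    }

    /** Selects tokens whose name equals the one given on construction. */
    static final class Name implements Watcher {
        private final String name;

        Name(String name) {
            this.name = name;
        }

        @Override
        public boolean watch(Token token) {
            return name.equals(token.name);   // false for text tokens (name == null)
        }
    }

    public static void main(String[] args) {
        Watcher br = new Name("br");
        System.out.println(br.watch(new Token("br", "<br>")));
        System.out.println(br.watch(new Token(null, "after line break.")));
    }
}
```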
Explaining what Symbol is, why Symbol is used instead of String, and other details about them is beyond the scope of this tutorial.
For this tutorial, it is enough to know how to use HTMLSymbolSet to get the Symbol objects that correspond to well-known HTML tag and attribute names.
Please see "Symbol comparison" for details.
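To picture how a symbol set can hand out one shared object per well-known name, here is a hypothetical sketch. SymbolSetSketch, its intern method and its BR and P fields only mimic the shape of HTMLSymbolSet.SET.BR. Comparing symbols by identity (==) is an assumption made for illustration; the actual semantics of PageMixer's Symbol are covered in "Symbol comparison".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical symbol set; mimics only the shape of HTMLSymbolSet.SET.BR. */
final class SymbolSetSketch {
    /** A name that exists exactly once per symbol set. */
    static final class Symbol {
        final String name;

        private Symbol(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Shared instance, used like HTMLSymbolSet.SET.
    static final SymbolSetSketch SET = new SymbolSetSketch();

    private final Map<String, Symbol> table = new HashMap<>();

    // Well-known names; upper-case field names mirror SET.BR in the tutorial.
    final Symbol BR = intern("br");
    final Symbol P = intern("p");

    /** Returns the one Symbol for this name, creating it on first request. */
    Symbol intern(String name) {
        return table.computeIfAbsent(name, Symbol::new);
    }

    public static void main(String[] args) {
        System.out.println(SET.intern("br") == SET.BR);  // same object, so == holds
        System.out.println(SET.intern("p") == SET.BR);
    }
}
```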
TokenEditFilter
Next, define a class derived from "TokenEditFilter" that trims the tokens found by the TokenWatcher defined above.
"Trimming a token from the sequence" means discarding it without passing it to the connected Consumer, so the derived class is defined as below.
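package pagemixer.filter;

import jp.ne.dti.lares.foozy.pagemixer.HTMLSymbolSet;
import jp.ne.dti.lares.foozy.pagemixer.mixer.ConsumerContext;
import jp.ne.dti.lares.foozy.pagemixer.mixer.TokenEditFilter;
import jp.ne.dti.lares.foozy.pagemixer.mixer.TokenWatcher;
// Token is also used below; import it from its PageMixer package
// (its full name is not listed in the tables above).

public class BrTrimFilter
    extends TokenEditFilter
{
    private static final HTMLSymbolSet SET = HTMLSymbolSet.SET;

    ///////////////////////////////////////////

    public BrTrimFilter(){
        super(new TokenWatcher.Name(SET.BR)); // watch tokens named "br"
    }

    ///////////////////////////////////////////
    // Concretization of class TokenEditFilter

    // Called for a token found by the watcher; passing nothing
    // to the connected Consumer discards the token.
    protected void edit(ConsumerContext context,
                        Token token)
    {
        // NOTHING TO DO = DISCARD "BR" TOKEN
    }
}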
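The discard-by-doing-nothing behaviour can be checked in a self-contained model of the filter chain. All names here (DiscardSketch, Consumer, Collector, EditFilter, BrTrim) are hypothetical stand-ins: edit receives the next Consumer in place of PageMixer's ConsumerContext, and tokens are plain strings, so this models the control flow rather than the framework's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of the BrTrimFilter control flow; not PageMixer code. */
class DiscardSketch {
    interface Consumer {
        void consume(String token);
    }

    /** End of the chain: records every token that reaches it. */
    static final class Collector implements Consumer {
        final List<String> received = new ArrayList<>();

        @Override
        public void consume(String token) {
            received.add(token);
        }
    }

    /** Passes unwatched tokens on; hands watched tokens to edit(). */
    abstract static class EditFilter implements Consumer {
        private final Predicate<String> watcher;
        private final Consumer next;

        EditFilter(Predicate<String> watcher, Consumer next) {
            this.watcher = watcher;
            this.next = next;
        }

        @Override
        public final void consume(String token) {
            if (watcher.test(token)) {
                edit(next, token);   // subclass decides what, if anything, to forward
            } else {
                next.consume(token);
            }
        }

        protected abstract void edit(Consumer next, String token);
    }

    /** Counterpart of BrTrimFilter: edit() forwards nothing, so "<br>" is dropped. */
    static final class BrTrim extends EditFilter {
        BrTrim(Consumer next) {
            super(token -> token.equalsIgnoreCase("<br>"), next);
        }

        @Override
        protected void edit(Consumer next, String token) {
            // NOTHING TO DO = DISCARD "BR" TOKEN
        }
    }

    /** Feeds tokens through a BrTrim filter and returns what reached the end. */
    static List<String> run(List<String> tokens) {
        Collector sink = new Collector();
        Consumer filter = new BrTrim(sink);
        for (String token : tokens) {
            filter.consume(token);
        }
        return sink.received;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("before line break.", "<br>", "after line break.")));
    }
}
```

Because the watched token is handed only to edit(), and edit() never calls the next Consumer, the token simply never reaches the end of the chain.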
Now everything needed is ready to use. The execution code is shown below, where filename is the name of the input HTML file (see pagemixer.filter.BrTrimFilter for details).
try{
    Bootstrap bootstrap = new Bootstrap.Default(filename);
    // apply the filter
    bootstrap.execute(new BrTrimFilter());
} catch(Exception e){
    e.printStackTrace(System.err);
}
The sample input HTML file is "index.en.html" under "src/demo/servlet/war/WEB-INF/page/demosite" in the distribution.