| MAP | PageMixer Documents > Tutorial > Mixing with PageMixer > Trim token | << | >> |
This section explains
how to eliminate an HTML tag element from an HTML page
in the PageMixer framework.
The explanation uses a filter
that eliminates "<br>" tags from an HTML page.
For example, the input
    before line break.<br>
    after line break.
becomes
    before line break. after line break.
In other words,
the filter trims the tokens named "br" from the sequence.
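The idea of trimming tokens from a sequence can be sketched in plain Java before any PageMixer classes are introduced. The sketch below is only an illustration under simplifying assumptions: tokens are modeled as plain strings, and the class `TrimSketch` and its methods `trim` and `isTagNamed` are hypothetical names that are not part of PageMixer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not a PageMixer class.
final class TrimSketch {

    /** Returns a copy of the sequence without the tag tokens that have the given name. */
    static List<String> trim(List<String> tokens, String name) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!isTagNamed(token, name)) {
                kept.add(token); // every other token stays, in order
            }
        }
        return kept;
    }

    /** Assumption: a tag token is written as "<name>", compared case-insensitively. */
    static boolean isTagNamed(String token, String name) {
        return token.equalsIgnoreCase("<" + name + ">");
    }

    public static void main(String[] args) {
        List<String> input =
            List.of("before line break.", "<br>", "after line break.");
        System.out.println(String.join(" ", trim(input, "br")));
        // prints: before line break. after line break.
    }
}
```

The rest of this section achieves the same effect with the framework's own classes, which work on a stream of tokens rather than on a list held in memory.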
Class diagram in this section is shown below:
Classes that you must define are colored; the others are already defined.
Object diagram in this section is shown below:
Sequence diagram in this section is shown below:
In this tutorial, abbreviated class names are used. Complete names are shown below.
| Notation | Full name |
|---|---|
| Consumer | jp.ne.dti.lares.foozy.pagemixer.mixer.Consumer |
| ConsumerContext | jp.ne.dti.lares.foozy.pagemixer.mixer.ConsumerContext |
| HTMLSymbolSet | jp.ne.dti.lares.foozy.pagemixer.HTMLSymbolSet |
| Symbol | jp.ne.dti.lares.foozy.pagemixer.Symbol |
| TokenEditFilter | jp.ne.dti.lares.foozy.pagemixer.mixer.TokenEditFilter |
| TokenWatcher | jp.ne.dti.lares.foozy.pagemixer.mixer.TokenWatcher |
| Notation | Full name |
|---|---|
| Bootstrap.Default | pagemixer.filter.Bootstrap.Default |
| BrTrimFilter | pagemixer.filter.BrTrimFilter |
TokenWatcher
In the PageMixer framework,
"processing something" and "finding the targets of that processing" are
treated as separate functions.
For filtering a single token,
these are implemented by "TokenEditFilter"
and "TokenWatcher", respectively.
So, first,
create a TokenWatcher object
to tell TokenEditFilter
which tokens should be trimmed from the sequence
(= "finding the targets").
In this case,
to trim all tokens named "br",
"TokenWatcher.Name" is used.
It is a "token watcher that watches the name of each token".
new TokenWatcher.Name(HTMLSymbolSet.SET.BR)
TokenWatcher to trim tokens named "br"
To create an instance of the TokenWatcher.Name class,
you must specify a "Symbol" at construction;
in the example above, "HTMLSymbolSet" is used
to get that Symbol.
It is beyond the scope of this tutorial
to explain what Symbol is,
why Symbol is used instead of String,
or other details about them.
For this tutorial,
it is enough to know how to use HTMLSymbolSet
to get the Symbol objects
corresponding to well-known names of HTML tags and attributes.
Please see "Symbol comparison" for details.
TokenEditFilter
Next,
define a class derived from "TokenEditFilter"
to trim the tokens found by the TokenWatcher created above.
"Trimming a token from the sequence" means
discarding the token
without passing it to the connected Consumer,
so the derived class is defined as below.
public class BrTrimFilter
    extends TokenEditFilter
{
    private static final HTMLSymbolSet SET = HTMLSymbolSet.SET;

    ///////////////////////////////////////////

    public BrTrimFilter(){
        // watch for tokens named "br"
        super(new TokenWatcher.Name(SET.BR));
    }

    ///////////////////////////////////////////
    // Concretization of class TokenEditFilter

    protected void edit(ConsumerContext context,
                        Token token)
    {
        // NOTHING TO DO = DISCARD "BR" TOKEN
    }
}
TokenEditFilter
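The division of labor described above (one object finds the targets, another decides what to do with them, and everything else flows on to the connected consumer) can be sketched without the framework. The following is a minimal, self-contained illustration, not the actual PageMixer implementation: the types `TokenMatcher`, `TokenSink`, `EditFilterSketch`, `DiscardFilterSketch` and `FilterDemo` are all hypothetical stand-ins, tokens are modeled as plain strings, and the real TokenEditFilter may work differently internally.

```java
import java.util.ArrayList;
import java.util.List;

// All types below are hypothetical stand-ins; none are PageMixer classes.
final class FilterDemo {
    /** Feeds the tokens through a discarding filter and collects what comes out. */
    static List<String> run(List<String> tokens) {
        List<String> out = new ArrayList<>();
        EditFilterSketch filter =
            new DiscardFilterSketch(t -> t.equalsIgnoreCase("<br>"), out::add);
        for (String token : tokens) {
            filter.accept(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
            run(List.of("before line break.", "<br>", "after line break.")));
    }
}

/** "Finding targets": decides whether a token is one to be edited. */
interface TokenMatcher {
    boolean matches(String token);
}

/** The connected consumer that receives the tokens passed downstream. */
interface TokenSink {
    void accept(String token);
}

/** "Processing": routes matched tokens to edit(), passes the rest through. */
abstract class EditFilterSketch implements TokenSink {
    private final TokenMatcher matcher;
    private final TokenSink next;

    EditFilterSketch(TokenMatcher matcher, TokenSink next) {
        this.matcher = matcher;
        this.next = next;
    }

    @Override
    public final void accept(String token) {
        if (matcher.matches(token)) {
            edit(next, token);   // target: the subclass decides its fate
        } else {
            next.accept(token);  // non-target: forwarded unchanged
        }
    }

    protected abstract void edit(TokenSink next, String token);
}

/** Trims matched tokens by never forwarding them. */
final class DiscardFilterSketch extends EditFilterSketch {
    DiscardFilterSketch(TokenMatcher matcher, TokenSink next) {
        super(matcher, next);
    }

    @Override
    protected void edit(TokenSink next, String token) {
        // nothing to do = the matched token is discarded
    }
}
```

Because the matcher is passed in separately, the same discarding filter could in principle trim a different tag by supplying a different matcher, which is the benefit of keeping "finding" and "processing" apart.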
Now everything needed is ready to use. The execution code is as below (see pagemixer.filter.BrTrimFilter for details).
try{
    Bootstrap bootstrap =
        new Bootstrap.Default(filename);
    // apply the filter
    bootstrap.execute(new BrTrimFilter());
}
catch(Exception e){
    e.printStackTrace(System.err);
}
The sample HTML file used as input is "index.en.html"
under "src/demo/servlet/war/WEB-INF/page/demosite"
in the distribution.