Trim token

This section explains how to eliminate HTML tag elements from an HTML page with the PageMixer framework. The explanation uses a filter that eliminates "<br>" tags from the page.

before line break.<br>
after line break.

Before processing

before line break.
after line break.

After processing

In other words, the filter trims the tokens named "br" from the token sequence.
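The effect of the filter on a token sequence can be pictured with plain Java collections. This is a minimal sketch, not PageMixer code: `SimpleToken` is a hypothetical stand-in in which a tag token carries its tag name and a text token has a null name, and trimming simply skips every token named "br" (compared case-insensitively, as HTML tag names are).

```java
import java.util.ArrayList;
import java.util.List;

public class BrTrimSketch {

    // Hypothetical stand-in for a token: tag tokens have a name, text tokens have name == null
    record SimpleToken(String name, String text) {}

    // Keep every token except those whose name is "br"
    static List<SimpleToken> trimBr(List<SimpleToken> tokens) {
        List<SimpleToken> result = new ArrayList<>();
        for (SimpleToken t : tokens) {
            if (t.name() != null && t.name().equalsIgnoreCase("br")) {
                continue; // trimmed: not copied to the output sequence
            }
            result.add(t);
        }
        return result;
    }

    // Concatenate the source text of the tokens to rebuild the page
    static String render(List<SimpleToken> tokens) {
        StringBuilder sb = new StringBuilder();
        for (SimpleToken t : tokens) {
            sb.append(t.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<SimpleToken> page = List.of(
            new SimpleToken(null, "before line break."),
            new SimpleToken("br", "<br>"),
            new SimpleToken(null, "\nafter line break."));
        // prints the two lines of the "After processing" example
        System.out.println(render(trimBr(page)));
    }
}
```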


Class diagram

The class diagram for this section is shown below:

Class diagram

Classes that you must define are colored; the others are already defined.

Object diagram

The object diagram for this section is shown below:

Object diagram

Sequence diagram

The sequence diagram for this section is shown below:

Sequence diagram

Class names

This tutorial uses abbreviated class names. Their full names are listed below.

Classes of PageMixer framework

Notation            Full name
Consumer            jp.ne.dti.lares.foozy.pagemixer.mixer.Consumer
ConsumerContext     jp.ne.dti.lares.foozy.pagemixer.mixer.ConsumerContext
HTMLSymbolSet       jp.ne.dti.lares.foozy.pagemixer.HTMLSymbolSet
Symbol              jp.ne.dti.lares.foozy.pagemixer.Symbol
TokenEditFilter     jp.ne.dti.lares.foozy.pagemixer.mixer.TokenEditFilter
TokenWatcher        jp.ne.dti.lares.foozy.pagemixer.mixer.TokenWatcher

Tutorial specific classes

Notation            Full name
Bootstrap.Default   pagemixer.filter.Bootstrap.Default
BrTrimFilter        pagemixer.filter.BrTrimFilter

Create TokenWatcher

In the PageMixer framework, "processing something" and "finding the targets of that processing" are treated as separate functions. For filtering single tokens, they are implemented by "TokenEditFilter" and "TokenWatcher" respectively.

So, first create a TokenWatcher object that tells TokenEditFilter which tokens in the sequence should be trimmed (= "finding the targets").

In this case, "TokenWatcher.Name" is used to trim all tokens named "br". As its name suggests, it is a TokenWatcher that watches the name of each token.

new TokenWatcher.Name(HTMLSymbolSet.SET.BR)

TokenWatcher to trim Token named "br"

To create an instance of the TokenWatcher.Name class, you must specify a "Symbol" at construction time; in the example above, "HTMLSymbolSet" is used to obtain that Symbol.

Explaining what a Symbol is, why Symbol is used instead of String, and other details about them is beyond the scope of this tutorial.

For this tutorial, it is enough to know how to use HTMLSymbolSet to get the Symbol objects that correspond to well-known HTML tag and attribute names. See "Symbol comparison" for details.
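The division of labour between "finding targets" and "processing" can be illustrated with a plain predicate. The sketch below is hypothetical and does not use the PageMixer API: `Watcher` and `NameWatcher` only mimic the roles of TokenWatcher and TokenWatcher.Name, and a plain String stands in for the Symbol that HTMLSymbolSet would supply.

```java
public class WatcherSketch {

    // Hypothetical token: name is the tag name, or null for a text token
    record SimpleToken(String name, String text) {}

    // Hypothetical counterpart of TokenWatcher: only decides whether a token is a target
    interface Watcher {
        boolean watch(SimpleToken token);
    }

    // Hypothetical counterpart of TokenWatcher.Name: matches tokens by tag name
    // (a String replaces the Symbol used by the real class)
    static final class NameWatcher implements Watcher {
        private final String name;

        NameWatcher(String name) {
            this.name = name;
        }

        @Override
        public boolean watch(SimpleToken token) {
            // text tokens have no name; tag names are compared case-insensitively
            return token.name() != null && token.name().equalsIgnoreCase(name);
        }
    }

    public static void main(String[] args) {
        Watcher br = new NameWatcher("br");
        System.out.println(br.watch(new SimpleToken("BR", "<BR>")));  // true
        System.out.println(br.watch(new SimpleToken("p", "<p>")));    // false
        System.out.println(br.watch(new SimpleToken(null, "text")));  // false
    }
}
```

Because the watcher only answers "is this a target?", the same filtering logic could be paired with a different watcher (for example one matching another tag name) without changing the filter itself.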

Concretize TokenEditFilter

Next, define a class derived from "TokenEditFilter" that trims the tokens found by the TokenWatcher defined above.

"Trimming a token from the sequence" means discarding it instead of passing it to the connected Consumer, so the derived class is defined as below.

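package pagemixer.filter;

import jp.ne.dti.lares.foozy.pagemixer.HTMLSymbolSet;
import jp.ne.dti.lares.foozy.pagemixer.mixer.ConsumerContext;
import jp.ne.dti.lares.foozy.pagemixer.mixer.TokenEditFilter;
import jp.ne.dti.lares.foozy.pagemixer.mixer.TokenWatcher;
// Token is also a PageMixer class; import it from its package (not listed above)

public class BrTrimFilter
    extends TokenEditFilter
{
    final static
    private HTMLSymbolSet SET = HTMLSymbolSet.SET;

    public BrTrimFilter(){
        super(new TokenWatcher.Name(SET.BR));
    }

    // Concretization of class TokenEditFilter
    protected void edit(ConsumerContext context,
                        Token token)
    {
        // Do nothing: the "br" token is not passed to the connected Consumer,
        // so it is trimmed from the sequence
    }
}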
Derived class from TokenEditFilter
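How a filter whose `edit()` does nothing ends up trimming tokens can be sketched with the template method pattern and simplified stand-ins. The classes below are hypothetical and assume only the behaviour described above: the base class asks its watcher about each token, passes unmatched tokens straight to the connected Consumer, and hands matched tokens to `edit()`. Tokens are plain strings here, and the real TokenEditFilter and ConsumerContext signatures may well differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class EditFilterSketch {

    // Hypothetical stand-in for Consumer: receives tokens (plain strings here)
    interface Consumer {
        void consume(String token);
    }

    // Hypothetical stand-in for TokenEditFilter (template method pattern)
    static abstract class EditFilter implements Consumer {
        private final Predicate<String> watcher; // plays the TokenWatcher role
        private Consumer next;                   // the connected Consumer

        EditFilter(Predicate<String> watcher) {
            this.watcher = watcher;
        }

        void connect(Consumer next) {
            this.next = next;
        }

        @Override
        public void consume(String token) {
            if (watcher.test(token)) {
                edit(next, token);    // found by the watcher: the subclass decides
            } else {
                next.consume(token);  // not a target: pass it through unchanged
            }
        }

        // Subclasses decide what, if anything, to send to the next Consumer
        protected abstract void edit(Consumer next, String token);
    }

    // Counterpart of BrTrimFilter: edit() does nothing, so "<br>" never reaches next
    static final class BrTrim extends EditFilter {
        BrTrim() {
            super(token -> token.equalsIgnoreCase("<br>"));
        }

        @Override
        protected void edit(Consumer next, String token) {
            // intentionally empty: the token is discarded
        }
    }

    // Connect the filter to a collecting Consumer and feed it the tokens
    static List<String> run(List<String> tokens) {
        List<String> out = new ArrayList<>();
        BrTrim filter = new BrTrim();
        filter.connect(out::add);
        for (String t : tokens) {
            filter.consume(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [before line break., after line break.]
        System.out.println(run(List.of("before line break.", "<br>", "after line break.")));
    }
}
```

Keeping the pass-through logic in the base class means a subclass only has to say what happens to the tokens its watcher finds; trimming is the special case where nothing happens.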

Connect and mix

Now everything needed is ready. The execution code is shown below (see pagemixer.filter.BrTrimFilter for details).

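try{
    // filename: path of the input HTML file
    Bootstrap bootstrap =
    new Bootstrap.Default(filename);

    // apply the filter
    bootstrap.execute(new BrTrimFilter());
}
catch(Exception e){
    // minimal error handling; replace as appropriate
    e.printStackTrace();
}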
Connect and Mix
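The whole "connect and mix" flow (split the page into tokens, filter them, write the result) can be approximated in a few lines. This sketch does not use Bootstrap.Default; it assumes a deliberately naive tokenizer based on a regular expression that treats anything between `<` and `>` as a tag, so it ignores comments, scripts, and `>` inside attribute values, and it applies the same "drop tokens named br" rule before concatenating the remaining tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for the bootstrap step; not PageMixer code
public class MixSketch {

    // Naive tokenizer rule: a tag is "<...>", everything between tags is text
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Leading tag name, e.g. "<br/>" -> "br", "</p>" -> "/p"
    private static final Pattern NAME = Pattern.compile("^<(/?[A-Za-z0-9]+)");

    // Returns the lowercased tag name of a tag token, or null for a text token
    static String tagName(String token) {
        Matcher m = NAME.matcher(token);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }

    // Split the page into alternating text and tag tokens
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                tokens.add(html.substring(last, m.start())); // text before the tag
            }
            tokens.add(m.group());                            // the tag itself
            last = m.end();
        }
        if (last < html.length()) {
            tokens.add(html.substring(last));                 // trailing text
        }
        return tokens;
    }

    // Tokenize, drop every token named "br", and concatenate the rest
    static String mix(String html) {
        StringBuilder out = new StringBuilder();
        for (String token : tokenize(html)) {
            if (!"br".equals(tagName(token))) {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(mix("before line break.<br>\nafter line break.\n"));
    }
}
```

For example, `mix("before line break.<br>\nafter line break.\n")` returns the text shown in the "After processing" example. A real HTML tokenizer, such as the one PageMixer provides, is needed for anything beyond this toy input.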

The sample input HTML file is "index.en.html" under "src/demo/servlet/war/WEB-INF/page/demosite" in the distribution.
