Combine efficiently

This section explains how to combine filters efficiently in the PageMixer framework.

Overview

Class diagram

The class diagram for this section is shown below:

[Figure: Class diagram]

Classes that you must define are colored; the others are already defined.

Object diagram

The object diagram for this section is shown below:

[Figure: Object diagram]

Class names

In this tutorial, abbreviated class names are used. Complete names are shown below.

Classes of PageMixer framework

Notation              Full name
ConsumerContext       jp.ne.dti.lares.foozy.pagemixer.mixer.ConsumerContext
Filter                jp.ne.dti.lares.foozy.pagemixer.mixer.Filter
FilterPipeline        jp.ne.dti.lares.foozy.pagemixer.mixer.FilterPipeline
HTMLSymbolSet         jp.ne.dti.lares.foozy.pagemixer.HTMLSymbolSet
SequenceHybridFilter  jp.ne.dti.lares.foozy.pagemixer.mixer.SequenceHybridFilter
SequenceTrimFilter    jp.ne.dti.lares.foozy.pagemixer.mixer.SequenceTrimFilter
SequenceWatcher       jp.ne.dti.lares.foozy.pagemixer.mixer.SequenceWatcher
Token                 jp.ne.dti.lares.foozy.pagemixer.Token
TrimFilter            jp.ne.dti.lares.foozy.pagemixer.mixer.TrimFilter

Tutorial specific classes

Notation                 Full name
AlreadyLoginFilter       pagemixer.filter.AlreadyLoginFilter
Bootstrap                pagemixer.filter.Bootstrap
Trim                     pagemixer.filter.AlreadyLoginFilter.Trim
UsernameTextInsetFilter  pagemixer.filter.UsernameTextInsetFilter

Motivation

If you simply connect filters to each other, every filter processes the entire Token sequence.

However, some filters are interested in only a small part of the Token sequence. For example, filters for the "Login" part of an HTML page are interested only in the "Login" part of the Token sequence.

If 10 filters each process a sequence of 5,000 Tokens, the amount of processing is 5,000 * 10 = 50,000. But if only 3 filters process the whole sequence, and the remaining 7 filters process only the 100 Tokens they are interested in, the amount of processing becomes (5,000 * 3) + (100 * 7) = 15,700, which cuts the processing cost by about 70%.
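The arithmetic above can be checked with a small calculation. This is a minimal sketch in plain Java and is not part of PageMixer: the class and method names (ProcessingCost, naive, selective) are made up for illustration, and "cost" here simply counts how many Tokens each filter has to look at.

```java
/** Hypothetical helper, not a PageMixer class: counts Token visits. */
final class ProcessingCost {
    /** Every filter processes every Token. */
    static long naive(long tokens, int filters) {
        return tokens * filters;
    }

    /** Some filters see the whole sequence; the rest see only a sub-sequence. */
    static long selective(long tokens, int fullFilters,
                          long subTokens, int partialFilters) {
        return tokens * fullFilters + subTokens * partialFilters;
    }

    public static void main(String[] args) {
        long all = naive(5000, 10);              // 10 filters, 5,000 Tokens each
        long some = selective(5000, 3, 100, 7);  // 3 full filters, 7 partial ones
        System.out.println(all + " -> " + some); // prints "50000 -> 15700"
    }
}
```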

Keeping filters "fine-grained" means that a required function is built from many filters, so reducing the processing cost becomes important.

Extend SequenceHybridFilter

To combine filters efficiently, the PageMixer framework provides the "SequenceHybridFilter" class.

It is constructed with a SequenceWatcher and a Filter. The specified Filter receives only the sub-sequence recognized by the specified SequenceWatcher.

If you want to provide the sub-sequence to several filters, combine them with a FilterPipeline.
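The idea of handing only a recognized sub-sequence to a chain of inner filters can be modeled without the framework. The sketch below is a conceptual illustration only: none of its names (HybridSketch, hybrid, pipeline, each) are PageMixer APIs, Tokens are reduced to plain strings, and nested markers are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Conceptual model only; these are not PageMixer classes. */
final class HybridSketch {
    /** How many tokens the inner filter has received so far. */
    static int innerSeen = 0;

    /**
     * Copies tokens through unchanged, except that the run from "open"
     * to "close" (inclusive) is handed to "inner" and replaced by its result.
     */
    static List<String> hybrid(List<String> tokens, String open, String close,
                               UnaryOperator<List<String>> inner) {
        List<String> out = new ArrayList<>();
        List<String> sub = null;           // non-null while inside the sub-sequence
        for (String t : tokens) {
            if (sub == null && t.equals(open)) {
                sub = new ArrayList<>();
            }
            if (sub == null) {
                out.add(t);                // outside: the inner filter never sees it
                continue;
            }
            sub.add(t);
            if (t.equals(close)) {
                innerSeen += sub.size();
                out.addAll(inner.apply(sub));
                sub = null;
            }
        }
        if (sub != null) {
            out.addAll(sub);               // unterminated run: pass through untouched
        }
        return out;
    }

    /** Chains several steps so that each receives the previous step's output. */
    static UnaryOperator<List<String>> pipeline(List<UnaryOperator<List<String>>> steps) {
        return in -> {
            List<String> cur = in;
            for (UnaryOperator<List<String>> step : steps) {
                cur = step.apply(cur);
            }
            return cur;
        };
    }

    /** Builds a step that rewrites each token independently. */
    static UnaryOperator<List<String>> each(UnaryOperator<String> f) {
        return in -> {
            List<String> result = new ArrayList<>();
            for (String t : in) {
                result.add(f.apply(t));
            }
            return result;
        };
    }

    public static void main(String[] args) {
        List<String> page = List.of("<html>", "<p>", "text", "</p>",
                                    "<span class=login>", "NAME", "</span>",
                                    "</html>");
        // Two illustrative steps standing in for the filters in a pipeline.
        UnaryOperator<List<String>> inner = pipeline(List.of(
            each(t -> t.equals("NAME") ? "foozy" : t),
            each(t -> t.startsWith("<") ? t : t.toUpperCase())));

        List<String> result = hybrid(page, "<span class=login>", "</span>", inner);
        System.out.println(result);
        System.out.println("inner steps saw " + innerSeen + " of " + page.size() + " tokens");
    }
}
```

With this sample page, the inner steps receive only the 3 tokens of the marked run out of 8, while the outer loop still visits every token once.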

A filter that efficiently combines the filters for the "Login" part of an HTML page is defined as follows.


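package pagemixer.filter;

import jp.ne.dti.lares.foozy.pagemixer.HTMLSymbolSet;
import jp.ne.dti.lares.foozy.pagemixer.mixer.Filter;
import jp.ne.dti.lares.foozy.pagemixer.mixer.FilterPipeline;
import jp.ne.dti.lares.foozy.pagemixer.mixer.SequenceHybridFilter;
import jp.ne.dti.lares.foozy.pagemixer.mixer.SequenceWatcher;

public class AlreadyLoginFilter
    extends SequenceHybridFilter
{
    private static final HTMLSymbolSet SET = HTMLSymbolSet.SET;

    // value of the "class" attribute that marks the "Login" part
    private static final String ATTR_VALUE = "Auth-AlreadyLogin";

    ////////////////////////////////////////

    public AlreadyLoginFilter(Object keyLogin){
        // watch for <span class="Auth-AlreadyLogin"> ... </span>
        super(new SequenceWatcher.NameAttr(SET.SPAN,
                                           SET.CLASS,
                                           ATTR_VALUE),
              create(keyLogin));
    }

    // static, because an instance method cannot be called
    // before the base class constructor has run
    private static Filter create(Object keyLogin){
        FilterPipeline pipeline = new FilterPipeline();

        pipeline.push(new UsernameTextInsetFilter(keyLogin));
        // Trim is a nested class of AlreadyLoginFilter (omitted here)
        pipeline.push(new Trim(keyLogin));

        return pipeline;
    }

}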

Extends SequenceHybridFilter

"AlreadyLoginFilter" provides the Token sub-sequence between <span class="Auth-AlreadyLogin"> and its matching </span> to the filters pushed into the FilterPipeline in the "create" method.

If the sub-sequence described above makes up only 10% of the whole sequence, AlreadyLoginFilter processes 100% of the sequence, but each filter for the "Login" part processes only 10% of it. With two such filters, (100 + 10 * 2)/(100 * 2) = 120/200 = 0.6, so this class cuts about 40% of the processing cost.

The more filters you use with SequenceHybridFilter, the greater the reduction. For example, with 10 filters, (100 + 10 * 10)/(100 * 10) = 200/1000 = 0.2, so 80% of the processing cost is cut.
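The two calculations above follow one formula, which the following sketch evaluates for any number of filters. It is plain Java for illustration only: HybridRatio and ratio are hypothetical names, not PageMixer APIs, and the model assumes, as the text does, that the hybrid filter itself costs one full pass over the sequence.

```java
/** Hypothetical helper, not a PageMixer class. */
final class HybridRatio {
    /**
     * Cost with a hybrid filter relative to the cost without it.
     *
     * @param percent size of the sub-sequence, as a percentage of the whole sequence
     * @param filters number of filters placed behind the hybrid filter
     */
    static double ratio(int percent, int filters) {
        // one full pass for the hybrid filter, plus "percent" for each inner filter
        double hybridCost = 100.0 + (double) percent * filters;
        // without the hybrid filter, every filter makes a full pass
        double plainCost = 100.0 * filters;
        return hybridCost / plainCost;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 10}) {
            System.out.println(n + " filters: ratio " + ratio(10, n));
        }
    }
}
```

Because the ratio equals 1/filters + percent/100, it approaches percent/100 as more filters are added, and it can never drop below that value.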

UsernameTextInsetFilter is explained in the "Inset data into sequence" section.

"Trim" is a class derived from "TrimFilter", which has not been explained yet. TrimFilter is a filter that trims the whole Token sequence given to it. The only difference between SequenceTrimFilter.Whole and TrimFilter is what they trim: SequenceTrimFilter.Whole trims the sub-sequence recognized by a SequenceWatcher, while TrimFilter trims the whole sequence.

Connect and mix

Now everything needed is ready. The execution code is shown below (see AlreadyLoginFilter for details).

try{
    // key to set/get user name
    final Object key = "Auth.Login";
    // user name
    final String name = "foozy";

    // "filename" and "login" are defined outside this excerpt
    Bootstrap bootstrap =
        new Bootstrap.Default(filename)
        {
            protected void prepare(ConsumerContext context)
            {
                // "login" tells whether to put the user name
                // into the context or not
                if(login){
                    // put the user name into the context
                    context.setValue(key, name);
                }
            }
        };

    AlreadyLoginFilter filter =
    new AlreadyLoginFilter(key);

    // apply the filter
    bootstrap.execute(filter);
}
catch(Exception e){
    e.printStackTrace(System.err);
}
Connect and Mix

The sample HTML input file is "auth.en.html" under "src/demo/servlet/war/WEB-INF/page/demosite" in the distribution.

