Pre-parse HTML page

This section explains how to pre-parse an HTML page in the PageMixer framework.


Class names

In this tutorial, abbreviated class names are used. Complete names are shown below.

Classes of PageMixer framework

Notation            Full name
ConsumerContext     jp.ne.dti.lares.foozy.pagemixer.mixer.ConsumerContext
DataProvider        jp.ne.dti.lares.foozy.pagemixer.mixer.DataProvider
PageParser          jp.ne.dti.lares.foozy.pagemixer.mixer.PageParser
PersistentProducer  jp.ne.dti.lares.foozy.pagemixer.mixer.PersistentProducer
Producer            jp.ne.dti.lares.foozy.pagemixer.mixer.Producer

Tutorial specific classes

Notation            Full name
BasketFilter        pagemixer.filter.BasketFilter
Bootstrap           pagemixer.filter.Bootstrap


In the former sections of this tutorial, the sample programs parse the HTML page with PageParser at execution time.

However, this on-demand parsing may cause problems of its own.

Alternatively, a Producer instance may be created programmatically.

The PageMixer framework needs only a Producer instance, not the HTML page itself, so a way to obtain a Producer directly is needed.

Use PersistentProducer

The PageMixer framework provides "PersistentProducer" to serialize a Producer into an external file and to de-serialize it from that file.

The code to serialize a Producer is shown below.

    try {
        PersistentProducer persistent =
            new PersistentProducer();

        persistent.write(filename, producer);
    } catch (IOException e) {
        // handling of the write failure is omitted here
    }

Serialize by PersistentProducer

In fact, the "PageParser" class can write the Producer generated from a specified HTML page directly to a file. See the details about PageParser.

The code to de-serialize a Producer is shown below.

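    try {
        PersistentProducer persistent =
            new PersistentProducer();

        Producer producer = persistent.read(filename);
    } catch (IOException e) {
        // handling of the read failure is omitted here
    } catch (ClassNotFoundException e) {
        // handling of a class missing at de-serialization is omitted here
    }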

De-serialize by PersistentProducer

Bootstrap also provides a utility method, "readinProducer(String)", to de-serialize a Producer, so de-serialization is easy in the tutorial environment.
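The internals of PersistentProducer are not shown in this tutorial. The ClassNotFoundException in the read code above suggests standard Java object serialization, but that is an assumption. The sketch below shows the same write-once, read-later pattern using only the JDK. "SerializationSketch" and "TokenList" are hypothetical stand-ins invented for this illustration and are not PageMixer classes.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerializationSketch {
    // Hypothetical stand-in for a pre-parsed page (NOT a PageMixer class).
    static class TokenList implements Serializable {
        private static final long serialVersionUID = 1L;

        final List<String> tokens;

        TokenList(List<String> tokens) {
            this.tokens = new ArrayList<>(tokens);
        }
    }

    // Write the object out once, for example ahead of execution time.
    static void write(Path file, TokenList list) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(list);
        }
    }

    // Read the object back; no parsing happens at this point.
    static TokenList read(Path file)
        throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(file))) {
            return (TokenList) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("preparsed", ".ser");
        try {
            write(file, new TokenList(
                Arrays.asList("<html>", "text", "</html>")));
            TokenList restored = read(file);
            System.out.println(restored.tokens);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

As in the PersistentProducer code, the reading side has to deal with both IOException and ClassNotFoundException, because the class of the stored object must be available when it is read back.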

Connect and mix

Now everything needed is ready to use.

First, invoke PageParser#main to serialize the Producer created from the specified HTML page to the specified file.

Then de-serialize the Producer from the file and produce the token sequence as below (see pagemixer.filter.PreparseSample for details).

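    try {
        final Object providerKey =
            /* value not shown here */;

        final Object dataKey =
            /* value not shown here */;

        Bootstrap bootstrap = new Bootstrap() {
            // read in the pre-parsed Producer instead of parsing HTML
            protected Producer createProducer()
                throws IOException,
                       ClassNotFoundException
            {
                return readinProducer(filename);
            }

            // register the data provider under providerKey
            protected void prepare(ConsumerContext context) {
                List entryList = BasketEntry.getEntryList();

                ListDataProvider provider =
                    new ListDataProvider(entryList);

                context.setValue(providerKey, provider);
            }
        };

        BasketFilter filter =
            new BasketFilter(providerKey, dataKey);

        // the remaining steps are not shown here; see PreparseSample
    } catch (Exception e) {
        // handling of the failure is omitted here
    }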

Connect and Mix
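The structure of the code above, an anonymous subclass that overrides one method to supply the page source and another to register data under a key, can be sketched with the JDK alone. Every name below ("BootstrapSketch", "Context", "Runner", "createTokens", "run", "demo") is a hypothetical stand-in invented for this illustration. None of them is PageMixer API, and the real Bootstrap may work differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BootstrapSketch {
    // Hypothetical stand-in for ConsumerContext: values shared under keys.
    static class Context {
        private final Map<Object, Object> values = new HashMap<>();

        void setValue(Object key, Object value) {
            values.put(key, value);
        }

        Object getValue(Object key) {
            return values.get(key);
        }
    }

    // Hypothetical stand-in for Bootstrap: run() fixes the order of the
    // steps, and subclasses supply the steps themselves.
    abstract static class Runner {
        protected abstract List<String> createTokens() throws Exception;

        protected abstract void prepare(Context context);

        Context run() throws Exception {
            Context context = new Context();
            prepare(context);
            context.setValue("tokens", createTokens());
            return context;
        }
    }

    static Context demo(final Object providerKey) throws Exception {
        Runner runner = new Runner() {
            protected List<String> createTokens() {
                // In the tutorial, this is the place where the pre-parsed
                // Producer is read in instead of parsing HTML.
                return Arrays.asList("<p>", "basket", "</p>");
            }

            protected void prepare(Context context) {
                // Register data under the key that a filter looks up later.
                context.setValue(providerKey,
                                 Arrays.asList("apple", "pear"));
            }
        };
        return runner.run();
    }

    public static void main(String[] args) throws Exception {
        Object providerKey = new Object();
        Context context = demo(providerKey);
        System.out.println(context.getValue(providerKey));
        System.out.println(context.getValue("tokens"));
    }
}
```

Because the subclass decides where the tokens come from, switching from on-demand parsing to a pre-parsed source changes only the overridden method, not the code that consumes the result.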

The sample HTML file used as input is "basket.en.html" under "src/demo/servlet/war/WEB-INF/page/demosite" in the distribution.

"BasketFilter" is explained in the "Combine filters" section.
