Basic principle

This section explains the basic principle of PageMixer.

Token and token sequence

An HTML page consists of many tags and pieces of text, as in the excerpt below:

                            :
                            :
<h2>Table of contents</h2>

<table class="wide">

<tr><!-- ======================================== -->
<th style="width: 25%;">title</th><th>content</th>
</tr>

<tr><!-- ======================================== -->
<th><a href="./principle.en.html">Basic principle</a></th>
<td><p>basic principle of PageMixer</p></td>
</tr>

<tr><!-- ======================================== -->
<th><a href="./nopfilter.en.html">'NOP' filter</a></th>
<td><p>parse and render HTML page in
PageMixer framework</p></td>
</tr>
                            :
                            :

Sample of HTML page source

The basic principle of PageMixer is to treat an HTML page as a sequence of "Tokens".

Figure: Overview

A "Token", like a token in the lexical analysis of a programming language, is the "minimum unit of an HTML page".

NOTE: Strictly speaking, a "Token" is not the minimum unit, because it consists of a "Symbol" and "Attribute"s. PageMixer nevertheless treats a "Token" as the minimum unit, much as an introductory physics course treats the atom as the minimum unit.

In PageMixer, the following things become tokens:

HTML start tag
HTML end tag
text between HTML (start or end) tags
comment

So the HTML page source sample above is translated into the following "Token" sequence (only part of it is shown).

HTML start tag token - "h2"
text token - "Table of contents"
HTML end tag token - "h2"
text token - two LFs (or two CR/LF pairs)
HTML start tag token - "table" including 'class="wide"' attribute
text token - two LFs
HTML start tag token - "tr"
comment token - " ======================================== "
HTML start tag token - "th" including 'style="width: 25%;"' attribute
text token - "title"
HTML end tag token - "th"
HTML start tag token - "th"
text token - "content"
HTML end tag token - "th"
and so on ....
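The translation above can be sketched in a few lines of Java. This is only an illustration: the names `SketchToken`, `Kind` and `SketchTokenizer` are hypothetical and are not PageMixer's actual classes. The regular expression is an assumption that covers only simple, well-formed markup, and attributes are matched but not kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token model; PageMixer's real Token classes may differ.
class SketchToken {
    enum Kind { START_TAG, END_TAG, TEXT, COMMENT }

    final Kind kind;
    final String value; // tag name, text content, or comment body

    SketchToken(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    @Override
    public String toString() {
        return kind + "(" + value + ")";
    }
}

class SketchTokenizer {
    // Alternatives: comment | end tag | start tag (attributes are matched but discarded).
    private static final Pattern MARKUP = Pattern.compile(
        "<!--(.*?)-->|</([A-Za-z][A-Za-z0-9]*)\\s*>|<([A-Za-z][A-Za-z0-9]*)[^>]*>",
        Pattern.DOTALL);

    static List<SketchToken> tokenize(String html) {
        List<SketchToken> tokens = new ArrayList<>();
        Matcher m = MARKUP.matcher(html);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos) { // text between two pieces of markup
                tokens.add(new SketchToken(SketchToken.Kind.TEXT, html.substring(pos, m.start())));
            }
            if (m.group(1) != null) {
                tokens.add(new SketchToken(SketchToken.Kind.COMMENT, m.group(1)));
            } else if (m.group(2) != null) {
                tokens.add(new SketchToken(SketchToken.Kind.END_TAG, m.group(2)));
            } else {
                tokens.add(new SketchToken(SketchToken.Kind.START_TAG, m.group(3)));
            }
            pos = m.end();
        }
        if (pos < html.length()) { // trailing text after the last markup
            tokens.add(new SketchToken(SketchToken.Kind.TEXT, html.substring(pos)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String html = "<h2>Table of contents</h2>\n\n<table class=\"wide\">";
        for (SketchToken token : tokenize(html)) {
            System.out.println(token);
        }
    }
}
```

Running `main` prints one line per token for the first five tokens of the list above, in the same order.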

The concept of "Token" is defined as a Java class, so a "Token" and its variations are Java objects. This means that you can treat an HTML page as a sequence of Java objects.

Figure: Class hierarchy of "Token"

In addition to the above, the classes shown below are also available since PageMixer 3.0.

CDATAToken for CDATA
PIToken for Processing Instruction
ScriptToken for embedded scripts (e.g. JSP, ASP and so on)

Consumer, filter and producer

Figure: Consumer, filter and producer

Consumer

The HTML page, the object of processing, is represented as a token sequence. What, then, is the subject of processing?

The subject of processing is represented as a "Consumer", which consumes the token sequence. The tokens in the sequence, which represent the target HTML page, are passed to the "Consumer" one by one, and the "Consumer" consumes them.

The result of consuming depends on the "Consumer" implementation. For example, a "Consumer" may:

ignore all tokens, so that an empty HTML page is rendered
render the whole token sequence as it is
add some tokens to the sequence, and render the whole sequence
modify some tokens in the sequence, and render the whole sequence
and so on ....
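The behaviours listed above might look like the following sketch. It assumes a hypothetical one-method interface `TokenConsumer` and uses plain strings as tokens for brevity. None of these names are taken from PageMixer's actual API.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical interface; tokens are plain strings here for brevity.
interface TokenConsumer {
    void consume(String token);
}

// Ignores every token, so nothing is rendered.
class IgnoringConsumer implements TokenConsumer {
    @Override
    public void consume(String token) {
        // intentionally empty
    }
}

// Renders every token unchanged into a buffer.
class RenderingConsumer implements TokenConsumer {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void consume(String token) {
        out.append(token);
    }

    String rendered() {
        return out.toString();
    }
}

// Modifies text tokens (upper-cases them) and renders the whole sequence.
class UpperCasingConsumer extends RenderingConsumer {
    @Override
    public void consume(String token) {
        boolean isMarkup = token.startsWith("<");
        super.consume(isMarkup ? token : token.toUpperCase(Locale.ROOT));
    }
}

class ConsumerDemo {
    // Passes the tokens of a tiny page to the consumer one by one.
    static void feed(TokenConsumer consumer) {
        for (String token : Arrays.asList("<th>", "title", "</th>")) {
            consumer.consume(token);
        }
    }

    public static void main(String[] args) {
        feed(new IgnoringConsumer()); // renders nothing

        RenderingConsumer asIs = new RenderingConsumer();
        feed(asIs);
        System.out.println(asIs.rendered());

        UpperCasingConsumer upper = new UpperCasingConsumer();
        feed(upper);
        System.out.println(upper.rendered());
    }
}
```

The same token sequence yields different pages depending only on which consumer receives it.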

Filter

As described above, the subject of processing is represented as a "Consumer". But a large, complex, multi-purpose "Consumer" is difficult to develop, test, correct, and understand.

To divide processing into small, simple, single-purpose pieces, the concept of "Filter" is introduced.

A "Filter" is a "Consumer" that is connected to another "Consumer" and provides a token sequence to it.

Whether a "Filter" passes on the token sequence exactly as it received it depends on the "Filter" implementation. For example, some "Filter"s:

omit tokens in the sequence that have a specified name
omit token sub-sequences that lie between tokens with a specified name
repeat a token sub-sequence a specified number of times
and so on ....

By combining ready-made "Filter"s, you can build up your own custom processing, much like combining UNIX commands with a pipe ("|").
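A minimal sketch of such a chain follows. It assumes a hypothetical `SketchConsumer` interface and a `SketchFilter` base class that forwards tokens to the next consumer. These names, and the two example filters, are illustrative only and do not come from PageMixer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; tokens are plain strings here for brevity.
interface SketchConsumer {
    void consume(String token);
}

// Terminal consumer that collects whatever reaches the end of the chain.
class CollectingConsumer implements SketchConsumer {
    final List<String> tokens = new ArrayList<>();

    @Override
    public void consume(String token) {
        tokens.add(token);
    }
}

// A filter is a consumer that is connected to another consumer.
abstract class SketchFilter implements SketchConsumer {
    private final SketchConsumer next;

    SketchFilter(SketchConsumer next) {
        this.next = next;
    }

    // Provides a token to the connected consumer.
    protected void pass(String token) {
        next.consume(token);
    }
}

// Omits comment tokens and passes everything else through.
class CommentOmitFilter extends SketchFilter {
    CommentOmitFilter(SketchConsumer next) {
        super(next);
    }

    @Override
    public void consume(String token) {
        if (!token.startsWith("<!--")) {
            pass(token);
        }
    }
}

// Omits text tokens that contain only whitespace.
class BlankTextOmitFilter extends SketchFilter {
    BlankTextOmitFilter(SketchConsumer next) {
        super(next);
    }

    @Override
    public void consume(String token) {
        if (!token.trim().isEmpty()) {
            pass(token);
        }
    }
}

class FilterDemo {
    // Roughly "tokens | omit comments | omit blank text | collect".
    static List<String> run(String... input) {
        CollectingConsumer end = new CollectingConsumer();
        SketchConsumer chain = new CommentOmitFilter(new BlankTextOmitFilter(end));
        for (String token : input) {
            chain.consume(token);
        }
        return end.tokens;
    }

    public static void main(String[] args) {
        System.out.println(run("<tr>", "<!-- ==== -->", "\n", "<th>", "title", "</th>"));
    }
}
```

Each filter knows only the consumer it is connected to, so filters can be reordered or replaced without touching the others.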

Producer

A token sequence is processed by a "Consumer" or "Filter". The component that initially provides the token sequence to them is called a "Producer".

A "Producer" must be able to provide its specific token sequence to a given "Consumer" (or to a "Filter" acting as a "Consumer"), but the mechanism for producing the token sequence depends on the "Producer" implementation.

You can create a "Producer" that provides a token sequence that is dynamically generated, deserialized from a DBMS, or even infinite (though that may not be useful).
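As an illustration, the sketch below shows one producer with a fixed token sequence and one that generates its sequence on demand. The `SketchProducer` interface and its `produceTo` method are hypothetical, and the standard `java.util.function.Consumer` stands in for a PageMixer "Consumer".

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical interface; java.util.function.Consumer stands in for a PageMixer "Consumer".
interface SketchProducer {
    void produceTo(Consumer<String> consumer);
}

// Provides a fixed, pre-built token sequence.
class ListProducer implements SketchProducer {
    private final List<String> tokens;

    ListProducer(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    @Override
    public void produceTo(Consumer<String> consumer) {
        tokens.forEach(consumer);
    }
}

// Generates its token sequence dynamically: one table row per number.
class RowProducer implements SketchProducer {
    private final int rows;

    RowProducer(int rows) {
        this.rows = rows;
    }

    @Override
    public void produceTo(Consumer<String> consumer) {
        for (int i = 1; i <= rows; i++) {
            consumer.accept("<tr>");
            consumer.accept("<td>");
            consumer.accept("row " + i);
            consumer.accept("</td>");
            consumer.accept("</tr>");
        }
    }
}

class ProducerDemo {
    // Renders whatever the producer provides by appending each token to a buffer.
    static String render(SketchProducer producer) {
        StringBuilder out = new StringBuilder();
        producer.produceTo(out::append);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new ListProducer("<h2>", "Table of contents", "</h2>")));
        System.out.println(render(new RowProducer(2)));
    }
}
```

The consumer cannot tell the two producers apart: it only sees tokens arriving one by one.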

Consumer context

In a web-based system, HTML pages include some dynamic values. These are determined at execution time, because they come from user input or from the result of processing user input.

So consumers and filters need a way to get the user input or the result of processing it.

They also need a way to communicate with each other.

Of course, filters communicate with each other via the token sequence, but the token sequence is comparable to the FTP data channel or the HTTP body. They need another channel, comparable to the FTP control channel or the HTTP headers, to exchange information other than tokens (the data or body).

So, the concept of "Consumer Context" is introduced.

Consumers and filters are given a "Consumer Context" with each request to process tokens, and the "Consumer Context" provides functions to set, unset, and get values, like "java.util.Map".

Via the "Consumer Context", consumers and filters can:

get user input
get the result of user input processing
send information to others (by setting values)
receive information from others (by getting values)

Where values are stored, and how they are identified, depends on the "Consumer Context" implementation. For example, in a Servlet environment, a "Consumer Context" may store values as attributes of the HttpRequest or HttpSession, and may read them back from there or from the parameters of the HttpRequest.
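The idea can be sketched with a context backed by a plain `java.util.HashMap`. The class `SketchContext`, its `set`/`unset`/`get` methods, the `GreetingConsumer`, and the `${user}` placeholder token are all assumptions made for illustration. They are not PageMixer's actual API, and a Servlet-based context would store its values elsewhere.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Map-backed context; a Servlet-based one might delegate
// to request or session attributes instead.
class SketchContext {
    private final Map<String, Object> values = new HashMap<>();

    void set(String key, Object value) {
        values.put(key, value);
    }

    void unset(String key) {
        values.remove(key);
    }

    Object get(String key) {
        return values.get(key);
    }
}

// Replaces the placeholder token "${user}" with a value read from the context,
// and sends a flag to other consumers by setting it in the context.
class GreetingConsumer {
    private final StringBuilder out = new StringBuilder();

    void consume(SketchContext context, String token) {
        if (token.equals("${user}")) {
            Object user = context.get("user"); // receive: stands in for user input
            out.append(user != null ? user : "guest");
            context.set("greeted", Boolean.TRUE); // send: visible to later consumers
        } else {
            out.append(token);
        }
    }

    String rendered() {
        return out.toString();
    }
}

class ContextDemo {
    static String run(SketchContext context) {
        GreetingConsumer consumer = new GreetingConsumer();
        for (String token : new String[] {"<p>", "Hello, ", "${user}", "</p>"}) {
            consumer.consume(context, token);
        }
        return consumer.rendered();
    }

    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        context.set("user", "alice");
        System.out.println(run(context));
        System.out.println(context.get("greeted"));
    }
}
```

The token sequence itself stays the same for every request. Only the values in the context change.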

To next section "'NOP' filter"
