Basic principle

This section explains the basic principle of PageMixer.

Token and token sequence

An HTML page consists of many tags and pieces of text, as in the sample below:

                            :
                            :
<h2>Table of contents</h2>

<table class="wide">

<tr><!-- ======================================== -->
<th style="width: 25%;">title</th><th>content</th>
</tr>

<tr><!-- ======================================== -->
<th><a href="./principle.en.html">Basic principle</a></th>
<td><p>basic principle of PageMixer</p></td>
</tr>

<tr><!-- ======================================== -->
<th><a href="./nopfilter.en.html">'NOP' filter</a></th>
<td><p>parse and render HTML page in
PageMixer framework</p></td>
</tr>
                            :
                            :
sample of HTML page source

The basic principle of PageMixer is to treat an HTML page as a sequence of "Token"s.

Figure: Overview

"Token", as in the lexical analysis described in programming language specifications, means the "minimum unit of an HTML page".

NOTE: Strictly speaking, a "Token" is not a "minimum unit", because it consists of a "Symbol" and "Attribute"s. But PageMixer treats a "Token" as the "minimum unit", just as introductory physics treats the atom as the "minimum unit" of matter.

In PageMixer, elements such as HTML start tags, HTML end tags, text, and comments become tokens.

So, the HTML page source sample above is translated into the "Token" sequence below (only part of it is shown).

  1. HTML start tag token - "h2"
  2. text token - "Table of contents"
  3. HTML end tag token - "h2"
  4. text token - two LFs (or two CR/LF pairs)
  5. HTML start tag token - "table" including 'class="wide"' attribute
  6. text token - two LFs
  7. HTML start tag token - "tr"
  8. comment token - " ======================================== "
  9. HTML start tag token - "th" including 'style="width: 25%;"' attribute
  10. text token - "title"
  11. HTML end tag token - "th"
  12. HTML start tag token - "th"
  13. text token - "content"
  14. HTML end tag token - "th"
  15. and so on ....
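The translation above can be sketched in a few lines of Java. This is only a naive illustration and not the parser PageMixer actually uses: the class name TokenizerSketch, the method tokenize, and the string labels are all made up for this example, and the regular expression assumes that attribute values never contain ">".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified tokenizer; NOT the PageMixer parser.
final class TokenizerSketch {

    // Alternatives: comment | end tag | start tag (with raw attribute text).
    // Assumption: attribute values never contain '>'.
    private static final Pattern MARKUP = Pattern.compile(
            "<!--(.*?)-->"
            + "|</([A-Za-z][A-Za-z0-9]*)\\s*>"
            + "|<([A-Za-z][A-Za-z0-9]*)([^>]*)>",
            Pattern.DOTALL);

    /** Splits HTML source into labeled token descriptions, in document order. */
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = MARKUP.matcher(html);
        int pos = 0;
        while (m.find()) {
            // Anything between two pieces of markup is a text token.
            if (m.start() > pos) {
                tokens.add("text: " + html.substring(pos, m.start()));
            }
            if (m.group(1) != null) {
                tokens.add("comment: " + m.group(1));
            } else if (m.group(2) != null) {
                tokens.add("end tag: " + m.group(2));
            } else {
                tokens.add("start tag: " + m.group(3) + m.group(4));
            }
            pos = m.end();
        }
        // Trailing text after the last piece of markup.
        if (pos < html.length()) {
            tokens.add("text: " + html.substring(pos));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String html = "<h2>Table of contents</h2>\n\n<table class=\"wide\">";
        for (String token : tokenize(html)) {
            // Show line feeds visibly so that whitespace-only text tokens can be seen.
            System.out.println(token.replace("\n", "\\n"));
        }
    }
}
```

Note that the two line feeds between "</h2>" and "<table ...>" come out as a text token of their own, which corresponds to item 4 in the list above.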

The concept of "Token" is defined as a Java class, so a "Token" and its variations are Java objects. This means that you can treat an HTML page as a sequence of Java objects.
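To make the idea of "an HTML page as a sequence of Java objects" concrete, here is a minimal sketch. The class names (TokenSketch, Token, StartTag, EndTag, Text) are hypothetical and chosen only for this illustration; they are not the actual PageMixer classes, whose names and structure may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token types for illustration; NOT PageMixer's real classes.
final class TokenSketch {

    /** Common supertype: every piece of the page is a Token. */
    static abstract class Token {
        abstract String describe();
    }

    static final class StartTag extends Token {
        final String name;
        final String attributes; // raw attribute text, empty if there is none

        StartTag(String name, String attributes) {
            this.name = name;
            this.attributes = attributes;
        }

        @Override
        String describe() {
            return attributes.isEmpty()
                    ? "start tag: " + name
                    : "start tag: " + name + " [" + attributes + "]";
        }
    }

    static final class EndTag extends Token {
        final String name;

        EndTag(String name) {
            this.name = name;
        }

        @Override
        String describe() {
            return "end tag: " + name;
        }
    }

    static final class Text extends Token {
        final String value;

        Text(String value) {
            this.value = value;
        }

        @Override
        String describe() {
            // Show line feeds visibly in the description.
            return "text: " + value.replace("\n", "\\n");
        }
    }

    /** The first five tokens of the sample page, built by hand (no parsing). */
    static List<Token> sampleSequence() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new StartTag("h2", ""));
        tokens.add(new Text("Table of contents"));
        tokens.add(new EndTag("h2"));
        tokens.add(new Text("\n\n"));
        tokens.add(new StartTag("table", "class=\"wide\""));
        return tokens;
    }

    public static void main(String[] args) {
        for (Token token : sampleSequence()) {
            System.out.println(token.describe());
        }
    }
}
```

Because the sequence is an ordinary java.util.List, it can be iterated, copied, or modified with the usual collection operations.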

Figure: Class hierarchy of "Token"

In addition to the above, the classes shown below have also been available since PageMixer 3.0.

Consumer, filter and producer

Figure: Consumer, filter and producer

Consumer

An HTML page, the object of processing, is represented as a token sequence. Then what is the subject of processing?

The subject of processing is represented as a "Consumer", which consumes a token sequence. The tokens in the sequence, which represent the target HTML page, are passed to the "Consumer" one by one, and the "Consumer" consumes them.

The result of consuming depends on the "Consumer" implementation. For example:
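As a rough sketch of how different implementations give different results, the code below feeds the same token sequence to two consumers: one renders the tokens back into HTML source, the other only counts text tokens. All names here (ConsumerSketch, Consumer, Renderer, TextCounter, consume) are assumptions made for this illustration and are not taken from the PageMixer API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical consumer interface and implementations; NOT the PageMixer API.
final class ConsumerSketch {

    enum Kind { START_TAG, END_TAG, TEXT, COMMENT }

    static final class Token {
        final Kind kind;
        final String value;

        Token(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /** Receives the tokens of a page one by one. */
    interface Consumer {
        void consume(Token token);
    }

    /** Consumer whose result is the HTML source text of the tokens. */
    static final class Renderer implements Consumer {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void consume(Token token) {
            switch (token.kind) {
                case START_TAG:
                    out.append('<').append(token.value).append('>');
                    break;
                case END_TAG:
                    out.append("</").append(token.value).append('>');
                    break;
                case COMMENT:
                    out.append("<!--").append(token.value).append("-->");
                    break;
                default:
                    out.append(token.value);
            }
        }

        String result() {
            return out.toString();
        }
    }

    /** Consumer whose result is just the number of text tokens seen. */
    static final class TextCounter implements Consumer {
        private int count;

        @Override
        public void consume(Token token) {
            if (token.kind == Kind.TEXT) {
                count++;
            }
        }

        int result() {
            return count;
        }
    }

    static List<Token> sample() {
        return Arrays.asList(
                new Token(Kind.START_TAG, "h2"),
                new Token(Kind.TEXT, "Table of contents"),
                new Token(Kind.END_TAG, "h2"));
    }

    public static void main(String[] args) {
        Renderer renderer = new Renderer();
        TextCounter counter = new TextCounter();
        for (Token token : sample()) {
            renderer.consume(token);
            counter.consume(token);
        }
        System.out.println(renderer.result());
        System.out.println(counter.result());
    }
}
```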

Filter

As described above, the subject of processing is represented as a "Consumer". But a large, complex, multi-purpose "Consumer" is difficult to develop, test, correct, and understand.

To divide processing into small, simple, single-purpose units, the concept of "Filter" is introduced.

A "Filter" is a "Consumer" that is connected to another "Consumer" and provides a token sequence to it.

Whether a "Filter" passes on exactly the token sequence it was given depends on its implementation. Some "Filter"s are implemented as below:

By combining ready-made "Filter"s, you can build up your own custom processing, just as UNIX commands are combined with a pipe ("|").
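The pipe-like combination can be sketched as follows. Two small filters are chained in front of a final consumer: one drops comment tokens and the other upper-cases text tokens. The names (FilterSketch, Filter, CommentRemover, TextUpperCaser, Collector) are hypothetical and exist only in this example; the ready-made filters that PageMixer ships may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical filter chain for illustration; NOT the PageMixer API.
final class FilterSketch {

    enum Kind { START_TAG, END_TAG, TEXT, COMMENT }

    static final class Token {
        final Kind kind;
        final String value;

        Token(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    interface Consumer {
        void consume(Token token);
    }

    /** A Filter is a Consumer that provides tokens to another Consumer. */
    static abstract class Filter implements Consumer {
        protected final Consumer next;

        Filter(Consumer next) {
            this.next = next;
        }
    }

    /** Passes every token on unchanged, except comments, which are dropped. */
    static final class CommentRemover extends Filter {
        CommentRemover(Consumer next) {
            super(next);
        }

        @Override
        public void consume(Token token) {
            if (token.kind != Kind.COMMENT) {
                next.consume(token);
            }
        }
    }

    /** Replaces each text token with an upper-cased copy. */
    static final class TextUpperCaser extends Filter {
        TextUpperCaser(Consumer next) {
            super(next);
        }

        @Override
        public void consume(Token token) {
            if (token.kind == Kind.TEXT) {
                next.consume(new Token(Kind.TEXT, token.value.toUpperCase(Locale.ROOT)));
            } else {
                next.consume(token);
            }
        }
    }

    /** Final consumer: remembers what reached the end of the chain. */
    static final class Collector implements Consumer {
        final List<String> seen = new ArrayList<>();

        @Override
        public void consume(Token token) {
            seen.add(token.kind + ":" + token.value);
        }
    }

    static List<String> run() {
        Collector sink = new Collector();
        // Roughly: tokens | CommentRemover | TextUpperCaser | Collector
        Consumer pipeline = new CommentRemover(new TextUpperCaser(sink));
        pipeline.consume(new Token(Kind.START_TAG, "tr"));
        pipeline.consume(new Token(Kind.COMMENT, " ==== "));
        pipeline.consume(new Token(Kind.START_TAG, "th"));
        pipeline.consume(new Token(Kind.TEXT, "title"));
        pipeline.consume(new Token(Kind.END_TAG, "th"));
        return sink.seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each filter knows only the next consumer in the chain, so filters can be reordered or replaced without touching the others.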

Producer

A token sequence is processed by a "Consumer" or "Filter". The component that initially provides the token sequence to them is called a "Producer".

A "Producer" must have a function that provides its specific token sequence to a given "Consumer" (or a "Filter" acting as a "Consumer"), but the mechanism for producing the token sequence depends on the "Producer" implementation.

You can create a "Producer" that provides a token sequence that is dynamically generated, deserialized from a DBMS, or even infinite (though the last may not be useful).
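The following sketch shows two producers behind one interface: one replays a fixed token list, the other generates table rows on the fly. To keep it short, tokens are plain strings here. The names (ProducerSketch, Producer, produceTo, TokenConsumer, FixedProducer, RowProducer) are assumptions for this illustration only and are not the PageMixer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical producer interface and implementations; NOT the PageMixer API.
// Simplification: each token is represented by its source text as a String.
final class ProducerSketch {

    interface TokenConsumer {
        void consume(String token);
    }

    /** Provides its specific token sequence to the given consumer. */
    interface Producer {
        void produceTo(TokenConsumer consumer);
    }

    /** Replays a token sequence that was prepared in advance. */
    static final class FixedProducer implements Producer {
        private final List<String> tokens;

        FixedProducer(List<String> tokens) {
            this.tokens = new ArrayList<>(tokens);
        }

        @Override
        public void produceTo(TokenConsumer consumer) {
            for (String token : tokens) {
                consumer.consume(token);
            }
        }
    }

    /** Generates the tokens of a number of table rows dynamically. */
    static final class RowProducer implements Producer {
        private final int rows;

        RowProducer(int rows) {
            this.rows = rows;
        }

        @Override
        public void produceTo(TokenConsumer consumer) {
            for (int i = 1; i <= rows; i++) {
                consumer.consume("<tr>");
                consumer.consume("<td>");
                consumer.consume("row " + i);
                consumer.consume("</td>");
                consumer.consume("</tr>");
            }
        }
    }

    /** Connects a producer to a consumer that concatenates the tokens. */
    static String render(Producer producer) {
        StringBuilder out = new StringBuilder();
        producer.produceTo(token -> out.append(token));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new FixedProducer(Arrays.asList("<h2>", "Title", "</h2>"))));
        System.out.println(render(new RowProducer(2)));
    }
}
```

The consumer side does not need to know which kind of producer it is connected to, since both deliver tokens through the same call.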

Consumer context

In a web-based system, HTML pages include some dynamic values. These are determined at execution time, because they come from user input or from the result of processing user input.

So, consumers and filters need a way to get user input or the result of processing it.

They also need a way to communicate with each other.

Of course, filters communicate with each other via the token sequence, but the token sequence corresponds to the FTP DATA channel or the HTTP BODY. They need another way, corresponding to the FTP CONTROL channel or the HTTP HEADER, to exchange information other than tokens (DATA/BODY).

So, the concept of "Consumer Context" is introduced.

Consumers and filters are given a "Consumer Context" when they are requested to process tokens, and the "Consumer Context" provides functions to set/unset/get values, like "java.util.Map".

Consumers and filters can get user input or the result of processing it, and exchange information with each other, via the "Consumer Context".

Where values are stored and how they are identified depend on the "Consumer Context" implementation. For example, in a Servlet environment, a "Consumer Context" may store values in the attribute storage space of HttpRequest or HttpSession, and may get them from there or from the parameters of HttpRequest.
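As a minimal sketch of the set/unset/get idea, the code below backs a context with a plain java.util.HashMap instead of Servlet storage. A filter reads a user name from the context (a dynamic value) and also records how many substitutions it made (information for other filters). The names (ContextSketch, ConsumerContext, MapContext, Substituter) and the keys "userName" and "replacedCount" are assumptions for this illustration and are not the PageMixer API; tokens are again simplified to strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer context for illustration; NOT the PageMixer API.
final class ContextSketch {

    /** Map-like side channel that accompanies the token sequence. */
    interface ConsumerContext {
        Object get(String key);
        void set(String key, Object value);
        void unset(String key);
    }

    /** Stores values in memory; a Servlet-based one could use request attributes. */
    static final class MapContext implements ConsumerContext {
        private final Map<String, Object> values = new HashMap<>();

        @Override
        public Object get(String key) {
            return values.get(key);
        }

        @Override
        public void set(String key, Object value) {
            values.put(key, value);
        }

        @Override
        public void unset(String key) {
            values.remove(key);
        }
    }

    /** Consumers receive the context together with each token. */
    interface Consumer {
        void consume(ConsumerContext context, String token);
    }

    /** Replaces the placeholder token "$userName" with the value from the context. */
    static final class Substituter implements Consumer {
        private final Consumer next;

        Substituter(Consumer next) {
            this.next = next;
        }

        @Override
        public void consume(ConsumerContext context, String token) {
            if ("$userName".equals(token)) {
                // Leave a note for other consumers: how many values were filled in.
                Object count = context.get("replacedCount");
                context.set("replacedCount", count == null ? 1 : ((Integer) count) + 1);
                next.consume(context, String.valueOf(context.get("userName")));
            } else {
                next.consume(context, token);
            }
        }
    }

    static String render(ConsumerContext context, String... tokens) {
        StringBuilder out = new StringBuilder();
        Consumer pipeline = new Substituter((ctx, token) -> out.append(token));
        for (String token : tokens) {
            pipeline.consume(context, token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ConsumerContext context = new MapContext();
        context.set("userName", "Alice"); // stands in for a value from user input
        System.out.println(render(context, "<p>", "Hello, ", "$userName", "</p>"));
        System.out.println(context.get("replacedCount"));
    }
}
```

The token sequence itself carries only the page content, while the context carries the values around it, in the same way that a header accompanies a body.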

