MAP | PageMixer Documents > Design note > Symbol comparison |
This section explains
the reduction of String
comparison cost.
In this document, abbreviated class names are used. Complete names are shown below.
Notation | Full name |
---|---|
Attribute | jp.ne.dti.lares.foozy.pagemixer.Attribute |
HTMLSymbolSet | jp.ne.dti.lares.foozy.pagemixer.HTMLSymbolSet |
Symbol | jp.ne.dti.lares.foozy.pagemixer.Symbol |
SymbolSet | jp.ne.dti.lares.foozy.pagemixer.SymbolSet |
Token | jp.ne.dti.lares.foozy.pagemixer.Token |
PageMixer needs to examine equivalence between two strings
as names of Token
or Attribute
.
Even though "String#equals(Object)
" may
be used to do so ordinarily,
it costs very much
because it consists of below steps if specified object is not itself
according to the source file of String
class in JDK.
String
-ness of given object by
instanceof
Object
to String
In fact, (1)instanceof
and (2)downcast are
not so reasonable in the least.
NOTE:Downcast to a class
on the heels of instanceof
with same one may not cost so much
than alone downcast.
Optimization may skip class check of succeeding (down)cast.
Because calculation order of (4)step by step character comparison
is in propotion to length of string "O(n)",
some other ways shown below also cost
as much as String#equals(Object)
.
String#equalsIgnoreCase(String)
String#compareTo(String)
String#compareToIgnoreCase(String)
Though PageMixer should examine case-insensitively if possible, case-insensitive comparison methods shown above convert to lower letter at execution time. They are not reasonable to process much data.
One of good ways to examine equivalence between two strings quicklly is comparison of references.
But what should refenrce comparison implementation for examination of strings equivalence do when it detect difference between two references to string ?
It costs very much,
if it examines equivalence of string contents by
String#equals
or same function.
But it causes mis-judgment to conclude from difference of references
that two strings are different from each other,
because there is no good reason for difference
between string contents.
Though all strings must be listed up beforehand for above reason, that restriction is not realistic for applicability.
For equivalence examination of two strings by reference comparison,
instead of using String
reference directly,
PageMixer defines "Symbol
" and its derived classes
shown as below.
This class design not only makes comparison between known strings efficient, but also makes heterogeneous comparison of known/unknonw strings possible.
Symbol
defines "isSameAs(Symbol)
"
to examine equivalence with another one,
but it is defined as "abstract".
"Symbol.Resolved#isSameAs(Symbol)
"
is implemented as shown below.
It just examines equivalence by reference comparison.
public boolean isSameAs(Symbol symbol){ return (value_ == symbol.value_); }
The other hand,
"Symbol.Unresolved#isSameAs(Symbol)
"
is implemented as shown below.
It examines equivalence by String#equals(Object)
,
which is not efficient.
public boolean isSameAs(Symbol symbol){ return (value_.equals(symbol.value_)); }
Symbol.Resolved
is used for known string,
and Symbol.Unresolved
is used for unknown one.
Assume that Symbol.Resolved
instances are stored
into SymbolSet
by SymbolSet#add(String)
for known strings beforehand.
By SymbolSet#lookup(String)
,
you can get Symbol.Resolved
as Symbol
if you specify known string to it,
and Symbol.Unresolved
as Symbol
if you specify unknown string.
Comparison between known strings is done by
isSameAs(Symbol)
of
Symbol.Resolved
,
and result of it depends on (reference) equivalence of two strings.
On the other hand,
comparison between known and unknown strings will fail certainly,
because references to String
are different
(for Symbol.Resolved
) and
contents of String
are different
(for String#equals(Object)
in Symbol.Unresolved
).
Moreover,
comparison between unknown strings is done by
Symbol.Unresolved
,
and result of it depends on equivalence of two strings contents.
"this" | argument | |
---|---|---|
known | unknown | |
known | reference comparison | reference comparison (may fail) |
unknown | String#equals(Object) (may fail) |
String#equals(Object) |
You must lookup (SymbolSet#lookup(String)
)
with cased string, which you decide as upper or lower,
if you want compare case-insensitively.
It seems not to be seriouse
even though case conversion costs much,
because looking Symbol
up is done only in preparation phase
not in execution phase.
PageMixer provides "HTMLSymbolSet
",
which is derived class of SymbolSet
,
to get Symbol
for well known HTML tag and attribute names
easily.For example:
// correspond to Symbol a = HTMLSymbolSet.SET.A; // "A" Symbol html = HTMLSymbolSet.SET.HTML; // "HTML" Symbol img = HTMLSymbolSet.SET.IMG; // "IMG" Symbol color = HTMLSymbolSet.SET.COLOR; // "COLOR" Symbol equiv = HTMLSymbolSet.SET.HTTP_EQUIV; // "HTTP-EQUIV"
MAP | PageMixer Documents > Design note > Symbol comparison |