fop 0.93
java.lang.Object
  org.apache.fop.hyphenation.TernaryTree
    org.apache.fop.hyphenation.HyphenationTree
This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It provides methods to hyphenate a word.
Nested Class Summary
Nested classes inherited from class org.apache.fop.hyphenation.TernaryTree
TernaryTree.Iterator
Field Summary
protected TernaryTree classmap
This map stores the character classes.
protected java.util.HashMap stoplist
This map stores hyphenation exceptions.
protected ByteVector vspace
Value space: stores the interletter values.
Fields inherited from class org.apache.fop.hyphenation.TernaryTree
BLOCK_SIZE, eq, freenode, hi, kv, length, lo, root, sc
Constructor Summary
HyphenationTree()
Method Summary
void addClass(java.lang.String chargroup)
Add a character class to the tree.
void addException(java.lang.String word, java.util.ArrayList hyphenatedword)
Add an exception to the tree.
void addPattern(java.lang.String pattern, java.lang.String ivalue)
Add a pattern to the tree.
java.lang.String findPattern(java.lang.String pat)
protected byte[] getValues(int k)
protected int hstrcmp(char[] s, int si, char[] t, int ti)
String compare, returns 0 if equal or t is a substring of s
Hyphenation hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Hyphenate word and return an array of hyphenation points.
Hyphenation hyphenate(java.lang.String word, int remainCharCount, int pushCharCount)
Hyphenate word and return a Hyphenation object.
void loadPatterns(org.xml.sax.InputSource source)
Read hyphenation patterns from an XML file.
void loadPatterns(java.lang.String filename)
Read hyphenation patterns from an XML file.
static void main(java.lang.String[] argv)
protected int packValues(java.lang.String values)
Packs the values by storing them in 4 bits, two values per byte. Values range from 0 to 9.
void printStats()
protected void searchPatterns(char[] word, int index, byte[] il)
Search for all possible partial matches of word starting at index and update interletter values.
protected java.lang.String unpackValues(int k)
Methods inherited from class org.apache.fop.hyphenation.TernaryTree
balance, clone, find, find, init, insert, insert, insertBalanced, keys, knows, size, strcmp, strcmp, strcpy, strlen, strlen, trimToSize
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
protected ByteVector vspace
protected java.util.HashMap stoplist
protected TernaryTree classmap
Constructor Detail
public HyphenationTree()
Method Detail
protected int packValues(java.lang.String values)
Parameters:
values - a string of digits from '0' to '9' representing the interletter values.
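packValues is documented as storing each interletter value in 4 bits, two values per byte. As a rough illustration only, the following standalone sketch packs a digit string into a plain byte array rather than the vspace ByteVector. It assumes (this page does not say) that each value is stored as digit + 1, so that a zero nibble can mark the end of the sequence; the class and method names are hypothetical.

```java
/** Hypothetical standalone sketch of nibble packing for interletter values (not FOP's code). */
final class NibblePackingSketch {

    /**
     * Packs a string of digits '0'..'9' into bytes, two values per byte,
     * high nibble first. Each value is stored as digit + 1 (an assumption),
     * so a zero nibble can terminate the sequence.
     */
    static byte[] pack(String values) {
        int n = values.length();
        // n nibbles plus one zero terminator nibble, rounded up to whole bytes
        byte[] out = new byte[n / 2 + 1];
        for (int i = 0; i < n; i++) {
            char c = values.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not a digit: " + c);
            }
            int v = (c - '0' + 1) & 0x0f;
            if ((i & 1) == 0) {
                out[i >> 1] = (byte) (v << 4);          // even index: high nibble
            } else {
                out[i >> 1] = (byte) (out[i >> 1] | v); // odd index: low nibble
            }
        }
        return out; // any unused trailing nibbles stay 0 and act as the terminator
    }

    /** Reverses pack(): reads nibbles until the first zero nibble. */
    static String unpack(byte[] packed) {
        StringBuilder sb = new StringBuilder();
        for (byte b : packed) {
            int hi = (b >> 4) & 0x0f;
            if (hi == 0) {
                break;
            }
            sb.append((char) ('0' + hi - 1));
            int lo = b & 0x0f;
            if (lo == 0) {
                break;
            }
            sb.append((char) ('0' + lo - 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("00300");
        System.out.println(packed.length + " bytes -> " + unpack(packed));
    }
}
```

Under that assumption, a string of n values occupies n / 2 + 1 bytes (integer division), roughly half a byte per value.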
protected java.lang.String unpackValues(int k)
public void loadPatterns(java.lang.String filename) throws HyphenationException
Parameters:
filename - the filename
Throws:
HyphenationException - in case the parsing fails
public void loadPatterns(org.xml.sax.InputSource source) throws HyphenationException
Parameters:
source - the InputSource for the file
Throws:
HyphenationException - in case the parsing fails
public java.lang.String findPattern(java.lang.String pat)
protected int hstrcmp(char[] s, int si, char[] t, int ti)
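The summary says hstrcmp returns 0 if the strings are equal or t is a substring of s. This sketch assumes both arrays are null-terminated (as the word passed to searchPatterns is) and reads "substring" as "t matches s starting at si", that is, t is a prefix of s[si..]. The class name and the nt helper are hypothetical.

```java
/** Hypothetical sketch of an hstrcmp-style comparison on null-terminated char arrays. */
final class HstrcmpSketch {

    /**
     * Compares s[si..] with t[ti..], both terminated by '\0'.
     * Returns 0 if they are equal or if t ends before any mismatch
     * (t is a prefix of s[si..]); otherwise returns the difference
     * of the first mismatching characters.
     */
    static int hstrcmp(char[] s, int si, char[] t, int ti) {
        for (; s[si] == t[ti]; si++, ti++) {
            if (s[si] == 0) {
                return 0; // both ended at the same point: equal
            }
        }
        if (t[ti] == 0) {
            return 0;     // t ended first: t is a prefix of s[si..]
        }
        return s[si] - t[ti];
    }

    /** Hypothetical helper that appends the '\0' terminator the comparison relies on. */
    static char[] nt(String str) {
        return (str + '\0').toCharArray();
    }

    public static void main(String[] args) {
        System.out.println(hstrcmp(nt("hyphenation"), 0, nt("hyph"), 0)); // 0: t is a prefix
        System.out.println(hstrcmp(nt("hyph"), 0, nt("hyph"), 0));        // 0: equal
        System.out.println(hstrcmp(nt("hyph"), 0, nt("hyphen"), 0));      // -101: s ends first
    }
}
```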
protected byte[] getValues(int k)
protected void searchPatterns(char[] word, int index, byte[] il)
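Search for all possible partial matches of word starting at index and update interletter values. In other words, it does something like the following, where patterns and updateInterletterValues are illustrative names:
    for (i = 0; i < patterns.length; i++) {
        if (word.substring(index).startsWith(patterns[i]))
            updateInterletterValues(patterns[i]);
    }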
But it is done efficiently, because the patterns are
stored in a ternary tree. In fact, this is the whole purpose
of having the tree: doing this search without having to test
every single pattern. The number of patterns for languages
such as English ranges from 4000 to 10000, so doing thousands
of string comparisons for each word to hyphenate would be
really slow without the tree. The tradeoff is memory, but
using a ternary tree instead of a trie almost halves the
memory used by Lout or TeX. It is also faster than using
a hash table.
Parameters:
word - null-terminated word to match
index - start index from word
il - interletter values array to update
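The paragraph above explains why the patterns live in a ternary tree: one walk down the tree from a given start index reaches every stored pattern that is a prefix of the remaining word. The sketch below is a minimal illustration of that idea, not FOP's implementation. It uses object nodes instead of the arrays inherited from TernaryTree, a String instead of a null-terminated char array, and made-up demo patterns. The class name and the interletterValues helper are hypothetical, and merging by taking the maximum value at each gap follows Liang's pattern scheme, which this page does not spell out.

```java
import java.util.Arrays;

/** Hypothetical sketch: patterns in a ternary search tree, searched from each start index. */
final class PatternTreeSketch {

    /** A ternary search tree node; values is non-null where a pattern ends. */
    private static final class Node {
        final char c;
        Node lo, eq, hi;
        byte[] values;

        Node(char c) {
            this.c = c;
        }
    }

    private Node root;

    /** Stores a pattern; ivalue holds letters.length() + 1 digits, one per gap. */
    void addPattern(String letters, String ivalue) {
        if (letters.isEmpty() || ivalue.length() != letters.length() + 1) {
            throw new IllegalArgumentException("need n letters and n + 1 digits");
        }
        byte[] values = new byte[ivalue.length()];
        for (int i = 0; i < values.length; i++) {
            values[i] = (byte) (ivalue.charAt(i) - '0');
        }
        root = insert(root, letters, 0, values);
    }

    private Node insert(Node node, String key, int i, byte[] values) {
        char c = key.charAt(i);
        if (node == null) {
            node = new Node(c);
        }
        if (c < node.c) {
            node.lo = insert(node.lo, key, i, values);
        } else if (c > node.c) {
            node.hi = insert(node.hi, key, i, values);
        } else if (i + 1 < key.length()) {
            node.eq = insert(node.eq, key, i + 1, values);
        } else {
            node.values = values;
        }
        return node;
    }

    /** Raises il[index + j] to the j-th value of every pattern matching word at index. */
    void searchPatterns(String word, int index, byte[] il) {
        Node node = root;
        int i = index;
        while (node != null && i < word.length()) {
            char c = word.charAt(i);
            if (c < node.c) {
                node = node.lo;
            } else if (c > node.c) {
                node = node.hi;
            } else {
                if (node.values != null) {
                    // word[index..i] is a stored pattern: keep the larger value at each gap
                    for (int j = 0; j < node.values.length; j++) {
                        il[index + j] = (byte) Math.max(il[index + j], node.values[j]);
                    }
                }
                node = node.eq;
                i++;
            }
        }
    }

    /** Interletter values for a whole word; il[k] is the gap before word.charAt(k). */
    byte[] interletterValues(String word) {
        byte[] il = new byte[word.length() + 1];
        for (int index = 0; index < word.length(); index++) {
            searchPatterns(word, index, il);
        }
        return il;
    }

    public static void main(String[] args) {
        PatternTreeSketch tree = new PatternTreeSketch();
        // Made-up demo patterns, written as letters plus one digit per gap.
        tree.addPattern("hyph", "00300");
        tree.addPattern("hen", "0020");
        tree.addPattern("hena", "00004");
        tree.addPattern("henat", "000500");
        tree.addPattern("na", "100");
        tree.addPattern("nat", "0200");
        tree.addPattern("tio", "1000");
        tree.addPattern("io", "200");
        System.out.println(Arrays.toString(tree.interletterValues("hyphenation")));
    }
}
```

For these demo patterns the program prints [0, 0, 3, 0, 0, 2, 5, 4, 2, 0, 0, 0]. In Liang's scheme, odd values mark permitted breaks, which here gives hy-phen-ation.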
public Hyphenation hyphenate(java.lang.String word, int remainCharCount, int pushCharCount)
Parameters:
word - the word to be hyphenated
remainCharCount - Minimum number of characters allowed before the hyphenation point.
pushCharCount - Minimum number of characters allowed after the hyphenation point.
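The two counts restrict where a break may fall. A minimal sketch, assuming interletter values in which il[k] belongs to the gap before character k and odd values permit a break (Liang's convention), and reading each count as the number of characters on that side of the break. The class, the method names, and the sample values are hypothetical, and FOP's own boundary handling may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: choosing break points from interletter values and the two counts. */
final class BreakPointSketch {

    /**
     * Returns every position k (0 < k < word length) where il[k] is odd,
     * at least remainCharCount characters precede the break, and at least
     * pushCharCount characters follow it. il.length is word length + 1.
     */
    static List<Integer> breakPoints(byte[] il, int remainCharCount, int pushCharCount) {
        int wordLength = il.length - 1;
        List<Integer> points = new ArrayList<>();
        for (int k = 1; k < wordLength; k++) {
            boolean odd = (il[k] & 1) == 1;
            if (odd && k >= remainCharCount && wordLength - k >= pushCharCount) {
                points.add(k);
            }
        }
        return points;
    }

    /** Inserts '-' at the given positions, which must be in ascending order. */
    static String mark(String word, List<Integer> points) {
        StringBuilder sb = new StringBuilder(word);
        for (int i = points.size() - 1; i >= 0; i--) {
            sb.insert(points.get(i), '-'); // right to left, so earlier positions stay valid
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "hyphenation";
        // Sample interletter values for "hyphenation", made up for this demo.
        byte[] il = {0, 0, 3, 0, 0, 2, 5, 4, 2, 0, 0, 0};
        System.out.println(mark(word, breakPoints(il, 2, 3))); // hy-phen-ation
        System.out.println(mark(word, breakPoints(il, 3, 3))); // hyphen-ation
    }
}
```

Raising remainCharCount from 2 to 3 drops the break after "hy", because only two characters would remain before it.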
Returns:
a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
public Hyphenation hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Parameters:
w - char array that contains the word
offset - offset to the first character in the word
len - length of the word
remainCharCount - Minimum number of characters allowed before the hyphenation point.
pushCharCount - Minimum number of characters allowed after the hyphenation point.
Returns:
a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated.
public void addClass(java.lang.String chargroup)
Add a character class to the tree. It is used by
PatternParser as a callback to
add character classes. Character classes define the
valid word characters for hyphenation. If a word contains
a character not defined in any of the classes, it is not hyphenated.
Character classes also define a way to normalize the characters in order
to compare them with the stored patterns. Usually pattern
files use only lower case characters; in this case a class
for the letter 'a', for example, should be defined as "aA", the first
character being the normalization char.
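A minimal sketch of the normalization described above, using a HashMap in place of the classmap TernaryTree field. The class and method names are hypothetical; the sketch only shows the mapping from each character of a group to the group's first character and the "not hyphenated" outcome for unknown characters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of character classes used to normalize words before pattern lookup. */
final class CharClassSketch {

    // Maps every character of a class to the class's first (normalization) character.
    private final Map<Character, Character> classmap = new HashMap<>();

    /** Adds a character group such as "aA"; the first character is the normalized form. */
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char normalized = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            classmap.put(chargroup.charAt(i), normalized);
        }
    }

    /**
     * Normalizes a word for pattern lookup, or returns null if the word
     * contains a character outside every class (so it is not hyphenated).
     */
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character n = classmap.get(word.charAt(i));
            if (n == null) {
                return null;
            }
            sb.append(n.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharClassSketch classes = new CharClassSketch();
        for (char c = 'a'; c <= 'z'; c++) {
            classes.addClass("" + c + Character.toUpperCase(c)); // "aA", "bB", ...
        }
        System.out.println(classes.normalize("Hyphenation")); // hyphenation
        System.out.println(classes.normalize("co-op"));       // null: '-' is in no class
    }
}
```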
Specified by:
addClass in interface PatternConsumer
Parameters:
chargroup - character group
public void addException(java.lang.String word, java.util.ArrayList hyphenatedword)
Add an exception to the tree. It is used by the
PatternParser class as a callback to
store the hyphenation exceptions.
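Exceptions give the break points for a word directly, without consulting the patterns. A sketch, assuming exceptions are kept in a map from normalized word to the alternating list (as the stoplist HashMap field suggests). It uses a hypothetical HyphenMarker class in place of FOP's hyphen objects, and the class and method names are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an exception list (stoplist) consulted before the patterns. */
final class ExceptionListSketch {

    /** Hypothetical stand-in for a hyphen object; FOP's own class is not reproduced here. */
    static final class HyphenMarker {
    }

    static final HyphenMarker HYPHEN = new HyphenMarker();

    private final Map<String, List<Object>> stoplist = new HashMap<>();

    /** Stores a normalized word with its alternating string / hyphen-marker list. */
    void addException(String word, List<Object> hyphenatedword) {
        stoplist.put(word, hyphenatedword);
    }

    /** Returns break positions from the exception list, or null if the word has no entry. */
    List<Integer> exceptionPoints(String word) {
        List<Object> parts = stoplist.get(word);
        if (parts == null) {
            return null; // a full hyphenator would fall back to the patterns here
        }
        List<Integer> points = new ArrayList<>();
        int pos = 0;
        for (Object part : parts) {
            if (part instanceof String) {
                pos += ((String) part).length(); // advance past this fragment
            } else if (part instanceof HyphenMarker) {
                points.add(pos);                 // break allowed after pos characters
            }
        }
        return points;
    }

    public static void main(String[] args) {
        ExceptionListSketch ex = new ExceptionListSketch();
        ex.addException("present", Arrays.asList("pre", HYPHEN, "sent"));
        System.out.println(ex.exceptionPoints("present")); // [3]
        System.out.println(ex.exceptionPoints("table"));   // null
    }
}
```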
Specified by:
addException in interface PatternConsumer
Parameters:
word - normalized word
hyphenatedword - a list of alternating strings and hyphen objects.
public void addPattern(java.lang.String pattern, java.lang.String ivalue)
Add a pattern to the tree. It is mainly used by the
PatternParser class as a callback to
add a pattern to the tree.
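The parameter list separates the pattern letters from the digit string ivalue. The sketch below shows one way such a split could be done for a TeX-style pattern such as hy3ph, where a digit sits in the gap it weights and a missing digit means 0. This is an assumption about the input format, not FOP's PatternParser; the class and method names are hypothetical.

```java
/** Hypothetical sketch: splitting a TeX-style pattern such as "hy3ph" into letters and digits. */
final class PatternSplitSketch {

    /** Result of a split: the letters and one digit per gap (letters.length() + 1 digits). */
    static final class Split {
        final String letters;
        final String ivalue;

        Split(String letters, String ivalue) {
            this.letters = letters;
            this.ivalue = ivalue;
        }
    }

    static Split split(String texPattern) {
        StringBuilder letters = new StringBuilder();
        StringBuilder ivalue = new StringBuilder();
        char pending = '0'; // digit for the gap before the next letter; '0' if none was given
        for (int i = 0; i < texPattern.length(); i++) {
            char c = texPattern.charAt(i);
            if (c >= '0' && c <= '9') {
                pending = c;
            } else {
                ivalue.append(pending); // close the gap before this letter
                letters.append(c);
                pending = '0';
            }
        }
        ivalue.append(pending); // gap after the last letter
        return new Split(letters.toString(), ivalue.toString());
    }

    public static void main(String[] args) {
        Split s = split("hy3ph");
        System.out.println(s.letters + " " + s.ivalue); // hyph 00300
        s = split("1na");
        System.out.println(s.letters + " " + s.ivalue); // na 100
    }
}
```

Each resulting pair has n letters and n + 1 digits, matching the "only digit characters" requirement on ivalue.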
Specified by:
addPattern in interface PatternConsumer
Parameters:
pattern - the hyphenation pattern
ivalue - interletter weight values indicating the
desirability and priority of hyphenating at a given point
within the pattern. It should contain only digit characters
('0' to '9').
public void printStats()
Overrides:
printStats in class TernaryTree
public static void main(java.lang.String[] argv) throws java.lang.Exception
Throws:
java.lang.Exception