scone.util
Class DocumentParser
java.lang.Object
scone.util.DocumentParser
public class DocumentParser
- extends java.lang.Object
Transforms tokens into database objects.
HtmlTokens that represent links are transformed into LinkToken
objects.
The following keys and values are added to the meta data:
Key | Value
"baseNode" | the NetNode
"htmlDocument" | the HtmlNode
- Author:
- Harald Weinreich, Volkert Buchmann
Constructor Summary
DocumentParser(int requirements)
Creates the initial instance.
DocumentParser(int requirements, boolean showRequirements)
Creates the initial instance, optionally displaying the requirements.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
COPYRIGHT
public static final java.lang.String COPYRIGHT
- See Also:
- Constant Field Values
CONSIDERLINKS
public static final int CONSIDERLINKS
- See Also:
- Constant Field Values
CONSIDERINCLUSIONS
public static final int CONSIDERINCLUSIONS
- See Also:
- Constant Field Values
PARSEDOCUMENT
public static final int PARSEDOCUMENT
- See Also:
- Constant Field Values
CONSIDERKEYWORDS
public static final int CONSIDERKEYWORDS
- See Also:
- Constant Field Values
SAVEBODYTEXT
public static final int SAVEBODYTEXT
- See Also:
- Constant Field Values
SAVESOURCECODE
public static final int SAVESOURCECODE
- See Also:
- Constant Field Values
CALCFINGERPRINT
public static final int CALCFINGERPRINT
- See Also:
- Constant Field Values
POSTDATA
public static final int POSTDATA
- See Also:
- Constant Field Values
MAX_BODYTEXT
public static final int MAX_BODYTEXT
- See Also:
- Constant Field Values
MAX_SOURCECODE
public static final int MAX_SOURCECODE
- See Also:
- Constant Field Values
DocumentParser
public DocumentParser(int requirements)
- Creates the initial instance.
DocumentParser
public DocumentParser(int requirements,
boolean showRequirements)
- Creates the initial instance.
- Parameters:
requirements - a bit array of requirement flags. See scone.Plugin for more information.
showRequirements - whether the requirements should be displayed.
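Because the requirements parameter is described as a bit array, a value is presumably built by OR-ing the flag constants of this class together. The following is a minimal sketch of that idea, not the library's code: the class name RequirementFlagsSketch, the helper isSet, and the numeric values of the three constants are assumptions made for illustration. The real values are listed under Constant Field Values and may differ.

```java
public class RequirementFlagsSketch {
    // Hypothetical bit values, for illustration only. The real constants
    // live in scone.util.DocumentParser and may have different values.
    static final int CONSIDERLINKS = 1 << 0;
    static final int CONSIDERINCLUSIONS = 1 << 1;
    static final int PARSEDOCUMENT = 1 << 2;

    /** Returns true if every bit of flag is set in requirements. */
    static boolean isSet(int requirements, int flag) {
        return (requirements & flag) == flag;
    }

    public static void main(String[] args) {
        // Combine two requirements into one bit array with bitwise OR.
        int requirements = PARSEDOCUMENT | CONSIDERLINKS;

        System.out.println(isSet(requirements, CONSIDERLINKS));      // true
        System.out.println(isSet(requirements, CONSIDERINCLUSIONS)); // false

        // With the real class on the classpath, the combined value would be
        // passed to the constructor, e.g.:
        // new DocumentParser(requirements, true);
    }
}
```

Testing a flag with `(requirements & flag) == flag` rather than `!= 0` also gives the right answer if a flag ever spans more than one bit.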
parse
public void parse(TokenInputStream in,
TokenOutputStream out)
- Parses the document and collects data for the NetNode and HtmlNode objects, such as the number of links,
the number of images, and the language.
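The counting part of parse can be pictured as a single pass over the token stream. The sketch below is only an illustration under stated assumptions: it replaces TokenInputStream and TokenOutputStream with plain lists, and the Token and Stats classes are hypothetical stand-ins, not scone types. It also assumes each token is forwarded to the output after being inspected, which the description above does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParseCountSketch {
    /** Hypothetical stand-in for an HTML token; not a scone class. */
    static final class Token {
        final String tagName; // lower-case tag name, or null for plain text

        Token(String tagName) {
            this.tagName = tagName;
        }
    }

    /** Hypothetical holder for the collected document data. */
    static final class Stats {
        int links;
        int images;
    }

    /** Reads every token once, counts links and images, and forwards the token. */
    static Stats parse(List<Token> in, List<Token> out) {
        Stats stats = new Stats();
        for (Token token : in) {
            if ("a".equals(token.tagName)) {
                stats.links++;
            } else if ("img".equals(token.tagName)) {
                stats.images++;
            }
            out.add(token); // assumption: tokens are passed on to the output
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Token> in = Arrays.asList(
                new Token("a"), new Token(null), new Token("img"), new Token("a"));
        List<Token> out = new ArrayList<>();
        Stats stats = parse(in, out);
        System.out.println(stats.links + " links, " + stats.images + " images");
    }
}
```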