maito.util
Class Tools

java.lang.Object
  extended by maito.util.Tools

public class Tools
extends java.lang.Object

A class that contains static utility methods. It could be optimized, at some cost to readability, by compiling the regular-expression patterns once as static fields instead of on every call.

Version:
1.0
Author:
Antti Laitinen, Väinö Ala-Härkönen, Tuomas Tanner

Field Summary
static int ACTOR_CITESEER
          Person name format "Firstname M. Surname", several persons separated with a comma
static int ACTOR_OTHER
          Person name format "Surname, Firstname M.", one actor per field
static java.lang.String DATASOURCE_PARAM_LOCATION
           
static java.lang.String DATASOURCE_PARAM_UPDATED
           
static java.lang.String PATH_DBCONFIG
          The full path for the dbconfig.properties file
static java.lang.String PATH_RAWDATA_SQL
          The full path for the resource graph file
static java.lang.String PATH_RESOURCEGRAPH_SQL
          The full path for the dbconfig.properties file
static java.lang.String RESOURCENET_PATH
          The full path for the dbconfig.properties file
 
Method Summary
static java.lang.String[] canonizeActor(java.lang.String str, int dataType)
          Canonizes an Actor string according to the specifications.
static java.lang.String canonizeDate(java.lang.String str)
          Canonizes / normalizes a Date String according to the specifications.
static java.lang.String canonizeGeneric(java.lang.String str)
          Canonizes a String without any special heuristics or transformation.
static java.lang.String canonizeLang(java.lang.String str)
          Canonizes / normalizes a Language String according to the specifications.
static java.lang.String canonizeTitle(java.lang.String str)
          Canonizes a Title String according to the specifications
static java.lang.String categorizeIdentifier(java.lang.String str)
          Categorizes an Identifier String according to the specifications
static java.util.Properties loadProperties(java.lang.String filename)
           
static java.lang.String readFile(java.lang.String fileName)
          Reads a text file created in the UTF-8 charset into a string
static boolean saveFile(java.lang.String fileName, java.lang.String contents, boolean append)
          Saves a text file in the UTF-8 character encoding
static java.lang.String[] splitActor(java.lang.String str, int dataType)
          Splits an Actor string into an array of substrings.
static java.lang.String[] splitPerson(java.lang.String str)
          Splits a String of "Henkilö"-type Actors into an array of substrings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PATH_DBCONFIG

public static final java.lang.String PATH_DBCONFIG
The full path for the dbconfig.properties file


PATH_RESOURCEGRAPH_SQL

public static final java.lang.String PATH_RESOURCEGRAPH_SQL
The full path for the dbconfig.properties file


PATH_RAWDATA_SQL

public static final java.lang.String PATH_RAWDATA_SQL
The full path for the resource graph file


RESOURCENET_PATH

public static final java.lang.String RESOURCENET_PATH
The full path for the dbconfig.properties file


ACTOR_CITESEER

public static final int ACTOR_CITESEER
Person name format "Firstname M. Surname", several persons separated with a comma

See Also:
Constant Field Values

ACTOR_OTHER

public static final int ACTOR_OTHER
Person name format "Surname, Firstname M.", one actor per field

See Also:
Constant Field Values

DATASOURCE_PARAM_LOCATION

public static final java.lang.String DATASOURCE_PARAM_LOCATION
See Also:
Constant Field Values

DATASOURCE_PARAM_UPDATED

public static final java.lang.String DATASOURCE_PARAM_UPDATED
See Also:
Constant Field Values
Method Detail

canonizeGeneric

public static java.lang.String canonizeGeneric(java.lang.String str)
Canonizes a String without any special heuristics or transformation:
1. Collapse each run of one or more consecutive whitespace characters into a single space.
2. Remove all characters except A-Z a-z 0-9 , . / - : ~ (Note: this means characters like å, ä, ö are removed too, which IS according to the specs, not a bug.)
3. Convert to upper case.

Parameters:
str - The original string
Returns:
The canonized version of the string - null if the original String was null
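
The three steps above can be sketched as follows. This is a minimal illustration and not the actual Tools source: the class name CanonizeGenericSketch is hypothetical, keeping the space character in step 2 is an assumption (the allowed-character list does not mention it, but step 1 produces spaces), and no trimming is done because the description does not mention any. The patterns are compiled once as static fields, in the spirit of the optimization mentioned in the class description.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of the documented canonizeGeneric steps.
class CanonizeGenericSketch {
    // Step 1: any run of whitespace.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");
    // Step 2: everything outside A-Z a-z 0-9 , . / - : ~
    // Assumption: the space character is also kept.
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9,./:~ -]");

    static String canonizeGeneric(String str) {
        if (str == null) {
            return null; // documented: null in, null out
        }
        String s = WHITESPACE_RUN.matcher(str).replaceAll(" ");
        s = DISALLOWED.matcher(s).replaceAll("");
        // Step 3: upper case, locale-independent.
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(canonizeGeneric("Hello,   w\u00f6rld!"));
    }
}
```

With this sketch, an input such as "Hello,   wörld!" becomes "HELLO, WRLD": the whitespace run collapses, "ö" and "!" are dropped, and the rest is upper-cased.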

canonizeActor

public static java.lang.String[] canonizeActor(java.lang.String str,
                                               int dataType)
Canonizes an Actor string according to the specifications.

Parameters:
str - A String containing one or more actors
dataType - Type of Actor data to be canonized, use ACTOR_-constants
Returns:
A String array: the first String specifies type of the actor(s) according to specifications ("Organisaatio"/"Henkilö"/"Joku"). The rest of the Strings are names of the actors. Value null is returned if original String is null.

canonizeDate

public static java.lang.String canonizeDate(java.lang.String str)
Canonizes / normalizes a Date String according to the specifications. The date format used is ISO 8601 (http://www.w3.org/TR/NOTE-datetime), but only the YYYY-MM-DD part of it is saved.

Parameters:
str - A String containing a date
Returns:
The canonized version of the String, null if date not valid
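
One way to obtain the YYYY-MM-DD part of an ISO 8601 string is sketched below. It is an illustration only, not the Tools implementation: the class name CanonizeDateSketch is hypothetical, and rejecting the shorter forms (YYYY and YYYY-MM) is an assumption, since the documentation does not say how they are handled. Anything after the first ten characters is ignored without being checked.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical sketch: keep only the calendar date of an ISO 8601 string.
class CanonizeDateSketch {
    static String canonizeDate(String str) {
        if (str == null) {
            return null;
        }
        String s = str.trim();
        if (s.length() < 10) {
            return null; // assumption: YYYY and YYYY-MM alone are rejected
        }
        try {
            // LocalDate.parse expects YYYY-MM-DD and rejects impossible
            // dates such as February 30.
            return LocalDate.parse(s.substring(0, 10)).toString();
        } catch (DateTimeParseException e) {
            return null; // documented: null if the date is not valid
        }
    }

    public static void main(String[] args) {
        System.out.println(canonizeDate("2005-03-14T10:15:30+02:00"));
    }
}
```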

canonizeLang

public static java.lang.String canonizeLang(java.lang.String str)
Canonizes / normalizes a Language String according to the specifications. Language code format used is 2- or 3-character ISO639 without additional identifiers. Validity of the code is not checked, just the format.

Parameters:
str - A String containing a language identifier
Returns:
The canonized version of the String or null if format not valid
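
A format-only check of this kind could look like the sketch below. The class name CanonizeLangSketch is hypothetical, and two details are assumptions rather than documented behaviour: the result is lower-cased, and strings carrying additional identifiers (for example "en-US") are rejected rather than shortened.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch: accept 2- or 3-letter codes, check format only.
class CanonizeLangSketch {
    private static final Pattern LANG_CODE = Pattern.compile("[A-Za-z]{2,3}");

    static String canonizeLang(String str) {
        if (str == null) {
            return null;
        }
        String s = str.trim();
        if (!LANG_CODE.matcher(s).matches()) {
            return null; // documented: null if the format is not valid
        }
        // Assumption: lower case is used as the canonical form.
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(canonizeLang("FI"));
        // "zz" is not a known language, but only the format is checked.
        System.out.println(canonizeLang("zz"));
    }
}
```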

canonizeTitle

public static java.lang.String canonizeTitle(java.lang.String str)
Canonizes a Title String according to the specifications

Parameters:
str - A String containing a title
Returns:
The canonized version of the String or null if the String was null

categorizeIdentifier

public static java.lang.String categorizeIdentifier(java.lang.String str)
Categorizes an Identifier String according to the specifications

Parameters:
str - The identifier to be categorized
Returns:
The type of identifier that the heuristic assumed. The name of the type is returned in the program's common atomic statement property format - see specifications. If the identifier String is null, returns null.

splitActor

public static java.lang.String[] splitActor(java.lang.String str,
                                            int dataType)
Splits an Actor string into an array of substrings. The divider character is ','. This method does not canonize the actors.

Parameters:
str - A String containing one or more actors (persons split by ',').
dataType - Type of Actor data to be split, use ACTOR_-constants.
Returns:
A String array: the first String specifies type of the actor(s) according to specifications ("Organisaatio"/"Henkilö"/"Joku"). The rest of the Strings are the uncanonized actors. Value null is returned if original String is null.

splitPerson

public static java.lang.String[] splitPerson(java.lang.String str)
Splits a String of "Henkilö"-type Actors to a table of substrings. Utilized only when the dataType is known to be "ACTOR_CITESEER".

Parameters:
str - A String containing one or more actors (split by ',').
Returns:
A String array: the first String specifies type of the actor(s) ("Henkilö"). The rest of the Strings are the uncanonized actors. Value null is returned if original String is null.
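
The shape of the returned array can be illustrated with the sketch below. It is not the Tools source: the class name SplitPersonSketch is hypothetical, and trimming the whitespace around each name is an assumption, since the documentation only says that the actors are left uncanonized.

```java
// Hypothetical sketch of the documented splitPerson result layout.
class SplitPersonSketch {
    // "Henkilö", written with a Unicode escape so the source file
    // compiles regardless of its encoding.
    private static final String TYPE_PERSON = "Henkil\u00f6";

    static String[] splitPerson(String str) {
        if (str == null) {
            return null; // documented: null in, null out
        }
        String[] names = str.split(",");
        String[] result = new String[names.length + 1];
        result[0] = TYPE_PERSON; // first element: the actor type
        for (int i = 0; i < names.length; i++) {
            // Assumption: surrounding whitespace is trimmed.
            result[i + 1] = names[i].trim();
        }
        return result;
    }

    public static void main(String[] args) {
        for (String part : splitPerson("Antti Laitinen, Tuomas Tanner")) {
            System.out.println(part);
        }
    }
}
```

For the CiteSeer-style input "Antti Laitinen, Tuomas Tanner", the sketch yields three elements: the type "Henkilö" followed by the two names.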

readFile

public static java.lang.String readFile(java.lang.String fileName)
Reads a text file created in the UTF-8 charset into a string


saveFile

public static boolean saveFile(java.lang.String fileName,
                               java.lang.String contents,
                               boolean append)
Saves a text file in the UTF-8 character encoding

Parameters:
fileName - the name of the file with full path information
contents - what to write
append - append to the end of an existing file or write over it
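
Reading and writing UTF-8 text with these two signatures can be sketched with java.nio.file as shown below. The class name FileSketch is hypothetical, and the failure behaviour is assumed, because the documentation does not describe it: here saveFile returns false and readFile returns null when an I/O error occurs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of UTF-8 file helpers with the documented signatures.
class FileSketch {
    static boolean saveFile(String fileName, String contents, boolean append) {
        byte[] bytes = contents.getBytes(StandardCharsets.UTF_8);
        try {
            if (append) {
                // Add to the end, creating the file if it is missing.
                Files.write(Paths.get(fileName), bytes,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } else {
                // Replace any existing contents.
                Files.write(Paths.get(fileName), bytes,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING,
                        StandardOpenOption.WRITE);
            }
            return true;
        } catch (IOException e) {
            return false; // assumption: false signals failure
        }
    }

    static String readFile(String fileName) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(fileName));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            return null; // assumption: null signals failure
        }
    }

    public static void main(String[] args) throws IOException {
        String name = Files.createTempFile("tools-sketch", ".txt").toString();
        saveFile(name, "V\u00e4in\u00f6\n", false);
        saveFile(name, "Tuomas\n", true);
        System.out.print(readFile(name));
    }
}
```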

loadProperties

public static java.util.Properties loadProperties(java.lang.String filename)