Conditional Random Field (CRF)



4. Features, Feature Types, and Feature Generator

The iitb.Model package contains implementations of the Feature and FeatureGenerator interfaces, as well as an abstract class named FeatureTypes. The package also provides several ready-made feature types, which are described below.

FeatureImpl class

The FeatureImpl class is an implementation of the Feature interface. It encapsulates the feature id, the current label (yend), the previous label(s) (ystart), and the value of the feature. You do not need to implement this interface yourself unless your feature types class requires it.
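To make these four fields concrete, here is a minimal sketch of a feature record that holds them. The class name SimpleFeature and the convention that ystart = -1 marks a state feature are assumptions for illustration; this is not the source of iitb.Model.FeatureImpl.

```java
// Hypothetical stand-in for the fields FeatureImpl is described as holding;
// this is NOT the iitb.Model.FeatureImpl source.
class SimpleFeature {
    int id;        // feature id
    int ystart;    // previous label; -1 (an assumed convention) marks a state feature
    int yend;      // current label
    double value;  // feature value, typically 1.0

    SimpleFeature(int id, int ystart, int yend, double value) {
        this.id = id;
        this.ystart = ystart;
        this.yend = yend;
        this.value = value;
    }

    // True when the feature depends on the previous label as well as the current one.
    boolean isTransition() {
        return ystart >= 0;
    }

    @Override
    public String toString() {
        return "f" + id + "(ystart=" + ystart + ", yend=" + yend + ") = " + value;
    }
}
```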

FeatureTypes class

FeatureTypes is a factory class that defines a class of similar features. It is an abstract class that lets advanced users of this package implement new features: to add new features, a user only needs to extend this class. The important methods of the class are as follows:

A few tips on implementing your own FeatureTypes class are given at the end of this section. To make the package easy to use in common applications, it provides implementations of some commonly used features, such as EdgeFeatures and WordFeatures. Typically, an application defines a number of features to capture different properties of the input data. A brief description of each of these feature types is given below.

A feature is identified by its name and a unique id assigned to it. The public class FeatureIdentifier is used to identify a feature. To generate the WordFeatures, UnknownFeatures, and WordScoreFeatures described above, a dictionary of words is maintained; it maps each word seen during training to the class labels it occurred with. The WordsInTrain class provides this functionality and is briefly described below.

WordsInTrain

This class encapsulates a dictionary (vocabulary) built on the fly from the training set. It consists of a set of data structures, which are described below.
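The kind of mapping such a vocabulary holds can be sketched as follows, assuming it simply counts, for every training word, how often each label was assigned to it. WordDictionary and its methods are hypothetical names, not the fields or methods of WordsInTrain.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a training-set vocabulary in the spirit of WordsInTrain:
// for every word seen in training it records how often each label was assigned.
// Names and layout are illustrative assumptions, not the package's actual fields.
class WordDictionary {
    private final int numLabels;
    private final Map<String, int[]> labelCounts = new HashMap<>();

    WordDictionary(int numLabels) {
        this.numLabels = numLabels;
    }

    // Record one training occurrence of `word` carrying `label`.
    void add(String word, int label) {
        labelCounts.computeIfAbsent(word, w -> new int[numLabels])[label]++;
    }

    // Whether `word` was seen at all during training.
    boolean contains(String word) {
        return labelCounts.containsKey(word);
    }

    // Number of times `word` appeared with `label`; 0 for unseen words.
    int count(String word, int label) {
        int[] counts = labelCounts.get(word);
        return counts == null ? 0 : counts[label];
    }

    // Total occurrences of `word` across all labels.
    int totalCount(String word) {
        int[] counts = labelCounts.get(word);
        if (counts == null) return 0;
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }
}
```

With such counts, an unknown-word test reduces to !contains(word), and the per-label counts could serve as scores for a word/label pair.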

FeatureGenImpl class

FeatureGenImpl is a feature generator class that implements the FeatureGenerator interface. As described earlier, a feature generator acts as a repository of feature type objects, which are added to it using the addFeatures() method of the FeatureGenerator interface. If you have created new features by extending the FeatureTypes class, you need to add the new feature types to the feature generator; to do so, you can extend FeatureGenImpl and override its addFeatures() method. If, however, your application needs functionality that the current FeatureGenImpl class does not provide, you may want to create a new feature generator class that implements the FeatureGenerator interface.
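The repository idea, including the advice to subclass the generator and override addFeatures(), can be sketched in plain Java. SimpleFeatureType, SimpleFeatureGenerator, MyFeatureGenerator, init() and the string-valued features are simplifying assumptions; the package's FeatureGenerator and FeatureTypes interfaces have different methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified feature-type contract (an assumption, not the
// package's FeatureTypes API): emit feature names for one token position.
interface SimpleFeatureType {
    List<String> featuresAt(String[] tokens, int pos);
}

// Sketch of a generator acting as a repository of feature types.
class SimpleFeatureGenerator {
    private final List<SimpleFeatureType> types = new ArrayList<>();

    // Subclasses override this to register extra feature types.
    protected void addFeatures() {
        // Default feature type: the current word itself.
        add((tokens, pos) -> List.of("W_" + tokens[pos]));
    }

    final void add(SimpleFeatureType t) {
        types.add(t);
    }

    // Registers the feature types; call once after construction.
    final SimpleFeatureGenerator init() {
        addFeatures();
        return this;
    }

    // Collect the features of every registered type at position pos.
    List<String> featuresAt(String[] tokens, int pos) {
        List<String> out = new ArrayList<>();
        for (SimpleFeatureType t : types) out.addAll(t.featuresAt(tokens, pos));
        return out;
    }
}

// A custom generator: keeps the default features and adds a capitalization feature.
class MyFeatureGenerator extends SimpleFeatureGenerator {
    @Override
    protected void addFeatures() {
        super.addFeatures();
        add((tokens, pos) -> {
            List<String> out = new ArrayList<>();
            if (Character.isUpperCase(tokens[pos].charAt(0))) out.add("CAPITALIZED");
            return out;
        });
    }
}
```

The separate init() step is a design choice: calling an overridable method from a constructor would run the subclass override before the subclass is fully constructed.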

Implementing your own FeatureTypes:

Typically, you will implement one of two kinds of features:

  1. A state feature, which outputs a boolean feature for a pattern or property for each label. An example is the UnknownFeatures class.
  2. A transition feature, which is a function of not just the current state but also the previous state. An example is the EdgeFeatures class.
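The difference between the two kinds shows up in how each contributes to the unnormalized score of a label sequence, the quantity that a linear-chain CRF exponentiates and normalizes. The sketch below assumes simplified interfaces (StateFeature, TransitionFeature) and single hand-set weights; its feature functions only imitate the ideas behind UnknownFeatures and EdgeFeatures and are not their implementations.

```java
import java.util.Set;

// Hypothetical illustration: a state feature looks at the input and the
// current label only; a transition feature also looks at the previous label.
class FeatureKinds {
    interface StateFeature { double value(String[] x, int i, int y); }
    interface TransitionFeature { double value(int yPrev, int y); }

    // State feature in the spirit of UnknownFeatures: 1 when the word at i is
    // outside the given vocabulary and the current label equals `label`.
    static StateFeature unknownWord(Set<String> vocab, int label) {
        return (x, i, y) -> (!vocab.contains(x[i]) && y == label) ? 1.0 : 0.0;
    }

    // Transition feature in the spirit of EdgeFeatures: 1 for the label pair (from, to).
    static TransitionFeature edge(int from, int to) {
        return (yPrev, y) -> (yPrev == from && y == to) ? 1.0 : 0.0;
    }

    // Unnormalized linear-chain score: weighted state feature at every position
    // plus weighted transition feature for every adjacent label pair.
    static double score(String[] x, int[] y,
                        StateFeature s, double ws,
                        TransitionFeature t, double wt) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            total += ws * s.value(x, i, y[i]);
            if (i > 0) total += wt * t.value(y[i - 1], y[i]);
        }
        return total;
    }
}
```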

Here are a few tips on creating your own state features.
Create a derived class of FeatureTypes for each class of pattern that you would like to detect. You want one feature per label for each pattern of this type, and the convenience class FeatureTypesEachLabel performs this per-label expansion for you. So you only need to output every applicable pattern for the data position given in startScanFeatureAt(), one pattern on each call to next().

On each next() call, fill in the value of the feature (typically 1, but it can be any real value) and set the id and name of the feature. The state where the feature fires (the yend field) is set by the enclosing FeatureTypesEachLabel wrapper. The name is the string form of the pattern, which is useful for inspecting features later.

The id field is the trickiest, because you are responsible for assigning a unique id to each distinct pattern that you generate. The ids do not have to be contiguous, and they only have to be unique within each FeatureTypes class: the function setIdentifier, which assigns the id, adds a few bits for each FeatureTypes class to make the ids unique overall. Since ids are produced on every call, their generation should be fast.
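The id bookkeeping described above can be sketched as follows, assuming a layout in which the FeatureTypes index occupies the low TYPE_BITS bits and the per-label expansion multiplies by the number of labels. PatternIds, TYPE_BITS and this exact layout are hypothetical; the package's setIdentifier may combine the bits differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-pattern id assignment; the bit layout and names
// are assumptions, not the package's setIdentifier code.
class PatternIds {
    static final int TYPE_BITS = 4; // room for up to 16 feature types (assumed)

    private final int typeIndex;                                   // which feature type this is
    private final Map<String, Integer> localIds = new HashMap<>(); // pattern -> local id

    PatternIds(int typeIndex) {
        if (typeIndex < 0 || typeIndex >= (1 << TYPE_BITS))
            throw new IllegalArgumentException("typeIndex out of range");
        this.typeIndex = typeIndex;
    }

    // Unique within this type: each new pattern gets the next integer.
    int localId(String pattern) {
        Integer id = localIds.get(pattern);
        if (id == null) {
            id = localIds.size();
            localIds.put(pattern, id);
        }
        return id;
    }

    // Unique across types: the type index is placed in the low bits.
    int globalId(String pattern) {
        return (localId(pattern) << TYPE_BITS) | typeIndex;
    }

    // One feature per label for a pattern, mimicking the FeatureTypesEachLabel
    // idea; each entry is {feature id, yend}, with the label folded into the id.
    List<int[]> eachLabel(String pattern, int numLabels) {
        List<int[]> out = new ArrayList<>();
        int g = globalId(pattern);
        for (int y = 0; y < numLabels; y++) {
            out.add(new int[] {g * numLabels + y, y});
        }
        return out;
    }
}
```

A HashMap keeps the pattern-to-id lookup at expected constant time, which matters because ids are requested on every next() call.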

You add different FeatureTypes to the FeatureGenImpl class using its add() method, as follows:
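// The last argument (false) disables the default addition of features.
FeatureGenImpl fgen = new FeatureGenImpl(modelSpecs, numLabels, false);

fgen.add(new EdgeFeatures());
// MyPatternFeature1 and MyPatternFeature2 stand for your own FeatureTypes subclasses.
fgen.add(new FeatureTypesEachLabel(new MyPatternFeature1()));
fgen.add(new FeatureTypesEachLabel(new MyPatternFeature2()));
// ... add further feature types in the same way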

The FeatureTypes class also supports some advanced methods, such as train(), which let you collect any statistics you want from the training data and use them to drive the features that you output.
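As an illustration of a training pass, the sketch below counts word frequencies over the training sentences and afterwards fires only for rare words. RareWordFeature, its minCount threshold and the fires() method are hypothetical, and the train() signature used here is an assumption rather than that of FeatureTypes.train().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical feature type with a training pass: it counts (lower-cased) word
// frequencies in the training data, then fires for words seen fewer than
// `minCount` times. Not the actual FeatureTypes.train() signature.
class RareWordFeature {
    private final int minCount;
    private final Map<String, Integer> freq = new HashMap<>();

    RareWordFeature(int minCount) {
        this.minCount = minCount;
    }

    // Pass over the training sentences and collect statistics.
    void train(List<String[]> sentences) {
        for (String[] s : sentences)
            for (String w : s)
                freq.merge(w.toLowerCase(), 1, Integer::sum);
    }

    // After training, the collected statistics decide whether the feature fires.
    boolean fires(String[] tokens, int pos) {
        return freq.getOrDefault(tokens[pos].toLowerCase(), 0) < minCount;
    }
}
```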


Copyright © 2004 KReSIT, IIT Bombay. All rights reserved