Conditional Random Field (CRF)
1. Overview
This package is an implementation of Conditional Random Fields (CRFs), which
are undirected graphical models used for sequence learning tasks.
CRFs were proposed by John Lafferty, Andrew McCallum and Fernando Pereira in "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data".
The code follows the notation and algorithms described by F. Sha and F. Pereira in "Shallow parsing with conditional random fields".
The package is designed for use in a variety of sequence learning tasks, such as information extraction,
segmentation of text into attributes, and sequence classification.
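To make the sequence-labeling setting concrete, the following is a minimal sketch of Viterbi decoding for a linear-chain model: given a score for each label at each position and a score for each label transition, it finds the highest-scoring label sequence. It is an illustration only and does not use or reproduce the package's classes. The class name ViterbiSketch, the method decode, and the score arrays are assumptions made for this example, and the sketch assumes a newer Java release than the J2SE 1.4 minimum.

```java
import java.util.Arrays;

/** Illustrative Viterbi decoder; a sketch, not the package's implementation. */
final class ViterbiSketch {

    /**
     * Returns the highest-scoring label sequence.
     * nodeScore[t][y]    : score of label y at position t (hypothetical input)
     * edgeScore[prev][y] : score of label y following label prev
     */
    static int[] decode(double[][] nodeScore, double[][] edgeScore) {
        int n = nodeScore.length;
        if (n == 0) {
            return new int[0];
        }
        int k = nodeScore[0].length;
        double[][] best = new double[n][k]; // best[t][y]: best score of any path ending in y at t
        int[][] back = new int[n][k];       // back[t][y]: previous label on that best path
        best[0] = nodeScore[0].clone();

        for (int t = 1; t < n; t++) {
            for (int y = 0; y < k; y++) {
                double bestPrev = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int prev = 0; prev < k; prev++) {
                    double s = best[t - 1][prev] + edgeScore[prev][y];
                    if (s > bestPrev) {
                        bestPrev = s;
                        arg = prev;
                    }
                }
                best[t][y] = bestPrev + nodeScore[t][y];
                back[t][y] = arg;
            }
        }

        // Pick the best final label, then follow the back-pointers.
        int last = 0;
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][last]) {
                last = y;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Made-up scores: two labels, three positions, label changes penalized.
        double[][] node = { {3.0, 0.0}, {0.0, 1.0}, {0.0, 1.0} };
        double[][] edge = { {0.0, -2.5}, {-2.5, 0.0} };
        System.out.println(Arrays.toString(decode(node, edge))); // prints [0, 0, 0]
    }
}
```

In this toy input, choosing the best label at each position independently would give [0, 1, 1], but the transition penalty makes [0, 0, 0] the best sequence overall. That dependence between neighboring labels is what distinguishes sequence models from per-token classifiers.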
The directories in the package are as follows:
build/   : Stores all the compiled Java class files
doc/     : Javadoc and this documentation for the package
samples/ : A sample dataset and the corresponding configuration file
src/     : Java source files
lib/     : JAR files
You need to install J2SE 1.4 or above and set JAVA_HOME to point to the directory where you have installed it.
You also need Apache Ant to compile the code. Refer to the
README file provided with the distribution for more about installation.
The distribution includes sample code that applies the CRF package to a text
segmentation task: segmenting a string of text into predefined fields or
attributes. The code for this application is in the src/iitb/Segment directory
and demonstrates how to implement the interfaces needed to use the package. A
sample dataset and its configuration file are provided in the samples/
directory. The training and test sets consist of US addresses, which are to be
segmented into their constituent fields (as given in the training set).
Instructions for running the application are given in the README file.
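As a rough illustration of what segmenting an address into fields means, the sketch below groups consecutive tokens that share a label into one field. It is not taken from src/iitb/Segment and does not use the package's interfaces. The class SegmentSketch, the method toFields, the example address, and the label names are all made up for this example and are not drawn from the samples/ dataset. The sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative grouping of labeled tokens into fields; not the package's code. */
final class SegmentSketch {

    /** A contiguous run of tokens that share one label. */
    static final class Field {
        final String label;
        final String text;

        Field(String label, String text) {
            this.label = label;
            this.text = text;
        }

        @Override
        public String toString() {
            return label + "=" + text;
        }
    }

    /** Merges adjacent tokens with the same label into a single field. */
    static List<Field> toFields(String[] tokens, String[] labels) {
        if (tokens.length != labels.length) {
            throw new IllegalArgumentException("tokens and labels must have equal length");
        }
        List<Field> fields = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= tokens.length; i++) {
            // Close the current run at the end of input or when the label changes.
            if (i == tokens.length || !labels[i].equals(labels[start])) {
                String text = String.join(" ", Arrays.copyOfRange(tokens, start, i));
                fields.add(new Field(labels[start], text));
                start = i;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical address and label set, for illustration only.
        String[] tokens = {"12", "Main", "Street", "Springfield", "IL", "62704"};
        String[] labels = {"HouseNo", "Street", "Street", "City", "State", "Zip"};
        System.out.println(toFields(tokens, labels));
        // prints [HouseNo=12, Street=Main Street, City=Springfield, State=IL, Zip=62704]
    }
}
```

In a real run, the per-token labels would come from a trained model rather than being written by hand as they are here.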
The source code, found in the src/ directory of the distribution, is organized into several packages. A summary of the packages is given below.
iitb.CRF              : Core package; contains the implementation of the training and inference algorithms and defines the interfaces to be implemented by the user of the distribution.
iitb.Model            : Contains implementations of various graphs, features, and feature generators (see next section).
iitb.Utils            : Common classes used by the other packages.
iitb.MaxentClassifier : An application of CRF to a maximum entropy based classification task.
iitb.Segment          : An application of CRF to a text segmentation task.
Copyright © 2004 KReSIT, IIT Bombay. All rights reserved