Conditional Random Field (CRF)



1. Overview

This package is an implementation of Conditional Random Fields (CRFs), undirected graphical models used for sequence learning tasks. CRFs were proposed by John Lafferty, Andrew McCallum and Fernando Pereira in "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data". The code follows the notation and algorithms described by F. Sha and F. Pereira in "Shallow Parsing with Conditional Random Fields". The package is designed so that it can be used for a range of sequence learning tasks, such as information extraction, segmentation of text into attributes, and sequence classification.
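In the linear-chain formulation used by Sha and Pereira, a CRF defines p(y|x) = exp(λ·F(y,x)) / Z(x), where F(y,x) sums local feature vectors over all positions of the sequence and Z(x) sums exp(λ·F(y',x)) over every possible label sequence y'. The forward algorithm computes Z(x) in time linear in the sequence length instead of enumerating all labelings. The Java sketch below is illustrative only and is not the iitb.CRF API. It assumes that the weighted features have already been collapsed into a per-position state score node[i][y] and a position-independent transition score edge[p][y], and all class and method names are hypothetical. It works in log space to avoid numeric overflow.

```java
/**
 * Hypothetical sketch of a linear-chain CRF (not the iitb.CRF API).
 * Assumes lambda . f has been precomputed as:
 *   node[i][y] : state score of label y at position i
 *   edge[p][y] : transition score from label p to label y (same at every position)
 */
public class LinearChainCrfSketch {

    // Numerically stable log(sum_j exp(v[j])).
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double d : v) max = Math.max(max, d);
        if (max == Double.NEGATIVE_INFINITY) return max;
        double sum = 0.0;
        for (double d : v) sum += Math.exp(d - max);
        return max + Math.log(sum);
    }

    // Forward algorithm: returns log Z(x).
    static double logPartition(double[][] node, double[][] edge) {
        int n = node.length, k = node[0].length;
        double[] alpha = node[0].clone();            // alpha_0(y) = node[0][y]
        double[] terms = new double[k];
        for (int i = 1; i < n; i++) {
            double[] next = new double[k];
            for (int y = 0; y < k; y++) {
                // alpha_i(y) = node[i][y] + log sum_p exp(alpha_{i-1}(p) + edge[p][y])
                for (int p = 0; p < k; p++) terms[p] = alpha[p] + edge[p][y];
                next[y] = logSumExp(terms) + node[i][y];
            }
            alpha = next;
        }
        return logSumExp(alpha);
    }

    // Unnormalized log-score lambda . F(y, x) of one complete labeling.
    static double score(double[][] node, double[][] edge, int[] labels) {
        double s = node[0][labels[0]];
        for (int i = 1; i < labels.length; i++) {
            s += edge[labels[i - 1]][labels[i]] + node[i][labels[i]];
        }
        return s;
    }

    // log p(y | x) = score(y) - log Z(x)
    static double logProb(double[][] node, double[][] edge, int[] labels) {
        return score(node, edge, labels) - logPartition(node, edge);
    }

    public static void main(String[] args) {
        double[][] node = {{1.0, 0.2}, {0.1, 0.9}, {0.5, 0.5}};  // 3 tokens, 2 labels
        double[][] edge = {{0.3, -0.2}, {-0.1, 0.4}};
        int[] y = {0, 1, 1};
        System.out.println("log Z  = " + logPartition(node, edge));
        System.out.println("p(y|x) = " + Math.exp(logProb(node, edge, y)));
    }
}
```

The same recursion with max in place of log-sum-exp gives Viterbi decoding, which is how the most likely labeling is found at test time.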

The various directories available in the package are as follows:

build/: Stores all the compiled Java class files
doc/: Javadoc and this documentation for the package
samples/: A sample dataset and the corresponding configuration file
src/: Java source files
lib/: jar files

You need J2SE 1.4 or above installed, with JAVA_HOME pointing to the installation directory. You also need the Apache Ant build tool to compile the code. See the README file provided with the distribution for installation details.

An example use of the package is provided as sample code: an application of the CRF package to a text segmentation task, in which a string or text is segmented into predefined fields or attributes. The code for the application, in the src/iitb/Segment directory, demonstrates how to implement the various interfaces needed to use the package. A sample dataset and its configuration file are given in the samples/ directory. The training and test sets consist of US addresses, which are to be segmented into their constituent fields (as given in the training set). Instructions for running the application are given in the README file.
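To make the segmentation idea concrete, here is a minimal Java sketch under stated assumptions: a token-level linear-chain model whose scores are already computed, Viterbi decoding to find the best label per token, and maximal runs of equal labels grouped into fields. This is one simple way to turn a labeling into segments and is not necessarily how the iitb.Segment application works. The class name, the label names (HOUSE_NO, STREET, CITY, STATE), the address tokens, and the score values are all hypothetical and are not taken from the sample dataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: Viterbi decoding followed by grouping tokens into fields. */
public class ViterbiSegmentSketch {

    // Returns the label sequence maximizing sum of node[i][y_i] + edge[y_{i-1}][y_i].
    static int[] viterbi(double[][] node, double[][] edge) {
        int n = node.length, k = node[0].length;
        double[][] best = new double[n][k];   // best[i][y]: top score of a prefix ending in y
        int[][] back = new int[n][k];         // back[i][y]: predecessor label on that path
        best[0] = node[0].clone();
        for (int i = 1; i < n; i++) {
            for (int y = 0; y < k; y++) {
                int arg = 0;
                double max = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = best[i - 1][p] + edge[p][y];
                    if (s > max) { max = s; arg = p; }
                }
                best[i][y] = max + node[i][y];
                back[i][y] = arg;
            }
        }
        int[] labels = new int[n];
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][labels[n - 1]]) labels[n - 1] = y;
        }
        for (int i = n - 1; i > 0; i--) labels[i - 1] = back[i][labels[i]];  // trace back
        return labels;
    }

    // Groups maximal runs of equal labels into "FIELD: text" segments.
    static List<String> segments(String[] tokens, int[] labels, String[] names) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= tokens.length; i++) {
            if (i == tokens.length || labels[i] != labels[start]) {
                String text = String.join(" ", Arrays.copyOfRange(tokens, start, i));
                out.add(names[labels[start]] + ": " + text);
                start = i;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"12", "Main", "St", "Springfield", "IL"};
        String[] names = {"HOUSE_NO", "STREET", "CITY", "STATE"};
        int[] hint = {0, 1, 1, 2, 3};                 // made-up state scores favor these labels
        double[][] node = new double[tokens.length][names.length];
        for (int i = 0; i < tokens.length; i++) node[i][hint[i]] = 2.0;
        double[][] edge = new double[names.length][names.length];
        for (int p = 0; p < names.length; p++)        // penalize moving "backwards" in field order
            for (int y = 0; y < names.length; y++) edge[p][y] = (y < p) ? -5.0 : 0.0;
        System.out.println(segments(tokens, viterbi(node, edge), names));
    }
}
```

With these made-up scores the decoder recovers the labeling [0, 1, 1, 2, 3], so "Main St" is grouped into a single STREET field.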

The source code, found in the src/ directory of the distribution, is organized into the following Java packages:

iitb.CRF: Core package; contains the implementation of the training and inference algorithms and defines various interfaces to be implemented by the user of the distribution.
iitb.Model: Implementations of various graphs, features, and feature generators (see next section).
iitb.Utils: Common classes used by other packages.
iitb.MaxentClassifier: An application of CRF to a maximum entropy based classification task.
iitb.Segment: An application of CRF to a text segmentation task.
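Since iitb.CRF expects the user to supply data through interfaces, the general shape of such a contract can be sketched as follows. This is a hypothetical illustration only: the interface LabeledSequence, its methods, and the array-backed class are invented for this example and are not the interfaces actually defined in iitb.CRF, which are described in the Javadoc under doc/. The idea shown is that the user exposes a sequence's tokens and labels, and a decoder writes predicted labels back.

```java
import java.util.Arrays;

/** Hypothetical sketch of a user-implemented sequence interface (not the iitb.CRF API). */
public class SequenceInterfaceSketch {

    /** Hypothetical: one token sequence plus its labels. */
    interface LabeledSequence {
        int length();                     // number of tokens
        String token(int i);              // observation at position i
        int label(int i);                 // label at position i, or -1 if unassigned
        void setLabel(int i, int label);  // would be called by a decoder at test time
    }

    /** Array-backed implementation for whitespace-separated text. */
    static class ArrayLabeledSequence implements LabeledSequence {
        private final String[] tokens;
        private final int[] labels;

        ArrayLabeledSequence(String text) {
            this.tokens = text.trim().split("\\s+");
            this.labels = new int[tokens.length];
            Arrays.fill(labels, -1);      // no labels assigned yet
        }

        public int length() { return tokens.length; }
        public String token(int i) { return tokens[i]; }
        public int label(int i) { return labels[i]; }
        public void setLabel(int i, int label) { labels[i] = label; }
    }

    public static void main(String[] args) {
        LabeledSequence seq = new ArrayLabeledSequence("12 Main St Springfield IL");
        int[] decoded = {0, 1, 1, 2, 3};  // stand-in for a decoder's output
        for (int i = 0; i < seq.length(); i++) seq.setLabel(i, decoded[i]);
        for (int i = 0; i < seq.length(); i++) {
            System.out.println(seq.token(i) + "\t" + seq.label(i));
        }
    }
}
```

Keeping data access behind an interface like this lets the same training and decoding code run over address strings, classification records, or any other sequential data.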


Copyright © 2004 KReSIT, IIT Bombay. All rights reserved