Frequently asked questions

What can I do to reduce CRF training time?
  • Use the flag "reuseM true" if the edge features are independent of the position i and the input sequence x. This should reduce the number of exp/log computations.
  • If you have plenty of memory, run with the "-cache true" flag.
  • If you do not expect numerical overflow, for example because you have few features or short sequences, do not use the "trainer ll" option. This option performs computations in log space to avoid numerical problems, at the cost of extra exp/log computations.
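To make the last two points concrete, here is a minimal sketch of the forward recursion for the log partition function log Z(x) of a linear-chain CRF. It works in log space and uses one position-independent edge score matrix that is built once and reused. The class and method names and the array layout (nodeScore[i][y], edgeScore[yPrev][y]) are hypothetical assumptions; this is not the package's implementation.

```java
// Hypothetical sketch, not the iitb.CRF implementation: names and array
// layouts are assumptions made for illustration.
public class LogForward {

    // log(sum_j exp(a[j])), computed stably by factoring out the maximum term.
    static double logSumExp(double[] a) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : a) max = Math.max(max, v);
        if (max == Double.NEGATIVE_INFINITY) return max; // every term is exp(-inf) = 0
        double sum = 0.0;
        for (double v : a) sum += Math.exp(v - max);
        return max + Math.log(sum);
    }

    // Log partition function log Z(x) via the forward recursion.
    // nodeScore[i][y]: summed state-feature weights for label y at position i.
    // edgeScore[yPrev][y]: summed edge-feature weights; assumed independent of
    // i and x, so one matrix is built once and reused at every position.
    static double logPartition(double[][] nodeScore, double[][] edgeScore) {
        int n = nodeScore.length, numLabels = nodeScore[0].length;
        double[] alpha = nodeScore[0].clone(); // log forward scores at position 0
        double[] terms = new double[numLabels];
        for (int i = 1; i < n; i++) {
            double[] next = new double[numLabels];
            for (int y = 0; y < numLabels; y++) {
                for (int yPrev = 0; yPrev < numLabels; yPrev++) {
                    terms[yPrev] = alpha[yPrev] + edgeScore[yPrev][y];
                }
                next[y] = logSumExp(terms) + nodeScore[i][y];
            }
            alpha = next;
        }
        return logSumExp(alpha);
    }

    public static void main(String[] args) {
        // Scores this large overflow exp() in plain probability space.
        double[][] node = {{800, 800}, {800, 800}};
        double[][] edge = {{0, 0}, {0, 0}};
        System.out.println(logPartition(node, edge)); // about 1601.386 (= 1600 + log 4)
        System.out.println(Math.exp(1600.0));         // Infinity
    }
}
```

Each logSumExp call costs one exp per term plus a log, which is the extra work a log-space trainer pays for robustness. When scores are small enough that exp() cannot overflow, the same recursion can run on plain probabilities (sums of products) instead.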
How do I know if the model is overfitting, and what can I do if it is?
There are three things you can do to control overfitting:

How do I get the second-best solution in CRF?
You can use viterbi.getBestSoln(k) to get the k-th best solution. For that, set the beamsize argument passed when Viterbi() is constructed to the largest value of k you might be interested in. An example of usage can be found in
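Why the beam size must be at least the largest k you want follows from how k-best Viterbi works: each (position, label) cell keeps only its best few partial paths. The sketch below assumes a linear chain with hypothetical score arrays node[i][y] and edge[yPrev][y]; KBestViterbi and kthBest are illustrative names, not the package's Viterbi API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of k-best Viterbi decoding for a linear chain; it is not
// the package's Viterbi class. Each (position, label) cell keeps at most
// beamSize partial paths, so only ranks up to beamSize can be recovered.
public class KBestViterbi {

    // A partial path ending at some (position, label): its score plus a
    // back-pointer to the entry (prevLabel, prevRank) it extends.
    record Entry(double score, int prevLabel, int prevRank) {}

    // Labels of the k-th best sequence (k = 1 is ordinary Viterbi),
    // or null if fewer than k sequences exist.
    static int[] kthBest(double[][] node, double[][] edge, int beamSize, int k) {
        if (k < 1 || k > beamSize) throw new IllegalArgumentException("need 1 <= k <= beamSize");
        int n = node.length, numLabels = node[0].length;
        List<List<List<Entry>>> beams = new ArrayList<>(); // [position][label] -> entries, best first
        List<List<Entry>> first = new ArrayList<>();
        for (int y = 0; y < numLabels; y++) first.add(List.of(new Entry(node[0][y], -1, -1)));
        beams.add(first);
        for (int i = 1; i < n; i++) {
            List<List<Entry>> cur = new ArrayList<>();
            for (int y = 0; y < numLabels; y++) {
                List<Entry> cands = new ArrayList<>();
                for (int yp = 0; yp < numLabels; yp++) {
                    List<Entry> prev = beams.get(i - 1).get(yp);
                    for (int r = 0; r < prev.size(); r++) {
                        cands.add(new Entry(prev.get(r).score() + edge[yp][y] + node[i][y], yp, r));
                    }
                }
                cands.sort((a, b) -> Double.compare(b.score(), a.score())); // best first
                cur.add(new ArrayList<>(cands.subList(0, Math.min(beamSize, cands.size()))));
            }
            beams.add(cur);
        }
        // Rank all complete paths across the final cells and take the k-th.
        List<List<Entry>> last = beams.get(n - 1);
        List<int[]> ends = new ArrayList<>(); // {label, rank} of each complete path
        for (int y = 0; y < numLabels; y++)
            for (int r = 0; r < last.get(y).size(); r++) ends.add(new int[] {y, r});
        ends.sort((a, b) -> Double.compare(last.get(b[0]).get(b[1]).score(),
                                           last.get(a[0]).get(a[1]).score()));
        if (ends.size() < k) return null;
        int[] labels = new int[n];
        int y = ends.get(k - 1)[0], r = ends.get(k - 1)[1];
        for (int i = n - 1; i >= 0; i--) { // follow back-pointers to position 0
            labels[i] = y;
            Entry e = beams.get(i).get(y).get(r);
            y = e.prevLabel();
            r = e.prevRank();
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] node = {{0, 0}, {0, 0}};
        double[][] edge = {{1, 0}, {0, 2}};
        System.out.println(Arrays.toString(kthBest(node, edge, 2, 1))); // [1, 1]
        System.out.println(Arrays.toString(kthBest(node, edge, 2, 2))); // [0, 0]
    }
}
```

A path that is among the k best for a cell must extend one of the k best paths of some cell at the previous position, so keeping beamSize >= k entries per cell is enough to recover ranks 1 through k exactly.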

Is the CRF code from SourceForge just for first-order Markov models? I notice that you can specify a history length, but is this extended history only for the input sequence and not for the label sequence? I know we can simulate higher-order models using first-order models by increasing the state space and having a huge but sparse transition matrix.
The code supports higher-order Markov models, although their performance is not as good. The history size refers to the label sequence history. The CRF part of the package does not care about history in the input sequence; that only affects the feature generator you use. The CRF package uses exactly those dependencies that you specify through the features. Even in a first-order Markov model, you can choose independently for each label pair whether to add it as a dependency. Consult the file; the yprev() function controls that. In higher-order models, yprevArray() controls, for each history position, which labels you want to depend on.
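The state-space expansion mentioned in the question can be sketched as follows: a second-order label model becomes a first-order one whose states are label pairs. All names here are hypothetical and not part of the package. With L labels the pair model has L^2 states, but only L^3 of the L^4 possible transitions are consistent, which is the "huge but sparse" matrix the question describes; the alternative described in the answer is to declare label dependencies directly in the features.

```java
// Hypothetical sketch, not part of the package: a second-order label model
// simulated by a first-order one whose states are label pairs (y[i-1], y[i]).
public class PairStates {

    // Composite state index for the label pair (prev, cur).
    static int pairIndex(int prev, int cur, int numLabels) {
        return prev * numLabels + cur;
    }

    // Pair (a, b) may only be followed by a pair (b, d): the shared label must
    // agree. Only numLabels^3 of the numLabels^4 entries are therefore allowed.
    static boolean[][] allowedTransitions(int numLabels) {
        int numPairs = numLabels * numLabels;
        boolean[][] allowed = new boolean[numPairs][numPairs];
        for (int a = 0; a < numLabels; a++)
            for (int b = 0; b < numLabels; b++)
                for (int d = 0; d < numLabels; d++)
                    allowed[pairIndex(a, b, numLabels)][pairIndex(b, d, numLabels)] = true;
        return allowed;
    }

    // Number of allowed (true) entries in the transition mask.
    static int countAllowed(boolean[][] allowed) {
        int count = 0;
        for (boolean[] row : allowed)
            for (boolean ok : row)
                if (ok) count++;
        return count;
    }

    public static void main(String[] args) {
        boolean[][] allowed = allowedTransitions(3);
        System.out.println(allowed.length + " states, " + countAllowed(allowed) + " of "
                + allowed.length * allowed.length + " transitions allowed");
        // prints: 9 states, 27 of 81 transitions allowed
    }
}
```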

How can I use the Semi-CRF model described in your NIPS 2004 paper?
It is in iitb.CRF.NestedCRF. A slightly different version, which lets you control which segments are considered, is in iitb.CRF.SegmentCRF.
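A semi-CRF labels whole segments rather than single positions, so decoding searches over segment boundaries as well as labels. The sketch below illustrates that search under stated assumptions: a maximum segment length, a hypothetical SegmentScorer, and a hypothetical SegmentFilter standing in for control over which segments are considered. None of these names are the API of iitb.CRF.NestedCRF or iitb.CRF.SegmentCRF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of semi-Markov (segment-level) Viterbi decoding; it is
// not the package's NestedCRF or SegmentCRF code. The scorer and the filter
// are illustrative assumptions.
public class SemiMarkovViterbi {

    // Score of labeling x[start, end) as one segment with `label`, following a
    // segment labeled prevLabel (-1 for the first segment).
    interface SegmentScorer { double score(int start, int end, int label, int prevLabel); }

    // Decides which candidate segments are considered at all.
    interface SegmentFilter { boolean allowed(int start, int end); }

    record Segment(int start, int end, int label) {}

    // Best segmentation of a length-n sequence into segments of length <= maxLen,
    // or null if the filter leaves no complete segmentation.
    static List<Segment> decode(int n, int numLabels, int maxLen,
                                SegmentScorer scorer, SegmentFilter filter) {
        double[][] best = new double[n + 1][numLabels]; // best[e][y]: x[0, e) with last label y
        int[][] backStart = new int[n + 1][numLabels];
        int[][] backLabel = new int[n + 1][numLabels];
        for (double[] row : best) Arrays.fill(row, Double.NEGATIVE_INFINITY);
        for (int e = 1; e <= n; e++) {
            for (int y = 0; y < numLabels; y++) {
                for (int s = Math.max(0, e - maxLen); s < e; s++) {
                    if (!filter.allowed(s, e)) continue;
                    // A segment starting at 0 has only the "no previous label" case, yp = -1.
                    for (int yp = (s == 0 ? -1 : 0); yp < (s == 0 ? 0 : numLabels); yp++) {
                        double prev = (s == 0) ? 0.0 : best[s][yp];
                        if (prev == Double.NEGATIVE_INFINITY) continue; // x[0, s) unreachable
                        double v = prev + scorer.score(s, e, y, yp);
                        if (v > best[e][y]) {
                            best[e][y] = v;
                            backStart[e][y] = s;
                            backLabel[e][y] = yp;
                        }
                    }
                }
            }
        }
        int y = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int l = 0; l < numLabels; l++) {
            if (best[n][l] > bestScore) { bestScore = best[n][l]; y = l; }
        }
        if (y < 0) return null;
        List<Segment> segments = new ArrayList<>();
        for (int e = n; e > 0; ) { // walk back-pointers from the end
            int s = backStart[e][y], yp = backLabel[e][y];
            segments.add(new Segment(s, e, y));
            e = s;
            y = yp;
        }
        Collections.reverse(segments);
        return segments;
    }

    public static void main(String[] args) {
        // Label 1 rewards exactly the segment x[1, 3); label 0 prefers length-1 segments.
        SegmentScorer scorer = (s, e, y, yp) ->
                y == 1 ? (s == 1 && e == 3 ? 5.0 : -10.0) : (e - s == 1 ? 0.0 : -1.0);
        System.out.println(decode(4, 2, 3, scorer, (s, e) -> true));
        System.out.println(decode(4, 2, 3, scorer, (s, e) -> !(s == 1 && e == 3)));
    }
}
```

In the usage example, the first call returns three segments with x[1, 3) as a single label-1 segment; the second call's filter rules that segment out, so the decoder falls back to four length-1 segments labeled 0.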

Is there any support for feature selection?
We have found that training with an L1 or L2 regularizer serves the same purpose as feature selection. It does not reduce running time, but it addresses the accuracy issue quite well. When run with "-prior laplaceApprox", the latest version of the code uses L1 regularization; the default is L2. You can control the amount of feature selection by varying -invSigmaSquare.
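Why an L1 penalty acts as feature selection while L2 only shrinks weights can be seen from a single proximal step of each penalty. This is a minimal sketch: proxL1 and proxL2 are hypothetical helpers, not the package's optimizer. Treating lambda as the analogue of -invSigmaSquare is an assumption based on the flag's name; for the L2 case, a Gaussian prior with variance σ² contributes a penalty of ||w||²/(2σ²), i.e. lambda = 1/σ².

```java
// Hypothetical sketch contrasting L1 and L2 penalties with one proximal step
// of each. It is not how the package optimizes; lambda standing in for
// -invSigmaSquare is an assumption.
public class PenaltyDemo {

    // Proximal step for lambda * |w| (soft thresholding): weights with
    // |w| <= lambda become exactly 0, the rest move toward 0 by lambda.
    static double[] proxL1(double[] w, double lambda) {
        double[] out = new double[w.length];
        for (int k = 0; k < w.length; k++)
            out[k] = Math.signum(w[k]) * Math.max(Math.abs(w[k]) - lambda, 0.0);
        return out;
    }

    // Proximal step for (lambda / 2) * w^2: every weight is divided by
    // (1 + lambda), so it shrinks but never becomes exactly 0.
    static double[] proxL2(double[] w, double lambda) {
        double[] out = new double[w.length];
        for (int k = 0; k < w.length; k++) out[k] = w[k] / (1.0 + lambda);
        return out;
    }

    // Number of features that survive (nonzero weight).
    static int countNonZero(double[] w) {
        int count = 0;
        for (double v : w) if (v != 0.0) count++;
        return count;
    }

    public static void main(String[] args) {
        double[] w = {2.0, -0.3, 0.05, -1.5};
        System.out.println(countNonZero(proxL1(w, 0.5))); // 2
        System.out.println(countNonZero(proxL2(w, 0.5))); // 4
    }
}
```

Raising lambda zeroes out more weights under L1 (with lambda = 0.5 above, two of the four weights survive), while L2 keeps every weight nonzero, which is why the amount of selection is controlled through the regularization strength.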