How do I get the second-best solution in CRF?
You can use viterbi.getBestSoln(k) to get the k-th best solution. For that, you need to set the beamsize argument passed when Viterbi() is constructed to the maximum value of k you might be interested in. Look in Viterbi.java. An example of usage can be found in CollinsTrainer.java.
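The snippet below is only a sketch of that usage. It assumes iitb.CRF-style classes (CRF, Viterbi, Soln, DataSequence) and that the second Viterbi() argument is the beam size; the exact constructor, the decoding call, and whether k is 0- or 1-based should all be checked against Viterbi.java and CollinsTrainer.java in your version of the package.

    // Sketch only: every signature below is an assumption -- verify against Viterbi.java.
    int maxK = 2;                                // largest k we will ever ask for
    Viterbi viterbi = new Viterbi(crf, maxK);    // beam size must be >= maxK (assumed constructor)
    viterbi.bestLabelSequence(dataSeq, lambda);  // decode one sequence, keeping the beam (assumed call)
    Soln secondBest = viterbi.getBestSoln(1);    // k assumed 0-based: 0 = best, 1 = second best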
Is the CRF code from sourceforge just for first-order Markov models? I notice that you can specify a history length, but is this extended history only for the input sequence and not for the label sequence? I know we can simulate higher-order models using first-order models by increasing the state space and having a huge but sparse transition matrix.
The code supports higher-order Markov models, although the performance is not as good. The history size refers to the label-sequence history. The CRF part of the package does not care about the history in the input sequence; that would only affect the feature generator that you use.
The CRF package will use exactly those dependencies that you specify through the features. Even in a first-order Markov model, you can choose independently for each label pair whether or not to add it as a dependency. Consult the Feature.java file; the yprev() function controls that. When you move to higher-order models, yprevArray() controls, for each history position, which labels you want to depend on.
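As an illustration, here is a hedged sketch of a binary edge feature whose yprev() value encodes a single label-pair dependency. The class name, constructor, and the choice to show it as a standalone class are my own; in the package this logic sits behind the Feature interface and your feature generator, so take the authoritative method list from Feature.java.

    // Sketch only: a real feature would implement the package's Feature interface,
    // whose exact method set should be taken from Feature.java.
    class LabelPairFeature {
        private final int id, curLabel, prevLabel;
        LabelPairFeature(int id, int curLabel, int prevLabel) {
            this.id = id; this.curLabel = curLabel; this.prevLabel = prevLabel;
        }
        int index()  { return id; }          // position of this feature's weight
        int y()      { return curLabel; }    // label the feature fires for
        int yprev()  { return prevLabel; }   // previous label it depends on; a negative
                                             // value would mean "no dependency" (a pure state feature)
        float value() { return 1.0f; }       // binary indicator feature
        // In a higher-order model, yprevArray() would instead return one label
        // per history position that the feature depends on.
    }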
Is there any support for feature selection?
We have found that feature selection is equivalent to training with an L1/L2 regularizer. This does not help in reducing the running time, but it addresses the accuracy issue quite well. The latest version of the code, when run with "-prior laplaceApprox", gives L1 regularization; the default is L2. You can control the amount of feature selection by varying -invSigmaSquare.
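For concreteness, a training run might look like the line below; the entry point and the numeric value are placeholders of my own, and only -prior laplaceApprox and -invSigmaSquare are the options discussed above.

    java <your-trainer-class> <your-usual-options> -prior laplaceApprox -invSigmaSquare 0.01

Assuming -invSigmaSquare is the inverse prior variance its name suggests, larger values mean stronger regularization and therefore more aggressive feature pruning under the L1 prior.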