Class MalletCrfTrainerConfiguration
Configuration for MalletCrfTrainer.
This immutable class encapsulates all configurable parameters for CRF model training using the
MALLET library. Use the MalletCrfTrainerConfiguration.Builder to construct instances with custom settings, or use
defaults() to obtain a configuration with sensible default values.
Example usage:
MalletCrfTrainerConfiguration config = MalletCrfTrainerConfiguration.builder()
        .gaussianVariance(5.0).iterations(1000).numThreads(8)
        .build();
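The immutable-class-plus-builder arrangement described above can be sketched with a small stand-in. DemoTrainerConfiguration and its builder setters are hypothetical names that mirror only three of the documented settings and their documented defaults; the real class may be structured differently.

```java
// Hypothetical stand-in, NOT the library class: it mirrors three documented
// settings to illustrate an immutable configuration built through a builder.
final class DemoTrainerConfiguration {
    private final double gaussianVariance;
    private final int iterations;
    private final int threads;

    // Only the builder can create instances; all fields are final.
    private DemoTrainerConfiguration(Builder b) {
        this.gaussianVariance = b.gaussianVariance;
        this.iterations = b.iterations;
        this.threads = b.threads;
    }

    static Builder builder() { return new Builder(); }

    // Same result as builder().build(): every setting keeps its default.
    static DemoTrainerConfiguration defaults() { return builder().build(); }

    double gaussianVariance() { return gaussianVariance; }
    int iterations() { return iterations; }
    int threads() { return threads; }

    static final class Builder {
        // Defaults copied from the values documented on this page.
        private double gaussianVariance = 10.0;
        private int iterations = 500;
        private int threads = 6;

        Builder gaussianVariance(double v) { this.gaussianVariance = v; return this; }
        Builder iterations(int n) { this.iterations = n; return this; }
        Builder threads(int n) { this.threads = n; return this; }

        DemoTrainerConfiguration build() { return new DemoTrainerConfiguration(this); }
    }
}
```

Settings that are never set on the builder keep their defaults, so a caller only names the values it wants to change.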
-
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final class | MalletCrfTrainerConfiguration.Builder | Builder for constructing MalletCrfTrainerConfiguration instances. |
-
Method Summary
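| Modifier and Type | Method | Description |
| --- | --- | --- |
| static MalletCrfTrainerConfiguration.Builder | builder() | Returns a new MalletCrfTrainerConfiguration.Builder instance for constructing a configuration. |
| | conllOutputConfiguration() | Returns the configuration for CoNLL output. |
| boolean | conllOutputEnabled() | Returns whether CoNLL output is enabled during training. |
| static MalletCrfTrainerConfiguration | defaults() | Returns a configuration with all default values. |
| boolean | fullyConnected() | Returns whether to create a fully connected CRF state machine. |
| double | gaussianVariance() | Returns the Gaussian prior variance for L2 regularization. |
| int | iterations() | Returns the maximum number of training iterations. |
| | modelOutputConfiguration() | Returns the configuration for model checkpoint output. |
| boolean | modelOutputEnabled() | Returns whether model checkpoint output is enabled during training. |
| int | randomSeed() | Returns the random seed for reproducible data splitting. |
| int | threads() | Returns the number of threads to use for parallel training. |
| double | trainingFraction() | Returns the fraction of data to use for training. |
| WeightsType | weightsType() | Returns the weight storage type for the CRF. |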
-
Method Details
-
builder
Returns a new MalletCrfTrainerConfiguration.Builder instance for constructing a configuration.
- Returns:
- a new builder with default values
-
defaults
Returns a configuration with all default values. This is equivalent to calling
MalletCrfTrainerConfiguration.builder().build().
- Returns:
- a configuration with default settings
-
conllOutputEnabled
public boolean conllOutputEnabled()
Returns whether CoNLL output is enabled during training. When enabled, predictions are written to files in CoNLL format at regular intervals. Default is true.
- Returns:
- true if CoNLL output is enabled
-
conllOutputConfiguration
Returns the configuration for CoNLL output. This configuration controls the output directory, file naming, and iteration interval for CoNLL output files.
- Returns:
- the CoNLL output configuration
-
fullyConnected
public boolean fullyConnected()
Returns whether to create a fully connected CRF state machine. A fully connected CRF allows transitions between all states. Default is true.
- Returns:
- true if the CRF should be fully connected
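To make "transitions between all states" concrete, here is a minimal sketch that enumerates the transitions of a fully connected state machine with one state per label. The class and method names are hypothetical, and the sketch assumes self-transitions are included; it does not call MALLET and is not the trainer's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; does not use the MALLET API.
final class FullyConnectedSketch {
    // Enumerates every (source, destination) pair of labels,
    // including self-transitions, so n labels yield n * n transitions.
    static List<String[]> allTransitions(List<String> labels) {
        List<String[]> transitions = new ArrayList<>();
        for (String from : labels) {
            for (String to : labels) {
                transitions.add(new String[] {from, to});
            }
        }
        return transitions;
    }
}
```

The quadratic growth in transitions is why a fully connected machine costs more memory and training time as the label set grows.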
-
gaussianVariance
public double gaussianVariance()
Returns the Gaussian prior variance for L2 regularization. Higher values result in less regularization (weights can grow larger), while lower values result in stronger regularization. Default is 10.0.
- Returns:
- the Gaussian prior variance
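The relationship between the variance and the strength of regularization can be seen in the standard form of a zero-mean Gaussian prior over the weights. This is the textbook objective, stated here on the assumption that the trainer follows it; the symbols are introduced for illustration, with w the weight vector, sigma squared the value returned by gaussianVariance(), and the first sum running over the training sequences.

```latex
\mathcal{L}(\mathbf{w})
  = \sum_{i} \log p\left(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}\right)
  - \sum_{k} \frac{w_k^{2}}{2\sigma^{2}}
```

With the default variance of 10.0, each squared weight is scaled by 1/20 = 0.05. Lowering the variance to 5.0 doubles that scale to 0.1, so large weights are penalized more heavily.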
-
iterations
public int iterations()
Returns the maximum number of training iterations. Training may stop earlier if convergence is detected. Default is 500.
- Returns:
- the maximum iterations
-
modelOutputEnabled
public boolean modelOutputEnabled()
Returns whether model checkpoint output is enabled during training. When enabled, model checkpoints are serialized to files at regular intervals during training. Default is true.
- Returns:
- true if model output is enabled
-
modelOutputConfiguration
Returns the configuration for model checkpoint output. This configuration controls the output directory, file naming, and iteration interval for model checkpoint files.
- Returns:
- the model output configuration
-
randomSeed
public int randomSeed()
Returns the random seed for reproducible data splitting. Using the same seed will produce the same train/test split. Default is 0.
- Returns:
- the random seed
-
threads
public int threads()
Returns the number of threads to use for parallel training. More threads can speed up training on multi-core systems. Default is 6.
- Returns:
- the number of threads
-
trainingFraction
public double trainingFraction()
Returns the fraction of data to use for training. The remaining data is used for testing/evaluation. Default is 0.5.
- Returns:
- the training fraction, between 0.0 (exclusive) and 1.0 (inclusive)
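How a random seed and a training fraction can combine into a reproducible train/test split is sketched below. SeededSplitSketch and its split method are hypothetical names; the sketch uses only java.util and is an assumption about the general technique, not the trainer's actual splitting code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the trainer's implementation.
final class SeededSplitSketch {
    // Shuffles a copy of the data with a seeded Random, then cuts the
    // shuffled list at the training fraction. Returns [train, test].
    static <T> List<List<T>> split(List<T> data, double trainingFraction, int seed) {
        if (!(trainingFraction > 0.0 && trainingFraction <= 1.0)) {
            throw new IllegalArgumentException("trainingFraction must be in (0.0, 1.0]");
        }
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainingFraction);
        List<T> train = new ArrayList<>(shuffled.subList(0, cut));
        List<T> test = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return Arrays.asList(train, test);
    }
}
```

Because the shuffle is driven entirely by the seed, two calls with the same data, fraction, and seed return identical partitions. A fraction of 1.0 leaves the test partition empty.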
-
weightsType
Returns the weight storage type for the CRF. Controls the trade-off between memory usage and computation speed. Default is
WeightsType.SOME_DENSE.
- Returns:
- the weights type
-