Class MalletCrfTrainerConfiguration

java.lang.Object
org.coordinatekit.crf.mallet.train.MalletCrfTrainerConfiguration

@NullMarked public final class MalletCrfTrainerConfiguration extends Object
Configuration settings for MalletCrfTrainer.

This immutable class encapsulates all configurable parameters for CRF model training using the MALLET library. Use the MalletCrfTrainerConfiguration.Builder to construct instances with custom settings, or use defaults() to obtain a configuration with sensible default values.

Example usage:

 
 MalletCrfTrainerConfiguration config = MalletCrfTrainerConfiguration.builder().gaussianVariance(5.0).iterations(1000)
         .numThreads(8).build();
 
 
  • Method Details

    • builder

      public static MalletCrfTrainerConfiguration.Builder builder()
      Returns a new MalletCrfTrainerConfiguration.Builder instance for constructing a configuration.
      Returns:
      a new builder with default values
    • defaults

      public static MalletCrfTrainerConfiguration defaults()
      Returns a configuration with all default values.

      This is equivalent to calling MalletCrfTrainerConfiguration.builder().build().

      Returns:
      a configuration with default settings
    • conllOutputEnabled

      public boolean conllOutputEnabled()
      Returns whether CoNLL output is enabled during training.

      When enabled, predictions are written to files in CoNLL format at regular intervals. Default is true.

      Returns:
      true if CoNLL output is enabled
    • conllOutputConfiguration

      public ConllOutputConfiguration conllOutputConfiguration()
      Returns the configuration for CoNLL output.

      This configuration controls the output directory, file naming, and iteration interval for CoNLL output files.

      Returns:
      the CoNLL output configuration
    • fullyConnected

      public boolean fullyConnected()
      Returns whether to create a fully connected CRF state machine.

      A fully connected CRF allows transitions between all states. Default is true.

      Returns:
      true if the CRF should be fully connected
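
To make "transitions between all states" concrete, the following is a minimal sketch that enumerates every ordered label pair, assuming one state per label and that self-transitions are included. The class FullyConnectedSketch and its method are hypothetical illustrations and not part of this library or of MALLET.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the library. */
public class FullyConnectedSketch {

    /** Enumerates every ordered (from, to) label pair, including self-transitions. */
    public static List<String[]> allTransitions(List<String> labels) {
        List<String[]> transitions = new ArrayList<>();
        for (String from : labels) {
            for (String to : labels) {
                transitions.add(new String[] {from, to});
            }
        }
        return transitions;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("B", "I", "O");
        // With n labels there are n * n ordered pairs.
        System.out.println(allTransitions(labels).size());
    }
}
```

Under these assumptions the number of transitions grows quadratically with the number of labels, which is worth keeping in mind for large label sets.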
    • gaussianVariance

      public double gaussianVariance()
      Returns the Gaussian prior variance for L2 regularization.

      Higher values result in less regularization (weights can grow larger), while lower values result in stronger regularization. Default is 10.0.

      Returns:
      the Gaussian prior variance
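
The inverse relationship between variance and regularization strength can be seen from the standard zero-mean Gaussian prior penalty, sum(w_i^2) / (2 * variance). The sketch below computes that textbook term; GaussianPriorSketch is a hypothetical class, and how MALLET applies the prior internally is not described on this page.

```java
/** Hypothetical illustration only; not part of the library. */
public class GaussianPriorSketch {

    /** Standard zero-mean Gaussian prior penalty: sum(w^2) / (2 * variance). */
    public static double penalty(double[] weights, double variance) {
        if (variance <= 0.0) {
            throw new IllegalArgumentException("variance must be positive");
        }
        double sumSquares = 0.0;
        for (double w : weights) {
            sumSquares += w * w;
        }
        return sumSquares / (2.0 * variance);
    }

    public static void main(String[] args) {
        double[] weights = {1.0, -2.0, 3.0};
        // Halving the variance doubles the penalty on the same weights.
        System.out.println(penalty(weights, 10.0));
        System.out.println(penalty(weights, 5.0));
    }
}
```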
    • iterations

      public int iterations()
      Returns the maximum number of training iterations.

      Training may stop earlier if convergence is detected. Default is 500.

      Returns:
      the maximum iterations
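
The interaction between an iteration cap and early stopping can be sketched with a toy optimizer. The example below runs gradient descent on a simple quadratic and stops once the loss improvement falls below a tolerance. The convergence test, the tolerance parameter, and the EarlyStopSketch class are assumptions made for illustration; the criterion actually used during CRF training is not specified here.

```java
/** Hypothetical illustration only; not part of the library. */
public class EarlyStopSketch {

    private static double loss(double x) {
        return (x - 3.0) * (x - 3.0);
    }

    /**
     * Runs at most maxIterations gradient steps on (x - 3)^2 and returns
     * the number of iterations actually performed.
     */
    public static int train(int maxIterations, double tolerance) {
        double x = 0.0;
        double previousLoss = loss(x);
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            x -= 0.1 * 2.0 * (x - 3.0); // step size 0.1 times the gradient
            double currentLoss = loss(x);
            if (previousLoss - currentLoss < tolerance) {
                return iteration; // assumed convergence test: improvement below tolerance
            }
            previousLoss = currentLoss;
        }
        return maxIterations; // cap reached without meeting the tolerance
    }

    public static void main(String[] args) {
        System.out.println(train(500, 1e-6)); // stops well before the cap
        System.out.println(train(3, 1e-6));   // cap is reached first
    }
}
```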
    • modelOutputEnabled

      public boolean modelOutputEnabled()
      Returns whether model checkpoint output is enabled during training.

      When enabled, model checkpoints are serialized to files at regular intervals during training. Default is true.

      Returns:
      true if model output is enabled
    • modelOutputConfiguration

      public ModelOutputConfiguration modelOutputConfiguration()
      Returns the configuration for model checkpoint output.

      This configuration controls the output directory, file naming, and iteration interval for model checkpoint files.

      Returns:
      the model output configuration
    • randomSeed

      public int randomSeed()
      Returns the random seed for reproducible data splitting.

      Using the same seed will produce the same train/test split. Default is 0.

      Returns:
      the random seed
    • threads

      public int threads()
      Returns the number of threads to use for parallel training.

      More threads can speed up training on multi-core systems. Default is 6.

      Returns:
      the number of threads
    • trainingFraction

      public double trainingFraction()
      Returns the fraction of data to use for training.

      The remaining data is used for testing/evaluation. Default is 0.5.

      Returns:
      the training fraction, between 0.0 (exclusive) and 1.0 (inclusive)
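
How a seed and a training fraction can combine into a reproducible split is sketched below, assuming a seeded shuffle followed by a cut at the rounded fraction of the data. SeededSplitSketch, the rounding rule, and the use of java.util.Random are assumptions for illustration; the library may partition data differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration only; not part of the library. */
public class SeededSplitSketch {

    /** Returns a two-element list: the training part, then the test part. */
    public static <T> List<List<T>> split(List<T> data, double trainingFraction, int seed) {
        if (!(trainingFraction > 0.0 && trainingFraction <= 1.0)) {
            throw new IllegalArgumentException("trainingFraction must be in (0.0, 1.0]");
        }
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed)); // same seed, same order
        int trainSize = (int) Math.round(shuffled.size() * trainingFraction);
        List<T> train = new ArrayList<>(shuffled.subList(0, trainSize));
        List<T> test = new ArrayList<>(shuffled.subList(trainSize, shuffled.size()));
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<List<Integer>> parts = split(data, 0.5, 0);
        System.out.println(parts.get(0).size() + " train, " + parts.get(1).size() + " test");
    }
}
```

With a fraction of 1.0 the test part is empty, which matches the documented inclusive upper bound.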
    • weightsType

      public WeightsType weightsType()
      Returns the weight storage type for the CRF.

      Controls memory usage and computation speed trade-offs. Default is WeightsType.SOME_DENSE.

      Returns:
      the weights type