A type-safe, extensible Java wrapper for CRF libraries. Train models and tag sequences with a clean, fluent API.
var tagProvider = new StringTagProvider("O");
var trainer = new MalletCrfTrainer(
    CompositeFeatureExtractor.of(
        LengthFeatureExtractor.<String>builder(5)
            .hasLengthFeatureMapper(len -> "HAS_LENGTH_" + len).build(),
        PatternMatchingFeatureExtractor.<String>builder("\\d+")
            .matchedFeature("IS_DIGITS").build()
    ),
    tagProvider,
    new XmlTrainingData(tagProvider)
);
trainer.train(Path.of("training.xml"), Path.of("model.crf"));

A thoughtfully designed API that gets out of your way
Minimal implementation required. Choose sensible defaults and start benefiting from CRF immediately.
Configure everything in code without endless instance variables. Chain methods naturally.
Override any functionality without reimplementing entire classes. Sensible defaults that don't require extension.
Constructor-based dependency injection makes it easy to wire components as Spring @Bean definitions.
Core abstractions in one module, CRF library implementations in separate modules. Use only what you need.
No unnecessary transitive dependencies. Avoid dependency hell and keep your project lean.
Tag tokens in sequences where context matters
109 UNIVERSITY ST MARTIN TN
Conditional Random Fields consider the entire sequence context when assigning labels, not just individual tokens.
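To make the role of sequence context concrete, here is a minimal sketch of Viterbi decoding, the standard way to find the best label sequence in a linear-chain model. It is not taken from this library: the class name ViterbiSketch, the two abstract labels (0 and 1), and all scores are made up for illustration, whereas a real CRF learns its weights from training data. The sketch compares labeling each token in isolation with labeling the sequence as a whole.

```java
import java.util.Arrays;

public class ViterbiSketch {

    /** Picks the best label per position, scoring each token in isolation. */
    public static int[] independent(double[][] emission) {
        int[] labels = new int[emission.length];
        for (int t = 0; t < emission.length; t++) {
            for (int y = 1; y < emission[t].length; y++) {
                if (emission[t][y] > emission[t][labels[t]]) labels[t] = y;
            }
        }
        return labels;
    }

    /** Finds the highest-scoring label sequence given emission and transition scores. */
    public static int[] viterbi(double[][] emission, double[][] transition) {
        int n = emission.length;
        int k = transition.length;
        double[][] best = new double[n][k]; // best[t][y]: best score of any path ending in y at t
        int[][] back = new int[n][k];       // back[t][y]: previous label on that best path
        best[0] = emission[0].clone();
        for (int t = 1; t < n; t++) {
            for (int y = 0; y < k; y++) {
                int arg = 0;
                for (int p = 1; p < k; p++) {
                    if (best[t - 1][p] + transition[p][y] > best[t - 1][arg] + transition[arg][y]) {
                        arg = p;
                    }
                }
                back[t][y] = arg;
                best[t][y] = best[t - 1][arg] + transition[arg][y] + emission[t][y];
            }
        }
        int last = 0;
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][last]) last = y;
        }
        // Walk the back-pointers from the final position to recover the path.
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Illustrative scores only: the middle token slightly prefers label 1 on its own.
        double[][] emission = {{2.0, 0.0}, {0.4, 0.6}, {2.0, 0.0}};
        // Transitions reward staying on the same label and penalize switching.
        double[][] transition = {{1.0, -1.0}, {-1.0, 1.0}};
        System.out.println(Arrays.toString(independent(emission)));         // [0, 1, 0]
        System.out.println(Arrays.toString(viterbi(emission, transition))); // [0, 0, 0]
    }
}
```

With these scores, the middle token flips from label 1 to label 0 once its neighbors are taken into account, which is the kind of effect the sentence above describes.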
The CrfTrainer accepts training files and outputs a serialized model. Compose feature extractors to capture the patterns that matter for your domain.
FeatureExtractor: Extract features from tokens
TagProvider: Define your label vocabulary
TrainingDataSequencer: Parse training files
The CrfTagger loads a trained model and applies labels to new sequences. Get both the predicted tags and confidence scores.
Tokenizer: Convert input to tokens
FeatureExtractor: Same as training
TagProvider: Same as training
Pre-built feature extractors for common patterns
Combine extractors with CompositeFeatureExtractor or implement your own.
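As a rough illustration of the composition idea, the sketch below uses a hypothetical stand-in interface, TokenFeatureExtractor, rather than the library's own FeatureExtractor, whose exact signature is not shown on this page. The helper names (length, matching, composite, demo) and the choice to cap the reported length at a maximum are likewise assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FeatureExtractorSketch {

    /** Hypothetical stand-in for an extractor abstraction; not the library's interface. */
    interface TokenFeatureExtractor {
        List<String> extract(String token);
    }

    /** Emits a length feature; lengths above maxLength are capped (an assumption). */
    static TokenFeatureExtractor length(int maxLength) {
        return token -> List.of("HAS_LENGTH_" + Math.min(token.length(), maxLength));
    }

    /** Emits the given feature only when the whole token matches the regex. */
    static TokenFeatureExtractor matching(String regex, String feature) {
        Pattern pattern = Pattern.compile(regex);
        return token -> pattern.matcher(token).matches() ? List.of(feature) : List.of();
    }

    /** Runs every delegate in order and concatenates their features. */
    static TokenFeatureExtractor composite(TokenFeatureExtractor... delegates) {
        return token -> {
            List<String> features = new ArrayList<>();
            for (TokenFeatureExtractor delegate : delegates) {
                features.addAll(delegate.extract(token));
            }
            return features;
        };
    }

    /** Combines a length extractor and a digits extractor for a single token. */
    public static List<String> demo(String token) {
        return composite(length(5), matching("\\d+", "IS_DIGITS")).extract(token);
    }

    public static void main(String[] args) {
        System.out.println(demo("109"));        // [HAS_LENGTH_3, IS_DIGITS]
        System.out.println(demo("UNIVERSITY")); // [HAS_LENGTH_5]
    }
}
```

Because each extractor is a small function from token to features, a custom one can be added to the composite without touching the others.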
Add the library to your project and start labeling sequences in minutes.