CoordinateKit CRF

A type-safe, extensible Java wrapper for conditional random field (CRF) libraries. Train models and tag sequences with a clean, fluent API.

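var tagProvider = new StringTagProvider("O");
// Assemble the trainer from feature extractors, the tag provider, and a training-data parser
var trainer = new MalletCrfTrainer(
    // Combine several feature extractors into a single one
    CompositeFeatureExtractor.of(
        LengthFeatureExtractor.<String>builder(5)
            .hasLengthFeatureMapper(len -> "HAS_LENGTH_" + len).build(),
        // Emit IS_DIGITS when a token matches the pattern \d+
        PatternMatchingFeatureExtractor.<String>builder("\\d+")
            .matchedFeature("IS_DIGITS").build()
    ),
    tagProvider,
    new XmlTrainingData(tagProvider)
);
// Train on training.xml and write the serialized model to model.crf
trainer.train(Path.of("training.xml"), Path.of("model.crf"));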

Built for Developers

A thoughtfully designed API that gets out of your way

Quick Start

Little code is required. Start from sensible defaults and begin using CRFs right away.

Fluent API

Configure everything in code through chained method calls instead of long lists of instance variables.

Extensible

Override individual behaviors without reimplementing entire classes. The defaults work out of the box, with no extension required.

Spring Ready

Constructor-based dependency injection makes components easy to wire up in Spring @Bean methods.

Modular

Core abstractions in one module, CRF library implementations in separate modules. Use only what you need.

Minimal Dependencies

No unnecessary transitive dependencies. Avoid dependency hell and keep your project lean.

Sequence Labeling Made Simple

Tag tokens in sequences where context matters

109 UNIVERSITY ST MARTIN TN

  • 109 → StreetNumber
  • UNIVERSITY → StreetName
  • ST → StreetSuffix
  • MARTIN → City
  • TN → State

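Conditional random fields label the whole sequence jointly: the tag chosen for one token depends on the surrounding tokens and on the neighboring tags, not just on the token itself. In the address above, "ST" is a street suffix, but in "ST LOUIS" the same token would belong to a city name.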
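To make "context" concrete, here is a minimal, self-contained Viterbi decoder for a linear-chain model. It is not CoordinateKit's API and not a trained model: the three tags, the per-token scores, and the tag-to-tag transition scores are all hand-picked assumptions for illustration. In a real CRF these scores come from learned feature weights. The sketch only shows how a transition score lets a neighboring token change the best label for an ambiguous token such as "ST".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ViterbiSketch {
    // Hypothetical tag set, for illustration only.
    enum Tag { STREET_NAME, STREET_SUFFIX, CITY }

    // Hand-picked per-token scores (how well a token fits each tag on its own).
    // Tokens not listed score 0.0 for every tag.
    static final Map<String, double[]> TOKEN_SCORES = Map.of(
        "UNIVERSITY", new double[] {2.0, 0.0, 0.0},
        "ST",         new double[] {0.0, 1.0, 0.8},
        "LOUIS",      new double[] {0.0, 0.0, 2.0},
        "MARTIN",     new double[] {0.0, 0.0, 2.0});

    // Hand-picked transition scores, indexed TRANSITIONS[previous][current].
    static final double[][] TRANSITIONS = {
        //               NAME  SUFFIX CITY
        /* NAME   */    {0.0,  1.0,   0.0},
        /* SUFFIX */    {0.0,  0.0,   0.5},
        /* CITY   */    {0.0,  0.0,   1.0}};

    static double tokenScore(String token, Tag tag) {
        double[] scores = TOKEN_SCORES.get(token);
        return scores == null ? 0.0 : scores[tag.ordinal()];
    }

    // Returns the highest-scoring tag sequence (Viterbi / max-sum dynamic programming).
    static List<Tag> decode(List<String> tokens) {
        if (tokens.isEmpty()) return List.of();
        Tag[] tags = Tag.values();
        int n = tokens.size(), k = tags.length;
        double[][] best = new double[n][k];  // best score of any path ending in tag j at position i
        int[][] backPointer = new int[n][k]; // previous tag on that best path

        for (int j = 0; j < k; j++) {
            best[0][j] = tokenScore(tokens.get(0), tags[j]);
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                best[i][j] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double score = best[i - 1][p] + TRANSITIONS[p][j];
                    if (score > best[i][j]) {
                        best[i][j] = score;
                        backPointer[i][j] = p;
                    }
                }
                best[i][j] += tokenScore(tokens.get(i), tags[j]);
            }
        }
        // Pick the best final tag, then follow back-pointers to recover the path.
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (best[n - 1][j] > best[n - 1][last]) last = j;
        }
        List<Tag> path = new ArrayList<>();
        for (int i = n - 1; i >= 0; i--) {
            path.add(tags[last]);
            last = backPointer[i][last];
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        System.out.println(decode(List.of("UNIVERSITY", "ST", "MARTIN")));
        System.out.println(decode(List.of("ST", "LOUIS")));
    }
}
```

Running `main` prints `[STREET_NAME, STREET_SUFFIX, CITY]` and then `[CITY, CITY]`. On its own, "ST" decodes to STREET_SUFFIX (1.0 beats 0.8); when "LOUIS" follows, the CITY-to-CITY transition makes the all-city reading score higher, so the decision for "ST" depends on a token that comes after it.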

Train

The CrfTrainer accepts training files and outputs a serialized model. Compose feature extractors to capture the patterns that matter for your domain.

  • FeatureExtractor — Extract features from tokens
  • TagProvider — Define your label vocabulary
  • TrainingDataSequencer — Parse training files
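The idea behind composing feature extractors can be sketched in plain Java. This is a hypothetical stand-in, not CoordinateKit's FeatureExtractor or CompositeFeatureExtractor: the `Extractor` interface and the `length`, `pattern`, and `composite` helpers are invented here. It assumes an extractor maps a token (seen in the context of its sequence) to a list of string feature names, and that a composite simply concatenates the features of its parts, mirroring the HAS_LENGTH_ and IS_DIGITS features in the example at the top.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FeatureSketch {
    // Hypothetical extractor signature; the library's own interface may differ.
    interface Extractor {
        List<String> extract(List<String> tokens, int index);
    }

    // Emits one feature naming the token's length, e.g. "HAS_LENGTH_3".
    static Extractor length() {
        return (tokens, i) -> List.of("HAS_LENGTH_" + tokens.get(i).length());
    }

    // Emits the given feature when the whole token matches the regex.
    static Extractor pattern(String regex, String feature) {
        Pattern compiled = Pattern.compile(regex);
        return (tokens, i) -> compiled.matcher(tokens.get(i)).matches()
                ? List.of(feature)
                : List.of();
    }

    // Concatenates the features produced by each extractor, in order.
    static Extractor composite(Extractor... extractors) {
        return (tokens, i) -> {
            List<String> features = new ArrayList<>();
            for (Extractor extractor : extractors) {
                features.addAll(extractor.extract(tokens, i));
            }
            return features;
        };
    }

    public static void main(String[] args) {
        Extractor extractor = composite(length(), pattern("\\d+", "IS_DIGITS"));
        List<String> tokens = List.of("109", "UNIVERSITY", "ST");
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tokens.get(i) + " -> " + extractor.extract(tokens, i));
        }
    }
}
```

For "109" this yields `[HAS_LENGTH_3, IS_DIGITS]`, while "UNIVERSITY" yields only `[HAS_LENGTH_10]`. Keeping each extractor small and combining them means new domain features can be added without touching the existing ones.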

Tag

The CrfTagger loads a trained model and labels new sequences, returning both the predicted tags and their confidence scores.

  • Tokenizer — Convert input to tokens
  • FeatureExtractor — Same as training
  • TagProvider — Same as training
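The tokenizer matters because a model's weights are tied to the exact features it saw during training, so input at tagging time should be split and normalized the same way the training data was. A minimal sketch, assuming a hypothetical tokenizer written as a plain `Function<String, List<String>>` (not the library's Tokenizer interface) and assuming the training data is upper-case like the address example above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TokenizerSketch {
    // Hypothetical tokenizer: splits on runs of whitespace and commas,
    // drops empty pieces, and upper-cases each token so that mixed-case
    // input lines up with upper-case training data.
    static final Function<String, List<String>> TOKENIZER = input ->
        Arrays.stream(input.trim().split("[\\s,]+"))
              .filter(token -> !token.isEmpty())
              .map(token -> token.toUpperCase(Locale.ROOT))
              .collect(Collectors.toList());

    public static void main(String[] args) {
        System.out.println(TOKENIZER.apply("109 University St, Martin TN"));
    }
}
```

This prints `[109, UNIVERSITY, ST, MARTIN, TN]`, the same token sequence as the training example, so the same feature extractors produce the same feature names at tagging time.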

Batteries Included

Pre-built feature extractors for common patterns

  • Pattern Matching
  • Prefix and Suffix
  • Sequence Position
  • Token Length
  • Transforming
  • Window
  • XPath

Combine extractors with CompositeFeatureExtractor or implement your own.
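As an example of writing your own extractor, here is a hypothetical window extractor that gives each token features describing its neighbors. The library's Window extractor is not described here, so this is only a sketch of the general idea: the `Extractor` interface, the `window` helper, the "W-1=..." feature naming, and the `<BOS>`/`<EOS>` boundary markers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowFeatureSketch {
    // Hypothetical extractor signature; the library's own interface may differ.
    interface Extractor {
        List<String> extract(List<String> tokens, int index);
    }

    // Emits the text of each neighbor within `radius` positions, e.g.
    // "W-1=UNIVERSITY" or "W+1=MARTIN". Positions before the start or past the
    // end of the sequence produce "<BOS>" / "<EOS>" so boundaries are visible.
    static Extractor window(int radius) {
        return (tokens, i) -> {
            List<String> features = new ArrayList<>();
            for (int offset = -radius; offset <= radius; offset++) {
                if (offset == 0) continue; // the token itself is left to other extractors
                int j = i + offset;
                String value = j < 0 ? "<BOS>" : j >= tokens.size() ? "<EOS>" : tokens.get(j);
                features.add("W" + (offset > 0 ? "+" : "") + offset + "=" + value);
            }
            return features;
        };
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("109", "UNIVERSITY", "ST", "MARTIN", "TN");
        Extractor extractor = window(1);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tokens.get(i) + " -> " + extractor.extract(tokens, i));
        }
    }
}
```

With a radius of 1, "ST" gets `[W-1=UNIVERSITY, W+1=MARTIN]`. Features like these let the model learn, for instance, that a token following a likely street name is often a street suffix.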

Ready to get started?

Add the library to your project and start labeling sequences in minutes.
