Java API Changelog

This document summarizes the important changes in the Extraction Plugin API and lists only the changes that matter to plugin developers. For a full list of changes per version, please refer to the general changelog.

0.8.0

The trace property imageId has been renamed to image, in line with the Hansken REST API and Python API. When updating your plugin, change calls to trace.get("imageId") into trace.get("image").

#774 By default, searches in deferred extraction plugins are now scoped to the image of the trace that is currently being processed. A project-wide search can be done by passing the optional scope argument.

@Override
public void process(final Trace trace, final DataContext dataContext, final TraceSearcher searcher) {
    // only search for traces inside the same image as the trace that is being processed;
    // both calls are equivalent, because IMAGE is the default scope
    final SearchResult defaultResult = searcher.search("file.extension=asc", 10);
    final SearchResult imageResult = searcher.search("file.extension=asc", 10, TraceSearcher.SearchScope.IMAGE);

    // search for all traces inside the same project as the trace that is being processed
    final SearchResult projectResult = searcher.search("file.extension=asc", 10, TraceSearcher.SearchScope.PROJECT);
}

Support trace properties of type List<Integer>, List<Double>, and List<Float>. This enables you to write multiple offsets and confidence scores in tracelets of type prediction.

For example:

trace.addTracelet("prediction", tracelet -> {
    tracelet.set("modelName", "my_cat_detector");
    tracelet.set("modelVersion", "0.0.BETA");
    tracelet.set("type", "classification");
    tracelet.set("label", "cat");

    tracelet.set("offset", 3.0);
    tracelet.set("confidence", 0.4);

    tracelet.set("offsets", List.of(0.0, 3.0, 6.0, 9.0));
    tracelet.set("confidences", List.of(0.1, 0.4, 0.03, 0.09));
});
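
When both the single values and the lists are written, the single offset and confidence can be derived from the lists. The sketch below assumes that the two lists are parallel (entry i of offsets belongs to entry i of confidences), which the example values suggest but the changelog does not state. The class PredictionSummary and its method are hypothetical helpers that use only the JDK; they are not part of the SDK.

```java
import java.util.List;

// Hypothetical helper, not part of the Extraction Plugin SDK.
final class PredictionSummary {

    // Returns the index of the highest confidence, or -1 for an empty list.
    static int indexOfMax(final List<Double> confidences) {
        int best = -1;
        for (int i = 0; i < confidences.size(); i++) {
            if (best < 0 || confidences.get(i) > confidences.get(best)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(final String[] args) {
        // assumption: offsets and confidences are parallel lists
        final List<Double> offsets = List.of(0.0, 3.0, 6.0, 9.0);
        final List<Double> confidences = List.of(0.1, 0.4, 0.03, 0.09);

        final int best = indexOfMax(confidences);
        // prints "offset=3.0 confidence=0.4"
        System.out.println("offset=" + offsets.get(best) + " confidence=" + confidences.get(best));
    }
}
```

The resulting pair could then be passed to tracelet.set("offset", ...) and tracelet.set("confidence", ...) next to the full lists.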

0.7.0

Escaping the / character in matchers is now optional. This simplifies matchers and improves compatibility between HQL and HQL-Lite. See the HQL-Lite syntax documentation for more information and examples.

Examples:
- file.path:\\/Users\\/*\\/AppData -> file.path:/Users/*/AppData
- registryEntry.key:\\/Software\\/Dropbox\\/ks*\\/Client-p -> registryEntry.key:/Software/Dropbox/ks*/Client-p

Hansken now returns the file.path property as a String instead of a List<String>. Example: trace.get("file.path") now returns "/dev/null", where it previously returned ["dev", "null"].
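
Both changes can be absorbed with small helpers while migrating. The following is a minimal sketch that uses only the JDK; the class MatcherMigration and its methods are hypothetical names and not part of the SDK. It assumes that a matcher never contains an escaped backslash directly in front of a slash.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helpers, not part of the Extraction Plugin SDK.
final class MatcherMigration {

    // Removes the now-optional backslash in front of every '/' in a matcher.
    static String unescapeSlashes(final String matcher) {
        return matcher.replace("\\/", "/");
    }

    // Rebuilds the pre-0.7.0 list form of file.path from the new String form.
    static List<String> pathSegments(final String path) {
        return Arrays.stream(path.split("/"))
            .filter(segment -> !segment.isEmpty())
            .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        // prints "file.path:/Users/*/AppData"
        System.out.println(unescapeSlashes("file.path:\\/Users\\/*\\/AppData"));
        // prints "[dev, null]"
        System.out.println(pathSegments("/dev/null"));
    }
}
```

Code that still expects the old list form can call pathSegments(...) on the returned String until it has been rewritten to work with the path directly.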

0.6.3

A plugin can now write multiple data streams to a single trace concurrently, e.g. write both decrypted and ocr at the same time. See the “Adding data to a trace” code snippets for general examples on adding data to a trace.

0.6.1

The Java SDK is now distributed through Maven Central instead of the Hansken community.

0.6.0

Warning

It is highly recommended to upgrade your plugin to this new version. See the migration steps below.

Extraction plugin container images are now labeled with PluginInfo. This allows Hansken to efficiently load extraction plugins.
By default, the extraction plugin version is managed in the plugin’s pom.xml. The .pluginVersion(..) call can be removed from the PluginInfo builder.

Migration steps from earlier versions, for plugins that use the Java extraction plugin SuperPOM:
1. Update the SDK version in your pom.xml
2. If you come from a version prior to 0.4.0, or if you use a plugin name instead of a plugin id in your pluginInfo(), switch to the plugin id style (see the instructions for version 0.4.0)
3. Set your plugin version in your project’s pom.xml, and remove the following from your PluginInfo.Builder:
```
.pluginVersion(...)
```
4. Update your build scripts to build your plugin (Docker) container image. You should build your plugin container image with the following command:
```
mvn package docker:build
```
  This will generate a plugin image:
  - The extraction plugin is added to your local image registry (docker images),
  - The image name is extraction-plugin/PLUGINID, e.g. extraction-plugin/nfi.nl/extract/chat/whatsapp,
  - The image is tagged with two tags: latest, and your plugin version.
  Note: if Docker is not available in your environment, podman can be used as an alternative. See packaging for more details.
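
The naming rule from step 4 can be written down as a small helper, for example to check in a build script which image references to expect. This is a sketch under the assumption that name and tag are joined as name:tag, the usual Docker notation. The class PluginImageNames is hypothetical and not part of the SDK or the SuperPOM, and the version 1.0.0 is only an example value.

```java
import java.util.List;

// Hypothetical helper, not part of the SDK or the SuperPOM.
final class PluginImageNames {

    // Image name as described in step 4: extraction-plugin/PLUGINID.
    static String imageName(final String pluginId) {
        return "extraction-plugin/" + pluginId;
    }

    // The two references the build is described to produce: latest and the plugin version.
    static List<String> imageReferences(final String pluginId, final String pluginVersion) {
        final String name = imageName(pluginId);
        return List.of(name + ":latest", name + ":" + pluginVersion);
    }

    public static void main(final String[] args) {
        // "1.0.0" is an example version, not taken from this changelog
        imageReferences("nfi.nl/extract/chat/whatsapp", "1.0.0").forEach(System.out::println);
    }
}
```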

0.5.0

Add a new tracelet API, Trace.addTracelet(type, consumer). It can be used like this:

trace.addTracelet("prediction", tracelet -> tracelet
     .set("type", "classification")
     .set("label", "label")
     .set("confidence", 0.8f)
     .set("embedding", Vector.of(1,2,3))
     .set("modelName", "yolo")
     .set("modelVersion", "2.0"));

Deprecate Trace.addTracelet(Trace)
Support vector data type in trace properties.

0.4.13

When writing input search traces for tests, it is no longer required to explicitly set an id property. These are automatically generated when executing tests.

0.4.7

A new convenience method id(String, String, String) has been added to the PluginInfo builder. This removes some boilerplate code when setting the pluginId. More details on the plugin naming conventions can be found in the Plugin naming convention section.
```
PluginInfo.builderFor(this)
          .id("nfi.nl", "extract", "TestPlugin") // new style
          .id(new PluginId("nfi.nl", "extract", "TestPlugin")) // old style, but also works
          ...
```

0.4.6

It is now possible to specify maximum system resources in the PluginInfo. For example, to run a plugin with 0.5 CPU (= 0.5 vCPU/Core/hyperthread) and 1 GB of memory, the following configuration can be added to PluginInfo:
```
PluginInfo.builderFor(this)
    ...
    .pluginResources(PluginResources.builder()
        .maximumCpu(0.5f)
        .maximumMemory(1000)
        .build())
    .build();
```

0.4.0

Extraction Plugins are now identified with a PluginInfo.PluginId containing a domain, category and name. The method PluginInfo.name(pluginName) has been replaced by PluginInfo.id(new PluginId(domain, category, name)). More details on the plugin naming conventions can be found in the Plugin naming convention section.
PluginInfo.name() is now deprecated (but will still work for backwards compatibility).
A new license field PluginInfo.license has also been added in this release.

The following example creates a PluginInfo for a plugin with the name TestPlugin, licensed under the Apache License 2.0:

PluginInfo.builderFor(this)
          .id(new PluginId("nfi.nl", "extract", "TestPlugin")) // id.domain: nfi.nl, id.category: extract, id.name: TestPlugin
          // .name("TestPlugin") // deprecated, use id(...) instead
          .pluginVersion("0.4.1")
          .author(Author.builder()...build())
          .description("A plugin for testing.")
          .maturityLevel(MaturityLevel.PROOF_OF_CONCEPT)
          .hqlMatcher("*")
          .webpageUrl("https://www.hansken.org")
          .license("Apache License 2.0")
          .build();

0.3.0

Extraction Plugins can now create new datastreams on a Trace through data transformations. Data transformations describe how data can be obtained from a source.

An example case is an extraction plugin that processes an archive file. The plugin creates a child trace per entry in the archive file. Each child trace will have a datastream that is a transformation that marks the start and length of the entry in the original archive data. By just describing the data instead of specifying the actual data, a lot of space is saved.

Although Hansken supports various transformations, the Extraction Plugins SDK for now only supports ranged data transformations. Ranged data transformations define data as a list of ranges, each range with an offset and length in a byte array.

The following example sets a new datastream with dataType html on a trace, by setting a ranged data transformation:
```
trace.setData("html", RangedDataTransformation.builder().addRange(offset, length).build());
```
The following example creates a child trace and sets a new datastream with dataType raw on it, by setting a ranged data transformation with two ranges:
```
trace.newChild(format("lineNumber %d", lineNumber), child -> {
    child.setData("raw", RangedDataTransformation.builder()
      .addRange(10, 20)
      .addRange(50, 30)
      .build());
});
```
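
To make the idea of a ranged transformation concrete, the sketch below resolves a list of ranges against a source byte array by concatenating the described bytes in order. It is an illustration that uses only the JDK; RangedDataSketch and Range are hypothetical stand-ins and do not show how Hansken or the SDK's RangedDataTransformation is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-in that illustrates the idea; not the SDK's RangedDataTransformation.
final class RangedDataSketch {

    // One range: `length` bytes starting at `offset` in the source data.
    static final class Range {
        final int offset;
        final int length;

        Range(final int offset, final int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Concatenates the bytes described by the ranges, in list order.
    static byte[] resolve(final byte[] source, final List<Range> ranges) {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (final Range range : ranges) {
            out.write(source, range.offset, range.length);
        }
        return out.toByteArray();
    }

    public static void main(final String[] args) {
        final byte[] source = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        final byte[] data = resolve(source, List.of(new Range(2, 3), new Range(10, 4)));
        // prints "234abcd"
        System.out.println(new String(data, StandardCharsets.US_ASCII));
    }
}
```

Only the two (offset, length) pairs have to be stored to describe the new data stream; the bytes themselves stay in the source data.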
More detailed documentation will follow in an upcoming SDK release.

0.2.0

Warning

This is an API breaking change. Plugins created with an earlier version of the extraction plugin SDK are not compatible with Hansken versions that use 0.2.0 or later.

Introduced a new extraction plugin type, DeferredExtractionPlugin. Deferred extraction plugins can be run at a different extraction stage. This type of plugin also allows accessing other traces using the searcher.

The class ExtractionContext has been renamed to DataContext. The new name DataContext represents the class contents better. Plugins have to update the matching import statements and the parameter type in their ExtractionPlugin.process() implementation accordingly. This change has no functional side effects.

Old:

import org.hansken.plugin.extraction.api.ExtractionContext;

@Override
public void process(final Trace trace, final ExtractionContext context) throws IOException {

}

New:

import org.hansken.plugin.extraction.api.DataContext;

@Override
public void process(final Trace trace, final DataContext dataContext) throws IOException {

}