Java code snippets

This page contains Java code snippets for common patterns that will be used when writing a plugin.

RandomAccessData as InputStream

In Java, InputStream is a common type to pass data to another class or method. The SDK provides a simple utility to use a RandomAccessData as InputStream.

Add the following import to your code:

import org.hansken.plugin.extraction.core.data.RandomAccessDatas;

Next we can create an InputStream from the RandomAccessData as shown in the following snippet. Note that the InputStream is created using a try-with-resources-statement. This ensures that the InputStream is correctly closed when the InputStream is no longer required.

RandomAccessData traceData=...;
    try(InputStream asInputStream=RandomAccessDatas.asInputStream(traceData)){
    // use the InputStream here
    }

Notes:

  • the created InputStream is not thread-safe,

  • the created InputStream changes state in the provided RandomAccessData (e.g. when data is read, the position updated in both the InputStream and the RandomAccessData instances),

  • for more details on the implementation of the InputStream, refer to the RandomAccessDataInputStream JavaDoc.

Adding tracelets

In the following Java example, a “classification” tracelet is added to a trace. The tracelet consists of a list of four properties, namely “class”, “confidence”, “modelName” and “modelVersion”.

trace.addTracelet("prediction", tracelet -> tracelet
     .set("type", "classification")
     .set("class", "telephone")
     .set("label", "label")
     .set("confidence", 0.8f)
     .set("embedding", Vector.of(1,2,3))
     .set("modelName", "yolo")
     .set("modelVersion", "2.0"));

or

trace.addTracelet(new Tracelet("prediction", List.of(
    new TraceletProperty("prediction.type","classification"),
    new TraceletProperty("prediction.class","telephone"),
    new TraceletProperty("prediction.label","label"),
    new TraceletProperty("prediction.confidence",0.8f))),
    new TraceletProperty("prediction.embedding", Vector.of(1,2,3)),
    new TraceletProperty("prediction.modelName","yolo"),
    new TraceletProperty("prediction.modelVersion","2.0"));

Adding data to a trace

Traces can have data attached to them. See Data streams for more information. The following two snippets demonstrate how to add data to a trace.

It is currently not possible to verify that a specific data stream is already set or not.

Data Transformations

The most efficient way to add data to a trace is using data transformations. See Data Transformations for more details.

The following example sets a new data stream with dataType html on a trace, by setting a ranged data transformation:

trace.setData("html", RangedDataTransformation.builder().addRange(offset, length).build());

The following example creates a child trace and sets a new datastream with dataType raw on it, by setting a ranged data transformation with two ranges:

trace.newChild(format("lineNumber %d", lineNumber), child -> {
    child.setData("raw", RangedDataTransformation.builder()
                  .addRange(10, 20)
                  .addRange(50, 30)
                  .build());
});

Blobs

It is not always possible to create a transormation for the data that has to be added to a trace. For example the data is a result of a computation, and not a direct subset of another data stream..

The following examples show how to creates a new data stream of dataType raw on a trace.

In case all data is stored in a byte[], we can add the byte array to the data stream with:

final byte[] rawBytes = {.....};
trace.setData("raw", writer -> writer.write(rawBytes));

Alternatively, if the data is available in an InputStream the data can be added with:

final InputStream inputStream = ...;
trace.setData("raw", inputStream);

Specifying system resources

In the PluginInfo you can specify maximum system resource metrics for a plugin. These are used for scaling the number of pods as described here. To run a plugin with 0.5 cpu (= 0.5 vCPU/Core/hyperthread), 1 gb memory and 10 (concurrent) cpu workers (threads), for example, the following configuration can be added to PluginInfo:

PluginInfo.builderFor(this)
    ...
    .pluginResources(PluginResources.builder()
    .maximumCpu(0.5f)
    .maximumMemory(1000)
    .maximumWorkers(10)
    .build())
    .build();

Deferred Extraction Plugins

Using a deferred plugin requires inheriting the DeferredExtractionPlugin base class. This allows access to a TraceSearcher object in the process function to search for traces.

public class ExampleDeferred extends DeferredExtractionPlugin {
    @Override
    public PluginInfo pluginInfo();

    @Override
    public void process(final Trace trace, final ExtractionContext context,
                        final TraceSearcher searcher) {
        final SearchResult result = searcher.search("file.extension=asc", 10);
    }
}

The search method accepts a HQL query and a count, which represents the maximum number of traces to return. It may be useful to specifically search for traces from the image being extracted. Add "image:" + trace.get("image") to your query. The query of the provided example could be extended like this: "file.extension = asc AND image:" + trace.get("image").

The traces contained in the SearchResult are returned as a stream.

final Stream<Trace> stream = result.getTraces();
stream.limit(5);

Logging

The logging is provided by Log4j 2 with a SLF4J binding. The Log4j 2 SLF4J binding allows applications coded to the SLF4J API to use Log4j 2 as the implementation.

Usage

Here is an example illustrating how to log something with SLF4J. It begins by getting a logger with the name “LOG”. This logger is in turn used to log the message I'm logging a variable 1234!.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Example {
    private static final Logger LOG = LoggerFactory.getLogger(Example.class);

    public void example() {
        final int aNumber = 1234;
        // logs to console: I'm logging a variable 1234!
        LOG.info("I'm logging a variable {}!", aNumber);
    }
}

Customize logging

It’s easy to change the logging format with a file called log4j2.xml. If desired, this file must be in the resources folder, for example src/main/resources/log4j2.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <appenders>
        <console name="stdout" target="SYSTEM_OUT">
            <patternLayout
                    pattern="%-5p|%d{yyyy-MM-dd HH:mm:ss}|%-20.20t|%-32.32c{1}|%m%n"/>
        </console>
    </appenders>
    <loggers>
        <root level="info">
            <appenderRef ref="stdout"/>
        </root>
    </loggers>
</configuration>

Warning

Be careful with logging sensitive information.

Note

More information about customizing the logging can be found here.

Note

The default logger is pre-configured to log INFO to STDOUT (see the configuration above)

Note

Log4j 2 supports various logging formats, including xml, yaml, json, properties, etc. Currently, only the xml format is supported.

Note

Contact your Hansken administrator for more information on where to find logs for your Hansken environment.

[EXPERIMENTAL FEATURE] Adding previews to a trace

Warning

This is an experimental feature, which might change or get removed in future releases.

Example:

public class ExamplePlugin extends ExtractionPlugin {
  @Override
  public PluginInfo pluginInfo();

  @Override
  public void process(final Trace trace, final DataContext context) {
    final byte[] previewData;
    // set the preview data for the image/png MIME-type
    trace.set("preview.image/png", previewData);
    trace.set("preview.image/png", previewData);
  }
}