How to debug an Extraction Plugin

Debugging is the art of removing bugs — hopefully quickly.

Locally

To debug a plugin locally, it is recommended to start the plugin via the IDE by running the integration test. This has the advantage that breakpoints can easily be put in the code instead of printing log statements, for example.

Logging

The logging of the extraction plugin is displayed in the console.

Locally with Docker

Debugging an extraction plugin via docker is a bit trickier. Java has the advantage that remote debugging is already baked in.

Using Java Remote Debug with Docker containers requires 3 distinct steps:

  1. Build a Docker image

  2. Run the Docker image with specific Java tool options

  3. Setting breakpoints in your code

Build a Docker image

If the Docker image is not built, run the following command to build the Docker image:

mvn package docker:build

Run the Docker image with specific Java tool options

In Java, the remote debug functionality is not enabled by default. To enable the remote debug functionality, the following environments variable must be set in the Docker container:

JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"

This environment variable allows the debugger to connect to the debuggee (application being debugged). To start the Docker image with the JAVA_TOOL_OPTIONS environment variable, the following command can be used:

docker run -p 5005:5005 -e JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" your_extraction_plugin_name

The next step is to attach the debugger to the debuggee. For Intellij, the instructions are clearly described on the following page: Tutorial: Remote debug

Setting breakpoints in the code

The last step is to add breakpoints in the code.

Logging in Docker

The logging of the extraction plugin is displayed in the console after running the docker run command. In addition, the logging is also displayed in the IntelliJ console while debugging.

Kubernetes

In kubernetes it is currently not possible to debug via Java Remote Debug because:

  • no debug ports are published;

  • the container was not started with the environment variable JAVA_TOOL_OPTIONS so debugging is not enabled.

Logging in Kubernetes

If there is authorization to the kubernetes cluster, the logging can be viewed with the following command:

kubectl logs -f hansken-extraction-plugins/your_extraction_plugin_pod

Debug HQL

An HQL query can be debugged by overriding the isVerboseLoggingEnabled() method of the ExtractionPluginFlits class. The example below shows an example of an embedded FLITS test with verbose logging enabled.

public class TestPluginFlitsIT extends EmbeddedExtractionPluginFlits {

    @Override
    public Path testPath() {
        return srcPath("integration/inputs/plugin");
    }

    @Override
    public Path resultPath() {
        return srcPath("integration/results/embedded/plugin");
    }

    @Override
    protected ExtractionPlugin pluginToTest() {
        return new TestPlugin();
    }

    @Override
    public boolean regenerate() {
        return true;
    }

    @Override
    protected boolean isVerboseLoggingEnabled() {
        return true;
    }
}

The following output will then be displayed in the console:

HQL match found for:
$data.type=jpg
With trace:
dataType=jpg
types={file, data}
properties={data.raw.mimeType=image/jpg, path=/test-input-trace, file.name=image.jpg, name=test-input-trace, id=0}

If the HQL query contains an error, it will be shown in the generated test results. An example of an invalid query is $data.mimeType=image/jpg (slash not escaped). This query will produce an error like the one shown below.

{
  "class": "org.hansken.plugin.extraction.hql_lite.lang.ParseException",
  "message": "HqlLiteHumanQueryParser: line 1:20 token recognition error at: '/jpg'"
}

Note

The error is only shown in the generated trace, so to find out the ParseException override the regenerate() method from Flits and then let this method return true.