How to debug an Extraction Plugin
Debugging is the art of removing bugs — hopefully quickly.
Locally
To debug a plugin locally, start it from your IDE by running the integration test. This lets you set breakpoints in the code instead of adding log statements.
Logging
The logging of the extraction plugin is displayed in the console.
Locally with Docker
Debugging an extraction plugin that runs in Docker is a bit trickier. Fortunately, Java has remote debugging built in.
Using Java Remote Debug with Docker containers requires three steps:
Build a Docker image
Run the Docker image with specific Java tool options
Set breakpoints in the code
Build a Docker image
If the Docker image has not been built yet, build it with the following command:
mvn package docker:build
Run the Docker image with specific Java tool options
In Java, remote debugging is not enabled by default. To enable it, the following environment variable must be set in the Docker container:
JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"
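This environment variable allows the debugger to connect to the debuggee (the application being debugged). Here, server=y makes the JVM listen for a debugger, suspend=n lets the plugin start without waiting for one, and address=*:5005 listens on port 5005 on all network interfaces. To start the Docker image with the JAVA_TOOL_OPTIONS environment variable and publish debug port 5005 on the host, use the following command: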
docker run -p 5005:5005 -e JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" your_extraction_plugin_name
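To confirm from inside the plugin process that the debug option was actually picked up, the JVM's own view of its start-up options can be inspected. The sketch below is only an illustration and not part of the Hansken SDK: the class JdwpCheck and the method findJdwpArgument are hypothetical names, and it assumes the JVM reports options taken from JAVA_TOOL_OPTIONS among its input arguments (the environment variable itself is checked as a fallback).

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the Hansken SDK.
public class JdwpCheck {

    // Returns the first option that loads the JDWP debug agent, if any.
    static Optional<String> findJdwpArgument(List<String> options) {
        return options.stream()
            .filter(option -> option.startsWith("-agentlib:jdwp"))
            .findFirst();
    }

    public static void main(String[] args) {
        // Options the JVM was started with (assumed to include JAVA_TOOL_OPTIONS).
        List<String> options =
            new ArrayList<>(ManagementFactory.getRuntimeMXBean().getInputArguments());

        // Fallback: split the environment variable itself on whitespace.
        String toolOptions = System.getenv("JAVA_TOOL_OPTIONS");
        if (toolOptions != null) {
            options.addAll(Arrays.asList(toolOptions.trim().split("\\s+")));
        }

        System.out.println(findJdwpArgument(options)
            .map(option -> "Remote debugging enabled: " + option)
            .orElse("Remote debugging is not enabled"));
    }
}
```

If this reports that remote debugging is not enabled, the -e flag of the docker run command is the first thing to check.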
The next step is to attach the debugger to the debuggee. For IntelliJ IDEA, the instructions are described on the following page: Tutorial: Remote debug
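If the IDE cannot attach, it can help to check whether a debug agent is really answering on the published port. A JDWP connection starts with a handshake in which the debugger sends the 14 ASCII bytes "JDWP-Handshake" and the JVM echoes them back. The sketch below performs only that handshake. The class JdwpProbe and its methods are hypothetical names, the host and port are the ones used in the docker run command above, and it assumes Java 11 or later (for readNBytes).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the Hansken SDK.
public class JdwpProbe {

    private static final byte[] HANDSHAKE =
        "JDWP-Handshake".getBytes(StandardCharsets.US_ASCII);

    // True if the reply is exactly the echoed handshake string.
    static boolean isHandshakeReply(byte[] reply) {
        return Arrays.equals(HANDSHAKE, reply);
    }

    // Connects, sends the handshake and checks that it is echoed back.
    static boolean canHandshake(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);

            OutputStream out = socket.getOutputStream();
            out.write(HANDSHAKE);
            out.flush();

            // Reads up to 14 bytes; fewer if the peer closes the connection.
            byte[] reply = socket.getInputStream().readNBytes(HANDSHAKE.length);
            return isHandshakeReply(reply);
        } catch (IOException e) {
            // Connection refused, timed out, or closed by the peer.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean reachable = canHandshake("localhost", 5005, 2000);
        System.out.println(reachable
            ? "A JDWP agent answered on localhost:5005"
            : "No JDWP agent answered on localhost:5005");
    }
}
```

A handshake is used rather than a plain connect because a published Docker port may accept TCP connections even when nothing is listening inside the container. JDWP serves one debugger at a time, so run such a probe before attaching the IDE, not while it is attached.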
Set breakpoints in the code
The last step is to add breakpoints in the code.
Logging in Docker
The logging of the extraction plugin is displayed in the console after running the docker run command. The logging is also displayed in the IntelliJ console while debugging.
Kubernetes
In Kubernetes it is currently not possible to debug via Java Remote Debug because:
no debug ports are published;
the container was not started with the environment variable JAVA_TOOL_OPTIONS, so debugging is not enabled.
Logging in Kubernetes
If you are authorized to access the Kubernetes cluster, the logging can be viewed with the following command:
kubectl logs -f hansken-extraction-plugins/your_extraction_plugin_pod
Debug HQL
An HQL query can be debugged by overriding the isVerboseLoggingEnabled() method of the ExtractionPluginFlits class.
The example below shows an embedded FLITS test with verbose logging enabled.
public class TestPluginFlitsIT extends EmbeddedExtractionPluginFlits {

    @Override
    public Path testPath() {
        return srcPath("integration/inputs/plugin");
    }

    @Override
    public Path resultPath() {
        return srcPath("integration/results/embedded/plugin");
    }

    @Override
    protected ExtractionPlugin pluginToTest() {
        return new TestPlugin();
    }

    // Regenerate the test results, so that query errors show up in them.
    @Override
    public boolean regenerate() {
        return true;
    }

    // Log each HQL match together with the trace it matched.
    @Override
    protected boolean isVerboseLoggingEnabled() {
        return true;
    }
}
The following output will then be displayed in the console:
HQL match found for:
$data.type=jpg
With trace:
dataType=jpg
types={file, data}
properties={data.raw.mimeType=image/jpg, path=/test-input-trace, file.name=image.jpg, name=test-input-trace, id=0}
If the HQL query contains an error, it will be shown in the generated test results. An example of an invalid query is $data.mimeType=image/jpg (slash not escaped). This query will produce an error like the one shown below.
{
    "class": "org.hansken.plugin.extraction.hql_lite.lang.ParseException",
    "message": "HqlLiteHumanQueryParser: line 1:20 token recognition error at: '/jpg'"
}
Note
The error is only shown in the generated trace. To see the ParseException, override the regenerate() method from Flits and let it return true.
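In the invalid query above, the position 1:20 in the error message is the zero-based column of the unescaped slash in $data.mimeType=image/jpg. The sketch below shows one way to build the matcher string in Java, assuming that the query language escapes a slash with a preceding backslash (this is an assumption, so check it against the HQL-Lite documentation). The class HqlEscapeExample and the method escapeSlashes are hypothetical names. Note that the backslash itself has to be doubled inside a Java string literal.

```java
// Hypothetical helper for illustration; not part of the Hansken SDK.
public class HqlEscapeExample {

    // Puts a backslash in front of every slash (assumed escape syntax).
    static String escapeSlashes(String value) {
        // "\\/" is the two-character string: backslash, slash.
        return value.replace("/", "\\/");
    }

    public static void main(String[] args) {
        String matcher = "$data.mimeType=" + escapeSlashes("image/jpg");
        System.out.println(matcher); // prints $data.mimeType=image\/jpg
    }
}
```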