How to debug an Extraction Plugin
Debugging is the art of removing bugs, hopefully quickly.
Locally
To debug a plugin locally, it is recommended to start the plugin from your IDE. This lets you set breakpoints in the code instead of, for example, adding log statements. Starting a plugin locally requires a small piece of extra code; see Testing for more information.
Logging
When the plugin is started locally, its log output is displayed in the console.
Locally with Docker
Debugging an extraction plugin that runs in Docker is a bit trickier. To debug Python code inside a container, a debugger must be added to the extraction plugin itself. Several debugger modules are available for Python. One that works well with Visual Studio Code is debugpy, a package developed by Microsoft and used by the Python extension for Visual Studio Code.
Note
debugpy implements the Debug Adapter Protocol (DAP), which is a standardised way for development tools to communicate with debuggers.
Using debugpy with Docker containers requires five distinct steps:
1. Install debugpy
2. Configuring debugpy in Python
3. Build a Docker image
4. Configuring the connection to the Docker container
5. Setting breakpoints in the code
Install debugpy
First, add debugpy to the install_requires list in your setup.py.
from setuptools import setup

setup(
    # ...
    install_requires=[
        "hansken-extraction-plugin==0.4.7",  # the plugin SDK
        "debugpy==1.5.1",
    ],
)
Configuring debugpy in Python
At the beginning of your script, import debugpy and call debugpy.listen() to start the debug adapter, passing a (host, port) tuple as the first argument. Use the debugpy.wait_for_client() function to block program execution until the client is attached.
import debugpy

# Listen on all interfaces (0.0.0.0) so the debugger is reachable from outside the container
debugpy.listen(("0.0.0.0", 5678))
debugpy.wait_for_client()  # blocks execution until a client is attached

# your extraction plugin code
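One possible variation, sketched under the assumption that you want the same image to run both with and without the debugger, is to start debugpy only when an environment variable is set. The variable name PLUGIN_DEBUG and the helper start_debugger_if_requested are hypothetical and chosen for illustration. They are not part of debugpy or the plugin SDK.

```python
import os


def start_debugger_if_requested(env_var: str = "PLUGIN_DEBUG", port: int = 5678) -> bool:
    """Start debugpy only when env_var is set to "1" (hypothetical convention).

    Returns True if the debugger was started and a client attached, False otherwise.
    """
    if os.environ.get(env_var) != "1":
        return False

    # Imported here so debugpy is only loaded when debugging is requested.
    import debugpy

    # Listen on all interfaces so the published container port is reachable.
    debugpy.listen(("0.0.0.0", port))
    # Block until Visual Studio Code attaches.
    debugpy.wait_for_client()
    return True


start_debugger_if_requested()
# your extraction plugin code
```

With this approach, the debugger would be enabled by passing the variable to the container, for example docker run -e PLUGIN_DEBUG=1 -p 5678:5678 your_extraction_plugin_name. Without the variable, the plugin starts without waiting for a client.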
Build a Docker image
If the Docker image has not been built yet, first build it as described here.
Configuring the connection to the Docker container
debugpy is now set up to accept connections inside a Docker container. To connect to debugpy in the Docker container, port 5678 must be published. To make a port available to services outside of Docker, use the --publish or -p flag. This creates a firewall rule that maps a container port to a port on the Docker host, making it reachable from the outside world.
To run the extraction plugin with the published port, use the following command:
docker run -p 5678:5678 your_extraction_plugin_name
The next step is to configure Visual Studio Code. Create a launch.json file (normally .vscode/launch.json in your workspace) so that Visual Studio Code can connect to the extraction plugin in Docker. The minimal launch.json example below tells the debugger to attach to localhost on port 5678.
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Remote Attach",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "."
                }
            ]
        }
    ]
}
Setting breakpoints in the code
The last step is to add breakpoints in the code.
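Breakpoints are usually set by clicking in the gutter next to a line in the Visual Studio Code editor. As an alternative, debugpy provides debugpy.breakpoint(). This call pauses execution on that line when a debug client is attached and is ignored otherwise. In the sketch below, the function process_trace_name is hypothetical plugin logic. The fallback for a missing debugpy install is a hypothetical convenience, not part of the SDK.

```python
try:
    import debugpy
except ImportError:  # debugpy is not installed, e.g. in an image built without it
    debugpy = None


def process_trace_name(name: str) -> str:
    # Hypothetical plugin logic used to show where a breakpoint could go.
    if debugpy is not None:
        # Pauses here only when a debug client is attached; ignored otherwise.
        debugpy.breakpoint()
    return name.lower()
```

With the plugin container running, start the "Python: Remote Attach" configuration from the Run and Debug view in Visual Studio Code to attach and stop at these breakpoints.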
Logging in Docker
The log output of the extraction plugin is displayed in the console after running the docker run command. While debugging, it is also displayed in the Visual Studio Code console.
Kubernetes
In Kubernetes, it is currently not possible to debug via debugpy, because no debug ports are published.
Logging in Kubernetes
If you have access to the Kubernetes cluster, the logs can be viewed with the following command, assuming the plugin pods run in the hansken-extraction-plugins namespace:
kubectl logs -f -n hansken-extraction-plugins your_extraction_plugin_pod
Debug HQL
An HQL query can be debugged by running the test framework with the --verbose option enabled. The HQL matches that are found are then displayed in the console. To test a Python plugin with the --verbose option enabled, use the following command:
test_plugin --standalone plugin/your_plugin.py --regenerate --verbose
Output like the following is then displayed in the console:
HQL match found for:
$data.type=jpg
With trace:
dataType=jpg
types={file, data}
properties={data.raw.mimeType=image/jpg, path=/test-input-trace, file.name=image.jpg, name=test-input-trace, id=0}
If the HQL query contains an error, it is shown in the generated test results. An example of an invalid query is $data.mimeType=image/jpg (the slash is not escaped). This query produces an error like the one shown below.
{
    "class": "org.hansken.plugin.extraction.hql_lite.lang.ParseException",
    "message": "HqlLiteHumanQueryParser: line 1:20 token recognition error at: '/jpg'"
}
Note
The error is only shown in the generated trace. To see the ParseException, run the test_plugin command with the --regenerate option enabled.