# How to debug an Extraction Plugin

Debugging is the art of removing bugs, hopefully quickly.

## Locally

To debug a plugin locally, it is recommended to start the plugin via the IDE. This has the advantage that breakpoints can be set in the code, instead of adding log statements, for example. To start a plugin locally, a piece of code must be added; see [Testing](testing.md) for more information.

### Logging

The logging of the extraction plugin is displayed in the console.

## Locally with Docker

Debugging an extraction plugin via Docker is a bit trickier. In order to debug in Python, a debugger must be added to the extraction plugin. There are several debug modules available for Python, but one that works well with Visual Studio Code is [debugpy](https://github.com/microsoft/debugpy). This package is developed by Microsoft specifically for use in Visual Studio Code with Python.

.. note:: `debugpy` implements the Debug Adapter Protocol (DAP), which is a standardised way for development tools to communicate with debuggers.

Using `debugpy` with Docker containers requires five steps:

1. Install `debugpy`
2. Configure `debugpy` in Python
3. Build a Docker image
4. Configure the connection to the Docker container
5. Set breakpoints in your code

### Install `debugpy`

First, add `debugpy` to your `setup.py`.

```python
from setuptools import setup

setup(
    # ...
    install_requires=[
        "hansken-extraction-plugin==0.4.7",  # the plugin SDK
        "debugpy==1.5.1",  # the debugger used for remote debugging
    ],
)
```

### Configuring `debugpy` in Python

At the beginning of your script, import `debugpy` and call `debugpy.listen()` to start the debug adapter, passing a `(host, port)` tuple as the first argument. Use the `debugpy.wait_for_client()` function to block program execution until the client is attached.

```python
import debugpy

# listen on all interfaces so the debugger is reachable from outside the container
debugpy.listen(("0.0.0.0", 5678))
debugpy.wait_for_client()  # blocks execution until client is attached

# your extraction plugin code
```

### Build a Docker image

If the Docker image has not been built yet, first build it as described [here](getting_started.md#Building a docker image for a Python plugin).

### Configuring the connection to the Docker container

`debugpy` is now set up to accept connections inside a Docker container. To connect to `debugpy` in the Docker container, port 5678 must be published. To make a port available to services outside of Docker, use the `--publish` or `-p` flag. This creates a firewall rule that maps a container port to a port on the Docker host, making it reachable from outside the container. To run the extraction plugin with the published port, the following command can be used:

```bash
docker run -p 5678:5678 your_extraction_plugin_name
```

The next step is to configure Visual Studio Code. A `launch.json` file must be created in order for Visual Studio Code to connect to the extraction plugin in Docker. The minimal `launch.json` example below tells the debugger to attach to `localhost` on port `5678`.

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Remote Attach",
      "type": "python",
      "request": "attach",
      "connect": {
        "host": "localhost",
        "port": 5678
      },
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}",
          "remoteRoot": "."
        }
      ]
    }
  ]
}
```

### Setting breakpoints in the code

The last step is to set breakpoints in the code in Visual Studio Code and start the `Python: Remote Attach` configuration. Because of the `debugpy.wait_for_client()` call, the plugin only continues running once the debugger has attached.

### Logging in Docker

The logging of the extraction plugin is displayed in the console after running the `docker run` command. In addition, the logging is displayed in the Visual Studio Code console while debugging.

## Kubernetes

In Kubernetes it is currently _not_ possible to debug via `debugpy`, because no debug ports are published.

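
Because no debug port is reachable in Kubernetes, a plugin image that unconditionally calls `debugpy.wait_for_client()` would block at startup there. One way to keep a single image usable in both settings is to start the debugger only on request. The sketch below is an illustration, not part of the plugin SDK: the environment variable `PLUGIN_DEBUG` and the helper name `maybe_start_debugger` are hypothetical names chosen for this example. `debugpy` is imported lazily, so it is only needed when debugging is switched on.

```python
import os


def maybe_start_debugger(environ=os.environ, port=5678):
    """Start debugpy and wait for a client only if PLUGIN_DEBUG=1 is set.

    PLUGIN_DEBUG is a hypothetical variable name chosen for this sketch.
    Returns True if the debugger was started, False otherwise.
    """
    if environ.get("PLUGIN_DEBUG") != "1":
        return False

    import debugpy  # imported lazily: only required when debugging is enabled

    debugpy.listen(("0.0.0.0", port))
    debugpy.wait_for_client()  # blocks until the IDE attaches
    return True


maybe_start_debugger()
# your extraction plugin code
```

Under these assumptions, the variable could be set locally with `docker run -e PLUGIN_DEBUG=1 -p 5678:5678 your_extraction_plugin_name`; without it, the plugin starts without waiting for a debugger.
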
### Logging in Kubernetes

If you have access to the Kubernetes cluster, the logging can be viewed with the following command:

```bash
kubectl logs -f hansken-extraction-plugins/your_extraction_plugin_pod
```

## Debug HQL

An HQL query can be debugged by running the test framework with the `--verbose` option enabled. The HQL matches that are found will then be displayed in the console. To test a Python plugin with the `--verbose` option enabled, use the following command:

```bash
test_plugin --standalone plugin/your_plugin.py --regenerate --verbose
```

The following output will then be displayed in the console:

```text
HQL match found for: $data.type=jpg
With trace: dataType=jpg types={file, data} properties={data.raw.mimeType=image/jpg, path=/test-input-trace, file.name=image.jpg, name=test-input-trace, id=0}
```

If the HQL query contains an error, it will be shown in the generated test results. An example of an invalid query is ``$data.mimeType=image/jpg`` (slash not escaped). This query will produce an error like the one shown below.

```json
{
  "class": "org.hansken.plugin.extraction.hql_lite.lang.ParseException",
  "message": "HqlLiteHumanQueryParser: line 1:20 token recognition error at: '/jpg'"
}
```

.. note:: The error is only shown in the generated trace, so to see the `ParseException`, run the ``test_plugin`` command with the ``--regenerate`` option enabled.
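
For longer queries, the `line 1:20` part of the error message can help locate the offending character. The following is a minimal sketch and not part of the test framework: the function name `point_at_error` is hypothetical, and it assumes the message always contains a `line <line>:<column>` fragment with a one-based line and a zero-based column. That assumption matches the example above, where the unescaped `/` sits at index 20 of the query.

```python
import json
import re


def point_at_error(query, error_json):
    """Return the query with a caret line under the column named in the error."""
    message = json.loads(error_json)["message"]
    # Assumption: the message contains "line <line>:<column>" as in the example.
    match = re.search(r"line (\d+):(\d+)", message)
    if match is None:
        return query  # unknown message format, nothing to point at
    line_number, column = int(match.group(1)), int(match.group(2))
    lines = query.splitlines()
    if not 1 <= line_number <= len(lines):
        return query
    # Insert the caret line directly below the offending query line.
    lines.insert(line_number, " " * column + "^")
    return "\n".join(lines)


error = json.dumps({
    "class": "org.hansken.plugin.extraction.hql_lite.lang.ParseException",
    "message": "HqlLiteHumanQueryParser: line 1:20 token recognition error at: '/jpg'",
})
marked = point_at_error("$data.mimeType=image/jpg", error)
print(marked)
```

The caret printed on the second line falls under the `/` in `image/jpg`, which is the character the parser could not recognise.
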