Packaging

Extraction plugins are packaged as OCI images (also known as Docker images). The OCI images are labeled with the PluginInfo. To automate packaging of a Python plugin and labeling the OCI image, the Extraction Plugin SDK comes with two utility applications: label_plugin, and build_plugin.

To package a plugin, make sure that the Extraction Plugins SDK is installed, as well as Docker. Next build and label your plugin as described in the following sections.

To verify that the image has been built, use the following command to view all local images:

docker images

Once your plugin is packaged and labelled, it can be published or ‘uploaded’ to Hansken. See “Upload the plugin to Hansken” for instructions.

label_plugin

label_plugin is a utility to add labels to an extraction plugin image. To label a plugin, first build the plugin image with docker build; for example by using one of the following commands:

docker build . -t my_plugin
docker build . -t my_plugin --build-arg https_proxy=http://your_proxy:8080

Next, run the label_plugin utility to label the build plugin container:

label_plugin my_plugin

This utility will briefly start your plugin using Docker, and requests the PluginInfo from the plugin. The information from the PluginInfo will be added as labels to the plugin image. The result of label_plugin is a plugin image that can be published to a docker/OCI image registry.

build_plugin

The build_plugin extends label_plugin by also taking care of the docker build command. Use this as an one-liner to both build and label your plugin image.

To build your plugin container image you can use the following command:

build_plugin DOCKER_FILE_DIRECTORY [DOCKER_IMAGE_NAME] [DOCKER_ARGS]

For example:

build_plugin .

and to pass proxy configurations to Docker:

build_plugin . --build-arg http_proxy="$http_proxy" --build-arg https_proxy="$https_proxy"

This will generate a plugin image:

  • The extraction plugin is added to your local image registry (docker images),

  • Note that the variables $http_proxy and $https_proxy are put in quotes, this is needed in case they contain spaces,

  • The image is tagged with two tags: latest, and your plugin version.

Arguments:

  • DOCKER_FILE_DIRECTORY: Path to the directory containing the Dockerfile of the plugin.

  • (Optional) DOCKER\_IMAGE\_NAME: Name of the docker image without tag. Note that docker image names cannot start with a period or dash. If it starts with a dash, it will be interpreted as an additional docker argument (see DOCKER_ARGS). If no name is given the name defaults to extraction-plugin/PLUGINID, e.g. extraction-plugin/nfi.nl/extract/chat/whatsapp.

  • (Optional) DOCKER\_ARGS: Additional arguments for the docker command, which can be as many arguments as you like.