hansken_extraction_plugin.api.extraction_trace

This module contains the different Trace APIs.

Note that there are a couple of different traces:
  • The ExtractionTrace and MetaExtractionTrace, which are offered to the process function.

  • ExtractionTraceBuilder, which is a trace that can be built; it does not exist in Hansken yet, but is added after building.

  • SearchTrace, which represents an immutable trace which is returned after searching for traces.

Classes

ExtractionTrace()

Trace offered to be processed.

ExtractionTraceBuilder()

ExtractionTrace that can be built.

MetaExtractionTrace()

MetaExtractionTraces contain only metadata.

SearchTrace()

SearchTraces represent traces returned when searching for traces.

Trace()

All trace classes should be able to return values.

class ExtractionTraceBuilder

Bases: ABC

ExtractionTrace that can be built.

Represents child traces.

abstract update(key_or_updates: Mapping | str | None = None, value: Any | None = None, data: Mapping[str, bytes] | None = None) → ExtractionTraceBuilder

Update or add metadata properties for this ExtractionTraceBuilder.

Can be used to update the name of the Trace represented by this builder, if not already set.

Parameters:
  • key_or_updates – either a str (the metadata property to be updated) or a mapping supplying both keys and values to be updated

  • value – the value to update metadata property key to (used only when key_or_updates is a str; an exception is raised if key_or_updates is a mapping)

  • data – a dict mapping data type / stream name to bytes to be added to the trace

Returns:

this ExtractionTraceBuilder
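The two calling conventions of update (a single key plus value, or a mapping of several keys) can be illustrated with a small in-memory stand-in. FakeTraceBuilder below is hypothetical and not part of hansken_extraction_plugin; it only mirrors the documented signature so the calls run without a Hansken backend. The property keys other than name are placeholders, and raising ValueError for a mapping combined with value is an assumption (the documentation only says an exception is raised).

```python
from typing import Any, Dict, Mapping, Optional, Union


class FakeTraceBuilder:
    """Hypothetical stand-in for ExtractionTraceBuilder; not part of the library."""

    def __init__(self) -> None:
        self.properties: Dict[str, Any] = {}
        self.data: Dict[str, bytes] = {}

    def update(self,
               key_or_updates: Union[Mapping, str, None] = None,
               value: Optional[Any] = None,
               data: Optional[Mapping[str, bytes]] = None) -> "FakeTraceBuilder":
        if isinstance(key_or_updates, str):
            # single key: value holds the new property value
            self.properties[key_or_updates] = value
        elif key_or_updates is not None:
            # a mapping already carries the values, so an extra value is rejected
            # (ValueError is an assumption; the docs only say an exception is raised)
            if value is not None:
                raise ValueError("value must not be combined with a mapping")
            self.properties.update(key_or_updates)
        if data is not None:
            # stream name -> bytes to attach to the trace
            self.data.update(data)
        return self  # returning the builder is what allows chained calls


builder = (FakeTraceBuilder()
           .update("name", "notes.txt")                           # one property
           .update({"example.kind": "note", "example.lines": 3})  # several at once
           .update(data={"raw": b"first line\n"}))                # a data stream
print(builder.properties)
```

Because update returns the builder itself, the three calls can be written as one chain.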

abstract add_tracelet(tracelet: Tracelet | str, value: Mapping[str, Any] | None = None) → ExtractionTraceBuilder

Add a Tracelet to this ExtractionTraceBuilder.

Parameters:
  • tracelet – the Tracelet or tracelet type (supplied as a str) to add

  • value – the tracelet properties to add (only applicable when tracelet is a str)

Returns:

this ExtractionTraceBuilder
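A sketch of the str form of add_tracelet, where the tracelet type is passed as a string and its properties as a mapping. FakeTraceBuilder, the tracelet type "example" and its property names are hypothetical placeholders; a real plugin would use a tracelet type from the Hansken trace model, and the fake does not model passing a Tracelet object.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple


class FakeTraceBuilder:
    """Hypothetical stand-in that records add_tracelet calls."""

    def __init__(self) -> None:
        self.tracelets: List[Tuple[str, Dict[str, Any]]] = []

    def add_tracelet(self, tracelet: str,
                     value: Optional[Mapping[str, Any]] = None) -> "FakeTraceBuilder":
        # only the str form is modelled: tracelet is the type, value its properties
        self.tracelets.append((tracelet, dict(value or {})))
        return self


builder = FakeTraceBuilder()
for word in ("alpha", "beta"):
    # each call adds one tracelet of the (hypothetical) type "example"
    builder.add_tracelet("example", {"word": word, "length": len(word)})
print(builder.tracelets)
```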

abstract add_transformation(data_type: str, transformation: Transformation) → ExtractionTraceBuilder

Update or add transformations for this ExtractionTraceBuilder.

Parameters:
  • data_type – data type of the Transformation

  • transformation – the Transformation to add

Returns:

this ExtractionTraceBuilder

abstract child_builder(name: str | None = None) → ExtractionTraceBuilder

Create a new ExtractionTraceBuilder to build a child trace of the trace represented by this builder.

Note

Traces should be created and built in depth-first order, parent before child (pre-order).

Returns:

an ExtractionTraceBuilder set up to save a new trace as a child of the trace represented by this builder
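The pre-order rule can be made concrete with a recursive helper that builds each trace before descending into its children. FakeBuilder, build_tree and the nested-dict tree shape are hypothetical; the fake only records the order in which build() is called, which is what the note constrains. The root builder stands in for the parent trace.

```python
from typing import Dict, List, Optional


class FakeBuilder:
    """Hypothetical builder that logs the order in which traces are built."""

    def __init__(self, name: Optional[str], log: List[str]) -> None:
        self.name = name
        self.log = log

    def child_builder(self, name: Optional[str] = None) -> "FakeBuilder":
        return FakeBuilder(name, self.log)

    def build(self) -> str:
        self.log.append(self.name or "<unnamed>")
        return f"id-{len(self.log)}"  # placeholder id, not the real id format


def build_tree(builder: FakeBuilder, tree: Dict[str, dict]) -> None:
    """Build children depth first: each child is built before its own children."""
    for name, subtree in tree.items():
        child = builder.child_builder(name)
        child.build()               # parent first ...
        build_tree(child, subtree)  # ... then its descendants


log: List[str] = []
root = FakeBuilder("archive", log)
root.build()
build_tree(root, {"dir": {"a.txt": {}, "b.txt": {}}, "c.txt": {}})
print(log)
```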

add_data(stream: str, data: bytes) → ExtractionTraceBuilder

Add data to this trace as a named stream.

Parameters:
  • stream – name of the data stream to be added

  • data – data to be attached

Returns:

this ExtractionTraceBuilder

abstract open(data_type: str | None = None, offset: int = 0, size: int | None = None, mode: Literal['rb', 'wb', 'w', 'wt'] = 'rb', encoding='utf-8', buffer_size: int | None = None) → BufferedReader | BufferedWriter | TextIOBase

Open a data stream to read data from or write data to the trace being built.

Parameters:
  • data_type – the data type of the datastream, ‘raw’ by default

  • offset – byte offset to start the stream on when reading

  • size – the number of bytes to make available when reading

  • mode – ‘rb’ for reading, ‘wb’ for writing bytes, ‘w’ or ‘wt’ for writing text

  • encoding – encoding for writing text, used to convert str values to bytes, only valid for modes ‘w’ and ‘wt’

  • buffer_size – buffer size for reading (cache read back/ahead) or writing (cache for flush) data

Returns:

a file-like object to read bytes from, or write bytes or text to, the named stream
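Writing through open follows the usual Python file-object pattern. The sketch assumes the returned object supports the context-manager protocol, as the io classes it is typed as do. FakeTraceBuilder and _Sink are hypothetical stand-ins backed by io.BytesIO; they model only the write modes ‘wb’ and ‘w’/‘wt’ and omit offset, size and buffer_size.

```python
import io
from typing import Dict, Optional


class _Sink(io.BytesIO):
    """Hypothetical buffer that stores its bytes in the owning dict when closed."""

    def __init__(self, store: Dict[str, bytes], data_type: str) -> None:
        super().__init__()
        self._store = store
        self._data_type = data_type

    def close(self) -> None:
        if not self.closed:
            self._store[self._data_type] = self.getvalue()
        super().close()


class FakeTraceBuilder:
    """Hypothetical builder whose open() supports the documented write modes."""

    def __init__(self) -> None:
        self.streams: Dict[str, bytes] = {}

    def open(self, data_type: Optional[str] = None, mode: str = "rb",
             encoding: str = "utf-8") -> io.IOBase:
        sink = _Sink(self.streams, data_type or "raw")  # 'raw' is the documented default
        if mode == "wb":
            return sink
        if mode in ("w", "wt"):
            # text modes encode str values to bytes with the given encoding
            return io.TextIOWrapper(sink, encoding=encoding)
        raise NotImplementedError(f"mode {mode!r} is not modelled by this sketch")


builder = FakeTraceBuilder()
with builder.open(mode="wb") as raw:
    raw.write(b"\x00\x01binary payload")
with builder.open(data_type="text", mode="w", encoding="utf-8") as text:
    text.write("café")
print(builder.streams)
```

Closing the text wrapper flushes it and closes the underlying buffer, which is the point at which this fake records the stream.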

abstract build() → str

Save the trace being built by this builder to Hansken.

Note

Building more than once will result in an error being raised.

Returns:

the new trace’s id
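Because build may be called only once, plugin code typically calls it as the last step for a builder and keeps the returned id. The stand-in below is hypothetical: it uses RuntimeError for the second call and a made-up id format, neither of which is specified by the documentation above.

```python
import itertools
from typing import Iterator, Optional


class FakeTraceBuilder:
    """Hypothetical builder that, like the documented one, refuses a second build."""

    _ids: Iterator[int] = itertools.count(1)

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name
        self._built = False

    def build(self) -> str:
        if self._built:
            # RuntimeError is an assumption; the real exception type may differ
            raise RuntimeError("trace has already been built")
        self._built = True
        return f"trace-{next(self._ids)}"  # placeholder id format


builder = FakeTraceBuilder("child")
trace_id = builder.build()
try:
    builder.build()
except RuntimeError as error:
    print(f"second build rejected: {error}")
```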

class Trace

Bases: ABC

All trace classes should be able to return values.

abstract get(key: str, default: Any | None = None) → Any

Return a metadata property of this trace.

Parameters:
  • key – the metadata property to be retrieved

  • default – value returned if property is not set

Returns:

the value of the requested metadata property
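get behaves much like dict.get: a property that is not set yields default instead of an error. FakeTrace is a hypothetical dict-backed stand-in, and the property name example.size is a placeholder.

```python
from typing import Any, Dict, Optional


class FakeTrace:
    """Hypothetical read-only trace backed by a plain dict of properties."""

    def __init__(self, properties: Dict[str, Any]) -> None:
        self._properties = dict(properties)

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # unset properties fall back to default rather than raising
        return self._properties.get(key, default)


trace = FakeTrace({"name": "photo.jpg"})
name = trace.get("name")
size = trace.get("example.size", 0)  # not set, so the default is returned
print(name, size)
```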

class SearchTrace

Bases: Trace

SearchTraces represent traces returned when searching for traces.

abstract open(stream: str = 'raw', offset: int = 0, size: int | None = None, buffer_size: int | None = None) → BufferedReader

Open a data stream of the data that is being processed.

Parameters:
  • stream – data stream of the trace to open; defaults to ‘raw’. Other examples are ‘html’, ‘text’, etc.

  • offset – byte offset to start the stream on

  • size – the number of bytes to make available

  • buffer_size – buffer size for reading data

Returns:

a file-like object to read bytes from the named stream
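Reading only part of a stream combines offset and size. FakeSearchTrace is a hypothetical stand-in that serves in-memory bytes through io.BufferedReader; the stream content is a placeholder.

```python
import io
from typing import Dict, Optional


class FakeSearchTrace:
    """Hypothetical immutable trace exposing named byte streams."""

    def __init__(self, streams: Dict[str, bytes]) -> None:
        self._streams = streams

    def open(self, stream: str = "raw", offset: int = 0,
             size: Optional[int] = None,
             buffer_size: Optional[int] = None) -> io.BufferedReader:
        data = self._streams[stream]
        end = len(data) if size is None else offset + size
        # only the bytes in [offset, offset + size) are made available
        return io.BufferedReader(io.BytesIO(data[offset:end]),
                                 buffer_size or io.DEFAULT_BUFFER_SIZE)


trace = FakeSearchTrace({"raw": b"HEADERbody-of-the-file"})
with trace.open(offset=0, size=6) as header:
    magic = header.read()
with trace.open("raw", offset=6) as rest:
    body = rest.read()
print(magic, body)
```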

class MetaExtractionTrace

Bases: Trace

MetaExtractionTraces contain only metadata.

This class represents traces without a data stream, as processed by an extraction plugin during extraction.

abstract update(key_or_updates: Mapping | str | None = None, value: Any | None = None, data: Mapping[str, bytes] | None = None) → None

Update or add metadata properties for this trace.

Parameters:
  • key_or_updates – either a str (the metadata property to be updated) or a mapping supplying both keys and values to be updated

  • value – the value to update metadata property key to (used only when key_or_updates is a str; an exception is raised if key_or_updates is a mapping)

  • data – a dict mapping data type / stream name to bytes to be added to the trace
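Unlike the builder variant, MetaExtractionTrace.update is documented to return None, so updates on a trace are written as separate statements rather than a chain. The sketch derives a property from existing metadata only; FakeMetaTrace, tag_extension and the example.* property keys are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional, Union


class FakeMetaTrace:
    """Hypothetical metadata-only trace; update() returns None like the documented one."""

    def __init__(self, properties: Dict[str, Any]) -> None:
        self.properties = dict(properties)

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self.properties.get(key, default)

    def update(self, key_or_updates: Union[Mapping, str, None] = None,
               value: Optional[Any] = None,
               data: Optional[Mapping[str, bytes]] = None) -> None:
        # data is accepted for signature parity but not modelled here
        if isinstance(key_or_updates, str):
            self.properties[key_or_updates] = value
        elif key_or_updates is not None:
            self.properties.update(key_or_updates)


def tag_extension(trace: FakeMetaTrace) -> None:
    """Derive a property from existing metadata; no data stream is read."""
    name = trace.get("name", "")
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else None
    # separate statements: update() returns None, so it cannot be chained
    trace.update("example.extension", extension)
    trace.update({"example.checked": True})


trace = FakeMetaTrace({"name": "Report.PDF"})
tag_extension(trace)
print(trace.properties)
```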

abstract add_tracelet(tracelet: Tracelet | str, value: Mapping[str, Any] | None = None) → None

Add a Tracelet to this MetaExtractionTrace.

Parameters:
  • tracelet – the Tracelet or tracelet type to add

  • value – the tracelet properties to add (only applicable when tracelet is a tracelet type)

abstract add_transformation(data_type: str, transformation: Transformation) → None

Update or add transformations for this MetaExtractionTrace.

Parameters:
  • data_type – data type of the Transformation

  • transformation – the Transformation to add

abstract child_builder(name: str | None = None) → ExtractionTraceBuilder

Create an ExtractionTraceBuilder to build a trace to be saved as a child of this trace.

A new trace will only be added to the index once explicitly saved (e.g. through ExtractionTraceBuilder.build).

Note

Traces should be created and built in depth-first order, parent before child (pre-order).

Parameters:

name – the name for the trace being built

Returns:

an ExtractionTraceBuilder set up to create a child trace of this MetaExtractionTrace
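A common shape for creating children from a trace: ask the trace for a child builder, set properties on it, and build it. Nothing is indexed until build() is called. FakeMetaTrace, FakeChildBuilder, split_lines and the example.text key are hypothetical, and the fake update takes only a single key and value for brevity; only the method names follow the signatures above.

```python
from typing import Any, Dict, List, Optional


class FakeChildBuilder:
    """Hypothetical builder; update() returns self so calls can be chained."""

    def __init__(self, name: Optional[str], saved: List[Dict[str, Any]]) -> None:
        self._properties: Dict[str, Any] = {"name": name}
        self._saved = saved

    def update(self, key: str, value: Any) -> "FakeChildBuilder":
        self._properties[key] = value
        return self

    def build(self) -> str:
        # nothing is recorded until build() is called
        self._saved.append(self._properties)
        return f"child-{len(self._saved)}"  # placeholder id format


class FakeMetaTrace:
    """Hypothetical trace that hands out child builders."""

    def __init__(self) -> None:
        self.saved_children: List[Dict[str, Any]] = []

    def child_builder(self, name: Optional[str] = None) -> FakeChildBuilder:
        return FakeChildBuilder(name, self.saved_children)


def split_lines(trace: FakeMetaTrace, lines: List[str]) -> List[str]:
    """Create one child trace per line and return the new trace ids."""
    return [
        trace.child_builder(f"line {number}")
        .update("example.text", line)
        .build()
        for number, line in enumerate(lines, start=1)
    ]


trace = FakeMetaTrace()
ids = split_lines(trace, ["first", "second"])
print(ids, trace.saved_children)
```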

class ExtractionTrace

Bases: MetaExtractionTrace

Trace offered to be processed.

abstract open(data_type: str | None = None, offset: int = 0, size: int | None = None, mode: Literal['rb', 'wb', 'w', 'wt'] = 'rb', encoding='utf-8', buffer_size: int | None = None) → BufferedReader | BufferedWriter | TextIOBase

Open a data stream to read or write data from or to the ExtractionTrace.

Parameters:
  • data_type – the data type of the datastream, ‘raw’ by default

  • offset – byte offset to start the stream on when reading

  • size – the number of bytes to make available when reading

  • mode – ‘rb’ for reading, ‘wb’ for writing bytes, ‘w’ or ‘wt’ for writing text

  • encoding – encoding for writing text, used to convert str values to bytes, only valid for modes ‘w’ and ‘wt’

  • buffer_size – buffer size for reading (cache read back/ahead) or writing (cache for flush) data

Returns:

a file-like object to read bytes from, or write bytes or text to, the named stream
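Reading a potentially large stream in fixed-size chunks keeps memory use bounded. The sketch hashes the ‘raw’ stream with hashlib. FakeExtractionTrace is a hypothetical stand-in serving in-memory bytes and models only mode ‘rb’; the 64 KiB chunk size is an arbitrary choice.

```python
import hashlib
import io
from typing import Dict, Optional


class FakeExtractionTrace:
    """Hypothetical trace whose open() serves read-only in-memory streams."""

    def __init__(self, streams: Dict[str, bytes]) -> None:
        self._streams = streams

    def open(self, data_type: Optional[str] = None, offset: int = 0,
             size: Optional[int] = None, mode: str = "rb",
             encoding: str = "utf-8",
             buffer_size: Optional[int] = None) -> io.BufferedReader:
        if mode != "rb":
            raise NotImplementedError("this sketch only models reading")
        data = self._streams[data_type or "raw"]  # 'raw' is the documented default
        end = len(data) if size is None else offset + size
        return io.BufferedReader(io.BytesIO(data[offset:end]),
                                 buffer_size or io.DEFAULT_BUFFER_SIZE)


def sha256_of(trace: FakeExtractionTrace, chunk_size: int = 64 * 1024) -> str:
    """Hash the raw stream chunk by chunk instead of reading it all at once."""
    digest = hashlib.sha256()
    with trace.open(mode="rb") as stream:
        while chunk := stream.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


trace = FakeExtractionTrace({"raw": b"x" * 200_000})
print(sha256_of(trace))
```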