lir.datasets.feature_data_csv module
- class lir.datasets.feature_data_csv.ExtraField(name: str, column_names: list[str], validate_cell: Callable[[str], Any])[source]
Bases: NamedTuple
Extra field for CSV parsing.
- column_names: list[str]
Alias for field number 1
- name: str
Alias for field number 0
- parse_row(row: dict[str, str]) list[Any][source]
Take the appropriate values from a dictionary and return them as a list.
- Parameters:
row (dict[str, str]) – CSV row dictionary to parse.
- Returns:
Parsed values extracted from the input row.
- Return type:
list[Any]
- validate_cell: Callable[[str], Any]
Alias for field number 2
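The following is a minimal sketch of how the three fields might work together. `ExtraFieldSketch` is a hypothetical stand-in, not the library class, and it assumes that `parse_row` applies `validate_cell` to each column listed in `column_names`, in order. The column names `w1` and `w2` are made up for illustration.

```python
from typing import Any, Callable, NamedTuple


class ExtraFieldSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented fields of ExtraField."""

    name: str
    column_names: list[str]
    validate_cell: Callable[[str], Any]

    def parse_row(self, row: dict[str, str]) -> list[Any]:
        # Assumption: each listed column is looked up in the row and
        # passed through validate_cell, in the order of column_names.
        return [self.validate_cell(row[column]) for column in self.column_names]


# Hypothetical extra field that reads two numeric columns as floats.
weight = ExtraFieldSketch(name="weight", column_names=["w1", "w2"], validate_cell=float)
row = {"source_id": "0", "w1": "1.5", "w2": "2", "note": "ignored"}
parsed = weight.parse_row(row)
```

Columns that are not listed in `column_names` are left untouched by this sketch, and a cell that fails validation (for example a non-numeric string passed to `float`) raises the validator's own exception.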
- class lir.datasets.feature_data_csv.FeatureDataCsvFileParser(file: PathLike, **kwargs: Any)[source]
Bases: FeatureDataCsvParser
Read CSV data from a file.
- Parameters:
file (PathLike) – Path to the input file.
**kwargs (Any) – Additional keyword arguments forwarded to the underlying FeatureDataCsvParser call.
- get_instances() FeatureData[source]
Retrieve FeatureData instances.
- Returns:
FeatureData object parsed from the source.
- Return type:
FeatureData
- class lir.datasets.feature_data_csv.FeatureDataCsvHttpParser(url: str, session: Session, **kwargs: Any)[source]
Bases: FeatureDataCsvParser
Read CSV data from a URL.
By default, this class uses requests-cache to cache retrieved data. The cache is persistent and located in the user cache folder, which is written to the log file.
- Parameters:
url (str) – URL of the remote resource to read.
session (requests.Session) – Session used for the HTTP request.
**kwargs (Any) – Additional keyword arguments forwarded to the underlying call.
- get_instances() FeatureData[source]
Retrieve FeatureData from the remote resource.
- Returns:
FeatureData object parsed from the source.
- Return type:
FeatureData
- class lir.datasets.feature_data_csv.FeatureDataCsvParser(source_id_column: str | list[str] | None = None, label_column: str | None = None, instance_id_column: str | None = None, role_assignment_column: str | None = None, fold_assignment_column: str | None = None, extra_fields: list[ExtraField] | None = None, ignore_columns: list[str] | None = None, head: int | None = None, message_prefix: str = '')[source]
Bases: DataProvider, ABC
Parse a CSV file into a FeatureData object.
This is an abstract base class with concrete implementations for different data sources:
- FeatureDataCsvFileParser for reading from a local file;
- FeatureDataCsvHttpParser for reading from a URL;
- FeatureDataCsvStreamParser for reading from a stream.
- Parameters:
source_id_column (str | list[str] | None) – Column name(s) containing source identifiers.
label_column (str | None) – Column name containing class labels.
instance_id_column (str | None) – Column name containing instance identifiers.
role_assignment_column (str | None) – Column name containing predefined train/test roles.
fold_assignment_column (str | None) – Column name containing predefined fold assignments.
extra_fields (list[ExtraField] | None) – Optional extra fields to parse from each row.
ignore_columns (list[str] | None) – Column names ignored when extracting features.
head (int | None) – Maximum number of rows to read from the source.
message_prefix (str) – Prefix added to parser log and error messages.
Examples
Assume a CSV file containing three features, source identifiers, and one irrelevant column:
    source_id,feature1,feature2,feature3,name_of_an_irrelevant_column
    0,1,10,1,sherlock
    0,1,11,1,holmes
    1,20,30,1,irene
    1,18,32,3,adler
    2,5,10,8,professor
    2,1,11,8,moriarty
This file can be parsed using the following YAML configuration:
    data:
      provider: parse_features_from_csv_file
      path: path/to/file.csv
      source_id_column: source_id
      ignore_columns:
        - name_of_an_irrelevant_column
A CSV file at a URL can be parsed in the same way:
    data:
      provider: parse_features_from_csv_url
      url: https://raw.githubusercontent.com/NetherlandsForensicInstitute/elemental_composition_glass/refs/heads/main/training.csv
      source_id_column: Item
      ignore_columns:
        - id
        - Piece
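To make the roles of `source_id_column` and `ignore_columns` concrete, the sketch below splits the example CSV by hand using only the standard library. It is an illustration under stated assumptions, not the parser's implementation: `split_columns` is a hypothetical helper, and it assumes every column that is neither the source identifier nor ignored is a numeric feature.

```python
import csv
import io

CSV_TEXT = """\
source_id,feature1,feature2,feature3,name_of_an_irrelevant_column
0,1,10,1,sherlock
0,1,11,1,holmes
1,20,30,1,irene
1,18,32,3,adler
2,5,10,8,professor
2,1,11,8,moriarty
"""


def split_columns(text, source_id_column, ignore_columns):
    """Return (feature_names, source_ids, features) for a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    # Assumption: every column that is not a source id and not ignored
    # is treated as a numeric feature.
    skipped = {source_id_column, *ignore_columns}
    feature_names = [c for c in reader.fieldnames if c not in skipped]
    source_ids, features = [], []
    for row in reader:
        source_ids.append(row[source_id_column])
        features.append([float(row[c]) for c in feature_names])
    return feature_names, source_ids, features


feature_names, source_ids, features = split_columns(
    CSV_TEXT,
    source_id_column="source_id",
    ignore_columns=["name_of_an_irrelevant_column"],
)
```

Without the `ignore_columns` entry, the sketch would try to convert names such as `sherlock` to a number and fail, which is why the irrelevant column is listed explicitly in the configuration.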
- source_id_columns: list[str]
- class lir.datasets.feature_data_csv.FeatureDataCsvStreamParser(fp: IO, **kwargs: Any)[source]
Bases:
FeatureDataCsvParserRead data from a streamed CSV.
- Parameters:
fp (IO) – Open file-like object to read from.
**kwargs (Any) – Additional keyword arguments forwarded to the underlying call.
- get_instances() FeatureData[source]
Retrieve FeatureData instances from CSV stream.
- Returns:
FeatureData object parsed from the source.
- Return type:
FeatureData
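As a rough illustration of reading from an open stream with a row limit, the sketch below uses the standard library only. `read_rows` is a hypothetical helper, and the assumption that `head` limits the number of data rows (excluding the header) is an interpretation of the parameter description, not confirmed library behaviour.

```python
import csv
import io
from itertools import islice
from typing import IO, Optional


def read_rows(fp: IO, head: Optional[int] = None) -> list[dict[str, str]]:
    """Read at most `head` data rows from an open text stream."""
    reader = csv.DictReader(fp)
    # islice with a stop of None yields every remaining row.
    return list(islice(reader, head))


# Any file-like object works here; StringIO stands in for an open file.
stream = io.StringIO("source_id,feature1\n0,1\n0,2\n1,20\n")
first_two = read_rows(stream, head=2)
```

Because the function only needs a file-like object, the same code works for an open file, an in-memory buffer, or a decoded network response.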