textflint.input_layer.dataset.dataset

dataset: textflint dataset

class textflint.input_layer.dataset.dataset.Dataset(task='UT')

Bases: object

Any iterable of (label, text_input) pairs qualifies as a Dataset.

__init__(task='UT')
Parameters

task (str) – the task name, which determines the format of each data sample. default: 'UT'

free()

Remove all samples from the dataset.

dump()

Return the dataset in JSON object format.

load(dataset)

Load a JSON object and prepare it as a Dataset.

Two input formats are supported, for example:

  1. A dict that maps each field to a list of values:

    {'x': ['The robustness of deep neural networks has received much attention recently',
           'We focus on certified robustness of smoothed classifiers in this work',
           ...,
           'our approach exceeds the state-of-the-art.'],
     'y': ['neural', 'positive', ..., 'positive']}

  2. A list of dicts, one per sample:

    [{'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neural'},
     {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'},
     ...,
     {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'}]

Parameters

dataset (list|dict) – the data to load, in either of the formats above.

load_json(json_path, encoding='utf-8', fields=None, dropna=True)

Load a JSON file in which each line is a separate JSON string.

Parameters
  • json_path – file path

  • encoding – file’s encoding, default: utf-8

  • fields – the JSON object fields to load; if None, all fields are loaded. default: None

  • dropna – whether to ignore and drop invalid data; if False, raise ValueError when invalid data is read. default: True
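The parsing behavior described above can be sketched with the standard library's json module. This is a minimal sketch, not textflint's implementation: `read_json_lines` is a hypothetical helper that returns plain dicts instead of a Dataset, and it assumes that "invalid data" means a line that is not valid JSON, is not a JSON object, or lacks one of the requested fields.

```python
import json


def read_json_lines(json_path, encoding="utf-8", fields=None, dropna=True):
    """Parse a file holding one JSON object per line into a list of dicts.

    Hypothetical helper. Assumption: a line is "invalid" if it is not valid
    JSON, is not a JSON object, or lacks one of the requested fields.
    """
    samples = []
    with open(json_path, encoding=encoding) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                obj = json.loads(line)  # JSONDecodeError is a ValueError
                if not isinstance(obj, dict):
                    raise ValueError("line is not a JSON object")
                if fields is not None:
                    # Keep only the requested fields; a missing one raises KeyError.
                    obj = {name: obj[name] for name in fields}
            except (ValueError, KeyError) as err:
                if dropna:
                    continue  # drop the invalid line and keep reading
                raise ValueError(f"invalid data on line {line_no}: {err!r}") from err
            samples.append(obj)
    return samples
```

With dropna=True a malformed line is skipped silently; with dropna=False the first malformed line stops loading with a ValueError.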

load_csv(csv_path, encoding='utf-8', headers=None, sep=',', dropna=True)

Load a CSV file in which each line corresponds to one sample.

Parameters
  • csv_path – file path

  • encoding – file’s encoding, default: utf-8

  • headers – the column headers; if None, the file’s first line is used as the headers. default: None

  • sep – the column separator. default: ','

  • dropna – whether to ignore and drop invalid data; if False, raise ValueError when invalid data is read. default: True
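The headers, sep, and dropna options map naturally onto the standard csv module. This is a minimal sketch, not textflint's code: `read_csv_samples` is a hypothetical helper returning plain dicts, and it assumes that "invalid data" means a row whose column count differs from the number of headers.

```python
import csv


def read_csv_samples(csv_path, encoding="utf-8", headers=None, sep=",", dropna=True):
    """Parse a delimited file into one dict per row.

    Hypothetical helper. Assumption: a row is "invalid" if its column count
    does not match the number of headers.
    """
    samples = []
    with open(csv_path, encoding=encoding, newline="") as f:
        reader = csv.reader(f, delimiter=sep)
        if headers is None:
            headers = next(reader, [])  # the file's first line supplies the headers
        for row in reader:
            if not row:
                continue  # ignore blank lines
            if len(row) != len(headers):
                if dropna:
                    continue  # drop the malformed row
                raise ValueError(f"invalid data on line {reader.line_num}: {row!r}")
            samples.append(dict(zip(headers, row)))
    return samples
```

Note that passing headers explicitly makes the file's first line an ordinary data row.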

load_hugging_face(name, subset='train')

Load a dataset from Hugging Face Datasets and prepare it as a Dataset.

Parameters
  • name – the name of the dataset

  • subset – the subset of the dataset to load, such as 'train'. default: 'train'

append(data_sample, sample_id=-1)

Load a single data sample and append it to the dataset.

Parameters
  • data_sample (dict|Sample) – the sample to append

  • sample_id (int) – an identifier for the sample. default: -1

Returns

True if the sample was appended successfully, otherwise False.
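The True/False contract of append can be illustrated with a small stand-in class. `SampleStore` is hypothetical and is not textflint's Dataset: it assumes a sample is valid only when it is a dict carrying every required field, and that the default sample_id of -1 means "no id supplied".

```python
class SampleStore:
    """Hypothetical stand-in for Dataset, illustrating append()'s result.

    Assumption (not textflint's code): a sample is valid only if it is a dict
    containing every field in `required`.
    """

    def __init__(self, required=("x", "y")):
        self.required = tuple(required)
        self.samples = []  # accepted samples, in insertion order
        self.ids = []      # one identifier per accepted sample

    def append(self, data_sample, sample_id=-1):
        # Reject anything that is not a dict carrying all required fields.
        if not isinstance(data_sample, dict):
            return False
        if any(field not in data_sample for field in self.required):
            return False
        self.samples.append(dict(data_sample))
        # Assumption: -1 means "no id given", so fall back to the sample's index.
        self.ids.append(len(self.samples) - 1 if sample_id == -1 else sample_id)
        return True
```

Returning a boolean rather than raising lets a caller append many samples and count the rejects afterwards.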

extend(data_samples)

Load multiple data samples and append them to the dataset.

Parameters

data_samples (list|dict|Sample) – the samples to add

static norm_input(data_samples)

Convert various data inputs to a list of dicts. For example:

 {'x': [
          'The robustness of deep neural networks has received much attention recently',
          'We focus on certified robustness of smoothed classifiers in this work',
          ...,
          'our approach exceeds the state-of-the-art.'
      ],
  'y': [
          'neural',
          'positive',
          ...,
          'positive'
      ]
 }

is converted to

 [
     {'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neural'},
     {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'},
     ...,
     {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'}
 ]
Parameters

data_samples (list|dict|Sample) – the data to normalize

Returns

The normalized data, as a list of dicts.
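The column-to-row conversion can be sketched as a plain function. `norm_input_sketch` is a hypothetical name, not textflint's implementation: it handles the dict-of-lists format, a list of dicts, and a single sample dict, but not textflint Sample objects, and it rejects columns of unequal length rather than guessing how the library treats them.

```python
def norm_input_sketch(data_samples):
    """Hypothetical sketch of norm_input: always return a list of dicts.

    Accepts a dict of equal-length lists (column format), a single sample
    dict, or a list of sample dicts. Sample objects are not handled here.
    """
    if isinstance(data_samples, dict):
        values = list(data_samples.values())
        if values and all(isinstance(v, list) for v in values):
            lengths = {len(v) for v in values}
            if len(lengths) != 1:
                raise ValueError(f"columns have different lengths: {sorted(lengths)}")
            keys = list(data_samples)
            # Transpose the columns into one dict per row.
            return [dict(zip(keys, row)) for row in zip(*values)]
        return [dict(data_samples)]  # a single sample
    return [dict(sample) for sample in data_samples]
```

For example, `norm_input_sketch({'x': ['a', 'b'], 'y': ['neural', 'positive']})` yields `[{'x': 'a', 'y': 'neural'}, {'x': 'b', 'y': 'positive'}]`.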

save_csv(out_path, encoding='utf-8', headers=None, sep=',')

Save the dataset to a CSV file.

Parameters
  • out_path – file path

  • encoding – file’s encoding, default: utf-8

  • headers – the column headers to write as the file’s first line. default: None

  • sep – the column separator. default: ','
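Writing samples out as delimited rows can be sketched with csv.DictWriter. `write_csv_samples` is a hypothetical helper that takes a list of dicts rather than a Dataset, and it assumes that headers=None means the keys of the first sample become the header row.

```python
import csv


def write_csv_samples(samples, out_path, encoding="utf-8", headers=None, sep=","):
    """Hypothetical sketch of save_csv: a header row, then one row per sample.

    Assumption: when headers is None, the keys of the first sample are used.
    """
    if headers is None:
        headers = list(samples[0]) if samples else []
    with open(out_path, "w", encoding=encoding, newline="") as f:
        # extrasaction="ignore" silently drops keys not listed in headers.
        writer = csv.DictWriter(f, fieldnames=headers, delimiter=sep, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(samples)
```

DictWriter quotes any value that contains the separator, so text such as "b, c" survives a round trip through a comma-separated file.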

save_json(out_path, encoding='utf-8', fields=None)

Save the dataset to a JSON file with one JSON object per line.

Parameters
  • out_path – file path

  • encoding – file’s encoding, default: utf-8

  • fields – the JSON object fields to save; if None, all fields are saved. default: None
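A one-object-per-line writer is short with the json module. `write_json_lines` is a hypothetical helper working on a list of dicts rather than a Dataset; the fields filter mirrors the parameter above, and ensure_ascii=False is a choice of this sketch, not something the documentation specifies.

```python
import json


def write_json_lines(samples, out_path, encoding="utf-8", fields=None):
    """Hypothetical sketch of save_json: one JSON object per line.

    Assumption: fields, when given, selects which keys of each sample to write.
    """
    with open(out_path, "w", encoding=encoding) as f:
        for sample in samples:
            if fields is not None:
                sample = {name: sample[name] for name in fields}
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```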
