textflint.input_layer.dataset.dataset¶
dataset: textflint dataset¶

class textflint.input_layer.dataset.dataset.Dataset(task='UT')[source]¶
Bases: object

Any iterable of (label, text_input) pairs qualifies as a Dataset.
load(dataset)[source]¶
Loads a json object and prepares it as a Dataset.

Two input formats are supported. Example:

{'x': ['The robustness of deep neural networks has received much attention recently',
       'We focus on certified robustness of smoothed classifiers in this work',
       ...,
       'our approach exceeds the state-of-the-art.'],
 'y': ['neutral', 'positive', ..., 'positive']}

[{'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neutral'},
 {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'},
 ...,
 {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'}]

Parameters
    dataset (list|dict) – data samples in either format above
Returns
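The two formats above encode the same samples. A minimal pure-Python sketch (not textflint code; the texts and labels are shortened from the docstring example) showing how the column-wise dict pairs up into the per-sample list:

```python
# Column-wise format: parallel 'x' and 'y' lists.
columns = {
    'x': ['The robustness of deep neural networks has received much attention recently',
          'We focus on certified robustness of smoothed classifiers in this work'],
    'y': ['neutral', 'positive'],
}

# Row-wise format: one {'x': ..., 'y': ...} dict per sample,
# obtained by zipping the parallel columns together.
rows = [{'x': x, 'y': y} for x, y in zip(columns['x'], columns['y'])]
```

Either `columns` or `rows` can then be handed to `load`, since both describe the same (text, label) pairs.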
load_json(json_path, encoding='utf-8', fields=None, dropna=True)[source]¶
Loads a json file; each line of the file is a json string.

Parameters
    json_path – file path
    encoding – file encoding, default: utf-8
    fields – the json fields to keep; if None, all fields are kept. default: None
    dropna – whether to ignore and drop invalid data; if False, raise ValueError when reading invalid data. default: True
Returns
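A sketch of the line-per-object reading contract described above, written in plain Python (this is an illustration of the documented behavior, not textflint's implementation):

```python
import json

def read_jsonl(path, encoding='utf-8', fields=None, dropna=True):
    """Read a file with one JSON object per line: keep only `fields`
    if given; on an invalid line, skip it (dropna=True) or raise
    ValueError (dropna=False)."""
    samples = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                if dropna:
                    continue
                raise ValueError(f'invalid json line: {line!r}')
            if fields is not None:
                # Restrict each object to the requested fields.
                obj = {k: obj[k] for k in fields}
            samples.append(obj)
    return samples
```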
load_csv(csv_path, encoding='utf-8', headers=None, sep=',', dropna=True)[source]¶
Loads a csv file; each line corresponds to one sample.

Parameters
    csv_path – file path
    encoding – file encoding, default: utf-8
    headers – the file's headers; if None, the file's first line is used as headers. default: None
    sep – separator for each column. default: ','
    dropna – whether to ignore and drop invalid data; if False, raise ValueError when reading invalid data. default: True
Returns
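The same one-line-per-sample contract can be sketched in plain Python (again, an illustration of the documented parameters, not textflint's own code):

```python
import csv

def read_csv_rows(path, encoding='utf-8', headers=None, sep=',', dropna=True):
    """Read a csv file, one sample per line. If headers is None, the
    first line supplies the column names; rows with a mismatched field
    count are dropped (dropna=True) or raise ValueError (dropna=False)."""
    samples = []
    with open(path, encoding=encoding, newline='') as f:
        reader = csv.reader(f, delimiter=sep)
        for row in reader:
            if headers is None:
                headers = row  # first line becomes the header row
                continue
            if len(row) != len(headers):
                if dropna:
                    continue
                raise ValueError(f'malformed row: {row!r}')
            samples.append(dict(zip(headers, row)))
    return samples
```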
load_hugging_face(name, subset='train')[source]¶
Loads a dataset from the HuggingFace datasets library and prepares it as a Dataset.

Parameters
    name – the dataset name
    subset – the subset (split) of the main dataset. default: 'train'
Returns
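A HuggingFace split behaves like a sequence of dict rows. The sketch below uses a stand-in list instead of a real download, and the `sentence`/`label` field names plus the label mapping are assumptions for illustration (field names vary per dataset); it only shows the reshaping into the {'x', 'y'} records that Dataset accepts:

```python
# Stand-in for rows yielded by a HuggingFace split (no download here);
# 'sentence'/'label' and the label names are assumed for illustration.
hf_rows = [
    {'sentence': 'a gripping, well-acted film', 'label': 1},
    {'sentence': 'tedious and overlong', 'label': 0},
]
label_names = ['negative', 'positive']  # assumed id -> string mapping

# Reshape into the {'x', 'y'} records the Dataset format expects.
records = [{'x': r['sentence'], 'y': label_names[r['label']]} for r in hf_rows]
```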
append(data_sample, sample_id=-1)[source]¶
Loads a single data sample and appends it to the dataset.

Parameters
    data_sample (dict|Sample) –
    sample_id (int) – useful to identify the sample, default -1
Returns
    True / False, indicating whether the append action was successful.
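The notable part of this contract is that append reports success with a bool rather than raising. A plain-Python sketch of that shape (the validation rule and the plain-list "dataset" are assumptions for illustration, not textflint's logic):

```python
def try_append(dataset, data_sample, sample_id=-1):
    """Append-with-a-bool contract: accept a dict carrying the expected
    'x' field, reject anything else, and return True/False instead of
    raising. The dict-with-'x' check is an assumed validation rule."""
    if not isinstance(data_sample, dict) or 'x' not in data_sample:
        return False
    dataset.append({**data_sample, 'sample_id': sample_id})
    return True

ds = []
ok = try_append(ds, {'x': 'great soundtrack', 'y': 'positive'}, sample_id=0)
bad = try_append(ds, 'not a dict')
```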
extend(data_samples)[source]¶
Loads multiple data samples and extends the dataset with them.

Parameters
    data_samples (list|dict|Sample) –
Returns
static norm_input(data_samples)[source]¶
Converts various data inputs to a list of dicts. Example:

{'x': ['The robustness of deep neural networks has received much attention recently',
       'We focus on certified robustness of smoothed classifiers in this work',
       ...,
       'our approach exceeds the state-of-the-art.'],
 'y': ['neutral', 'positive', ..., 'positive']}

converts to

[{'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neutral'},
 {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'},
 ...,
 {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'}]

Parameters
    data_samples (list|dict|Sample) –
Returns
    Normalized data.
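A minimal sketch of this normalization in plain Python, assuming three input shapes: a dict of parallel column lists, a single dict sample, or an iterable of samples (this mirrors the documented conversion, not textflint's implementation):

```python
def norm_samples(data_samples):
    """Normalize input to a list of per-sample dicts: a dict of
    parallel lists is unzipped column-wise; a single dict sample is
    wrapped in a list; any other iterable passes through as a list."""
    if isinstance(data_samples, dict) and all(
            isinstance(v, list) for v in data_samples.values()):
        keys = list(data_samples)
        # zip the parallel columns into one dict per sample
        return [dict(zip(keys, values))
                for values in zip(*(data_samples[k] for k in keys))]
    if isinstance(data_samples, dict):
        return [data_samples]
    return list(data_samples)
```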
save_csv(out_path, encoding='utf-8', headers=None, sep=',')[source]¶
Saves the dataset to a csv file.

Parameters
    out_path – file path
    encoding – file encoding, default: utf-8
    headers – the headers to write as the file's first line. default: None
    sep – separator for each column. default: ','
Returns
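A sketch of the corresponding write path in plain Python (assuming, when headers is None, that the column names come from the first sample; this is an illustration of the documented parameters, not textflint's code):

```python
import csv

def write_csv(samples, out_path, encoding='utf-8', headers=None, sep=','):
    """Write one csv row per sample, with a header line first. If
    headers is None, take the column names from the first sample
    (an assumed default for this sketch)."""
    if headers is None:
        headers = list(samples[0])
    with open(out_path, 'w', encoding=encoding, newline='') as f:
        writer = csv.writer(f, delimiter=sep)
        writer.writerow(headers)
        for sample in samples:
            # Missing fields are written as empty strings.
            writer.writerow(sample.get(h, '') for h in headers)
```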