textflint.adapter
TextFlint Adapter Class
textflint.adapter.auto_config(task='UT', config=None)
Check the config input or create a Config automatically.
Parameters:
  task (str) – task name.
  config (str|dict|textflint.config.Config) – config to control the generation procedure.
Returns:
  a textflint.config.Config instance.
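The dispatch this helper performs can be sketched as follows. This is a minimal illustration, not textflint's actual implementation: the stub Config class stands in for textflint.config.Config, and the str branch (loading a config file from a path) is omitted.

```python
# Minimal sketch of the auto_config dispatch, NOT textflint's actual code.
# The stub Config below stands in for textflint.config.Config.
class Config:
    def __init__(self, task='UT', **kwargs):
        self.task = task
        for key, value in kwargs.items():
            setattr(self, key, value)

def auto_config(task='UT', config=None):
    """Check the config input or create a Config automatically."""
    if config is None:
        return Config(task=task)          # build a default config
    if isinstance(config, Config):
        return config                     # already a Config instance
    if isinstance(config, dict):
        return Config(**config)           # unpack a dict of options
    raise ValueError(f"Unsupported config type: {type(config)}")
```

In practice you would pass either nothing (to get task defaults), a dict of overrides, or an existing Config instance.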
textflint.adapter.auto_dataset(data_input=None, task='UT')
Create a Dataset instance and load the data input automatically.
Parameters:
  data_input (dict|list|str) – a json object, or a json/csv file path.
  task (str) – task name.
Returns:
  a textflint.Dataset instance.
textflint.adapter.auto_flintmodel(model, task)
Check the flint model type and whether it is compatible with the task.
Parameters:
  model (textflint.FlintModel|str) – a FlintModel instance, or the path of a python file which contains a FlintModel instance.
  task (str) – task name.
Returns:
  a textflint.FlintModel instance.
textflint.adapter.auto_generator(config_obj)
Automatically create a task generator that applies transformations, subpopulations and adversarial attacks.
Parameters:
  config_obj (textflint.Config) – a Config instance.
Returns:
  a textflint.Generator instance.
textflint.adapter.auto_report_generator()
Return a ReportGenerator instance.
Returns:
  a ReportGenerator instance.
class textflint.adapter.Config(task='UT', out_dir=None, max_trans=1, random_seed=1, fields=None, flint_model=None, trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Bases: object
Holds config params that control the generation and report procedure.

__init__(task='UT', out_dir=None, max_trans=1, random_seed=1, fields=None, flint_model=None, trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Parameters:
  task (str) – task name.
  out_dir (str) – output dir for saving generated samples; defaults to the current path.
  max_trans (int) – maximum number of transformed samples generated from one original sample per Transformation.
  random_seed (int) – random seed to reproduce generation.
  fields (str|list[str]) – fields on which new samples are generated.
  flint_model (str) – path of the python file containing the FlintModel instance named 'model'.
  trans_methods (list) – which transformations to apply to the dataset.
  trans_config (dict) – parameters for the initialization of the transformation instances.
  return_unk (bool) – whether to apply transformations which may influence the label of a sample.
  sub_methods (list) – which subpopulations to apply to the dataset.
  sub_config (dict) – parameters for the initialization of the subpopulation instances.
  attack_methods (str) – path of the python file containing the Attack instances named 'attacks'.
  validate_methods (str|list[str]) – which validate methods to use to calculate the confidence of generated samples.
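Since auto_config also accepts a plain dict, a typical set of these options can be written out directly. The values and method names below are illustrative only ('Ocr' and 'Case' are the transformation names used in the ReportGenerator example later in this page); check which transformations your task actually supports.

```python
# Illustrative config dict mirroring the Config parameters documented above.
# The transformation names are examples, not a verified or exhaustive list.
config_options = {
    'task': 'SA',                      # sentiment analysis
    'out_dir': './out',                # where generated samples are saved
    'max_trans': 2,                    # at most 2 new samples per original per Transformation
    'random_seed': 1,                  # reproducible generation
    'fields': 'x',                     # transform the text field
    'trans_methods': ['Ocr', 'Case'],  # assumed transformation names
    'return_unk': True,                # keep label-changing transformations
}
```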
get_generate_methods(methods, task_to_methods, allow_pipeline=False)
Validate transformation or subpopulation methods.
Watch out! Some UT transformations/subpopulations may not be compatible with your task; please choose your methods carefully.
Parameters:
  methods (list) – the transformations or subpopulations to apply to the dataset. If not provided, return the default generation methods.
  task_to_methods (dict) – maps task names to their allowed methods.
  allow_pipeline (bool) – whether pipeline input is allowed.
Returns:
  a list of transformations/subpopulations.
class textflint.adapter.Dataset(task='UT')
Bases: object
Any iterable of (label, text_input) pairs qualifies as a Dataset.

load(dataset)
Load a json object and prepare it as a Dataset.
Two input formats are supported. Example:

{'x': ['The robustness of deep neural networks has received much attention recently',
       'We focus on certified robustness of smoothed classifiers in this work',
       ...,
       'our approach exceeds the state-of-the-art.'],
 'y': ['neutral', 'positive', ..., 'positive']}

[{'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neutral'},
 {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'},
 ...,
 {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'}]

Parameters:
  dataset (list|dict) – the input data, in either format above.
load_json(json_path, encoding='utf-8', fields=None, dropna=True)
Load a json file in which each line is a json string.
Parameters:
  json_path – file path.
  encoding – the file's encoding; default: utf-8.
  fields – the json object fields that are needed; if None, all fields are kept. Default: None.
  dropna – whether to ignore and drop invalid data; if False, raise ValueError when reading invalid data. Default: True.
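The behavior described above (one json object per line, optional field filtering, dropna) can be sketched with the standard library. This is an illustrative re-implementation, not textflint's code:

```python
import json

def load_json_lines(json_path, encoding='utf-8', fields=None, dropna=True):
    """Read a file with one json object per line; keep only `fields` if given."""
    samples = []
    with open(json_path, encoding=encoding) as f:
        for line_no, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                if dropna:
                    continue              # silently drop invalid lines
                raise ValueError(f"invalid json on line {line_no}")
            if fields is not None:
                obj = {k: obj[k] for k in fields}
            samples.append(obj)
    return samples
```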
load_csv(csv_path, encoding='utf-8', headers=None, sep=',', dropna=True)
Load a csv file in which each line corresponds to one sample.
Parameters:
  csv_path – file path.
  encoding – the file's encoding; default: utf-8.
  headers – the file's headers; if None, use the file's first line as headers. Default: None.
  sep – the separator for each column. Default: ','.
  dropna – whether to ignore and drop invalid data; if False, raise ValueError when reading invalid data. Default: True.
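The same behavior can be sketched with the stdlib csv module; again, an illustrative re-implementation rather than textflint's code:

```python
import csv

def load_csv_file(csv_path, encoding='utf-8', headers=None, sep=',', dropna=True):
    """Read a csv file; if headers is None, use the file's first line as headers."""
    samples = []
    with open(csv_path, encoding=encoding, newline='') as f:
        reader = csv.reader(f, delimiter=sep)
        for row in reader:
            if headers is None:
                headers = row             # first line supplies the column names
                continue
            if len(row) != len(headers):
                if dropna:
                    continue              # drop rows with the wrong column count
                raise ValueError(f"invalid row: {row!r}")
            samples.append(dict(zip(headers, row)))
    return samples
```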
load_hugging_face(name, subset='train')
Load a dataset from HuggingFace datasets and prepare it as a Dataset.
Parameters:
  name – the dataset name.
  subset – the subset of the main dataset. Default: 'train'.
append(data_sample, sample_id=-1)
Load a single data sample and append it to the dataset.
Parameters:
  data_sample (dict|Sample) – the sample to append.
  sample_id (int) – useful to identify the sample; default -1.
Returns:
  True / False, indicating whether the append succeeded.
extend(data_samples)
Load multiple data samples and extend the dataset with them.
Parameters:
  data_samples (list|dict|Sample) – the samples to add.
-
static
norm_input
(data_samples)[source]¶ Convert various data input to list of dict. Example:
{'x': [ 'The robustness of deep neural networks has received much attention recently', 'We focus on certified robustness of smoothed classifiers in this work', ..., 'our approach exceeds the state-of-the-art.' ], 'y': [ 'neural', 'positive', ..., 'positive' ] } convert to [ {'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neural'}, {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'}, ..., {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'} ]
- Parameters
data_samples (list|dict|Sample) –
- Returns
Normalized data.
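The dict-of-lists to list-of-dicts conversion above can be sketched in a few lines. This is an illustrative version, not textflint's implementation, and it ignores the Sample input case:

```python
def norm_input(data_samples):
    """Convert dict-of-lists input to the normalized list-of-dicts form."""
    if isinstance(data_samples, list):
        return data_samples               # already normalized
    keys = list(data_samples)
    length = len(data_samples[keys[0]])
    # every column must have the same number of entries
    assert all(len(data_samples[k]) == length for k in keys), \
        "columns differ in length"
    return [{k: data_samples[k][i] for k in keys} for i in range(length)]
```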
save_csv(out_path, encoding='utf-8', headers=None, sep=',')
Save the dataset to a csv file.
Parameters:
  out_path – file path.
  encoding – the file's encoding; default: utf-8.
  headers – the file's headers; if None, derive the headers from the samples' fields. Default: None.
  sep – the separator for each column. Default: ','.
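A stdlib sketch of this save behavior, as the inverse of load_csv above (illustrative, not textflint's implementation; when headers is None this sketch derives them from the first sample):

```python
import csv

def save_csv_file(samples, out_path, encoding='utf-8', headers=None, sep=','):
    """Write a list of dict samples to csv, emitting a header line first."""
    if headers is None:
        headers = list(samples[0])        # derive headers from the first sample
    with open(out_path, 'w', encoding=encoding, newline='') as f:
        writer = csv.DictWriter(f, fieldnames=headers, delimiter=sep)
        writer.writeheader()
        writer.writerows(samples)
```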
class textflint.adapter.FlintModel(model, tokenizer, task='SA', batch_size=1)
Bases: abc.ABC
A model wrapper queries a model with a list of text inputs.
Classification-based models return a list of lists, where each sublist represents the model's scores for a given input.
Text-to-text models return a list of strings, where each string is the output – like a translation or summarization – for a given input.

__init__(model, tokenizer, task='SA', batch_size=1)
Parameters:
  model – any model object.
  tokenizer – supports tokenizing sentences and converting tokens to model input ids.
  task (str) – task name.
  batch_size (int) – the batch size used for evaluation.
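A toy wrapper in the spirit of this class can illustrate the contract for classification models (one list of scores per input) and the role of batch_size. This is a standalone sketch, not a real FlintModel subclass; the `predict` method name is assumed for illustration:

```python
# Illustrative wrapper in the spirit of FlintModel, NOT the real base class.
# For classification, the wrapped model returns one list of scores per input.
class ToyFlintModel:
    def __init__(self, model, tokenizer, task='SA', batch_size=1):
        self.model = model          # any callable mapping token batches to scores
        self.tokenizer = tokenizer  # callable mapping a sentence to tokens/ids
        self.task = task
        self.batch_size = batch_size

    def predict(self, texts):
        """Query the model batch by batch; return a list of score lists."""
        scores = []
        for start in range(0, len(texts), self.batch_size):
            batch = texts[start:start + self.batch_size]
            token_batch = [self.tokenizer(t) for t in batch]
            scores.extend(self.model(token_batch))
        return scores
```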
evaluate(data_samples, prefix='')
Parameters:
  data_samples (list[Sample]) – a list of Samples.
  prefix (str) – a name prefix to add to metrics.
Returns:
  a dict holding the metrics result.

get_grad(*inputs)
Get the gradient of the loss with respect to the input tokens.
Parameters:
  inputs (tuple) – a tuple of original texts.
class textflint.adapter.Generator(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Bases: abc.ABC
Transformation controller which applies multiple transformations to each data sample.

__init__(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Parameters:
  task (str) – the task your transformation data belongs to.
  max_trans (int) – maximum number of transformed samples generated from one original sample per Transformation.
  random_seed (int) – random seed to reproduce generation.
  fields (str|list) – which fields to apply transformations to. Multi-field transforms are only needed for some special tasks, e.g. SM and NLI.
  trans_methods (list) – a list of transformation names.
  trans_config (dict) – transformation class configs, useful to control the behavior of transformations.
  return_unk (bool) – some transformations may generate unknown labels, e.g. inserting a word into a sequence in the NER task. If set to False, these transformations are skipped.
  sub_methods (list) – a list of subpopulation names.
  sub_config (dict) – subpopulation class configs, useful to control the behavior of subpopulations.
  attack_methods (str) – path of the python file containing the Attack instances.
  validate_methods (list) – confidence calculation functions.
prepare(dataset)
Check the dataset.
Parameters:
  dataset (textflint.Dataset) – the input dataset.

generate(dataset, model=None)
Yield the generated samples for dataset.
Parameters:
  dataset (textflint.Dataset) – the input dataset.
  model (textflint.FlintModel) – the model to attack, if given.
Returns:
  yields (original samples, new samples, generating-function string).
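The contract of this generate loop can be sketched without textflint: for each transformation, collect the originals that produced output, cap the new samples at max_trans per original, and yield the triple. An illustrative sketch, not textflint's code:

```python
# Sketch of the generate loop's contract (illustrative, not textflint's code):
# for each transformation, yield (originals, transformed, transformation name).
def generate(dataset, transformations, max_trans=1):
    for trans_name, trans_fn in transformations.items():
        originals, new_samples = [], []
        for sample in dataset:
            transformed = trans_fn(sample)[:max_trans]  # cap per-original output
            if transformed:
                originals.append(sample)
                new_samples.extend(transformed)
        yield originals, new_samples, trans_name
```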
generate_by_transformations(dataset, **kwargs)
Generate samples by a list of transformation methods.
Parameters:
  dataset – the input dataset.
Returns:
  (original samples, new samples, generating-function string).
class textflint.adapter.ReportGenerator
Bases: object
Plots a robustness report: a radar figure, a sunburst figure, and a bar chart figure.
Example:

{
  "model_name": "BERT",
  "dataset_name": "medical data",
  "transformation": {
    "Case": {
      "ori_precision": 0.70, "trans_precision": 0.65,
      "ori_f1": 0.63, "trans_f1": 0.60, "size": 5000
    },
    "Ocr": {
      "ori_precision": 0.72, "trans_precision": 0.43,
      "ori_f1": 0.62, "trans_f1": 0.41, "size": 5000
    }
  },
  "subpopulation": {
    "LengthSubPopulation-0.0-0.1": {
      "trans_precision": 0.68, "trans_f1": 0.63, "size": 500
    }
  },
  "attack": {
    "Bert-Attack": {
      "ori_precision": 0.72, "trans_precision": 0.43,
      "ori_f1": 0.62, "trans_f1": 0.41, "size": 400
    }
  }
}
plot(evaluate_json)
Analyze the evaluation result and plot three reports in HTML.
Parameters:
  evaluate_json (dict) – the evaluation result of a specific model.
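The evaluate_json argument follows the schema shown in the class example above. Assembling it in Python looks like this; the metric numbers and method names are illustrative:

```python
# Illustrative evaluate_json following the schema documented above;
# numbers and method names are examples only.
evaluate_json = {
    "model_name": "BERT",
    "dataset_name": "medical data",
    "transformation": {
        "Case": {"ori_precision": 0.70, "trans_precision": 0.65,
                 "ori_f1": 0.63, "trans_f1": 0.60, "size": 5000},
    },
    "subpopulation": {
        "LengthSubPopulation-0.0-0.1": {"trans_precision": 0.68,
                                        "trans_f1": 0.63, "size": 500},
    },
    "attack": {
        "Bert-Attack": {"ori_precision": 0.72, "trans_precision": 0.43,
                        "ori_f1": 0.62, "trans_f1": 0.41, "size": 400},
    },
}
# ReportGenerator().plot(evaluate_json) would then render the three figures.
```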