textflint.adapter

TextFlint Adapter Class

textflint.adapter.auto_config(task='UT', config=None)[source]

Check config input or create config automatically.

Parameters
  • task (str) – task name

  • config (str|dict|textflint.config.Config) – config to control generation procedure.

Returns

textflint.config.Config instance.
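
A usage sketch (the task name 'SA' and the out_dir value are illustrative assumptions):

    from textflint.adapter import auto_config

    # A plain dict is accepted and normalized into a Config instance.
    # 'SA' (sentiment analysis) and './out' are assumed values.
    config = auto_config(task='SA', config={'out_dir': './out'})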

textflint.adapter.auto_dataset(data_input=None, task='UT')[source]

Create Dataset instance and load data input automatically.

Parameters
  • data_input (dict|list|string) – json object or json/csv file.

  • task (str) – task name.

Returns

textflint.Dataset instance.
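
A usage sketch, assuming an 'SA' task whose samples follow the x/y format documented under Dataset.load below:

    from textflint.adapter import auto_dataset

    # Texts and labels are assumed values for illustration.
    dataset = auto_dataset(
        data_input={'x': ['great movie', 'terrible plot'],
                    'y': ['positive', 'negative']},
        task='SA',
    )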

textflint.adapter.auto_flintmodel(model, task)[source]

Check the flint model type and whether it is compatible with the task.

Parameters
  • model (textflint.FlintModel|str) – FlintModel instance, or the path to a python file which contains a FlintModel instance

  • task (str) – task name

Returns

textflint.FlintModel
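
A usage sketch; the file path is hypothetical, and the file is expected to define a FlintModel instance named 'model' (see the Config docs below):

    from textflint.adapter import auto_flintmodel

    # './my_model.py' is a hypothetical file defining a FlintModel
    # instance named 'model'.
    flint_model = auto_flintmodel('./my_model.py', task='SA')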

textflint.adapter.auto_generator(config_obj)[source]

Automatically create a task generator to apply transformations, subpopulations, and adversarial attacks.

Parameters

config_obj (textflint.Config) – Config instance.

Returns

textflint.Generator

textflint.adapter.auto_report_generator()[source]

Return a ReportGenerator instance.

Returns

ReportGenerator
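
Taken together, the auto_* helpers cover a typical workflow. A minimal sketch, with task name, data, and output directory as assumptions:

    from textflint.adapter import (auto_config, auto_dataset,
                                   auto_generator, auto_report_generator)

    config = auto_config(task='SA', config={'out_dir': './out'})
    dataset = auto_dataset({'x': ['great movie', 'terrible plot'],
                            'y': ['positive', 'negative']}, task='SA')
    generator = auto_generator(config)
    reporter = auto_report_generator()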

class textflint.adapter.Config(task='UT', out_dir=None, max_trans=1, random_seed=1, fields=None, flint_model=None, trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)[source]

Bases: object

Holds config params that control the generation and report procedures.

__init__(task='UT', out_dir=None, max_trans=1, random_seed=1, fields=None, flint_model=None, trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)[source]
Parameters
  • task (str) – task name

  • out_dir (string) – output directory for saving generated samples; defaults to the current path.

  • max_trans (int) – maximum number of transformed samples generated from one original sample per Transformation.

  • random_seed (int) – random number seed to reproduce generation.

  • fields (str|list[str]) – fields on which new samples are generated.

  • flint_model (str) – path to the python file containing the FlintModel instance named 'model'.

  • trans_methods (list) – indicate which transformations to apply to the dataset.

  • trans_config (dict) – parameters for the initialization of the transformation instances.

  • return_unk (bool) – whether to apply transformations which may influence the label of the sample.

  • sub_methods (list) – indicate which subpopulations to apply to the dataset.

  • sub_config (dict) – parameters for the initialization of the subpopulation instances.

  • attack_methods (str) – path to the python file containing the Attack instances, named 'attacks'.

  • validate_methods (str|list[str]) – indicate which validation methods to use to calculate the confidence of generated samples.
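
A construction sketch; the task name is assumed, 'Ocr' is a transformation named in the report example further down this page, and the trans_config contents are hypothetical:

    from textflint.adapter import Config

    config = Config(
        task='SA',                 # assumed task name
        out_dir='./out',
        max_trans=2,
        trans_methods=['Ocr'],     # 'Ocr' appears in the report example below
        trans_config={'Ocr': {}},  # per-transformation kwargs; contents hypothetical
    )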

check_config()[source]

Check common config params.

get_generate_methods(methods, task_to_methods, allow_pipeline=False)[source]

Validate transformation or subpopulation methods.

Watch out! Some UT transformations/subpopulations may not be compatible with your task; please choose your methods carefully.

Parameters
  • methods (list) – transformation or subpopulation methods to apply to the dataset. If not provided, return the default methods.

  • task_to_methods (dict) – map of allowed methods by task name.

  • allow_pipeline (bool) – whether to allow pipeline input

Returns

list of transformation/subpopulation.

classmethod from_dict(json_object)[source]

Constructs a Config from a Python dictionary of parameters.

classmethod from_json_file(json_file)[source]

Constructs a Config from a json file of parameters.

to_dict()[source]

Serializes this instance to a Python dictionary.

to_json_string()[source]

Serializes this instance to a JSON string.

to_json_file(json_file)[source]

Serializes this instance to a JSON file.
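
Continuing the Config sketch above, the serialization helpers round-trip (the file path is hypothetical):

    # Round-trip the Config above through dict and JSON-file forms.
    as_dict = config.to_dict()
    same_config = Config.from_dict(as_dict)

    config.to_json_file('./config.json')
    restored = Config.from_json_file('./config.json')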

class textflint.adapter.Dataset(task='UT')[source]

Bases: object

Any iterable of (label, text_input) pairs qualifies as a Dataset.

__init__(task='UT')[source]
Parameters

task (str) – indicate data sample format.

free()[source]

Fully clear dataset.

dump()[source]

Return dataset in json object format.

load(dataset)[source]

Loads a json object and prepares it as a Dataset.

Two input formats are supported. Example:

  1. {'x': ['The robustness of deep neural networks has received much attention recently',
            'We focus on certified robustness of smoothed classifiers in this work',
            ...,
            'our approach exceeds the state-of-the-art.'],
      'y': ['neural', 'positive', ..., 'positive']}

  2. [{'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neural'},
      {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'},
      ...,
      {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'}]

Parameters

dataset (list|dict) – json object in one of the formats above.

Returns
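
A usage sketch of the second format, with task name and labels assumed:

    from textflint.adapter import Dataset

    dataset = Dataset(task='SA')  # task name assumed
    dataset.load([{'x': 'great movie', 'y': 'positive'},
                  {'x': 'terrible plot', 'y': 'negative'}])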

load_json(json_path, encoding='utf-8', fields=None, dropna=True)[source]

Loads a json file; each line of the file is a json string.

Parameters
  • json_path – file path

  • encoding – file’s encoding, default: utf-8

  • fields – json object fields to load; if None, all fields are loaded. default: None

  • dropna – whether to ignore and drop invalid data; if False, raise ValueError when reading invalid data. default: True

Returns

load_csv(csv_path, encoding='utf-8', headers=None, sep=',', dropna=True)[source]

Loads a csv file; each line corresponds to one sample.

Parameters
  • csv_path – file path

  • encoding – file’s encoding, default: utf-8

  • headers – file's headers; if None, the file's first line is used as headers. default: None

  • sep – separator for each column. default: ','

  • dropna – whether to ignore and drop invalid data; if False, raise ValueError when reading invalid data. default: True

Returns
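
File-based loading, continuing the Dataset sketch above (both paths are hypothetical):

    # './train.json' holds one json object per line;
    # './train.csv' uses its first line as headers.
    dataset.load_json('./train.json')
    dataset.load_csv('./train.csv', sep=',')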

load_hugging_face(name, subset='train')[source]

Loads a dataset from HuggingFace datasets and prepares it as a Dataset.

Parameters
  • name – the dataset name

  • subset – the subset (split) of the main dataset. default: 'train'

Returns
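
For instance (the dataset name is illustrative; it must match the task's sample format):

    # 'imdb' is an illustrative HuggingFace dataset name.
    dataset.load_hugging_face('imdb', subset='train')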

append(data_sample, sample_id=-1)[source]

Load a single data sample and append it to the dataset.

Parameters
  • data_sample (dict|Sample) –

  • sample_id (int) – used to identify the sample. default: -1

Returns

True/False, indicating whether the append succeeded.

extend(data_samples)[source]

Load multiple data samples and extend the dataset with them.

Parameters

data_samples (list|dict|Sample) –

Returns
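
For example, continuing the sketch above (sample contents are assumed):

    dataset.append({'x': 'a fine film', 'y': 'positive'})
    dataset.extend([{'x': 'dull pacing', 'y': 'negative'},
                    {'x': 'superb acting', 'y': 'positive'}])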

static norm_input(data_samples)[source]

Convert various data inputs to a list of dicts. Example:

{'x': ['The robustness of deep neural networks has received much attention recently',
       'We focus on certified robustness of smoothed classifiers in this work',
       ...,
       'our approach exceeds the state-of-the-art.'],
 'y': ['neural',
       'positive',
       ...,
       'positive']}

convert to

[{'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neural'},
 {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'},
 ...,
 {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'}]
Parameters

data_samples (list|dict|Sample) –

Returns

Normalized data.

save_csv(out_path, encoding='utf-8', headers=None, sep=',')[source]

Save dataset to csv file.

Parameters
  • out_path – file path

  • encoding – file’s encoding, default: utf-8

  • headers – file's headers; if None, the file's first line is used as headers. default: None

  • sep – separator for each column. default: ‘,’

Returns

save_json(out_path, encoding='utf-8', fields=None)[source]

Save the dataset to a json file containing one json object per line.

Parameters
  • out_path – file path

  • encoding – file’s encoding, default: utf-8

  • fields – json object fields to save; if None, all fields are saved. default: None

Returns
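
Saving mirrors loading, with hypothetical output paths:

    dataset.save_json('./out/data.json')
    dataset.save_csv('./out/data.csv', sep=',')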

class textflint.adapter.FlintModel(model, tokenizer, task='SA', batch_size=1)[source]

Bases: abc.ABC

A model wrapper queries a model with a list of text inputs.

Classification-based models return a list of lists, where each sublist represents the model’s scores for a given input.

Text-to-text models return a list of strings, where each string is the output – like a translation or summarization – for a given input.

__init__(model, tokenizer, task='SA', batch_size=1)[source]
Parameters
  • model – any model object

  • tokenizer – supports tokenizing sentences and converting tokens to model input ids

  • task (str) – task name

  • batch_size (int) – batch size to apply evaluation

evaluate(data_samples, prefix='')[source]
Parameters
  • data_samples (list[Sample]) – list of Samples

  • prefix (str) – name prefix to add to metrics

Returns

dict object holding the metric results

get_grad(*inputs)[source]

Get gradient of loss with respect to input tokens.

Parameters

inputs (tuple) – tuple of original texts

get_model_grad(*inputs)[source]

Get gradient of loss with respect to input tokens.

Parameters

inputs (tuple) – tuple of original texts

unzip_samples(data_samples)[source]

Unzip sample to input texts and labels.

Parameters

data_samples (list) – sample list

Returns

(input texts, labels).

class textflint.adapter.Generator(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)[source]

Bases: abc.ABC

Transformation controller which applies multiple transformations to each data sample.

__init__(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)[source]
Parameters
  • task (str) – Indicate which task the transformation data belongs to.

  • max_trans (int) – Maximum number of transformed samples generated from one original sample per Transformation.

  • random_seed (int) – random number seed to reproduce generation.

  • fields (str|list) – Indicate which fields to apply transformations to. Multi-field transformation is only for some special tasks, like SM and NLI.

  • trans_methods (list) – list of transformation names.

  • trans_config (dict) – transformation class configs, useful to control the behavior of transformations.

  • return_unk (bool) – Some transformations may generate unk labels, e.g., inserting a word into a sequence in an NER task. If set to False, these transformations are skipped.

  • sub_methods (list) – list of subpopulation names.

  • sub_config (dict) – subpopulation class configs, useful to control the behavior of subpopulations.

  • attack_methods (str) – path to the python file containing the Attack instances.

  • validate_methods (list) – functions to calculate the confidence of generated samples.
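
In practice a task generator is obtained through auto_generator, since Generator is abstract. A sketch, with task and method names as assumptions:

    from textflint.adapter import auto_config, auto_generator

    # auto_generator builds the task-specific subclass from a Config.
    config = auto_config(task='SA', config={'trans_methods': ['Ocr']})
    generator = auto_generator(config)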

prepare(dataset)[source]

Check the dataset.

Parameters

dataset (textflint.Dataset) – the input dataset

generate(dataset, model=None)[source]

Returns a list of possible generated samples for the dataset.

Parameters
  • dataset (textflint.Dataset) – the input dataset

  • model (textflint.FlintModel) – the model to attack, if given

Returns

yield (original samples, new samples, generated function string).
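
Iterating a generator, per the yield signature above (`generator` and `dataset` as constructed in the sketches above):

    for ori_samples, trans_samples, trans_name in generator.generate(dataset):
        print(trans_name, len(ori_samples), len(trans_samples))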

generate_by_transformations(dataset, **kwargs)[source]

Generate samples by a list of transformation methods.

Parameters

dataset – the input dataset

Returns

(original samples, new samples, generated function string)

generate_by_subpopulations(dataset, **kwargs)[source]

Generate samples by a list of subpopulation methods.

Parameters

dataset – the input dataset

Returns

the transformed dataset

generate_by_attacks(dataset, model=None, **kwargs)[source]

Generate samples by a list of attack methods.

Parameters
  • dataset – the input dataset

  • model – the model to attack if given.

Returns

the transformed dataset

class textflint.adapter.ReportGenerator[source]

Bases: object

Plots the robustness report, returning a radar figure, a sunburst figure, and a bar chart figure.

Example:

    {
        "model_name": "BERT",
        "dataset_name": "medical data",
        "transformation": {
            "Case": {
                "ori_precision": 0.70,
                "trans_precision": 0.65,
                "ori_f1": 0.63,
                "trans_f1": 0.60,
                "size": 5000
            },
            "Ocr": {
                "ori_precision": 0.72,
                "trans_precision": 0.43,
                "ori_f1": 0.62,
                "trans_f1": 0.41,
                "size": 5000
            }
        },
        "subpopulation": {
            "LengthLengthSubPopulation-0.0-0.1": {
                "trans_precision": 0.68,
                "trans_f1": 0.63,
                "size": 500
            }
        },
        "attack": {
            "Bert-Attack": {
                "ori_precision": 0.72,
                "trans_precision": 0.43,
                "ori_f1": 0.62,
                "trans_f1": 0.41,
                "size": 400
            }
        }
    }

plot(evaluate_json)[source]

Analyzes the evaluation result and plots three reports in HTML.

Parameters

evaluate_json (dict) – evaluation result of a specific model.
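
A usage sketch, feeding a dict shaped like the example above; whether all three sections (transformation, subpopulation, attack) are required is not specified here:

    from textflint.adapter import ReportGenerator

    evaluate_json = {
        'model_name': 'BERT',
        'dataset_name': 'medical data',
        'transformation': {
            'Case': {'ori_precision': 0.70, 'trans_precision': 0.65,
                     'ori_f1': 0.63, 'trans_f1': 0.60, 'size': 5000},
        },
    }
    ReportGenerator().plot(evaluate_json)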

static get_radar_fig(radar_pd)[source]

Get radar figure of linguistic classifications.

static get_sunburst_fig(df, settings)[source]

Get sunburst figure of linguistic classifications and show details.

static get_bar_chart(pd, cols, model_name=None, dataset_name=None)[source]

Get bar chart figure.

textflint.adapter.load_module_from_file(module_name, file_path)[source]

Uses importlib to dynamically open a file and load an object from it.
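
For example, with a hypothetical path:

    from textflint.adapter import load_module_from_file

    # './attacks.py' is a hypothetical file, e.g. one defining the
    # 'attacks' list referenced by attack_methods above.
    module = load_module_from_file('attacks', './attacks.py')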