textflint.adapter
TextFlint Adapter Class
textflint.adapter.auto_config(task='UT', config=None)
Check the config input or create a Config automatically.
Parameters:
  task (str) – task name.
  config (str|dict|textflint.config.Config) – config to control the generation procedure.
Returns:
  a textflint.config.Config instance.
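The dispatch this helper performs can be sketched as follows. This is a minimal illustration, not textflint's actual implementation: the stub Config class stands in for textflint.config.Config, and the str branch (loading a config file from a path) is omitted.

```python
# Minimal sketch of the auto_config dispatch, NOT textflint's actual code.
# The stub Config below stands in for textflint.config.Config.
class Config:
    def __init__(self, task='UT', **kwargs):
        self.task = task
        for key, value in kwargs.items():
            setattr(self, key, value)

def auto_config(task='UT', config=None):
    """Check the config input or create a Config automatically."""
    if config is None:
        return Config(task=task)          # build a default config
    if isinstance(config, Config):
        return config                     # already a Config instance
    if isinstance(config, dict):
        return Config(**config)           # unpack a dict of options
    raise ValueError(f"Unsupported config type: {type(config)}")
```

In practice you would pass either nothing (to get task defaults), a dict of overrides, or an existing Config instance.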
textflint.adapter.auto_dataset(data_input=None, task='UT')
Create a Dataset instance and load the data input automatically.
Parameters:
  data_input (dict|list|str) – a json object, or a json/csv file path.
  task (str) – task name.
Returns:
  a textflint.Dataset instance.
textflint.adapter.auto_flintmodel(model, task)
Check the flint model type and whether it is compatible with the task.
Parameters:
  model (textflint.FlintModel|str) – a FlintModel instance, or the path of a python file which contains a FlintModel instance.
  task (str) – task name.
Returns:
  a textflint.FlintModel instance.
textflint.adapter.auto_generator(config_obj)
Automatically create a task generator that applies transformations, subpopulations and adversarial attacks.
Parameters:
  config_obj (textflint.Config) – a Config instance.
Returns:
  a textflint.Generator instance.
textflint.adapter.auto_report_generator()
Return a ReportGenerator instance.
Returns:
  a ReportGenerator instance.
class textflint.adapter.Config(task='UT', out_dir=None, max_trans=1, random_seed=1, fields=None, flint_model=None, trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Bases: object
Holds config params that control the generation and report procedure.

__init__(task='UT', out_dir=None, max_trans=1, random_seed=1, fields=None, flint_model=None, trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Parameters:
  task (str) – task name.
  out_dir (str) – output dir for saving generated samples; defaults to the current path.
  max_trans (int) – maximum number of transformed samples generated from one original sample per Transformation.
  random_seed (int) – random seed to reproduce generation.
  fields (str|list[str]) – fields on which new samples are generated.
  flint_model (str) – path of the python file containing the FlintModel instance named 'model'.
  trans_methods (list) – which transformations to apply to the dataset.
  trans_config (dict) – parameters for the initialization of the transformation instances.
  return_unk (bool) – whether to apply transformations which may influence the label of a sample.
  sub_methods (list) – which subpopulations to apply to the dataset.
  sub_config (dict) – parameters for the initialization of the subpopulation instances.
  attack_methods (str) – path of the python file containing the Attack instances named 'attacks'.
  validate_methods (str|list[str]) – which validate methods to use to calculate the confidence of generated samples.
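Since auto_config also accepts a plain dict, a typical set of these options can be written out directly. The values and method names below are illustrative only ('Ocr' and 'Case' are the transformation names used in the ReportGenerator example later in this page); check which transformations your task actually supports.

```python
# Illustrative config dict mirroring the Config parameters documented above.
# The transformation names are examples, not a verified or exhaustive list.
config_options = {
    'task': 'SA',                      # sentiment analysis
    'out_dir': './out',                # where generated samples are saved
    'max_trans': 2,                    # at most 2 new samples per original per Transformation
    'random_seed': 1,                  # reproducible generation
    'fields': 'x',                     # transform the text field
    'trans_methods': ['Ocr', 'Case'],  # assumed transformation names
    'return_unk': True,                # keep label-changing transformations
}
```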
get_generate_methods(methods, task_to_methods, allow_pipeline=False)
Validate transformation or subpopulation methods.
Watch out! Some UT transformations/subpopulations may not be compatible with your task; please choose your methods carefully.
Parameters:
  methods (list) – the transformations or subpopulations to apply to the dataset. If not provided, return the default generation methods.
  task_to_methods (dict) – maps task names to their allowed methods.
  allow_pipeline (bool) – whether pipeline input is allowed.
Returns:
  a list of transformations/subpopulations.
class textflint.adapter.Dataset(task='UT')
Bases: object
Any iterable of (label, text_input) pairs qualifies as a Dataset.

load(dataset)
Load a json object and prepare it as a Dataset.
Two input formats are supported. Example:

{'x': ['The robustness of deep neural networks has received much attention recently',
       'We focus on certified robustness of smoothed classifiers in this work',
       ...,
       'our approach exceeds the state-of-the-art.'],
 'y': ['neutral', 'positive', ..., 'positive']}

[{'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neutral'},
 {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'},
 ...,
 {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'}]

Parameters:
  dataset (list|dict) – the input data, in either format above.
load_json(json_path, encoding='utf-8', fields=None, dropna=True)
Load a json file in which each line is a json string.
Parameters:
  json_path – file path.
  encoding – the file's encoding; default: utf-8.
  fields – the json object fields that are needed; if None, all fields are kept. Default: None.
  dropna – whether to ignore and drop invalid data; if False, raise ValueError when reading invalid data. Default: True.
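The behavior described above (one json object per line, optional field filtering, dropna) can be sketched with the standard library. This is an illustrative re-implementation, not textflint's code:

```python
import json

def load_json_lines(json_path, encoding='utf-8', fields=None, dropna=True):
    """Read a file with one json object per line; keep only `fields` if given."""
    samples = []
    with open(json_path, encoding=encoding) as f:
        for line_no, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                if dropna:
                    continue              # silently drop invalid lines
                raise ValueError(f"invalid json on line {line_no}")
            if fields is not None:
                obj = {k: obj[k] for k in fields}
            samples.append(obj)
    return samples
```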
load_csv(csv_path, encoding='utf-8', headers=None, sep=',', dropna=True)
Load a csv file in which each line corresponds to one sample.
Parameters:
  csv_path – file path.
  encoding – the file's encoding; default: utf-8.
  headers – the file's headers; if None, use the file's first line as headers. Default: None.
  sep – the separator for each column. Default: ','.
  dropna – whether to ignore and drop invalid data; if False, raise ValueError when reading invalid data. Default: True.
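The same behavior can be sketched with the stdlib csv module; again, an illustrative re-implementation rather than textflint's code:

```python
import csv

def load_csv_file(csv_path, encoding='utf-8', headers=None, sep=',', dropna=True):
    """Read a csv file; if headers is None, use the file's first line as headers."""
    samples = []
    with open(csv_path, encoding=encoding, newline='') as f:
        reader = csv.reader(f, delimiter=sep)
        for row in reader:
            if headers is None:
                headers = row             # first line supplies the column names
                continue
            if len(row) != len(headers):
                if dropna:
                    continue              # drop rows with the wrong column count
                raise ValueError(f"invalid row: {row!r}")
            samples.append(dict(zip(headers, row)))
    return samples
```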
load_hugging_face(name, subset='train')
Load a dataset from HuggingFace datasets and prepare it as a Dataset.
Parameters:
  name – the dataset name.
  subset – the subset of the main dataset. Default: 'train'.
append(data_sample, sample_id=-1)
Load a single data sample and append it to the dataset.
Parameters:
  data_sample (dict|Sample) – the sample to append.
  sample_id (int) – useful to identify the sample; default -1.
Returns:
  True / False, indicating whether the append succeeded.
extend(data_samples)
Load multiple data samples and extend the dataset with them.
Parameters:
  data_samples (list|dict|Sample) – the samples to add.
-
static
norm_input
(data_samples)[source]¶ Convert various data input to list of dict. Example:
{'x': [ 'The robustness of deep neural networks has received much attention recently', 'We focus on certified robustness of smoothed classifiers in this work', ..., 'our approach exceeds the state-of-the-art.' ], 'y': [ 'neural', 'positive', ..., 'positive' ] } convert to [ {'x': 'The robustness of deep neural networks has received much attention recently', 'y': 'neural'}, {'x': 'We focus on certified robustness of smoothed classifiers in this work', 'y': 'positive'}, ..., {'x': 'our approach exceeds the state-of-the-art.', 'y': 'positive'} ]
- Parameters
data_samples (list|dict|Sample) –
- Returns
Normalized data.
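The dict-of-lists to list-of-dicts conversion above can be sketched in a few lines. This is an illustrative version, not textflint's implementation, and it ignores the Sample input case:

```python
def norm_input(data_samples):
    """Convert dict-of-lists input to the normalized list-of-dicts form."""
    if isinstance(data_samples, list):
        return data_samples               # already normalized
    keys = list(data_samples)
    length = len(data_samples[keys[0]])
    # every column must have the same number of entries
    assert all(len(data_samples[k]) == length for k in keys), \
        "columns differ in length"
    return [{k: data_samples[k][i] for k in keys} for i in range(length)]
```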
save_csv(out_path, encoding='utf-8', headers=None, sep=',')
Save the dataset to a csv file.
Parameters:
  out_path – file path.
  encoding – the file's encoding; default: utf-8.
  headers – the file's headers; if None, derive the headers from the samples' fields. Default: None.
  sep – the separator for each column. Default: ','.
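A stdlib sketch of this save behavior, as the inverse of load_csv above (illustrative, not textflint's implementation; when headers is None this sketch derives them from the first sample):

```python
import csv

def save_csv_file(samples, out_path, encoding='utf-8', headers=None, sep=','):
    """Write a list of dict samples to csv, emitting a header line first."""
    if headers is None:
        headers = list(samples[0])        # derive headers from the first sample
    with open(out_path, 'w', encoding=encoding, newline='') as f:
        writer = csv.DictWriter(f, fieldnames=headers, delimiter=sep)
        writer.writeheader()
        writer.writerows(samples)
```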
class textflint.adapter.FlintModel(model, tokenizer, task='SA', batch_size=1)
Bases: abc.ABC
A model wrapper queries a model with a list of text inputs.
Classification-based models return a list of lists, where each sublist represents the model's scores for a given input.
Text-to-text models return a list of strings, where each string is the output – like a translation or summarization – for a given input.

__init__(model, tokenizer, task='SA', batch_size=1)
Parameters:
  model – any model object.
  tokenizer – supports tokenizing sentences and converting tokens to model input ids.
  task (str) – task name.
  batch_size (int) – the batch size used for evaluation.
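A toy wrapper in the spirit of this class can illustrate the contract for classification models (one list of scores per input) and the role of batch_size. This is a standalone sketch, not a real FlintModel subclass; the `predict` method name is assumed for illustration:

```python
# Illustrative wrapper in the spirit of FlintModel, NOT the real base class.
# For classification, the wrapped model returns one list of scores per input.
class ToyFlintModel:
    def __init__(self, model, tokenizer, task='SA', batch_size=1):
        self.model = model          # any callable mapping token batches to scores
        self.tokenizer = tokenizer  # callable mapping a sentence to tokens/ids
        self.task = task
        self.batch_size = batch_size

    def predict(self, texts):
        """Query the model batch by batch; return a list of score lists."""
        scores = []
        for start in range(0, len(texts), self.batch_size):
            batch = texts[start:start + self.batch_size]
            token_batch = [self.tokenizer(t) for t in batch]
            scores.extend(self.model(token_batch))
        return scores
```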
evaluate(data_samples, prefix='')
Parameters:
  data_samples (list[Sample]) – a list of Samples.
  prefix (str) – a name prefix to add to metrics.
Returns:
  a dict holding the metrics result.

get_grad(*inputs)
Get the gradient of the loss with respect to the input tokens.
Parameters:
  inputs (tuple) – a tuple of original texts.
class textflint.adapter.Generator(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Bases: abc.ABC
Transformation controller which applies multiple transformations to each data sample.

__init__(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Parameters:
  task (str) – the task your transformation data belongs to.
  max_trans (int) – maximum number of transformed samples generated from one original sample per Transformation.
  random_seed (int) – random seed to reproduce generation.
  fields (str|list) – which fields to apply transformations to. Multi-field transforms are only needed for some special tasks, e.g. SM and NLI.
  trans_methods (list) – a list of transformation names.
  trans_config (dict) – transformation class configs, useful to control the behavior of transformations.
  return_unk (bool) – some transformations may generate unknown labels, e.g. inserting a word into a sequence in the NER task. If set to False, these transformations are skipped.
  sub_methods (list) – a list of subpopulation names.
  sub_config (dict) – subpopulation class configs, useful to control the behavior of subpopulations.
  attack_methods (str) – path of the python file containing the Attack instances.
  validate_methods (list) – confidence calculation functions.
prepare(dataset)
Check the dataset.
Parameters:
  dataset (textflint.Dataset) – the input dataset.

generate(dataset, model=None)
Yield the generated samples for dataset.
Parameters:
  dataset (textflint.Dataset) – the input dataset.
  model (textflint.FlintModel) – the model to attack, if given.
Returns:
  yields (original samples, new samples, generating-function string).
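The contract of this generate loop can be sketched without textflint: for each transformation, collect the originals that produced output, cap the new samples at max_trans per original, and yield the triple. An illustrative sketch, not textflint's code:

```python
# Sketch of the generate loop's contract (illustrative, not textflint's code):
# for each transformation, yield (originals, transformed, transformation name).
def generate(dataset, transformations, max_trans=1):
    for trans_name, trans_fn in transformations.items():
        originals, new_samples = [], []
        for sample in dataset:
            transformed = trans_fn(sample)[:max_trans]  # cap per-original output
            if transformed:
                originals.append(sample)
                new_samples.extend(transformed)
        yield originals, new_samples, trans_name
```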
generate_by_transformations(dataset, **kwargs)
Generate samples by a list of transformation methods.
Parameters:
  dataset – the input dataset.
Returns:
  (original samples, new samples, generating-function string).
class textflint.adapter.ReportGenerator
Bases: object
Plots a robustness report: a radar figure, a sunburst figure, and a bar chart figure.
Example:

{
  "model_name": "BERT",
  "dataset_name": "medical data",
  "transformation": {
    "Case": {
      "ori_precision": 0.70, "trans_precision": 0.65,
      "ori_f1": 0.63, "trans_f1": 0.60, "size": 5000
    },
    "Ocr": {
      "ori_precision": 0.72, "trans_precision": 0.43,
      "ori_f1": 0.62, "trans_f1": 0.41, "size": 5000
    }
  },
  "subpopulation": {
    "LengthSubPopulation-0.0-0.1": {
      "trans_precision": 0.68, "trans_f1": 0.63, "size": 500
    }
  },
  "attack": {
    "Bert-Attack": {
      "ori_precision": 0.72, "trans_precision": 0.43,
      "ori_f1": 0.62, "trans_f1": 0.41, "size": 400
    }
  }
}
plot(evaluate_json)
Analyze the evaluation result and plot three reports in HTML.
Parameters:
  evaluate_json (dict) – the evaluation result of a specific model.
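The evaluate_json argument follows the schema shown in the class example above. Assembling it in Python looks like this; the metric numbers and method names are illustrative:

```python
# Illustrative evaluate_json following the schema documented above;
# numbers and method names are examples only.
evaluate_json = {
    "model_name": "BERT",
    "dataset_name": "medical data",
    "transformation": {
        "Case": {"ori_precision": 0.70, "trans_precision": 0.65,
                 "ori_f1": 0.63, "trans_f1": 0.60, "size": 5000},
    },
    "subpopulation": {
        "LengthSubPopulation-0.0-0.1": {"trans_precision": 0.68,
                                        "trans_f1": 0.63, "size": 500},
    },
    "attack": {
        "Bert-Attack": {"ori_precision": 0.72, "trans_precision": 0.43,
                        "ori_f1": 0.62, "trans_f1": 0.41, "size": 400},
    },
}
# ReportGenerator().plot(evaluate_json) would then render the three figures.
```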