textflint.generation_layer.transformation.MRC.perturb_question

Perturb Question with BackTrans or MLM

class textflint.generation_layer.transformation.MRC.perturb_question.PerturbQuestion(transform_method='BackTrans', device='cuda:0')

Bases: textflint.generation_layer.transformation.transformation.Transformation

Transforms the question by paraphrasing it with the selected method.

Example:

origin: Where was Super Bowl 50 held?
transform: Where did Super Bowl 50 take place?

__init__(transform_method='BackTrans', device='cuda:0')

Parameters

transform_method – paraphrase method to use; defaults to 'BackTrans'
device – device to run on, either a GPU such as 'cuda:0' (the default) or the CPU

class textflint.generation_layer.transformation.MRC.perturb_question.BackTrans(from_model_name=None, to_model_name=None, device=None, **kwargs)

Bases: textflint.generation_layer.transformation.transformation.Transformation

Back translation with Hugging Face translation models. Each input sentence yields at most one transformed sentence.

__init__(from_model_name=None, to_model_name=None, device=None, **kwargs)

Parameters

from_model_name (str) – model that translates from the original language to the target language
to_model_name (str) – model that translates from the target language back to the original language
device – device on which to run the neural network, either the CPU or a specific GPU
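
Back translation round-trips a sentence through a second language and keeps the result as a paraphrase. The following is a minimal sketch of that idea, not textflint's implementation. The callables `to_pivot` and `from_pivot` are hypothetical stand-ins for the models named by from_model_name and to_model_name, and the lookup tables are toy data, not real model output.

```python
from typing import Callable, Optional


def back_translate(
    sentence: str,
    to_pivot: Callable[[str], str],
    from_pivot: Callable[[str], str],
) -> Optional[str]:
    """Round-trip a sentence through a pivot language.

    Returns the paraphrase, or None when the round trip reproduces the
    input, so each sentence yields at most one transformed sentence.
    """
    pivot = to_pivot(sentence)
    paraphrase = from_pivot(pivot)
    return paraphrase if paraphrase != sentence else None


# Toy stand-ins for translation models (hypothetical data, not model output).
EN_TO_DE = {"Where was Super Bowl 50 held?": "Wo fand der Super Bowl 50 statt?"}
DE_TO_EN = {"Wo fand der Super Bowl 50 statt?": "Where did Super Bowl 50 take place?"}


def toy_to_pivot(text: str) -> str:
    # Unknown sentences pass through unchanged.
    return EN_TO_DE.get(text, text)


def toy_from_pivot(text: str) -> str:
    return DE_TO_EN.get(text, text)


result = back_translate("Where was Super Bowl 50 held?", toy_to_pivot, toy_from_pivot)
```

Passing the translators in as plain callables keeps the round-trip logic independent of any particular model library.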

static get_device(device)

Gets the GPU or CPU device.

Parameters: device (str) – device string: “cpu” selects the CPU; “cuda:0” selects the GPU with index 0.
Returns: the corresponding torch device.
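
The device string format described above can be illustrated without torch. This is a sketch only: `parse_device` is a hypothetical helper, not part of textflint, and it returns a plain tuple rather than a torch device. Treating a bare "cuda" as index 0 is an assumption of the sketch.

```python
from typing import Optional, Tuple


def parse_device(device: str) -> Tuple[str, Optional[int]]:
    """Split a device string into (kind, index).

    "cpu" maps to ("cpu", None); "cuda:N" maps to ("cuda", N).
    A bare "cuda" is treated as index 0 (an assumption of this sketch).
    """
    kind, _, index = device.strip().lower().partition(":")
    if kind == "cpu":
        return "cpu", None
    if kind == "cuda":
        return "cuda", int(index) if index else 0
    raise ValueError(f"unsupported device string: {device!r}")
```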

class textflint.generation_layer.transformation.MRC.perturb_question.MLMSuggestion(masked_model=None, device=None, accrue_threshold=1, max_sent_size=100, trans_min=1, trans_max=10, trans_p=0.2, stop_words=None, **kwargs)

Bases: textflint.generation_layer.transformation.word_substitute.WordSubstitute

Transforms an input by replacing its tokens with words predicted by a masked language model. To speed up transformation of long texts, single sentences rather than the whole text are fed to the language model.

__init__(masked_model=None, device=None, accrue_threshold=1, max_sent_size=100, trans_min=1, trans_max=10, trans_p=0.2, stop_words=None, **kwargs)

Parameters

masked_model (str) – masked language model used to predict candidates
device (str) – device on which to run the neural network, either the CPU or a specific GPU
accrue_threshold (int) – threshold for picking BERT results
max_sent_size – maximum sentence size
trans_min (int) – Minimum number of characters to augment.
trans_max (int) – Maximum number of characters to augment. If None is passed, the number of augmentations is calculated from trans_p. If the value calculated from trans_p is smaller than trans_max, the calculated value is used; otherwise trans_max is used.
trans_p (float) – Percentage of characters (per token) to augment.
stop_words (list) – List of words to skip during augmentation.
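
The interaction of trans_min, trans_max and trans_p described above can be sketched as a small clamp. This is an illustration under stated assumptions, not textflint's code: `transform_count` is a hypothetical helper, the proportional count is truncated with int(), and the final cap at the number of available items is an added safeguard.

```python
from typing import Optional


def transform_count(
    size: int,
    trans_min: int = 1,
    trans_max: Optional[int] = 10,
    trans_p: float = 0.2,
) -> int:
    """Number of items to transform out of `size` available items."""
    if size <= 0:
        return 0
    # Proportional count derived from trans_p (truncation is an assumption).
    count = int(trans_p * size)
    # Cap at trans_max unless it is None, in which case trans_p alone decides.
    if trans_max is not None:
        count = min(count, trans_max)
    # Never go below trans_min, and never above what is available.
    count = max(count, trans_min)
    return min(count, size)
```

With the defaults, 20 items give a count of 4, while 100 items hit the trans_max cap of 10.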

get_model()

Loads the masked language model used to predict candidates.

pre_calculate_allowed_tokens()

Precalculates the meaningful tokens by filtering out tokens that are not alphabetic strings.

Pre-filtering speeds up the later step of verifying the POS tags of candidates.
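
A minimal sketch of such a pre-filter over a vocabulary list follows. The helper name `allowed_tokens` and the sample vocabulary are hypothetical, and using `str.isalpha` as the alphabetic test is an assumption (it also accepts non-ASCII letters).

```python
from typing import Dict, Sequence


def allowed_tokens(vocab: Sequence[str]) -> Dict[int, str]:
    """Map vocabulary index to token, keeping purely alphabetic tokens only."""
    return {idx: tok for idx, tok in enumerate(vocab) if tok.isalpha()}


# Hypothetical vocabulary with special tokens, word pieces, and digits.
vocab = ["[CLS]", "hello", "##ing", "42", "world", "don't"]
allowed = allowed_tokens(vocab)
```

Computing the mapping once means later candidate checks only need a dictionary lookup per predicted token index.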

class textflint.generation_layer.transformation.MRC.perturb_question.Transformation(**kwargs)

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>

transform(sample, n=1, field='x', **kwargs)

Transforms a data sample into a list of Sample objects.

Parameters

sample (Sample) – Data sample to augment.
n (int) – Maximum number of unique augmented outputs; the default is 1.
field (str|list) – Field or fields to which the transformation is applied.
**kwargs (dict) – Other auxiliary parameters.

Returns

list of Sample
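
The contract of transform (at most n unique outputs from one input) can be sketched with a small abstract base class. This is an illustration, not textflint's class: `SketchTransformation`, its `_transform` hook and `SwapHeld` are hypothetical names, plain strings stand in for Sample objects, and the field argument is omitted.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class SketchTransformation(ABC):
    """Illustrative base class; strings stand in for Sample objects."""

    def transform(self, sample: str, n: int = 1, **kwargs) -> List[str]:
        """Return at most n unique candidates that differ from the input."""
        results: List[str] = []
        for candidate in self._transform(sample, **kwargs):
            if candidate != sample and candidate not in results:
                results.append(candidate)
            if len(results) >= n:
                break
        return results

    @abstractmethod
    def _transform(self, sample: str, **kwargs) -> Iterable[str]:
        """Yield candidate transformations (hypothetical hook)."""


class SwapHeld(SketchTransformation):
    """Toy subclass that rewrites one fixed phrasing."""

    def _transform(self, sample: str, **kwargs) -> Iterable[str]:
        yield sample  # unchanged candidate, dropped by transform()
        yield sample.replace("was", "did").replace("held", "take place")
        yield sample.replace("held", "hosted")


question = "Where was Super Bowl 50 held?"
one = SwapHeld().transform(question)
many = SwapHeld().transform(question, n=5)
```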

classmethod sample_num(x, num)

Gets up to ‘num’ samples from x.

Parameters

x (list) – list to sample
num (int) – sample number

Returns

at most ‘num’ unique samples.
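
One way to draw at most ‘num’ unique samples is sketched below as a standalone function. It is a sketch, not textflint's classmethod: de-duplicating before sampling and requiring hashable items are assumptions.

```python
import random
from typing import Hashable, List, Sequence


def sample_num(x: Sequence[Hashable], num: int) -> List[Hashable]:
    """Draw at most `num` unique items from x at random."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(x))
    # random.sample raises if asked for more items than exist, so clamp.
    return random.sample(unique, min(max(num, 0), len(unique)))
```

When x holds fewer unique items than num, every unique item is returned in random order.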