textflint.generation_layer.transformation.MRC.perturb_question

Perturb Question with BackTrans or MLM

class textflint.generation_layer.transformation.MRC.perturb_question.PerturbQuestion(transform_method='BackTrans', device='cuda:0')[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Transform the question

Example:

origin: Where was Super Bowl 50 held?
transform: Where did Super Bowl 50 take place?
__init__(transform_method='BackTrans', device='cuda:0')[source]
Parameters
  • transform_method – paraphrase method

  • device – GPU device or CPU

class textflint.generation_layer.transformation.MRC.perturb_question.BackTrans(from_model_name=None, to_model_name=None, device=None, **kwargs)[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Back translation with Hugging Face translation models. Each input sentence yields at most one transformed sentence.

__init__(from_model_name=None, to_model_name=None, device=None, **kwargs)[source]
Parameters
  • from_model_name (str) – model to translate original language to target language

  • to_model_name (str) – model to translate target language to original language

  • device – CPU or GPU device on which to run the neural network
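The round trip described above can be sketched without any model download. This is a minimal illustration, not textflint's implementation: the two word tables are hypothetical stand-ins for the models named by from_model_name and to_model_name, and make_translator and back_translate are names invented for this sketch. Returning an empty list when the round trip changes nothing is also an assumption.

```python
from typing import Callable, Dict, List

# Toy word-for-word tables (hypothetical); real back translation would call
# two translation models instead.
EN_TO_PIVOT: Dict[str, str] = {"where": "wo", "was": "war", "it": "es", "held": "abgehalten"}
PIVOT_TO_EN: Dict[str, str] = {"wo": "where", "war": "was", "es": "it", "abgehalten": "hosted"}


def make_translator(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a translator that maps known words and keeps unknown ones."""
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.split())
    return translate


def back_translate(text: str,
                   forward: Callable[[str], str],
                   backward: Callable[[str], str]) -> List[str]:
    """Translate to the pivot language and back; return at most one paraphrase."""
    paraphrase = backward(forward(text))
    # Assumption of this sketch: an unchanged round trip counts as no output.
    return [paraphrase] if paraphrase != text else []


forward = make_translator(EN_TO_PIVOT)
backward = make_translator(PIVOT_TO_EN)
print(back_translate("where was it held", forward, backward))
```

The paraphrase arises only because the return trip picks a different word than the original, which is the effect back translation relies on.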

static get_device(device)[source]

Get gpu or cpu device.

Parameters

device (str) – device string. “cpu” selects the CPU; “cuda:0” selects the GPU with index 0.

Returns

The corresponding torch device.
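The device string convention can be illustrated with a small parser. This is a sketch only: parse_device is a hypothetical helper, it returns a plain tuple rather than a torch device, and treating a bare "cuda" as GPU 0 is an assumption.

```python
from typing import Optional, Tuple


def parse_device(device: str) -> Tuple[str, Optional[int]]:
    """Split a device string into (kind, index); index is None for the CPU."""
    if device == "cpu":
        return ("cpu", None)
    kind, sep, index = device.partition(":")
    # Accept "cuda" or "cuda:<non-negative integer>" and reject anything else.
    if kind != "cuda" or (sep and not index.isdecimal()):
        raise ValueError(f"unsupported device string: {device!r}")
    # Assumption of this sketch: "cuda" without an index means GPU 0.
    return ("cuda", int(index) if sep else 0)


print(parse_device("cpu"))
print(parse_device("cuda:0"))
```

With PyTorch installed, the same strings can be passed directly to torch.device.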

class textflint.generation_layer.transformation.MRC.perturb_question.MLMSuggestion(masked_model=None, device=None, accrue_threshold=1, max_sent_size=100, trans_min=1, trans_max=10, trans_p=0.2, stop_words=None, **kwargs)[source]

Bases: textflint.generation_layer.transformation.word_substitute.WordSubstitute

Transforms an input by replacing its tokens with words predicted by a masked language model. To speed up the transformation of long texts, single sentences rather than the whole text are fed to the language model.

__init__(masked_model=None, device=None, accrue_threshold=1, max_sent_size=100, trans_min=1, trans_max=10, trans_p=0.2, stop_words=None, **kwargs)[source]
Parameters
  • masked_model (str) – masked language model used to predict candidates

  • device (str) – CPU or GPU device on which to run the neural network

  • accrue_threshold (int) – threshold of BERT results to pick

  • max_sent_size – maximum sentence size

  • trans_min (int) – Minimum number of characters to augment.

  • trans_max (int) – Maximum number of characters to augment. If None is passed, the number of augmentations is calculated from trans_p. If the result calculated from trans_p is smaller than trans_max, the calculated result is used; otherwise trans_max is used.

  • trans_p (float) – Percentage of characters (per token) to augment.

  • stop_words (list) – List of words to skip during augmentation.
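The interaction of trans_min, trans_max and trans_p can be sketched as a small function. This is an illustration under stated assumptions, not the library's code: augment_count is a hypothetical name, truncating the product to an integer is assumed, and so is using trans_min as a floor capped at the input size.

```python
from typing import Optional


def augment_count(size: int,
                  trans_min: int = 1,
                  trans_max: Optional[int] = 10,
                  trans_p: float = 0.2) -> int:
    """Return how many of `size` items to augment."""
    # Number suggested by the percentage (truncation is an assumption).
    calculated = int(size * trans_p)
    # Use the calculated value when it is below trans_max, else trans_max.
    if trans_max is not None:
        calculated = min(calculated, trans_max)
    # Assumption: trans_min is a floor, and the count never exceeds `size`.
    return min(size, max(calculated, trans_min))


for size in (3, 20, 100):
    print(size, augment_count(size))
```

With the default arguments, the three calls in the loop return 1, 4 and 10 respectively.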

get_model()[source]

Loads masked language model to predict candidates.

pre_calculate_allowed_tokens()[source]

Precalculates the meaningful tokens by filtering out tokens that are not alphabetic strings.

Pre-filtering speeds up the later verification of the candidates' POS tags.
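The filtering step can be sketched on a toy vocabulary. The vocabulary layout (token mapped to id) and the helper name allowed_token_ids are assumptions made for this example; the actual method works on the loaded model's tokenizer.

```python
from typing import Dict


def allowed_token_ids(vocab: Dict[str, int]) -> Dict[int, str]:
    """Keep only vocabulary entries that are purely alphabetic strings."""
    # str.isalpha() rejects special tokens, subword pieces, digits and punctuation.
    return {idx: token for token, idx in vocab.items() if token.isalpha()}


# Hypothetical vocabulary in the style of a BERT tokenizer.
toy_vocab = {"[MASK]": 0, "the": 1, "##ing": 2, "held": 3, "50": 4, ",": 5}
print(allowed_token_ids(toy_vocab))
```

Computing this mapping once means later candidate checks only see tokens that can plausibly be whole words. Note that str.isalpha() also accepts non-ASCII letters.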

class textflint.generation_layer.transformation.MRC.perturb_question.Transformation(**kwargs)[source]

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>
transform(sample, n=1, field='x', **kwargs)[source]

Transform data sample to a list of Sample.

Parameters
  • sample (Sample) – Data sample for augmentation.

  • n (int) – Maximum number of unique augmented outputs; default is 1.

  • field (str|list) – Indicate which fields to apply transformations.

  • **kwargs (dict) – other auxiliary params.

Returns

list of Sample
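The contract of transform (up to n unique results) can be sketched with a toy abstract class. Everything here is a hypothetical stand-in rather than textflint code: the class names, the _transform hook, the use of plain strings instead of Sample objects, and the omission of the field argument are all assumptions of the sketch.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyTransformation(ABC):
    """Hypothetical stand-in for an abstract transformation base class."""

    def transform(self, sample: str, n: int = 1, **kwargs) -> List[str]:
        """Return up to n unique variants that differ from the input."""
        unique: List[str] = []
        for candidate in self._transform(sample, n=n, **kwargs):
            if candidate != sample and candidate not in unique:
                unique.append(candidate)
        return unique[:n]

    @abstractmethod
    def _transform(self, sample: str, n: int = 1, **kwargs) -> List[str]:
        """Produce candidate variants; subclasses implement this."""


class SwapHeld(ToyTransformation):
    """Toy subclass that rewrites the word 'held'."""

    def _transform(self, sample: str, n: int = 1, **kwargs) -> List[str]:
        # The duplicate candidate is deliberate, to show deduplication.
        return [sample.replace("held", w) for w in ("hosted", "staged", "hosted")]


print(SwapHeld().transform("Where was it held?", n=2))
```

Keeping deduplication and truncation in the base class lets each subclass focus on generating candidates.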

classmethod sample_num(x, num)[source]

Get ‘num’ samples from x.

Parameters
  • x (list) – list to sample

  • num (int) – sample number

Returns

At most ‘num’ unique samples.
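Sampling at most ‘num’ items can be sketched with the standard library. This is an illustration, not the classmethod's source: the standalone function form and the seed parameter are assumptions added for reproducibility.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def sample_num(x: List[T], num: int, seed: Optional[int] = None) -> List[T]:
    """Draw up to `num` items from x without replacement."""
    rng = random.Random(seed)
    # Cap the request at len(x); random.sample raises ValueError otherwise.
    return rng.sample(x, min(num, len(x)))


print(sorted(sample_num(["a", "b", "c"], 5)))
```

Items are drawn from distinct positions, so the result contains repeated values only if x itself does.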