textflint.generation_layer.transformation.word_substitute

WordSubstitute Base Class

class textflint.generation_layer.transformation.word_substitute.WordSubstitute(trans_min=1, trans_max=10, trans_p=0.1, stop_words=None, **kwargs)[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Word replace transformation to implement normal word replace functions.

__init__(trans_min=1, trans_max=10, trans_p=0.1, stop_words=None, **kwargs)[source]
Parameters
  • trans_min (int) – Minimum number of words to be augmented.

  • trans_max (int) – Maximum number of words to be augmented. If None is passed, the number of augmented words is calculated from trans_p. If the result calculated from trans_p is smaller than trans_max, the calculated result is used; otherwise trans_max is used.

  • trans_p (float) – Percentage of words to be augmented.

  • stop_words (list) – List of words to be skipped by the augment operation.

  • processor (EnProcessor) –

  • get_pos (bool) – Whether to pass POS tags to the _get_substitute_words API.

abstract skip_aug(tokens, mask, pos=None)[source]

Returns the indices of the replaced tokens.

Parameters

tokens (list) – tokenized words, or pairs of a word and its POS tag

Return list

the indices of the replaced tokens

is_stop_words(token)[source]

Check whether the input word belongs to the stop-word vocabulary.

Parameters

token (str) – the input word to be judged

Return bool

whether the token is a stop word

pre_skip_aug(tokens, mask)[source]

Skip tokens that appear in the stop-word list or the punctuation list.

Parameters
  • tokens (list) – the list of tokens

  • mask (list) – the list of mask values, indicating whether each word is allowed to be substituted. ORIGIN is allowed, while TASK_MASK and MODIFIED_MASK are not.

Return list

List of indices of the tokens that may be substituted.
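The filtering described for pre_skip_aug can be illustrated with a minimal standalone sketch. It is not the library's implementation: the function name pre_skip_aug_sketch, the numeric mask values, and the exact-match stop-word test are assumptions made for this example only.

```python
import string

# Hypothetical mask values; the real constants are defined in textflint
# and may differ from these.
ORIGIN, TASK_MASK, MODIFIED_MASK = 0, 1, 2


def pre_skip_aug_sketch(tokens, mask, stop_words=()):
    """Return indices of tokens that may still be substituted."""
    assert len(tokens) == len(mask)
    stop = set(stop_words)
    keep = []
    for i, (tok, m) in enumerate(zip(tokens, mask)):
        if m != ORIGIN:
            continue  # masked by the task or by an earlier modification
        if tok in stop:
            continue  # stop word (exact match, an assumption)
        if all(ch in string.punctuation for ch in tok):
            continue  # token made up only of punctuation characters
        keep.append(i)
    return keep


tokens = ["The", "movie", "was", "great", "!"]
mask = [ORIGIN, ORIGIN, ORIGIN, MODIFIED_MASK, ORIGIN]
candidates = pre_skip_aug_sketch(tokens, mask, stop_words=["The", "was"])
print(candidates)
```

Here only "movie" survives: "The" and "was" are stop words, "great" is masked, and "!" is punctuation.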

get_trans_cnt(size)[source]

Get the number of words/chars to transform.

Parameters

size (int) – the size of target sentence

Return int

number of words to apply transformation.
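One plausible reading of how trans_min, trans_max and trans_p combine is a clamp of the proportional count. The sketch below is written under that assumption; the function name get_trans_cnt_sketch and the use of int() for rounding are illustrative and not taken from the library source.

```python
def get_trans_cnt_sketch(size, trans_min=1, trans_max=10, trans_p=0.1):
    """Clamp int(trans_p * size) between trans_min and trans_max.

    Assumed behavior for illustration; trans_max=None disables the upper bound.
    """
    cnt = int(trans_p * size)
    if cnt < trans_min:
        return trans_min
    if trans_max is not None and cnt > trans_max:
        return trans_max
    return cnt


for size in (5, 50, 500):
    print(size, get_trans_cnt_sketch(size))
```

With the default arguments, a short sentence falls back to trans_min, a mid-sized one uses the proportional count, and a very long one is capped at trans_max.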

class textflint.generation_layer.transformation.word_substitute.Transformation(**kwargs)[source]

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>
transform(sample, n=1, field='x', **kwargs)[source]

Transform data sample to a list of Sample.

Parameters
  • sample (Sample) – Data sample for augmentation.

  • n (int) – Max number of unique augmented outputs, default is 1.

  • field (str|list) – Indicate which fields to apply transformations.

  • **kwargs (dict) –

    other auxiliary params.

Returns

list of Sample

classmethod sample_num(x, num)[source]

Get ‘num’ samples from x.

Parameters
  • x (list) – list to sample

  • num (int) – sample number

Returns

at most ‘num’ unique samples.
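A minimal sketch of this kind of capped sampling, assuming x is a list and that sampling is done without replacement. The name sample_num_sketch is hypothetical and the library's own method may treat other input types differently.

```python
import random


def sample_num_sketch(x, num):
    """Return at most `num` items drawn from x without replacement."""
    return random.sample(x, min(num, len(x)))


small = sample_num_sketch([1, 2, 3], 5)         # fewer items than requested
large = sample_num_sketch(list(range(10)), 4)   # exactly 4 distinct positions
print(len(small), len(large))
```

Note that "unique" here refers to distinct positions in x; if x itself contains repeated values, the result can contain them too.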

textflint.generation_layer.transformation.word_substitute.abstractmethod(funcobj)[source]

A decorator indicating abstract methods.

Requires that the metaclass is ABCMeta or derived from it. A class that has a metaclass derived from ABCMeta cannot be instantiated unless all of its abstract methods are overridden. The abstract methods can be called using any of the normal ‘super’ call mechanisms.

Usage:

    class C(metaclass=ABCMeta):
        @abstractmethod
        def my_abstract_method(self, ...):
            ...
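The effect of the decorator can be seen in a small runnable example using only the standard abc module. The class and method names (Base, Concrete, skip) are hypothetical and chosen for illustration.

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @abstractmethod
    def skip(self, tokens):
        """Subclasses must override this method."""


class Concrete(Base):
    def skip(self, tokens):
        # Trivial override: every position is a candidate.
        return list(range(len(tokens)))


try:
    Base()  # abstract method not overridden, so this raises TypeError
    instantiated = True
except TypeError:
    instantiated = False

print(instantiated)
print(Concrete().skip(["a", "b"]))
```

This is the same mechanism that makes skip_aug mandatory for subclasses of WordSubstitute.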

textflint.generation_layer.transformation.word_substitute.trade_off_sub_words(sub_words, sub_indices, trans_num=None, n=1)[source]

Select suitable candidate words to maximize the number of transformation results. The words with the largest numbers of substitute words are selected.

Parameters
  • sub_words (list) – list of substitute words for each legal word

  • sub_indices (list) – list of indices of each legal word

  • trans_num (int) – max number of words to apply substitution

  • n (int) –

Returns

sub_words after alignment, plus the indices of sub_words
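The selection can be sketched as follows, under stated assumptions: words are ranked by how many substitutes they have, at most trans_num of them are kept, and "alignment" is taken to mean that each kept word ends up with exactly n substitutes (sampled down, or padded by re-drawing). The name trade_off_sketch and the padding strategy are guesses for illustration, not the library's confirmed behavior.

```python
import random


def trade_off_sketch(sub_words, sub_indices, trans_num=None, n=1):
    """Keep the words with the most substitutes and align each to n candidates."""
    assert len(sub_words) == len(sub_indices)
    # Ignore words without any substitute, then rank by candidate count
    # (largest first; sorted() is stable, so ties keep their original order).
    order = [i for i in range(len(sub_words)) if sub_words[i]]
    order.sort(key=lambda i: -len(sub_words[i]))
    if trans_num is None:
        trans_num = len(order)
    kept_words, kept_indices = [], []
    for i in order[:trans_num]:
        cands = list(sub_words[i])
        if len(cands) >= n:
            picked = random.sample(cands, n)
        else:
            # Assumed alignment: pad by re-drawing from the same candidates.
            picked = cands + random.choices(cands, k=n - len(cands))
        kept_words.append(picked)
        kept_indices.append(sub_indices[i])
    return kept_words, kept_indices


words, indices = trade_off_sketch(
    [["a"], ["b", "c", "d"], ["e", "f"]], [0, 3, 5], trans_num=2, n=2
)
print(indices)
```

In this call the word at index 0 is dropped because it has the fewest substitutes, and the two remaining words each contribute two candidates.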