textflint.generation_layer.transformation.NER.ent_typos

Swap, delete, insert, or replace random characters in entities

class textflint.generation_layer.transformation.NER.ent_typos.EntTypos(mode='random', skip_first_char=True, skip_last_char=False, **kwargs)[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Transformation that simulates typing errors in the entities of a sentence.

__init__(mode='random', skip_first_char=True, skip_last_char=False, **kwargs)[source]
Parameters

  • mode (str) – typo mode; supported values are [‘random’, ‘replace’, ‘swap’, ‘insert’, ‘delete’].

  • skip_first_char (bool) – whether to skip the first char of the target word.

  • skip_last_char (bool) – whether to skip the last char of the target word.

  • **kwargs –

class textflint.generation_layer.transformation.NER.ent_typos.Transformation(**kwargs)[source]

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>
transform(sample, n=1, field='x', **kwargs)[source]

Transform a data sample into a list of Sample objects.

Parameters
  • sample (Sample) – Data sample for augmentation.

  • n (int) – Maximum number of unique augmented outputs; the default in the signature is 1.

  • field (str|list) – Field or fields to which the transformation is applied.

  • **kwargs (dict) – other auxiliary params.

Returns

list of Sample

classmethod sample_num(x, num)[source]

Get ‘num’ samples from x.

Parameters
  • x (list) – list to sample

  • num (int) – sample number

Returns

at most ‘num’ unique samples.
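The "at most num unique samples" contract can be illustrated with a minimal standalone sketch. The helper name sample_up_to is hypothetical and this is not textflint's implementation; it assumes "unique" means duplicates are dropped before sampling without replacement.

```python
import random


def sample_up_to(items, num):
    """Return at most `num` distinct elements of `items` (hypothetical helper)."""
    # Assumption: duplicates are removed first, keeping first-seen order.
    unique = list(dict.fromkeys(items))
    # Never ask for more elements than are available.
    return random.sample(unique, min(num, len(unique)))


# Asking for more samples than exist simply returns every distinct item.
picked = sample_up_to(["a", "b", "c", "b"], 5)
```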

textflint.generation_layer.transformation.NER.ent_typos.delete(word, num=1, skip_first=False, skip_last=False)[source]

Perturb the word with 1 letter deleted.

Parameters
  • word (str) – target word

  • num (int) – number of typos to add

  • skip_first (bool) – whether to skip (never delete) the char at the beginning of the word

  • skip_last (bool) – whether to skip (never delete) the char at the end of the word

Returns

perturbed strings
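A minimal sketch of this kind of deletion is shown here. The name delete_chars and the rng parameter are hypothetical, and returning the word unchanged when there are too few eligible positions is an assumption, not documented behavior.

```python
import random


def delete_chars(word, num=1, skip_first=False, skip_last=False, rng=random):
    """Delete `num` random characters from `word` (illustrative sketch)."""
    start = 1 if skip_first else 0
    end = len(word) - (1 if skip_last else 0)  # exclusive upper bound
    positions = range(start, end)
    # Assumption: leave the word untouched if too few positions are eligible.
    if len(positions) < num:
        return word
    drop = set(rng.sample(positions, num))
    return "".join(c for i, c in enumerate(word) if i not in drop)


# With skip_first=True the leading "L" always survives.
typo = delete_chars("London", num=1, skip_first=True)
```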

textflint.generation_layer.transformation.NER.ent_typos.get_random_letter(src_char=None)[source]

Get a replacement letter that matches the format of src_char.

Parameters

src_char (char) – the character to be replaced

Returns

a random letter; lowercase by default
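One way to read "according to the src_char format" is case matching, as in the sketch below. The helper name random_letter_like is hypothetical, and matching only upper versus lower case is an assumption; the library may distinguish other character classes as well.

```python
import random
import string


def random_letter_like(src_char=None):
    """Return a random ASCII letter with the same case as `src_char` (sketch)."""
    # Assumption: only upper case is treated specially; anything else,
    # including a missing src_char, falls back to a lowercase letter.
    if src_char is not None and src_char.isupper():
        return random.choice(string.ascii_uppercase)
    return random.choice(string.ascii_lowercase)
```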

textflint.generation_layer.transformation.NER.ent_typos.get_start_end(word, skip_first=False, skip_last=False)[source]

Get valid operation range of one word.

Parameters
  • word (str) – target word string

  • skip_first (bool) – whether to exclude the first char from the operation range

  • skip_last (bool) – whether to exclude the last char from the operation range

Returns

start index, last index
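The range computation can be sketched as follows. The name valid_range is hypothetical, and treating both returned indices as inclusive is an assumption based on the wording "start index, last index".

```python
def valid_range(word, skip_first=False, skip_last=False):
    """Return (start, last) indices of `word` that may be edited (sketch)."""
    start = 1 if skip_first else 0
    # Assumption: `last` is inclusive, so it points at an editable character.
    last = len(word) - 1 - (1 if skip_last else 0)
    return start, last


# For "Paris" with skip_first=True, indices 1..4 ("aris") remain editable.
start, last = valid_range("Paris", skip_first=True)
```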

textflint.generation_layer.transformation.NER.ent_typos.insert(word, num=1, skip_first=False, skip_last=False)[source]

Perturb the word with 1 random character inserted.

Parameters
  • word (str) – target word

  • num (int) – number of typos to add

  • skip_first (bool) – whether to skip inserting a char at the beginning of the word

  • skip_last (bool) – whether to skip inserting a char at the end of the word

Returns

perturbed strings
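A standalone sketch of random insertion follows. The name insert_chars and the rng parameter are hypothetical, and drawing inserted characters from lowercase ASCII letters is an assumption.

```python
import random
import string


def insert_chars(word, num=1, skip_first=False, skip_last=False, rng=random):
    """Insert `num` random lowercase letters into `word` (illustrative sketch)."""
    chars = list(word)
    for _ in range(num):
        # Insertion slots run from 0 (before the first char) to len(chars)
        # (after the last char); the skip flags remove the outer slots.
        lo = 1 if skip_first else 0
        hi = len(chars) - (1 if skip_last else 0)
        if lo > hi:
            break  # no legal slot left
        chars.insert(rng.randint(lo, hi), rng.choice(string.ascii_lowercase))
    return "".join(chars)


# Both ends are protected, so the result still starts with "T" and ends with "o".
typo = insert_chars("Tokyo", num=2, skip_first=True, skip_last=True)
```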

textflint.generation_layer.transformation.NER.ent_typos.replace(word, num=1, skip_first=False, skip_last=False)[source]

Perturb the word with 1 letter replaced by a random letter.

Parameters
  • word (str) – target word

  • num (int) – number of typos to add

  • skip_first (bool) – whether to skip replacing the char at the beginning of the word

  • skip_last (bool) – whether to skip replacing the char at the end of the word

Returns

perturbed strings
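Substitution can be sketched in the same spirit. The name replace_chars and the rng parameter are hypothetical; preserving case and never substituting a letter with itself are design choices of this sketch, not documented behavior.

```python
import random
import string


def replace_chars(word, num=1, skip_first=False, skip_last=False, rng=random):
    """Replace `num` random characters of `word` with random letters (sketch)."""
    start = 1 if skip_first else 0
    end = len(word) - (1 if skip_last else 0)  # exclusive upper bound
    positions = range(start, end)
    # Assumption: leave the word untouched if too few positions are eligible.
    if len(positions) < num:
        return word
    chars = list(word)
    for i in rng.sample(positions, num):
        pool = string.ascii_uppercase if chars[i].isupper() else string.ascii_lowercase
        # Exclude the current character so every replacement is a real change.
        chars[i] = rng.choice([c for c in pool if c != chars[i]])
    return "".join(chars)


typo = replace_chars("Berlin", num=2, skip_first=True)
```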

textflint.generation_layer.transformation.NER.ent_typos.swap(word, num=1, skip_first=False, skip_last=False)[source]

Swaps random characters with their neighbors.

Parameters
  • word (str) – target word

  • num (int) – number of typos to add

  • skip_first (bool) – whether to exclude the first char of the word from swapping

  • skip_last (bool) – whether to exclude the last char of the word from swapping

Returns

perturbed strings
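A neighbor swap can be sketched as below. The name swap_neighbors and the rng parameter are hypothetical; each chosen position is exchanged with its right-hand neighbor, and the word is returned unchanged when too few positions are eligible (an assumption).

```python
import random


def swap_neighbors(word, num=1, skip_first=False, skip_last=False, rng=random):
    """Swap `num` random characters with their right neighbors (sketch)."""
    start = 1 if skip_first else 0
    last = len(word) - 1 - (1 if skip_last else 0)  # last editable index
    # Position i is swapped with i + 1, so i must stay strictly below `last`.
    left_positions = range(start, last)
    if len(left_positions) < num:
        return word
    chars = list(word)
    for i in rng.sample(left_positions, num):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


# Only the inner letters "adri" can move; "M" and the final "d" stay in place.
typo = swap_neighbors("Madrid", num=1, skip_first=True, skip_last=True)
```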

textflint.generation_layer.transformation.NER.ent_typos.trade_off_sub_words(sub_words, sub_indices, trans_num=None, n=1)[source]

Select candidate words so as to maximize the number of transformation results. Words are chosen by the number of substitutes they have, keeping the top n.

Parameters
  • sub_words (list) – list of substitutes word of each legal word

  • sub_indices (list) – list of indices of each legal word

  • trans_num (int) – max number of words to apply substitution

  • n (int) –

Returns

sub_words after alignment, and the indices of sub_words
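The selection idea can be sketched roughly as follows. The name pick_richest is hypothetical, and the interpretation is an assumption: words with the most substitutes are preferred, at most trans_num words are kept, and each candidate list is capped at n entries. The library's actual alignment step may differ.

```python
def pick_richest(sub_words, sub_indices, trans_num=None, n=1):
    """Keep the words with the most substitutes (illustrative sketch)."""
    if trans_num is None:
        trans_num = len(sub_words)
    # Rank (candidates, index) pairs by how many candidates each word has.
    ranked = sorted(
        zip(sub_words, sub_indices), key=lambda pair: len(pair[0]), reverse=True
    )[:trans_num]
    # Assumption: "alignment" is approximated by capping every list at n.
    words = [candidates[:n] for candidates, _ in ranked]
    indices = [index for _, index in ranked]
    return words, indices


words, indices = pick_richest(
    [["a"], ["b", "c", "d"], ["e", "f"]], [0, 3, 5], trans_num=2, n=2
)
# words == [["b", "c"], ["e", "f"]] and indices == [3, 5]
```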