textflint.generation_layer.transformation.UT.typos
Typos transformation that simulates character-level typing errors.
-
class
textflint.generation_layer.transformation.UT.typos.Typos(trans_min=1, trans_max=10, trans_p=0.3, stop_words=None, mode='random', skip_first_char=True, skip_last_char=True, **kwargs)
Bases: textflint.generation_layer.transformation.word_substitute.WordSubstitute
Transformation that simulates typos to perturb a sentence.
https://arxiv.org/pdf/1711.02173.pdf
-
__init__(trans_min=1, trans_max=10, trans_p=0.3, stop_words=None, mode='random', skip_first_char=True, skip_last_char=True, **kwargs)
- Parameters
trans_min (int) – Minimum number of characters to augment.
trans_max (int) – Maximum number of characters to augment. If None is passed, the number of augmentations is calculated from trans_p. If the result calculated from trans_p is smaller than trans_max, the calculated result is used; otherwise trans_max is used.
trans_p (float) – Fraction of characters (per token) to augment.
stop_words (list) – List of words to skip during augmentation.
mode (str) – Typo mode; one of ['random', 'replace', 'swap', 'insert', 'delete'].
skip_first_char (bool) – Whether to skip the first character of the target word.
skip_last_char (bool) – Whether to skip the last character of the target word.
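The way trans_min, trans_max and trans_p bound the number of changed characters can be sketched as below. This is a minimal illustration and not TextFlint's own code: the helper name `num_chars_to_change`, the truncating rounding, and the final cap at the word length are all assumptions.

```python
def num_chars_to_change(size, trans_min=1, trans_max=10, trans_p=0.3):
    """Hypothetical helper: how many of `size` characters to perturb."""
    if size <= 0:
        return 0
    # Share of characters requested by trans_p (truncation is an assumption).
    count = int(size * trans_p)
    # Never go below trans_min ...
    count = max(trans_min, count)
    # ... and cap at trans_max unless it is None.
    if trans_max is not None:
        count = min(count, trans_max)
    # A word cannot lose or change more characters than it has.
    return min(count, size)


for length in (3, 10, 100):
    print(length, num_chars_to_change(length))
```

With the defaults, short words fall back to trans_min and very long words are capped by trans_max.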
-
-
class
textflint.generation_layer.transformation.UT.typos.WordSubstitute(trans_min=1, trans_max=10, trans_p=0.1, stop_words=None, **kwargs)
Bases: textflint.generation_layer.transformation.transformation.Transformation
Word replacement transformation that implements the common word replacement functions.
-
__init__(trans_min=1, trans_max=10, trans_p=0.1, stop_words=None, **kwargs)
- Parameters
trans_min (int) – Minimum number of words to augment.
trans_max (int) – Maximum number of words to augment. If None is passed, the number of augmentations is calculated from trans_p. If the result calculated from trans_p is smaller than trans_max, the calculated result is used; otherwise trans_max is used.
trans_p (float) – Fraction of words to augment.
stop_words (list) – List of words to skip during augmentation.
processor (EnProcessor) –
get_pos (bool) – Whether to pass POS tags to the _get_substitute_words API.
-
abstract
skip_aug(tokens, mask, pos=None)
Returns the indices of the replaced tokens.
- Parameters
tokens (list) – tokenized words, or pairs of word and POS tag
- Return list
the indices of the replaced tokens
-
is_stop_words(token)
Judge whether the input word belongs to the stop-word vocabulary.
- Parameters
token (str) – the input word to be judged
- Return bool
whether the token is a stop word
-
pre_skip_aug(tokens, mask)
Skip the tokens that are in the stop-word list or the punctuation list.
- Parameters
tokens (list) – the list of tokens
mask (list) – the list of masks, indicating whether each word may be substituted. ORIGIN is allowed, while TASK_MASK and MODIFIED_MASK are not.
- Return list
List of indices of tokens that may be substituted.
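The filtering described for pre_skip_aug can be illustrated with a small stand-alone sketch. It does not import TextFlint: the function name `candidate_indices`, the numeric mask values, and the case-insensitive stop-word match are assumptions made for the example.

```python
import string

# Hypothetical mask values; the real constants are defined by TextFlint.
ORIGIN, TASK_MASK, MODIFIED_MASK = 0, 1, 2


def candidate_indices(tokens, mask, stop_words=()):
    """Return indices of tokens that may still be substituted."""
    stop = {w.lower() for w in stop_words}
    keep = []
    for i, (token, flag) in enumerate(zip(tokens, mask)):
        if flag != ORIGIN:
            continue  # protected by the task or already modified
        if token.lower() in stop:
            continue  # stop word (case-insensitive match is an assumption)
        if all(ch in string.punctuation for ch in token):
            continue  # punctuation-only token
        keep.append(i)
    return keep


tokens = ["The", "cat", ",", "sat"]
mask = [ORIGIN, ORIGIN, ORIGIN, TASK_MASK]
print(candidate_indices(tokens, mask, stop_words=["the"]))
```

Only "cat" survives here: "The" is a stop word, "," is punctuation, and "sat" is masked.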
-
-
textflint.generation_layer.transformation.UT.typos.delete(word, num=1, skip_first=False, skip_last=False)
Perturb the word with 1 letter deleted.
- Parameters
word (str) – target word
num (int) – number of typos to add
skip_first (bool) – whether to skip the char at the beginning of the word when deleting
skip_last (bool) – whether to skip the char at the end of the word when deleting
- Returns
perturbed strings
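A deletion typo of this kind can be sketched as follows. This is an independent illustration, not the library function: the name `delete_chars` and the extra `rng` parameter (used to make runs reproducible) are assumptions, and it returns a single string.

```python
import random


def delete_chars(word, num=1, skip_first=False, skip_last=False, rng=None):
    """Hypothetical sketch: drop up to `num` characters from `word`."""
    rng = rng or random.Random()
    chars = list(word)
    for _ in range(num):
        start = 1 if skip_first else 0
        end = len(chars) - (1 if skip_last else 0)  # exclusive bound
        if end - start < 1:
            break  # no character left that may be deleted
        del chars[rng.randrange(start, end)]
    return "".join(chars)


print(delete_chars("typos", skip_first=True, skip_last=True, rng=random.Random(0)))
```

The bounds are recomputed on every pass because the word shrinks after each deletion.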
-
textflint.generation_layer.transformation.UT.typos.get_random_letter(src_char=None)
Get a replacement letter that matches the format of src_char.
- Parameters
src_char (char) –
- Returns
a lowercase letter by default
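One plausible reading of "format" is the character class of src_char. The sketch below assumes that interpretation (digit, uppercase, otherwise lowercase); the name `random_letter_like` and the `rng` parameter are hypothetical and not part of TextFlint.

```python
import random
import string


def random_letter_like(src_char=None, rng=None):
    """Hypothetical sketch: random character in the same class as src_char."""
    rng = rng or random.Random()
    if src_char is not None and src_char.isdigit():
        return rng.choice(string.digits)           # digit stays a digit
    if src_char is not None and src_char.isupper():
        return rng.choice(string.ascii_uppercase)  # keep upper case
    return rng.choice(string.ascii_lowercase)      # default: lowercase letter


print(random_letter_like("A"), random_letter_like("7"), random_letter_like())
```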
-
textflint.generation_layer.transformation.UT.typos.get_start_end(word, skip_first=False, skip_last=False)
Get the valid operation range of one word.
- Parameters
word (str) – target word string
skip_first (bool) – whether to skip the first char
skip_last (bool) – whether to skip the last char
- Returns
start index, last index
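The valid range can be computed as in the sketch below. It assumes the returned last index is inclusive, which the documentation does not state; the name `start_end` is hypothetical.

```python
def start_end(word, skip_first=False, skip_last=False):
    """Hypothetical sketch: inclusive (start, last) indices open to edits."""
    start = 1 if skip_first else 0
    last = len(word) - 1 - (1 if skip_last else 0)
    return start, last


start, last = start_end("hello", skip_first=True, skip_last=True)
print(start, last)
print("hello"[start:last + 1])  # the editable slice
```

When start exceeds last (for example a one-letter word with both flags set), no position is available and a caller would leave the word unchanged.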
-
textflint.generation_layer.transformation.UT.typos.insert(word, num=1, skip_first=False, skip_last=False)
Perturb the word with 1 random character inserted.
- Parameters
word (str) – target word
num (int) – number of typos to add
skip_first (bool) – whether to skip inserting a char at the beginning of the word
skip_last (bool) – whether to skip inserting a char at the end of the word
- Returns
perturbed strings
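An insertion typo can be sketched as below. This is not the library implementation: the name `insert_chars`, the `rng` parameter, and the choice of lowercase ASCII for the inserted character are assumptions.

```python
import random
import string


def insert_chars(word, num=1, skip_first=False, skip_last=False, rng=None):
    """Hypothetical sketch: insert `num` random lowercase letters."""
    rng = rng or random.Random()
    chars = list(word)
    for _ in range(num):
        # Slots run from 0 (before the first char) to len(chars) (after the last).
        start = 1 if skip_first else 0
        end = len(chars) - (1 if skip_last else 0)  # inclusive slot index
        if end < start:
            break  # no slot is allowed
        chars.insert(rng.randint(start, end), rng.choice(string.ascii_lowercase))
    return "".join(chars)


print(insert_chars("word", num=2, skip_first=True, skip_last=True, rng=random.Random(1)))
```

Excluding slot 0 keeps the original first character in place, and excluding the final slot keeps the original last character in place.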
-
textflint.generation_layer.transformation.UT.typos.replace(word, num=1, skip_first=False, skip_last=False)
Perturb the word with 1 letter substituted for a random letter.
- Parameters
word (str) – target word
num (int) – number of typos to add
skip_first (bool) – whether to skip replacing the char at the beginning of the word
skip_last (bool) – whether to skip replacing the char at the end of the word
- Returns
perturbed strings
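A substitution typo might look like the sketch below. The name `replace_chars`, the `rng` parameter, the use of distinct positions, and the rule that the new letter keeps the case of the old one are assumptions for illustration only.

```python
import random
import string


def replace_chars(word, num=1, skip_first=False, skip_last=False, rng=None):
    """Hypothetical sketch: replace up to `num` characters with random letters."""
    rng = rng or random.Random()
    chars = list(word)
    start = 1 if skip_first else 0
    end = len(chars) - (1 if skip_last else 0)  # exclusive bound
    positions = list(range(start, end))
    # Distinct positions, so each replacement hits a different character.
    for pos in rng.sample(positions, min(num, len(positions))):
        pool = string.ascii_uppercase if chars[pos].isupper() else string.ascii_lowercase
        # Exclude the original so the character really changes.
        chars[pos] = rng.choice([c for c in pool if c != chars[pos]])
    return "".join(chars)


print(replace_chars("Hello", num=2, skip_first=True, skip_last=True, rng=random.Random(2)))
```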
-
textflint.generation_layer.transformation.UT.typos.swap(word, num=1, skip_first=False, skip_last=False)
Swaps random characters with their neighbors.
- Parameters
word (str) – target word
num (int) – number of typos to add
skip_first (bool) – whether to skip swapping the first char of the word
skip_last (bool) – whether to skip swapping the last char of the word
- Returns
perturbed strings
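Neighbor swapping can be sketched as follows. The name `swap_chars` and the `rng` parameter are hypothetical, and swapping each chosen character with its right-hand neighbor is an assumption about what "neighbors" means here.

```python
import random


def swap_chars(word, num=1, skip_first=False, skip_last=False, rng=None):
    """Hypothetical sketch: swap `num` random characters with their right neighbor."""
    rng = rng or random.Random()
    chars = list(word)
    start = 1 if skip_first else 0
    end = len(chars) - (1 if skip_last else 0)  # exclusive bound
    if end - start < 2:
        return word  # need at least two swappable characters
    for _ in range(num):
        i = rng.randrange(start, end - 1)  # i + 1 stays inside the range
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


print(swap_chars("typos", skip_first=True, skip_last=True, rng=random.Random(3)))
```

A swap only reorders characters, so the result is always an anagram of the input with the protected first and last characters left in place.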