textflint.generation_layer.transformation.POS.multi_pos_swap

SwapMultiPOS transformation for POS tagging

class textflint.generation_layer.transformation.POS.multi_pos_swap.SwapMultiPOS(treebank_tag='JJ', trans_max=2, trans_p=1, **kwargs)

Bases: textflint.generation_layer.transformation.word_substitute.WordSubstitute

Word swap transformation that replaces words which have multiple POS tags in WordNet.

__init__(treebank_tag='JJ', trans_max=2, trans_p=1, **kwargs)

Parameters

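treebank_tag – Penn Treebank POS tag; words with this tag will be replaced
trans_max – maximum number of words to replace (see WordSubstitute)
trans_p – proportion of words to replace (see WordSubstitute)
kwargs – additional keyword arguments (see WordSubstitute)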

get_candidates_from_wordnet()

Get all words in WordNet that have multiple POS tags, one of which matches treebank_tag.

Returns: list, the candidate words.
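The filtering that get_candidates_from_wordnet describes can be sketched as follows. This is a minimal illustration, not textflint's code: TOY_LEXICON stands in for WordNet, and TREEBANK_TO_WORDNET and get_multi_pos_candidates are hypothetical names. A real version might collect each word's POS codes from WordNet instead, for example with NLTK's `wordnet.synsets(word)` and each synset's `pos()` method.

```python
# Hypothetical sketch: a toy stand-in for WordNet, mapping each word to the
# set of WordNet POS codes it appears under ('n', 'v', 'a', 's', 'r').
TOY_LEXICON = {
    "fast": {"a", "s", "r", "v", "n"},
    "light": {"n", "v", "a", "s"},
    "happy": {"a", "s"},
    "run": {"n", "v"},
    "quickly": {"r"},
}

# Assumed mapping from Penn Treebank tag prefixes to WordNet POS codes.
TREEBANK_TO_WORDNET = {"JJ": {"a", "s"}, "NN": {"n"}, "VB": {"v"}, "RB": {"r"}}


def get_multi_pos_candidates(lexicon, treebank_tag):
    """Return sorted words that have several POS tags, one matching treebank_tag."""
    # "JJR", "JJS" etc. share the "JJ" prefix, so match on the first two letters.
    target = TREEBANK_TO_WORDNET[treebank_tag[:2]]
    candidates = []
    for word, pos_set in lexicon.items():
        # Adjective satellites ('s') count as the same coarse POS as 'a'.
        coarse = {"a" if p == "s" else p for p in pos_set}
        if len(coarse) > 1 and pos_set & target:
            candidates.append(word)
    return sorted(candidates)


print(get_multi_pos_candidates(TOY_LEXICON, "JJ"))  # ['fast', 'light']
```

"happy" is excluded because it is only ever an adjective, so swapping it would not produce the POS ambiguity this transformation targets.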

skip_aug(tokens, mask, pos=None)

Returns the index of the replaced tokens.

Parameters

tokens – list, tokenized words or (word, POS tag) pairs
mask – list, the mask symbol of each token
pos – list, the POS tags of the tokens

Returns

list, the indices of the tokens that can be replaced
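A rough sketch of the selection rule skip_aug describes, assuming a token is eligible when its mask is ORIGIN, its tag equals treebank_tag, and it appears in the candidate list. The function name skip_aug_sketch and the numeric mask values are placeholders, not textflint's definitions.

```python
# Hypothetical mask values; the real textflint constants may differ.
ORIGIN, TASK_MASK, MODIFIED_MASK = 0, 1, 2


def skip_aug_sketch(tokens, mask, pos, treebank_tag, candidates):
    """Return indices of tokens that may be swapped.

    A token qualifies when it is unmasked (ORIGIN), carries treebank_tag,
    and is one of the multi-POS candidate words.
    """
    candidate_set = set(candidates)
    return [
        i
        for i, (token, m, tag) in enumerate(zip(tokens, mask, pos))
        if m == ORIGIN and tag == treebank_tag and token in candidate_set
    ]


tokens = ["The", "fast", "car", "has", "a", "light", "frame"]
tags = ["DT", "JJ", "NN", "VBZ", "DT", "JJ", "NN"]
print(skip_aug_sketch(tokens, [ORIGIN] * 7, tags, "JJ", ["fast", "light"]))  # [1, 5]
```

Marking index 5 as MODIFIED_MASK would leave only index 1, which is how previously changed tokens are protected from a second substitution.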

class textflint.generation_layer.transformation.POS.multi_pos_swap.POSSample(data, origin=None, sample_id=None)

Bases: textflint.input_layer.component.sample.sample.Sample

POS sample class that holds the necessary information and provides atomic operations.

get_pos(field)

Get the POS tags of a text field.

Parameters: field – str, the name of the text field
Returns: list, the POS tags of the field.

check_data(data): Check the raw data format.

is_legal(): Validate whether the sample is legal.

delete_field_at_indices(field, indices): See sample.py for details.

insert_field_before_indices(field, indices, items): See sample.py for details.

insert_field_after_indices(field, indices, items): See sample.py for details.

unequal_replace_field_at_indices(field, indices, rep_items): See sample.py for details.

load(data): Parse data into sample field values.

dump(): Convert sample info to input data in JSON format.
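To illustrate what "atomic operations" means for POS data, here is a hypothetical, heavily simplified stand-in for POSSample. It assumes the data is a dict with a token list under "x" and a tag list of equal length under "y"; ToyPOSSample and this data layout are illustrative assumptions, not the documented textflint format.

```python
class ToyPOSSample:
    """Hypothetical, simplified stand-in for POSSample (not textflint's class).

    Assumes data of the form {"x": [tokens...], "y": [tags...]},
    with exactly one POS tag per token.
    """

    def __init__(self, data):
        self.load(data)

    def load(self, data):
        # Check the raw data format, then store copies of both lists.
        if len(data["x"]) != len(data["y"]):
            raise ValueError("x and y must have the same length")
        self.tokens = list(data["x"])
        self.tags = list(data["y"])

    def get_pos(self, field="x"):
        # In this sketch only the token field "x" carries POS tags.
        if field != "x":
            raise KeyError(field)
        return list(self.tags)

    def is_legal(self):
        return len(self.tokens) == len(self.tags)

    def delete_field_at_indices(self, field, indices):
        # Atomic operation: remove each token together with its tag,
        # so the two lists never fall out of alignment.
        if field != "x":
            raise KeyError(field)
        drop = set(indices)
        self.tokens = [t for i, t in enumerate(self.tokens) if i not in drop]
        self.tags = [t for i, t in enumerate(self.tags) if i not in drop]

    def dump(self):
        return {"x": list(self.tokens), "y": list(self.tags)}


sample = ToyPOSSample({"x": ["a", "very", "fast", "car"], "y": ["DT", "RB", "JJ", "NN"]})
sample.delete_field_at_indices("x", [1])
print(sample.dump())  # {'x': ['a', 'fast', 'car'], 'y': ['DT', 'JJ', 'NN']}
```

The point of the design is that callers never edit tokens and tags separately; every mutation goes through a method that updates both.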

class textflint.generation_layer.transformation.POS.multi_pos_swap.WordSubstitute(trans_min=1, trans_max=10, trans_p=0.1, stop_words=None, **kwargs)

Bases: textflint.generation_layer.transformation.transformation.Transformation

Word replacement transformation that implements common word replacement functions.

__init__(trans_min=1, trans_max=10, trans_p=0.1, stop_words=None, **kwargs)

Parameters

trans_min (int) – Minimum number of words to augment.
trans_max (int) – Maximum number of words to augment. If None is passed, the number of augmented words is calculated from trans_p. If the result calculated from trans_p is smaller than trans_max, that result is used; otherwise, trans_max is used.
trans_p (float) – Proportion of words to augment (e.g. 0.1 for 10%).
stop_words (list) – List of words to skip during augmentation.
processor (EnProcessor) –
get_pos (bool) – whether to pass POS tags to the _get_substitute_words API.

abstract skip_aug(tokens, mask, pos=None)

Returns the index of the replaced tokens.

Parameters

tokens (list) – tokenized words or (word, POS tag) pairs
mask (list) – the mask symbol of each token
pos (list) – the POS tags of the tokens

Return list: the indices of the replaced tokens

is_stop_words(token)

Check whether the input word is in the stop-word vocabulary.

Parameters: token (str) – the input word to check
Return bool: whether the word is a stop word

pre_skip_aug(tokens, mask)

Skip tokens that are in the stop-word list or the punctuation list.

Parameters

tokens (list) – the list of tokens
mask (list) – the list of mask symbols, indicating whether each word may be substituted. ORIGIN is allowed, while TASK_MASK and MODIFIED_MASK are not.

Return list

List of indices of the tokens that may be substituted.
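A minimal sketch of the filtering pre_skip_aug describes, assuming case-insensitive stop-word matching and treating any token made only of `string.punctuation` characters as punctuation. The function name pre_skip_aug_sketch and the mask values are hypothetical; textflint's own punctuation list may differ.

```python
import string

# Hypothetical mask values; the real textflint constants may differ.
ORIGIN, TASK_MASK, MODIFIED_MASK = 0, 1, 2


def pre_skip_aug_sketch(tokens, mask, stop_words):
    """Return indices of tokens that survive the mask/stop-word/punctuation filter."""
    stop = {w.lower() for w in stop_words}
    indices = []
    for i, (token, m) in enumerate(zip(tokens, mask)):
        if m != ORIGIN:
            continue  # TASK_MASK / MODIFIED_MASK tokens must not be substituted
        if token.lower() in stop:
            continue  # stop words are skipped
        if all(ch in string.punctuation for ch in token):
            continue  # punctuation-only tokens are skipped
        indices.append(i)
    return indices


tokens = ["The", "quick", "fox", ",", "it", "ran", "!"]
mask = [ORIGIN, ORIGIN, TASK_MASK, ORIGIN, ORIGIN, ORIGIN, ORIGIN]
print(pre_skip_aug_sketch(tokens, mask, ["the", "it"]))  # [1, 5]
```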

get_trans_cnt(size)

Get the number of words/chars to transform.

Parameters: size (int) – the size of the target sentence
Return int: the number of words to transform.
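One way get_trans_cnt could combine the trans_min, trans_max and trans_p parameters described for WordSubstitute.__init__: compute ceil(trans_p * size) and clamp it into [trans_min, trans_max]. The rounding mode and the name get_trans_cnt_sketch are assumptions; the docstring does not say whether the count is rounded up or down.

```python
import math


def get_trans_cnt_sketch(size, trans_min=1, trans_max=10, trans_p=0.1):
    """Number of words to transform in a sentence of `size` tokens.

    Assumed rule: take ceil(trans_p * size), then clamp it to
    [trans_min, trans_max]; trans_max=None means no upper bound.
    """
    cnt = math.ceil(trans_p * size)
    if trans_min is not None and cnt < trans_min:
        cnt = trans_min
    if trans_max is not None and cnt > trans_max:
        cnt = trans_max
    return cnt


print(get_trans_cnt_sketch(25))  # 3
```

Under this rule, SwapMultiPOS's defaults (trans_p=1, trans_max=2) would select every eligible word but never more than two per sentence.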