textflint.generation_layer.transformation.POS.prefix_swap
SwapPrefix transformation for POS tagging
class textflint.generation_layer.transformation.POS.prefix_swap.SwapPrefix(trans_max=2, trans_p=1, **kwargs)
Bases: textflint.generation_layer.transformation.word_substitute.WordSubstitute
Swap prefix and keep the same POS tags.
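As a rough illustration of the idea only, and not textflint's actual implementation, the sketch below replaces a word's prefix with another prefix when the resulting word carries the same POS tag. The names `PREFIXES`, `VOCAB`, and `swap_prefix` are hypothetical and exist only for this example; the toy vocabulary stands in for whatever resources the real class uses.

```python
# Toy sketch of prefix swapping; every name here is hypothetical.
PREFIXES = ("un", "re", "dis", "pre")

# Hypothetical vocabulary mapping each known word to its POS tag.
VOCAB = {
    "unlock": "VB", "relock": "VB",
    "unable": "JJ", "disable": "VB",
    "preview": "NN", "review": "NN",
}


def swap_prefix(word, tag):
    """Return known words that differ from `word` only in prefix and share `tag`."""
    candidates = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        stem = word[len(prefix):]
        for other in PREFIXES:
            candidate = other + stem
            # Keep the candidate only if it is a known word with the same tag.
            if other != prefix and VOCAB.get(candidate) == tag:
                candidates.append(candidate)
    return candidates


tokens = ["please", "unlock", "the", "preview"]
tags = ["UH", "VB", "DT", "NN"]
# Fall back to the original token when no same-tag candidate exists.
swapped = [(swap_prefix(t, p) or [t])[0] for t, p in zip(tokens, tags)]
```

Here `swapped` keeps the tag sequence valid because a candidate is accepted only when its tag matches; "unable" (JJ) is not turned into "disable" (VB).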
class textflint.generation_layer.transformation.POS.prefix_swap.Counter(**kwds)
Bases: dict
Dict subclass for counting hashable items. Sometimes called a bag or multiset. Elements are stored as dictionary keys and their counts are stored as dictionary values.
>>> c = Counter('abcdeabcdabcaba')  # count elements from a string

>>> c.most_common(3)                # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c)                       # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements()))   # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values())                 # total of all counts
15

>>> c['a']                          # count of letter 'a'
5
>>> for elem in 'shazam':           # update counts from an iterable
...     c[elem] += 1                # by adding 1 to each element's count
>>> c['a']                          # now there are seven 'a'
7
>>> del c['b']                      # remove all 'b'
>>> c['b']                          # now there are zero 'b'
0

>>> d = Counter('simsalabim')       # make another counter
>>> c.update(d)                     # add in the second counter
>>> c['a']                          # now there are nine 'a'
9

>>> c.clear()                       # empty the counter
>>> c
Counter()

Note: If a count is set to zero or reduced to zero, it will remain in the counter until the entry is deleted or the counter is cleared:

>>> c = Counter('aaabbc')
>>> c['b'] -= 2                     # reduce the count of 'b' by two
>>> c.most_common()                 # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]
__init__(**kwds)
Create a new, empty Counter object. If an input iterable is given, count its elements; alternatively, initialize the counts from another mapping of elements to their counts.

>>> c = Counter()                           # a new, empty counter
>>> c = Counter('gallahad')                 # a new counter from an iterable
>>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
>>> c = Counter(a=4, b=2)                   # a new counter from keyword args
most_common(n=None)
List the n most common elements and their counts from the most common to the least. If n is None, then list all element counts.

>>> Counter('abcdeabcdabcaba').most_common(3)
[('a', 5), ('b', 4), ('c', 3)]
elements()
Iterator over elements repeating each as many times as its count.

>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']

# Knuth's example for prime factors of 1836: 2**2 * 3**3 * 17**1
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> product = 1
>>> for factor in prime_factors.elements():     # loop over factors
...     product *= factor                       # and multiply them
>>> product
1836

Note, if an element's count has been set to zero or is a negative number, elements() will ignore it.
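The note about zero and negative counts can be checked with a short example. This sketch assumes the class documented here behaves like the standard library's `collections.Counter`, which its docstring matches; the variable names are illustrative only.

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1)

# Keys with zero or negative counts are still stored in the mapping...
stored_keys = sorted(c)

# ...but elements() yields only keys whose count is positive.
expanded = sorted(c.elements())
```

So `stored_keys` still lists all three keys, while `expanded` contains only the two copies of 'a'.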
update(**kwds)
Like dict.update() but add counts instead of replacing them.
Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.update('witch')           # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d)                 # add elements from another counter
>>> c['h']                      # four 'h' in which, witch, and watch
4
subtract(**kwds)
Like dict.update() but subtracts counts instead of replacing them. Counts can be reduced below zero. Both the inputs and outputs are allowed to contain zero and negative counts.
Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.subtract('witch')             # subtract elements from another iterable
>>> c.subtract(Counter('watch'))    # subtract elements from another counter
>>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
-1
class textflint.generation_layer.transformation.POS.prefix_swap.POSSample(data, origin=None, sample_id=None)
Bases: textflint.input_layer.component.sample.sample.Sample
POS Sample class to hold the necessary info and provide atomic operations.
class textflint.generation_layer.transformation.POS.prefix_swap.WordSubstitute(trans_min=1, trans_max=10, trans_p=0.1, stop_words=None, **kwargs)
Bases: textflint.generation_layer.transformation.transformation.Transformation
Word replacement transformation that implements the common word-substitution functions.
__init__(trans_min=1, trans_max=10, trans_p=0.1, stop_words=None, **kwargs)
- Parameters
trans_min (int) – Minimum number of words to be augmented.
trans_max (int) – Maximum number of words to be augmented. If None is passed, the number of augmentations is calculated via aup_char_p. If the result calculated from aug_p is smaller than aug_max, the calculated result from aup_char_p is used; otherwise aug_max is used.
trans_p (float) – Percentage of words to be augmented.
stop_words (list) – List of words to be skipped by the augment operation.
processor (EnProcessor) –
get_pos (bool) – Whether to pass POS tags to the _get_substitute_words API.
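One plausible reading of how trans_min, trans_max, and trans_p interact is a proportion-based count clamped between the two bounds. The helper below is a sketch under that assumption; `substitution_count` is a hypothetical name, not a textflint function, and the final cap at the sentence length is an addition made for this example.

```python
def substitution_count(n_tokens, trans_min=1, trans_max=10, trans_p=0.1):
    """Hypothetical helper: how many of `n_tokens` words to substitute."""
    count = int(n_tokens * trans_p)   # proportion-based estimate
    count = max(count, trans_min)     # never go below the minimum
    if trans_max is not None:         # None is read here as "no upper cap"
        count = min(count, trans_max)
    return min(count, n_tokens)       # cannot replace more words than exist


default_short = substitution_count(5)
default_long = substitution_count(200)
uncapped_long = substitution_count(200, trans_max=None)
# Mirrors the SwapPrefix defaults shown above (trans_max=2, trans_p=1).
swap_prefix_like = substitution_count(6, trans_max=2, trans_p=1)
```

Under this reading, a short sentence always gets at least trans_min substitutions, and a long one is capped at trans_max unless the cap is disabled.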
abstract skip_aug(tokens, mask, pos=None)
Returns the indices of the tokens to be replaced.
- Parameters
tokens (list) – tokenized words, or pairs of word and POS tag
- Return list
the indices of the tokens to be replaced
is_stop_words(token)
Check whether the input word belongs to the stop-word vocabulary.
- Parameters
token (str) – the input word to be checked
- Return bool
whether the word is a stop word
pre_skip_aug(tokens, mask)
Skip the tokens that are in the stop-word list or the punctuation list.
- Parameters
tokens (list) – the list of tokens
mask (list) – the list of masks indicating whether each word is allowed to be substituted. ORIGIN is allowed, while TASK_MASK and MODIFIED_MASK are not.
- Return list
List of indices of tokens that may be substituted.
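The filtering described for pre_skip_aug can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the numeric mask values, the `STOP_WORDS` set, the case-insensitive stop-word check, and the helper names are all hypothetical.

```python
import string

# Hypothetical mask values; the real constants live in textflint and may differ.
ORIGIN, TASK_MASK, MODIFIED_MASK = 0, 1, 2

STOP_WORDS = {"the", "a", "is", "of"}  # hypothetical stop-word list


def is_stop_word(token):
    """Case-insensitive stop-word check (an assumption of this sketch)."""
    return token.lower() in STOP_WORDS


def is_punctuation(token):
    """True when every character of the token is ASCII punctuation."""
    return all(ch in string.punctuation for ch in token)


def candidate_indices(tokens, mask):
    """Return indices of tokens that may be substituted."""
    return [
        i for i, (token, flag) in enumerate(zip(tokens, mask))
        if flag == ORIGIN
        and not is_stop_word(token)
        and not is_punctuation(token)
    ]


tokens = ["The", "cat", "is", "unhappy", "."]
mask = [ORIGIN, TASK_MASK, ORIGIN, ORIGIN, ORIGIN]
indices = candidate_indices(tokens, mask)
```

Only "unhappy" survives here: "The" and "is" are stop words, "cat" is masked, and "." is punctuation.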
textflint.generation_layer.transformation.POS.prefix_swap.download_if_needed(folder_name)
The folder is stored at .cache/textflint/[folder_name]. If it does not exist on disk, the zip file is downloaded and extracted.
- Parameters
folder_name (str) – path to folder or file in cache
- Returns
path to the downloaded folder or file on disk
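The check-then-download behaviour can be sketched without any network access. In the example below, `ensure_cached`, `cache_root`, and `fetch_zip` are hypothetical names: `cache_root` stands in for the .cache/textflint directory, and `fetch_zip` is a stand-in callable that returns the bytes of a zip archive instead of performing the real download.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def ensure_cached(folder_name, cache_root, fetch_zip):
    """Return the cached path for `folder_name`, fetching it on a cache miss."""
    target = Path(cache_root) / folder_name
    if not target.exists():
        target.mkdir(parents=True)
        # Extract the fetched archive into the new cache folder.
        with zipfile.ZipFile(io.BytesIO(fetch_zip(folder_name))) as archive:
            archive.extractall(target)
    return target


def fake_fetch(name):
    """Build a tiny in-memory zip and count how often it is requested."""
    fake_fetch.calls += 1
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("vocab.txt", "unlock\nrelock\n")
    return buffer.getvalue()


fake_fetch.calls = 0

with tempfile.TemporaryDirectory() as root:
    first = ensure_cached("POS_DATA", root, fake_fetch)
    second = ensure_cached("POS_DATA", root, fake_fetch)  # served from cache
    content = (second / "vocab.txt").read_text()
    download_count = fake_fetch.calls
```

The second call returns the same path without invoking the fetcher again, which is the point of the "if needed" check.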