textflint.generation_layer.transformation.COREF.random_repeat
- Coref - Rnd repeat: Randomly choose some sentences, and each of them
will be repeated somewhere else in the sample.
-
class
textflint.generation_layer.transformation.COREF.random_repeat.RndRepeat(trans_p=0.2, **kwargs)
Bases: textflint.generation_layer.transformation.transformation.Transformation
Randomly choose trans_p * num_sentences sentences; each of them is repeated somewhere else in the sample.
- Attributes:
trans_p: proportion of sentences to repeat; default 0.2
processor: textflint.common.preprocess.TextProcessor
Example:
ori: {
    'sentences': [
        ['I', 'came'], ['I', 'saw'], ['I', 'conquered'],
        ['Anna', 'bel', 'wanna', 'sleep'], ['Anna', 'bel', 'is', 'happy']],
    'clusters': [
        [[1, 1], [3, 3], [5, 5]],
        [[7, 8], [11, 12]]]}
trans: {
    'sentences': [
        ['I', 'came'], ['I', 'saw'], ['Anna', 'bel', 'wanna', 'sleep'],
        ['I', 'conquered'], ['Anna', 'bel', 'wanna', 'sleep'],
        ['Anna', 'bel', 'is', 'happy']],
    'clusters': [
        [[1, 1], [3, 3], [9, 9]],
        [[5, 6], [11, 12], [17, 18]]]}
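The selection step described above can be sketched on a plain list of sentences. This is a toy illustration, not the RndRepeat implementation: the helper name repeat_sentences, the ceil rounding of trans_p * num_sentences, and the uniformly random insert position are all assumptions, and coreference clusters are ignored entirely.

```python
import random
from math import ceil


def repeat_sentences(sentences, trans_p=0.2, seed=None):
    """Toy sentence repetition on a plain list of word lists.

    Hypothetical helper: it does not update coreference clusters.
    """
    rng = random.Random(seed)
    # Assumed rounding rule: repeat ceil(trans_p * num_sentences) sentences.
    num_repeats = ceil(trans_p * len(sentences))
    result = [list(sen) for sen in sentences]
    for sen in rng.sample(sentences, num_repeats):
        # Insert a copy at a random position in the growing document.
        position = rng.randrange(len(result) + 1)
        result.insert(position, list(sen))
    return result


sentences = [["I", "came"], ["I", "saw"], ["I", "conquered"],
             ["Anna", "bel", "wanna", "sleep"], ["Anna", "bel", "is", "happy"]]
repeated = repeat_sentences(sentences, trans_p=0.2, seed=0)
```

With five sentences and trans_p=0.2, one sentence is copied, so the result holds six sentences.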
-
class
textflint.generation_layer.transformation.COREF.random_repeat.CorefSample(data, origin=None, sample_id=None)
Bases: textflint.input_layer.component.sample.sample.Sample
Coref Sample
-
check_data(data) Check whether data is a conll-dict that is ready to be predicted.
- Parameters
data (None|dict) – Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.
- Returns
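A minimal sketch of the key check described above, assuming it only needs to confirm that the required keys are present. The helper name looks_like_conll_dict and the two constants are hypothetical; the real check_data may validate more than this (for example the shape of the clusters).

```python
REQUIRED_KEYS = {"sentences", "clusters"}
OPTIONAL_KEYS = {"doc_key", "speakers", "constituents", "ner"}


def looks_like_conll_dict(data):
    """Return True if `data` is a dict that carries every required key."""
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)


ok = looks_like_conll_dict({"sentences": [["I", "came"]], "clusters": []})
missing = looks_like_conll_dict({"sentences": [["I", "came"]]})
```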
-
load(data) Convert a conll-dict to CorefSample.
- Parameters
data (None|dict) – None, or a conll-style dict. Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.
- Returns
-
dump(with_check=True) Dump a CorefSample to a conll-dict.
- Parameters
with_check (bool) – whether the dumped conll-dict should be checked
- Return dict ret_dict
a conll-style dict
-
pretty_print(show='Sample:') A pretty-printer for CorefSample. Call this function to print useful sample information.
- Parameters
show (str) – optional header text printed before the sample
-
num_sentences() Return the number of sentences in this sample.
- Param
- Return int
the number of sentences in this sample
-
get_kth_sen(k) Get the k-th sentence as a word list.
- Parameters
k (int) – sentence index
- Return list
the k-th sentence, as a word list
-
eqlen_sen_map() Expand self.sen_map into one sentence index per word, e.g. generate [0, 0, 1, 1, 1, 2, 2] from self.sen_map = [2, 3, 2].
- Param
- Return list
sentence mapping with the same length as x, like [0, 0, 1, 1, 1, 2, 2]
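The expansion above can be written as a single comprehension. This is a standalone sketch on a plain list; the function name expand_sen_map is hypothetical and takes sen_map as an argument instead of reading it from the sample.

```python
def expand_sen_map(sen_map):
    """Repeat each sentence index once per word in that sentence."""
    return [sen_idx
            for sen_idx, length in enumerate(sen_map)
            for _ in range(length)]


expanded = expand_sen_map([2, 3, 2])
```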
-
index_in_sen(idx) For the given word index idx, determine which sentence it is in.
- Parameters
idx (int) – word index
- Return int
sen_idx, the index of the sentence that contains word idx
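One way to do this lookup without building the full per-word mapping is a binary search over cumulative sentence lengths. The sketch below assumes 0-based word indices and a sen_map of sentence lengths; sentence_of_word is a hypothetical name, not the library's method.

```python
from bisect import bisect_right
from itertools import accumulate


def sentence_of_word(sen_map, idx):
    """Return the index of the sentence containing word `idx` (0-based)."""
    ends = list(accumulate(sen_map))  # exclusive end offset of each sentence
    total = ends[-1] if ends else 0
    if not 0 <= idx < total:
        raise IndexError(f"word index {idx} is outside the document")
    # The first sentence whose end offset is greater than idx holds the word.
    return bisect_right(ends, idx)


owners = [sentence_of_word([2, 3, 2], i) for i in range(7)]
```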
-
static
sens2doc(sens) Given a 2-D list of str (a list of word lists), concatenate it and record the length of each sentence.
- Parameters
sens (list) – 2-D list of str (a list of word lists)
- Returns (list, list)
x as a list of str (word list), sen_map as a list of int (sentence lengths)
-
static
doc2sens(x, sen_map) Given x and sen_map, return sens. Inverse of sens2doc.
- Parameters
x (list) – list of str (word list)
sen_map (list) – list of int (sentence lengths)
- Return list
sens as a 2-D list of str (a list of word lists)
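The two conversions are inverses of each other, which a short round trip makes concrete. The functions below are a standalone sketch on plain lists with hypothetical names (sens_to_doc, doc_to_sens), not the static methods themselves.

```python
def sens_to_doc(sens):
    """Flatten a list of word lists and record each sentence length."""
    x = [word for sen in sens for word in sen]
    sen_map = [len(sen) for sen in sens]
    return x, sen_map


def doc_to_sens(x, sen_map):
    """Cut the flat word list back into sentences of the recorded lengths."""
    sens, start = [], 0
    for length in sen_map:
        sens.append(x[start:start + length])
        start += length
    return sens


sens = [["I", "came"], ["I", "saw"], ["I", "conquered"]]
x, sen_map = sens_to_doc(sens)
restored = doc_to_sens(x, sen_map)
```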
-
insert_field_before_indices(field, indices, items) Insert items of given scopes before indices of field value simultaneously.
- Parameters
field (str) – transformed field
indices (list) – indices of insert positions
items (list) – insert items
- Return ~textflint.CorefSample
modified sample
-
insert_field_after_indices(field, indices, items) Insert items of given scopes after indices of field value simultaneously.
- Parameters
field (str) – transformed field
indices (list) – indices of insert positions
items (list) – insert items
- Return ~textflint.CorefSample
modified sample
-
delete_field_at_indices(field, indices) Delete items of given scopes of field value.
- Parameters
field (str) – transformed field
indices (list) – indices of delete positions
- Return ~textflint.CorefSample
modified sample
-
replace_field_at_indices(field, indices, items) Replace the items of field value at the given scopes with items.
- Parameters
field (str) – transformed field
indices (list) – indices of the positions to replace
items (list) – replacement items
- Return ~textflint.CorefSample
modified sample
-
static
concat_conlls(*args) Given several CorefSamples, concatenate the values key by key.
- Param
Some CorefSamples
- Return ~textflint.input_layer.component.sample.CorefSample
A CorefSample in which the docs are concatenated to form one x
-
shuffle_conll(sen_idxs) Given a CorefSample and shuffled sentence indexes, reproduce a CorefSample with respect to the indexes.
- Parameters
sen_idxs (list) – a list of ints: the sentence indexes in shuffled order. sen_idxs is expected to look like [1, 3, 0, 4, 2, 5] when sen_num = 6
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefSample with respect to the shuffled index
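Reordering sentences also means shifting every cluster span by the distance its sentence moved. The sketch below illustrates that bookkeeping on plain lists. It assumes spans are [start, end] word indices that are 0-based and inclusive and that no span crosses a sentence boundary; shuffle_sentences is a hypothetical helper, and the library's own index convention may differ.

```python
from itertools import accumulate


def shuffle_sentences(sentences, clusters, sen_idxs):
    """Reorder sentences and shift cluster spans to match the new order."""
    lengths = [len(sen) for sen in sentences]
    # Start offset of each sentence in the original document.
    old_starts = [0] + list(accumulate(lengths))[:-1]
    # Start offset of each original sentence in the shuffled document.
    new_starts, offset = {}, 0
    for old_idx in sen_idxs:
        new_starts[old_idx] = offset
        offset += lengths[old_idx]

    def owner(word_idx):
        # Last sentence that starts at or before the word.
        return max(i for i, start in enumerate(old_starts) if start <= word_idx)

    new_clusters = []
    for cluster in clusters:
        new_cluster = []
        for start, end in cluster:
            sen = owner(start)
            shift = new_starts[sen] - old_starts[sen]
            new_cluster.append([start + shift, end + shift])
        new_clusters.append(new_cluster)
    return [sentences[i] for i in sen_idxs], new_clusters


sentences = [["Anna", "sleeps"], ["She", "is", "happy"]]
clusters = [[[0, 0], [2, 2]]]  # "Anna" and "She"
new_sentences, new_clusters = shuffle_sentences(sentences, clusters, [1, 0])
```

After the swap, "She" sits at word 0 and "Anna" at word 3, so the cluster becomes [[3, 3], [0, 0]]; this sketch does not re-sort spans within a cluster.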
-
part_conll(pres_idxs) Only sentences with the given indexes will be kept, and all the structures of clusters are kept for convenience of concat.
- Parameters
pres_idxs (list) – a list of ints: the indexes of the sentences to preserve. pres_idxs is expected to be drawn from [0, num_sen) in ascending order, like [0, 1, 3, 5] when num_sen = 6
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
-
part_before_conll(sen_idx) Only sentences [0, sen_idx) will be kept, and all the structures of clusters are kept for convenience of concat.
- Parameters
sen_idx (int) – sentences with idx < sen_idx will be preserved
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
-
part_after_conll(sen_idx) Only sentences [sen_idx:] will be kept, and all the structures of clusters are kept for convenience of concat.
- Parameters
sen_idx (int) – sentences with idx >= sen_idx will be preserved
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
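The sentence ranges kept by part_conll, part_before_conll and part_after_conll can be illustrated with list indexing. This sketch covers sentences only and says nothing about how clusters are carried along; the three helper names are hypothetical.

```python
def keep_sentences(sentences, pres_idxs):
    """Keep only the sentences whose indexes are listed, in that order."""
    return [sentences[i] for i in pres_idxs]


def sentences_before(sentences, sen_idx):
    """Keep sentences with index < sen_idx, i.e. the range [0, sen_idx)."""
    return sentences[:sen_idx]


def sentences_after(sentences, sen_idx):
    """Keep sentences with index >= sen_idx, i.e. the range [sen_idx:]."""
    return sentences[sen_idx:]


sentences = [["s0"], ["s1"], ["s2"], ["s3"], ["s4"], ["s5"]]
kept = keep_sentences(sentences, [0, 1, 3, 5])
before = sentences_before(sentences, 2)
after = sentences_after(sentences, 2)
```

Because the two ranges are complementary, concatenating before and after gives back the original sentence list.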
-
-
class
textflint.generation_layer.transformation.COREF.random_repeat.ListField(field_value, **kwargs)
Bases: textflint.input_layer.component.field.field.Field
A helper class that represents input list values that are to be modified.
Operations that modify field_value generate a new Field instance.
-
__init__(field_value, **kwargs)
- Parameters
field_value ([str]) – The list that ListField represents.
-
replace_at_indices(indices, new_items) Replace items at indices.
Notice: only isometric (same-length) replacement is supported.
- Parameters
indices (list[int|list|slice]) – each index can be an int, which replaces a single item (as a list: [1, 2, 3]); a pair like (0, 3), which replaces items from 0 up to but not including 3 (as a list: [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to indices.
- Returns
new field object.
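The index forms listed above (int, half-open pair, slice) can all be normalized to a (start, end) range before replacing. The sketch below works on a plain list and returns a new list. It is an illustration under assumptions, not the ListField code: the helper names are hypothetical, and it assumes an int index takes a single new item while a pair or slice takes a list of the same length.

```python
def to_range(index, length):
    """Normalize an int, a (start, end) pair, or a slice to (start, end)."""
    if isinstance(index, int):
        return index, index + 1
    if isinstance(index, slice):
        start, stop, _ = index.indices(length)
        return start, stop
    start, end = index
    return start, end


def replace_at_indices(values, indices, new_items):
    """Return a copy of `values` with same-length replacements applied."""
    result = list(values)
    for index, item in zip(indices, new_items):
        start, end = to_range(index, len(result))
        replacement = [item] if isinstance(index, int) else list(item)
        if len(replacement) != end - start:
            raise ValueError("only same-length replacement is supported")
        # Same-length replacement never shifts later positions.
        result[start:end] = replacement
    return result


original = ["a", "b", "c", "d"]
replaced = replace_at_indices(original, [0, (2, 4)], ["A", ["C", "D"]])
```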
-
replace_at_index(index, new_items) Replace item at index.
- Parameters
index (int|list|slice) –
can be an int, which replaces a single item (as a list: [1, 2, 3]);
a pair like (0, 3), which replaces items from 0 up to but not including 3 (as a list: [(0, 3), (5, 6)]);
or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
-
delete_at_indices(indices) Delete items at indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, which deletes a single item (as a list: [1, 2, 3]); a pair like (0, 3), which deletes items from 0 up to but not including 3 (as a list: [(0, 3), (5, 6)]); or a slice, which is converted to a list.
- Returns
new field object.
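Deleting several ranges at once is easiest when the ranges are processed from right to left, so that earlier positions are not shifted by later deletions. The sketch below shows this on a plain list; it is a hypothetical helper that handles only ints and (start, end) pairs and assumes the ranges do not overlap.

```python
def delete_at_indices(values, indices):
    """Return a copy of `values` without the items at the given indices."""
    spans = sorted(
        (i, i + 1) if isinstance(i, int) else tuple(i) for i in indices
    )
    result = list(values)
    # Delete from the right so that remaining spans stay valid.
    for start, end in reversed(spans):
        del result[start:end]
    return result


original = ["a", "b", "c", "d", "e", "f"]
remaining = delete_at_indices(original, [0, (2, 4)])
```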
-
delete_at_index(index) Delete item at index.
- Parameters
index (int|list|slice) –
can be an int, which deletes a single item (as a list: [1, 2, 3]);
a pair like (0, 3), which deletes items from 0 up to but not including 3 (as a list: [(0, 3), (5, 6)]);
or a slice, which is converted to a list.
- Returns
new field object.
-
insert_before_indices(indices, new_items) Insert items before indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, which denotes a single item (as a list: [1, 2, 3]); a pair like (0, 3), which denotes the items from 0 up to but not including 3 (as a list: [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to indices.
- Returns
new field object.
-
insert_before_index(index, new_items) Insert items before index.
- Parameters
index (int|list|slice) –
can be an int, which denotes a single item (as a list: [1, 2, 3]);
a pair like (0, 3), which denotes the items from 0 up to but not including 3 (as a list: [(0, 3), (5, 6)]);
or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
-
insert_after_indices(indices, new_items) Insert items after indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, which denotes a single item (as a list: [1, 2, 3]); a pair like (0, 3), which denotes the items from 0 up to but not including 3 (as a list: [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to indices.
- Returns
new field object.
-
insert_after_index(index, new_items) Insert items after index.
- Parameters
index (int|list|slice) –
can be an int, which denotes a single item (as a list: [1, 2, 3]);
a pair like (0, 3), which denotes the items from 0 up to but not including 3 (as a list: [(0, 3), (5, 6)]);
or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
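The insert operations can be illustrated on a plain list, again working from right to left so that each position still refers to the original items. This is a sketch under assumptions: the helpers are hypothetical, accept int indices only, and expect each entry of new_items to be a list of items to insert.

```python
def insert_before_indices(values, indices, new_items):
    """Return a copy of `values` with items inserted before each index."""
    result = list(values)
    pairs = sorted(zip(indices, new_items), key=lambda pair: pair[0],
                   reverse=True)
    # Right-to-left insertion keeps the remaining positions unshifted.
    for index, items in pairs:
        result[index:index] = items
    return result


def insert_after_indices(values, indices, new_items):
    """Inserting after index i is the same as inserting before i + 1."""
    return insert_before_indices(values, [i + 1 for i in indices], new_items)


original = ["a", "b", "c"]
with_before = insert_before_indices(original, [0, 2], [["x"], ["y", "z"]])
with_after = insert_after_indices(original, [0, 2], [["x"], ["y"]])
```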
-
-
class
textflint.generation_layer.transformation.COREF.random_repeat.Transformation(**kwargs)
Bases: abc.ABC
An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.
-
processor = <textflint.common.preprocess.en_processor.EnProcessor object>
-
transform(sample, n=1, field='x', **kwargs) Transform a data sample into a list of Samples.
- Parameters
sample (Sample) – Data sample for augmentation.
n (int) – Max number of unique augmented outputs; the signature sets the default to 1.
field (str|list) – Indicate which fields to apply transformations.
**kwargs (dict) –
other auxiliary params.
- Returns
list of Sample
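The contract described above (take one sample, return at most n transformed samples) fits a small abstract-base-class pattern. The sketch below is a toy illustration of that pattern only: ToyTransformation, its _transform hook, and AppendEachSentence are hypothetical names that operate on plain lists, and nothing here is taken from the textflint source.

```python
from abc import ABC, abstractmethod


class ToyTransformation(ABC):
    """Toy base class: subclasses propose candidates, transform() caps them."""

    def transform(self, sample, n=1, **kwargs):
        candidates = self._transform(sample, n=n, **kwargs)
        return candidates[:n]  # never return more than n outputs

    @abstractmethod
    def _transform(self, sample, n=1, **kwargs):
        """Return a list of candidate transformed samples."""


class AppendEachSentence(ToyTransformation):
    """Each candidate repeats one sentence at the end of the sample."""

    def _transform(self, sample, n=1, **kwargs):
        return [sample + [sample[i]] for i in range(len(sample))]


sample = [["a"], ["b"], ["c"]]
outputs = AppendEachSentence().transform(sample, n=2)
```

Keeping the cap on n in the base class means each subclass only has to generate candidates.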
-
-
textflint.generation_layer.transformation.COREF.random_repeat.ceil(x, /) Return the ceiling of x as an Integral.
This is the smallest integer >= x.