textflint.generation_layer.transformation.COREF.random_concat
- Coref - Rnd concat: Concatenate randomly chosen samples from
other_samples behind the given sample.
-
class textflint.generation_layer.transformation.COREF.random_concat.RndConcat(**kwargs)
Bases: textflint.generation_layer.transformation.transformation.Transformation
Concatenate one extra sample to the original sample while preserving
the coref relations within each sample.
- Attributes:
processor: textflint.common.preprocess.TextProcessor.
Example:
ori: {
  'sentences': [['I', 'came'], ['I', 'saw'], ['I', 'conquered'],
                ['Anna', 'bel', 'wanna', 'sleep'], ['Anna', 'bel', 'is', 'happy']],
  'clusters': [[[0, 0], [2, 2], [4, 4]], [[6, 7], [10, 11]]]}
trans: {
  'sentences': [['I', 'came'], ['I', 'saw'], ['I', 'conquered'],
                ['Anna', 'bel', 'wanna', 'sleep'], ['Anna', 'bel', 'is', 'happy'],
                ['who', 'is', 'this', 'boy'], ['he', 'is', 'Jotion']],
  'clusters': [[[0, 0], [2, 2], [4, 4]], [[6, 7], [10, 11]], [[16, 17], [18, 18], [20, 20]]]}
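A minimal sketch of the index bookkeeping this transformation implies, assuming cluster mentions are 0-based, inclusive [start, end] word offsets into the flattened document. The names rnd_concat and num_words and the seed parameter are hypothetical, not textflint's API. The point it shows: every mention of the appended document shifts by the word count of the original, while the original clusters stay untouched.

```python
import random
from typing import Dict, List, Optional

# Hypothetical alias for a CoNLL-style dict with 'sentences' and 'clusters'.
Conll = Dict[str, list]


def num_words(doc: Conll) -> int:
    # Total number of words across all sentences of the document.
    return sum(len(sen) for sen in doc["sentences"])


def rnd_concat(sample: Conll, other_samples: List[Conll],
               seed: Optional[int] = None) -> Conll:
    """Append one randomly chosen document from other_samples behind sample."""
    extra = random.Random(seed).choice(other_samples)
    offset = num_words(sample)
    # Mentions of the appended document now start after every word of `sample`.
    shifted = [[[start + offset, end + offset] for start, end in cluster]
               for cluster in extra["clusters"]]
    return {
        "sentences": sample["sentences"] + extra["sentences"],
        "clusters": sample["clusters"] + shifted,
    }


ori = {"sentences": [["I", "came"], ["I", "saw"]], "clusters": [[[0, 0], [2, 2]]]}
other = {"sentences": [["this", "boy"], ["he", "left"]], "clusters": [[[0, 1], [2, 2]]]}
print(rnd_concat(ori, [other], seed=0)["clusters"])
# [[[0, 0], [2, 2]], [[4, 5], [6, 6]]]
```

Keeping the appended clusters as separate entries (rather than merging them with the original ones) mirrors the idea that the two documents share no coreference links.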
-
class textflint.generation_layer.transformation.COREF.random_concat.CorefSample(data, origin=None, sample_id=None)
Bases: textflint.input_layer.component.sample.sample.Sample
Coref Sample
-
check_data(data) Check whether data is a CoNLL-style dict that is ready to be predicted.
- Parameters
data (None|dict) – Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.
- Returns
-
load(data) Convert a CoNLL-style dict to a CorefSample.
- Parameters
data (None|dict) – None, or a CoNLL-style dict. Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.
- Returns
-
dump(with_check=True) Dump a CorefSample to a CoNLL-style dict.
- Parameters
with_check (bool) – whether to check the dumped CoNLL-style dict
- Return dict ret_dict
a conll-style dict
-
pretty_print(show='Sample:') A pretty-printer for CorefSample. Call it to print useful information about the sample.
- Parameters
show (str) – optional header text printed before the sample
-
num_sentences() Return the number of sentences in this sample.
- Return int
the number of sentences in this sample
-
get_kth_sen(k) Get the k-th sentence as a word list.
- Parameters
k (int) – sentence index
- Return list
the k-th sentence, as a list of words
-
eqlen_sen_map() Generate [0, 0, 1, 1, 1, 2, 2] from self.sen_map = [2, 3, 2].
- Return list
a sentence mapping with the same length as x that gives the sentence index of each word, e.g. [0, 0, 1, 1, 1, 2, 2]
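The mapping described above can be sketched as a free function over a plain sen_map list. Writing it as a standalone function (instead of a method reading self.sen_map) is an assumption made for illustration; the name eqlen_sen_map is reused only to make the correspondence clear.

```python
from typing import List


def eqlen_sen_map(sen_map: List[int]) -> List[int]:
    # Repeat each sentence index once per word in that sentence.
    return [sen_idx for sen_idx, length in enumerate(sen_map) for _ in range(length)]


print(eqlen_sen_map([2, 3, 2]))  # [0, 0, 1, 1, 1, 2, 2]
```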
-
index_in_sen(idx) For the given word index, determine which sentence contains it.
- Parameters
idx (int) – word index
- Return int
sen_idx, the index of the sentence that contains word idx
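One way to answer this lookup without building the full per-word map is a binary search over cumulative sentence lengths. The sketch below is a hypothetical free function taking sen_map explicitly; it is not necessarily how textflint implements index_in_sen.

```python
import bisect
from itertools import accumulate
from typing import List


def index_in_sen(sen_map: List[int], idx: int) -> int:
    # Cumulative end offsets: for sen_map [2, 3, 2] this is [2, 5, 7].
    ends = list(accumulate(sen_map))
    if not ends or not 0 <= idx < ends[-1]:
        raise IndexError(f"word index {idx} out of range")
    # The first sentence whose end offset is strictly greater than idx.
    return bisect.bisect_right(ends, idx)


print([index_in_sen([2, 3, 2], i) for i in range(7)])  # [0, 0, 1, 1, 1, 2, 2]
```

Each lookup costs O(log n) in the number of sentences after the O(n) prefix sum, which only pays off if the prefix sums are cached and reused.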
-
static sens2doc(sens) Given a 2-D list of str (a list of word lists), concatenate it and record the length of each sentence.
- Parameters
sens (list) – 2-D list of str (a list of word lists)
- Returns (list, list)
x as a list of str (word list), sen_map as a list of int (sentence lengths)
-
static doc2sens(x, sen_map) Given x and sen_map, return sens. Inverse of sens2doc.
- Parameters
x (list) – list of str (word list)
sen_map (list) – list of int (sentence lengths)
- Return list
sens as a 2-D list of str (a list of word lists)
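A minimal sketch of the sens2doc / doc2sens round trip described above, written as plain functions with the same names (their bodies here are illustrative, not copied from textflint):

```python
from typing import List, Tuple


def sens2doc(sens: List[List[str]]) -> Tuple[List[str], List[int]]:
    # Flatten the sentences and remember how long each one was.
    x = [word for sen in sens for word in sen]
    sen_map = [len(sen) for sen in sens]
    return x, sen_map


def doc2sens(x: List[str], sen_map: List[int]) -> List[List[str]]:
    # Cut the flat word list back into sentences of the recorded lengths.
    sens, start = [], 0
    for length in sen_map:
        sens.append(x[start:start + length])
        start += length
    return sens


sens = [["I", "came"], ["I", "saw"], ["I", "conquered"]]
x, sen_map = sens2doc(sens)
print(x)        # ['I', 'came', 'I', 'saw', 'I', 'conquered']
print(sen_map)  # [2, 2, 2]
assert doc2sens(x, sen_map) == sens
```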
-
insert_field_before_indices(field, indices, items) Insert the given items before the given indices of the field value simultaneously.
- Parameters
field (str) – transformed field
indices (list) – indices of insert positions
items (list) – insert items
- Return ~textflint.CorefSample
modified sample
-
insert_field_after_indices(field, indices, items) Insert the given items after the given indices of the field value simultaneously.
- Parameters
field (str) – transformed field
indices (list) – indices of insert positions
items (list) – insert items
- Return ~textflint.CorefSample
modified sample
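"Simultaneously" in the two methods above means every index refers to the original field value, not to a list already shifted by earlier insertions. The hypothetical helper below shows one common way to get that behaviour on a plain word list: apply the insertions from right to left. It assumes integer, distinct indices and one list of new items per index, and it does not touch clusters, which a real CorefSample would also have to shift.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def insert_at_indices(values: Sequence[T], indices: Sequence[int],
                      items: Sequence[List[T]], after: bool = False) -> List[T]:
    """Insert items[i] before (or after) values[indices[i]].

    Every index refers to the ORIGINAL list; indices are assumed distinct.
    """
    result = list(values)
    # Work from the rightmost position leftwards so an insertion never
    # shifts a position that has not been processed yet.
    for idx, new in sorted(zip(indices, items), key=lambda pair: pair[0], reverse=True):
        pos = idx + 1 if after else idx
        result[pos:pos] = new
    return result


words = ["Anna", "is", "happy"]
print(insert_at_indices(words, [0, 2], [["Today"], ["very"]]))
# ['Today', 'Anna', 'is', 'very', 'happy']
print(insert_at_indices(words, [0, 2], [["bel"], ["!"]], after=True))
# ['Anna', 'bel', 'is', 'happy', '!']
```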
-
delete_field_at_indices(field, indices) Delete the items in the given scopes of the field value.
- Parameters
field (str) – transformed field
indices (list) – indices of delete positions
- Return ~textflint.CorefSample
modified sample
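For coreference data, deleting words is only half the job: every surviving mention offset must be rewritten. The hypothetical helper below deletes by original index in one pass and also returns the old-to-new index map that such a rewrite would need. It assumes integer indices (not [start, end] scopes) and is not textflint's implementation.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def delete_at_indices(values: Sequence[T],
                      indices: Sequence[int]) -> Tuple[List[T], Dict[int, int]]:
    """Delete values at the given original indices in one pass.

    Also returns a map from each surviving old index to its new index,
    which is what mention spans would need to be rewritten with.
    """
    drop = set(indices)
    kept: List[T] = []
    remap: Dict[int, int] = {}
    for old, value in enumerate(values):
        if old in drop:
            continue
        remap[old] = len(kept)
        kept.append(value)
    return kept, remap


kept, remap = delete_at_indices(["Anna", "bel", "is", "very", "happy"], [1, 3])
print(kept)   # ['Anna', 'is', 'happy']
print(remap)  # {0: 0, 2: 1, 4: 2}
```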
-
replace_field_at_indices(field, indices, items) Replace the items in the given scopes of the field value with items.
- Parameters
field (str) – transformed field
indices (list) – indices of replace positions
items (list) – replacement items
- Return ~textflint.CorefSample
modified sample
-
static concat_conlls(*args) Given several CorefSamples, concatenate their values key by key.
- Parameters
*args – the CorefSamples to concatenate
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefSample whose docs are concatenated to form one x
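A minimal sketch of key-by-key concatenation over plain CoNLL-style dicts, assuming 0-based inclusive cluster spans. The function is hypothetical and works on dicts rather than CorefSample objects. It keeps the first non-list value (such as doc_key) and shifts only clusters; span-valued keys such as ner or constituents would need the same offset shift, which this sketch leaves out.

```python
from typing import Any, Dict


def concat_conlls(*docs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge CoNLL-style dicts key by key into a single document."""
    merged: Dict[str, Any] = {}
    offset = 0  # words contributed by the documents merged so far
    for doc in docs:
        for key, value in doc.items():
            if key == "clusters":
                # Mention spans index the flattened document, so shift them.
                # (ner / constituents spans would need this too; not handled here.)
                value = [[[s + offset, e + offset] for s, e in cluster] for cluster in value]
            if isinstance(value, list):
                merged.setdefault(key, []).extend(value)
            else:
                merged.setdefault(key, value)  # e.g. doc_key: keep the first one
        offset += sum(len(sen) for sen in doc["sentences"])
    return merged


a = {"doc_key": "a", "sentences": [["I", "came"]], "clusters": [[[0, 0]]]}
b = {"doc_key": "b", "sentences": [["I", "saw"]], "clusters": [[[0, 0]]]}
c = {"doc_key": "c", "sentences": [["I", "conquered"]], "clusters": [[[0, 0]]]}
print(concat_conlls(a, b, c)["clusters"])  # [[[0, 0]], [[2, 2]], [[4, 4]]]
```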
-
shuffle_conll(sen_idxs) Given a CorefSample and shuffled sentence indices, reproduce a CorefSample in the order given by those indices.
- Parameters
sen_idxs (list) – a list of ints, the sentence indices in shuffled order; e.g. sen_idxs = [1, 3, 0, 4, 2, 5] when sen_num = 6
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefSample with respect to the shuffled index
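When sentences are reordered, each mention must move by the difference between its sentence's old and new start offsets. The sketch below shows that remapping on a plain dict; it is hypothetical (not textflint's code) and assumes 0-based inclusive spans and that no mention crosses a sentence boundary.

```python
from itertools import accumulate
from typing import Dict, List


def shuffle_conll(doc: Dict[str, list], sen_idxs: List[int]) -> Dict[str, list]:
    """Reorder sentences by sen_idxs and move each mention span with its sentence.

    Assumes every mention lies inside a single sentence.
    """
    sens = doc["sentences"]
    lengths = [len(sen) for sen in sens]
    old_starts = [0] + list(accumulate(lengths))[:-1]
    # Start offset of each ORIGINAL sentence in the reordered document.
    new_starts = [0] * len(sens)
    pos = 0
    for old in sen_idxs:
        new_starts[old] = pos
        pos += lengths[old]

    def move(word: int) -> int:
        # Index of the last sentence starting at or before `word`.
        sen = max(i for i in range(len(sens)) if old_starts[i] <= word)
        return word - old_starts[sen] + new_starts[sen]

    clusters = [[[move(start), move(end)] for start, end in cluster]
                for cluster in doc["clusters"]]
    return {"sentences": [sens[i] for i in sen_idxs], "clusters": clusters}


doc = {"sentences": [["Anna", "bel", "smiled"], ["She", "left"]],
       "clusters": [[[0, 1], [3, 3]]]}
print(shuffle_conll(doc, [1, 0]))
# {'sentences': [['She', 'left'], ['Anna', 'bel', 'smiled']], 'clusters': [[[2, 3], [0, 0]]]}
```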
-
part_conll(pres_idxs) Only the sentences whose indices are in pres_idxs are kept, and all cluster structures are kept for convenience of concatenation.
- Parameters
pres_idxs (list) – a list of ints, the sentence indices to preserve; pres_idxs should be drawn from range(num_sen) in ascending order, e.g. [0, 1, 3, 5] when num_sen = 6
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
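A hypothetical sketch of partial extraction on a plain dict. It reads "all the structures of clusters are kept" as "drop mentions from removed sentences but keep every cluster list, even if empty, so cluster positions still line up for a later concat"; that reading is an assumption, as are 0-based inclusive spans and mentions that never cross sentence boundaries.

```python
from typing import Dict, List


def part_conll(doc: Dict[str, list], pres_idxs: List[int]) -> Dict[str, list]:
    """Keep only the sentences in pres_idxs (ascending) and remap mention spans.

    Mentions in dropped sentences are removed, but every cluster list is kept,
    even if it becomes empty, so the number of clusters does not change.
    Assumes every mention lies inside a single sentence.
    """
    keep = set(pres_idxs)
    # Sentence index of every word in the flattened document.
    sen_of_word = [i for i, sen in enumerate(doc["sentences"]) for _ in sen]
    remap: Dict[int, int] = {}
    for old, sen in enumerate(sen_of_word):
        if sen in keep:
            remap[old] = len(remap)
    clusters = [[[remap[s], remap[e]] for s, e in cluster if s in remap and e in remap]
                for cluster in doc["clusters"]]
    return {"sentences": [doc["sentences"][i] for i in pres_idxs], "clusters": clusters}


doc = {"sentences": [["I", "came"], ["I", "saw"], ["I", "conquered"]],
       "clusters": [[[0, 0], [2, 2], [4, 4]]]}
print(part_conll(doc, [0, 2]))
# {'sentences': [['I', 'came'], ['I', 'conquered']], 'clusters': [[[0, 0], [2, 2]]]}
```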
-
part_before_conll(sen_idx) Only sentences [0, sen_idx) are kept, and all cluster structures are kept for convenience of concatenation.
- Parameters
sen_idx (int) – sentences with idx < sen_idx will be preserved
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
-
part_after_conll(sen_idx) Only sentences [sen_idx:] are kept, and all cluster structures are kept for convenience of concatenation.
- Parameters
sen_idx (int) – sentences with idx >= sen_idx will be preserved
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
-
class textflint.generation_layer.transformation.COREF.random_concat.Transformation(**kwargs)
Bases: abc.ABC
An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.
-
processor = <textflint.common.preprocess.en_processor.EnProcessor object>
-
transform(sample, n=1, field='x', **kwargs) Transform a data sample into a list of Samples.
- Parameters
sample (Sample) – Data sample for augmentation.
n (int) – maximum number of unique augmented outputs; default is 1.
field (str|list) – Indicate which fields to apply transformations.
**kwargs (dict) – other auxiliary parameters.
- Returns
list of Sample
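A minimal sketch of the abstract-base-class pattern described above: a concrete transform that delegates to an abstract hook and returns at most n unique outputs. The hook name _transform, the de-duplication step and the toy UpperCase subclass are assumptions for illustration, not a statement of textflint's internals.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class Transformation(ABC):
    """Hypothetical sketch: subclasses only implement _transform."""

    def transform(self, sample: Any, n: int = 1, field: str = "x", **kwargs: Any) -> List[Any]:
        # Delegate to the subclass, then keep at most n unique outputs.
        outputs = self._transform(sample, n=n, field=field, **kwargs)
        unique: List[Any] = []
        for out in outputs:
            if out not in unique:  # list membership also works for unhashable samples
                unique.append(out)
        return unique[:n]

    @abstractmethod
    def _transform(self, sample: Any, n: int = 1, field: str = "x", **kwargs: Any) -> List[Any]:
        """Return candidate transformed samples."""


class UpperCase(Transformation):
    # Toy subclass: upper-cases a plain string sample (twice, to show de-duplication).
    def _transform(self, sample, n=1, field="x", **kwargs):
        return [sample.upper(), sample.upper()]


print(UpperCase().transform("i came", n=5))  # ['I CAME']
```

Because transform is concrete and _transform is abstract, instantiating Transformation directly raises TypeError, while every subclass inherits the same output-limiting behaviour.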
-