textflint.generation_layer.transformation.COREF.random_concat

Coref - Rnd concat: Concatenate samples randomly chosen from other_samples behind the original sample.


class textflint.generation_layer.transformation.COREF.random_concat.RndConcat(**kwargs)[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Concatenate one extra sample to the original sample while keeping the coref relations of each sample intact.

Attributes:

processor: textflint.common.preprocess.TextProcessor.

Example:

ori: {
    'sentences': [
        ['I', 'came'], ['I', 'saw'], ['I', 'conquered'], 
        ['Anna', 'bel', 'wanna', 'sleep'],
        ['Anna', 'bel', 'is', 'happy']
        ],
    'clusters': [
        [[1, 1], [3, 3], [5, 5]], 
        [[7, 8], [11, 12]]]}
trans: {
    'sentences': [
        ['I', 'came'], ['I', 'saw'], ['I', 'conquered'], 
        ['Anna', 'bel', 'wanna', 'sleep'],
        ['Anna', 'bel', 'is', 'happy'],
        ['who', 'is', 'this', 'boy'], ['he', 'is', 'Jotion']],
    'clusters': [
        [[1, 1], [3, 3], [5, 5]], 
        [[7, 8], [11, 12]], 
        [[17, 18], [19, 19], [21, 21]]]}

class textflint.generation_layer.transformation.COREF.random_concat.CorefSample(data, origin=None, sample_id=None)[source]

Bases: textflint.input_layer.component.sample.sample.Sample

A sample for the coreference resolution (Coref) task.

check_data(data)[source]

Check if data is a conll-dict and is ready to be predicted.

Parameters

data (None|dict) – Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.

Returns

Validate whether the sample is legal.

load(data)[source]

Convert a conll-dict to CorefSample.

Parameters

data (None|dict) – None, or a conll-style dict. Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.

Returns

dump(with_check=True)[source]

Dump a CorefSample to a conll-dict.

Parameters

with_check (bool) – whether the dumped conll-dict should be checked

Return dict ret_dict

a conll-style dict

pretty_print(show='Sample:')[source]

A pretty-printer for CorefSample. Print useful sample information by calling this function.

Parameters

show (str) – optional, the header line printed before this sample

num_sentences()[source]

Return the number of sentences in this sample.

Param

Return int

the number of sentences in this sample

get_kth_sen(k)[source]

Get the k-th sentence as a list of words.

Parameters

k (int) – index of the sentence

Return list

the k-th sentence, as a list of words

eqlen_sen_map()[source]

Generate [0, 0, 1, 1, 1, 2, 2] from self.sen_map = [2, 3, 2]

Param

Return list

a per-word list of sentence indices with the same length as x, e.g. [0, 0, 1, 1, 1, 2, 2]
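A minimal sketch of the same expansion as a free function, assuming `sen_map` is a plain list of sentence lengths as in the docstring (the function below mirrors the method name but is illustrative, not textflint's code):

```python
from typing import List


def eqlen_sen_map(sen_map: List[int]) -> List[int]:
    """Expand sentence lengths into one sentence index per word."""
    result: List[int] = []
    for sen_idx, sen_len in enumerate(sen_map):
        # Repeat the sentence index once for every word in that sentence.
        result.extend([sen_idx] * sen_len)
    return result


print(eqlen_sen_map([2, 3, 2]))  # [0, 0, 1, 1, 1, 2, 2]
```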

index_in_sen(idx)[source]

Determine which sentence the word at index idx belongs to.

Parameters

idx (int) – word idx

Return int

sen_idx, the index of the sentence that contains word idx
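One way to perform this lookup, sketched under the assumption that `sen_map` holds sentence lengths as described for eqlen_sen_map: binary-search the cumulative sentence end offsets. The free function `index_in_sen(idx, sen_map)` is hypothetical; the method itself reads `sen_map` from the sample.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def index_in_sen(idx: int, sen_map: List[int]) -> int:
    """Return the index of the sentence containing word `idx` (hypothetical helper)."""
    total = sum(sen_map)
    if not 0 <= idx < total:
        raise IndexError(f"word index {idx} out of range [0, {total})")
    # Cumulative end offsets, e.g. [2, 3, 2] -> [2, 5, 7].
    ends = list(accumulate(sen_map))
    # The first sentence whose end offset is greater than idx contains the word.
    return bisect_right(ends, idx)
```

Empty sentences (length 0) are skipped naturally, because their end offset equals that of the previous sentence.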

static sens2doc(sens)[source]

Given a 2-D list of str (a list of word lists), concatenate it into a single word list and record the length of each sentence.

Parameters

sens (list) – 2-D list of str (a list of word lists)

Returns (list, list)

x as list of str (word list), sen_map as list of int (sen len list)

static doc2sens(x, sen_map)[source]

Given x and sen_map, return sens. Inverse to sens2doc.

Parameters
  • x (list) – list of str (word list)

  • sen_map (list) – list of int (sen len list)

Return list

sens as a 2-D list of str (a list of word lists)
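The two helpers are inverses of each other. A minimal sketch over plain Python lists, with free functions standing in for the static methods (illustrative, not textflint's implementation):

```python
from typing import List, Tuple


def sens2doc(sens: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Flatten sentences into one word list and record each sentence length."""
    x = [word for sen in sens for word in sen]
    sen_map = [len(sen) for sen in sens]
    return x, sen_map


def doc2sens(x: List[str], sen_map: List[int]) -> List[List[str]]:
    """Split a flat word list back into sentences using the recorded lengths."""
    if sum(sen_map) != len(x):
        raise ValueError("sen_map lengths do not add up to len(x)")
    sens, start = [], 0
    for sen_len in sen_map:
        sens.append(x[start:start + sen_len])
        start += sen_len
    return sens


sens = [['I', 'came'], ['I', 'saw'], ['I', 'conquered']]
x, sen_map = sens2doc(sens)
# x == ['I', 'came', 'I', 'saw', 'I', 'conquered'], sen_map == [2, 2, 2]
assert doc2sens(x, sen_map) == sens
```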

insert_field_before_indices(field, indices, items)[source]

Insert items of the given scopes before the given indices of the field value, simultaneously.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of insert positions

  • items (list) – insert items

Return ~textflint.CorefSample

modified sample
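A sketch of what "simultaneously" means here, assuming each item is a list of words and every index refers to a position in the original (unmodified) word list. The function `insert_before_indices` is hypothetical and only edits a flat word list; the real method must also shift cluster spans, which this sketch omits.

```python
from typing import Dict, List, Sequence


def insert_before_indices(words: Sequence[str], indices: Sequence[int],
                          items: Sequence[Sequence[str]]) -> List[str]:
    """Insert items[i] before ORIGINAL position indices[i], all at once (hypothetical)."""
    if len(indices) != len(items):
        raise ValueError("indices and items must have the same length")
    # Group pending insertions by their position in the original word list.
    pending: Dict[int, List[str]] = {}
    for idx, item in zip(indices, items):
        if not 0 <= idx <= len(words):
            raise IndexError(f"insert position {idx} out of range")
        pending.setdefault(idx, []).extend(item)
    result: List[str] = []
    for pos, word in enumerate(words):
        result.extend(pending.get(pos, []))
        result.append(word)
    # Position len(words) means "append at the end".
    result.extend(pending.get(len(words), []))
    return result


print(insert_before_indices(['I', 'saw', 'him'], [1, 2], [['really'], ['only']]))
# ['I', 'really', 'saw', 'only', 'him']
```

Collecting insertions per original position and rebuilding the list in one pass avoids the index drift that inserting one item at a time would cause.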

insert_field_after_indices(field, indices, items)[source]

Insert items of the given scopes after the given indices of the field value, simultaneously.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of insert positions

  • items (list) – insert items

Return ~textflint.CorefSample

modified sample

delete_field_at_indices(field, indices)[source]

Delete the items at the given scopes of the field value.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of delete positions

Return ~textflint.CorefSample

modified sample

replace_field_at_indices(field, indices, items)[source]

Replace the scope items of the field value with items.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of replace positions

  • items (list) – replacement items

Return ~textflint.CorefSample

modified sample

static concat_conlls(*args)[source]

Given several CorefSamples, concat the values key by key.

Param

Some CorefSamples

Return ~textflint.input_layer.component.sample.CorefSample

A CorefSample in which the docs are concatenated to form one x
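A minimal sketch of key-by-key concatenation for the two mandatory keys, assuming plain conll-style dicts whose cluster mentions are [start, end] word spans. The free function is illustrative; optional keys such as speakers are not handled. Each doc's spans are shifted by the number of words that precede it:

```python
from typing import Any, Dict, List


def concat_conlls(*docs: Dict[str, Any]) -> Dict[str, Any]:
    """Concatenate conll-style dicts, shifting each doc's cluster spans."""
    sentences: List[List[str]] = []
    clusters: List[List[List[int]]] = []
    offset = 0  # number of words contributed by the docs processed so far
    for doc in docs:
        sentences.extend(doc['sentences'])
        for cluster in doc['clusters']:
            # Shift every [start, end] span into the coordinates of the joined doc.
            clusters.append([[start + offset, end + offset]
                             for start, end in cluster])
        offset += sum(len(sen) for sen in doc['sentences'])
    return {'sentences': sentences, 'clusters': clusters}
```

Concatenating the ori dict from the RndConcat example with an extra doc whose clusters are [[[3, 4], [5, 5], [7, 7]]] in its own coordinates shifts them by 14 words and yields the trans clusters.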

shuffle_conll(sen_idxs)[source]

Given a CorefSample and shuffled sentence indices, build a new CorefSample whose sentences follow that order.

Parameters

sen_idxs (list) – a list of ints, the sentence indices in shuffled order; e.g. sen_idxs = [1, 3, 0, 4, 2, 5] when sen_num = 6

Return ~textflint.input_layer.component.sample.CorefSample

a CorefSample with respect to the shuffled index
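A sketch of the index remapping a sentence shuffle involves, assuming 0-based inclusive [start, end] spans that never cross a sentence boundary. The free function and these assumptions are illustrative, not taken from textflint:

```python
from typing import Any, Dict, List


def shuffle_conll(doc: Dict[str, Any], sen_idxs: List[int]) -> Dict[str, Any]:
    """Reorder sentences by sen_idxs and remap cluster spans accordingly."""
    sens = doc['sentences']
    if sorted(sen_idxs) != list(range(len(sens))):
        raise ValueError("sen_idxs must be a permutation of range(num_sentences)")
    # Word offset at which each sentence starts in the original document.
    old_starts: List[int] = []
    pos = 0
    for sen in sens:
        old_starts.append(pos)
        pos += len(sen)
    # Map every original word index to its index after shuffling.
    old2new: Dict[int, int] = {}
    new_pos = 0
    for sen_idx in sen_idxs:
        for k in range(len(sens[sen_idx])):
            old2new[old_starts[sen_idx] + k] = new_pos
            new_pos += 1
    new_clusters = [[[old2new[s], old2new[e]] for s, e in cluster]
                    for cluster in doc['clusters']]
    return {'sentences': [sens[i] for i in sen_idxs], 'clusters': new_clusters}
```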

part_conll(pres_idxs)[source]

Keep only the sentences at the given indices; all cluster structures are kept for convenience of concatenation.

Parameters

pres_idxs (list) – a list of ints, the indices to be preserved; pres_idxs is expected to be drawn from [0, num_sen) in ascending order, e.g. [0, 1, 3, 5] when num_sen = 6

Return ~textflint.input_layer.component.sample.CorefSample

a CorefPartSample of a conll-part

part_before_conll(sen_idx)[source]

Only sentences [0, sen_idx) will be kept, and all the structures of clusters are kept for convenience of concat.

Parameters

sen_idx (int) – sentences with idx < sen_idx will be preserved

Return ~textflint.input_layer.component.sample.CorefSample

a CorefPartSample of a conll-part

part_after_conll(sen_idx)[source]

Only sentences [sen_idx:] will be kept, and all the structures of clusters are kept for convenience of concat.

Parameters

sen_idx (int) – sentences with idx >= sen_idx will be preserved

Return ~textflint.input_layer.component.sample.CorefSample

a CorefPartSample of a conll-part

class textflint.generation_layer.transformation.COREF.random_concat.Transformation(**kwargs)[source]

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>
transform(sample, n=1, field='x', **kwargs)[source]

Transform data sample to a list of Sample.

Parameters
  • sample (Sample) – Data sample for augmentation.

  • n (int) – Max number of unique augmented outputs, default is 1.

  • field (str|list) – Indicate which fields to apply transformations.

  • **kwargs (dict) – other auxiliary params.

Returns

list of Sample

classmethod sample_num(x, num)[source]

Get ‘num’ samples from x.

Parameters
  • x (list) – list to sample

  • num (int) – sample number

Returns

at most ‘num’ unique samples.
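A minimal sketch, assuming "unique" means drawing without replacement (distinct positions of x) rather than deduplicating equal values. The free function and its `rng` parameter are hypothetical:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_num(x: Sequence[T], num: int,
               rng: Optional[random.Random] = None) -> List[T]:
    """Draw at most `num` items from x without replacement (hypothetical helper)."""
    if num < 0:
        raise ValueError("num must be non-negative")
    rng = rng if rng is not None else random.Random()
    # random.sample never picks the same position twice; capping k at len(x)
    # returns every item (in random order) when fewer than num are available.
    return rng.sample(list(x), min(num, len(x)))
```

Passing a seeded random.Random makes the selection reproducible across runs.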