textflint.generation_layer.transformation.COREF.random_repeat

Coref - Rnd repeat: Randomly choose some sentences; each of them will be repeated somewhere else in the sample.


class textflint.generation_layer.transformation.COREF.random_repeat.RndRepeat(trans_p=0.2, **kwargs)[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Randomly choose trans_p * num_sentences sentences; each of them will be repeated somewhere else in the sample.

Attributes:

trans_p: proportion of repeated sentences; default 0.2.

processor: textflint.common.preprocess.TextProcessor.

Example:

ori: {
    'sentences': [
        ['I', 'came'], ['I', 'saw'], ['I', 'conquered'], 
        ['Anna', 'bel', 'wanna', 'sleep'],
        ['Anna', 'bel', 'is', 'happy']],
    'clusters': [
        [[1, 1], [3, 3], [5, 5]], 
        [[7, 8], [11, 12]]]}
trans: {
    'sentences': [
        ['I', 'came'], ['I', 'saw'], ['Anna', 'bel', 'wanna', 'sleep'], 
        ['I', 'conquered'], ['Anna', 'bel', 'wanna', 'sleep'], 
        ['Anna', 'bel', 'is', 'happy']],
    'clusters': [
        [[1, 1], [3, 3], [9, 9]], 
        [[5, 6], [11, 12], [15, 16]]]}
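
The repeat operation can be illustrated with a small stand-alone sketch. It is not textflint's implementation: the helper names repeat_sentence and random_repeat are hypothetical, spans are assumed to be 0-based inclusive [begin, end] token offsets, and using ceil for the number of repeats is an assumption as well.

```python
import random
from math import ceil


def repeat_sentence(sample, src, dst):
    """Insert a copy of sentence `src` before sentence position `dst`."""
    sens = sample["sentences"]
    # starts[i] is the token offset at which sentence i begins.
    starts = [0]
    for sen in sens:
        starts.append(starts[-1] + len(sen))
    src_start, src_end = starts[src], starts[src + 1]  # half-open [start, end)
    ins_at = starts[dst]          # token offset where the copy is inserted
    shift = src_end - src_start   # number of tokens added to the document

    new_sens = sens[:dst] + [list(sens[src])] + sens[dst:]
    new_clusters = []
    for cluster in sample["clusters"]:
        spans = []
        for begin, end in cluster:
            # Mentions at or after the insertion point move right.
            off = shift if begin >= ins_at else 0
            spans.append([begin + off, end + off])
            # Mentions inside the copied sentence gain a duplicate span.
            if src_start <= begin and end < src_end:
                spans.append([ins_at + begin - src_start,
                              ins_at + end - src_start])
        new_clusters.append(sorted(spans))
    return {"sentences": new_sens, "clusters": new_clusters}


def random_repeat(sample, trans_p=0.2, seed=None):
    """Repeat ceil(trans_p * num_sentences) randomly chosen sentences."""
    rng = random.Random(seed)
    k = ceil(trans_p * len(sample["sentences"]))
    for _ in range(k):
        n = len(sample["sentences"])
        # Simplification: a copy made earlier may itself be picked again.
        sample = repeat_sentence(sample, rng.randrange(n), rng.randrange(n + 1))
    return sample


ori = {
    "sentences": [
        ["I", "came"], ["I", "saw"], ["I", "conquered"],
        ["Anna", "bel", "wanna", "sleep"],
        ["Anna", "bel", "is", "happy"]],
    "clusters": [
        [[0, 0], [2, 2], [4, 4]],
        [[6, 7], [10, 11]]]}

# Copy sentence 3 in front of sentence 2, as in the example above.
trans = repeat_sentence(ori, src=3, dst=2)
```

Inserting a copy shifts every later mention by the length of the copied sentence, and every mention inside the copied sentence gains one extra span in its cluster.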
class textflint.generation_layer.transformation.COREF.random_repeat.CorefSample(data, origin=None, sample_id=None)[source]

Bases: textflint.input_layer.component.sample.sample.Sample

Coref Sample

check_data(data)[source]

Check if data is a conll-dict and is ready to be predicted.

Parameters

data (None|dict) – Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.

Returns

Validate whether the sample is legal.

load(data)[source]

Convert a conll-dict to CorefSample.

Parameters

data (None|dict) – None, or a conll-style dict. Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.

Returns

dump(with_check=True)[source]

Dump a CorefSample to a conll-dict.

Parameters

with_check (bool) – whether the dumped conll-dict should be checked

Returns

ret_dict (dict) – a conll-style dict

pretty_print(show='Sample:')[source]

A pretty-printer for CorefSample. Print useful sample information by calling this function.

Parameters

show (str) – optional, the welcome information of printing this sample

num_sentences()[source]

Get the number of sentences in this sample.

Returns

int – the number of sentences in this sample

get_kth_sen(k)[source]

Get the k-th sentence as a word list.

Parameters

k (int) – sentence index

Returns

list – the k-th sentence, as a word list

eqlen_sen_map()[source]

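Expand self.sen_map (the list of sentence lengths) into one sentence index per word; for example, generate [0, 0, 1, 1, 1, 2, 2] from self.sen_map = [2, 3, 2].

Returns

list – a sentence mapping with the same length as x, like [0, 0, 1, 1, 1, 2, 2]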
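
The expansion can be written as a one-line comprehension. This is a stand-alone sketch that takes sen_map as an argument rather than reading it from a CorefSample; it is not the library's code.

```python
def eqlen_sen_map(sen_map):
    """Expand sentence lengths into one sentence index per word."""
    return [i for i, length in enumerate(sen_map) for _ in range(length)]


expanded = eqlen_sen_map([2, 3, 2])
```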

index_in_sen(idx)[source]

For the given word idx, determine which sentence it is in.

Parameters

idx (int) – word index

Returns

int – sen_idx, the index of the sentence that contains word idx
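
One way to do this lookup is a binary search over the cumulative sentence lengths. The sketch below is an assumption about how such a lookup could work, not the library's method; it takes sen_map (the list of sentence lengths) as an explicit argument.

```python
from bisect import bisect_right
from itertools import accumulate


def index_in_sen(sen_map, idx):
    """Return the index of the sentence that contains word `idx`."""
    ends = list(accumulate(sen_map))  # exclusive end offset of each sentence
    total = ends[-1] if ends else 0
    if not 0 <= idx < total:
        raise IndexError("word index out of range")
    # The first sentence whose end offset is greater than idx contains it.
    return bisect_right(ends, idx)


located = [index_in_sen([2, 3, 2], i) for i in range(7)]
```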

static sens2doc(sens)[source]

Given a 2-D list of str (a list of word lists), concatenate it and record the length of each sentence.

Parameters

sens (list) – 2-D list of str (a list of word lists)

Returns

(list, list) – x as a list of str (word list), sen_map as a list of int (sentence lengths)

static doc2sens(x, sen_map)[source]

Given x and sen_map, return sens. Inverse of sens2doc.

Parameters
  • x (list) – list of str (word list)

  • sen_map (list) – list of int (sentence lengths)

Returns

list – sens as a 2-D list of str (a list of word lists)
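
The flatten and split pair can be sketched as two plain functions. These are illustrative re-implementations written for this page, not copies of the static methods.

```python
def sens2doc(sens):
    """Flatten a list of word lists and record each sentence length."""
    x = [word for sen in sens for word in sen]
    sen_map = [len(sen) for sen in sens]
    return x, sen_map


def doc2sens(x, sen_map):
    """Split a flat word list back into sentences using the lengths."""
    sens, start = [], 0
    for length in sen_map:
        sens.append(x[start:start + length])
        start += length
    return sens


sens = [["I", "came"], ["I", "saw"], ["Anna", "bel", "is", "happy"]]
x, sen_map = sens2doc(sens)
restored = doc2sens(x, sen_map)
```

Because sen_map stores only lengths, the round trip doc2sens(*sens2doc(sens)) gives back the original sentence list.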

insert_field_before_indices(field, indices, items)[source]

Insert items of the given scopes before the indices of the field value simultaneously.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of insert positions

  • items (list) – insert items

Returns

textflint.CorefSample – modified sample

insert_field_after_indices(field, indices, items)[source]

Insert items of the given scopes after the indices of the field value simultaneously.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of insert positions

  • items (list) – insert items

Returns

textflint.CorefSample – modified sample

delete_field_at_indices(field, indices)[source]

Delete items of given scopes of field value.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of delete positions

Returns

textflint.CorefSample – modified sample

replace_field_at_indices(field, indices, items)[source]

Replace the scope items of the field value with items.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of the positions to replace

  • items (list) – replacement items

Returns

textflint.CorefSample – modified sample

static concat_conlls(*args)[source]

Given several CorefSamples, concatenate their values key by key.

Parameters

*args – some CorefSamples

Returns

textflint.input_layer.component.sample.CorefSample – a CorefSample in which the docs are concatenated to form one x

shuffle_conll(sen_idxs)[source]

Given a CorefSample and shuffled sentence indexes, reproduce a CorefSample with respect to the indexes.

Parameters

sen_idxs (list) – a list of ints: the sentence indexes in shuffled order. sen_idxs is expected to look like [1, 3, 0, 4, 2, 5] when sen_num = 6.

Returns

textflint.input_layer.component.sample.CorefSample – a CorefSample with respect to the shuffled indexes
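
Reordering sentences means every mention span has to move with its sentence. The sketch below shows that bookkeeping on a plain conll-style dict. It assumes 0-based inclusive token spans, and the function name shuffle_sentences and the sorting of spans inside each cluster are choices made for this illustration, not documented behaviour of shuffle_conll.

```python
from bisect import bisect_right


def shuffle_sentences(sample, sen_idxs):
    """Reorder sentences by `sen_idxs` and remap cluster spans."""
    sens = sample["sentences"]
    # old_starts[i] is the token offset at which sentence i used to begin.
    old_starts = [0]
    for sen in sens:
        old_starts.append(old_starts[-1] + len(sen))
    # new_start_of[i] is the offset of original sentence i after shuffling.
    new_start_of, pos = {}, 0
    for i in sen_idxs:
        new_start_of[i] = pos
        pos += len(sens[i])

    new_clusters = []
    for cluster in sample["clusters"]:
        spans = []
        for begin, end in cluster:
            sen = bisect_right(old_starts, begin) - 1  # sentence holding span
            delta = new_start_of[sen] - old_starts[sen]
            spans.append([begin + delta, end + delta])
        new_clusters.append(sorted(spans))
    return {"sentences": [sens[i] for i in sen_idxs],
            "clusters": new_clusters}


ori = {
    "sentences": [
        ["I", "came"], ["I", "saw"], ["I", "conquered"],
        ["Anna", "bel", "wanna", "sleep"],
        ["Anna", "bel", "is", "happy"]],
    "clusters": [
        [[0, 0], [2, 2], [4, 4]],
        [[6, 7], [10, 11]]]}

shuffled = shuffle_sentences(ori, [1, 3, 0, 4, 2])
```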

part_conll(pres_idxs)[source]

Only sentences whose indexes are in pres_idxs will be kept; all the structures of clusters are kept for convenience of concat.

Parameters

pres_idxs (list) – a list of ints: the indexes to be preserved. pres_idxs is expected to be drawn from [0, num_sen) and to be in ascending order, like [0, 1, 3, 5] when num_sen = 6.

Returns

textflint.input_layer.component.sample.CorefSample – a CorefPartSample of a conll-part

part_before_conll(sen_idx)[source]

Only sentences [0, sen_idx) will be kept, and all the structures of clusters are kept for convenience of concat.

Parameters

sen_idx (int) – sentences with idx < sen_idx will be preserved

Returns

textflint.input_layer.component.sample.CorefSample – a CorefPartSample of a conll-part

part_after_conll(sen_idx)[source]

Only sentences [sen_idx:] will be kept, and all the structures of clusters are kept for convenience of concat.

Parameters

sen_idx (int) – sentences with idx >= sen_idx will be preserved

Returns

textflint.input_layer.component.sample.CorefSample – a CorefPartSample of a conll-part

class textflint.generation_layer.transformation.COREF.random_repeat.ListField(field_value, **kwargs)[source]

Bases: textflint.input_layer.component.field.field.Field

A helper class that represents input list values that are to be modified.

Operations that modify field_value generate a new Field instance.

__init__(field_value, **kwargs)[source]

Parameters

field_value ([str]) – The list that ListField represents.

replace_at_indices(indices, new_items)[source]

Replace items at indices.

Note: only isometric (equal-length) replacement is supported.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, to replace a single item, or a list of them like [1, 2, 3]; a list like (0, 3), to replace items from 0 to 3 (not included), or a list of them like [(0, 3), (5, 6)]; or a slice, which is converted to a list.

  • new_items (list) – items corresponding to indices.

Returns

new field object.
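
The index forms described above can be normalised to half-open ranges before the replacement is applied. The following is a minimal sketch on a plain Python list, assuming slices carry explicit start and stop values; to_range and this free-standing replace_at_indices are hypothetical helpers, not ListField methods.

```python
def to_range(index):
    """Normalise an int, a (start, end) pair, or a slice to (start, end)."""
    if isinstance(index, int):
        return index, index + 1
    if isinstance(index, slice):
        return index.start, index.stop  # assumes explicit bounds
    start, end = index
    return start, end


def replace_at_indices(values, indices, new_items):
    """Return a new list with the items at `indices` replaced."""
    result = list(values)  # the input list is left untouched
    for index, item in zip(indices, new_items):
        start, end = to_range(index)
        replacement = [item] if isinstance(index, int) else list(item)
        if len(replacement) != end - start:
            raise ValueError("only equal-length replacement is supported")
        result[start:end] = replacement
    return result


words = ["a", "b", "c", "d", "e"]
replaced = replace_at_indices(words, [0, (2, 4)], ["A", ["C", "D"]])
```

Because every replacement has the same length as the range it covers, earlier replacements never shift the positions of later ones.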

replace_at_index(index, new_items)[source]

Replace item at index.

Parameters
  • index (int|list|slice) – can be an int, to replace a single item, or a list of them like [1, 2, 3]; a list like (0, 3), to replace items from 0 to 3 (not included), or a list of them like [(0, 3), (5, 6)]; or a slice, which is converted to a list.

  • new_items (list) – items corresponding to index.

Returns

new field object.

delete_at_indices(indices)[source]

Delete items at indices.

Parameters

indices (list[int|list|slice]) – each index can be an int, to delete a single item, or a list of them like [1, 2, 3]; a list like (0, 3), to delete items from 0 to 3 (not included), or a list of them like [(0, 3), (5, 6)]; or a slice, which is converted to a list.

Returns

new field object.

delete_at_index(index)[source]

Delete item at index.

Parameters

index (int|list|slice) – can be an int, to delete a single item, or a list of them like [1, 2, 3]; a list like (0, 3), to delete items from 0 to 3 (not included), or a list of them like [(0, 3), (5, 6)]; or a slice, which is converted to a list.

Returns

new field object.

insert_before_indices(indices, new_items)[source]

Insert items before indices.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, to refer to a single item, or a list of them like [1, 2, 3]; a list like (0, 3), to refer to items from 0 to 3 (not included), or a list of them like [(0, 3), (5, 6)]; or a slice, which is converted to a list.

  • new_items (list) – items corresponding to indices.

Returns

new field object.
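
When several insertions are made at once, the indices all refer to positions in the original list. A simple way to honour that is to apply the insertions from the largest index down. The sketch below works on a plain list with int indices only and assumes each entry of new_items is itself a list of items; it is an illustration, not the ListField code.

```python
def insert_before_indices(values, indices, new_items):
    """Return a new list with each group of items inserted before its index."""
    result = list(values)  # the input list is left untouched
    pairs = sorted(zip(indices, new_items), key=lambda p: p[0], reverse=True)
    # Inserting at the largest index first keeps the smaller indices valid.
    for index, items in pairs:
        result[index:index] = items
    return result


letters = ["a", "b", "c"]
inserted = insert_before_indices(letters, [0, 2], [["X"], ["Y", "Z"]])
```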

insert_before_index(index, new_items)[source]

Insert items before index.

Parameters
  • index (int|list|slice) – can be an int, to refer to a single item, or a list of them like [1, 2, 3]; a list like (0, 3), to refer to items from 0 to 3 (not included), or a list of them like [(0, 3), (5, 6)]; or a slice, which is converted to a list.

  • new_items (list) – items corresponding to index.

Returns

new field object.

insert_after_indices(indices, new_items)[source]

Insert items after indices.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, to refer to a single item, or a list of them like [1, 2, 3]; a list like (0, 3), to refer to items from 0 to 3 (not included), or a list of them like [(0, 3), (5, 6)]; or a slice, which is converted to a list.

  • new_items (list) – items corresponding to indices.

Returns

new field object.

insert_after_index(index, new_items)[source]

Insert items after index.

Parameters
  • index (int|list|slice) – can be an int, to refer to a single item, or a list of them like [1, 2, 3]; a list like (0, 3), to refer to items from 0 to 3 (not included), or a list of them like [(0, 3), (5, 6)]; or a slice, which is converted to a list.

  • new_items (list) – items corresponding to index.

Returns

new field object.

swap_at_index(first_index, second_index)[source]

Swap item between first_index and second_index.

Parameters
  • first_index (int) – index of first item

  • second_index (int) – index of second item

Returns

new field object.

class textflint.generation_layer.transformation.COREF.random_repeat.Transformation(**kwargs)[source]

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>

transform(sample, n=1, field='x', **kwargs)[source]

Transform data sample to a list of Sample.

Parameters
  • sample (Sample) – Data sample for augmentation.

  • n (int) – Max number of unique augmented outputs, default is 1.

  • field (str|list) – Indicate which fields to apply transformations.

  • **kwargs (dict) – other auxiliary params.

Returns

list of Sample
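
The abstract-base pattern can be sketched as follows. Everything here is illustrative: SketchTransformation, the _transform hook, the UpperCase subclass, and the use of plain dicts as samples are assumptions made for the example, and the filtering rules are a guess at what "unique augmented output" could mean rather than textflint's actual logic.

```python
from abc import ABC, abstractmethod


class SketchTransformation(ABC):
    """Hypothetical base class mirroring the transform() signature."""

    def transform(self, sample, n=1, field="x", **kwargs):
        candidates = self._transform(sample, n=n, field=field, **kwargs)
        unique = []
        for cand in candidates:
            # Drop outputs equal to the input and drop duplicates.
            if cand != sample and cand not in unique:
                unique.append(cand)
        return unique[:n]  # at most n results

    @abstractmethod
    def _transform(self, sample, n=1, field="x", **kwargs):
        """Subclasses return a list of candidate samples."""


class UpperCase(SketchTransformation):
    """Toy transformation: upper-case the chosen field."""

    def _transform(self, sample, n=1, field="x", **kwargs):
        return [{**sample, field: sample[field].upper()}]


results = UpperCase().transform({"x": "hi"})
```

Keeping the shared post-processing in transform and the task-specific logic in a hook lets each subclass stay small.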

classmethod sample_num(x, num)[source]

Get ‘num’ samples from x.

Parameters
  • x (list) – list to sample

  • num (int) – sample number

Returns

at most ‘num’ unique samples.
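
Sampling "at most num" items can be sketched with the standard library. This stand-alone function is an illustration, not the classmethod itself; the seed parameter is added here only to make the example repeatable.

```python
import random


def sample_num(x, num, seed=None):
    """Draw up to `num` items from `x` without reusing a position."""
    rng = random.Random(seed)
    # Cap the request so random.sample never asks for more than len(x).
    return rng.sample(x, min(num, len(x)))


few = sample_num([1, 2, 3], 5, seed=0)
some = sample_num(list(range(10)), 4, seed=0)
```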

textflint.generation_layer.transformation.COREF.random_repeat.ceil(x, /)

Return the ceiling of x as an Integral.

This is the smallest integer >= x.