textflint.generation_layer.transformation.RE.swap_ent

EntitySwap class for entity swap

class textflint.generation_layer.transformation.RE.swap_ent.SwapEnt(type='lowfreq', **kwargs)[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Replace an entity mention with another entity of the same entity type

replace_en(types, index, token)[source]

replace entity with random token span

Parameters
  • types (str) – entity type

  • index (list) – entity index [start, end]

  • token (list) – tokenized sentence

Return Tuple(list, int)

new sentence and the number of words by which the new entity is longer than the old entity
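
The span replacement described above can be sketched in plain Python. This is a minimal illustration rather than textflint's implementation: the helper `replace_span` and its `new_entity` argument are hypothetical, the example sentence is made up, and `index` is assumed to be an inclusive `[start, end]` pair.

```python
def replace_span(token, index, new_entity):
    """Replace token[start..end] with new_entity.

    Assumption: `index` is an inclusive [start, end] pair.
    Returns the new token list and how many more words the new entity
    has than the old one (negative if the new entity is shorter).
    """
    start, end = index
    old_len = end - start + 1
    new_token = token[:start] + list(new_entity) + token[end + 1:]
    return new_token, len(new_entity) - old_len


tokens = ["Alice", "Smith", "works", "at", "Acme"]
new_tokens, delta = replace_span(tokens, [0, 1], ["Maria", "Garcia", "Lopez"])
print(new_tokens)
print(delta)
```

The length difference matters because any entity span that starts after the replaced one has to be shifted by that amount.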

subj_and_obj_transform(sample, n, entity)[source]

transform both subject and object entities

Parameters
  • sample (RESample) – re_sample input

  • n (int) – number of generated samples

Return list

transformed sample list

single_transform(sample, n, entity)[source]

transform subject or object entity

Parameters
  • sample (RESample) – re_sample input

  • n (int) – number of generated samples

Return list

transformed sample list

class textflint.generation_layer.transformation.RE.swap_ent.RESample(data, origin=None, sample_id=None)[source]

Bases: textflint.input_layer.component.sample.sample.Sample

transform and retrieve features of RESample

check_data(data)[source]

check whether type of data is correct

Parameters

data (dict) – data dict containing ‘x’, ‘subj’, ‘obj’ and ‘y’

Validate whether the sample is legal
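
A rough sketch of what such a validity check might look like is given below. It is not the library's code: the function name `check_re_data` is hypothetical, and it assumes that 'x' is a list of tokens, that 'subj' and 'obj' are inclusive `[start, end]` spans into 'x', and that 'y' is a relation label string (the label `employee_of` is invented for the example).

```python
def check_re_data(data):
    """Hypothetical validator for a relation-extraction data dict."""
    for key in ("x", "subj", "obj", "y"):
        if key not in data:
            raise ValueError(f"missing key: {key}")
    if not isinstance(data["x"], list):
        raise TypeError("'x' should be a list of tokens")
    for key in ("subj", "obj"):
        span = data[key]
        if len(span) != 2:
            raise ValueError(f"'{key}' should be [start, end]")
        start, end = span
        # Assumed convention: inclusive span that lies inside the sentence.
        if not 0 <= start <= end < len(data["x"]):
            raise ValueError(f"'{key}' span {span} is out of range")
    if not isinstance(data["y"], str):
        raise TypeError("'y' should be a relation label string")
    return True


sample = {
    "x": ["Alice", "works", "at", "Acme"],
    "subj": [0, 0],
    "obj": [3, 3],
    "y": "employee_of",
}
print(check_re_data(sample))
```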

get_sent_ids()[source]

Generate sentence ID

Returns

string: sentence ID

load(data)[source]

Convert data dict which contains essential information to RESample.

Parameters

data (dict) – contains ‘token’, ‘subj’, ‘obj’, ‘relation’ keys.

get_dp()[source]

get dependency parsing

Return Tuple(list, list)

dependency tag of sentence and head of sentence

get_en()[source]

get entity index

Return Tuple(int, int, int, int)

start index of subject entity, end index of subject entity, start index of object entity and end index of object entity

get_type()[source]

get entity type

Return Tuple(string, string)

entity type of subject and entity type of object

get_sent()[source]

get tokenized sentence

Return Tuple(list, string)

tokenized sentence and relation

delete_field_at_indices(field, indices)[source]

delete words at the given indices in sentence

Parameters
  • field (string) – field to be operated on

  • indices (list) – a list of indices to be deleted

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys
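
Deleting tokens changes the positions of everything to their right, so the entity spans have to be recomputed together with the token list. The sketch below illustrates that bookkeeping under stated assumptions: `delete_tokens` is a hypothetical standalone helper (not the method above), spans are inclusive `[start, end]` pairs, and none of the deleted indices is the first or last token of an entity.

```python
def delete_tokens(token, subj, obj, indices):
    """Delete tokens at `indices` and shift the entity spans to match.

    Assumptions: spans are inclusive [start, end] pairs, and no deleted
    index is an entity boundary (otherwise the lookup below fails).
    """
    to_delete = set(indices)
    kept = [i for i in range(len(token)) if i not in to_delete]
    # Map each surviving old position to its new position.
    new_pos = {old: new for new, old in enumerate(kept)}

    def shift(span):
        start, end = span
        return [new_pos[start], new_pos[end]]

    return {
        "token": [token[i] for i in kept],
        "subj": shift(subj),
        "obj": shift(obj),
    }


result = delete_tokens(
    ["Alice", "Smith", "really", "works", "at", "Acme"],
    subj=[0, 1],
    obj=[5, 5],
    indices=[2],
)
print(result)
```

In this example the subject span is unchanged because the deleted token lies to its right, while the object span moves one position to the left.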

insert_field_after_indices(field, indices, new_item)[source]

insert word after given indices in sentence

Parameters
  • field (string) – field to be operated on

  • indices (list) – a list of indices after which to insert

  • new_item (list) – list of items to be inserted

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys

insert_field_before_indices(field, indices, new_item)[source]

insert word before given indices in sentence

Parameters
  • field (string) – field to be operated on

  • indices (list) – a list of indices before which to insert

  • new_item (list) – list of items to be inserted

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys
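
The difference between inserting before and inserting after an index can be sketched on a plain token list. This is a simplified illustration, not the library's code: `insert_items` is a hypothetical helper that handles a single index and leaves out the entity-span updates a real RE sample would also need.

```python
def insert_items(token, index, new_item, before=True):
    """Insert `new_item` (a list of tokens) before or after `index`."""
    # Inserting after position i is the same as inserting before i + 1.
    pos = index if before else index + 1
    return token[:pos] + list(new_item) + token[pos:]


tokens = ["Alice", "works", "at", "Acme"]
print(insert_items(tokens, 1, ["often"], before=True))
print(insert_items(tokens, 1, ["remotely"], before=False))
```

When several indices are processed in one call, one common approach is to go from the largest index to the smallest so that earlier insertions do not move the later targets.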

replace_sample_fields(data)[source]

replace sample fields for RE transformation

Parameters

data (dict) – contains transformed x, subj, obj keys

Return RESample

transformed sample

stan_ner_transform()[source]

Generate ner list

Return list

ner tags

get_pos()[source]

get pos tagging of sentence

Return list

pos tags

dump()[source]

output data sample

Return dict

containing x, subj, obj, y and sample_id

class textflint.generation_layer.transformation.RE.swap_ent.Transformation(**kwargs)[source]

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>

transform(sample, n=1, field='x', **kwargs)[source]

Transform data sample to a list of Sample.

Parameters
  • sample (Sample) – Data sample for augmentation.

  • n (int) – Max number of unique augmented output, default is 1.

  • field (str|list) – Indicate which fields to apply transformations.

  • **kwargs (dict) –

    other auxiliary params.

Returns

list of Sample

classmethod sample_num(x, num)[source]

Get ‘num’ samples from x.

Parameters
  • x (list) – list to sample

  • num (int) – sample number

Returns

max ‘num’ unique samples.
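
One way to read "max ‘num’ unique samples" is sampling without replacement, capped at the length of the input. The sketch below shows that reading only; `sample_at_most` is a hypothetical standalone function, and the library's actual classmethod may behave differently.

```python
import random


def sample_at_most(x, num):
    """Return at most `num` items drawn from x without replacement."""
    if num >= len(x):
        # Nothing to choose: every item is returned.
        return list(x)
    return random.sample(x, num)


candidates = ["Acme", "Globex", "Initech", "Umbrella"]
picked = sample_at_most(candidates, 2)
print(picked)
print(sample_at_most(candidates, 10))
```

Each position in `x` is drawn at most once, so the result contains no repeats as long as `x` itself has none.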

textflint.generation_layer.transformation.RE.swap_ent.download_if_needed(folder_name)[source]

Folder name will be saved as .cache/textflint/[folder_name]. If it doesn’t exist on disk, the zip file will be downloaded and extracted.

Parameters

folder_name (str) – path to folder or file in cache

Returns

path to the downloaded folder or file on disk