textflint.generation_layer.transformation.RE.swap_ent

SwapEnt: entity-swap transformation for relation extraction (RE).

class textflint.generation_layer.transformation.RE.swap_ent.SwapEnt(type='lowfreq', **kwargs)

Bases: textflint.generation_layer.transformation.transformation.Transformation

Replace an entity mention with another entity of the same entity type.

replace_en(types, index, token)

Replace the entity with a random token span of the same entity type.

Parameters

types (str) – entity type
index (list) – entity index [start, end]
token (list) – tokenized sentence

Return Tuple(list, int)

the new sentence, and the number of words by which the new entity is longer than the old entity
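A minimal sketch of the replacement step described above, assuming that `index` is an inclusive [start, end] span and that replacement entities come from a per-type candidate pool. `CANDIDATES` and `replace_entity` are hypothetical names for illustration, not textflint's implementation.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical candidate pool: entity type -> tokenized entity mentions.
CANDIDATES: Dict[str, List[List[str]]] = {
    "PERSON": [["Ada", "Lovelace"], ["Turing"]],
    "CITY": [["Paris"], ["New", "York", "City"]],
}

def replace_entity(types: str, index: List[int], token: List[str],
                   rng: random.Random) -> Tuple[List[str], int]:
    """Swap token[start..end] (inclusive, assumed) for a random same-type entity.

    Returns the new token list and how many more words the new entity has
    than the old one (negative if the new entity is shorter).
    """
    start, end = index
    new_entity = rng.choice(CANDIDATES[types])
    new_token = token[:start] + new_entity + token[end + 1:]
    diff = len(new_entity) - (end - start + 1)
    return new_token, diff
```

The length difference is what a caller needs to shift any entity span that lies after the replaced one.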

subj_and_obj_transform(sample, n, entity)

Transform both the subject and the object entity.

Parameters

sample (RESample) – RE sample to transform
n (int) – number of generated samples

Return list

transformed sample list

single_transform(sample, n, entity)

Transform either the subject or the object entity.

Parameters

sample (RESample) – RE sample to transform
n (int) – number of generated samples

Return list

transformed sample list
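A sketch of a single-entity transform that produces `n` samples. It assumes a dict sample with 'x' tokens, inclusive 'subj'/'obj' spans, and a 'y' label, and it shows the bookkeeping that swapping one entity implies: when the swapped entity changes length, the other entity's span must shift if it lies afterwards. `swap_one`, `single_transform_sketch`, and `CANDIDATES` are hypothetical, and no deduplication of outputs is attempted.

```python
import random
from typing import Dict, List

# Hypothetical candidate pool of tokenized PERSON mentions.
CANDIDATES = {"PERSON": [["Ada", "Lovelace"], ["Grace", "Hopper"], ["Turing"]]}

def swap_one(sample: Dict, role: str, ent_type: str, rng: random.Random) -> Dict:
    """Swap the entity in `role` ('subj' or 'obj') and shift the other span."""
    other = "obj" if role == "subj" else "subj"
    start, end = sample[role]
    new_ent = rng.choice(CANDIDATES[ent_type])
    diff = len(new_ent) - (end - start + 1)
    x = sample["x"][:start] + new_ent + sample["x"][end + 1:]
    o_start, o_end = sample[other]
    if o_start > end:  # the other entity lies after the swapped span
        o_start, o_end = o_start + diff, o_end + diff
    return {"x": x, role: [start, end + diff], other: [o_start, o_end], "y": sample["y"]}

def single_transform_sketch(sample: Dict, n: int, role: str, ent_type: str,
                            seed: int = 0) -> List[Dict]:
    """Generate n transformed copies of `sample` (duplicates are possible)."""
    rng = random.Random(seed)
    return [swap_one(sample, role, ent_type, rng) for _ in range(n)]
```

Swapping both entities, as in subj_and_obj_transform, could be sketched by applying `swap_one` twice, once per role.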

class textflint.generation_layer.transformation.RE.swap_ent.RESample(data, origin=None, sample_id=None)

Bases: textflint.input_layer.component.sample.sample.Sample

Sample class for relation extraction: transforms and retrieves features of an RE sample.

check_data(data)

Check whether the data has the correct type.

Parameters: data (dict) – data dict containing ‘x’, ‘subj’, ‘obj’ and ‘y’
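A sketch of the kind of validation check_data describes. It assumes that 'x' is a list of token strings, that 'subj' and 'obj' are inclusive [start, end] spans inside 'x', and that 'y' is a relation label string. `check_re_data` is a hypothetical helper, not the library's method.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("x", "subj", "obj", "y")

def check_re_data(data: Dict[str, Any]) -> None:
    """Raise ValueError if `data` does not match the assumed RE sample schema."""
    if not isinstance(data, dict):
        raise ValueError("data must be a dict")
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not isinstance(data["x"], list) or not all(isinstance(t, str) for t in data["x"]):
        raise ValueError("'x' must be a list of token strings")
    for key in ("subj", "obj"):
        span = data[key]
        # Assumed convention: inclusive [start, end] that fits inside 'x'.
        if not (isinstance(span, list) and len(span) == 2
                and all(isinstance(i, int) for i in span)
                and 0 <= span[0] <= span[1] < len(data["x"])):
            raise ValueError(f"'{key}' must be an inclusive [start, end] span inside 'x'")
    if not isinstance(data["y"], str):
        raise ValueError("'y' must be a relation label string")
```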

is_legal()

Validate whether the sample is legal.

get_sent_ids()

Generate sentence ID

Returns: string: sentence ID

load(data)

Convert a data dict containing the essential information into an RESample.

Parameters: data (dict) – contains ‘token’, ‘subj’, ‘obj’, and ‘relation’ keys.

get_dp()

Get the dependency parse of the sentence.

Return Tuple(list, list): dependency tags and dependency heads of the sentence

get_en()

Get the entity indices.

Return Tuple(int, int, int, int): start index of subject entity, end index of subject entity, start index of object entity and end index of object entity

get_type()

Get the entity types.

Return Tuple(string, string): entity type of subject and entity type of object

get_sent()

Get the tokenized sentence and its relation.

Return Tuple(list, string): tokenized sentence and relation

delete_field_at_indices(field, indices)

Delete the words at the given indices in the sentence.

Parameters

field (string) – field to be operated on
indices (list) – a list of indices to delete

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys
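A sketch of the index bookkeeping a deletion implies. It assumes that the tokens live in data['x'], that spans are inclusive, and that no deleted index falls inside an entity span. Each entity position moves left by the number of deleted positions before it. `delete_at_indices` is a hypothetical helper that returns the same ‘token’, ‘subj’, ‘obj’ keys as the method above.

```python
from typing import Dict, List

def delete_at_indices(data: Dict, indices: List[int]) -> Dict:
    """Delete tokens at `indices` from data['x'] and shift the subj/obj spans.

    Assumes inclusive [start, end] spans and that no deleted index lies
    inside an entity span.
    """
    drop = set(indices)
    tokens = [t for i, t in enumerate(data["x"]) if i not in drop]

    def shift(pos: int) -> int:
        # Every deleted position before `pos` moves it one step left.
        return pos - sum(1 for d in drop if d < pos)

    return {
        "token": tokens,
        "subj": [shift(p) for p in data["subj"]],
        "obj": [shift(p) for p in data["obj"]],
    }
```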

insert_field_after_indices(field, indices, new_item)

Insert words after the given indices in the sentence.

Parameters

field (string) – field to be operated on
indices (list) – a list of indices at which to insert
new_item (list) – list of items to be inserted

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys
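A sketch of inserting after indices while keeping entity spans aligned. It assumes that new_items[k] is a list of tokens placed right after token indices[k] and that spans are inclusive. The code records where each original token lands and remaps the span endpoints through that table. `insert_after_indices` is a hypothetical helper, not the library method.

```python
from typing import Dict, List

def insert_after_indices(data: Dict, indices: List[int],
                         new_items: List[List[str]]) -> Dict:
    """Insert new_items[k] right after token indices[k]; remap subj/obj spans."""
    inserts = dict(zip(indices, new_items))
    tokens: List[str] = []
    new_pos: List[int] = []  # new_pos[i] = position of original token i afterwards
    for i, tok in enumerate(data["x"]):
        new_pos.append(len(tokens))
        tokens.append(tok)
        tokens.extend(inserts.get(i, []))  # inserted tokens follow token i
    return {
        "token": tokens,
        "subj": [new_pos[p] for p in data["subj"]],
        "obj": [new_pos[p] for p in data["obj"]],
    }
```

Because the end of a span maps to its own token, an insertion after the last token of an entity stays outside that entity.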

insert_field_before_indices(field, indices, new_item)

Insert words before the given indices in the sentence.

Parameters

field (string) – field to be operated on
indices (list) – a list of indices at which to insert
new_item (list) – list of items to be inserted

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys
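The "before" variant differs only in emitting the inserted tokens before recording a token's new position. The sketch makes the same assumptions: new_items[k] is placed before token indices[k], and spans are inclusive. `insert_before_indices` is hypothetical.

```python
from typing import Dict, List

def insert_before_indices(data: Dict, indices: List[int],
                          new_items: List[List[str]]) -> Dict:
    """Insert new_items[k] right before token indices[k]; remap subj/obj spans."""
    inserts = dict(zip(indices, new_items))
    tokens: List[str] = []
    new_pos: List[int] = []  # new_pos[i] = position of original token i afterwards
    for i, tok in enumerate(data["x"]):
        tokens.extend(inserts.get(i, []))  # inserted tokens precede token i
        new_pos.append(len(tokens))
        tokens.append(tok)
    return {
        "token": tokens,
        "subj": [new_pos[p] for p in data["subj"]],
        "obj": [new_pos[p] for p in data["obj"]],
    }
```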

replace_sample_fields(data)

Replace the sample fields after an RE transformation.

Parameters: data (dict) – contains the transformed x, subj, and obj keys
Return RESample: transformed sample

stan_ner_transform()

Generate the NER tag list.

Return list: NER tags

get_pos()

Get the POS tags of the sentence.

Return list: POS tags

dump()

Output the data sample as a dict.

Return dict: containing x, subj, obj, y and sample_id

class textflint.generation_layer.transformation.RE.swap_ent.Transformation(**kwargs)

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>

transform(sample, n=1, field='x', **kwargs)

Transform a data sample into a list of Samples.

Parameters

sample (Sample) – Data sample for augmentation.
n (int) – Maximum number of unique augmented outputs; defaults to 1.
field (str|list) – Indicate which fields to apply transformations.
**kwargs (dict) – other auxiliary parameters.

Returns

list of Sample
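A toy sketch of the abstract-base pattern described above. It uses plain dicts in place of Sample objects. The public `transform` delegates to a subclass hook and keeps at most `n` unique outputs for the given field. `MiniTransformation`, `_transform`, and `UpperCase` are hypothetical names and are not textflint's classes or internal API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class MiniTransformation(ABC):
    """Toy stand-in for an abstract transformation base class."""

    def transform(self, sample: Dict[str, Any], n: int = 1, field: str = "x",
                  **kwargs) -> List[Dict[str, Any]]:
        # Ask the subclass for candidates, then keep at most n unique ones.
        candidates = self._transform(sample, n=n, field=field, **kwargs)
        unique: List[Dict[str, Any]] = []
        seen = set()
        for cand in candidates:
            key = tuple(cand[field])
            if key not in seen:
                seen.add(key)
                unique.append(cand)
        return unique[:n]

    @abstractmethod
    def _transform(self, sample: Dict[str, Any], n: int = 1, field: str = "x",
                   **kwargs) -> List[Dict[str, Any]]:
        """Return candidate transformed samples."""

class UpperCase(MiniTransformation):
    """Example subclass: upper-case every token of the chosen field."""

    def _transform(self, sample, n=1, field="x", **kwargs):
        return [dict(sample, **{field: [t.upper() for t in sample[field]]})]
```

Keeping deduplication and truncation in the base class means each subclass only has to produce candidates.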

classmethod sample_num(x, num)

Get ‘num’ samples from x.

Parameters

x (list) – list to sample
num (int) – sample number

Returns

At most ‘num’ unique samples.
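One way to obtain "at most num unique samples" from a list, assuming the items are hashable and that a random draw is acceptable. This `sample_num` is a standalone hypothetical function and may differ from the library's classmethod.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def sample_num(x: Sequence[T], num: int, seed: Optional[int] = None) -> List[T]:
    """Return at most `num` unique items drawn at random from `x`."""
    unique = list(dict.fromkeys(x))  # de-duplicate while preserving order
    if num >= len(unique):
        return unique  # fewer unique items than requested: return them all
    return random.Random(seed).sample(unique, num)
```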

textflint.generation_layer.transformation.RE.swap_ent.download_if_needed(folder_name)

The folder is saved as .cache/textflint/[folder_name]. If it does not exist on disk, its zip file is downloaded and extracted.

Parameters: folder_name (str) – path to folder or file in cache
Returns: path to the downloaded folder or file on disk
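A sketch of the cache-or-download logic described above, built only on the standard library. Several assumptions are involved: the cache root defaults to ~/.cache/textflint, the archive lives at f"{base_url}/{folder_name}.zip", and it unpacks into an entry named after `folder_name`. The caller must supply `base_url`, because the real download host is not given here. This is a hypothetical helper, not textflint's function.

```python
import os
import urllib.request
import zipfile

DEFAULT_CACHE = os.path.join(os.path.expanduser("~"), ".cache", "textflint")  # assumed

def download_if_needed(folder_name: str, base_url: str,
                       cache_root: str = DEFAULT_CACHE) -> str:
    """Return the cached path for `folder_name`, fetching and unzipping it if missing."""
    target = os.path.join(cache_root, folder_name)
    if os.path.exists(target):
        return target  # cache hit: nothing to download
    parent = os.path.dirname(target)
    os.makedirs(parent, exist_ok=True)
    zip_path = target + ".zip"
    # Assumed archive location; base_url is supplied by the caller.
    urllib.request.urlretrieve(f"{base_url}/{folder_name}.zip", zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(parent)  # assumed to create `target`
    os.remove(zip_path)
    return target
```

The early return on a cache hit means no network access happens once the folder exists.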