textflint.generation_layer.generator.dp_generator
DP Generator Class
-
class textflint.generation_layer.generator.dp_generator.DPGenerator(task='DP', max_trans=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Bases: textflint.generation_layer.generator.generator.Generator
Dependency Parsing generator that applies data generation functions to Dependency Parsing data.
-
class textflint.generation_layer.generator.dp_generator.DPSample(data, origin=None, sample_id=None)
Bases: textflint.input_layer.component.sample.sample.Sample
DP Sample class to hold the data info and provide atomic operations.
-
load(data)
Convert a data dict to a DPSample and get the matched brackets.
- Parameters
data (dict) – contains the keys ‘word’, ‘postag’, ‘head’ and ‘deprel’.
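As a rough illustration of the expected input, here is a hypothetical `data` dict for a four-word sentence. The conventions are assumptions, not taken from textflint: one entry per word in every list, and CoNLL-style heads (1-based word positions, 0 for the root). `check_dp_data` is a hypothetical helper, not part of the DPSample API.

```python
# Hypothetical DP data dict; assumes CoNLL-style heads (1-based, 0 = root).
data = {
    "word": ["The", "cat", "sat", "."],
    "postag": ["DT", "NN", "VBD", "."],
    "head": [2, 3, 0, 3],
    "deprel": ["det", "nsubj", "root", "punct"],
}


def check_dp_data(data):
    """Hypothetical sanity check: all four keys present and aligned."""
    keys = ("word", "postag", "head", "deprel")
    missing = [k for k in keys if k not in data]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    n = len(data["word"])
    # Every field must carry exactly one entry per word.
    if any(len(data[k]) != n for k in keys):
        raise ValueError("all fields must have one entry per word")
    # Every head must be 0 (root) or the position of an existing word.
    if any(not 0 <= h <= n for h in data["head"]):
        raise ValueError("head index out of range")
    return True


print(check_dp_data(data))  # True
```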
-
insert_field_after_indices(field, indices, items)
Insert items after multiple given indices of the field value at the same time.
- Parameters
field (str) – Only the value ‘x’ is supported.
indices (list) – shape: indices_num
items (list) – shape: indices_num, corresponding to indices
- Returns ~DPSample
The sentence with words added.
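A minimal sketch of the "at the same time" semantics on a plain word list, assuming that indices refer to positions in the original sentence and are distinct. `insert_after_indices` is a hypothetical helper; it does not update `head` or the other DPSample fields.

```python
def insert_after_indices(words, indices, items):
    """Insert items[k] after words[indices[k]] for every k, all at once.

    Indices refer to the ORIGINAL list and are assumed distinct; each item
    may be a single word or a list of words.
    """
    if len(indices) != len(items):
        raise ValueError("indices and items must have the same length")
    result = list(words)  # leave the input list untouched
    # Insert from the rightmost index leftwards so earlier insertions do not
    # shift the positions of later ones.
    for index, item in sorted(zip(indices, items), key=lambda p: p[0], reverse=True):
        new_words = item if isinstance(item, list) else [item]
        result[index + 1:index + 1] = new_words
    return result


print(insert_after_indices(["The", "cat", "sat", "down"], [0, 2], ["big", "quietly"]))
# ['The', 'big', 'cat', 'sat', 'quietly', 'down']
```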
-
insert_field_after_index(field, ins_index, new_item)
Insert the given data after the given index.
- Parameters
field (str) – Only the value ‘x’ is supported.
ins_index (int) – The index after which the word will be inserted.
new_item (str) – The word to be inserted.
- Returns ~DPSample
The sentence with one word added.
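In a dependency tree, inserting a word also changes the positions that `head` values point to. The hypothetical `insert_word_after` below shows one way to re-index heads, assuming 1-based heads with 0 for the root, a 0-based `ins_index` into the word list, and a caller-supplied head for the new word; how DPSample itself assigns a head to an inserted word is not described on this page.

```python
def insert_word_after(words, heads, ins_index, new_word, new_head):
    """Insert new_word after words[ins_index] and re-index the heads.

    heads: 1-based head positions (0 = root).
    new_head: head of the inserted word, expressed in the NEW numbering.
    """
    if not 0 <= ins_index < len(words):
        raise IndexError("ins_index out of range")
    # The new word lands at 0-based slot ins_index + 1, i.e. 1-based ins_index + 2.
    new_pos = ins_index + 2
    # Heads pointing at words that now sit right of the insertion point move
    # one position right; the root marker 0 is unchanged.
    shifted = [h + 1 if h >= new_pos else h for h in heads]
    new_words = words[:ins_index + 1] + [new_word] + words[ins_index + 1:]
    new_heads = shifted[:ins_index + 1] + [new_head] + shifted[ins_index + 1:]
    return new_words, new_heads


# "The cat sat ." -> "The big cat sat ." with "big" attached to "cat" (now position 3).
print(insert_word_after(["The", "cat", "sat", "."], [2, 3, 0, 3], 0, "big", 3))
# (['The', 'big', 'cat', 'sat', '.'], [3, 3, 4, 0, 4])
```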
-
insert_field_before_indices(field, indices, items)
Insert items before multiple given indices of the field value at the same time.
- Parameters
field (str) – Only the value ‘x’ is supported.
indices (list) – shape: indices_num
items (list) – shape: indices_num, corresponding to indices
- Returns ~DPSample
The sentence with words added.
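The before-variant can be sketched on a plain word list in a similar way; unlike inserting after an index, it can place a word at the very start of the sentence (index 0). `insert_before_indices` is a hypothetical helper that assumes distinct indices referring to the original positions, and it does not touch `head` or the other fields.

```python
def insert_before_indices(words, indices, items):
    """Insert items[k] before words[indices[k]] for every k, all at once.

    Indices refer to the ORIGINAL list and are assumed distinct; each item
    may be a single word or a list of words.
    """
    if len(indices) != len(items):
        raise ValueError("indices and items must have the same length")
    result = list(words)
    # Work right to left so each insertion leaves the smaller indices valid.
    for index, item in sorted(zip(indices, items), key=lambda p: p[0], reverse=True):
        result[index:index] = item if isinstance(item, list) else [item]
    return result


print(insert_before_indices(["cat", "sat", "down"], [0, 2], ["The", ["very", "quietly"]]))
# ['The', 'cat', 'sat', 'very', 'quietly', 'down']
```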
-
insert_field_before_index(field, ins_index, new_item)
Insert the given data before the given position.
- Parameters
field (str) – Only the value ‘x’ is supported.
ins_index (int) – The index before which the word will be inserted.
new_item (str) – The word to be inserted.
- Returns ~DPSample
The sentence with one word added.
-
delete_field_at_indices(field, indices)
Delete items at multiple given scopes of the field value.
- Parameters
field (str) – Only the value ‘x’ is supported.
indices (list) – shape: indices_num. Each index can be:
  - an int, indicating a single item to delete, or a list of ints such as [1, 2, 3];
  - a pair such as (0, 3), indicating the items from 0 to 3 (3 not included);
  - a slice, which is converted to a list.
- Returns ~DPSample
The sentence with words deleted.
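The accepted index forms can be normalized into one set of positions before anything is deleted, so that every spec refers to the original sentence. This sketch assumes a tuple `(start, end)` denotes a half-open range and a list denotes explicit positions; `normalize_indices` and `delete_at_indices` are hypothetical helpers working on a plain word list.

```python
def normalize_indices(spec, length):
    """Expand one index spec into a list of 0-based positions (hypothetical).

    int -> that single position; list -> the listed positions;
    tuple (start, end) -> range(start, end), end excluded;
    slice -> the positions it selects in a sequence of `length` items.
    """
    if isinstance(spec, int):
        return [spec]
    if isinstance(spec, tuple) and len(spec) == 2:
        return list(range(spec[0], spec[1]))
    if isinstance(spec, slice):
        return list(range(*spec.indices(length)))
    if isinstance(spec, list):
        return list(spec)
    raise TypeError(f"unsupported index spec: {spec!r}")


def delete_at_indices(words, indices):
    """Delete every position named by `indices` in a single pass."""
    doomed = set()
    for spec in indices:
        doomed.update(normalize_indices(spec, len(words)))
    return [w for i, w in enumerate(words) if i not in doomed]


words = ["a", "b", "c", "d", "e", "f", "g"]
print(delete_at_indices(words, [0, (2, 4), slice(5, 7)]))
# ['b', 'e']
```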
-
delete_field_at_index(field, del_index)
Delete data at the given position.
- Parameters
field (str) – Only the value ‘x’ is supported.
del_index (int|list|slice) – can be:
  - an int, indicating a single item to delete, or a list of ints such as [1, 2, 3];
  - a pair such as (0, 3), indicating the items from 0 to 3 (3 not included);
  - a slice, which is converted to a list.
- Returns ~DPSample
The sentence with one word deleted.
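Deleting a word from a dependency tree also requires repairing `head`: dependents of the removed word lose their head, and positions to its right shift left. The hypothetical `delete_word` below re-attaches orphaned dependents to the deleted word's own head. That is one possible policy among several and not necessarily what DPSample does; it assumes 1-based heads with 0 for the root. Under this policy, deleting the root attaches its dependents to 0, which can leave several roots.

```python
def delete_word(words, heads, del_index):
    """Delete words[del_index] and repair the 1-based heads (0 = root).

    Hypothetical policy: dependents of the deleted word are re-attached to
    the deleted word's own head.
    """
    if not 0 <= del_index < len(words):
        raise IndexError("del_index out of range")
    del_pos = del_index + 1        # 1-based position of the deleted word
    parent = heads[del_index]      # head inherited by orphaned dependents
    new_heads = []
    for i, h in enumerate(heads):
        if i == del_index:
            continue
        if h == del_pos:
            h = parent
        # Heads to the right of the deleted word shift one step left.
        new_heads.append(h - 1 if h > del_pos else h)
    new_words = words[:del_index] + words[del_index + 1:]
    return new_words, new_heads


# Remove "big" from "The big cat sat ."
print(delete_word(["The", "big", "cat", "sat", "."], [3, 3, 4, 0, 4], 1))
# (['The', 'cat', 'sat', '.'], [2, 3, 0, 3])
```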
-
class textflint.generation_layer.generator.dp_generator.Generator(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Bases: abc.ABC
Transformation controller that applies multiple transformations to each data sample.
-
__init__(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
- Parameters
task (str) – The task that the data to be transformed belongs to.
max_trans (int) – Maximum number of transformed samples generated from one original sample per transformation.
random_seed (int) – Random number seed used to make generation reproducible.
fields (str|list) – The field or fields to apply transformations to. Transforming multiple fields is only supported for some special tasks, such as SM and NLI.
trans_methods (list) – List of transformation names.
trans_config (dict) – Transformation class configs, used to control the behavior of transformations.
return_unk (bool) – Some transformations may generate unknown labels, e.g. when inserting a word into a sequence in the NER task. If set to False, these transformations are skipped.
sub_methods (list) – List of subpopulation names.
sub_config (dict) – Subpopulation class configs, used to control the behavior of subpopulations.
attack_methods (str) – Path to the Python file containing the Attack instances.
validate_methods (list) – Confidence calculation functions.
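To make `max_trans`, `random_seed` and `return_unk` concrete, here is a toy controller loop. It is a sketch under assumptions, not textflint's implementation: `run_transformations` is hypothetical, and each transformation is a hypothetical function returning a list of candidate outputs, paired with a flag saying whether it can produce unknown labels.

```python
import random


def run_transformations(samples, transformations, max_trans=1, random_seed=1, return_unk=True):
    """Toy controller: apply each transformation to each sample.

    transformations: dict name -> (function, produces_unk_labels), where
    function(sample) returns a list of candidate transformed samples.
    """
    rng = random.Random(random_seed)  # seeded for reproducible output
    results = {}
    for name, (func, produces_unk) in transformations.items():
        if produces_unk and not return_unk:
            continue  # skip transformations that may create unknown labels
        generated = []
        for sample in samples:
            candidates = func(sample)
            # Keep at most max_trans candidates per original sample.
            k = min(max_trans, len(candidates))
            generated.extend(rng.sample(candidates, k))
        results[name] = generated
    return results


trans = {
    "Upper": (lambda s: [s.upper()], False),
    "AppendWord": (lambda s: [s + " indeed", s + " really"], True),
}
samples = ["the cat sat", "dogs bark"]
print(run_transformations(samples, trans, return_unk=False))
# {'Upper': ['THE CAT SAT', 'DOGS BARK']}
```

Drawing from a dedicated `random.Random` instance, rather than the global generator, keeps the selection reproducible for a given seed without affecting other code.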
-
prepare(dataset)
Check the dataset.
- Parameters
dataset (textflint.Dataset) – the input dataset
-
generate(dataset, model=None)
Yield the possible generated samples for dataset.
- Parameters
dataset (textflint.Dataset) – the input dataset
model (textflint.FlintModel) – the model to attack, if given.
- Returns
Yields tuples of (original samples, new samples, generated function string).
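Because `generate` yields its results, callers typically iterate over it. The hypothetical stand-in below shows the shape of that loop on plain word lists; it assumes the third element of each triple is the transformation's name and that samples a transformation cannot apply to are left out. Neither assumption is confirmed by this page.

```python
def generate(dataset, transformations):
    """Hypothetical stand-in for Generator.generate.

    Yields one (original samples, new samples, transformation name) triple
    per transformation, keeping only originals that produced an output.
    """
    for name, func in transformations.items():
        originals, new_samples = [], []
        for sample in dataset:
            out = func(sample)
            if out is not None:  # the transformation may not apply
                originals.append(sample)
                new_samples.append(out)
        if new_samples:
            yield originals, new_samples, name


def swap_first_two(words):
    # Toy transformation; returns None when it cannot apply.
    return [words[1], words[0]] + words[2:] if len(words) > 1 else None


dataset = [["cats", "sleep"], ["hi"]]
for originals, new_samples, name in generate(dataset, {"SwapFirstTwo": swap_first_two}):
    print(name, originals, new_samples)
# SwapFirstTwo [['cats', 'sleep']] [['sleep', 'cats']]
```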
-
generate_by_transformations(dataset, **kwargs)
Generate samples by a list of transformation methods.
- Parameters
dataset – the input dataset
- Returns
(original samples, new samples, generated function string)
-