textflint.generation_layer.generator.cws_generator
-
class
textflint.generation_layer.generator.cws_generator.
CWSGenerator
(task='CWS', fields='x', max_trans=1, trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Bases:
textflint.generation_layer.generator.generator.Generator
CWSGenerator applies the data generation functions for the CWS (Chinese Word Segmentation) task.
-
class
textflint.generation_layer.generator.cws_generator.
Generator
(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
Bases:
abc.ABC
Transformation controller that applies multiple transformations to each data sample.
-
__init__
(task='UT', max_trans=1, random_seed=1, fields='x', trans_methods=None, trans_config=None, return_unk=True, sub_methods=None, sub_config=None, attack_methods=None, validate_methods=None, **kwargs)
- Parameters
task (str) – Name of the task the data belongs to.
max_trans (int) – Maximum number of transformed samples generated from one original sample per transformation.
random_seed (int) – Random seed, used to make generation reproducible.
fields (str|list) – Field or fields to which transformations are applied. Transforming multiple fields is only supported for some tasks, such as SM and NLI.
trans_methods (list) – List of transformation names.
trans_config (dict) – Configuration for the transformation classes, used to control their behavior.
return_unk (bool) – Some transformations may produce unk labels, e.g. inserting a word into a sequence in the NER task. If set to False, these transformations are skipped.
sub_methods (list) – List of subpopulation names.
sub_config (dict) – Configuration for the subpopulation classes, used to control their behavior.
attack_methods (str) – Path to the Python file containing the Attack instances.
validate_methods (list) – Functions used to calculate confidence.
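The interaction of fields, max_trans and random_seed can be illustrated with a small standalone sketch. It does not import textflint; normalize_fields and toy_swap are hypothetical helpers invented for illustration, and the actual library may handle these arguments differently.

```python
import random


def normalize_fields(fields):
    """Accept one field name or a list of names, mirroring `fields (str|list)`."""
    return [fields] if isinstance(fields, str) else list(fields)


def toy_swap(text, max_trans=1, random_seed=1):
    """Hypothetical transformation: swap one pair of adjacent characters.

    Returns at most `max_trans` variants. The same `random_seed` always
    selects the same swap positions, so the output is reproducible.
    """
    if len(text) < 2:
        return []  # nothing to swap, so no transformed sample
    rng = random.Random(random_seed)  # local RNG, leaves global state untouched
    positions = list(range(len(text) - 1))
    rng.shuffle(positions)
    variants = []
    for i in positions[:max_trans]:
        chars = list(text)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))
    return variants


print(normalize_fields("x"))
print(toy_swap("abcdef", max_trans=2, random_seed=1))
```

Calling toy_swap twice with the same seed gives identical results, which is the property random_seed is documented to provide for generation.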
-
prepare
(dataset)
Checks the input dataset.
- Parameters
dataset (textflint.Dataset) – the input dataset
-
generate
(dataset, model=None)
Returns a list of possible generated samples for dataset.
- Parameters
dataset (textflint.Dataset) – the input dataset
model (textflint.FlintModel) – the model to attack if given.
- Returns
Yields tuples of (original samples, new samples, generated function string).
-
generate_by_transformations
(dataset, **kwargs)
Generate samples by a list of transformation methods.
- Parameters
dataset – the input dataset
- Returns
(original samples, new samples, generated function string)
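The shape of the returned triples can be sketched with a standalone toy generator. This is not textflint code: toy_generate and the two lambda transformations are hypothetical stand-ins, and plain strings and lists replace the library's sample and dataset objects. It assumes one triple is produced per transformation and that only samples with at least one transformed result are kept as originals.

```python
def toy_generate(samples, transformations, max_trans=1):
    """Yield (original samples, new samples, transformation name) per transformation."""
    for name, transform in transformations.items():
        originals, generated = [], []
        for sample in samples:
            # Keep at most `max_trans` results for each original sample.
            new_samples = transform(sample)[:max_trans]
            if new_samples:
                originals.append(sample)
                generated.extend(new_samples)
        yield originals, generated, name


# Hypothetical transformations: each returns a list of variants,
# or an empty list when the sample cannot be transformed.
transformations = {
    "UpperCase": lambda text: [text.upper()] if text != text.upper() else [],
    "Reverse": lambda text: [text[::-1]] if len(text) > 1 else [],
}

for originals, generated, name in toy_generate(["ab", "C"], transformations):
    print(name, originals, generated)
# UpperCase ['ab'] ['AB']
# Reverse ['ab'] ['ba']
```

Because the results are yielded one transformation at a time, a caller can process or save each batch before the next transformation runs.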
-