AttackRecipe

An AttackRecipe searches for a perturbation of an input text that satisfies the attack’s goal, namely fooling the given FlintModel. Unlike a Transformation, an AttackRecipe requires the prediction scores of the target model. textflint provides an interface for integrating the easy-to-use adversarial attack recipes implemented in textattack; refer to the textattack documentation for more information about the supported recipes. This section briefly introduces how to use AttackRecipe in textflint.

Using an `AttackRecipe` based on `textattack`

Define a list of attacks in a Python file, without specifying the victim model. For example, create an attack_ins.py file with the following code:

from textattack.goal_functions import UntargetedClassification
from textattack.search_methods import GreedySearch
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.transformations import WordSwapWordNet
from textflint.generation_layer.attack import Attack # Note that here we use the Attack from textflint

# Define the goal function class
goal_function = UntargetedClassification
# We'll constrain modification of already modified indices and stopwords
constraints = [RepeatModification(),
               StopwordModification()]
# We're going to use WordSwapWordNet as the attack transformation.
transformation = WordSwapWordNet()
# We'll use the Greedy search method
search_method = GreedySearch()
# Now, let's make the attack from the 4 components:
attack = Attack(goal_function, constraints, transformation, search_method)

# ...
# Collect one or more attacks into the attack list
attacks = [attack]

Set the path to this file in the attack_methods field of the JSON config file. For example, the config file SA.json might look as follows:

{
  "task": "SA",
  "max_trans": 1,
  "fields": "x",
  "return_unk": true,
  "trans_config": {},
  "trans_methods": [],
  "sub_methods": [],
  "attack_methods": "/home/yjc/codes/attack_demo/attack_ins.py"
}
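
Standard JSON has no comment syntax, so one option is to generate the config from Python and check the attack file path before running anything. The following is a minimal sketch using only the standard library. The helper name `write_sa_config` and the temporary paths are hypothetical, and the keys simply mirror the example config above.

```python
import json
import tempfile
from pathlib import Path


def write_sa_config(config_path, attack_file):
    """Write an SA config whose attack_methods field points at attack_file.

    Hypothetical helper: the keys copy the example SA.json in this tutorial.
    """
    attack_file = Path(attack_file)
    if not attack_file.is_file():
        raise FileNotFoundError(f"attack file not found: {attack_file}")
    config = {
        "task": "SA",
        "max_trans": 1,
        "fields": "x",
        "return_unk": True,
        "trans_config": {},
        "trans_methods": [],
        "sub_methods": [],
        "attack_methods": str(attack_file),
    }
    Path(config_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Demo with throwaway paths instead of the real project directory
with tempfile.TemporaryDirectory() as tmp:
    attack_path = Path(tmp) / "attack_ins.py"
    attack_path.write_text("attacks = []\n", encoding="utf-8")
    config_path = Path(tmp) / "SA.json"
    written = write_sa_config(config_path, attack_path)
    # Round-trip through the json module to confirm the file is valid JSON
    reloaded = json.loads(config_path.read_text(encoding="utf-8"))
```

Failing early on a missing attack file gives a clearer error than a failure partway through an engine run.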

Load the SA test dataset:

[1]:

from textflint.input_layer.model.test_model.model_helper import data_loader_csv
from textflint.common.utils.install import download_if_needed
test_data_set = data_loader_csv(download_if_needed('DATASET/sa_test.csv'))

Create your own modelwrapper that implements the evaluate and encode functions. More details can be found in the modelwrapper tutorial.

[ ]:

from textflint.input_layer.model.flint_model.textcnn_torch import TextCNNTorch
model = TextCNNTorch()

Feed the dataset test_data_set, the output path out_dir_path, the config object config, and the model wrapper model to the SA engine, and run it. textflint automatically scans the attack_ins.py file and loads the attacks defined inside.

[ ]:

from textflint.engine import Engine
from textflint.input_layer.config.config import Config

config = Config.from_json_file('/home/yjc/codes/attack_demo/SA.json')
out_dir_path = '/home/yjc/codes/attack_demo/test_result'

engine = Engine('SA')
engine.run(test_data_set, out_dir_path, config, model)
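
To make the scanning step less opaque, here is a minimal sketch of how a Python file can be imported by path and a module-level list retrieved from it. This is not textflint's actual loader. It only illustrates the general mechanism with the standard library's `importlib`, and it assumes the list is named `attacks` as in the example file. The helper name `load_attacks` is hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_attacks(py_path, list_name="attacks"):
    """Import a Python file by path and return its module-level attack list.

    Hypothetical helper for illustration; not part of textflint.
    """
    spec = importlib.util.spec_from_file_location("attack_ins", py_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import {py_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    attacks = getattr(module, list_name, None)
    if not isinstance(attacks, list):
        raise ValueError(f"{py_path} does not define a list named {list_name!r}")
    return attacks


# Demo with a placeholder file; a real attack_ins.py would build Attack objects
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "attack_ins.py"
    demo_file.write_text('attacks = ["placeholder_attack"]\n', encoding="utf-8")
    loaded = load_attacks(demo_file)
    print(loaded)
```

Running a check like this on your own attack file before calling the engine can surface import errors early.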

The adversarial samples generated by the AttackRecipe are also saved automatically to the directory out_dir_path. We can take a quick look at the contents:

[ ]:

with open('/home/yjc/codes/attack_demo/test_result/ori_(Search_GreedySearch)_(Goal_UntargetedClassification)_(Trans_WordSwapWordNet)_(Cons_RepeatModification_StopwordModification)_2702.json', 'r') as f:
    for ex in f.readlines()[:2]:
        print("original: ", ex)

with open('/home/yjc/codes/attack_demo/test_result/trans_(Search_GreedySearch)_(Goal_UntargetedClassification)_(Trans_WordSwapWordNet)_(Cons_RepeatModification_StopwordModification)_2702.json', 'r') as f:
    for ex in f.readlines()[:2]:
        print("transformed: ", ex)
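
The loops above read each file in full just to print two lines. A lighter alternative is sketched below, assuming each line of the output files is a single JSON object (this is an assumption about the file layout, not something confirmed here). The helper name `head_json_lines` and the demo records are hypothetical.

```python
import json
import tempfile
from itertools import islice
from pathlib import Path


def head_json_lines(path, n=2):
    """Return the first n non-empty lines of a file, each parsed as JSON.

    Assumes a one-JSON-object-per-line layout.
    """
    with open(path, "r", encoding="utf-8") as f:
        non_empty = (line.strip() for line in f if line.strip())
        # islice stops after n lines, so the rest of the file is never read
        return [json.loads(line) for line in islice(non_empty, n)]


# Demo on made-up records; point the path at a real output file in practice
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.json"
    records = [{"x": "a fine film"}, {"x": "a dull film"}, {"x": "a long film"}]
    demo_path.write_text(
        "\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8"
    )
    first_two = head_json_lines(demo_path, n=2)
    for ex in first_two:
        print("example: ", ex)
```

Parsing each line into a dictionary also makes it straightforward to compare fields of original and transformed samples side by side.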

Conclusion

In this tutorial, we briefly described how to use textattack’s AttackRecipe to generate adversarial samples. textflint also supports loading multiple attacks at once and executing them all by simply running the engine.
