Validator

A Validator measures the quality of samples generated by a Transformation or an AttackRecipe. This tutorial briefly shows how to use the built-in Validator SentenceEncoding.

Using the built-in Validator SentenceEncoding

[1]:
from textflint.generation_layer.validator.sentence_encoding import SentenceEncoding
from textflint.input_layer.dataset.dataset import Dataset
from textflint.input_layer.component.sample.sa_sample import SASample


# 1. Define an original sentence.
ori_sentence = 'There is a book on the desk .'
ori_sample = SASample({'x': ori_sentence, 'y': '1'})

# 2. Define some arbitrary transformed sentences.
trans_sentences = ['There is a book on the desk .',
                   'There is a book on the floor .',
                   'There is a cookie on the desk .',
                   'There is a desk on the book .',
                   'There desk a on the is a book .']

# 3. Feed the samples to Dataset
ori_dataset = Dataset('SA')
trans_dataset = Dataset('SA')
ori_dataset.append(ori_sample, sample_id=0)
for trans_sentence in trans_sentences:
    trans_dataset.append(
        SASample({'x': trans_sentence, 'y': '1'}), sample_id=0)

# 4. Run the SentenceEncoding Validator
score = SentenceEncoding(ori_dataset, trans_dataset, 'x').score
print(score)

[1.0, 0.9341484308242798, 0.8436124324798584, 0.9751133322715759, 0.8803119659423828]
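
Each score compares the original sentence with one transformed sentence, which is why the unchanged first sentence scores 1.0. The sketch below illustrates the general idea with cosine similarity over toy bag-of-words vectors. This is an assumption made for illustration, not the encoder that SentenceEncoding uses, and the helper names `bow_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def bow_vector(sentence):
    """Toy stand-in for a sentence encoder: lowercase token counts."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


ori = 'There is a book on the desk .'
candidates = ['There is a book on the desk .',
              'There is a book on the floor .',
              'There desk a on the is a book .']

# One similarity value per candidate, in the same order as the candidates.
toy_scores = [cosine(bow_vector(ori), bow_vector(c)) for c in candidates]
print(toy_scores)
```

Token counts ignore word order, so this toy version is only a rough analogue. The real scores above penalize the scrambled last sentence more heavily than a word-count comparison would.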

Using SentenceEncoding to validate BreadWordSwap

Recall that we defined a Transformation, BreadWordSwap, in the transformation tutorial. Here we use SentenceEncoding to check whether the transformed data stays semantically close to the original.

[2]:
from textflint.generation_layer.transformation import WordSubstitute

class BreadWordSwap(WordSubstitute):
    r"""
    Word swap that randomly replaces words with "bread".

    """
    def _get_candidates(self, word, pos=None, n=1):
        r"""
        Returns a list containing "bread".

        :param word: str, the word to replace
        :param pos: str, the pos of the word to replace
        :param n: the number of returned words
        :return: a candidates list
        """
        return ['bread']

Let’s first look at the semantic similarity after replacing one word with bread:

[3]:
trans = BreadWordSwap(trans_min=1) # swap 1 word
trans_sample = trans.transform(ori_sample)
print(trans_sample[0].dump())

trans_dataset = Dataset('SA')
trans_dataset.append(trans_sample[0])
score = SentenceEncoding(ori_dataset, trans_dataset, 'x').score
print(score)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-b30d3ff8660e> in <module>
----> 1 trans = BreadWordSwap(trans_min=1) # swap 1 word
      2 trans_sample = trans.transform(ori_sample)
      3 print(trans_sample[0].dump())
      4
      5 trans_dataset = Dataset('SA')

TypeError: Can't instantiate abstract class BreadWordSwap with abstract methods skip_aug
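
The traceback indicates that WordSubstitute declares `skip_aug` as an abstract method, so a subclass that only defines `_get_candidates` cannot be instantiated. The sketch below reproduces that behaviour with Python's `abc` module and shows how implementing the missing method resolves it. `ToySubstitute`, `IncompleteSwap`, `CompleteSwap`, and the `skip_aug(tokens, mask)` signature are hypothetical stand-ins, not textflint's actual classes or API.

```python
from abc import ABC, abstractmethod


class ToySubstitute(ABC):
    """Hypothetical stand-in for a base class with two abstract methods."""

    @abstractmethod
    def _get_candidates(self, word, pos=None, n=1):
        ...

    @abstractmethod
    def skip_aug(self, tokens, mask):
        ...


class IncompleteSwap(ToySubstitute):
    # Only one of the two abstract methods is implemented.
    def _get_candidates(self, word, pos=None, n=1):
        return ['bread']


class CompleteSwap(IncompleteSwap):
    # Hypothetical rule: every token index is eligible for replacement.
    def skip_aug(self, tokens, mask):
        return list(range(len(tokens)))


# Instantiating the incomplete subclass raises TypeError, as in the traceback.
try:
    IncompleteSwap()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Once skip_aug is defined, the subclass can be instantiated.
swap = CompleteSwap()
print(swap.skip_aug(['a', 'book'], mask=[0, 0]))
```

By the same reasoning, BreadWordSwap would need its own `skip_aug` before the cells in this section can run. The expected signature and return value are not shown in this tutorial, so they should be checked against the WordSubstitute base class.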

Next, we look at the semantic similarity after replacing three words:

[ ]:
trans = BreadWordSwap(trans_min=3) # swap 3 words
trans_sample = trans.transform(ori_sample)
print(trans_sample[0].dump())

trans_dataset = Dataset('SA')
trans_dataset.append(trans_sample[0])
score = SentenceEncoding(ori_dataset, trans_dataset, 'x').score
print(score)

Conclusion

This tutorial is meant to show that semantic similarity to the original drops as more words are substituted, which is why a Validator is needed to filter out low-scoring samples.
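
As a rough illustration of such filtering, the sketch below keeps only the transformed sentences whose score reaches a cutoff, reusing the sentences and scores from the first example. The cutoff of 0.9 and the variable names are assumptions chosen for illustration. The sketch uses plain Python lists and does not show textflint's own filtering mechanism.

```python
trans_sentences = ['There is a book on the desk .',
                   'There is a book on the floor .',
                   'There is a cookie on the desk .',
                   'There is a desk on the book .',
                   'There desk a on the is a book .']

# Scores printed by SentenceEncoding in the first example.
scores = [1.0, 0.9341484308242798, 0.8436124324798584,
          0.9751133322715759, 0.8803119659423828]

# Hypothetical cutoff: samples scoring below it are discarded.
threshold = 0.9

kept = [sentence
        for sentence, score in zip(trans_sentences, scores)
        if score >= threshold]

for sentence in kept:
    print(sentence)
```

A suitable cutoff depends on the task. A higher value keeps only near-paraphrases, while a lower one admits more aggressive transformations.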
