Validator
Validator verifies the quality of samples generated by Transformation and AttackRecipe. Here we briefly describe how to use the built-in Validator.
Using the built-in Validator SentenceEncoding
[1]:
from textflint.generation_layer.validator.sentence_encoding import SentenceEncoding
from textflint.input_layer.dataset.dataset import Dataset
from textflint.input_layer.component.sample.sa_sample import SASample
# 1. Define an original sentence.
ori_sentence = 'There is a book on the desk .'
ori_sample = SASample({'x': ori_sentence, 'y': '1'})
# 2. Define several transformed sentences, ranging from identical to scrambled.
trans_sentences = ['There is a book on the desk .',
                   'There is a book on the floor .',
                   'There is a cookie on the desk .',
                   'There is a desk on the book .',
                   'There desk a on the is a book .']
# 3. Feed the samples to Dataset.
ori_dataset = Dataset('SA')
trans_dataset = Dataset('SA')
ori_dataset.append(ori_sample, sample_id=0)
for trans_sentence in trans_sentences:
    trans_dataset.append(
        SASample({'x': trans_sentence, 'y': '1'}), sample_id=0)
# 4. Run the SentenceEncoding Validator
score = SentenceEncoding(ori_dataset, trans_dataset, 'x').score
print(score)
[1.0, 0.9341484308242798, 0.8436124324798584, 0.9751133322715759, 0.8803119659423828]
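This page does not state how SentenceEncoding turns two sentences into a score. One common way to compare sentence encodings is cosine similarity, and the sketch below illustrates that idea on made-up vectors. It is an assumption about the general technique, not textflint's implementation; the function name and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy "embeddings" (hypothetical values, not produced by any real encoder).
original = [1.0, 2.0, 3.0]
identical = [1.0, 2.0, 3.0]
orthogonal = [3.0, 0.0, -1.0]

same_score = cosine_similarity(original, identical)
far_score = cosine_similarity(original, orthogonal)
print(same_score, far_score)
```

Under this reading, an unchanged sentence scores close to 1.0 and the score falls as the transformed sentence drifts away from the original, which matches the pattern in the output above.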
Using SentenceEncoding to validate BreadWordSwap
Recall that we defined a Transformation, BreadWordSwap, in the transformation tutorial. Here we can use SentenceEncoding to test whether the transformed data remains semantically fluent.
[2]:
from textflint.generation_layer.transformation import WordSubstitute

# Note: the TypeError recorded for cell [3] reports that skip_aug is also
# abstract in WordSubstitute; this class does not override it.
class BreadWordSwap(WordSubstitute):
    r"""
    Word swap that randomly replaces words with 'bread'.
    """
    def _get_candidates(self, word, pos=None, n=1):
        r"""
        Returns a list containing 'bread'.
        :param word: str, the word to replace
        :param pos: str, the pos of the word to replace
        :param n: the number of returned words
        :return: a candidates list
        """
        return ['bread']
Let’s first look at the semantic fluency after replacing one word with bread:
[3]:
trans = BreadWordSwap(trans_min=1) # swap 1 word
trans_sample = trans.transform(ori_sample)
print(trans_sample[0].dump())
trans_dataset = Dataset('SA')
trans_dataset.append(trans_sample[0])
score = SentenceEncoding(ori_dataset, trans_dataset, 'x').score
print(score)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-b30d3ff8660e> in <module>
----> 1 trans = BreadWordSwap(trans_min=1) # swap 1 word
2 trans_sample = trans.transform(ori_sample)
3 print(trans_sample[0].dump())
4
5 trans_dataset = Dataset('SA')
TypeError: Can't instantiate abstract class BreadWordSwap with abstract methods skip_aug
Next, we look at the semantic fluency after replacing three words:
[ ]:
trans = BreadWordSwap(trans_min=3)  # swap 3 words
trans_sample = trans.transform(ori_sample)
print(trans_sample[0].dump())
trans_dataset = Dataset('SA')
trans_dataset.append(trans_sample[0])
score = SentenceEncoding(ori_dataset, trans_dataset, 'x').score
print(score)
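The two cells above have no recorded output, so here is a library-free sketch of what replacing one versus three words with bread does to the example sentence. The helper swap_with_bread and its seed parameter are hypothetical and do not reproduce textflint's word-selection logic; the sketch only illustrates how the sentence degrades as more words are swapped.

```python
import random


def swap_with_bread(sentence, n_swaps, seed=0):
    """Replace n_swaps randomly chosen words in sentence with 'bread'."""
    tokens = sentence.split()
    # Only alphabetic tokens are eligible, so the trailing '.' is never replaced.
    eligible = [i for i, tok in enumerate(tokens) if tok.isalpha()]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    for i in rng.sample(eligible, min(n_swaps, len(eligible))):
        tokens[i] = 'bread'
    return ' '.join(tokens)


ori_sentence = 'There is a book on the desk .'
one_swap = swap_with_bread(ori_sentence, 1)
three_swaps = swap_with_bread(ori_sentence, 3)
print(one_swap)
print(three_swaps)
```

Each output could then be scored against the original in the same way as the transformed sentences in cell [1].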
Conclusion
In this tutorial, we show that semantic fluency decreases as the number of substituted words increases, which explains the need to use a Validator to filter out low-scoring samples.
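As a minimal sketch of such filtering, the snippet below keeps only the transformed sentences whose score reaches a cutoff. The sentences and scores are copied from the output of cell [1]; the threshold of 0.9 and the helper filter_by_score are hypothetical choices for illustration, not part of textflint.

```python
# Sentences and scores copied from cell [1] of this tutorial.
trans_sentences = ['There is a book on the desk .',
                   'There is a book on the floor .',
                   'There is a cookie on the desk .',
                   'There is a desk on the book .',
                   'There desk a on the is a book .']
scores = [1.0, 0.9341484308242798, 0.8436124324798584,
          0.9751133322715759, 0.8803119659423828]

THRESHOLD = 0.9  # hypothetical cutoff; tune it for your own data


def filter_by_score(samples, sample_scores, threshold):
    """Return the samples whose score is at least threshold."""
    if len(samples) != len(sample_scores):
        raise ValueError("samples and scores must have the same length")
    return [s for s, sc in zip(samples, sample_scores) if sc >= threshold]


kept = filter_by_score(trans_sentences, scores, THRESHOLD)
for sentence in kept:
    print(sentence)
```

With this cutoff, the cookie sentence and the fully scrambled sentence are dropped. The word-order swap ('desk on the book') is kept, which shows that a similarity score alone does not catch every change in meaning.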