Transformation

To verify robustness comprehensively, textflint offers 20 universal transformations and 60 task-specific transformations, covering 12 NLP tasks. The full list of Transformations can be found on our website or GitHub.

How to use a built-in Transformation

textflint offers multiple Transformations for each task. Here, we give an example of how to use the AddSum Transformation on the Sentiment Analysis (SA) task.

[1]:
# 1. Import the Sample for SA task and the transformation AddSum
from textflint.input_layer.component.sample.sa_sample import SASample
from textflint.generation_layer.transformation.SA.add_sum import AddSum

# 2. Define the SA data sample.
data = {'x': "Brilliant and moving performances by Tom Courtenay and Peter Finch",
        'y': 'positive'}
sa_sample = SASample(data)

# 3. Configure AddSum; here we only add a summary for each person entity.
trans = AddSum(entity_type='person')

# 4. Transform the sample
trans_sample = trans.transform(sa_sample)
[2]:
trans_sample[0].dump()
[2]:
{'x': 'Brilliant and moving performances by Tom Courtenay (Sir Thomas Daniel Courtenay (/ˈkɔːrtni/; born 25 February 1937) is an English actor of stage and screen. After studying at the Royal Academy of Dramatic Art, Courtenay achieved prominence in the 1960s with a series of acclaimed film roles, including The Loneliness of the Long Distance Runner (1962)\u2060, for which he received the BAFTA Award for Most Promising Newcomer to Leading Film Roles\u2060, and Doctor Zhivago (1965), for which he received an Academy Award nomination for Best Supporting Actor. Other notable film roles during this period include Billy Liar (1963), King and Country (1964), for which he was awarded the Volpi Cup for Best Actor at the Venice Film Festival, King Rat (1965), and The Night of the Generals .) and Peter Finch (Frederick George Peter Ingle Finch (28 September 1916 \xa0 – 14 January 1977) was an English-Australian actor. He is best remembered for his role as crazed television anchorman Howard Beale in the 1976 film Network, which earned him a posthumous Academy Award for Best Actor, his fifth Best Actor award from the British Academy of Film and Television Arts, and a Best Actor award from the Golden Globes. )',
 'y': 'positive',
 'sample_id': None}

We can see that a corresponding description has been added after Tom Courtenay and Peter Finch. This transformation should not change the label y, so it is still positive. The sample_id is used when we transform a dataset, and can be ignored here.

Define your own Transformation

Bread word swap

As an introduction to writing transformations for textflint, we’re going to try a very simple transformation: one that replaces random words of a text with the word ‘bread’. In textflint, there’s an abstract WordSubstitute class that handles the heavy lifting of breaking sentences into words and avoiding replacement of stopwords. We can extend WordSubstitute and implement _get_candidates, which returns the replacement candidates for a word; here it always returns ‘bread’. The example also overrides skip_aug, simply delegating to the default token selection. 🍞

[3]:
from textflint.generation_layer.transformation import WordSubstitute

class BreadWordSwap(WordSubstitute):
    r"""
    Word swap that randomly replaces words with 'bread'.

    """
    def _get_candidates(self, word, pos=None, n=1):
        r"""
        Returns a list containing 'bread'.

        :param word: str, the word to replace
        :param pos: str, the pos of the word to replace
        :param n: the number of returned words
        :return: a candidates list
        """
        return ['bread']

    def skip_aug(self, tokens, mask, pos):
        r"""
        Returns the indices of the tokens to be replaced.

        :param list tokens: tokenized words or word with pos tag pairs
        :param mask: the mask passed through to pre_skip_aug
        :param pos: the pos of the tokens, ignored here
        :return list: the indices of the tokens to be replaced

        """
        # Use the default method, which ignores the `pos` value.
        return self.pre_skip_aug(tokens, mask)
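
To make the division of labour more concrete, here is a library-free sketch of the idea behind a word-substitution transformation: split the text into tokens, skip stopwords, and ask a candidate function for a replacement. This is only a toy stand-in, not textflint's implementation. The names toy_word_swap, STOPWORDS, n_swaps and seed are hypothetical, and the stopword set is a tiny placeholder.

```python
import random

# Hypothetical, tiny stopword set for illustration only.
STOPWORDS = {"and", "by", "the", "a", "an", "of"}


def toy_word_swap(text, get_candidates, n_swaps=1, seed=0):
    """Replace up to n_swaps non-stopword tokens using get_candidates."""
    tokens = text.split()
    # Only tokens that are not stopwords are eligible for replacement.
    eligible = [i for i, tok in enumerate(tokens)
                if tok.lower() not in STOPWORDS]
    rng = random.Random(seed)
    chosen = rng.sample(eligible, min(n_swaps, len(eligible)))
    for i in chosen:
        candidates = get_candidates(tokens[i])
        if candidates:
            # Take the first candidate, mirroring a one-element list.
            tokens[i] = candidates[0]
    return " ".join(tokens)


sentence = "Brilliant and moving performances by Tom Courtenay and Peter Finch"
swapped = toy_word_swap(sentence, lambda word: ["bread"], n_swaps=3)
print(swapped)
```

The candidate function plays the role that _get_candidates plays in BreadWordSwap: everything else (tokenising, skipping stopwords, choosing positions) stays in the shared helper.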

Try the Freshly-baked BreadWordSwap

Once we have created our own transformation, we can generate a transformed sample with the following commands:

[4]:
trans = BreadWordSwap()
trans_sample = trans.transform(sa_sample)
trans_sample[0].dump()
[4]:
{'x': 'Brilliant and moving bread by Tom Courtenay and Peter Finch',
 'y': 'positive',
 'sample_id': None}

We can see that only one word, performances, is replaced by bread. We can control the number of replaced words by setting the parameters trans_min, trans_max or trans_p. For example, if you want a sample in which at least three words are replaced by bread, you can set trans_min to 3 as follows:

[5]:
trans = BreadWordSwap(trans_min=3)
trans_sample = trans.transform(sa_sample)
trans_sample[0].dump()
[5]:
{'x': 'Brilliant and moving performances by bread bread and bread Finch',
 'y': 'positive',
 'sample_id': None}
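
One plausible way to read these three parameters is as a proportion clamped between a lower and an upper bound. The sketch below shows that reading in plain Python. It is an assumption for illustration, not textflint's confirmed logic: the function name replacement_count is hypothetical, and the values passed in are example numbers rather than the library's defaults.

```python
def replacement_count(n_tokens, trans_min, trans_max, trans_p):
    """Number of tokens to replace under the assumed clamping rule."""
    # Start from the requested proportion of tokens.
    count = int(trans_p * n_tokens)
    # Never go below the lower bound ...
    count = max(count, trans_min)
    # ... nor above the upper bound, if one is given.
    if trans_max is not None:
        count = min(count, trans_max)
    # A sentence cannot lose more tokens than it has.
    return min(count, n_tokens)


# Example values only: a 10-token sentence with a 10% proportion.
print(replacement_count(10, trans_min=1, trans_max=10, trans_p=0.1))
print(replacement_count(10, trans_min=3, trans_max=10, trans_p=0.1))
```

Under this reading, raising trans_min forces more replacements on short sentences, while trans_max caps the number on long ones.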

Conclusion

In this tutorial, we showed how to use the built-in Transformation AddSum and how to define our own Transformation, BreadWordSwap. We also showed how to control the number of transformed words. You may have noticed that replacing three words in BreadWordSwap makes the resulting sample meaningless and hard to read. The next section shows how to use a Validator to filter out such useless samples.
