textflint.generation_layer.transformation.MRC.add_sent_diverse

Add a distractor sentence to penalize MRC models

This transformation is based on CoreNLP, which is written in Java; recent releases require Java 1.8+. You need to have Java installed to run CoreNLP.

class textflint.generation_layer.transformation.MRC.add_sent_diverse.AddSentDiverse

Bases: textflint.generation_layer.transformation.transformation.Transformation

Generate a distractor sentence and insert it before the sentence that contains the answer.

Example:

origin question: Which NFL team represented the AFC at Super Bowl 50?
transformed distractor: The UNICEF team of Kew Gardens represented
    the UNICEF at Champ Bowl 40.

class textflint.generation_layer.transformation.MRC.add_sent_diverse.ConstituencyParse(tag, children=None, word=None, index=None)

Bases: object

A CoreNLP constituency parse (or a node in a parse tree).

classmethod from_corenlp(s)

Parse the “parse” attribute returned by the CoreNLP parse annotator.

classmethod replace_words(tree, new_words)

Return a new tree, with new words replacing old ones.
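The parse-and-replace behaviour described for this class can be sketched in plain Python. This is a minimal illustration, not textflint's implementation: `ParseNode`, `parse_bracketed`, `leaves` and the free function `replace_words` are hypothetical names, and the input is assumed to be a Penn Treebank style bracketed string of the kind a constituency parser emits.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParseNode:
    """Hypothetical stand-in for a parse node (not textflint's class)."""
    tag: str
    children: List["ParseNode"] = field(default_factory=list)
    word: Optional[str] = None


def parse_bracketed(s: str) -> ParseNode:
    """Parse a string such as '(NP (DT the) (NN team))' into a tree."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node() -> ParseNode:
        nonlocal pos
        assert tokens[pos] == "(", "every node starts with '('"
        tag = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":
            # Leaf node of the form "(TAG word)".
            word = tokens[pos]
            pos += 2  # skip the word and its closing ")"
            return ParseNode(tag, word=word)
        children = []
        while tokens[pos] == "(":
            children.append(read_node())
        pos += 1  # skip the closing ")" of this node
        return ParseNode(tag, children)

    return read_node()


def leaves(node: ParseNode) -> List[str]:
    """Collect the words of the tree from left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


def replace_words(node: ParseNode, new_words: List[str]) -> ParseNode:
    """Build a new tree with the same shape and new leaf words."""
    words = iter(new_words)

    def rebuild(n: ParseNode) -> ParseNode:
        if n.word is not None:
            return ParseNode(n.tag, word=next(words))
        return ParseNode(n.tag, [rebuild(c) for c in n.children])

    return rebuild(node)


tree = parse_bracketed("(NP (DT the) (NN team))")
new_tree = replace_words(tree, ["a", "club"])
```

Building a fresh tree instead of mutating the old one mirrors the "return a new tree" wording above, so the original parse stays available for comparison.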

exception textflint.generation_layer.transformation.MRC.add_sent_diverse.FlintError

Bases: RuntimeError

Default error thrown by textflint functions. FlintError is raised when no more specific error type is given.

class textflint.generation_layer.transformation.MRC.add_sent_diverse.MRCSample(data, origin=None, sample_id=None)

Bases: textflint.input_layer.component.sample.sample.Sample

Sample class that holds MRC data and provides atomic operations on it.

STEMMER = <LancasterStemmer>

wn = <WordNetCorpusReader in '/home/docs/.cache/textflint/NLTK_DATA/wordnet'>

POS_TO_WORDNET = {'JJ': 'a', 'JJR': 'a', 'JJS': 'a', 'NN': 'n'}

__init__(data, origin=None, sample_id=None)

The sample object for the machine reading comprehension task.

Parameters

data (dict) – the dict obj that contains data info
origin (bool) –
sample_id (int) – sample index

check_data(data)

Check whether the input data is legal.

Parameters: data (dict) – the dict obj that contains data info

is_legal()

Validate whether the sample is legal.

Returns: bool

static convert_idx(text, tokens)

Get the start and end character indices of each token in the context.

Parameters

text (str) – context text
tokens (list) – context words

Returns

list of spans
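The token-to-character alignment described here can be sketched as a left-to-right scan. This is an illustrative sketch, not the library's code: `token_char_spans` is a hypothetical name, and the choice of half-open `(start, end)` pairs is an assumption, since the page does not say whether the end index is inclusive.

```python
from typing import List, Tuple


def token_char_spans(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return one (start, end) character span per token, end exclusive."""
    spans = []
    cursor = 0  # never search before the end of the previous token
    for token in tokens:
        start = text.find(token, cursor)
        if start < 0:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = start + len(token)
        spans.append((start, end))
        cursor = end
    return spans


context = "Super Bowl 50 was played."
words = ["Super", "Bowl", "50", "was", "played", "."]
spans = token_char_spans(context, words)
```

Searching from a moving cursor keeps repeated words from being matched to an earlier occurrence.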

load_answers(ans, spans)

Get word-level positions of answers

Parameters

ans (dict) – answers dict with character position and text
spans (list) – the start idx and end idx of tokens

get_answers()

Get copy of answers

Returns: dict, answers

set_answers_mask()

Set the answers with TASK_MASK

load(data)

Convert data dict which contains essential information to MRCSample.

Parameters: data (dict) – the dict obj that contains data info

dump()

Convert the MRCSample to a data dict which contains essential information.

Returns: dict object

delete_field_at_index(field, index)

Delete the items of the field value at the given index.

Parameters

field (str) – field name
index (int|list|slice) – modified scope

Returns

modified sample

delete_field_at_indices(field, indices)

Delete items of given scopes of field value.

Parameters

field (str) – field name
indices (list) – list of int/list/slice, modified scopes

Returns

modified Sample

insert_field_before_indices(field, indices, items)

Insert items before several given scopes of the field value at the same time.

Parameters

field (str) – field name
indices (list) – list of int/list/slice, modified scopes
items (list) – inserted items

Returns

modified Sample

insert_field_before_index(field, index, items)

Insert item before index of field value.

Parameters

field (str) – field name
index (int) – modified scope
items – inserted item

Returns

modified Sample

insert_field_after_index(field, index, new_item)

Insert item after index of field value.

Parameters

field (str) – field name
index (int) – modified scope
new_item – inserted item

Returns

modified Sample

insert_field_after_indices(field, indices, items)

Insert items after several given scopes of the field value at the same time.

Parameters

field (str) – field name
indices (list) – list of int/list/slice, modified scopes
items (list) – inserted items

Returns

modified Sample

unequal_replace_field_at_indices(field, indices, rep_items)

Replace the items in the given scopes of the field value with rep_items, whose lengths may differ from the scopes they replace.

Parameters

field (str) – field name
indices (list) – list of int/list/slice, modified scopes
rep_items (list) – replace items

Returns

modified sample
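Replacing several scopes with items of a different length is easy to get wrong, because each replacement shifts the positions after it. The sketch below shows one common way around that on a plain token list. It is not textflint's implementation: `replace_spans` is a hypothetical helper, and scopes are assumed to be non-overlapping half-open `(start, end)` pairs rather than the int/list/slice forms the method accepts.

```python
from typing import List, Sequence, Tuple


def replace_spans(
    tokens: Sequence[str],
    scopes: Sequence[Tuple[int, int]],
    rep_items: Sequence[Sequence[str]],
) -> List[str]:
    """Return a copy of tokens with each scope swapped for its replacement."""
    if len(scopes) != len(rep_items):
        raise ValueError("need exactly one replacement per scope")
    result = list(tokens)  # work on a copy; the input is left untouched
    # Apply the rightmost scope first so earlier offsets stay valid.
    pairs = sorted(zip(scopes, rep_items), key=lambda p: p[0][0], reverse=True)
    for (start, end), rep in pairs:
        result[start:end] = rep
    return result


original = ["The", "NFL", "team", "won", "Super", "Bowl", "50"]
edited = replace_spans(
    original,
    [(1, 2), (4, 7)],
    [["UNICEF"], ["Champ", "Bowl", "40", "final"]],
)
```

Returning a new list matches the "modified sample" return value documented for these methods, which suggests the original is not edited in place, though the page does not state that explicitly.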

static get_answer_position(spans, answer_start, answer_end)

Get the start and end token positions of the answer.
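Mapping a character-level answer onto token positions amounts to finding the tokens whose spans overlap the answer. A minimal sketch follows, under stated assumptions: `answer_token_range` is a hypothetical name, spans are half-open `(start, end)` character pairs, and `answer_end` is exclusive. The library may use different conventions.

```python
from typing import List, Tuple


def answer_token_range(
    spans: List[Tuple[int, int]], answer_start: int, answer_end: int
) -> Tuple[int, int]:
    """Return (first, last) indices of the tokens overlapping the answer."""
    covered = [
        i
        for i, (start, end) in enumerate(spans)
        if start < answer_end and end > answer_start  # interval overlap test
    ]
    if not covered:
        raise ValueError("answer does not overlap any token span")
    return covered[0], covered[-1]


# Character spans for the tokens of "Super Bowl 50 was played."
token_spans = [(0, 5), (6, 10), (11, 13), (14, 17), (18, 24), (24, 25)]
# The answer "Bowl 50" occupies characters 6 to 13 (end exclusive).
first, last = answer_token_range(token_spans, 6, 13)
```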

static run_conversion(question, answer, tokens, const_parse)

Convert the question and answer to a declarative sentence

Parameters

question (str) – question
answer (str) – answer
tokens (list) – the semantic tag dicts of question
const_parse – the constituency parse of question

Returns

a declarative sentence

convert_answer(answer, sent_tokens, question)

Replace the ground truth with fake answer based on specific rules

Parameters

answer (str) – ground truth, str
sent_tokens (list) – sentence dicts, like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
question (str) – question sentence

Return str

fake answer

static alter_sentence(sample, nearby_word_dict=None, pos_tag_dict=None, rules=None)

Parameters

sample – sentence dicts, like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
nearby_word_dict – the dictionary to search for nearby words
pos_tag_dict – the dictionary to search for the most frequent pos tags
rules – the rules to alter the sentence

Returns

the altered sentence and the altered sentence dicts

static alter_special(token, **kwargs)

Alter special tokens

Parameters

token – the token to alter
kwargs –

Returns

like ‘US’ -> ‘UK’

static alter_wordnet_antonyms(token, **kwargs)

Replace words with WordNet antonyms

Parameters

token – the token to replace
kwargs –

Returns

like good -> bad

static alter_wordnet_synonyms(token, **kwargs)

Replace words with synonyms

Parameters

token – the token to replace
kwargs –

Returns

like good -> great

static alter_nearby(pos_list, ignore_pos=False, is_ner=False)

Alter words based on the GloVe embedding space

Parameters

pos_list – pos tags list
ignore_pos (bool) – whether to match pos tag
is_ner (bool) – indicate ner

Returns

like ‘Mary’ -> ‘Rose’

static alter_entity_type(token, **kwargs)

Alter entity

Parameters

token – the word to replace
kwargs –

Returns

like ‘London’ -> ‘Berlin’

static get_answer_tokens(sent_tokens, answer)

Extract the pos, ner, lemma tags of answer tokens

Parameters

sent_tokens (list) – a list of dicts
answer (str) – answer

Returns

a list of dicts like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}, {‘word’: ‘Bernadette’, ‘pos’: ‘NNP’, ‘lemma’: ‘Bernadette’, …}, {‘word’: ‘Soubirous’, ‘pos’: ‘NNP’, ‘lemma’: ‘Soubirous’, …}]

static ans_entity_full(ner_tag, new_ans)

Returns a function that yields new_ans if and only if every token has the NER tag ner_tag

Parameters

ner_tag (str) – ner tag
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

Returns

fake answer, str

static ans_abbrev(new_ans)

Parameters: new_ans (str) – answer words
Return str: fake answer

static ans_match_wh(wh_word, new_ans)

Returns a function that yields new_ans if the question starts with wh_word

Parameters

wh_word (str) – question word
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

Return str

fake answer

static ans_pos(pos, new_ans, end=False, add_dt=False)

Returns a function that yields new_ans if the first (or, with end=True, the last) token has the POS tag pos

Parameters

pos (str) – pos tag
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
end (bool) – whether to use the last word to match the pos tag
add_dt (bool) – whether to add a determiner

Return str

fake answer
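The "returns a function that yields new_ans if ..." pattern used by ans_entity_full, ans_match_wh and ans_pos is a rule factory: each call builds a closure that either fires or declines. The sketch below illustrates the idea only. `entity_rule`, `pos_rule` and `first_match` are hypothetical names, `new_ans` is assumed to be a plain string, the replacement strings are arbitrary placeholders, and the `add_dt` option is left out.

```python
from typing import Callable, Dict, List, Optional

Token = Dict[str, str]
Rule = Callable[[List[Token]], Optional[str]]


def entity_rule(ner_tag: str, new_ans: str) -> Rule:
    """Build a rule that fires only if every answer token has ner_tag."""
    def rule(answer_tokens: List[Token]) -> Optional[str]:
        if answer_tokens and all(t.get("ner") == ner_tag for t in answer_tokens):
            return new_ans
        return None
    return rule


def pos_rule(pos: str, new_ans: str, end: bool = False) -> Rule:
    """Build a rule that checks the POS tag of the first (or last) token."""
    def rule(answer_tokens: List[Token]) -> Optional[str]:
        if not answer_tokens:
            return None
        probe = answer_tokens[-1] if end else answer_tokens[0]
        return new_ans if probe.get("pos") == pos else None
    return rule


def first_match(rules: List[Rule], answer_tokens: List[Token]) -> Optional[str]:
    """Try the rules in order and return the first fake answer produced."""
    for rule in rules:
        fake = rule(answer_tokens)
        if fake is not None:
            return fake
    return None


answer = [
    {"word": "Saint", "pos": "NNP", "lemma": "Saint", "ner": "PERSON"},
    {"word": "Bernadette", "pos": "NNP", "lemma": "Bernadette", "ner": "PERSON"},
]
rules = [
    entity_rule("LOCATION", "Berlin"),      # declines: tokens are PERSON
    entity_rule("PERSON", "Mary Rose"),     # fires
    pos_rule("NNP", "Kew Gardens"),         # never reached for this answer
]
fake_answer = first_match(rules, answer)
```

Ordering the rules from most to least specific lets a broad fallback such as a POS rule catch answers that no entity rule matches.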

static read_const_parse(parse_str)

Construct a constituency tree based on the output of the constituency parser.

static fix_style(s)

Minor, general style fixes for questions.

class textflint.generation_layer.transformation.MRC.add_sent_diverse.Transformation(**kwargs)

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>

transform(sample, n=1, field='x', **kwargs)

Transform data sample to a list of Sample.

Parameters

sample (Sample) – Data sample for augmentation.
n (int) – Max number of unique augmented outputs; the default in the signature is 1.
field (str|list) – Indicate which fields to apply transformations.
**kwargs (dict) – other auxiliary params.

Returns

list of Sample

classmethod sample_num(x, num)

Get ‘num’ samples from x.

Parameters

x (list) – list to sample
num (int) – sample number

Returns

at most ‘num’ unique samples.
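Capped sampling of this kind can be sketched with the standard library. This is an assumption-laden illustration rather than the library's code: `sample_up_to` is a hypothetical name, the `seed` parameter is added here only for reproducibility, and duplicates are assumed to be dropped before sampling, which requires hashable items.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_up_to(items: Sequence[T], num: int, seed: Optional[int] = None) -> List[T]:
    """Return at most num distinct items chosen at random from items."""
    if num <= 0:
        return []
    unique = list(dict.fromkeys(items))  # order-preserving de-duplication
    if num >= len(unique):
        return unique  # fewer candidates than requested: return them all
    return random.Random(seed).sample(unique, num)


candidates = ["a", "b", "a", "c"]
picked = sample_up_to(candidates, 2, seed=0)
```

Returning every candidate when fewer than `num` exist avoids the `ValueError` that `random.sample` raises for an oversized request.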