textflint.generation_layer.transformation.MRC.add_sent_diverse

Add a distractor sentence to penalize the MRC model.

This transformation is based on CoreNLP, which is written in Java; recent releases require Java 1.8+. You need to have Java installed to run CoreNLP.

class textflint.generation_layer.transformation.MRC.add_sent_diverse.AddSentDiverse[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Generate a distractor before the sentence with answer.

Example:

origin question: Which NFL team represented the AFC at Super Bowl 50?
transform distractor: The UNICEF team of Kew Gardens represented
    the UNICEF at Champ Bowl 40.
class textflint.generation_layer.transformation.MRC.add_sent_diverse.ConstituencyParse(tag, children=None, word=None, index=None)[source]

Bases: object

A CoreNLP constituency parse (or a node in a parse tree).

Word-level constituents have |word| and |index| set and no children. Phrase-level constituents have no |word| or |index| and have at least one child.

classmethod from_corenlp(s)[source]

Parses the “parse” attribute returned by the CoreNLP parse annotator.

classmethod replace_words(tree, new_words)[source]

Return a new tree, with new words replacing old ones.
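
The two classmethods above can be illustrated with a minimal sketch. It assumes the “parse” attribute is a Penn-style bracketed string; ParseNode, parse_bracketed, replace_leaf_words and leaves are hypothetical names for illustration and are not textflint's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParseNode:
    """Hypothetical stand-in for ConstituencyParse (not textflint's class)."""
    tag: str
    children: List["ParseNode"] = field(default_factory=list)
    word: Optional[str] = None
    index: Optional[int] = None


def parse_bracketed(s: str) -> ParseNode:
    """Build a tree from a string such as '(NP (DT the) (NN cat))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0         # cursor into tokens
    word_index = 0  # running index of leaf words

    def read() -> ParseNode:
        nonlocal pos, word_index
        assert tokens[pos] == "("
        tag = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":
            # Word-level constituent: word and index set, no children.
            node = ParseNode(tag, word=tokens[pos], index=word_index)
            word_index += 1
            pos += 1
        else:
            # Phrase-level constituent: at least one child, no word.
            node = ParseNode(tag)
            while tokens[pos] == "(":
                node.children.append(read())
        assert tokens[pos] == ")"
        pos += 1
        return node

    return read()


def replace_leaf_words(tree: ParseNode, new_words: List[str]) -> ParseNode:
    """Return a new tree whose leaves carry new_words; the input is untouched."""
    if tree.word is not None:
        return ParseNode(tree.tag, word=new_words[tree.index], index=tree.index)
    return ParseNode(tree.tag, [replace_leaf_words(c, new_words) for c in tree.children])


def leaves(tree: ParseNode) -> List[str]:
    """Collect leaf words from left to right."""
    if tree.word is not None:
        return [tree.word]
    return [w for c in tree.children for w in leaves(c)]


tree = parse_bracketed("(ROOT (NP (DT the) (NN cat)))")
new_tree = replace_leaf_words(tree, ["a", "dog"])
print(leaves(tree), leaves(new_tree))
```

Building a fresh tree in replace_leaf_words, rather than mutating nodes, mirrors the "return a new tree" wording of the docstring.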

exception textflint.generation_layer.transformation.MRC.add_sent_diverse.FlintError[source]

Bases: RuntimeError

Default error thrown by textflint functions. FlintError is raised when no more specific error type is given.

class textflint.generation_layer.transformation.MRC.add_sent_diverse.MRCSample(data, origin=None, sample_id=None)[source]

Bases: textflint.input_layer.component.sample.sample.Sample

MRC Sample class to hold the mrc data info and provide atomic operations.

STEMMER = <LancasterStemmer>
wn = <WordNetCorpusReader in '/home/docs/.cache/textflint/NLTK_DATA/wordnet'>
POS_TO_WORDNET = {'JJ': 'a', 'JJR': 'a', 'JJS': 'a', 'NN': 'n'}
__init__(data, origin=None, sample_id=None)[source]

The sample object for the machine reading comprehension task.

Parameters
  • data (dict) – the dict obj that contains data info

  • origin (bool) –

  • sample_id (int) – sample index

check_data(data)[source]

Check whether the input data is legal.

Parameters

data (dict) – dict obj that contains data info

Validate whether the sample is legal.

Returns

bool

static convert_idx(text, tokens)[source]

Get the start and end character idx of tokens in the context

Parameters
  • text (str) – context text

  • tokens (list) – context words

Returns

list of spans
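
As a rough illustration of what such spans look like, here is a minimal sketch. It assumes tokens appear in the text in order and that each span is a (start, end) pair with an exclusive end; token_spans is a hypothetical helper, not the library's convert_idx.

```python
from typing import List, Tuple


def token_spans(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Locate each token in text, scanning left to right."""
    spans = []
    cursor = 0  # never search before the end of the previous token
    for token in tokens:
        start = text.find(token, cursor)
        if start < 0:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = start + len(token)
        spans.append((start, end))
        cursor = end
    return spans


context = "Super Bowl 50 was played."
words = ["Super", "Bowl", "50", "was", "played", "."]
print(token_spans(context, words))
# [(0, 5), (6, 10), (11, 13), (14, 17), (18, 24), (24, 25)]
```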

load_answers(ans, spans)[source]

Get word-level positions of answers

Parameters
  • ans (dict) – answers dict with character position and text

  • spans (list) – the start idx and end idx of tokens

get_answers()[source]

Get copy of answers

Returns

dict, answers

set_answers_mask()[source]

Set the answers with TASK_MASK

load(data)[source]

Convert data dict which contains essential information to MRCSample.

Parameters

data (dict) – the dict obj that contains dict info

dump()[source]

Convert the MRCSample to a data dict which contains essential information.

Returns

dict object

delete_field_at_index(field, index)[source]

Delete the item at the given index of field value.

Parameters
  • field (str) – field name

  • index (int|list|slice) – modified scope

Returns

modified sample

delete_field_at_indices(field, indices)[source]

Delete items of given scopes of field value.

Parameters
  • field (str) – field name

  • indices (list) – list of int/list/slice, modified scopes

Returns

modified Sample

insert_field_before_indices(field, indices, items)[source]

Insert items before multiple given indices of field value at the same time.

Parameters
  • field (str) – field name

  • indices (list) – list of int/list/slice, modified scopes

  • items (list) – inserted items

Returns

modified Sample
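
One way to read "at the same time" is that every index refers to the original sequence, not to the sequence after earlier insertions. The sketch below shows that idea on a plain list, restricted to integer indices; insert_before_indices is a hypothetical helper and makes no claim about how MRCSample implements it.

```python
from typing import List, Sequence


def insert_before_indices(values: Sequence[str],
                          indices: List[int],
                          items: List[List[str]]) -> List[str]:
    """Insert items[k] before position indices[k] of the original values."""
    if len(indices) != len(items):
        raise ValueError("indices and items must have the same length")
    result = list(values)  # work on a copy so the input stays unchanged
    # Apply insertions from right to left so earlier positions stay valid.
    for index, new_items in sorted(zip(indices, items), key=lambda p: p[0], reverse=True):
        result[index:index] = new_items
    return result


tokens = ["a", "b", "c"]
print(insert_before_indices(tokens, [0, 2], [["x"], ["y", "z"]]))
# ['x', 'a', 'b', 'y', 'z', 'c']
```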

insert_field_before_index(field, index, items)[source]

Insert item before index of field value.

Parameters
  • field (str) – field name

  • index (int) – modified scope

  • items – inserted item

Returns

modified Sample

insert_field_after_index(field, index, new_item)[source]

Insert item after index of field value.

Parameters
  • field (str) – field name

  • index (int) – modified scope

  • new_item – inserted item

Returns

modified Sample

insert_field_after_indices(field, indices, items)[source]

Insert items after multiple given indices of field value at the same time.

Parameters
  • field (str) – field name

  • indices (list) – list of int/list/slice, modified scopes

  • items (list) – inserted items

Returns

modified Sample

unequal_replace_field_at_indices(field, indices, rep_items)[source]

Replace the items in the given scopes of field value with rep_items, whose lengths may differ from those of the scopes.

Parameters
  • field (str) – field name

  • indices (list) – list of int/list/slice, modified scopes

  • rep_items (list) – replace items

Returns

modified sample

static get_answer_position(spans, answer_start, answer_end)[source]

Get answer tokens start position and end position
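
A minimal sketch of mapping a character range to token positions follows. It assumes spans are (start, end) pairs with an exclusive end and that the result is the inclusive pair of first and last overlapping token indices; answer_token_range is a hypothetical helper, not the library's method.

```python
from typing import List, Tuple


def answer_token_range(spans: List[Tuple[int, int]],
                       answer_start: int,
                       answer_end: int) -> Tuple[int, int]:
    """Return (first, last) indices of tokens overlapping the answer characters."""
    covered = [
        i for i, (start, end) in enumerate(spans)
        if start < answer_end and end > answer_start  # half-open overlap test
    ]
    if not covered:
        raise ValueError("answer range does not overlap any token")
    return covered[0], covered[-1]


# Spans for "Super Bowl 50 was played." and the answer "Bowl 50" (chars 6 to 13).
spans = [(0, 5), (6, 10), (11, 13), (14, 17), (18, 24), (24, 25)]
print(answer_token_range(spans, 6, 13))
# (1, 2)
```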

static run_conversion(question, answer, tokens, const_parse)[source]

Convert the question and answer to a declarative sentence

Parameters
  • question (str) – question

  • answer (str) – answer

  • tokens (list) – the semantic tag dicts of question

  • const_parse – the constituency parse of question

Returns

a declarative sentence

convert_answer(answer, sent_tokens, question)[source]

Replace the ground truth with fake answer based on specific rules

Parameters
  • answer (str) – ground truth, str

  • sent_tokens (list) – sentence dicts, like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

  • question (str) – question sentence

Return str

fake answer

static alter_sentence(sample, nearby_word_dict=None, pos_tag_dict=None, rules=None)[source]
Parameters
  • sample – sentence dicts, like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

  • nearby_word_dict – the dictionary to search for nearby words

  • pos_tag_dict – the dictionary to search for the most frequent pos tags

  • rules – the rules to alter the sentence

Returns

alter_sentence, alter_sentence dicts

static alter_special(token, **kwargs)[source]

Alter special tokens

Parameters
  • token – the token to alter

  • kwargs

Returns

like ‘US’ -> ‘UK’

static alter_wordnet_antonyms(token, **kwargs)[source]

Replace words with wordnet antonyms

Parameters
  • token – the token to replace

  • kwargs

Returns

like good -> bad

static alter_wordnet_synonyms(token, **kwargs)[source]

Replace words with synonyms

Parameters
  • token – the token to replace

  • kwargs

Returns

like good -> great

static alter_nearby(pos_list, ignore_pos=False, is_ner=False)[source]

Alter words based on glove embedding space

Parameters
  • pos_list – pos tags list

  • ignore_pos (bool) – whether to match pos tag

  • is_ner (bool) – indicate ner

Returns

like ‘Mary’ -> ‘Rose’

static alter_entity_type(token, **kwargs)[source]

Alter entity

Parameters
  • token – the word to replace

  • kwargs

Returns

like ‘London’ -> ‘Berlin’

static get_answer_tokens(sent_tokens, answer)[source]

Extract the pos, ner, lemma tags of answer tokens

Parameters
  • sent_tokens (list) – a list of dicts

  • answer (str) – answer

Returns

a list of dicts like [ {‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}, {‘word’: ‘Bernadette’, ‘pos’: ‘NNP’, ‘lemma’: ‘Bernadette’, …}, {‘word’: ‘Soubirous’, ‘pos’: ‘NNP’, ‘lemma’: ‘Soubirous’, …] ]

static ans_entity_full(ner_tag, new_ans)[source]

Returns a function that yields new_ans iff every token has |ner_tag|

Parameters
  • ner_tag (str) – ner tag

  • new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

Returns

fake answer, str
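
The "returns a function" pattern described here can be sketched as a closure. The call signature of the returned rule (a list of token dicts) and the choice of None for "no match" are assumptions for illustration; entity_rule is a hypothetical name, not textflint's ans_entity_full.

```python
from typing import Callable, Dict, List, Optional

Token = Dict[str, str]


def entity_rule(ner_tag: str, new_ans: str) -> Callable[[List[Token]], Optional[str]]:
    """Build a rule that proposes new_ans only for answers tagged entirely as ner_tag."""
    def rule(answer_tokens: List[Token]) -> Optional[str]:
        # Fire only when the answer is non-empty and every token carries the tag.
        if answer_tokens and all(t.get("ner") == ner_tag for t in answer_tokens):
            return new_ans
        return None  # the rule does not apply
    return rule


person_rule = entity_rule("PERSON", "Jane Doe")  # placeholder fake answer
answer = [
    {"word": "Saint", "pos": "NNP", "lemma": "Saint", "ner": "PERSON"},
    {"word": "Bernadette", "pos": "NNP", "lemma": "Bernadette", "ner": "PERSON"},
]
print(person_rule(answer))
```

Returning a function lets a list of such rules be tried in order until one yields a fake answer.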

static ans_abbrev(new_ans)[source]
Parameters

new_ans (str) – answer words

Return str

fake answer

static ans_match_wh(wh_word, new_ans)[source]
Returns a function that yields new_ans if the question starts with |wh_word|

Parameters
  • wh_word (str) – question word

  • new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

Return str

fake answer

static ans_pos(pos, new_ans, end=False, add_dt=False)[source]

Returns a function that yields new_ans if the first/last token has |pos|

Parameters
  • pos (str) – pos tag

  • new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

  • end (bool) – whether to use the last word to match the pos tag

  • add_dt (bool) – whether to add a determiner

Return str

fake answer

static read_const_parse(parse_str)[source]

Construct a constituency tree based on constituency parser

static fix_style(s)[source]

Minor, general style fixes for questions.

class textflint.generation_layer.transformation.MRC.add_sent_diverse.Transformation(**kwargs)[source]

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>
transform(sample, n=1, field='x', **kwargs)[source]

Transform data sample to a list of Sample.

Parameters
  • sample (Sample) – Data sample for augmentation.

  • n (int) – Max number of unique augmented outputs, default is 1.

  • field (str|list) – Indicate which fields to apply transformations.

  • **kwargs (dict) –

    other auxiliary params.

Returns

list of Sample

classmethod sample_num(x, num)[source]

Get ‘num’ samples from x.

Parameters
  • x (list) – list to sample

  • num (int) – sample number

Returns

max ‘num’ unique samples.
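
A minimal sketch of capped sampling is shown below. It reads "unique" as sampling without replacement, which is an assumption; sample_up_to is a hypothetical helper and not the library's sample_num.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def sample_up_to(x: List[T], num: int) -> List[T]:
    """Return at most num elements of x, drawn without replacement."""
    if num <= 0:
        return []
    # Cap the request so short lists are returned in full instead of raising.
    return random.sample(x, min(num, len(x)))


candidates = ["s1", "s2", "s3", "s4"]
picked = sample_up_to(candidates, 2)
print(len(picked))
```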