textflint.generation_layer.transformation.MRC.add_sent_diverse¶
Add a distractor sentence to penalize MRC model¶
This transformation is based on CoreNLP, which is written in Java; recent releases require Java 1.8+. You need to have Java installed to run CoreNLP.
-
class
textflint.generation_layer.transformation.MRC.add_sent_diverse.AddSentDiverse[source]¶ Bases:
textflint.generation_layer.transformation.transformation.Transformation
Generate a distractor sentence and insert it before the sentence that contains the answer.
Example:
origin question: Which NFL team represented the AFC at Super Bowl 50?
transform distractor: The UNICEF team of Kew Gardens represented the UNICEF at Champ Bowl 40.
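The placement step can be illustrated with a minimal sketch. It does not use textflint or CoreNLP; the helper name `insert_distractor`, the single-space sentence joining, and the sample context are all assumptions made for illustration. It only shows how the answer's character offset shifts once a distractor is placed before the answer sentence.

```python
def insert_distractor(sentences, answer_sent_idx, answer_start, distractor):
    """Insert `distractor` before the sentence that holds the answer.

    Assumes the context is the sentences joined by single spaces and that
    `answer_start` is the answer's character offset in that joined context.
    """
    new_sentences = (
        sentences[:answer_sent_idx] + [distractor] + sentences[answer_sent_idx:]
    )
    # The answer moves right by the distractor length plus one joining space.
    new_answer_start = answer_start + len(distractor) + 1
    return " ".join(new_sentences), new_answer_start


# Illustrative context (not taken from the library's test data).
sentences = [
    "Super Bowl 50 was an American football game.",
    "The Denver Broncos represented the AFC at Super Bowl 50.",
]
answer = "Denver Broncos"
answer_start = " ".join(sentences).index(answer)
distractor = "The UNICEF team of Kew Gardens represented the UNICEF at Champ Bowl 40."

new_context, new_start = insert_distractor(sentences, 1, answer_start, distractor)
```

After the call, `new_context[new_start:new_start + len(answer)]` still reads `Denver Broncos`, so the gold answer span stays valid while the distractor sits directly in front of it.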
-
class
textflint.generation_layer.transformation.MRC.add_sent_diverse.ConstituencyParse(tag, children=None, word=None, index=None)[source]¶ Bases:
object
A CoreNLP constituency parse (or a node in a parse tree).
Word-level constituents have |word| and |index| set and no children. Phrase-level constituents have no |word| or |index| and have at least one child.
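The word-level versus phrase-level distinction can be sketched with a small stand-in class. `ParseNode` and its methods are hypothetical and only mirror the constructor arguments listed above; they are not the library's `ConstituencyParse` implementation.

```python
class ParseNode:
    """Hypothetical stand-in for a constituency parse node."""

    def __init__(self, tag, children=None, word=None, index=None):
        self.tag = tag
        self.children = children or []
        self.word = word
        self.index = index

    def is_word(self):
        # Word-level nodes carry a word and have no children.
        return self.word is not None and not self.children

    def words(self):
        # Collect the words under this node in left-to-right order.
        if self.is_word():
            return [self.word]
        return [w for child in self.children for w in child.words()]


phrase = ParseNode("NP", children=[
    ParseNode("WDT", word="Which", index=0),
    ParseNode("NNP", word="NFL", index=1),
    ParseNode("NN", word="team", index=2),
])
```

Here `phrase` is a phrase-level node (no word, three children), and each child is a word-level node with `word` and `index` set.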
-
exception
textflint.generation_layer.transformation.MRC.add_sent_diverse.FlintError[source]¶ Bases:
RuntimeError
Default error thrown by textflint functions. FlintError will be raised if you do not give any error type specification.
-
class
textflint.generation_layer.transformation.MRC.add_sent_diverse.MRCSample(data, origin=None, sample_id=None)[source]¶ Bases:
textflint.input_layer.component.sample.sample.Sample
MRC Sample class to hold the MRC data info and provide atomic operations.
-
STEMMER= <LancasterStemmer>¶
-
wn= <WordNetCorpusReader in '/home/docs/.cache/textflint/NLTK_DATA/wordnet'>¶
-
POS_TO_WORDNET= {'JJ': 'a', 'JJR': 'a', 'JJS': 'a', 'NN': 'n'}¶
-
__init__(data, origin=None, sample_id=None)[source]¶ The sample object for machine reading comprehension task
- Parameters
data (dict) – The dict obj that contains data info.
origin (bool) –
sample_id (int) – sample index
-
check_data(data)[source]¶ Check whether the input data is legal
- Parameters
data (dict) – dict obj that contains data info
-
static
convert_idx(text, tokens)[source]¶ Get the start and end character idx of tokens in the context
- Parameters
text (str) – context text
tokens (list) – context words
- Returns
list of spans
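One common way to compute such spans is to scan the text left to right and locate each token after the previous one. The sketch below assumes that approach and assumes half-open `(start, end)` spans; `token_spans` is a hypothetical name and this is not necessarily how `convert_idx` is implemented.

```python
def token_spans(text, tokens):
    """Return a (start, end) character span for each token, in order.

    Assumes every token appears verbatim in `text`, in the given order.
    """
    spans = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start < 0:
            raise ValueError(f"token {token!r} not found in text")
        end = start + len(token)
        spans.append((start, end))
        cursor = end  # continue searching after this token
    return spans


spans = token_spans("Super Bowl 50", ["Super", "Bowl", "50"])
# spans == [(0, 5), (6, 10), (11, 13)]
```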
-
load_answers(ans, spans)[source]¶ Get word-level positions of answers
- Parameters
ans (dict) – answers dict with character position and text
spans (list) – the start idx and end idx of tokens
-
load(data)[source]¶ Convert data dict which contains essential information to MRCSample.
- Parameters
data (dict) – the dict obj that contains dict info
-
dump()[source]¶ Convert the MRCSample to a data dict which contains the essential information.
- Returns
dict object
-
delete_field_at_index(field, index)[source]¶ Delete the items at the given index of the field value.
- Parameters
field (str) – field name
index (int|list|slice) – modified scope
- Returns
modified sample
-
delete_field_at_indices(field, indices)[source]¶ Delete items of given scopes of field value.
- Parameters
field (str) – field name
indices (list) – list of int/list/slice, modified scopes
- Returns
modified Sample
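When several scopes are edited in one call, a simple way to keep the original indices valid is to apply the edits from right to left. The sketch below works on a plain list rather than a Sample, and it assumes that a list scope is a two-item `[start, end)` pair and that slices have an explicit start and stop; `normalize_scope` and `delete_scopes` are hypothetical helpers, not library functions.

```python
def normalize_scope(scope):
    """Turn an int, [start, end) pair, or slice into a (start, end) tuple."""
    if isinstance(scope, int):
        return scope, scope + 1
    if isinstance(scope, slice):
        return scope.start, scope.stop
    start, end = scope
    return start, end


def delete_scopes(items, scopes):
    """Delete several non-overlapping scopes from a copy of `items`."""
    ranges = sorted((normalize_scope(s) for s in scopes), reverse=True)
    result = list(items)
    for start, end in ranges:
        # Deleting the rightmost range first leaves earlier indices untouched.
        del result[start:end]
    return result


tokens = ["a", "b", "c", "d", "e"]
remaining = delete_scopes(tokens, [0, slice(2, 4)])
# remaining == ["b", "e"]
```

The same right-to-left ordering applies to the multi-scope insert and replace operations, since each of them changes the length of the field value.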
-
insert_field_before_indices(field, indices, items)[source]¶ Insert items before multiple given scopes of the field value at the same time.
- Parameters
field (str) – field name
indices (list) – list of int/list/slice, modified scopes
items (list) – inserted items
- Returns
modified Sample
-
insert_field_before_index(field, index, items)[source]¶ Insert item before index of field value.
- Parameters
field (str) – field name
index (int) – modified scope
items – inserted item
- Returns
modified Sample
-
insert_field_after_index(field, index, new_item)[source]¶ Insert item after index of field value.
- Parameters
field (str) – field name
index (int) – modified scope
new_item – inserted item
- Returns
modified Sample
-
insert_field_after_indices(field, indices, items)[source]¶ Insert items after multiple given scopes of the field value at the same time.
- Parameters
field (str) – field name
indices (list) – list of int/list/slice, modified scopes
items (list) – inserted items
- Returns
modified Sample
-
unequal_replace_field_at_indices(field, indices, rep_items)[source]¶ Replace scope items of field value with rep_items, whose length may differ from that of the scope.
- Parameters
field (str) – field name
indices (list) – list of int/list/slice, modified scopes
rep_items (list) – replace items
- Returns
modified sample
-
static
get_answer_position(spans, answer_start, answer_end)[source]¶ Get answer tokens start position and end position
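Mapping a character-level answer to token positions usually means finding the tokens whose spans overlap the answer's character range. The sketch below assumes half-open `(start, end)` spans and an exclusive `answer_end`; the real method may use a different convention, and `answer_token_range` is a hypothetical name.

```python
def answer_token_range(spans, answer_start, answer_end):
    """Return (first, last) indices of tokens overlapping the answer range."""
    covered = [
        i for i, (start, end) in enumerate(spans)
        if start < answer_end and end > answer_start
    ]
    if not covered:
        raise ValueError("answer range does not overlap any token")
    return covered[0], covered[-1]


# Spans for the tokens of "Super Bowl 50"; the answer "Bowl 50" covers chars 6..13.
token_range = answer_token_range([(0, 5), (6, 10), (11, 13)], 6, 13)
# token_range == (1, 2)
```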
-
static
run_conversion(question, answer, tokens, const_parse)[source]¶ Convert the question and answer to a declarative sentence
- Parameters
question (str) – question
answer (str) – answer
tokens (list) – the semantic tag dicts of question
const_parse – the constituency parse of question
- Returns
a declarative sentence
-
convert_answer(answer, sent_tokens, question)[source]¶ Replace the ground truth with fake answer based on specific rules
- Parameters
answer (str) – ground truth, str
sent_tokens (list) – sentence dicts, like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
question (str) – question sentence
- Return str
fake answer
-
static
alter_sentence(sample, nearby_word_dict=None, pos_tag_dict=None, rules=None)[source]¶
- Parameters
sample – sentence dicts, like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
nearby_word_dict – the dictionary to search for nearby words
pos_tag_dict – the dictionary to search for the most frequent pos tags
rules – the rules to alter the sentence
- Returns
alter_sentence, alter_sentence dicts
-
static
alter_special(token, **kwargs)[source]¶ Alter special tokens
- Parameters
token – the token to alter
kwargs –
- Returns
like ‘US’ -> ‘UK’
-
static
alter_wordnet_antonyms(token, **kwargs)[source]¶ Replace words with wordnet antonyms
- Parameters
token – the token to replace
kwargs –
- Returns
like good -> bad
-
static
alter_wordnet_synonyms(token, **kwargs)[source]¶ Replace words with synonyms
- Parameters
token – the token to replace
kwargs –
- Returns
like good -> great
-
static
alter_nearby(pos_list, ignore_pos=False, is_ner=False)[source]¶ Alter words based on glove embedding space
- Parameters
pos_list – pos tags list
ignore_pos (bool) – whether to match pos tag
is_ner (bool) – indicate ner
- Returns
like ‘Mary’ -> ‘Rose’
-
static
alter_entity_type(token, **kwargs)[source]¶ Alter entity
- Parameters
token – the word to replace
kwargs –
- Returns
like ‘London’ -> ‘Berlin’
-
static
get_answer_tokens(sent_tokens, answer)[source]¶ Extract the pos, ner, lemma tags of answer tokens
- Parameters
sent_tokens (list) – a list of dicts
answer (str) – answer
- Returns
a list of dicts like [ {‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}, {‘word’: ‘Bernadette’, ‘pos’: ‘NNP’, ‘lemma’: ‘Bernadette’, …}, {‘word’: ‘Soubirous’, ‘pos’: ‘NNP’, ‘lemma’: ‘Soubirous’, …} ]
-
static
ans_entity_full(ner_tag, new_ans)[source]¶ Returns a function that yields new_ans iff every token has |ner_tag|
- Parameters
ner_tag (str) – ner tag
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
- Returns
fake answer, str
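A rule of this kind is naturally written as a closure: the outer call fixes the tag and the replacement, and the returned function is applied to the answer tokens later. The sketch below assumes the replacement is a plain string and that the rule returns `None` when it does not apply; `entity_rule` and the replacement value are hypothetical.

```python
def entity_rule(ner_tag, new_ans):
    """Build a rule that yields `new_ans` only if every token has `ner_tag`."""
    def rule(answer_tokens):
        if answer_tokens and all(tok["ner"] == ner_tag for tok in answer_tokens):
            return new_ans
        return None  # rule does not apply
    return rule


person_rule = entity_rule("PERSON", "Jane Doe")  # hypothetical replacement

answer_tokens = [
    {"word": "Saint", "pos": "NNP", "lemma": "Saint", "ner": "PERSON"},
    {"word": "Bernadette", "pos": "NNP", "lemma": "Bernadette", "ner": "PERSON"},
]
fake_answer = person_rule(answer_tokens)
```

Because each rule either returns a fake answer or declines, several rules can be tried in order until one applies.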
-
static
ans_match_wh(wh_word, new_ans)[source]¶ Returns a function that yields new_ans if the question starts with |wh_word|
- Parameters
wh_word (str) – question word
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
- Return str
fake answer
-
static
ans_pos(pos, new_ans, end=False, add_dt=False)[source]¶ Returns a function that yields new_ans if the first/last token has |pos|
- Parameters
pos (str) – pos tag
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
end (bool) – whether to use the last word to match the pos tag
add_dt (bool) – whether to add a determiner
- Return str
fake answer
-
-
class
textflint.generation_layer.transformation.MRC.add_sent_diverse.Transformation(**kwargs)[source]¶ Bases:
abc.ABC
An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.
-
processor= <textflint.common.preprocess.en_processor.EnProcessor object>¶
-
transform(sample, n=1, field='x', **kwargs)[source]¶ Transform data sample to a list of Sample.
- Parameters
sample (Sample) – Data sample for augmentation.
n (int) – Max number of unique augmented outputs, default is 1.
field (str|list) – Indicate which fields to apply transformations.
**kwargs (dict) –
other auxiliary params.
- Returns
list of Sample
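The contract above (a public `transform` that returns at most `n` unique results) can be sketched as an abstract base class with a subclass hook. Everything here is an assumption for illustration: the classes `SketchTransformation` and `UpperCase`, the hook name `_transform`, and the use of plain dicts in place of Sample objects. It is not the library's implementation.

```python
from abc import ABC, abstractmethod


class SketchTransformation(ABC):
    """Hypothetical base class mirroring the transform(sample, n, field) contract."""

    def transform(self, sample, n=1, field="x", **kwargs):
        if n < 1:
            return []
        candidates = self._transform(sample, n=n, field=field, **kwargs)
        unique = []
        for candidate in candidates:
            # Keep only results that differ from the input and from each other.
            if candidate != sample and candidate not in unique:
                unique.append(candidate)
        return unique[:n]

    @abstractmethod
    def _transform(self, sample, n=1, field="x", **kwargs):
        """Return candidate transformed samples (subclass hook)."""


class UpperCase(SketchTransformation):
    def _transform(self, sample, n=1, field="x", **kwargs):
        text = sample[field]
        return [{**sample, field: text.upper()}, {**sample, field: text.title()}]


results = UpperCase().transform({"x": "hello world"}, n=1)
# results == [{"x": "HELLO WORLD"}]
```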
-