textflint.input_layer.component.sample.mrc_sample¶

MRC Sample Class¶

Manages text transformation for machine reading comprehension (MRC). Heavily borrowed from adversarial-squad; for the original code, see https://github.com/robinjia/adversarial-squad

class textflint.input_layer.component.sample.mrc_sample.MRCSample(data, origin=None, sample_id=None)[source]¶

Bases: textflint.input_layer.component.sample.sample.Sample

MRC Sample class to hold the mrc data info and provide atomic operations.

STEMMER = <LancasterStemmer>¶

wn = <WordNetCorpusReader in '/home/docs/.cache/textflint/NLTK_DATA/wordnet'>¶

POS_TO_WORDNET = {'JJ': 'a', 'JJR': 'a', 'JJS': 'a', 'NN': 'n'}¶

__init__(data, origin=None, sample_id=None)[source]¶

The sample object for the machine reading comprehension task.

Parameters

data (dict) – The dict obj that contains data info.
origin (bool) –
sample_id (int) – sample index

check_data(data)[source]¶

Check whether the input data is legal.

Parameters: data (dict) – dict obj that contains data info

is_legal()[source]¶

Validate whether the sample is legal.

Returns: bool

static convert_idx(text, tokens)[source]¶

Get the start and end character idx of tokens in the context

Parameters

text (str) – context text
tokens (list) – context words

Returns

list of spans
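
As a rough illustration of what computing token character spans involves, here is a minimal standalone sketch. It does not call textflint; the function name `token_char_spans` and the choice of half-open `(start, end)` tuples are assumptions made for this example, and the library's own return shape may differ.

```python
def token_char_spans(text, tokens):
    """Return a (start, end) character span for each token found in text.

    Hypothetical stand-in for illustration; not textflint's convert_idx.
    Spans are half-open: text[start:end] == token.
    """
    spans = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start < 0:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = start + len(token)
        spans.append((start, end))
        cursor = end  # keep searching after the token just matched
    return spans


spans = token_char_spans(
    "Saint Bernadette Soubirous", ["Saint", "Bernadette", "Soubirous"]
)
print(spans)  # [(0, 5), (6, 16), (17, 26)]
```

Searching from a moving cursor, rather than from the start of the text each time, keeps repeated words from all mapping to the first occurrence.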

load_answers(ans, spans)[source]¶

Get word-level positions of answers

Parameters

ans (dict) – answers dict with character position and text
spans (list) – the start idx and end idx of tokens

get_answers()[source]¶

Get copy of answers

Returns: dict, answers

set_answers_mask()[source]¶: Set the answers with TASK_MASK

load(data)[source]¶

Convert data dict which contains essential information to MRCSample.

Parameters: data (dict) – the dict obj that contains data info

dump()[source]¶

Convert sample info to input data json format.

Returns: dict object

delete_field_at_index(field, index)[source]¶

Delete the item at the given index of the field value.

Parameters

field (str) – field name
index (int|list|slice) – modified scope

Returns

modified sample

delete_field_at_indices(field, indices)[source]¶

Delete items of given scopes of field value.

Parameters

field (str) – field name
indices (list) – list of int/list/slice, modified scopes

Returns

modified Sample

insert_field_before_indices(field, indices, items)[source]¶

Insert items of multi given scopes before indices of field value at the same time.

Parameters

field (str) – field name
indices (list) – list of int/list/slice, modified scopes
items (list) – inserted items

Returns

modified Sample
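
Inserting at several positions "at the same time" means every index refers to the original sequence, not to a sequence already shifted by earlier insertions. The sketch below shows one way to get that behaviour on a plain Python list; `insert_before_indices` here is a hypothetical helper written for this example and restricted to int indices, not the method documented above.

```python
def insert_before_indices(words, indices, items):
    """Return a new list with items[k] inserted before position indices[k].

    All indices refer to positions in the ORIGINAL list.
    Hypothetical helper for illustration only.
    """
    if len(indices) != len(items):
        raise ValueError("indices and items must have the same length")
    result = list(words)
    # Work from the highest index down so that an insertion never shifts
    # a position that a later insertion still relies on.
    pairs = sorted(zip(indices, items), key=lambda pair: pair[0], reverse=True)
    for index, item in pairs:
        new_words = [item] if isinstance(item, str) else list(item)
        result[index:index] = new_words
    return result


words = ["the", "cat", "sat"]
print(insert_before_indices(words, [0, 2], ["now", ["quietly", "then"]]))
# ['now', 'the', 'cat', 'quietly', 'then', 'sat']
```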

insert_field_before_index(field, index, items)[source]¶

Insert item before index of field value.

Parameters

field (str) – field name
index (int) – modified scope
items – inserted item

Returns

modified Sample

insert_field_after_index(field, index, new_item)[source]¶

Insert item after index of field value.

Parameters

field (str) – field name
index (int) – modified scope
new_item – inserted item

Returns

modified Sample

insert_field_after_indices(field, indices, items)[source]¶

Insert items of multi given scopes after indices of field value at the same time.

Parameters

field (str) – field name
indices (list) – list of int/list/slice, modified scopes
items (list) – inserted items

Returns

modified Sample

unequal_replace_field_at_indices(field, indices, rep_items)[source]¶

Replace scope items of field value with rep_items which may not equal with scope.

Parameters

field (str) – field name
indices (list) – list of int/list/slice, modified scopes
rep_items (list) – replace items

Returns

modified sample
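
To make "rep_items which may not equal with scope" concrete: each scope can be replaced by a different number of items than it originally covered. A minimal sketch on a plain list follows; the name `unequal_replace`, the half-open `(start, end)` scope tuples, and the requirement that scopes do not overlap are assumptions of this example.

```python
def unequal_replace(words, scopes, rep_items):
    """Replace each half-open [start, end) scope with a list of any length.

    Hypothetical helper for illustration; scopes must not overlap.
    """
    if len(scopes) != len(rep_items):
        raise ValueError("scopes and rep_items must have the same length")
    result = list(words)
    # Apply replacements right to left so earlier scopes keep their offsets.
    pairs = sorted(zip(scopes, rep_items), key=lambda p: p[0][0], reverse=True)
    for (start, end), replacement in pairs:
        result[start:end] = list(replacement)
    return result


words = ["New", "York", "is", "big"]
print(unequal_replace(words, [(0, 2), (3, 4)], [["Paris"], ["very", "large"]]))
# ['Paris', 'is', 'very', 'large']
```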

static get_answer_position(spans, answer_start, answer_end)[source]¶: Get answer tokens start position and end position
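
One plausible way to map a character-level answer span onto token positions is to collect the tokens whose spans overlap it. The sketch below assumes half-open character spans and returns inclusive token indices; both conventions, and the name `answer_token_range`, are assumptions for illustration rather than the documented behaviour.

```python
def answer_token_range(spans, answer_start, answer_end):
    """Return (first, last) indices of tokens overlapping [answer_start, answer_end).

    Hypothetical helper for illustration only.
    """
    overlapping = [
        i for i, (start, end) in enumerate(spans)
        if start < answer_end and end > answer_start
    ]
    if not overlapping:
        raise ValueError("answer does not overlap any token")
    return overlapping[0], overlapping[-1]


# Character spans for four tokens; the answer covers characters 6..26.
token_spans = [(0, 5), (6, 16), (17, 26), (27, 30)]
print(answer_token_range(token_spans, 6, 26))  # (1, 2)
```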

static run_conversion(question, answer, tokens, const_parse)[source]¶

Convert the question and answer to a declarative sentence

Parameters

question (str) – question
answer (str) – answer
tokens (list) – the semantic tag dicts of question
const_parse – the constituency parse of question

Returns

a declarative sentence

convert_answer(answer, sent_tokens, question)[source]¶

Replace the ground truth with fake answer based on specific rules

Parameters

answer (str) – ground truth, str
sent_tokens (list) – sentence dicts, like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
question (str) – question sentence

Returns

fake answer (str)
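
Rule-based replacement of this kind is commonly organised as an ordered list of rules where the first applicable one wins. The sketch below shows that dispatch pattern only; the rule names, the `(answer, answer_tokens, question)` call signature, and the placeholder fake answers are all invented for this example and are not taken from textflint.

```python
def wh_rule(wh_word, new_ans):
    """Build a rule that fires when the question starts with wh_word."""
    def rule(answer, answer_tokens, question):
        return new_ans if question.lower().startswith(wh_word) else None
    return rule


def catch_all(answer, answer_tokens, question):
    """Fallback rule that always fires (placeholder value)."""
    return "something"


# Ordered list of (name, rule); earlier rules take priority.
RULES = [
    ("who", wh_rule("who", "Jane Doe")),
    ("where", wh_rule("where", "Springfield")),
    ("catch_all", catch_all),
]


def first_matching_answer(rules, answer, answer_tokens, question):
    """Return (rule_name, fake_answer) from the first rule that applies."""
    for name, rule in rules:
        fake = rule(answer, answer_tokens, question)
        if fake is not None and fake != answer:
            return name, fake
    return None, None


print(first_matching_answer(RULES, "Paris", [], "Where is the Eiffel Tower?"))
# ('where', 'Springfield')
```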

static alter_sentence(sample, nearby_word_dict=None, pos_tag_dict=None, rules=None)[source]¶

Parameters

sample – sentence dicts, like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
nearby_word_dict – the dictionary to search for nearby words
pos_tag_dict – the dictionary to search for the most frequent pos tags
rules – the rules to alter the sentence

Returns

the altered sentence and the altered sentence dicts

static alter_special(token, **kwargs)[source]¶

Alter special tokens

Parameters

token – the token to alter
kwargs –

Returns

like ‘US’ -> ‘UK’

static alter_wordnet_antonyms(token, **kwargs)[source]¶

Replace words with wordnet antonyms

Parameters

token – the token to replace
kwargs –

Returns

like good -> bad

static alter_wordnet_synonyms(token, **kwargs)[source]¶

Replace words with synonyms

Parameters

token – the token to replace
kwargs –

Returns

like good -> great

static alter_nearby(pos_list, ignore_pos=False, is_ner=False)[source]¶

Alter words based on glove embedding space

Parameters

pos_list – pos tags list
ignore_pos (bool) – whether to skip matching the pos tag
is_ner (bool) – indicate ner

Returns

like ‘Mary’ -> ‘Rose’

static alter_entity_type(token, **kwargs)[source]¶

Alter entity

Parameters

token – the word to replace
kwargs –

Returns

like ‘London’ -> ‘Berlin’

static get_answer_tokens(sent_tokens, answer)[source]¶

Extract the pos, ner, lemma tags of answer tokens

Parameters

sent_tokens (list) – a list of dicts
answer (str) – answer

Returns

a list of dicts like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}, {‘word’: ‘Bernadette’, ‘pos’: ‘NNP’, ‘lemma’: ‘Bernadette’, …}, {‘word’: ‘Soubirous’, ‘pos’: ‘NNP’, ‘lemma’: ‘Soubirous’, …}]
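
A simple way to picture this extraction is a search for the contiguous run of token dicts whose words spell out the answer. The following sketch assumes the answer can be split on whitespace into the same words the tokenizer produced; `find_answer_tokens` is a hypothetical name, and the real method may match differently.

```python
def find_answer_tokens(sent_tokens, answer):
    """Return the first contiguous run of token dicts whose words spell answer.

    Hypothetical helper for illustration; returns None if there is no match.
    """
    target = answer.split()
    if not target:
        return None
    words = [tok["word"] for tok in sent_tokens]
    for start in range(len(words) - len(target) + 1):
        if words[start:start + len(target)] == target:
            return sent_tokens[start:start + len(target)]
    return None


sent_tokens = [
    {"word": "Saint", "pos": "NNP", "lemma": "Saint", "ner": "PERSON"},
    {"word": "Bernadette", "pos": "NNP", "lemma": "Bernadette", "ner": "PERSON"},
    {"word": "Soubirous", "pos": "NNP", "lemma": "Soubirous", "ner": "PERSON"},
    {"word": "prayed", "pos": "VBD", "lemma": "pray", "ner": "O"},
]
matched = find_answer_tokens(sent_tokens, "Bernadette Soubirous")
print([tok["word"] for tok in matched])  # ['Bernadette', 'Soubirous']
```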

static ans_entity_full(ner_tag, new_ans)[source]¶

Returns a function that yields new_ans iff every token has |ner_tag|

Parameters

ner_tag (str) – ner tag
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

Returns

fake answer, str
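
"Returns a function" describes a closure: the tag and replacement are fixed when the rule is built, and the check runs later against a concrete answer. A minimal sketch of that pattern, with a hypothetical name `make_entity_rule` and a simplified one-argument inner function (the library's inner function may take more arguments):

```python
def make_entity_rule(ner_tag, new_ans):
    """Build a rule that returns new_ans only if every token carries ner_tag.

    Hypothetical helper for illustration only.
    """
    def rule(answer_tokens):
        if answer_tokens and all(
            tok.get("ner") == ner_tag for tok in answer_tokens
        ):
            return new_ans
        return None  # the rule does not apply to this answer
    return rule


person_rule = make_entity_rule("PERSON", "Jane Doe")  # placeholder fake answer
all_person = [{"word": "Saint", "ner": "PERSON"}, {"word": "Bernadette", "ner": "PERSON"}]
mixed = [{"word": "Saint", "ner": "PERSON"}, {"word": "prayed", "ner": "O"}]
print(person_rule(all_person))  # Jane Doe
print(person_rule(mixed))       # None
```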

static ans_abbrev(new_ans)[source]¶

Parameters: new_ans (str) – answer words
Returns: fake answer (str)

static ans_match_wh(wh_word, new_ans)[source]¶

Returns a function that yields new_ans if the question starts with |wh_word|

Parameters

wh_word (str) – question word
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]

Returns

fake answer (str)

static ans_pos(pos, new_ans, end=False, add_dt=False)[source]¶

Returns a function that yields new_ans if the first/last token has |pos|

Parameters

pos (str) – pos tag
new_ans (list) – like [{‘word’: ‘Saint’, ‘pos’: ‘NNP’, ‘lemma’: ‘Saint’, ‘ner’: ‘PERSON’}…]
end (bool) – whether to use the last word to match the pos tag
add_dt (bool) – whether to add a determiner

Returns

fake answer (str)

static read_const_parse(parse_str)[source]¶: Construct a constituency tree based on constituency parser

static fix_style(s)[source]¶: Minor, general style fixes for questions.

class textflint.input_layer.component.sample.mrc_sample.AnswerRule[source]¶

Bases: textflint.input_layer.component.sample.mrc_sample.ConversionRule

Just return the answer.

name = 'AnswerRule'¶

class textflint.input_layer.component.sample.mrc_sample.ConstituencyParse(tag, children=None, word=None, index=None)[source]¶

Bases: object

A CoreNLP constituency parse (or a node in a parse tree).

classmethod from_corenlp(s)[source]¶: Parses the “parse” attribute returned by CoreNLP parse annotator.
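
For readers unfamiliar with the format, a CoreNLP "parse" attribute is a bracketed string such as `(ROOT (S (NP (DT The) (NN dog)) (VP (VBZ barks))))`. The sketch below parses such a string into nested dicts with a small recursive reader. It is an independent illustration, not the ConstituencyParse implementation; `parse_bracketed` and `leaves` are hypothetical names, and it relies on literal parentheses in the text being escaped (for example as -LRB- and -RRB-).

```python
def parse_bracketed(s):
    """Parse a bracketed constituency string into nested dict nodes.

    Each node is {"tag": str, "word": str or None, "children": list or None}.
    Hypothetical helper for illustration only.
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        tag = tokens[pos]
        pos += 1
        if tokens[pos] != "(":
            # Leaf node of the form (TAG word)
            word, children = tokens[pos], None
            pos += 1
        else:
            # Internal node: read child nodes until the closing bracket
            word, children = None, []
            while tokens[pos] == "(":
                children.append(read_node())
        if tokens[pos] != ")":
            raise ValueError(f"expected ')' at token {pos}")
        pos += 1
        return {"tag": tag, "word": word, "children": children}

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def leaves(node):
    """Collect the words at the leaves, left to right."""
    if node["children"] is None:
        return [node["word"]]
    words = []
    for child in node["children"]:
        words.extend(leaves(child))
    return words


tree = parse_bracketed("(ROOT (S (NP (DT The) (NN dog)) (VP (VBZ barks))))")
print(tree["tag"])   # ROOT
print(leaves(tree))  # ['The', 'dog', 'barks']
```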

classmethod replace_words(tree, new_words)[source]¶: Return a new tree, with new words replacing old ones.

class textflint.input_layer.component.sample.mrc_sample.ConstituencyRule(in_pattern, out_pattern, postproc=None)[source]¶

Bases: textflint.input_layer.component.sample.mrc_sample.ConversionRule

A rule for converting question to sentence based on constituency parse.

gen_output(fmt_args)[source]¶: By default, use self.out_pattern. Can be overridden.

class textflint.input_layer.component.sample.mrc_sample.Field(field_value, field_type=<class 'str'>, **kwargs)[source]¶

Bases: object

A helper class that represents an input string that is to be modified.

__init__(field_value, field_type=<class 'str'>, **kwargs)[source]¶

Parameters

field_value (string|int|list) – The string that Field represents.
field_type (str) – field value type

class textflint.input_layer.component.sample.mrc_sample.FindWHPRule[source]¶

Bases: textflint.input_layer.component.sample.mrc_sample.ConversionRule

A rule that looks for $WHP’s from right to left and does replacements.

name = 'FindWHP'¶

class textflint.input_layer.component.sample.mrc_sample.LancasterStemmer(rule_tuple=None, strip_prefix_flag=False)[source]¶

Bases: nltk.stem.api.StemmerI

Lancaster Stemmer

>>> from nltk.stem.lancaster import LancasterStemmer
>>> st = LancasterStemmer()
>>> st.stem('maximum')     # Remove "-um" when word is intact
'maxim'
>>> st.stem('presumably')  # Don't remove "-um" when word is not intact
'presum'
>>> st.stem('multiply')    # No action taken if word ends with "-ply"
'multiply'
>>> st.stem('provision')   # Replace "-sion" with "-j" to trigger "j" set of rules
'provid'
>>> st.stem('owed')        # Word starting with vowel must contain at least 2 letters
'ow'
>>> st.stem('ear')         # ditto
'ear'
>>> st.stem('saying')      # Words starting with consonant must contain at least 3
'say'
>>> st.stem('crying')      #     letters and one of those letters must be a vowel
'cry'
>>> st.stem('string')      # ditto
'string'
>>> st.stem('meant')       # ditto
'meant'
>>> st.stem('cement')      # ditto
'cem'
>>> st_pre = LancasterStemmer(strip_prefix_flag=True)
>>> st_pre.stem('kilometer') # Test Prefix
'met'
>>> st_custom = LancasterStemmer(rule_tuple=("ssen4>", "s1t."))
>>> st_custom.stem("ness") # Change s to t
'nest'

default_rule_tuple = ('ai*2.', 'a*1.', 'bb1.', 'city3s.', 'ci2>', 'cn1t>', 'dd1.', 'dei3y>', 'deec2ss.', 'dee1.', 'de2>', 'dooh4>', 'e1>', 'feil1v.', 'fi2>', 'gni3>', 'gai3y.', 'ga2>', 'gg1.', 'ht*2.', 'hsiug5ct.', 'hsi3>', 'i*1.', 'i1y>', 'ji1d.', 'juf1s.', 'ju1d.', 'jo1d.', 'jeh1r.', 'jrev1t.', 'jsim2t.', 'jn1d.', 'j1s.', 'lbaifi6.', 'lbai4y.', 'lba3>', 'lbi3.', 'lib2l>', 'lc1.', 'lufi4y.', 'luf3>', 'lu2.', 'lai3>', 'lau3>', 'la2>', 'll1.', 'mui3.', 'mu*2.', 'msi3>', 'mm1.', 'nois4j>', 'noix4ct.', 'noi3>', 'nai3>', 'na2>', 'nee0.', 'ne2>', 'nn1.', 'pihs4>', 'pp1.', 're2>', 'rae0.', 'ra2.', 'ro2>', 'ru2>', 'rr1.', 'rt1>', 'rei3y>', 'sei3y>', 'sis2.', 'si2>', 'ssen4>', 'ss0.', 'suo3>', 'su*2.', 's*1>', 's0.', 'tacilp4y.', 'ta2>', 'tnem4>', 'tne3>', 'tna3>', 'tpir2b.', 'tpro2b.', 'tcud1.', 'tpmus2.', 'tpec2iv.', 'tulo2v.', 'tsis0.', 'tsi3>', 'tt1.', 'uqi3.', 'ugo1.', 'vis3j>', 'vie0.', 'vi2>', 'ylb1>', 'yli3y>', 'ylp0.', 'yl2>', 'ygo1.', 'yhp1.', 'ymo1.', 'ypo1.', 'yti3>', 'yte3>', 'ytl2.', 'yrtsi5.', 'yra3>', 'yro3>', 'yfi3.', 'ycn2t>', 'yca3>', 'zi2>', 'zy1s.')¶

__init__(rule_tuple=None, strip_prefix_flag=False)[source]¶: Create an instance of the Lancaster stemmer.

parseRules(rule_tuple=None)[source]¶

Validate the set of rules used in this stemmer.

If this function is called as an individual method, without using stem method, rule_tuple argument will be compiled into self.rule_dictionary. If this function is called within stem, self._rule_tuple will be used.

stem(word)[source]¶: Stem a word using the Lancaster stemmer.

class textflint.input_layer.component.sample.mrc_sample.Path(*args, **kwargs)[source]¶

Bases: pathlib.PurePath

PurePath subclass that can make system calls.

Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.

classmethod cwd()[source]¶: Return a new path pointing to the current working directory (as returned by os.getcwd()).

classmethod home()[source]¶: Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).

samefile(other_path)[source]¶: Return whether other_path is the same or not as this file (as returned by os.path.samefile()).

iterdir()[source]¶: Iterate over the files in this directory. Does not yield any result for the special paths ‘.’ and ‘..’.

glob(pattern)[source]¶: Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.

rglob(pattern)[source]¶: Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.

absolute()[source]¶

Return an absolute version of this path. This function works even if the path doesn’t point to anything.

No normalization is done, i.e. all ‘.’ and ‘..’ will be kept along. Use resolve() to get the canonical path to a file.

resolve(strict=False)[source]¶: Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).

stat()[source]¶: Return the result of the stat() system call on this path, like os.stat() does.

owner()[source]¶: Return the login name of the file owner.

group()[source]¶: Return the group name of the file gid.

open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)[source]¶: Open the file pointed by this path and return a file object, as the built-in open() function does.

read_bytes()[source]¶: Open the file in bytes mode, read it, and close the file.

read_text(encoding=None, errors=None)[source]¶: Open the file in text mode, read it, and close the file.

write_bytes(data)[source]¶: Open the file in bytes mode, write to it, and close the file.

write_text(data, encoding=None, errors=None)[source]¶: Open the file in text mode, write to it, and close the file.

touch(mode=438, exist_ok=True)[source]¶: Create this file with the given access mode, if it doesn’t exist.

mkdir(mode=511, parents=False, exist_ok=False)[source]¶: Create a new directory at this given path.

chmod(mode)[source]¶: Change the permissions of the path, like os.chmod().

lchmod(mode)[source]¶: Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.

unlink()[source]¶: Remove this file or link. If the path is a directory, use rmdir() instead.

rmdir()[source]¶: Remove this directory. The directory must be empty.

lstat()[source]¶: Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.

rename(target)[source]¶: Rename this path to the given path.

replace(target)[source]¶: Rename this path to the given path, clobbering the existing destination if it exists.

symlink_to(target, target_is_directory=False)[source]¶: Make this path a symlink pointing to the given path. Note the order of arguments (self, target) is the reverse of os.symlink’s.

exists()[source]¶: Whether this path exists.

is_dir()[source]¶: Whether this path is a directory.

is_file()[source]¶: Whether this path is a regular file (also True for symlinks pointing to regular files).

is_mount()[source]¶: Check if this path is a POSIX mount point

is_symlink()[source]¶: Whether this path is a symbolic link.

is_block_device()[source]¶: Whether this path is a block device.

is_char_device()[source]¶: Whether this path is a character device.

is_fifo()[source]¶: Whether this path is a FIFO.

is_socket()[source]¶: Whether this path is a socket.

expanduser()[source]¶: Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser)

class textflint.input_layer.component.sample.mrc_sample.ReplaceRule(target, replacement='{}', start=False)[source]¶

Bases: textflint.input_layer.component.sample.mrc_sample.ConversionRule

A simple rule that replaces some tokens with the answer.

class textflint.input_layer.component.sample.mrc_sample.Sample(data, origin=None, sample_id=None)[source]¶

Bases: abc.ABC

Base Sample class to hold the necessary info and provide atomic operations

text_processor = <textflint.common.preprocess.en_processor.EnProcessor object>¶

__init__(data, origin=None, sample_id=None)[source]¶

Parameters

data (dict) – The dict obj that contains data info.
origin (sample) – original sample obj.
sample_id (int) – sample index

get_value(field)[source]¶

Get field value by field_str.

Parameters: field (str) – field name
Returns: field value

get_words(field)[source]¶

Get tokenized words of given textfield

Parameters: field (str) – field name
Returns: tokenized words

get_text(field)[source]¶

Get text string of given textfield

Parameters: field (str) – field name
Returns: text string

get_mask(field)[source]¶

Get word masks of given textfield

Parameters: field (str) – field name
Returns: list of mask values

get_sentences(field)[source]¶

Get split sentences of given textfield

Parameters: field (str) – field name
Returns: list of sentences

get_pos(field)[source]¶

Get text field pos tags.

Parameters: field (str) – field name
Returns: pos tag list

get_ner(field)[source]¶

Get text field ner tags

Parameters: field (str) – field name
Returns: ner tag list

replace_fields(fields, field_values, field_masks=None)[source]¶

Fully replace multiple fields at the same time and return a new sample. Note: this API is not recommended, as it sets the mask values of TextField to MODIFIED_MASK.

Parameters

fields (list) – field str list
field_values (list) – field value list
field_masks (list) – indicate mask values, useful for printable text

Returns

Modified Sample

replace_field(field, field_value, field_mask=None)[source]¶

Fully replace a single field and return a new sample. Note: this API is not recommended, as it sets the mask values of TextField to MODIFIED_MASK.

Parameters

field (str) – field str
field_value – field_type
field_mask (list) – indicate mask value of field

Returns

Modified Sample

replace_field_at_indices(field, indices, items)[source]¶

Replace items of multiple given scopes of field value at the same time. Avoid this complex function if possible.

Be careful of your input list shape.

Parameters

field (str) – field name
indices (list of int|list|slice) – each index can be an int, which replaces a single item (or a list of them like [1, 2, 3]); a list like (0,3), which replaces items from 0 to 3 (not included); or a slice, which is converted to a list.
items –

Returns

Modified Sample

replace_field_at_index(field, index, items)[source]¶

Replace items of given scope of field value.

Be careful of your input list shape.

Parameters

field (str) – field name
index (int|list|slice) – can be an int, which replaces a single item, or a list like [1, 2, 3]; a list like (0,3), which replaces items from 0 to 3 (not included); or a slice, which is converted to a list.
items (str|list) – shape: indices_num, correspond to field_sub_items

Returns

Modified Sample

unequal_replace_field_at_indices(field, indices, rep_items)[source]¶

Replace scope items of field value with rep_items which may not equal with scope.

Parameters

field – field str
indices – list of int/tuple/list
rep_items – list

Returns

Modified Sample

delete_field_at_indices(field, indices)[source]¶

Delete items of given scopes of field value.

Parameters

field (str) – field name
indices (list of int|list|slice) – shape: indices_num. Each index can be an int, which deletes a single item (or a list of them like [1, 2, 3]); a list like (0,3), which deletes items from 0 to 3 (not included); or a slice, which is converted to a list.

Returns

Modified Sample

delete_field_at_index(field, index)[source]¶

Delete items of given scope of field value.

Parameters

field (str) – field name
index (int|list|slice) – can be an int, which deletes a single item (or a list of them like [1, 2, 3]); a list like (0,3), which deletes items from 0 to 3 (not included); or a slice, which is converted to a list.

Returns

Modified Sample

insert_field_before_indices(field, indices, items)[source]¶

Insert items of multi given scopes before indices of field value at the same time.

Avoid this complex function if possible. Be careful of your input list shape.

Parameters

field (str) – field name
indices – list of int, shape: indices_num, like [1, 2, 3]
items – list of str/list, shape: indices_num, correspond to indices

Returns

Modified Sample

insert_field_before_index(field, index, items)[source]¶

Insert items before the given index of field value.

Parameters

field (str) – field name
index (int) – indicate which index to insert items
items (str|list) – items to insert

Returns

Modified Sample

insert_field_after_indices(field, indices, items)[source]¶

Insert items of multi given scopes after indices of field value at the same time.

Avoid this complex function if possible. Be careful of your input list shape.

Parameters

field (str) – field name
indices – list of int, shape: indices_num, like [1, 2, 3]
items – list of str/list, shape: indices_num, correspond to indices

Returns

Modified Sample

insert_field_after_index(field, index, items)[source]¶

Insert items after the given index of field value.

Parameters

field (str) – field name
index (int) – indicate where to apply insert
items (str|list) – shape: indices_num, correspond to field_sub_items

Returns

Modified Sample

swap_field_at_index(field, first_index, second_index)[source]¶

Swap items between first_index and second_index of field value.

Parameters

field (str) – field name
first_index (int) –
second_index (int) –

Returns

Modified Sample

abstract check_data(data)[source]¶

Check raw data format

Parameters: data – raw data input

abstract load(data)[source]¶

Parse data into sample field value.

Parameters: data – raw data input

abstract dump()[source]¶

Convert sample info to input data json format.

Returns: dict object.

classmethod clone(original_sample)[source]¶

Deep copy self to a new sample

Parameters: original_sample – sample to be copied
Returns: Sample instance

property is_origin¶: Return whether the sample is original Sample.

class textflint.input_layer.component.sample.mrc_sample.TextField(field_value, mask=None, is_one_sent=False, split_by_space=False, **kwargs)[source]¶

Bases: textflint.input_layer.component.field.field.Field

A helper class that represents an input string that is to be modified.

Text that Sample contains parsed in data set, TextField provides multiple methods for Sample to modify.

Support sentence level and word level modification, default using word level API.

text_processor = <textflint.common.preprocess.en_processor.EnProcessor object>¶

__init__(field_value, mask=None, is_one_sent=False, split_by_space=False, **kwargs)[source]¶

Parameters

field_value (str|list) – Sentence string or tokenized words.
mask (list) – list of mask values
is_one_sent (bool) – whether input is a sentence
split_by_space (bool) – whether to tokenize the sentence by splitting on spaces
kwargs –

pos_of_word_index(desired_word_idx)[source]¶

Get pos tag of given index.

Parameters: desired_word_idx (int) – desire index to get pos tag
Returns: pos tag of word of desired_word_idx.

replace_at_indices(indices, new_items)[source]¶

Replace words at indices and set their mask to MODIFIED_MASK.

Parameters

indices ([int|list|slice]) – each index can be an int, which replaces a single item (or a list of them like [1, 2, 3]); a list like (0,3), which replaces items from 0 to 3 (not included) (or a list of them like [(0, 3), (5,6)]); or a slice, which is converted to a list.
new_items ([str|list|tuple]) – items corresponding to indices.

Returns

Replaced TextField object.
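
The mask bookkeeping can be pictured as a list kept parallel to the words, where each replaced position is flagged as modified. The sketch below works on plain lists and returns new copies; the numeric mask values and the name `replace_words` are assumptions made for this example, not constants or methods read from textflint.

```python
ORIGIN_MASK = 0    # assumed value, for illustration only
MODIFIED_MASK = 2  # assumed value, for illustration only


def replace_words(words, mask, indices, new_items):
    """Return new (words, mask) lists with words[i] replaced for i in indices.

    Hypothetical helper for illustration; handles int indices only.
    """
    if len(indices) != len(new_items):
        raise ValueError("indices and new_items must have the same length")
    new_words, new_mask = list(words), list(mask)
    for index, item in zip(indices, new_items):
        new_words[index] = item
        new_mask[index] = MODIFIED_MASK  # flag the position as changed
    return new_words, new_mask


words = ["good", "movie", "overall"]
mask = [ORIGIN_MASK] * len(words)
new_words, new_mask = replace_words(words, mask, [0], ["great"])
print(new_words)  # ['great', 'movie', 'overall']
print(new_mask)   # [2, 0, 0]
```

Returning copies rather than mutating in place mirrors the "Replaced TextField object" wording above, which suggests a new object is handed back.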

replace_at_index(index, new_items)[source]¶

Replace words at index and set their mask to MODIFIED_MASK.

Parameters

index (int|list|slice) – can be an int, which replaces a single item (or a list of them like [1, 2, 3]); a list like (0,3), which replaces items from 0 to 3 (not included) (or a list of them like [(0, 3), (5,6)]); or a slice, which is converted to a list.
new_items (str|list|tuple) – items corresponding to index.

Returns

Replaced TextField object.

delete_at_indices(indices)[source]¶

Delete words at indices and remove their mask value.

Parameters

indices ([int|list|slice]) – each index can be an int, which deletes a single item (or a list of them like [1, 2, 3]); a list like (0,3), which deletes items from 0 to 3 (not included) (or a list of them like [(0, 3), (5,6)]); or a slice, which is converted to a list.

Returns

Modified TextField object.

delete_at_index(index)[source]¶

Delete words at index and remove their mask value.

Parameters

index (int|list|slice) – can be an int, which deletes a single item (or a list of them like [1, 2, 3]); a list like (0,3), which deletes items from 0 to 3 (not included) (or a list of them like [(0, 3), (5,6)]); or a slice, which is converted to a list.

Returns

Modified TextField object.

insert_before_indices(indices, new_items)[source]¶

Insert words before indices.

Parameters

indices ([int]) – can be an int indicating a single position (or a list of them like [1, 2, 3]); a list like (0,3) indicating the items from 0 to 3 (not included) (or a list of them like [(0, 3), (5,6)]); or a slice, which is converted to a list.
new_items ([str|list|tuple]) – items corresponding to indices.

Returns

new TextField object.

insert_before_index(index, new_items)[source]¶

Insert words before index.

Parameters

index (int) – can be an int indicating a single position (or a list of them like [1, 2, 3]); a list like (0,3) indicating the items from 0 to 3 (not included) (or a list of them like [(0, 3), (5,6)]); or a slice, which is converted to a list.
new_items (str|list|tuple) – items corresponding to index.

Returns

new TextField object.

insert_after_indices(indices, new_items)[source]¶

Insert words after indices.

Parameters

indices ([int]) – can be an int indicating a single position (or a list of them like [1, 2, 3]); a list like (0,3) indicating the items from 0 to 3 (not included) (or a list of them like [(0, 3), (5,6)]); or a slice, which is converted to a list.
new_items ([str|list|tuple]) – items corresponding to indices.

Returns

new TextField object.

insert_after_index(index, new_items)[source]¶

Insert words after index.

Parameters

index (int) – can be an int indicating a single position (or a list of them like [1, 2, 3]); a list like (0,3) indicating the items from 0 to 3 (not included) (or a list of them like [(0, 3), (5,6)]); or a slice, which is converted to a list.
new_items (str|list|tuple) – items corresponding to index.

Returns

new TextField object.

swap_at_index(first_index, second_index)[source]¶

Swap items between first_index and second_index of origin_list

Parameters

first_index (int) – index of first item
second_index (int) – index of second item

Returns

Modified TextField object.

property pos_tagging¶

Get POS tags.

Example:

given sentence 'All things in their being are good for something.'

>> [('All', 'DT'),
    ('things', 'NNS'),
    ('in', 'IN'),
    ('their', 'PRP$'),
    ('being', 'VBG'),
    ('are', 'VBP'),
    ('good', 'JJ'),
    ('for', 'IN'),
    ('something', 'NN'),
    ('.', '.')]

Returns: Tokenized tokens with their POS tags.

property ner¶

Get NER tags.

Example:

given sentence 'Lionel Messi is a football player from Argentina.'

>>[('Lionel Messi', 0, 2, 'PERSON'),
   ('Argentina', 7, 8, 'LOCATION')]

Returns: A list of tuples, (entity, start, end, label)

property dependency_parsing¶

Dependency parsing.

Example:

given sentence: 'The quick brown fox jumps over the lazy dog.'

>>
    The     DT      4       det
    quick   JJ      4       amod
    brown   JJ      4       amod
    fox     NN      5       nsubj
    jumps   VBZ     0       root
    over    IN      9       case
    the     DT      9       det
    lazy    JJ      9       amod
    dog     NN      5       obl

Returns: A list of tuples, (token, pos, target, type)

textflint.input_layer.component.sample.mrc_sample.deepcopy(x, memo=None, _nil=[])[source]¶

Deep copy operation on arbitrary Python objects.

See the module’s __doc__ string for more info.

textflint.input_layer.component.sample.mrc_sample.normalize_scope(scope)[source]¶

Convert various scope input to list format of [left_bound, right_bound]

Parameters: scope (int|list|tuple|slice) – can be an int, which indicates a single item like 1 or 3; a list like (0,3), which indicates items from 0 to 3 (not included), or a list like [5,6]; or a slice, which is converted to a list.
Returns: list, [left_bound, right_bound]
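
The conversion described above can be sketched as a small dispatch on the input type. This is an independent illustration under stated assumptions: an int `i` is taken to mean the half-open range `[i, i + 1]`, slices must have explicit bounds and a step of 1, and the name `normalize_scope_sketch` is hypothetical. The library function may handle edge cases differently.

```python
def normalize_scope_sketch(scope):
    """Convert an int, 2-item list/tuple, or slice to [left_bound, right_bound].

    Hypothetical reimplementation for illustration; the right bound is
    treated as exclusive.
    """
    if isinstance(scope, int):
        return [scope, scope + 1]  # assumption: a single item at that index
    if isinstance(scope, slice):
        if scope.start is None or scope.stop is None:
            raise ValueError("slice must have explicit start and stop")
        if scope.step not in (None, 1):
            raise ValueError("slice step must be 1")
        return [scope.start, scope.stop]
    if isinstance(scope, (list, tuple)) and len(scope) == 2:
        left, right = scope
        return [left, right]
    raise TypeError(f"unsupported scope: {scope!r}")


print(normalize_scope_sketch(3))            # [3, 4]
print(normalize_scope_sketch((0, 3)))       # [0, 3]
print(normalize_scope_sketch(slice(5, 6)))  # [5, 6]
```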