textflint.input_layer.component.sample.coref_sample¶
Coref Sample Class¶
-
class
textflint.input_layer.component.sample.coref_sample.CorefSample(data, origin=None, sample_id=None)[source]¶ Bases:
textflint.input_layer.component.sample.sample.Sample
Coref Sample
-
check_data(data)[source]¶ Check if data is a conll-dict and is ready to be predicted.
- Parameters
data (None|dict) – Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.
- Returns
-
load(data)[source]¶ Convert a conll-dict to CorefSample.
- Parameters
data (None|dict) – None, or a conll-style dict. Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.
- Returns
-
dump(with_check=True)[source]¶ Dump a CorefSample to a conll-dict.
- Parameters
with_check (bool) – whether the dumped conll-dict should be checked
- Return dict ret_dict
a conll-style dict
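For orientation, a conll-style dict of the kind these methods accept and return might look like the minimal sketch below. The key names come from the parameter descriptions; the concrete values, the [start, end] span layout over document-level word indexes, and the helper has_required_keys are assumptions made for illustration and are not part of textflint.

```python
# Hypothetical conll-style dict; all values are illustrative only.
conll_dict = {
    "doc_key": "demo_0",  # optional key
    "sentences": [
        ["Anna", "left", "."],
        ["She", "was", "tired", "."],
    ],
    # Assumed layout: each cluster is a list of [start, end] spans that
    # index into the concatenated document, with inclusive ends.
    "clusters": [[[0, 0], [3, 3]]],
}

REQUIRED_KEYS = ("sentences", "clusters")


def has_required_keys(data):
    """Return True if `data` is a dict that carries the required keys."""
    return isinstance(data, dict) and all(key in data for key in REQUIRED_KEYS)


print(has_required_keys(conll_dict))
```

This only checks for the presence of the keys; the real check_data may validate more than that.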
-
pretty_print(show='Sample:')[source]¶ A pretty-printer for CorefSample. Print useful sample information by calling this function.
- Parameters
show (str) – optional, the welcome information of printing this sample
-
num_sentences()[source]¶ the number of sentences in this sample
- Param
- Return int
the number of sentences in this sample
-
get_kth_sen(k)[source]¶ Get the k-th sentence as a word list.
- Parameters
k (int) – sentence index
- Return list
the k-th sentence, as a word list
-
eqlen_sen_map()[source]¶ Generate [0, 0, 1, 1, 1, 2, 2] from self.sen_map = [2, 3, 2]
- Param
- Return list
sentence mapping with the same length as x (one sentence index per word), like [0, 0, 1, 1, 1, 2, 2]
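The expansion from sentence lengths to one sentence index per word can be sketched as follows. The standalone function expand_sen_map is a hypothetical name for illustration; it mirrors the documented behaviour but is not the library's implementation.

```python
def expand_sen_map(sen_map):
    """Expand a list of sentence lengths into one sentence index per word."""
    return [
        sen_idx
        for sen_idx, length in enumerate(sen_map)
        for _ in range(length)
    ]


print(expand_sen_map([2, 3, 2]))  # [0, 0, 1, 1, 1, 2, 2]
```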
-
index_in_sen(idx)[source]¶ For the given word idx, determine which sentence it is in.
- Parameters
idx (int) – word idx
- Return int
sen_idx, the index of the sentence that contains word idx
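One way to perform this lookup from a list of sentence lengths is a binary search over cumulative sentence ends. This is a sketch under assumptions: sentence_of_word is a hypothetical standalone helper, and the library may well compute the result differently.

```python
from bisect import bisect_right
from itertools import accumulate


def sentence_of_word(sen_map, idx):
    """Return the index of the sentence that contains word `idx`."""
    ends = list(accumulate(sen_map))  # exclusive end offset of each sentence
    total = ends[-1] if ends else 0
    if not 0 <= idx < total:
        raise IndexError("word index out of range")
    # The first sentence whose end offset is greater than idx contains idx.
    return bisect_right(ends, idx)


print([sentence_of_word([2, 3, 2], i) for i in range(7)])
```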
-
static
sens2doc(sens)[source]¶ Given a 2-D list of str (a list of word lists), concatenate it and record the length of each sentence.
- Parameters
sens (list) – 2-D list of str (a list of word lists)
- Returns (list, list)
x as a list of str (word list), sen_map as a list of int (sentence lengths)
-
static
doc2sens(x, sen_map)[source]¶ Given x and sen_map, return sens. Inverse to sens2doc.
- Parameters
x (list) – list of str (word list)
sen_map (list) – list of int (sentence lengths)
- Return list
sens as a 2-D list of str (a list of word lists)
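The round trip between the sentence view and the flat document view can be sketched like this. The functions sens_to_doc and doc_to_sens are hypothetical stand-ins written for illustration, not the static methods themselves.

```python
def sens_to_doc(sens):
    """Flatten a list of word lists and record each sentence length."""
    x = [word for sen in sens for word in sen]
    sen_map = [len(sen) for sen in sens]
    return x, sen_map


def doc_to_sens(x, sen_map):
    """Split a flat word list back into sentences using the lengths."""
    sens, start = [], 0
    for length in sen_map:
        sens.append(x[start:start + length])
        start += length
    return sens


sens = [["Anna", "left", "."], ["She", "was", "tired", "."]]
x, sen_map = sens_to_doc(sens)
print(sen_map)                          # [3, 4]
print(doc_to_sens(x, sen_map) == sens)  # True
```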
-
insert_field_before_indices(field, indices, items)[source]¶ Insert items of given scopes before indices of field value simultaneously.
- Parameters
field (str) – transformed field
indices (list) – indices of insert positions
items (list) – insert items
- Return ~textflint.CorefSample
modified sample
-
insert_field_after_indices(field, indices, items)[source]¶ Insert items of given scopes after indices of field value simultaneously.
- Parameters
field (str) – transformed field
indices (list) – indices of insert positions
items (list) – insert items
- Return ~textflint.CorefSample
modified sample
-
delete_field_at_indices(field, indices)[source]¶ Delete items of given scopes of field value.
- Parameters
field (str) – transformed field
indices (list) – indices of delete positions
- Return ~textflint.CorefSample
modified sample
-
replace_field_at_indices(field, indices, items)[source]¶ Replace scope items of field value with items.
- Parameters
field (str) – transformed field
indices (list) – indices of replace positions
items (list) – replacement items
- Return ~textflint.CorefSample
modified sample
-
static
concat_conlls(*args)[source]¶ Given several CorefSamples, concatenate the values key by key.
- Param
several CorefSamples
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefSample in which the docs are concatenated to form one x
-
shuffle_conll(sen_idxs)[source]¶ Given a CorefSample and shuffled sentence indexes, reproduce a CorefSample with respect to the indexes.
- Parameters
sen_idxs (list) – a list of ints: the sentence indexes in shuffled order. sen_idxs is expected to look like [1, 3, 0, 4, 2, 5] when sen_num = 6.
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefSample with respect to the shuffled index
-
part_conll(pres_idxs)[source]¶ Only sentences whose indexes are in pres_idxs will be kept, and all the cluster structures are kept for convenience of concatenation.
- Parameters
pres_idxs (list) – a list of ints: the indexes to be preserved. pres_idxs is expected to be drawn from [0..num_sen] and to be in ascending order, like [0, 1, 3, 5] when num_sen = 6.
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
-
part_before_conll(sen_idx)[source]¶ Only sentences [0, sen_idx) will be kept, and all the structures of clusters are kept for convenience of concat.
- Parameters
sen_idx (int) – sentences with idx < sen_idx will be preserved
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
-
part_after_conll(sen_idx)[source]¶ Only sentences [sen_idx:] will be kept, and all the structures of clusters are kept for convenience of concat.
- Parameters
sen_idx (int) – sentences with idx >= sen_idx will be preserved
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefPartSample of a conll-part
-
-
class
textflint.input_layer.component.sample.coref_sample.CorefPartSample(data, origin=None, sample_id=None)[source]¶ Bases:
textflint.input_layer.component.sample.coref_sample.CorefSample
Coref Part Sample: corresponds to a part of a Coref Sample
-
check_data(data)[source]¶ Check if data is a conll-part. The condition is looser than that for a full conll-dict.
- Parameters
data (None|dict) – Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.
- Returns
-
remove_invalid_corefs_from_part()[source]¶ Conll parts may contain clusters that have only 0 or 1 span, which are not valid.
This function removes these invalid clusters from self.clusters.
- Return ~textflint.input_layer.component.sample.CorefSample
a CorefSample that passes check_data
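The filtering rule described here (a valid cluster needs at least two spans) can be sketched on a plain list of clusters. The helper drop_invalid_clusters and the [start, end] span layout are assumptions for illustration only.

```python
def drop_invalid_clusters(clusters):
    """Keep only clusters that contain at least two spans."""
    return [cluster for cluster in clusters if len(cluster) >= 2]


clusters = [
    [[0, 0], [3, 3]],  # valid: two mentions
    [[5, 6]],          # invalid: a single mention
    [],                # invalid: empty after cutting sentences
]
print(drop_invalid_clusters(clusters))  # [[[0, 0], [3, 3]]]
```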
-
static
concat_conll_parts(*args)[source]¶ Concatenate conll parts.
- Param
many CorefPartSamples, which are assumed to be parts of the same conll, generated by part_conll. The merged result is still treated as a conll part, which should be postprocessed by remove_invalid_corefs_from_part to form a valid CorefSample.
- Return ~textflint.input_layer.component.sample.CorefPartSample
a CorefPartSample of a conll-part
-
-
class
textflint.input_layer.component.sample.coref_sample.ListField(field_value, **kwargs)[source]¶ Bases:
textflint.input_layer.component.field.field.Field
A helper class that represents input list values that are to be modified.
Operations which modify field_value generate a new Field instance.
-
__init__(field_value, **kwargs)[source]¶ - Parameters
field_value ([str]) – The list that ListField represents.
-
replace_at_indices(indices, new_items)[source]¶ Replace items at indices.
Note: only isometric (same-length) replacement is supported.
- Parameters
indices (list[int|list|slice]) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3 (so indices may look like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to indices.
- Returns
new field object.
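The index forms accepted above can be illustrated with a small same-length replacement on a plain list. This is a sketch, not the ListField code: to_range and replace_at_indices are hypothetical helpers, and slices are assumed to carry explicit start and stop values.

```python
def to_range(index):
    """Normalize an int, a (start, end) pair, or a slice to (start, end)."""
    if isinstance(index, int):
        return index, index + 1
    if isinstance(index, slice):
        return index.start, index.stop  # assumes both bounds are given
    start, end = index
    return start, end


def replace_at_indices(values, indices, new_items):
    """Return a new list with each selected scope replaced, same length only."""
    result = list(values)  # leave the input untouched
    for index, item in zip(indices, new_items):
        start, end = to_range(index)
        replacement = [item] if isinstance(index, int) else list(item)
        if len(replacement) != end - start:
            raise ValueError("only same-length replacement is supported")
        result[start:end] = replacement
    return result


words = ["a", "b", "c", "d", "e"]
print(replace_at_indices(words, [0, (2, 4)], ["X", ["Y", "Z"]]))
```

Because every replacement keeps the length unchanged, the order in which scopes are applied does not matter in this sketch.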
-
replace_at_index(index, new_items)[source]¶ Replace item at index.
- Parameters
index (int|list|slice) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
-
delete_at_indices(indices)[source]¶ Delete items at indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3 (so indices may look like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
- Returns
new field object.
-
delete_at_index(index)[source]¶ Delete item at index.
- Parameters
index (int|list|slice) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
- Returns
new field object.
-
insert_before_indices(indices, new_items)[source]¶ Insert items before indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3 (so indices may look like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to indices.
- Returns
new field object.
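Inserting at several positions "at the same time" means every index refers to the original list. A common way to get that effect is to apply the insertions from right to left, as in the sketch below. The function name is hypothetical, the indices are assumed to be plain ints, and this is not claimed to be how ListField does it.

```python
def insert_before_indices(values, indices, new_items):
    """Insert each item before its index, all relative to the original list."""
    result = list(values)
    pairs = sorted(zip(indices, new_items), key=lambda pair: pair[0], reverse=True)
    # Going right to left keeps the remaining (smaller) indices valid.
    for index, item in pairs:
        items = item if isinstance(item, list) else [item]
        result[index:index] = items
    return result


print(insert_before_indices(["a", "b", "c"], [0, 2], ["X", ["Y", "Z"]]))
# ['X', 'a', 'b', 'Y', 'Z', 'c']
```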
-
insert_before_index(index, new_items)[source]¶ Insert items before index.
- Parameters
index (int|list|slice) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
-
insert_after_indices(indices, new_items)[source]¶ Insert items after indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3 (so indices may look like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to indices.
- Returns
new field object.
-
insert_after_index(index, new_items)[source]¶ Insert item after index.
- Parameters
index (int|list|slice) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
-
-
class
textflint.input_layer.component.sample.coref_sample.Sample(data, origin=None, sample_id=None)[source]¶ Bases:
abc.ABC
Base Sample class to hold the necessary info and provide atomic operations
-
text_processor= <textflint.common.preprocess.en_processor.EnProcessor object>¶
-
__init__(data, origin=None, sample_id=None)[source]¶ - Parameters
data (dict) – The dict obj that contains data info.
origin (sample) – original sample obj.
sample_id (int) – sample index
-
get_value(field)[source]¶ Get field value by field_str.
- Parameters
field (str) – field name
- Returns
field value
-
get_words(field)[source]¶ Get tokenized words of given textfield
- Parameters
field (str) – field name
- Returns
tokenized words
-
get_text(field)[source]¶ Get text string of given textfield
- Parameters
field (str) – field name
- Return string
text
-
get_mask(field)[source]¶ Get word masks of given textfield
- Parameters
field (str) – field name
- Returns
list of mask values
-
get_sentences(field)[source]¶ Get split sentences of given textfield
- Parameters
field (str) – field name
- Returns
list of sentences
-
get_ner(field)[source]¶ Get text field ner tags
- Parameters
field (str) – field name
- Returns
ner tag list
-
replace_fields(fields, field_values, field_masks=None)[source]¶ Fully replace multiple fields at the same time and return a new sample. Note: using this API is not recommended, as it sets the mask values of TextField to MODIFIED_MASK.
- Parameters
fields (list) – field str list
field_values (list) – field value list
field_masks (list) – indicate mask values, useful for printable text
- Returns
Modified Sample
-
replace_field(field, field_value, field_mask=None)[source]¶ Fully replace a single field and return a new sample. Note: using this API is not recommended, as it sets the mask values of TextField to MODIFIED_MASK.
- Parameters
field (str) – field str
field_value – field_type
field_mask (list) – indicate mask value of field
- Returns
Modified Sample
-
replace_field_at_indices(field, indices, items)[source]¶ Replace items of multiple given scopes of field value at the same time. Stay away from this complex function!
Be careful with the shape of your input list.
- Parameters
field (str) – field name
indices (list of int|list|slice) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
items –
- Returns
Modified Sample
-
replace_field_at_index(field, index, items)[source]¶ Replace items of the given scope of field value.
Be careful with the shape of your input list.
- Parameters
field (str) – field name
index (int|list|slice) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
items (str|list) – shape: indices_num, corresponding to field_sub_items
- Returns
Modified Sample
-
unequal_replace_field_at_indices(field, indices, rep_items)[source]¶ Replace scope items of field value with rep_items, whose length may differ from that of the scope.
- Parameters
field – field str
indices – list of int/tuple/list
rep_items – list
- Returns
Modified Sample
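When replacements may be longer or shorter than the scope they replace, later positions shift. The sketch below handles that by applying replacements from right to left. It is an illustration only: unequal_replace is a hypothetical helper, spans are assumed to be non-overlapping half-open (start, end) pairs, and the library's own handling (including mask updates) is not shown.

```python
def unequal_replace(values, spans, rep_items):
    """Replace each (start, end) span with a list of any length."""
    result = list(values)
    pairs = sorted(zip(spans, rep_items), key=lambda pair: pair[0][0], reverse=True)
    # Right-to-left order means earlier spans are unaffected by length changes.
    for (start, end), items in pairs:
        result[start:end] = items
    return result


words = ["the", "big", "dog", "barks"]
print(unequal_replace(words, [(1, 2), (3, 4)], [["very", "large"], []]))
# ['the', 'very', 'large', 'dog']
```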
-
delete_field_at_indices(field, indices)[source]¶ Delete items of given scopes of field value.
- Parameters
field (str) – field name
indices (list of int|list|slice) – shape: indices_num. Each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
- Returns
Modified Sample
-
delete_field_at_index(field, index)[source]¶ Delete items of given scopes of field value.
- Parameters
field (str) – field name
index (int|list|slice) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
- Returns
Modified Sample
-
insert_field_before_indices(field, indices, items)[source]¶ Insert items of multiple given scopes before indices of field value at the same time.
Stay away from this complex function! Be careful with the shape of your input list.
- Parameters
field (str) – field name
indices – list of int, shape: indices_num, like [1, 2, 3]
items – list of str/list, shape: indices_num, corresponding to indices
- Returns
Modified Sample
-
insert_field_before_index(field, index, items)[source]¶ Insert items of the given scope before index of field value.
- Parameters
field (str) – field name
index (int) – indicate which index to insert items
items (str|list) – items to insert
- Returns
Modified Sample
-
insert_field_after_indices(field, indices, items)[source]¶ Insert items of multiple given scopes after indices of field value at the same time.
Stay away from this complex function! Be careful with the shape of your input list.
- Parameters
field (str) – field name
indices – list of int, shape: indices_num, like [1, 2, 3]
items – list of str/list, shape: indices_num, corresponding to indices
- Returns
Modified Sample
-
insert_field_after_index(field, index, items)[source]¶ Insert items of the given scope after index of field value.
- Parameters
field (str) – field name
index (int) – indicates where to apply the insert
items (str|list) – shape: indices_num, corresponding to field_sub_items
- Returns
Modified Sample
-
swap_field_at_index(field, first_index, second_index)[source]¶ Swap items between first_index and second_index of field value.
- Parameters
field (str) – field name
first_index (int) –
second_index (int) –
- Returns
Modified Sample
-
classmethod
clone(original_sample)[source]¶ Deep copy self to a new sample
- Parameters
original_sample – sample to be copied
- Returns
Sample instance
-
property
is_origin¶ Return whether the sample is original Sample.
-
-
class
textflint.input_layer.component.sample.coref_sample.TextField(field_value, mask=None, is_one_sent=False, split_by_space=False, **kwargs)[source]¶ Bases:
textflint.input_layer.component.field.field.Field
A helper class that represents an input string that is to be modified.
Text that Sample contains, parsed from the data set;
TextField provides multiple methods for Sample to modify it. Supports sentence-level and word-level modification, using the word-level API by default.
-
text_processor= <textflint.common.preprocess.en_processor.EnProcessor object>¶
-
__init__(field_value, mask=None, is_one_sent=False, split_by_space=False, **kwargs)[source]¶ - Parameters
field_value (str|list) – Sentence string or tokenized words.
mask (list) – list of mask values
is_one_sent (bool) – whether input is a sentence
split_by_space (bool) – whether to tokenize the sentence by splitting on spaces
kwargs –
-
pos_of_word_index(desired_word_idx)[source]¶ Get pos tag of given index.
- Parameters
desired_word_idx (int) – desire index to get pos tag
- Returns
pos tag of word of desired_word_idx.
-
replace_at_indices(indices, new_items)[source]¶ Replace words at indices and set their mask to MODIFIED_MASK.
- Parameters
indices ([int|list|slice]) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3 (so indices may look like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items ([str|list|tuple]) – items corresponding to indices.
- Returns
Replaced TextField object.
-
replace_at_index(index, new_items)[source]¶ Replace words at index and set their mask to MODIFIED_MASK.
- Parameters
index (int|list|slice) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
new_items (str|list|tuple) – items corresponding to index.
- Returns
Replaced TextField object.
-
delete_at_indices(indices)[source]¶ Delete words at indices and remove their mask value.
- Parameters
indices ([int|list|slice]) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3 (so indices may look like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
- Returns
Modified TextField object.
-
delete_at_index(index)[source]¶ Delete words at index and remove their mask value.
- Parameters
index (int|list|slice) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
- Returns
Modified TextField object.
-
insert_before_indices(indices, new_items)[source]¶ Insert words before indices.
- Parameters
indices ([int]) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3 (so indices may look like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items ([str|list|tuple]) – items corresponding to indices.
- Returns
new TextField object.
-
insert_before_index(index, new_items)[source]¶ Insert words before index and remove their mask value.
- Parameters
index (int) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
new_items (str|list|tuple) – items corresponding to index.
- Returns
new TextField object.
-
insert_after_indices(indices, new_items)[source]¶ Insert words after indices.
- Parameters
indices ([int]) – each index can be an int, selecting a single item (so indices may look like [1, 2, 3]); a list or tuple like (0, 3), selecting the items from 0 up to but not including 3 (so indices may look like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items ([str|list|tuple]) – items corresponding to indices.
- Returns
new TextField object.
-
insert_after_index(index, new_items)[source]¶ Insert words after index and remove their mask value.
- Parameters
index (int) – can be an int, selecting a single item; a list or tuple like (0, 3), selecting the items from 0 up to but not including 3; or a slice, which is converted to a list.
new_items (str|list|tuple) – items corresponding to index.
- Returns
new TextField object.
-
swap_at_index(first_index, second_index)[source]¶ Swap items between first_index and second_index of origin_list
- Parameters
first_index (int) – index of first item
second_index (int) – index of second item
- Returns
Modified TextField object.
-
property
pos_tagging¶ Get POS tags.
Example:
given sentence 'All things in their being are good for something.' >> [('All', 'DT'), ('things', 'NNS'), ('in', 'IN'), ('their', 'PRP$'), ('being', 'VBG'), ('are', 'VBP'), ('good', 'JJ'), ('for', 'IN'), ('something', 'NN'), ('.', '.')]
- Returns
Tokenized tokens with their POS tags.
-
property
ner¶ Get NER tags.
Example:
given sentence 'Lionel Messi is a football player from Argentina.' >>[('Lionel Messi', 0, 2, 'PERSON'), ('Argentina', 7, 8, 'LOCATION')]
- Returns
A list of tuples, (entity, start, end, label)
-
property
dependency_parsing¶ Dependency parsing.
Example:
given sentence: 'The quick brown fox jumps over the lazy dog.' >>
The DT 4 det
quick JJ 4 amod
brown JJ 4 amod
fox NN 5 nsubj
jumps VBZ 0 root
over IN 9 case
the DT 9 det
lazy JJ 9 amod
dog NN 5 obl
- Returns
A list of tuples, (token, pos, target, type)
-
-
textflint.input_layer.component.sample.coref_sample.concat(xss)[source]¶ Concatenate a list of lists into a single list. Usage:
concat([[1, 2], [2, 3]]) == [1, 2, 2, 3]
- Parameters
xss (list) – the list of lists to be concatenated
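A function with the documented behaviour can be written in a few lines with the standard library. The name concat_lists is hypothetical, and the sketch is an equivalent, not necessarily the module's own code.

```python
from itertools import chain


def concat_lists(xss):
    """Flatten one level of nesting: a list of lists becomes a single list."""
    return list(chain.from_iterable(xss))


print(concat_lists([[1, 2], [2, 3]]))  # [1, 2, 2, 3]
```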
-
textflint.input_layer.component.sample.coref_sample.recur_ap(f, ls)[source]¶ Apply f to every element in ls (a nested list) recursively. Usages:
recur_ap(lambda x: x+2, 1) = 3
recur_ap(lambda x: x+2, [2, [3, 4]]) = [4, [5, 6]]
- Parameters
f (FunctionType) – the function to be applied to ls
ls – the value or the nested list to be processed
- Returns
process result
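The two usages above are reproduced by the following sketch. The name apply_nested is hypothetical, and treating only list instances as containers is an assumption; the real function may handle other sequence types as well.

```python
def apply_nested(f, ls):
    """Apply `f` to every leaf of a (possibly nested) list, keeping its shape."""
    if isinstance(ls, list):
        return [apply_nested(f, elem) for elem in ls]
    return f(ls)  # a bare value is treated as a leaf


print(apply_nested(lambda x: x + 2, 1))            # 3
print(apply_nested(lambda x: x + 2, [2, [3, 4]]))  # [4, [5, 6]]
```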
-
textflint.input_layer.component.sample.coref_sample.shift_collector(shifts)[source]¶ Collect and compose shifts into a general shift, to be applied to each span (or something else).
- Parameters
shifts (list) – the shift functions
- Return ~types.FunctionType
the collected shift function
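Composing several shift functions into one can be sketched with functools.reduce. The order of application (first to last) and the name collect_shifts are assumptions; the documentation does not state which order the library uses.

```python
from functools import reduce


def collect_shifts(shifts):
    """Compose shift functions into one, applied in list order (assumed)."""
    def collected(value):
        return reduce(lambda acc, shift: shift(acc), shifts, value)
    return collected


shift = collect_shifts([lambda i: i + 2, lambda i: i * 10])
print(shift(1))  # 30
```

With an empty list of shifts the collected function returns its input unchanged.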
-
textflint.input_layer.component.sample.coref_sample.shift_decor(shift_func)[source]¶ Make shift error-free on non-int values. The decorated shift leaves non-int values unchanged.
- Parameters
shift_func (FunctionType) – a shift function that only processes int values
- Return ~types.FunctionType
a shift function that processes all types of values
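A decorator with this behaviour can be sketched as below. The name int_only is hypothetical, and excluding bool (a subclass of int in Python) is a design choice of this sketch rather than something the documentation specifies.

```python
from functools import wraps


def int_only(shift_func):
    """Wrap a shift so that non-int values pass through unchanged."""
    @wraps(shift_func)
    def wrapper(value):
        if isinstance(value, int) and not isinstance(value, bool):
            return shift_func(value)
        return value  # strings, None, etc. are returned as-is
    return wrapper


@int_only
def shift_by_three(i):
    return i + 3


print(shift_by_three(4))       # 7
print(shift_by_three("word"))  # word
```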