textflint.input_layer.component.sample.coref_sample

Coref Sample Class

class textflint.input_layer.component.sample.coref_sample.CorefSample(data, origin=None, sample_id=None)[source]

Bases: textflint.input_layer.component.sample.sample.Sample

Coref Sample

check_data(data)[source]

Check if data is a conll-dict and is ready to be predicted.

Parameters

data (None|dict) – Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.

Returns

Validate whether the sample is legal.
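As a rough illustration of the key requirements above, the following standalone sketch checks a conll-style dict for its required keys. It is not textflint's implementation: the helper name missing_required_keys, the example document, and the span convention (inclusive [start, end] word indexes into the concatenated document) are assumptions made for this example.

```python
# Hypothetical helper, not part of textflint: report which required
# keys are absent from a conll-style dict.
REQUIRED_KEYS = ("sentences", "clusters")
OPTIONAL_KEYS = ("doc_key", "speakers", "constituents", "ner")


def missing_required_keys(data):
    """Return the required keys that `data` lacks, in declaration order."""
    return [key for key in REQUIRED_KEYS if key not in data]


# Assumed example: two sentences and one cluster linking "Anna" and "She".
# Spans are assumed to be inclusive [start, end] word indexes.
example = {
    "sentences": [["Anna", "came", "."], ["She", "left", "."]],
    "clusters": [[[0, 0], [3, 3]]],
}

print(missing_required_keys(example))            # []
print(missing_required_keys({"sentences": []}))  # ['clusters']
```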

load(data)[source]

Convert a conll-dict to CorefSample.

Parameters

data (None|dict) – None, or a conll-style dict. Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.

dump(with_check=True)[source]

Dump a CorefSample to a conll-dict.

Parameters

with_check (bool) – whether the dumped conll-dict should be checked

Return dict ret_dict

a conll-style dict

pretty_print(show='Sample:')[source]

A pretty-printer for CorefSample. Print useful sample information by calling this function.

Parameters

show (str) – optional, the header text printed before this sample

num_sentences()[source]

The number of sentences in this sample.

Return int

the number of sentences in this sample

get_kth_sen(k)[source]

Get the k-th sentence as a word list.

Parameters

k (int) – sentence index

Return list

the k-th sentence, as a word list

eqlen_sen_map()[source]

Expand self.sen_map into a per-word list of sentence indexes, e.g. generate [0, 0, 1, 1, 1, 2, 2] from self.sen_map = [2, 3, 2].

Return list

a sentence mapping of the same length as x, like [0, 0, 1, 1, 1, 2, 2]
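The expansion described here can be sketched as a standalone function. The name expand_sen_map is hypothetical and the code is only an illustration of the documented input and output, not the library's source.

```python
def expand_sen_map(sen_map):
    """Turn sentence lengths into one sentence index per word."""
    expanded = []
    for sen_idx, sen_len in enumerate(sen_map):
        # Every word of sentence `sen_idx` maps back to `sen_idx`.
        expanded.extend([sen_idx] * sen_len)
    return expanded


print(expand_sen_map([2, 3, 2]))  # [0, 0, 1, 1, 1, 2, 2]
```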

index_in_sen(idx)[source]

For the given word index, determine which sentence it is in.

Parameters

idx (int) – word index

Return int

sen_idx, the index of the sentence that contains word idx
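One way to compute this lookup from a sen_map is a binary search over cumulative sentence ends. This is a sketch under the assumption that sen_map holds sentence lengths as described for eqlen_sen_map; the function name sentence_of_word is hypothetical and the library may compute the result differently.

```python
from bisect import bisect_right
from itertools import accumulate


def sentence_of_word(sen_map, idx):
    """Return the index of the sentence containing word `idx`."""
    ends = list(accumulate(sen_map))  # exclusive end offset of each sentence
    total = ends[-1] if ends else 0
    if not 0 <= idx < total:
        raise IndexError(f"word index {idx} out of range for {total} words")
    # The first sentence whose end offset exceeds idx contains the word.
    return bisect_right(ends, idx)


print([sentence_of_word([2, 3, 2], i) for i in range(7)])
# [0, 0, 1, 1, 1, 2, 2]
```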

static sens2doc(sens)[source]

Given a 2D list of str (a list of word lists), concatenate it and record the length of each sentence.

Parameters

sens (list) – 2D list of str (list of word lists)

Returns (list, list)

x as a list of str (word list), sen_map as a list of int (sentence length list)

static doc2sens(x, sen_map)[source]

Given x and sen_map, return sens. Inverse of sens2doc.

Parameters
  • x (list) – list of str (word list)

  • sen_map (list) – list of int (sentence length list)

Return list

sens as a 2D list of str (list of word lists)
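The relationship between sens2doc and doc2sens can be illustrated with two standalone functions. The names sens_to_doc and doc_to_sens are hypothetical stand-ins written from the descriptions above, not the library's code.

```python
def sens_to_doc(sens):
    """Flatten sentences into one word list plus a sentence length list."""
    x = [word for sen in sens for word in sen]
    sen_map = [len(sen) for sen in sens]
    return x, sen_map


def doc_to_sens(x, sen_map):
    """Rebuild sentences by slicing x according to the sentence lengths."""
    sens, start = [], 0
    for sen_len in sen_map:
        sens.append(x[start:start + sen_len])
        start += sen_len
    return sens


sens = [["Anna", "came", "."], ["She", "left", "."]]
x, sen_map = sens_to_doc(sens)
print(sen_map)                         # [3, 3]
print(doc_to_sens(x, sen_map) == sens)  # True
```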

insert_field_before_indices(field, indices, items)[source]

Insert items of the given scopes before the indices of the field value simultaneously.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of insert positions

  • items (list) – insert items

Return ~textflint.CorefSample

modified sample

insert_field_after_indices(field, indices, items)[source]

Insert items of the given scopes after the indices of the field value simultaneously.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of insert positions

  • items (list) – insert items

Return ~textflint.CorefSample

modified sample

delete_field_at_indices(field, indices)[source]

Delete items of given scopes of field value.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of delete positions

Return ~textflint.CorefSample

modified sample

replace_field_at_indices(field, indices, items)[source]

Replace scope items of field value with items.

Parameters
  • field (str) – transformed field

  • indices (list) – indices of the positions to replace

  • items (list) – replacement items

Return ~textflint.CorefSample

modified sample

static concat_conlls(*args)[source]

Given several CorefSamples, concatenate their values key by key.

Parameters

args – some CorefSamples

Return ~textflint.input_layer.component.sample.CorefSample

a CorefSample in which the docs are concatenated to form one x

shuffle_conll(sen_idxs)[source]

Given a CorefSample and shuffled sentence indexes, reproduce a CorefSample with respect to the indexes.

Parameters

sen_idxs (list) – a list of ints: the sentence indexes in shuffled order, e.g. [1, 3, 0, 4, 2, 5] when sen_num = 6

Return ~textflint.input_layer.component.sample.CorefSample

a CorefSample with respect to the shuffled index

part_conll(pres_idxs)[source]

Only sentences whose indexes are in pres_idxs will be kept, and all the structures of clusters are kept for convenience of concatenation.

Parameters

pres_idxs (list) – a list of ints: the indexes to be preserved, drawn from the range [0, num_sen) and in ascending order, like [0, 1, 3, 5] when num_sen = 6

Return ~textflint.input_layer.component.sample.CorefSample

a CorefPartSample of a conll-part

part_before_conll(sen_idx)[source]

Only sentences [0, sen_idx) will be kept, and all the structures of clusters are kept for convenience of concat.

Parameters

sen_idx (int) – sentences with idx < sen_idx will be preserved

Return ~textflint.input_layer.component.sample.CorefSample

a CorefPartSample of a conll-part

part_after_conll(sen_idx)[source]

Only sentences [sen_idx:] will be kept, and all the structures of clusters are kept for convenience of concat.

Parameters

sen_idx (int) – sentences with idx >= sen_idx will be preserved

Return ~textflint.input_layer.component.sample.CorefSample

a CorefPartSample of a conll-part

class textflint.input_layer.component.sample.coref_sample.CorefPartSample(data, origin=None, sample_id=None)[source]

Bases: textflint.input_layer.component.sample.coref_sample.CorefSample

Coref Part Sample: corresponds to a part of a Coref Sample

check_data(data)[source]

Check if data is a conll-part. The condition is looser than for a full conll-dict.

Parameters

data (None|dict) – Must have keys: sentences, clusters. May have keys: doc_key, speakers, constituents, ner.

remove_invalid_corefs_from_part()[source]

Conll parts may contain clusters that have only 0 or 1 spans, which are not valid.

This function removes these invalid clusters from self.clusters.

Return ~textflint.input_layer.component.sample.CorefSample

a CorefSample that passes check_data
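The filtering rule stated above (a valid cluster needs at least two spans) can be sketched on a plain list of clusters. The function name drop_invalid_clusters and the example data are hypothetical; the sketch assumes clusters are stored as a list of span lists and does not reproduce how the library rebuilds the sample.

```python
def drop_invalid_clusters(clusters):
    """Keep only clusters that still link two or more spans."""
    return [cluster for cluster in clusters if len(cluster) >= 2]


# Assumed part of a document: one full cluster, one reduced to a single
# span, and one left empty after sentences were removed.
clusters = [[[0, 0], [3, 3]], [[5, 6]], []]
print(drop_invalid_clusters(clusters))  # [[[0, 0], [3, 3]]]
```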

static concat_conll_parts(*args)[source]

Concatenate conll parts.

Parameters

args – several CorefPartSamples, which are assumed to be parts of the same conll, generated by part_conll. The merged result is still treated as a conll part, and should be postprocessed by remove_invalid_corefs_from_part to form a valid CorefSample.

Return ~textflint.input_layer.component.sample.CorefPartSample

a CorefPartSample of a conll-part

class textflint.input_layer.component.sample.coref_sample.ListField(field_value, **kwargs)[source]

Bases: textflint.input_layer.component.field.field.Field

A helper class that represents input list values that are to be modified.

Operations which modify field_value generate a new Field instance.

__init__(field_value, **kwargs)[source]
Parameters

field_value ([str]) – The list that ListField represents.

replace_at_indices(indices, new_items)[source]

Replace items at indices.

Note: only equal-length (isometric) replacement is supported.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, which selects a single item (e.g. indices [1, 2, 3]); a list like (0, 3), which selects the items from 0 to 3 (not included) (e.g. indices [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding to indices.

Returns

new field object.
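The three index forms accepted here (int, a (start, end) pair, and slice) can be made concrete with a small normalization sketch. The helper index_to_positions is hypothetical and written only from the parameter description above; it is not how the library necessarily resolves indices.

```python
def index_to_positions(index, length):
    """Expand one index (int, (start, end) pair, or slice) into positions."""
    if isinstance(index, slice):
        # slice.indices clamps the slice to a sequence of `length` items.
        return list(range(*index.indices(length)))
    if isinstance(index, int):
        return [index]
    if isinstance(index, (list, tuple)) and len(index) == 2:
        start, end = index
        return list(range(start, end))  # end is not included
    raise TypeError(f"unsupported index: {index!r}")


items = ["a", "b", "c", "d", "e"]
print(index_to_positions(1, len(items)))            # [1]
print(index_to_positions((0, 3), len(items)))       # [0, 1, 2]
print(index_to_positions(slice(1, 4), len(items)))  # [1, 2, 3]
```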

replace_at_index(index, new_items)[source]

Replace item at index.

Parameters
  • index (int|list|slice) – can be an int, which selects a single item; a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

  • new_items (list) – items corresponding to index.

Returns

new field object.

delete_at_indices(indices)[source]

Delete items at indices.

Parameters

indices (list[int|list|slice]) – each index can be an int, which selects a single item (e.g. indices [1, 2, 3]); a list like (0, 3), which selects the items from 0 to 3 (not included) (e.g. indices [(0, 3), (5, 6)]); or a slice, which is converted to a list.

Returns

new field object.

delete_at_index(index)[source]

Delete item at index.

Parameters

index (int|list|slice) – can be an int, which selects a single item; a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

Returns

new field object.

insert_before_indices(indices, new_items)[source]

Insert items before indices.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, which selects a single item (e.g. indices [1, 2, 3]); a list like (0, 3), which selects the items from 0 to 3 (not included) (e.g. indices [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding to indices.

Returns

new field object.

insert_before_index(index, new_items)[source]

Insert items before index.

Parameters
  • index (int|list|slice) – can be an int, which selects a single item; a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

  • new_items (list) – items corresponding to index.

Returns

new field object.

insert_after_indices(indices, new_items)[source]

Insert items after indices.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, which selects a single item (e.g. indices [1, 2, 3]); a list like (0, 3), which selects the items from 0 to 3 (not included) (e.g. indices [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding to indices.

Returns

new field object.

insert_after_index(index, new_items)[source]

Insert items after index.

Parameters
  • index (int|list|slice) – can be an int, which selects a single item; a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

  • new_items (list) – items corresponding to index.

Returns

new field object.

swap_at_index(first_index, second_index)[source]

Swap the items at first_index and second_index.

Parameters
  • first_index (int) – index of first item

  • second_index (int) – index of second item

Returns

new field object.

class textflint.input_layer.component.sample.coref_sample.Sample(data, origin=None, sample_id=None)[source]

Bases: abc.ABC

Base Sample class to hold the necessary info and provide atomic operations

text_processor = <textflint.common.preprocess.en_processor.EnProcessor object>
__init__(data, origin=None, sample_id=None)[source]
Parameters
  • data (dict) – The dict obj that contains data info.

  • origin (sample) – original sample object.

  • sample_id (int) – sample index

get_value(field)[source]

Get field value by field_str.

Parameters

field (str) – field name

Returns

field value

get_words(field)[source]

Get tokenized words of given textfield

Parameters

field (str) – field name

Returns

tokenized words

get_text(field)[source]

Get text string of given textfield

Parameters

field (str) – field name

Return string

text

get_mask(field)[source]

Get word masks of given textfield

Parameters

field (str) – field name

Returns

list of mask values

get_sentences(field)[source]

Get split sentences of given textfield

Parameters

field (str) – field name

Returns

list of sentences

get_pos(field)[source]

Get text field pos tags.

Parameters

field (str) – field name

Returns

pos tag list

get_ner(field)[source]

Get text field ner tags

Parameters

field (str) – field name

Returns

ner tag list

replace_fields(fields, field_values, field_masks=None)[source]

Fully replace multiple fields at the same time and return a new sample. Note: using this API is not recommended, as it sets the mask values of TextField to MODIFIED_MASK.

Parameters
  • fields (list) – field str list

  • field_values (list) – field value list

  • field_masks (list) – indicate mask values, useful for printable text

Returns

Modified Sample

replace_field(field, field_value, field_mask=None)[source]

Fully replace a single field and return a new sample. Note: using this API is not recommended, as it sets the mask values of TextField to MODIFIED_MASK.

Parameters
  • field (str) – field str

  • field_value – field_type

  • field_mask (list) – indicate mask value of field

Returns

Modified Sample

replace_field_at_indices(field, indices, items)[source]

Replace items of multiple given scopes of field value at the same time. This is a complex function; avoid it where a simpler one will do.

Be careful with the shape of your input lists.

Parameters
  • field (str) – field name

  • indices (list of int|list|slice) – each index can be an int, which selects a single item (e.g. indices [1, 2, 3]); a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

  • items

Returns

Modified Sample

replace_field_at_index(field, index, items)[source]

Replace items of given scope of field value.

Be careful with the shape of your input list.

Parameters
  • field (str) – field name

  • index (int|list|slice) – can be an int, which selects a single item; a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

  • items (str|list) – shape: indices_num, corresponding to field_sub_items

Returns

Modified Sample

unequal_replace_field_at_indices(field, indices, rep_items)[source]

Replace scope items of field value with rep_items, which may differ in length from the scope.

Parameters
  • field – field str

  • indices – list of int/tuple/list

  • rep_items – list

Returns

Modified Sample

delete_field_at_indices(field, indices)[source]

Delete items of given scopes of field value.

Parameters
  • field (str) – field name

  • indices (list of int|list|slice) – shape: indices_num. Each index can be an int, which selects a single item (e.g. indices [1, 2, 3]); a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

Returns

Modified Sample

delete_field_at_index(field, index)[source]

Delete items of the given scope of field value.

Parameters
  • field (str) – field name

  • index (int|list|slice) – can be an int, which selects a single item; a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

Returns

Modified Sample

insert_field_before_indices(field, indices, items)[source]

Insert items of multiple given scopes before indices of field value at the same time.

This is a complex function; avoid it where a simpler one will do. Be careful with the shape of your input lists.

Parameters
  • field (str) – field name

  • indices – list of int, shape: indices_num, like [1, 2, 3]

  • items – list of str/list, shape: indices_num, corresponding to indices

Returns

Modified Sample

insert_field_before_index(field, index, items)[source]

Insert items before index of field value.

Parameters
  • field (str) – field name

  • index (int) – indicate which index to insert items

  • items (str|list) – items to insert

Returns

Modified Sample

insert_field_after_indices(field, indices, items)[source]

Insert items of multiple given scopes after indices of field value at the same time.

This is a complex function; avoid it where a simpler one will do. Be careful with the shape of your input lists.

Parameters
  • field (str) – field name

  • indices – list of int, shape: indices_num, like [1, 2, 3]

  • items – list of str/list, shape: indices_num, corresponding to indices

Returns

Modified Sample

insert_field_after_index(field, index, items)[source]

Insert items after index of field value.

Parameters
  • field (str) – field name

  • index (int) – indicate where to apply insert

  • items (str|list) – shape: indices_num, corresponding to field_sub_items

Returns

Modified Sample

swap_field_at_index(field, first_index, second_index)[source]

Swap items between first_index and second_index of field value.

Parameters
  • field (str) – field name

  • first_index (int) –

  • second_index (int) –

Returns

Modified Sample

abstract check_data(data)[source]

Check raw data format.

Parameters

data – raw data input

Returns

abstract load(data)[source]

Parse data into sample field value.

Parameters

data – raw data input

abstract dump()[source]

Convert sample info to input data json format.

Returns

dict object.

classmethod clone(original_sample)[source]

Deep copy self to a new sample

Parameters

original_sample – sample to be copied

Returns

Sample instance

property is_origin

Return whether the sample is original Sample.

class textflint.input_layer.component.sample.coref_sample.TextField(field_value, mask=None, is_one_sent=False, split_by_space=False, **kwargs)[source]

Bases: textflint.input_layer.component.field.field.Field

A helper class that represents an input string that is to be modified.

TextField holds the text that a Sample parses from the data set, and provides multiple methods for Sample to modify it.

Supports sentence-level and word-level modification; the word-level API is the default.

text_processor = <textflint.common.preprocess.en_processor.EnProcessor object>
__init__(field_value, mask=None, is_one_sent=False, split_by_space=False, **kwargs)[source]
Parameters
  • field_value (str|list) – Sentence string or tokenized words.

  • mask (list) – list of mask values

  • is_one_sent (bool) – whether input is a sentence

  • split_by_space (bool) – whether to tokenize the sentence by splitting on spaces

  • kwargs

pos_of_word_index(desired_word_idx)[source]

Get pos tag of given index.

Parameters

desired_word_idx (int) – index of the word whose pos tag is wanted

Returns

pos tag of word of desired_word_idx.

replace_at_indices(indices, new_items)[source]

Replace words at indices and set their mask to MODIFIED_MASK.

Parameters
  • indices ([int|list|slice]) – each index can be an int, which selects a single item (e.g. indices [1, 2, 3]); a list like (0, 3), which selects the items from 0 to 3 (not included) (e.g. indices [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items ([str|list|tuple]) – items corresponding to indices.

Returns

Replaced TextField object.

replace_at_index(index, new_items)[source]

Replace words at index and set their mask to MODIFIED_MASK.

Parameters
  • index (int|list|slice) – can be an int, which selects a single item; a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

  • new_items (str|list|tuple) – items corresponding to index.

Returns

Replaced TextField object.

delete_at_indices(indices)[source]

Delete words at indices and remove their mask value.

Parameters

indices ([int|list|slice]) – each index can be an int, which selects a single item (e.g. indices [1, 2, 3]); a list like (0, 3), which selects the items from 0 to 3 (not included) (e.g. indices [(0, 3), (5, 6)]); or a slice, which is converted to a list.

Returns

Modified TextField object.

delete_at_index(index)[source]

Delete words at index and remove their mask value.

Parameters

index (int|list|slice) – can be an int, which selects a single item; a list like (0, 3), which selects the items from 0 to 3 (not included); or a slice, which is converted to a list.

Returns

Modified TextField object.

insert_before_indices(indices, new_items)[source]

Insert words before indices.

Parameters
  • indices ([int]) – indices before which to insert, like [1, 2, 3].

  • new_items ([str|list|tuple]) – items corresponding to indices.

Returns

new TextField object.

insert_before_index(index, new_items)[source]

Insert words before index and remove their mask value.

Parameters
  • index (int) – index before which to insert.

  • new_items (str|list|tuple) – items corresponding to index.

Returns

new TextField object.

insert_after_indices(indices, new_items)[source]

Insert words after indices.

Parameters
  • indices ([int]) – indices after which to insert, like [1, 2, 3].

  • new_items ([str|list|tuple]) – items corresponding to indices.

Returns

new TextField object.

insert_after_index(index, new_items)[source]

Insert words after index and remove their mask value.

Parameters
  • index (int) – index after which to insert.

  • new_items (str|list|tuple) – items corresponding to index.

Returns

new TextField object.

swap_at_index(first_index, second_index)[source]

Swap items between first_index and second_index of origin_list

Parameters
  • first_index (int) – index of first item

  • second_index (int) – index of second item

Returns

Modified TextField object.

property pos_tagging

Get POS tags.

Example:

given sentence 'All things in their being are good for something.'

>> [('All', 'DT'),
    ('things', 'NNS'),
    ('in', 'IN'),
    ('their', 'PRP$'),
    ('being', 'VBG'),
    ('are', 'VBP'),
    ('good', 'JJ'),
    ('for', 'IN'),
    ('something', 'NN'),
    ('.', '.')]
Returns

Tokenized tokens with their POS tags.

property ner

Get NER tags.

Example:

given sentence 'Lionel Messi is a football player from Argentina.'

>>[('Lionel Messi', 0, 2, 'PERSON'),
   ('Argentina', 7, 8, 'LOCATION')]
Returns

A list of tuples, (entity, start, end, label)

property dependency_parsing

Dependency parsing.

Example:

given sentence: 'The quick brown fox jumps over the lazy dog.'

>>
    The     DT      4       det
    quick   JJ      4       amod
    brown   JJ      4       amod
    fox     NN      5       nsubj
    jumps   VBZ     0       root
    over    IN      9       case
    the     DT      9       det
    lazy    JJ      9       amod
    dog     NN      5       obl
Returns

A list of tuples, (token, pos, target, type)

textflint.input_layer.component.sample.coref_sample.concat(xss)[source]

Concatenate a list of lists into a single list. Usage:

concat([[1, 2], [2, 3]]) == [1, 2, 2, 3]
Parameters

xss (list) – the list of lists to be concatenated
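A one-level flatten with this behavior can be written with the standard library. The name flatten_once is hypothetical; this is a sketch of the documented behavior rather than the module's own code.

```python
from itertools import chain


def flatten_once(xss):
    """Concatenate a list of lists into one list (one level only)."""
    return list(chain.from_iterable(xss))


print(flatten_once([[1, 2], [2, 3]]))  # [1, 2, 2, 3]
```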

textflint.input_layer.component.sample.coref_sample.recur_ap(f, ls)[source]

Apply f to every elem in ls (a nested list) recursively. Usages:

recur_ap(lambda x: x+2, 1) = 3
recur_ap(lambda x: x+2, [2, [3, 4]]) = [4, [5, 6]]
Parameters
  • f (FunctionType) – the function to be applied to ls

  • ls – the value or the nested list to be processed

Returns

the processed result
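The recursion described above fits in a few lines. The name recur_apply is a hypothetical stand-in, and the sketch assumes that only list values are treated as nested containers.

```python
def recur_apply(f, ls):
    """Apply f to a value, or to every leaf of a nested list."""
    if isinstance(ls, list):
        # Recurse into each element and keep the nesting structure.
        return [recur_apply(f, elem) for elem in ls]
    return f(ls)


print(recur_apply(lambda x: x + 2, 1))            # 3
print(recur_apply(lambda x: x + 2, [2, [3, 4]]))  # [4, [5, 6]]
```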

textflint.input_layer.component.sample.coref_sample.shift_collector(shifts)[source]
Collect and compose shifts into a general shift, to be applied to each span (or something else).

Parameters

shifts (list) – the shift functions

Return ~types.FunctionType

the collected shift function

textflint.input_layer.component.sample.coref_sample.shift_decor(shift_func)[source]
Make shift error-free on non-int values. The decorated shift leaves non-int values unchanged.

Parameters

shift_func (FunctionType) – a shift function that only processes int values

Return ~types.FunctionType

a shift function that processes all types of values

textflint.input_layer.component.sample.coref_sample.shift_maker(sign_idx, shf)[source]
Make shift, a basic shift function to be composed.

shift: if idx >= sign_idx, shift idx to the right by shf.

Parameters
  • sign_idx (int) – words at or after this idx are shifted

  • shf (int) – number of positions to shift
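The three shift helpers documented above can be sketched together as follows. All names (make_shift, int_only, collect_shifts) are hypothetical stand-ins. The combining rule is an assumption: each shift is evaluated against the original index and the offsets are summed, which suits several edits expressed in original coordinates. The library may compose shifts differently.

```python
def make_shift(sign_idx, shf):
    """Basic shift: move idx right by shf when idx >= sign_idx."""
    def shift(idx):
        return idx + shf if idx >= sign_idx else idx
    return shift


def int_only(shift_func):
    """Wrap a shift so that non-int values pass through unchanged."""
    def wrapped(value):
        return shift_func(value) if isinstance(value, int) else value
    return wrapped


def collect_shifts(shifts):
    """Combine shifts by summing their offsets on the original index.

    Assumption: every shift sees the original idx, not the output of
    the previous shift.
    """
    def combined(idx):
        return idx + sum(shift(idx) - idx for shift in shifts)
    return combined


# Two insertions: 1 word before index 2, and 2 words before index 5.
shift = int_only(collect_shifts([make_shift(2, 1), make_shift(5, 2)]))
span = [3, 5]
print([shift(i) for i in span])  # [4, 8]
print(shift("speaker"))          # speaker
```

Summing offsets keeps the result independent of the order in which the basic shifts are listed, which is convenient when several insertions are collected before being applied.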