textflint.input_layer.component.field.text_field

Text Field Class

A helper class that represents an input string to be modified.

class textflint.input_layer.component.field.text_field.TextField(field_value, mask=None, is_one_sent=False, split_by_space=False, **kwargs)

Bases: textflint.input_layer.component.field.field.Field

A helper class that represents an input string to be modified.

TextField holds the text that a Sample contains, as parsed from the dataset, and provides multiple methods that the Sample uses to modify it.

It supports both sentence-level and word-level modification; the word-level API is used by default.

text_processor = <textflint.common.preprocess.en_processor.EnProcessor object>
__init__(field_value, mask=None, is_one_sent=False, split_by_space=False, **kwargs)
Parameters
  • field_value (str|list) – Sentence string or tokenized words.

  • mask (list) – List of mask values for the words of the field.

  • is_one_sent (bool) – Whether the input is a single sentence.

  • split_by_space (bool) – Whether to tokenize the sentence by splitting on spaces.

  • kwargs – Additional keyword arguments.

pos_of_word_index(desired_word_idx)

Get the POS tag of the word at the given index.

Parameters

desired_word_idx (int) – Index of the word whose POS tag is requested.

Returns

POS tag of the word at desired_word_idx.

replace_at_indices(indices, new_items)

Replace words at indices and set their mask to MODIFIED_MASK.

Parameters
  • indices ([int|list|slice]) –

    An int indicates a single item to replace; several are given as a list such as [1, 2, 3].

    A (start, end) pair such as (0, 3) indicates the items from 0 up to but not including 3; several spans are given as a list such as [(0, 3), (5, 6)].

    A slice is converted to a list.

  • new_items ([str|list|tuple]) – Items corresponding to indices.

Returns

Replaced TextField object.
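The index forms above can be illustrated with a minimal sketch that works on a plain list of words and a parallel mask list. It is not TextFlint's implementation: normalize_index and this replace_at_indices are hypothetical helpers, the value 2 for MODIFIED_MASK and 0 for unmodified words are assumptions, and overlapping spans are not handled.

```python
from typing import List, Sequence, Tuple, Union

MODIFIED_MASK = 2  # assumption: illustrative value, not taken from TextFlint

Index = Union[int, Tuple[int, int], slice]
Item = Union[str, Sequence[str]]


def normalize_index(index: Index, length: int) -> Tuple[int, int]:
    """Turn an int, a (start, end) pair, or a slice into a half-open span."""
    if isinstance(index, int):
        return index, index + 1
    if isinstance(index, slice):
        start, stop, step = index.indices(length)
        if step != 1:
            raise ValueError("this sketch only supports contiguous slices")
        return start, stop
    start, end = index
    return start, end


def replace_at_indices(words: List[str], mask: List[int],
                       indices: List[Index], new_items: List[Item]):
    """Hypothetical list-based version: returns new (words, mask) lists."""
    if len(indices) != len(new_items):
        raise ValueError("indices and new_items must have the same length")
    spans = [normalize_index(index, len(words)) for index in indices]
    words, mask = list(words), list(mask)
    # Replace right-to-left so that spans further left keep their positions.
    for (start, end), item in sorted(zip(spans, new_items),
                                     key=lambda pair: pair[0][0], reverse=True):
        replacement = [item] if isinstance(item, str) else list(item)
        words[start:end] = replacement
        mask[start:end] = [MODIFIED_MASK] * len(replacement)
    return words, mask


# assumption: 0 marks unmodified words in the starting mask
words, mask = replace_at_indices(["The", "quick", "brown", "fox"], [0, 0, 0, 0],
                                 [0, (1, 3)], ["A", ["slow", "red"]])
print(words, mask)  # ['A', 'slow', 'red', 'fox'] [2, 2, 2, 0]
```

Working right-to-left is one simple way to keep the remaining indices valid when a replacement changes the number of words; the library may handle this differently.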

replace_at_index(index, new_items)

Replace the word(s) at index and set their mask to MODIFIED_MASK.

Parameters
  • index (int|list|slice) –

    An int indicates a single item to replace; a list such as [1, 2, 3] indicates several items. A (start, end) pair such as (0, 3) indicates the items from 0 up to but not including 3; several spans are given as a list such as [(0, 3), (5, 6)].

    A slice is converted to a list.

  • new_items (str|list|tuple) – Items corresponding to index.

Returns

Replaced TextField object.

delete_at_indices(indices)

Delete words at indices and remove their mask values.

Parameters

indices ([int|list|slice]) –

An int indicates a single item to delete; several are given as a list such as [1, 2, 3].

A (start, end) pair such as (0, 3) indicates the items from 0 up to but not including 3; several spans are given as a list such as [(0, 3), (5, 6)].

A slice is converted to a list.

Returns

Modified TextField object.
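Deletion can be sketched by first collecting every position to drop and then keeping the rest, which sidesteps index shifting altogether. This is a hypothetical list-based illustration, not TextFlint code; to_positions and this delete_at_indices are assumed helpers, and int indices are assumed to be non-negative.

```python
from typing import List, Tuple, Union

Index = Union[int, Tuple[int, int], slice]


def to_positions(index: Index, length: int) -> List[int]:
    """Expand an int, a (start, end) pair, or a slice into word positions."""
    if isinstance(index, int):
        return [index]
    if isinstance(index, slice):
        # "A slice is converted to a list": apply it to the list of positions.
        return list(range(length))[index]
    start, end = index
    return list(range(start, end))


def delete_at_indices(words: List[str], mask: List[int], indices: List[Index]):
    """Hypothetical list-based version: drops words and their mask values."""
    doomed = set()
    for index in indices:
        doomed.update(to_positions(index, len(words)))
    kept = [(word, value) for pos, (word, value) in enumerate(zip(words, mask))
            if pos not in doomed]
    return [word for word, _ in kept], [value for _, value in kept]


# Mask values here are arbitrary; they only show that masks stay aligned.
words, mask = delete_at_indices(["The", "quick", "brown", "fox", "jumps"],
                                [0, 1, 0, 2, 0], [0, (2, 4)])
print(words, mask)  # ['quick', 'jumps'] [1, 0]
```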

delete_at_index(index)

Delete the word(s) at index and remove their mask values.

Parameters

index (int|list|slice) –

An int indicates a single item to delete; a list such as [1, 2, 3] indicates several items. A (start, end) pair such as (0, 3) indicates the items from 0 up to but not including 3; several spans are given as a list such as [(0, 3), (5, 6)].

A slice is converted to a list.

Returns

Modified TextField object.

insert_before_indices(indices, new_items)

Insert words before indices.

Parameters
  • indices ([int]) –

    List of int indices such as [1, 2, 3]; each element of new_items is inserted before the word at the corresponding index.

  • new_items ([str|list|tuple]) – Items corresponding to indices.

Returns

A new TextField object.

insert_before_index(index, new_items)

Insert words before index.

Parameters
  • index (int) – Index of the word before which new_items are inserted.

  • new_items (str|list|tuple) – Items to insert before index.

Returns

A new TextField object.

insert_after_indices(indices, new_items)

Insert words after indices.

Parameters
  • indices ([int]) –

    List of int indices such as [1, 2, 3]; each element of new_items is inserted after the word at the corresponding index.

  • new_items ([str|list|tuple]) – Items corresponding to indices.

Returns

A new TextField object.

insert_after_index(index, new_items)

Insert words after index.

Parameters
  • index (int) – Index of the word after which new_items are inserted.

  • new_items (str|list|tuple) – Items to insert after index.

Returns

A new TextField object.
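The list-based insertion methods can be sketched on plain lists as follows. This is an illustration under stated assumptions, not TextFlint code: _insert is a hypothetical helper, each index is assumed to appear at most once, and inserted words are assumed to receive a MODIFIED_MASK value of 2 (the documentation does not say which mask value inserted words get).

```python
from typing import List, Sequence, Tuple, Union

MODIFIED_MASK = 2  # assumption: illustrative value for inserted words

Item = Union[str, Sequence[str]]


def _insert(words: List[str], mask: List[int], indices: List[int],
            new_items: List[Item], offset: int) -> Tuple[List[str], List[int]]:
    """Hypothetical helper shared by the before/after variants."""
    if len(indices) != len(new_items):
        raise ValueError("indices and new_items must have the same length")
    words, mask = list(words), list(mask)
    # Work right-to-left so earlier insertion points are not shifted.
    for index, item in sorted(zip(indices, new_items),
                              key=lambda pair: pair[0], reverse=True):
        chunk = [item] if isinstance(item, str) else list(item)
        pos = index + offset
        words[pos:pos] = chunk
        mask[pos:pos] = [MODIFIED_MASK] * len(chunk)
    return words, mask


def insert_before_indices(words, mask, indices, new_items):
    return _insert(words, mask, indices, new_items, offset=0)


def insert_after_indices(words, mask, indices, new_items):
    # Inserting after index i is the same as inserting before index i + 1.
    return _insert(words, mask, indices, new_items, offset=1)


print(insert_before_indices(["fox", "jumps"], [0, 0], [0, 1], ["The", ["quickly"]]))
# (['The', 'fox', 'quickly', 'jumps'], [2, 0, 2, 0])
```

Expressing "after" as "before index + 1" keeps a single code path for both directions.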

swap_at_index(first_index, second_index)

Swap the items at first_index and second_index.

Parameters
  • first_index (int) – index of first item

  • second_index (int) – index of second item

Returns

Modified TextField object.
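A swap can be sketched with Python's tuple assignment. This hypothetical version returns a new list rather than mutating its input, a design choice for the illustration; it is not confirmed that TextField behaves the same way, and mask handling is left out.

```python
from typing import List, TypeVar

T = TypeVar("T")


def swap_at_index(items: List[T], first_index: int, second_index: int) -> List[T]:
    """Return a copy of items with the two positions exchanged."""
    swapped = list(items)
    swapped[first_index], swapped[second_index] = swapped[second_index], swapped[first_index]
    return swapped


words = ["fox", "The", "jumps"]
print(swap_at_index(words, 0, 1))  # ['The', 'fox', 'jumps']
print(words)  # ['fox', 'The', 'jumps'] (input left unchanged)
```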

property pos_tagging

Get POS tags.

Example:

Given the sentence 'All things in their being are good for something.':

>> [('All', 'DT'),
    ('things', 'NNS'),
    ('in', 'IN'),
    ('their', 'PRP$'),
    ('being', 'VBG'),
    ('are', 'VBP'),
    ('good', 'JJ'),
    ('for', 'IN'),
    ('something', 'NN'),
    ('.', '.')]
Returns

A list of (token, POS tag) tuples for the tokenized sentence.
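The returned structure can be consumed directly. The sketch below copies the example output and assumes, without confirmation from the source, that pos_of_word_index(i) corresponds to the tag at position i of this list; both helper functions are hypothetical.

```python
from typing import List, Tuple

# Output copied from the pos_tagging example.
tagged: List[Tuple[str, str]] = [
    ("All", "DT"), ("things", "NNS"), ("in", "IN"), ("their", "PRP$"),
    ("being", "VBG"), ("are", "VBP"), ("good", "JJ"), ("for", "IN"),
    ("something", "NN"), (".", "."),
]


def pos_of_word_index(tagged, desired_word_idx):
    """Hypothetical stand-in: POS tag of the word at desired_word_idx."""
    return tagged[desired_word_idx][1]


def words_with_tag_prefix(tagged, prefix):
    """Collect words whose Penn Treebank tag starts with prefix (e.g. 'NN')."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


print(pos_of_word_index(tagged, 1))  # NNS
print(words_with_tag_prefix(tagged, "NN"))  # ['things', 'something']
```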

property ner

Get NER tags.

Example:

Given the sentence 'Lionel Messi is a football player from Argentina.':

>> [('Lionel Messi', 0, 2, 'PERSON'),
    ('Argentina', 7, 8, 'LOCATION')]
Returns

A list of (entity, start, end, label) tuples, where start and end are token indices and end is exclusive.
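In the example, end is exclusive: 'Lionel Messi' covers tokens 0 and 1, and 'Argentina' is token 7. The sketch below uses that convention to turn the tuples into per-token BIO tags; the token list is an assumed tokenization of the example sentence, and to_bio is a hypothetical helper, not part of TextFlint.

```python
from typing import List, Tuple

# Assumed tokenization of the example sentence.
tokens = ["Lionel", "Messi", "is", "a", "football", "player", "from", "Argentina", "."]
# Output copied from the ner example.
entities: List[Tuple[str, int, int, str]] = [
    ("Lionel Messi", 0, 2, "PERSON"),
    ("Argentina", 7, 8, "LOCATION"),
]


def to_bio(num_tokens: int, entities) -> List[str]:
    """Mark the first token of each span B-, the rest I-, everything else O."""
    tags = ["O"] * num_tokens
    for _entity, start, end, label in entities:
        tags[start] = "B-" + label
        for i in range(start + 1, end):  # end is exclusive
            tags[i] = "I-" + label
    return tags


print(to_bio(len(tokens), entities))
# ['B-PERSON', 'I-PERSON', 'O', 'O', 'O', 'O', 'O', 'B-LOCATION', 'O']
```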

property dependency_parsing

Get the dependency parse.

Example:

Given the sentence 'The quick brown fox jumps over the lazy dog.':

>>
    The     DT      4       det
    quick   JJ      4       amod
    brown   JJ      4       amod
    fox     NN      5       nsubj
    jumps   VBZ     0       root
    over    IN      9       case
    the     DT      9       det
    lazy    JJ      9       amod
    dog     NN      5       obl
Returns

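A list of (token, pos, target, type) tuples, where target is the 1-based index of the head token (0 for the root) and type is the dependency relation.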
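Read with 1-based head indices and 0 for the root, the example rows can be resolved to head words. The rows are copied from the example (which omits the final period), and head_words is a hypothetical helper, not part of TextFlint.

```python
from typing import List, Tuple

# Rows copied from the dependency_parsing example: (token, pos, target, type).
rows: List[Tuple[str, str, int, str]] = [
    ("The", "DT", 4, "det"),
    ("quick", "JJ", 4, "amod"),
    ("brown", "JJ", 4, "amod"),
    ("fox", "NN", 5, "nsubj"),
    ("jumps", "VBZ", 0, "root"),
    ("over", "IN", 9, "case"),
    ("the", "DT", 9, "det"),
    ("lazy", "JJ", 9, "amod"),
    ("dog", "NN", 5, "obl"),
]


def head_words(rows):
    """Pair each token with its head word; target is 1-based, 0 means root."""
    return [(token, "ROOT" if target == 0 else rows[target - 1][0], dep_type)
            for token, _pos, target, dep_type in rows]


print(head_words(rows)[:2])  # [('The', 'fox', 'det'), ('quick', 'fox', 'amod')]
```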