textflint.input_layer.component.sample.cws_sample

CWS Sample Class

class textflint.input_layer.component.sample.cws_sample.CWSSample(data, origin=None, sample_id=None)[source]

Bases: textflint.input_layer.component.sample.sample.Sample

The segmentation rules are based on CTB6.

The input x can be a sentence (str) or a list. The input y is a list of segmentation labels drawn from B, M, E, S (begin, middle, end, single). The labels can also be generated automatically: to request this, pass an empty list as y and provide x with its words already separated, either by spaces in a string or as the elements of a list.

Note that each punctuation mark should be separated out as a word of its own.

Example:

1. input {'x': '小明好想送Jo圣诞礼物', 'y': ['B', 'E', 'B', 'E', 'S', 'B',
    'E', 'B', 'E', 'B', 'E']}
2. input {'x': ['小明', '好想送Jo圣诞礼物'], 'y': ['B', 'E', 'B', 'E', 'S',
    'B', 'E', 'B', 'E', 'B', 'E']}
3. input {'x': '小明 好想 送 Jo 圣诞 礼物', 'y': []}
4. input {'x': ['小明', '好想', '送', 'Jo', '圣诞', '礼物'], 'y': []}
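
As a quick consistency check on the examples above, an explicit y should carry one label per character of x. The sketch below is not part of textflint: `count_characters` is a hypothetical helper, and it assumes that spaces in x only separate words and carry no label.

```python
def count_characters(x):
    """Count the characters of x that need a segmentation label."""
    # x may be a sentence or a list of sentence pieces.
    text = x if isinstance(x, str) else "".join(x)
    # Assumption: spaces only separate words and are not labeled.
    return len(text.replace(" ", ""))


sample = {
    "x": "小明好想送Jo圣诞礼物",
    "y": ["B", "E", "B", "E", "S", "B", "E", "B", "E", "B", "E"],
}
labels_match = count_characters(sample["x"]) == len(sample["y"])
print(labels_match)
```

The same count applies to the list form and to the space-separated form of x, which is why examples 1 and 2 share one label list.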
__init__(data, origin=None, sample_id=None)[source]
Parameters
  • data (dict) – The dict obj that contains data info

  • sample_id (int) – the id of sample

  • origin (bool) – if the sample is origin

check_data(data)[source]

Check whether the data is legitimate. The correctness of the labels is not checked. If the data is not strictly legal but is in an acceptable format, its format is converted.

Parameters

data (dict) – The dict obj that contains data info

load(data)[source]

Convert data dict which contains essential information to CWSSample.

Parameters

data (dict) – The dict obj that contains data info

get_words()[source]

Get the words from the sentence.

Return list

the words in sentence

replace_at_ranges(indices, new_items, y_new_items=None)[source]

Replace words at indices and set their mask to MODIFIED_MASK.

Parameters
  • indices (list) – The list of positions to be changed.

  • new_items (list) – The list of new items for those positions.

  • y_new_items (list) – The list of mask info to be changed.

Returns

replaced CWSSample object.

update(x, y)[source]

Replace words at indices and set their mask to MODIFIED_MASK.

Parameters
  • x (str) – the new sentence.

  • y (list) – the new labels.

Returns

new CWSSample object.

check(indices, new_items, y_new_items=None)[source]

Check whether the position of change is legal.

Parameters
  • indices (list) – The list of positions to be changed.

  • new_items (list) – The list of new items for those positions.

  • y_new_items (list) – The list of mask info to be changed.

Return three lists

the legal positions, the changed items, and the changed labels.

static get_labels(words)[source]

Get the labels of the word.

Parameters

words (str) – The word to be labeled.

Return list

the label of the words.
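
The B, M, E, S scheme used in the class examples can be sketched as follows. This is an illustration written for this page, not the body of `get_labels`; the helper names `bmes_labels` and `label_words` are hypothetical, and the library may treat special cases differently.

```python
def bmes_labels(word):
    """Label one word: S for a single character, else B, M..., E."""
    if not word:
        return []
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def label_words(words):
    """Concatenate the labels of each word in order."""
    labels = []
    for word in words:
        labels.extend(bmes_labels(word))
    return labels


words = "小明 好想 送 Jo 圣诞 礼物".split()
print(label_words(words))
# ['B', 'E', 'B', 'E', 'S', 'B', 'E', 'B', 'E', 'B', 'E']
```

The printed labels agree with the explicit y given in example 1 of the class description.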

class textflint.input_layer.component.sample.cws_sample.CnTextField(field_value, mask=None)[source]

Bases: textflint.input_layer.component.field.field.Field

A helper class that represents an input string to be modified.

Parameters
  • field_value (str|list) – the value of the field.

  • mask (int) – mask label.

cn_processor = <textflint.common.preprocess.cn_processor.CnProcessor object>
ner()[source]

NER function.

Returns

ner tags

pos_tags()[source]

POS tagging function.

Returns

pos tags

class textflint.input_layer.component.sample.cws_sample.ListField(field_value, **kwargs)[source]

Bases: textflint.input_layer.component.field.field.Field

A helper class that represents an input list of values to be modified.

Operations that modify field_value generate a new Field instance.

__init__(field_value, **kwargs)[source]
Parameters

field_value ([str]) – The list that ListField represents.

replace_at_indices(indices, new_items)[source]

Replace items at indices.

Note: only equal-length (isometric) replacement is supported.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, which replaces a single item (a list of them looks like [1, 2, 3]); a list such as (0, 3), which replaces the items from 0 up to, but not including, 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding indices.

Returns

new field object.

replace_at_index(index, new_items)[source]

Replace item at index.

Parameters
  • index (int|list|slice) –

    can be an int, which replaces a single item (or a list of them, like [1, 2, 3]); a list such as (0, 3), which replaces the items from 0 up to, but not including, 3 (or a list of them, like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding index.

Returns

new field object.

delete_at_indices(indices)[source]

Delete items at indices.

Parameters

indices (list[int|list|slice]) – each index can be an int, which deletes a single item (a list of them looks like [1, 2, 3]); a list such as (0, 3), which deletes the items from 0 up to, but not including, 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

Returns

new field object.

delete_at_index(index)[source]

Delete item at index.

Parameters

index (int|list|slice) –

can be an int, which deletes a single item (or a list of them, like [1, 2, 3]); a list such as (0, 3), which deletes the items from 0 up to, but not including, 3 (or a list of them, like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

Returns

new field object.

insert_before_indices(indices, new_items)[source]

Insert items before indices.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, which refers to a single item (a list of them looks like [1, 2, 3]); a list such as (0, 3), which covers the items from 0 up to, but not including, 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding indices.

Returns

new field object.

insert_before_index(index, new_items)[source]

Insert items before index.

Parameters
  • index (int|list|slice) –

    can be an int, which refers to a single item (or a list of them, like [1, 2, 3]); a list such as (0, 3), which covers the items from 0 up to, but not including, 3 (or a list of them, like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding index.

Returns

new field object.

insert_after_indices(indices, new_items)[source]

Insert items after indices.

Parameters
  • indices (list[int|list|slice]) – each index can be an int, which refers to a single item (a list of them looks like [1, 2, 3]); a list such as (0, 3), which covers the items from 0 up to, but not including, 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding indices.

Returns

new field object.

insert_after_index(index, new_items)[source]

Insert item after index.

Parameters
  • index (int|list|slice) –

    can be an int, which refers to a single item (or a list of them, like [1, 2, 3]); a list such as (0, 3), which covers the items from 0 up to, but not including, 3 (or a list of them, like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items (list) – items corresponding index.

Returns

new field object.

swap_at_index(first_index, second_index)[source]

Swap item between first_index and second_index.

Parameters
  • first_index (int) – index of first item

  • second_index (int) – index of second item

Returns

new field object.
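
The class note that modifying operations generate a new Field instance can be illustrated with a small stand-in. `MiniListField` below is a hypothetical class written for this page; it is not `ListField` and only mimics the copy-on-modify behavior described above.

```python
class MiniListField:
    """Hypothetical stand-in that never mutates its own value."""

    def __init__(self, field_value):
        # Copy the input so later changes to it do not leak in.
        self.field_value = list(field_value)

    def swap_at_index(self, first_index, second_index):
        # Work on a copy and wrap it in a new instance.
        new_value = list(self.field_value)
        new_value[first_index], new_value[second_index] = (
            new_value[second_index],
            new_value[first_index],
        )
        return MiniListField(new_value)


field = MiniListField(["B", "E", "S"])
swapped = field.swap_at_index(0, 2)
print(field.field_value, swapped.field_value)
```

Returning a new object keeps the original field intact, which is convenient when several transformed variants are derived from one sample.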

class textflint.input_layer.component.sample.cws_sample.Sample(data, origin=None, sample_id=None)[source]

Bases: abc.ABC

Base Sample class to hold the necessary info and provide atomic operations

text_processor = <textflint.common.preprocess.en_processor.EnProcessor object>
__init__(data, origin=None, sample_id=None)[source]
Parameters
  • data (dict) – The dict obj that contains data info.

  • origin (sample) – original sample obj.

  • sample_id (int) – sample index

get_value(field)[source]

Get field value by field_str.

Parameters

field (str) – field name

Returns

field value

get_words(field)[source]

Get tokenized words of given textfield

Parameters

field (str) – field name

Returns

tokenized words

get_text(field)[source]

Get text string of given textfield

Parameters

field (str) – field name

Return string

text

get_mask(field)[source]

Get word masks of given textfield

Parameters

field (str) – field name

Returns

list of mask values

get_sentences(field)[source]

Get split sentences of given textfield

Parameters

field (str) – field name

Returns

list of sentences

get_pos(field)[source]

Get text field pos tags.

Parameters

field (str) – field name

Returns

pos tag list

get_ner(field)[source]

Get text field ner tags

Parameters

field (str) – field name

Returns

ner tag list

replace_fields(fields, field_values, field_masks=None)[source]

Fully replace multiple fields at the same time and return a new sample. Note: using this API is not recommended, as it sets the mask values of the TextField to MODIFIED_MASK.

Parameters
  • fields (list) – field str list

  • field_values (list) – field value list

  • field_masks (list) – indicate mask values, useful for printable text

Returns

Modified Sample

replace_field(field, field_value, field_mask=None)[source]

Fully replace a single field and return a new sample. Note: using this API is not recommended, as it sets the mask values of the TextField to MODIFIED_MASK.

Parameters
  • field (str) – field str

  • field_value – field_type

  • field_mask (list) – indicate mask value of field

Returns

Modified Sample

replace_field_at_indices(field, indices, items)[source]

Replace the items in multiple given scopes of the field value at the same time. This function is complex, so avoid it where a simpler one will do.

Be careful of your input list shape.

Parameters
  • field (str) – field name

  • indices (list of int|list|slice) –

    each index can be an int, which replaces a single item (a list of them looks like [1, 2, 3]); a list such as (0, 3), which replaces the items from 0 up to, but not including, 3; or a slice, which is converted to a list.

  • items

Returns

Modified Sample

replace_field_at_index(field, index, items)[source]

Replace items of given scope of field value.

Be careful of your input list shape.

Parameters
  • field (str) – field name

  • index (int|list|slice) –

    can be an int, which replaces a single item (or a list like [1, 2, 3]); a list such as (0, 3), which replaces the items from 0 up to, but not including, 3; or a slice, which is converted to a list.

  • items (str|list) – shape: indices_num, correspond to field_sub_items

Returns

Modified Sample

unequal_replace_field_at_indices(field, indices, rep_items)[source]

Replace the items in the given scopes of the field value with rep_items, whose lengths may differ from those of the scopes.

Parameters
  • field – field str

  • indices – list of int/tuple/list

  • rep_items – list

Returns

Modified Sample

delete_field_at_indices(field, indices)[source]

Delete items of given scopes of field value.

Parameters
  • field (str) – field name

  • indices (list of int|list|slice) –

    shape: indices_num. Each index can be an int, which deletes a single item (a list of them looks like [1, 2, 3]); a list such as (0, 3), which deletes the items from 0 up to, but not including, 3; or a slice, which is converted to a list.

Returns

Modified Sample

delete_field_at_index(field, index)[source]

Delete the items in the given scope of the field value.

Parameters
  • field (str) – field name

  • index (int|list|slice) –

    can be an int, which deletes a single item (or a list of them, like [1, 2, 3]); a list such as (0, 3), which deletes the items from 0 up to, but not including, 3; or a slice, which is converted to a list.

Returns

Modified Sample

insert_field_before_indices(field, indices, items)[source]

Insert multiple groups of items before the given indices of the field value at the same time.

This function is complex, so avoid it where a simpler one will do. Be careful of your input list shape.

Parameters
  • field (str) – field name

  • indices – list of int, shape:indices_num, list like [1, 2, 3]

  • items – list of str/list, shape: indices_num, correspond to indices

Returns

Modified Sample

insert_field_before_index(field, index, items)[source]

Insert items before the given index of the field value.

Parameters
  • field (str) – field name

  • index (int) – indicate which index to insert items

  • items (str|list) – items to insert

Returns

Modified Sample

insert_field_after_indices(field, indices, items)[source]

Insert multiple groups of items after the given indices of the field value at the same time.

This function is complex, so avoid it where a simpler one will do. Be careful of your input list shape.

Parameters
  • field (str) – field name

  • indices – list of int, shape:indices_num, like [1, 2, 3]

  • items – list of str/list shape: indices_num, correspond to indices

Returns

Modified Sample

insert_field_after_index(field, index, items)[source]

Insert items after the given index of the field value.

Parameters
  • field (str) – field name

  • index (int) – indicate where to apply insert

  • items (str|list) – shape: indices_num, correspond to field_sub_items

Returns

Modified Sample

swap_field_at_index(field, first_index, second_index)[source]

Swap items between first_index and second_index of field value.

Parameters
  • field (str) – field name

  • first_index (int) –

  • second_index (int) –

Returns

Modified Sample

abstract check_data(data)[source]

Check the raw data format.

Parameters

data – raw data input

Returns

abstract load(data)[source]

Parse data into sample field value.

Parameters

data – raw data input

abstract dump()[source]

Convert sample info to input data json format.

Returns

dict object.

classmethod clone(original_sample)[source]

Deep copy self to a new sample

Parameters

original_sample – sample to be copied

Returns

Sample instance

property is_origin

Return whether the sample is original Sample.

textflint.input_layer.component.sample.cws_sample.delete_at_scope(origin_list, scope)[source]

Delete items of origin_list of given scope.

Parameters
  • origin_list (list) –

  • scope (int|list|tuple|slice) –

    can be an int, which deletes a single item; a tuple such as (0, 3) or a list such as [5, 6], which deletes the items from the left bound up to, but not including, the right bound; or a slice, which is converted to a list.

Returns

textflint.input_layer.component.sample.cws_sample.delete_at_scopes(origin_list, scopes)[source]

Delete items of origin_list of given scopes.

Parameters
  • origin_list (list) –

  • scopes (list) –

    list of int/list/tuple/slice. Each scope can be an int, which deletes a single item (a list of them looks like [1, 2, 3]); a tuple such as (0, 3) or a list such as [5, 6], which deletes the items from the left bound up to, but not including, the right bound; or a slice, which is converted to a list.

Return list

new list

textflint.input_layer.component.sample.cws_sample.descartes(calculation_items, n)[source]
Parameters
  • calculation_items (list) –

  • n (int) – quantity to select

Return list

a list of items chosen at random from the Cartesian product.
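
Random selection from a Cartesian product can be sketched with the standard library. The helper `sample_cartesian` and its `seed` parameter are hypothetical and written for this page; the sketch assumes `calculation_items` is a list of candidate lists, which the original description does not state.

```python
import itertools
import random


def sample_cartesian(calculation_items, n, seed=None):
    """Pick up to n distinct combinations from the Cartesian product."""
    rng = random.Random(seed)
    # Materialize every combination; fine for small inputs only.
    product = list(itertools.product(*calculation_items))
    # Never ask for more combinations than exist.
    n = min(n, len(product))
    return [list(combo) for combo in rng.sample(product, n)]


choices = sample_cartesian([["a", "b"], ["x", "y", "z"]], n=3, seed=0)
print(choices)
```

Each returned combination takes one element from each candidate list, and no combination is returned twice.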

textflint.input_layer.component.sample.cws_sample.get_align_seq(align_items, value)[source]

Get values which shape align with align items.

Parameters
  • align_items (list) –

  • value (str) –

Return list

list which align with align_items.
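
One possible reading of "shape align" is that every item, or every element of a nested list item, is replaced by the same value. The sketch below follows that reading only; `align_with` is a hypothetical helper and may not match the behavior of `get_align_seq`.

```python
def align_with(align_items, value):
    """Build a list shaped like align_items, filled with value."""
    aligned = []
    for item in align_items:
        if isinstance(item, list):
            # A nested list keeps its length.
            aligned.append([value] * len(item))
        else:
            aligned.append(value)
    return aligned


print(align_with(["小明", ["好", "想"]], "M"))
# ['M', ['M', 'M']]
```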

textflint.input_layer.component.sample.cws_sample.handle_empty_insertion(new_items)[source]

Handle inserting new items into an empty list by concatenating all the new items. A warning is issued if multiple items are fed.

Parameters

new_items (list) – list

Return list

new list

textflint.input_layer.component.sample.cws_sample.insert_after_index(origin_list, index, new_items)[source]

Insert items to origin_list after given index.

Parameters
  • origin_list (list) –

  • index (int) –

  • new_items (list) –

Returns

textflint.input_layer.component.sample.cws_sample.insert_after_indices(origin_list, indices, new_items)[source]

Insert items to origin_list after given indices.

Parameters
  • origin_list (list) –

  • indices (list) –

  • new_items (list) –

Return list

textflint.input_layer.component.sample.cws_sample.insert_before_index(origin_list, index, new_items)[source]

Insert items to origin_list before given index.

Parameters
  • origin_list (list) –

  • index (int) –

  • new_items (list) –

Return list

textflint.input_layer.component.sample.cws_sample.insert_before_indices(origin_list, indices, new_items)[source]

Insert items to origin_list before given indices.

Parameters
  • origin_list (list) –

  • indices (list) –

  • new_items (list) –

Return list
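
When several insertions are applied to one list, each insertion shifts the positions after it. A common way around this is to apply them from the highest index down, as in the sketch below. `insert_before_indices_sketch` is a hypothetical helper written for this page, not the library function, and it assumes one entry of `new_items` per index.

```python
def insert_before_indices_sketch(origin_list, indices, new_items):
    """Insert new_items[i] before indices[i], leaving origin_list intact."""
    if len(indices) != len(new_items):
        raise ValueError("indices and new_items must have the same length")
    result = list(origin_list)
    # Highest index first, so earlier positions are not shifted.
    pairs = sorted(zip(indices, new_items), key=lambda p: p[0], reverse=True)
    for index, item in pairs:
        block = item if isinstance(item, list) else [item]
        result[index:index] = block
    return result


original = ["a", "b", "c"]
print(insert_before_indices_sketch(original, [0, 2], ["X", ["Y", "Z"]]))
# ['X', 'a', 'b', 'Y', 'Z', 'c']
```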

textflint.input_layer.component.sample.cws_sample.normalize_scope(scope)[source]

Convert the various forms of scope input to the list format [left_bound, right_bound].

Parameters

scope (int|list|tuple|slice) – can be an int, such as 1 or 3, which refers to a single item; a tuple such as (0, 3) or a list such as [5, 6], which covers the items from the left bound up to, but not including, the right bound; or a slice, which is converted to a list.

Return list

[left_bound, right_bound]
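
The conversion described above can be sketched as follows. `normalize_scope_sketch` is a hypothetical helper, not the library function; it assumes that an int i maps to [i, i + 1], since the right bound is not included, and that a slice has explicit start and stop values.

```python
def normalize_scope_sketch(scope):
    """Return [left_bound, right_bound] with the right bound excluded."""
    if isinstance(scope, int):
        # Assumption: a single position i covers the range [i, i + 1).
        return [scope, scope + 1]
    if isinstance(scope, slice):
        if scope.start is None or scope.stop is None:
            raise ValueError("slice needs explicit start and stop")
        return [scope.start, scope.stop]
    if isinstance(scope, (list, tuple)) and len(scope) == 2:
        return [scope[0], scope[1]]
    raise TypeError("unsupported scope: {!r}".format(scope))


for scope in (1, (0, 3), [5, 6], slice(0, 3)):
    print(scope, "->", normalize_scope_sketch(scope))
```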

textflint.input_layer.component.sample.cws_sample.replace_at_scope(origin_list, scope, new_items)[source]

Replace items of given list instance.

Parameters
  • origin_list (list) –

  • scope (int|list|slice) –

    can be an int, such as 1, which replaces a single item; a tuple such as (0, 3) or a list such as [0, 3], which replaces the items from 0 up to, but not including, 3; or a slice, which is converted to a list.

  • new_items – list

Return list

new list
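
Replacing one scope and returning a new list can be sketched with slicing. `replace_at_scope_sketch` is a hypothetical helper written for this page; it accepts only an int or a (left, right) pair and assumes a non-list `new_items` stands for a single item.

```python
def replace_at_scope_sketch(origin_list, scope, new_items):
    """Return a copy of origin_list with the scope replaced."""
    # An int i is treated as the range [i, i + 1).
    left, right = (scope, scope + 1) if isinstance(scope, int) else scope
    items = new_items if isinstance(new_items, list) else [new_items]
    # Slicing builds a new list, so origin_list is not modified.
    return origin_list[:left] + items + origin_list[right:]


letters = ["a", "b", "c", "d"]
print(replace_at_scope_sketch(letters, 1, "X"))
# ['a', 'X', 'c', 'd']
print(replace_at_scope_sketch(letters, (0, 3), ["X", "Y", "Z"]))
# ['X', 'Y', 'Z', 'd']
```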

textflint.input_layer.component.sample.cws_sample.replace_at_scopes(origin_list, scopes, new_items)[source]

Replace items of the given list. Note: only equal-length (isometric) replacement is supported.

Parameters
  • origin_list (list) –

  • scopes (list) –

    list of int/list/slice. Each scope can be an int, which replaces a single item (a list of them looks like [1, 2, 3]); a list such as (0, 3), which replaces the items from 0 up to, but not including, 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list. Every scope must be of the same type.

  • new_items (list) – items corresponding scopes.

Returns

textflint.input_layer.component.sample.cws_sample.swap_at_index(origin_list, first_index, second_index)[source]

Swap items between first_index and second_index of origin_list

Parameters
  • origin_list (list) –

  • first_index (int) –

  • second_index (int) –

Return list

textflint.input_layer.component.sample.cws_sample.trade_off_sub_words(sub_words, sub_indices, trans_num=None, n=1)[source]

Select suitable candidate words so as to maximize the number of transformation results, choosing the words with the top n numbers of substitute words.

Parameters
  • sub_words (list) – list of substitutes word of each legal word

  • sub_indices (list) – list of indices of each legal word

  • trans_num (int) – max number of words to apply substitution

  • n (int) –

Returns

sub_words after alignment + indices of sub_words

textflint.input_layer.component.sample.cws_sample.unequal_replace_at_scopes(origin_list, scopes, new_items)[source]

Replace items of given list.

Note: unequal-length replacement is supported.

Parameters
  • origin_list (list) –

  • scopes (list) –

    list of int/list/slice. Each scope can be an int, which replaces a single item (a list of them looks like [1, 2, 3]); a list such as (0, 3), which replaces the items from 0 up to, but not including, 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.

  • new_items

Return list
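
Unequal replacement changes the length of the list, so the scopes are easiest to apply from right to left. The sketch below shows that idea only. `unequal_replace_sketch` is a hypothetical helper, not the library function, and it assumes non-overlapping (left, right) scopes with one list of new items per scope.

```python
def unequal_replace_sketch(origin_list, scopes, new_items):
    """Replace each (left, right) scope with a list of any length."""
    result = list(origin_list)
    # Rightmost scope first, so the bounds of earlier scopes stay valid.
    pairs = sorted(zip(scopes, new_items), key=lambda p: p[0][0], reverse=True)
    for (left, right), items in pairs:
        result[left:right] = items
    return result


words = ["小明", "好想", "送", "Jo"]
print(unequal_replace_sketch(words, [(0, 1), (2, 4)], [["小", "明"], ["给"]]))
# ['小', '明', '好想', '给']
```

Here one word is split into two items and two words are merged into one, so the result has the same number of items as the input only by coincidence.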