textflint.input_layer.component.sample.cws_sample¶
CWS Sample Class¶
-
class
textflint.input_layer.component.sample.cws_sample.CWSSample(data, origin=None, sample_id=None)[source]¶ Bases:
textflint.input_layer.component.sample.sample.Sample
Segmentation rules are based on CTB6.
The input x can be a sentence (str) or a list. The input y is the list of segmentation labels, drawn from B, M, E and S.
y can also be generated automatically: pass an empty list for y and supply x already segmented, either as a string with words separated by spaces or as a list with one word per element.
Note that each punctuation mark should be separated out as a single word.
Example:
1. input {'x': '小明好想送Jo圣诞礼物', 'y': ['B', 'E', 'B', 'E', 'S', 'B', 'E', 'B', 'E', 'B', 'E']}
2. input {'x': ['小明', '好想送Jo圣诞礼物'], 'y': ['B', 'E', 'B', 'E', 'S', 'B', 'E', 'B', 'E', 'B', 'E']}
3. input {'x': '小明 好想 送 Jo 圣诞 礼物', 'y': []}
4. input {'x': ['小明', '好想', '送', 'Jo', '圣诞', '礼物'], 'y': []}
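The relationship between the segmented forms of x and the label list y can be illustrated with a small standalone sketch. It does not call textflint; `bmes_labels` and `split_words` are hypothetical helper names, and the labeling rule (one label per character, S for a one-character word, B ... E otherwise) is an assumption inferred from the examples above.

```python
def split_words(x):
    """Accept a space-separated string or a list of words (hypothetical helper)."""
    return x.split() if isinstance(x, str) else list(x)


def bmes_labels(words):
    """Return one B/M/E/S label per character for already segmented words."""
    labels = []
    for word in words:
        if not word:
            continue  # ignore empty entries
        if len(word) == 1:
            labels.append("S")  # single-character word
        else:
            # first character B, last character E, everything between M
            labels.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return labels


print(bmes_labels(split_words("小明 好想 送 Jo 圣诞 礼物")))
# ['B', 'E', 'B', 'E', 'S', 'B', 'E', 'B', 'E', 'B', 'E']
```

Both the space-separated string and the list form produce the same labels, which match the explicit y given in examples 1 and 2.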
-
__init__(data, origin=None, sample_id=None)[source]¶ - Parameters
data (dict) – The dict obj that contains data info
sample_id (int) – the id of the sample
origin (bool) – whether the sample is the original sample
-
check_data(data)[source]¶ Check whether the data is legitimate. Label correctness is not checked. If the data is not in the legal format but is in an acceptable one, its format is changed.
- Parameters
data (dict) – The dict obj that contains data info
-
load(data)[source]¶ Convert data dict which contains essential information to CWSSample.
- Parameters
data (dict) – The dict obj that contains data info
-
replace_at_ranges(indices, new_items, y_new_items=None)[source]¶ Replace words at indices and set their mask to MODIFIED_MASK.
- Parameters
indices (list) – The list of positions to be changed.
new_items (list) – The list of items to be changed.
y_new_items (list) – The list of mask info to be changed.
- Returns
replaced CWSSample object.
-
update(x, y)[source]¶ Return a new CWSSample built from the new sentence x and the new labels y.
- Parameters
x (str) – the new sentence.
y (list) – the new labels.
- Returns
new CWSSample object.
-
check(indices, new_items, y_new_items=None)[source]¶ Check whether the position of change is legal.
- Parameters
indices (list) – The list of positions to be changed.
new_items (list) – The list of items to be changed.
y_new_items (list) – The list of mask info to be changed.
- Return three lists
legal positions, change items, change labels.
-
-
class
textflint.input_layer.component.sample.cws_sample.CnTextField(field_value, mask=None)[source]¶ Bases:
textflint.input_layer.component.field.field.Field
A helper class that represents an input string to be modified.
- Parameters
field_value (str or list) – the value of the field.
mask (int) – mask label.
-
cn_processor= <textflint.common.preprocess.cn_processor.CnProcessor object>¶
pos tags function
- Returns
ner tags
-
class
textflint.input_layer.component.sample.cws_sample.ListField(field_value, **kwargs)[source]¶ Bases:
textflint.input_layer.component.field.field.Field
A helper class that represents input list values to be modified.
Operations that modify field_value generate a new Field instance.
-
__init__(field_value, **kwargs)[source]¶ - Parameters
field_value ([str]) – The list that ListField represents.
-
replace_at_indices(indices, new_items)[source]¶ Replace items at indices.
Note: only isometric replacement is supported.
- Parameters
indices (list[int|list|slice]) – each index can be an int, which replaces a single item, so that indices looks like [1, 2, 3]; a list like (0, 3), which replaces the items from 0 up to but not including 3, so that indices looks like [(0, 3), (5, 6)]; or a slice, which is converted to a list.
new_items (list) – items corresponding indices.
- Returns
new field object.
-
replace_at_index(index, new_items)[source]¶ Replace item at index.
- Parameters
index (int|list|slice) –
can be an int, which replaces a single item (a list of them looks like [1, 2, 3]); a list like (0, 3), which replaces the items from 0 up to but not including 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
-
delete_at_indices(indices)[source]¶ Delete items at indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, which deletes a single item, so that indices looks like [1, 2, 3]; a list like (0, 3), which deletes the items from 0 up to but not including 3, so that indices looks like [(0, 3), (5, 6)]; or a slice, which is converted to a list.
- Returns
new field object.
-
delete_at_index(index)[source]¶ Delete item at index.
- Parameters
index (int|list|slice) –
can be an int, which deletes a single item (a list of them looks like [1, 2, 3]); a list like (0, 3), which deletes the items from 0 up to but not including 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
- Returns
new field object.
-
insert_before_indices(indices, new_items)[source]¶ Insert items before indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, which inserts at a single item, so that indices looks like [1, 2, 3]; a list like (0, 3), which covers the items from 0 up to but not including 3, so that indices looks like [(0, 3), (5, 6)]; or a slice, which is converted to a list.
new_items (list) – items corresponding indices.
- Returns
new field object.
-
insert_before_index(index, new_items)[source]¶ Insert items before index.
- Parameters
index (int|list|slice) –
can be an int, which inserts at a single item (a list of them looks like [1, 2, 3]); a list like (0, 3), which covers the items from 0 up to but not including 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
-
insert_after_indices(indices, new_items)[source]¶ Insert items after indices.
- Parameters
indices (list[int|list|slice]) – each index can be an int, which inserts at a single item, so that indices looks like [1, 2, 3]; a list like (0, 3), which covers the items from 0 up to but not including 3, so that indices looks like [(0, 3), (5, 6)]; or a slice, which is converted to a list.
new_items (list) – items corresponding indices.
- Returns
new field object.
-
insert_after_index(index, new_items)[source]¶ Insert item after index.
- Parameters
index (int|list|slice) –
can be an int, which inserts at a single item (a list of them looks like [1, 2, 3]); a list like (0, 3), which covers the items from 0 up to but not including 3 (a list of them looks like [(0, 3), (5, 6)]); or a slice, which is converted to a list.
new_items (list) – items corresponding to index.
- Returns
new field object.
-
-
class
textflint.input_layer.component.sample.cws_sample.Sample(data, origin=None, sample_id=None)[source]¶ Bases:
abc.ABC
Base Sample class to hold the necessary info and provide atomic operations.
-
text_processor= <textflint.common.preprocess.en_processor.EnProcessor object>¶
-
__init__(data, origin=None, sample_id=None)[source]¶ - Parameters
data (dict) – The dict obj that contains data info.
origin (sample) – original sample obj.
sample_id (int) – sample index
-
get_value(field)[source]¶ Get field value by field_str.
- Parameters
field (str) – field name
- Returns
field value
-
get_words(field)[source]¶ Get tokenized words of given textfield
- Parameters
field (str) – field name
- Returns
tokenized words
-
get_text(field)[source]¶ Get text string of given textfield
- Parameters
field (str) – field name
- Return string
text
-
get_mask(field)[source]¶ Get word masks of given textfield
- Parameters
field (str) – field name
- Returns
list of mask values
-
get_sentences(field)[source]¶ Get split sentences of given textfield
- Parameters
field (str) – field name
- Returns
list of sentences
-
get_ner(field)[source]¶ Get text field ner tags
- Parameters
field (str) – field name
- Returns
ner tag list
-
replace_fields(fields, field_values, field_masks=None)[source]¶ Fully replace multiple fields at the same time and return a new sample. Note: this API is not recommended, as it sets the mask values of the TextField to MODIFIED_MASK.
- Parameters
fields (list) – field str list
field_values (list) – field value list
field_masks (list) – indicate mask values, useful for printable text
- Returns
Modified Sample
-
replace_field(field, field_value, field_mask=None)[source]¶ Fully replace a single field and return a new sample. Note: this API is not recommended, as it sets the mask values of the TextField to MODIFIED_MASK.
- Parameters
field (str) – field str
field_value – field_type
field_mask (list) – indicate mask value of field
- Returns
Modified Sample
-
replace_field_at_indices(field, indices, items)[source]¶ Replace items of multiple given scopes of field value at the same time. Stay away from this complex function!
Be careful with the shape of your input list.
- Parameters
field (str) – field name
indices (list of int|list|slice) –
each index can be an int, which replaces a single item, so that indices looks like [1, 2, 3]; a list like (0, 3), which replaces the items from 0 up to but not including 3; or a slice, which is converted to a list.
items –
- Returns
Modified Sample
-
replace_field_at_index(field, index, items)[source]¶ Replace items of given scope of field value.
Be careful with the shape of your input list.
- Parameters
field (str) – field name
index (int|list|slice) –
can be an int, which replaces a single item (a list of them looks like [1, 2, 3]); a list like (0, 3), which replaces the items from 0 up to but not including 3; or a slice, which is converted to a list.
items (str|list) – shape: indices_num, corresponding to field_sub_items
- Returns
Modified Sample
-
unequal_replace_field_at_indices(field, indices, rep_items)[source]¶ Replace the items in the given scopes of field value with rep_items, which may not be equal in length to the scopes.
- Parameters
field – field str
indices – list of int/tuple/list
rep_items – list
- Returns
Modified Sample
-
delete_field_at_indices(field, indices)[source]¶ Delete items of given scopes of field value.
- Parameters
field (str) – field name
indices (list of int|list|slice) –
shape: indices_num. Each index can be an int, which deletes a single item, so that indices looks like [1, 2, 3]; a list like (0, 3), which deletes the items from 0 up to but not including 3; or a slice, which is converted to a list.
- Returns
Modified Sample
-
delete_field_at_index(field, index)[source]¶ Delete items of the given scope of field value.
- Parameters
field (str) – field name
index (int|list|slice) –
can be an int, which deletes a single item (a list of them looks like [1, 2, 3]); a list like (0, 3), which deletes the items from 0 up to but not including 3; or a slice, which is converted to a list.
- Returns
Modified Sample
-
insert_field_before_indices(field, indices, items)[source]¶ Insert items before multiple given indices of field value at the same time.
Stay away from this complex function! Be careful with the shape of your input list.
- Parameters
field (str) – field name
indices – list of int, shape:indices_num, list like [1, 2, 3]
items – list of str/list, shape: indices_num, correspond to indices
- Returns
Modified Sample
-
insert_field_before_index(field, index, items)[source]¶ Insert items before the given index of field value.
- Parameters
field (str) – field name
index (int) – indicate which index to insert items
items (str|list) – items to insert
- Returns
Modified Sample
-
insert_field_after_indices(field, indices, items)[source]¶ Insert items after multiple given indices of field value at the same time.
Stay away from this complex function! Be careful with the shape of your input list.
- Parameters
field (str) – field name
indices – list of int, shape:indices_num, like [1, 2, 3]
items – list of str/list shape: indices_num, correspond to indices
- Returns
Modified Sample
-
insert_field_after_index(field, index, items)[source]¶ Insert items after the given index of field value.
- Parameters
field (str) – field name
index (int) – indicate where to apply insert
items (str|list) – shape: indices_num, correspond to field_sub_items
- Returns
Modified Sample
-
swap_field_at_index(field, first_index, second_index)[source]¶ Swap items between first_index and second_index of field value.
- Parameters
field (str) – field name
first_index (int) –
second_index (int) –
- Returns
Modified Sample
-
classmethod
clone(original_sample)[source]¶ Deep copy self to a new sample
- Parameters
original_sample – sample to be copied
- Returns
Sample instance
-
property
is_origin¶ Return whether the sample is original Sample.
-
-
textflint.input_layer.component.sample.cws_sample.delete_at_scope(origin_list, scope)[source]¶ Delete items of origin_list of given scope.
- Parameters
origin_list (list) –
scope (int|list|tuple|slice) –
can be an int, which deletes a single item (a list of them looks like [1, 2, 3]); a list or tuple like (0, 3) or [5, 6], which deletes the items from the left bound up to but not including the right bound; or a slice, which is converted to a list.
- Returns
-
textflint.input_layer.component.sample.cws_sample.delete_at_scopes(origin_list, scopes)[source]¶ Delete items of origin_list of given scopes.
- Parameters
origin_list (list) –
scopes (list) –
list of int/list/tuple/slice. Each scope can be an int, which deletes a single item, so that scopes looks like [1, 2, 3]; a list or tuple like (0, 3) or [5, 6], which deletes the items from the left bound up to but not including the right bound; or a slice, which is converted to a list.
- Return list
new list
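Deleting several scopes from one list is easy to get wrong, because removing an earlier range shifts every later index. The standalone sketch below shows one way to handle that; it is not the library's code. `delete_at_scopes_sketch` is a hypothetical name, and the sketch assumes the scopes do not overlap and that it should return a new list rather than modify the input.

```python
def delete_at_scopes_sketch(origin_list, scopes):
    """Delete several non-overlapping scopes and return a new list."""
    bounds = []
    for scope in scopes:
        if isinstance(scope, int):
            bounds.append((scope, scope + 1))        # single item
        elif isinstance(scope, slice):
            bounds.append((scope.start, scope.stop))  # assumes explicit start/stop
        else:
            bounds.append((scope[0], scope[1]))       # (left, right), right excluded
    result = list(origin_list)
    # Work from the rightmost scope to the leftmost so that the indices
    # of the scopes still to be processed are not shifted by a deletion.
    for left, right in sorted(bounds, reverse=True):
        del result[left:right]
    return result


print(delete_at_scopes_sketch(list("abcdef"), [0, (2, 4)]))
# ['b', 'e', 'f']
```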
-
textflint.input_layer.component.sample.cws_sample.descartes(calculation_items, n)[source]¶ - Parameters
calculation_items (list) –
n (int) – quantity to select
- Return list
list of items randomly chosen from the Cartesian product.
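The idea of sampling n entries from a Cartesian product can be sketched with the standard library. This is an illustration only: `descartes_sketch` and its `seed` parameter are hypothetical, and treating calculation_items as a list of candidate lists is an assumption about the expected input shape.

```python
import itertools
import random


def descartes_sketch(calculation_items, n, seed=None):
    """Pick up to n distinct combinations from the Cartesian product."""
    rng = random.Random(seed)  # seed is a hypothetical convenience parameter
    # Materializes the whole product, so this is only suitable for small inputs.
    combos = list(itertools.product(*calculation_items))
    if len(combos) <= n:
        return combos
    return rng.sample(combos, n)


candidates = [["a", "b"], ["x", "y", "z"]]
print(descartes_sketch(candidates, 2, seed=0))
```

Each returned element is a tuple holding one choice from every candidate list. When n is at least the size of the product, the sketch returns every combination.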
-
textflint.input_layer.component.sample.cws_sample.get_align_seq(align_items, value)[source]¶ Get values whose shape aligns with align_items.
- Parameters
align_items (list) –
value (str) –
- Return list
list that aligns with align_items.
-
textflint.input_layer.component.sample.cws_sample.handle_empty_insertion(new_items)[source]¶ Handle inserting new items into an empty list by concatenating all new items. Warns if multiple items are fed.
- Parameters
new_items (list) – list
- Return list
new list
-
textflint.input_layer.component.sample.cws_sample.insert_after_index(origin_list, index, new_items)[source]¶ Insert items to origin_list after given index.
- Parameters
origin_list (list) –
index (int) –
new_items (list) –
- Returns
-
textflint.input_layer.component.sample.cws_sample.insert_after_indices(origin_list, indices, new_items)[source]¶ Insert items to origin_list after given indices.
- Parameters
origin_list (list) –
indices (list) –
new_items (list) –
- Return list
-
textflint.input_layer.component.sample.cws_sample.insert_before_index(origin_list, index, new_items)[source]¶ Insert items to origin_list before given index.
- Parameters
origin_list (list) –
index (int) –
new_items (list) –
- Return list
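The difference between inserting before and after an index comes down to where the list is split. The sketch below shows both with plain slicing; the `_sketch` function names are hypothetical stand-ins, and returning a new list instead of modifying origin_list is an assumption.

```python
def insert_before_index_sketch(origin_list, index, new_items):
    """Return a new list with new_items placed just before position index."""
    return origin_list[:index] + list(new_items) + origin_list[index:]


def insert_after_index_sketch(origin_list, index, new_items):
    """Return a new list with new_items placed just after position index."""
    return origin_list[:index + 1] + list(new_items) + origin_list[index + 1:]


words = ["a", "b", "c"]
print(insert_before_index_sketch(words, 1, ["X"]))  # ['a', 'X', 'b', 'c']
print(insert_after_index_sketch(words, 1, ["X"]))   # ['a', 'b', 'X', 'c']
```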
-
textflint.input_layer.component.sample.cws_sample.insert_before_indices(origin_list, indices, new_items)[source]¶ Insert items to origin_list before given indices.
- Parameters
origin_list (list) –
indices (list) –
new_items (list) –
- Return list
-
textflint.input_layer.component.sample.cws_sample.normalize_scope(scope)[source]¶ Convert various scope input to list format of [left_bound, right_bound]
- Parameters
scope (int|list|tuple|slice) – can be an int like 1 or 3, indicating a single item; a list or tuple like (0, 3) or [5, 6], indicating the items from the left bound up to but not including the right bound; or a slice, which is converted to a list.
- Return list
[left_bound, right_bound]
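A rough sketch of this kind of normalization follows. It is a standalone illustration, not the library's implementation: `normalize_scope_sketch` is a hypothetical name, and mapping an int i to [i, i + 1] is an assumption based on the "not included" right bound described for list scopes.

```python
def normalize_scope_sketch(scope):
    """Map an int, a 2-item list/tuple or a slice to [left_bound, right_bound]."""
    if isinstance(scope, slice):
        if scope.step not in (None, 1) or scope.stop is None:
            raise ValueError("this sketch handles only contiguous, bounded slices")
        return [scope.start or 0, scope.stop]
    if isinstance(scope, int):
        return [scope, scope + 1]  # assumed: a single item spans [i, i + 1)
    if isinstance(scope, (list, tuple)) and len(scope) == 2:
        return [scope[0], scope[1]]
    raise TypeError(f"unsupported scope: {scope!r}")


for scope in (1, (0, 3), [5, 6], slice(0, 3)):
    print(scope, "->", normalize_scope_sketch(scope))
```

Reducing every input form to one pair of bounds lets the replace, delete and insert helpers share a single slicing code path.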
-
textflint.input_layer.component.sample.cws_sample.replace_at_scope(origin_list, scope, new_items)[source]¶ Replace items of given list instance.
- Parameters
origin_list (list) –
scope (int|list|slice) –
can be an int like 1, which replaces a single item; a list like (0, 3) or [0, 3], which replaces the items from 0 up to but not including 3; or a slice, which is converted to a list.
new_items – list
- Return list
new list
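As an illustration of replacing one scope, the sketch below rebuilds the list from the part before the scope, the new items, and the part after it. `replace_at_scope_sketch` is a hypothetical name. It assumes the input list is left unmodified and, for brevity, accepts only an int or a (left, right) pair as the scope.

```python
def replace_at_scope_sketch(origin_list, scope, new_items):
    """Return a new list in which the items in scope are replaced by new_items."""
    if isinstance(scope, int):
        left, right = scope, scope + 1  # single item
    else:
        left, right = scope             # (left, right), right excluded
    return origin_list[:left] + list(new_items) + origin_list[right:]


chars = ["a", "b", "c", "d"]
print(replace_at_scope_sketch(chars, (1, 3), ["X", "Y"]))  # ['a', 'X', 'Y', 'd']
print(replace_at_scope_sketch(chars, 0, ["Z"]))            # ['Z', 'b', 'c', 'd']
```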
-
textflint.input_layer.component.sample.cws_sample.replace_at_scopes(origin_list, scopes, new_items)[source]¶ Replace items of given list. Note: only isometric replacement is supported.
- Parameters
origin_list (list) –
scopes (list) –
list of int/list/slice. Each scope can be an int, which replaces a single item, so that scopes looks like [1, 2, 3]; a list like (0, 3), which replaces the items from 0 up to but not including 3, so that scopes looks like [(0, 3), (5, 6)]; or a slice, which is converted to a list. Watch out! Each range must be the same type!
new_items (list) – items corresponding to scopes.
- Returns
-
textflint.input_layer.component.sample.cws_sample.swap_at_index(origin_list, first_index, second_index)[source]¶ Swap items between first_index and second_index of origin_list
- Parameters
origin_list (list) –
first_index (int) –
second_index (int) –
- Return list
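A swap of two positions can be sketched as a copy followed by tuple assignment. `swap_at_index_sketch` is a hypothetical name, and copying instead of swapping in place is an assumption made to mirror the "new list" results of the neighbouring helpers.

```python
def swap_at_index_sketch(origin_list, first_index, second_index):
    """Return a copy of origin_list with the two given positions exchanged."""
    result = list(origin_list)
    result[first_index], result[second_index] = (
        result[second_index],
        result[first_index],
    )
    return result


print(swap_at_index_sketch(["a", "b", "c"], 0, 2))  # ['c', 'b', 'a']
```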
-
textflint.input_layer.component.sample.cws_sample.trade_off_sub_words(sub_words, sub_indices, trans_num=None, n=1)[source]¶ Select proper candidate words to maximize the number of transformation results. Selects the words with the top n numbers of substitute words.
- Parameters
sub_words (list) – list of substitute words for each legal word
sub_indices (list) – list of indices of each legal word
trans_num (int) – max number of words to apply substitution
n (int) –
- Returns
sub_words after alignment + indices of sub_words
-
textflint.input_layer.component.sample.cws_sample.unequal_replace_at_scopes(origin_list, scopes, new_items)[source]¶ Replace items of given list.
Note: unequal replacement is supported.
- Parameters
origin_list (list) –
scopes (list) –
list of int/list/slice. Each scope can be an int, which replaces a single item, so that scopes looks like [1, 2, 3]; a list like (0, 3), which replaces the items from 0 up to but not including 3, so that scopes looks like [(0, 3), (5, 6)]; or a slice, which is converted to a list.
new_items –
- Return list