textflint.generation_layer.transformation.RE.insert_clause

InsertClause class for the entity-description insertion transformation

class textflint.generation_layer.transformation.RE.insert_clause.InsertClause(**kwargs)[source]

Bases: textflint.generation_layer.transformation.transformation.Transformation

Add an extra entity-related clause to the text

search_list(query)[source]

Retrieve entity id from Wikidata

Parameters

query (string) – name of query entity

Return list

information of query entity
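This page does not say how search_list queries Wikidata. As a rough sketch only, assuming the public wbsearchentities action of the Wikidata API is used, building the request URL and reading the candidate list from a response could look like the following. The helper names build_search_url and parse_search_response are hypothetical, and the sample response body is hand-written for illustration rather than real API output.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(query, language="en", limit=5):
    """Build a wbsearchentities URL for an entity name (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def parse_search_response(body):
    """Return the list of candidate entities from a JSON response body."""
    return json.loads(body).get("search", [])


# Hand-written body in the general shape of a search response; not real output.
sample_body = json.dumps(
    {"search": [{"id": "Q42", "label": "Douglas Adams", "description": "example description"}]}
)
candidates = parse_search_response(sample_body)
url = build_search_url("Douglas Adams")
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without contacting Wikidata.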

get_clause(query)[source]

Obtain the description of an entity

Parameters

query (string) – name of query entity

Return string

entity description
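To illustrate what adding an entity-related clause can mean at the token level, here is a minimal sketch that splices a description string (such as the one get_clause is documented to return) into a token list right after an entity. The function insert_clause, the "which is" template, and the sample sentence are assumptions made for this example, not the transformation's actual logic.

```python
def insert_clause(tokens, entity_end, description):
    """Insert ', which is <description> ,' after the token at index entity_end."""
    clause = [",", "which", "is"] + description.split() + [","]
    return tokens[: entity_end + 1] + clause + tokens[entity_end + 1:]


tokens = ["Paris", "hosted", "the", "summit", "."]
new_tokens = insert_clause(tokens, 0, "the capital of France")
```

Any entity whose span starts after the insertion point would also need its indices shifted by the length of the inserted clause.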

class textflint.generation_layer.transformation.RE.insert_clause.Client(base_url: str = 'https://www.wikidata.org/', opener: Optional[urllib.request.OpenerDirector] = None, datavalue_decoder: Optional[Union[Decoder, Callable[[Client, str, Mapping[str, object]], object]]] = None, entity_type_guess: bool = True, cache_policy: wikidata.cache.CachePolicy = <wikidata.cache.NullCachePolicy object>, repr_string: Optional[str] = None)[source]

Bases: object

Wikidata client session.

Parameters
  • base_url (str) – The base URL of Wikidata. WIKIDATA_BASE_URL is used by default.

  • opener (urllib.request.OpenerDirector) – The opener for urllib.request. If omitted or None the default opener is used.

  • entity_type_guess (bool) – Whether to guess the type of an Entity from its id, to make fewer HTTP requests. True by default.

  • cache_policy (CachePolicy) – A caching policy for API calls. No cache (NullCachePolicy) by default.

New in version 0.5.0: The cache_policy option.

Changed in version 0.3.0: The meaning of base_url parameter changed. It originally meant https://www.wikidata.org/wiki/ which contained the trailing path wiki/, but now it means only https://www.wikidata.org/.

New in version 0.2.0: The entity_type_guess option.

entity_type_guess = True

(bool) Whether to guess the type of an Entity from its id, to make fewer HTTP requests.

New in version 0.2.0.

cache_policy = <wikidata.cache.NullCachePolicy object>

(CachePolicy) A caching policy for API calls.

New in version 0.5.0.

get(entity_id: EntityId, load: bool = False) → wikidata.entity.Entity[source]

Get a Wikidata entity by its EntityId.

Parameters
  • entity_id – The id of the Entity to find.

  • load (bool) – Eager loading on True. Lazy loading (False) by default.

Returns

The found entity.

Return type

Entity

New in version 0.3.0: The load option.
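The load flag selects between eager and lazy loading. The sketch below illustrates that general pattern with a stand-in class; LazyEntity and fake_fetch are hypothetical names, and this is not the wikidata package's implementation.

```python
class LazyEntity:
    """Stand-in entity that fetches its data on first access unless load=True."""

    def __init__(self, entity_id, fetch, load=False):
        self.id = entity_id
        self._fetch = fetch
        self._data = None
        if load:
            self.load()  # eager: fetch immediately

    def load(self):
        self._data = self._fetch(self.id)

    @property
    def data(self):
        if self._data is None:
            self.load()  # lazy: fetch on first access
        return self._data


calls = []


def fake_fetch(entity_id):
    """Record each fetch instead of making an HTTP request."""
    calls.append(entity_id)
    return {"id": entity_id}


lazy = LazyEntity("Q1", fake_fetch)
calls_after_lazy_init = len(calls)
eager = LazyEntity("Q2", fake_fetch, load=True)
calls_after_eager_init = len(calls)
lazy_data = lazy.data
calls_after_access = len(calls)
```

Lazy loading avoids a request for entities that are created but never inspected, while eager loading surfaces fetch errors at creation time.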

guess_entity_type(entity_id: EntityId) → Optional[wikidata.entity.EntityType][source]

Guess EntityType from the given EntityId. It could return None when it fails to guess.

Note

It always fails to guess when entity_type_guess is configured to False.

Returns

The guessed EntityType, or None if it fails to guess.

Return type

Optional[EntityType]

New in version 0.2.0.
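Wikidata item ids start with Q and property ids start with P, so a guess can plausibly be made from the id prefix alone. The following standalone sketch assumes that rule; the EntityType enum here is a simplified stand-in defined for this example, and the function takes entity_type_guess as an argument rather than reading it from a client.

```python
from enum import Enum
from typing import Optional


class EntityType(Enum):
    """Simplified stand-in for the library's entity type enum."""

    ITEM = "item"
    PROPERTY = "property"


def guess_entity_type(entity_id: str, entity_type_guess: bool = True) -> Optional[EntityType]:
    """Guess the entity type from the id prefix, or return None."""
    if not entity_type_guess:
        return None  # guessing disabled: always fail
    if entity_id.startswith("Q"):
        return EntityType.ITEM
    if entity_id.startswith("P"):
        return EntityType.PROPERTY
    return None  # unknown prefix


item_guess = guess_entity_type("Q42")
property_guess = guess_entity_type("P31")
disabled_guess = guess_entity_type("Q42", entity_type_guess=False)
```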

decode_datavalue(datatype: str, datavalue: Mapping[str, object]) → object[source]

Decode the given datavalue using the configured datavalue_decoder.

New in version 0.3.0.

exception textflint.generation_layer.transformation.RE.insert_clause.FlintError[source]

Bases: RuntimeError

Default error raised by textflint functions. FlintError is raised when no more specific error type is given.

class textflint.generation_layer.transformation.RE.insert_clause.RESample(data, origin=None, sample_id=None)[source]

Bases: textflint.input_layer.component.sample.sample.Sample

Transform and retrieve features of an RESample

check_data(data)[source]

check whether type of data is correct

Parameters

data (dict) – data dict containing ‘x’, ‘subj’, ‘obj’ and ‘y’

Validate whether the sample is legal
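As a sketch of what such a check might cover, the function below validates a data dict with the four documented keys. It assumes that 'x' is a list of string tokens, that 'subj' and 'obj' are inclusive [start, end] index pairs into 'x', and that 'y' is a relation label string; those shapes are assumptions, as is the use of ValueError in place of the library's own error type.

```python
def check_re_data(data):
    """Validate an RE data dict under the assumed shapes; raise ValueError if invalid."""
    for key in ("x", "subj", "obj", "y"):
        if key not in data:
            raise ValueError(f"missing key: {key}")

    tokens = data["x"]
    if not isinstance(tokens, list) or not all(isinstance(t, str) for t in tokens):
        raise ValueError("'x' must be a list of string tokens")

    for key in ("subj", "obj"):
        span = data[key]
        if not isinstance(span, (list, tuple)) or len(span) != 2:
            raise ValueError(f"'{key}' must be a [start, end] pair")
        start, end = span
        if not (0 <= start <= end < len(tokens)):
            raise ValueError(f"'{key}' span {span} is out of range")

    if not isinstance(data["y"], str):
        raise ValueError("'y' must be a relation label string")
    return True


sample = {
    "x": ["Alice", "works", "at", "Acme", "."],
    "subj": [0, 0],
    "obj": [3, 3],
    "y": "employee_of",
}
is_valid = check_re_data(sample)
```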

get_sent_ids()[source]

Generate sentence ID

Returns

string: sentence ID

load(data)[source]

Convert a data dict containing the essential information into an RESample.

Parameters

data (dict) – contains ‘token’, ‘subj’, ‘obj’, ‘relation’ keys.

get_dp()[source]

get dependency parsing

Return Tuple(list, list)

dependency tag of sentence and head of sentence

get_en()[source]

get entity index

Return Tuple(int, int, int, int)

start index of subject entity, end index of subject entity, start index of object entity and end index of object entity

get_type()[source]

get entity type

Return Tuple(string, string)

entity type of subject and entity type of object

get_sent()[source]

get tokenized sentence

Return Tuple(list, string)

tokenized sentence and relation

delete_field_at_indices(field, indices)[source]

delete the words at the given indices in the sentence

Parameters
  • field (string) – field to be operated on

  • indices (list) – list of indices to delete

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys
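Deleting tokens changes the positions of everything to their right, so the entity spans must be recomputed together with the token list. The sketch below shows one way to do that; delete_tokens is a hypothetical helper, spans are assumed to be inclusive [start, end] pairs, and it assumes no deleted index falls inside an entity span.

```python
def delete_tokens(tokens, subj, obj, indices):
    """Delete tokens at the given indices and shift entity spans left accordingly."""
    to_delete = set(indices)
    new_tokens = [t for i, t in enumerate(tokens) if i not in to_delete]

    def shift(span):
        start, end = span
        # Every deleted token before the span moves it one position left.
        removed_before = sum(1 for i in to_delete if i < start)
        return [start - removed_before, end - removed_before]

    return {"token": new_tokens, "subj": shift(subj), "obj": shift(obj)}


tokens = ["The", "famous", "Alice", "works", "at", "Acme", "."]
result = delete_tokens(tokens, subj=[2, 2], obj=[5, 5], indices=[1])
```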

insert_field_after_indices(field, indices, new_item)[source]

insert words after the given indices in the sentence

Parameters
  • field (string) – field to be operated on

  • indices (list) – list of indices at which to insert

  • new_item (list) – list of items to be inserted

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys

insert_field_before_indices(field, indices, new_item)[source]

insert words before the given indices in the sentence

Parameters
  • field (string) – field to be operated on

  • indices (list) – list of indices at which to insert

  • new_item (list) – list of items to be inserted

Return dict

contains ‘token’, ‘subj’, ‘obj’ keys
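The two insertion methods differ only in where the new items land relative to the given index, and in both cases any entity span to the right of the insertion point has to move. A minimal sketch for a single index follows; insert_tokens is a hypothetical helper, spans are assumed to be inclusive [start, end] pairs, and it assumes the insertion point does not fall strictly inside an entity span.

```python
def insert_tokens(tokens, subj, obj, index, new_items, after=True):
    """Insert new_items after (or before) index and shift entity spans right."""
    pos = index + 1 if after else index
    new_tokens = tokens[:pos] + list(new_items) + tokens[pos:]

    def shift(span):
        start, end = span
        if start >= pos:  # span lies at or right of the insertion point
            return [start + len(new_items), end + len(new_items)]
        return [start, end]

    return {"token": new_tokens, "subj": shift(subj), "obj": shift(obj)}


tokens = ["Alice", "works", "at", "Acme", "."]
after_result = insert_tokens(tokens, [0, 0], [3, 3], 0, [",", "an", "engineer", ","])
before_result = insert_tokens(tokens, [0, 0], [3, 3], 3, ["the", "company"], after=False)
```

Handling several indices at once would typically process them from right to left so that earlier insertions do not invalidate later positions.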

replace_sample_fields(data)[source]

replace sample fields for RE transformation

Parameters

data (dict) – contains transformed x, subj, obj keys

Return RESample

transformed sample

stan_ner_transform()[source]

Generate ner list

Return list

ner tags

get_pos()[source]

get pos tagging of sentence

Return list

pos tags

dump()[source]

output data sample

Return dict

containing x, subj, obj, y and sample_id

class textflint.generation_layer.transformation.RE.insert_clause.Transformation(**kwargs)[source]

Bases: abc.ABC

An abstract class for transforming a sequence of text to produce a list of potential adversarial examples.

processor = <textflint.common.preprocess.en_processor.EnProcessor object>

transform(sample, n=1, field='x', **kwargs)[source]

Transform data sample to a list of Sample.

Parameters
  • sample (Sample) – Data sample for augmentation.

  • n (int) – Maximum number of unique augmented outputs; default is 1.

  • field (str|list) – Indicate which fields to apply transformations.

  • **kwargs (dict) –

    other auxiliary params.

Returns

list of Sample
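The contract described here (take one sample, return at most n unique transformed samples) can be sketched with a small abstract base class. Everything in this example is illustrative: SketchTransformation, the _transform hook, and the toy DropOneToken subclass are hypothetical names, and plain dicts stand in for Sample objects.

```python
from abc import ABC, abstractmethod


class SketchTransformation(ABC):
    """Toy base class: subclasses generate candidates, transform() filters them."""

    def transform(self, sample, n=1, field="x", **kwargs):
        candidates = self._transform(sample, n=n, field=field, **kwargs)
        unique = []
        for candidate in candidates:
            # Keep only candidates that differ from the input and from each other.
            if candidate != sample and candidate not in unique:
                unique.append(candidate)
        return unique[:n]

    @abstractmethod
    def _transform(self, sample, n=1, field="x", **kwargs):
        """Return a list of candidate samples."""


class DropOneToken(SketchTransformation):
    """Toy transformation: each candidate drops one token from the field."""

    def _transform(self, sample, n=1, field="x", **kwargs):
        tokens = sample[field]
        return [
            dict(sample, **{field: tokens[:i] + tokens[i + 1:]})
            for i in range(len(tokens))
        ]


results = DropOneToken().transform({"x": ["a", "b", "c"]}, n=2)
```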

classmethod sample_num(x, num)[source]

Get ‘num’ samples from x.

Parameters
  • x (list) – list to sample

  • num (int) – sample number

Returns

At most ‘num’ unique samples.
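One simple way to get this behaviour, shown here as a standalone function rather than the classmethod itself, is to sample without replacement and cap the request at the length of the list. This is a sketch under that assumption; note that random.sample picks distinct positions, so duplicate values in x can still appear more than once in the result.

```python
import random


def sample_num(x, num):
    """Return up to num elements of x, chosen at random without replacement."""
    return random.sample(x, min(num, len(x)))


few = sample_num([1, 2, 3], 5)          # request exceeds list length
some = sample_num(list(range(10)), 4)   # request fits within list length
```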