textflint.input_layer.model.tokenizers.glove_tokenizer

Glove Tokenizer

class textflint.input_layer.model.tokenizers.glove_tokenizer.WordLevelTokenizer(word_id_map={}, pad_token_id=None, unk_token_id=None, unk_token='[UNK]', sep_token='[SEP]', cls_token='[CLS]', pad_token='[PAD]', lowercase: bool = False, unicode_normalizer=None)[source]

Bases: tokenizers.implementations.base_tokenizer.BaseTokenizer

WordLevelTokenizer.

Represents a simple word level tokenization using the internals of BERT’s tokenizer.

Based off the tokenizers BertWordPieceTokenizer (https://github.com/huggingface/tokenizers/)

class textflint.input_layer.model.tokenizers.glove_tokenizer.GloveTokenizer(word_id_map={}, pad_token_id=None, unk_token_id=None, max_length=256)[source]

Bases: textflint.input_layer.model.tokenizers.glove_tokenizer.WordLevelTokenizer

A word-level tokenizer with GloVe 200-dimensional vectors.

Lowercased, since GloVe vectors are lowercased.

batch_encode(input_text_list)[source]

The batch equivalent of encode.