textflint.input_layer.model.tokenizers.glove_tokenizer¶
Glove Tokenizer¶
-
class
textflint.input_layer.model.tokenizers.glove_tokenizer.WordLevelTokenizer(word_id_map={}, pad_token_id=None, unk_token_id=None, unk_token='[UNK]', sep_token='[SEP]', cls_token='[CLS]', pad_token='[PAD]', lowercase: bool = False, unicode_normalizer=None)[source]¶ Bases:
tokenizers.implementations.base_tokenizer.BaseTokenizerWordLevelTokenizer.
Represents a simple word level tokenization using the internals of BERT’s tokenizer.
Based off the tokenizers BertWordPieceTokenizer (https://github.com/huggingface/tokenizers/)
-
class
textflint.input_layer.model.tokenizers.glove_tokenizer.GloveTokenizer(word_id_map={}, pad_token_id=None, unk_token_id=None, max_length=256)[source]¶ Bases:
textflint.input_layer.model.tokenizers.glove_tokenizer.WordLevelTokenizerA word-level tokenizer with GloVe 200-dimensional vectors.
Lowercased, since GloVe vectors are lowercased.