neluca/tinybpe
🐍 A fast, lightweight, and clean CPython extension for the Byte Pair Encoding (BPE) algorithm, which is commonly used in LLM tokenization and NLP tasks.
GitHub repository with 5 stars and 0 forks.
Language: Python
Topics: bpe, bpe-tokenizer, byte-level, cpython-extensions, llm, tokenizer
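
For readers unfamiliar with byte-level BPE, the following is a minimal pure-Python sketch of the general algorithm: start from the 256 raw byte values, repeatedly merge the most frequent adjacent pair into a new token ID, and replay those merges to encode. It does not use or reflect tinybpe's actual API; the function names `train_bpe`, `merge`, `encode`, and `decode` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(data: bytes, num_merges: int):
    """Learn up to `num_merges` merges; returns {(left_id, right_id): new_id}."""
    ids = list(data)  # byte-level: initial token IDs are the byte values 0..255
    merges = {}
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not compress
        merges[pair] = next_id
        ids = merge(ids, pair, next_id)
        next_id += 1
    return merges


def encode(data: bytes, merges):
    """Apply the learned merges in the order they were learned."""
    ids = list(data)
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand token IDs back into the original bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids)


text = b"aaabdaaabac"
merges = train_bpe(text, num_merges=3)
tokens = encode(text, merges)
```

A sketch like this rescans the whole sequence on every merge, which is the kind of cost a compiled extension is presumably meant to avoid.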