KazBERT

Published:

KazBERT is a BERT-based model fine-tuned for Kazakh language tasks.

Achievements:

  • Over 14,000 downloads
  • 14 likes on Hugging Face

The model is trained with a Masked Language Modeling (MLM) objective on a multilingual corpus of Kazakh, Russian, and English text.
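To make the MLM objective concrete, here is a minimal sketch of the token-masking step in plain Python. It assumes the masking recipe from the original BERT paper (select about 15% of tokens; of those, replace 80% with `[MASK]`, 10% with a random token, and leave 10% unchanged). Whether KazBERT uses exactly these ratios is an assumption, and `mask_tokens`, the toy vocabulary, and the sample sentence are hypothetical names chosen for illustration only.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) using the original BERT masking recipe.

    labels[i] holds the original token at positions the model must predict,
    and None at positions that are ignored by the loss.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        # Skip positions that are not selected for prediction.
        if rng.random() >= mask_prob:
            continue
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            # 80% of selected positions: replace with the mask token.
            masked[i] = MASK_TOKEN
        elif roll < 0.9:
            # 10%: replace with a random vocabulary token.
            masked[i] = rng.choice(vocab)
        # Remaining 10%: keep the original token unchanged.
    return masked, labels


# Toy example with whitespace "tokens"; a real model uses a subword tokenizer.
tokens = "мен қазақ тілін үйреніп жатырмын".split()
vocab = ["кітап", "book", "книга"]  # hypothetical mixed-language vocabulary
masked, labels = mask_tokens(tokens, vocab, mask_prob=0.5, seed=0)
print(masked)
print(labels)
```

During training, the model sees `masked` as input and is scored only on positions where `labels` is not `None`, which is what lets it learn from unlabeled text in all three languages.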

Scientific Citations & Impact: KazBERT has been used by the academic community in several peer-reviewed publications:

  • LLM-Assisted Weak Supervision for Low-Resource Kazakh Sequence Labeling: Synthetic Annotation and CRF-Refined NER/POS Models (MDPI Applied Sciences)
  • Hybrid artificial intelligence architectures for automatic text correction in the Kazakh language (Frontiers in Artificial Intelligence)
  • Application of Vector Models in Intelligent Information Retrieval Systems (Academic Scientific Journal of Computer Science)

Available on Hugging Face