MNBVC

esbatmop

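MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus whose target is the 40T of data used to train ChatGPT. The dataset covers mainstream culture as well as a wide range of niche subcultures, down to "Martian script" (火星文, a stylized form of internet writing). It collects every form of plain-text Chinese data: news, school essays, novels, books, magazines, academic papers, script dialogue, forum posts, wikis, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.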

1.3k Stars
83 Forks
1.3k Watchers
MIT License
Cost to Build: $6.7K
Market Value: $17.1K

Growth over time (stars, forks, watchers): 2 data points, 2023-02-15 to 2023-07-07

How to clone MNBVC

Clone via HTTPS

git clone https://github.com/esbatmop/MNBVC.git

Clone via SSH

git clone git@github.com:esbatmop/MNBVC.git

Download ZIP

Download master.zip

Found an issue?

Report bugs or request features on the MNBVC issue tracker:

Open GitHub Issues