Most popular NLP repositories and open source projects

Natural language processing (NLP) is a field of computer science that studies how computers process and analyze human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced state-of-the-art results in language modeling, parsing, and many other natural-language tasks.

ChineseGLUE

Language Understanding Evaluation benchmark for Chinese: datasets, bas...

246   1732   1732  

ChineseNLP

Datasets and SOTA results for every field of Chinese NLP

276   1724   1724  

NLPer-Arsenal

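A collection of NLP competition strategy implementations, baselines for each task, and related competition experience posts (current competitions, past competitions...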

225   1719   1719  

nltk_data

NLTK Data

1096   1716   1716  

NeuroNER

Named-entity recognition using neural networks. Easy-to-use and state-...

474   1713   1713  

gpt2-ml

GPT2 for Multiple Languages, including pretrained models. GPT2 multilingual...

333   1711   1711  

cakechat

CakeChat: Emotional Generative Dialog System

921   1708   1708  

NLP-Knowledge-Graph

Research and applications of three major technologies: natural language processing, knowledge graphs, and dialogue systems.

369   1708   1708  

zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

225   1693   1693  

graph4nlp

Graph4nlp is the library for the easy use of Graph Neural Networks for...

204   1688   1688  

transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for...

153   1687   1687  

sense2vec

🦆 Contextually-keyed word vectors

242   1657   1657  

news-please

news-please - an integrated web crawler and information extractor for...

379   1655   1655  

magnitude

A fast, efficient universal vector embedding utility package.

119   1650   1650  

Chinese-XLNet

Pre-Trained Chinese XLNet models

281   1650   1650  

pet

This repository contains the code for "Exploiting Cloze Questions for...

281   1628   1628  

TextInfoExp

Natural language processing experiments (sougou dataset): TF-IDF, text classification, clustering, word vectors, sentiment...

776   1615   1615  

Transformers-Recipe

🧠 A study guide to learn about Transformers

157   1604   1604  

delta

DELTA is a deep learning based natural language and speech processing...

287   1592   1592  

usaddress

🇺🇸 A Python library for parsing unstructured United States address s...

304   1591   1591  

Recognizers-Text

Microsoft.Recognizers.Text provides recognition and resolution of numb...

423   1574   1574  

awesome-ai-ml-dl

Awesome Artificial Intelligence, Machine Learning and Deep Learning as...

362   1569   1569  

underthesea

Underthesea - Vietnamese NLP Toolkit

287   1565   1565  

fastRAG

Efficient Retrieval Augmentation and Generation Framework

145   1565   1565  

TAADpapers

Must-read Papers on Textual Adversarial Attack and Defense

194   1561   1561  

deepsparse

Inference runtime offering GPU-class performance on CPUs and APIs to i...

95   1556   1556  

entity-recognition-datasets

A collection of corpora for named entity recognition (NER) and entity...

248   1547   1547  

TigerBot

TigerBot: A multi-language multi-task LLM

150   1545   1545  

Keras-TextClassification

Chinese long-text classification, short-sentence classification, multi-label classification, two-sentence similarity (Chinese Text Cla...

398   1541   1541  

torchdistill

A coding-free framework built on PyTorch for reproducible deep learnin...

135   1539   1539  

bi-att-flow

Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarc...

678   1538   1538  

jiant

jiant is an NLP toolkit

287   1534   1534  

StudyBook

Study E-Book (ComputerVision DeepLearning MachineLearning Math NLP Pyth...

122   1523   1523  

nlp-lang

This project is a basic package that wraps the tools commonly used in most NLP projects.

498   1499   1499  

tensorflow-nlp

NLP and Text Generation Experiments in TensorFlow 2.x / 1.x

425   1487   1487  

similarity

similarity: Text similarity calculation toolkit for Java. Text similarity...

333   1486   1486  

nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpc...

162   1478   1478  

nlp_xiaojiang

Natural language processing (NLP), Xiaojiang robot (retrieval-based chitchat chatbot), BERT sentence vectors, similar...

393   1473   1473  

setfit

Efficient few-shot learning with Sentence Transformers

158   1468   1468  

BotSharp

The Open Source Chatbot Framework in .NET

336   1466   1466  

eda_nlp

Data augmentation for NLP, presented at EMNLP 2019

306   1455   1455  

lingua-py

The most accurate natural language detection library for Python, suita...

49   1455   1455  

nlp_paper_summaries

✍️ A carefully curated list of NLP paper summaries

248   1453   1453  

refinery

The data scientist's open-source choice to scale, assess and maintain...

72   1452   1452  

awesome-document-understanding

A curated list of resources for Document Understanding (DU) topic

162   1449   1449  

TextBrewer

A PyTorch-based knowledge distillation toolkit for natural language pr...

229   1428   1428  

scispacy

A full spaCy pipeline and models for scientific/biomedical documents.

196   1428   1428  

Chinese-ELECTRA

Pre-trained Chinese ELECTRA models

172   1424   1424  

QA-Survey-CN

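The natural language processing research team at the Beihang University Advanced Innovation Center for Big Data has conducted research on intelligent question answering...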

240   1412   1412  

projects

🪐 End-to-end NLP workflows from prototype to production

466   1395   1395