Most popular natural-language-processing repositories and open source projects

Natural language processing (NLP) is a field of computer science that studies how computers process, analyze, and generate human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

stanza

Stanford NLP Python library for tokenization, sentence segmentation, N...

909   7562   7562  

WantWords

An open-source online reverse dictionary.

624   7091   7091  

models

Officially maintained, supported by PaddlePaddle, including CV, NLP, S...

2884   6930   6930  

mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.

1301   6601   6601  

awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

884   6577   6577  

nlp.js

An NLP library for building bots, with entity extraction, sentiment an...

632   6486   6486  

big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/...

1514   6464   6464  

nlp-recipes

Natural Language Processing Best Practices & Examples

915   6424   6424  

awesome-self-supervised-learning

A curated list of awesome self-supervised methods

833   6314   6314  

ML-Course-Notes

🎓 Sharing machine learning course / lecture notes.

830   6274   6274  

courses

This repository is a curated collection of links to various courses an...

557   6144   6144  

ai-deadlines

⏰ AI conference deadline countdowns

1024   5917   5917  

ERNIE

Official implementations for various pre-training models of ERNIE-fami...

1252   5898   5898  

AI-Job-Notes

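A job-hunting guide for AI algorithm positions (covering preparation guides, coding-practice guides, referrals, a list of AI companies, and other materials)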

661   5751   5751  

Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

503   5685   5685  

ltp

Language Technology Platform

1057   5163   5163  

datascience

This repository is a compilation of free resources for learning Data S...

529   5131   5131  

marqo

Unified embedding generation and search engine. Also available on clou...

212   4918   4918  

OpenPrompt

An Open-Source Framework for Prompt-Learning.

476   4699   4699  

argilla

Argilla is a collaboration tool for AI engineers and domain experts to...

445   4625   4625  

nlpaug

Data augmentation for NLP

470   4600   4600  

practical-pytorch

Go to https://github.com/pytorch/tutorials - this repo is deprecated a...

1092   4550   4550  

pytorch-sentiment-analysis

Tutorials on getting started with PyTorch and TorchText for sentiment...

1181   4534   4534  

libpostal

A C library for parsing/normalizing street addresses around the world....

450   4532   4532  

autotrain-advanced

🤗 AutoTrain Advanced

605   4473   4473  

textract

extract text from any document. no muss. no fuss.

635   4244   4244  

FLAML

A fast library for AutoML and tuning. Join our Discord: https://discor...

541   4185   4185  

Data-science

Collection of useful data science topics along with articles, videos,...

1036   4123   4123  

Baichuan2

A series of large language models developed by Baichuan Intelligent Te...

295   4122   4122  

trafilatura

Python & Command-line tool to gather text and metadata on the Web: Cra...

288   4109   4109  

cs230-code-examples

Code examples in PyTorch and TensorFlow for CS230

1005   4068   4068  

spark-nlp

State of the Art Natural Language Processing

729   4023   4023  

arXivTimes

Repository for researching and sharing machine learning articles

201   3918   3918  

LLMBook-zh.github.io

"Large Language Models" (《大语言模型》), by Zhao Xin, Li Junyi, Zhou Kun, Tang Tianyi, and Wen Jirong

286   3873   3873  

MatchZoo

Facilitating the design, comparison and sharing of deep text matching...

900   3856   3856  

olivia

💁‍♀️Your new best friend powered by an artificial neural network

347   3712   3712  

JioNLP

A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. A Chinese NLP Preprocess...

436   3685   3685  

AI-Engineer-Headquarters

A collection of scientific methods, processes, algorithms, and systems...

686   3626   3626  

lit

The Learning Interpretability Tool: Interactively analyze ML models to...

366   3583   3583  

zhihu

This repo contains the source code in my personal column (https://zhua...

2126   3520   3520  

catalyst

Accelerated deep learning R&D

394   3363   3363  

vale

📝 A syntax-aware linter for prose built with speed and extensib...

121   3265   3265  

nlp-roadmap

ROADMAP (Mind Map) and KEYWORD for students who have an interest in...

516   3265   3265  

TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data aug...

431   3237   3237  

pyhanlp

Chinese word segmentation

803   3198   3198  

ml-course

Open Machine Learning course

1233   3193   3193  

mlops-course

Learn how to design, develop, deploy and iterate on production-grade M...

559   3176   3176  

fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still i...

449   3134   3134  

torchscale

Foundation Architecture for (M)LLMs

221   3101   3101  

UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Mode...

524   3074   3074