Most popular NLP repositories and open source projects

Synonyms chatopera Python

🌿 Chinese synonyms: a toolkit for chatbots and intelligent question answering

5.1k 892 5.1k

AutoGPTQ AutoGPTQ Python

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

5.1k 538 5.1k

ai-engineering-from-scratch rohitg00 Python

Learn it. Build it. Ship it for others.

5k 1.1k 5k

machine_learning_complete Nyandwi Jupyter Notebook

A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

5k 840 5k

Huatuo-Llama-Med-Chinese SCIR-HI Python

Repo for BenCao [original name: HuaTuo (华驼)], Instruction-tuning Large Language Models with Chinese Medical Knowledge. BenCao (formerly HuaTuo) model repository, ...

5k 500 5k

text2vec shibing624 Python

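text2vec, text to vector. A text vector representation tool that converts text into vector matrices; implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT; out-of-the-box...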

5k 426 5k

argilla argilla-io Python

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

4.9k 481 4.9k

EasyR1 hiyouga Python

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

4.9k 366 4.9k

OpenPrompt thunlp Python

An Open-Source Framework for Prompt-Learning.

4.9k 486 4.9k

libpostal openvenues C

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

4.8k 466 4.8k

ml-road yanshengjia Python

Machine Learning and Agentic AI Resources, Practice and Research

4.7k 1.7k 4.7k

Learn_Prompting trigaten MDX

Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our discord for the largest Prompt Engineering learning community

4.7k 662 4.7k

nlpaug makcedward Jupyter Notebook

Data augmentation for NLP

4.7k 476 4.7k

pytorch-sentiment-analysis bentrevett Jupyter Notebook

Tutorials on getting started with PyTorch and TorchText for sentiment analysis.

4.6k 1.2k 4.6k

Promptify promptslab Python

Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs...

4.6k 362 4.6k

practical-pytorch spro Jupyter Notebook

Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained

4.5k 1.1k 4.5k

Awesome-AIGC-Tutorials luban-agi

Curated tutorials and resources for Large Language Models, AI Painting, and more.

4.5k 301 4.5k

LLMBook-zh.github.io LLMBook-zh Python

Large Language Models (《大语言模型》). Authors: Zhao Xin (赵鑫), Li Junyi (李军毅), Zhou Kun (周昆), Tang Tianyi (唐天一), Wen Jirong (文继荣)

4.4k 333 4.4k

CLUEDatasetSearch CLUEbenchmark Python

Search all Chinese NLP datasets, plus commonly used English NLP datasets

4.4k 626 4.4k

llm-foundry mosaicml Python

LLM training code for Databricks foundation models

4.4k 588 4.4k

franc wooorm JavaScript

Natural language detection

4.4k 182 4.4k

DeepKE zjunlp Python

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

4.4k 742 4.4k

LMOps microsoft Python

General technology for enabling AI capabilities w/ LLMs and MLLMs

4.4k 371 4.4k

d2l-pytorch dsgiitr Jupyter Notebook

This project reproduces the book Dive Into Deep Learning (https://d2l.ai/), adapting the code from MXNet into PyTorch.

4.3k 1.2k 4.3k

PromptPapers thunlp

Must-read papers on prompt-based tuning for pre-trained language models.

4.3k 390 4.3k

Data-Science-Roadmap Moataz-Elmesmary

Data Science Roadmap from A to Z

4.2k 597 4.2k

Awesome-ChatGPT dalinvip

A collection of ChatGPT learning resources, continuously updated...

4.2k 386 4.2k

MNBVC esbatmop

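MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, targeting the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also various niche cultures and even ...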

4.2k 289 4.2k

spark-nlp JohnSnowLabs Scala

State of the Art Natural Language Processing

4.1k 741 4.1k

AdalFlow SylphAI-Inc Python

AdalFlow: The library to build & auto-optimize LLM applications.

4.1k 369 4.1k

LightLLM ModelTC Python

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-...

4k 321 4k

snips-nlu snipsco Python

Snips Python library to extract meaning from text

4k 504 4k

course huggingface MDX

The Hugging Face course on Transformers

3.9k 1.3k 3.9k

JioNLP dongrixinyu Python

A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com

3.8k 444 3.8k

Dive-into-DL-TensorFlow2.0 TrickyGo Jupyter Notebook

This project converts the MXNet implementation in the original book Dive into Deep Learning (《动手学深度学习》) to TensorFlow 2.0; the project has been endorsed by Mu Li (李沐).

3.8k 819 3.8k

zero_nlp yuanzhoulvpi2017 Jupyter Notebook

Chinese NLP solutions (large models, data, models, training, inference)

3.8k 445 3.8k

sumy miso-belica Python

Module for automatic summarization of text documents and HTML pages.

3.7k 545 3.7k

SimCSE princeton-nlp Python

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

3.6k 533 3.6k

daily-interview datawhalechina

Interview experiences compiled by Datawhale members, covering machine learning, CV, NLP, recommendation, development, and more; stars are welcome.

3.6k 495 3.6k

awesome-DeepLearning PaddlePaddle Jupyter Notebook

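Introductory, advanced, and special-topic deep learning courses, academic case studies, industry practice cases, a deep learning knowledge encyclopedia, and an interview question bank. Courses, cases, and knowledge of Deep Learning and AI.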

3.6k 859 3.6k

Awesome-Text2SQL eosphoros-ai

Curated tutorials and resources for Large Language Models, Text2SQL, Text2DSL, Text2API, Text2Vis and more.

3.6k 244 3.6k

text pytorch Python

Models, data loaders and abstractions for language processing, powered by PyTorch

3.6k 812 3.6k

ml-workspace ml-tooling Jupyter Notebook

🛠 All-in-one web-based IDE specialized for machine learning and data science.

3.5k 459 3.5k

AiLearning-Theory-Applying ben1234560 Jupyter Notebook

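Get started quickly with AI theory and hands-on applications: fundamentals, Transformer, NLP, ML, DL, and competitions. Includes extensive comments and datasets, aiming for every reader to be able to understand and reproduce the material.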

3.5k 480 3.5k

course-nlp fastai Jupyter Notebook

A Code-First Introduction to NLP course

3.5k 1.5k 3.5k

picoGPT jaymody Python

An unnecessarily tiny implementation of GPT-2 in NumPy.

3.5k 457 3.5k

semantic-router aurelio-labs Python

Superfast AI decision making and intelligent processing of multi-modal data.

3.5k 326 3.5k

Jiagu ownthink Python

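Jiagu: a deep learning natural language processing tool. Knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering.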

3.4k 609 3.4k

TextAttack QData Python

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master...

3.4k 445 3.4k

Awesome-Code-LLM codefuse-ai

[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.

3.3k 229 3.3k

nlp

Repositories (1462)