Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.
🌿 Chinese synonyms: a toolkit for chatbots and intelligent question answering
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Learn it. Build it. Ship it for others.
A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
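Repo for BenCao [original name: HuaTuo (华驼)], instruction-tuning large language models with Chinese medical knowledge. Model repository for BenCao (本草; formerly HuaTuo),...
text2vec, text to vector. A text vector representation tool that converts text into vector matrices and implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, out of the box...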
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
An Open-Source Framework for Prompt-Learning.
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Machine Learning and Agentic AI Resources, Practice and Research
Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our Discord for the largest Prompt Engineering learning community
Data augmentation for NLP
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Prompt Engineering | Prompt Versioning | Use GPT or other prompt-based models to get structured output. Join our Discord for Prompt-Engineering, LLMs...
Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained
Curated tutorials and resources for Large Language Models, AI Painting, and more.
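The book Large Language Models (《大语言模型》), by Zhao Xin (赵鑫), Li Junyi (李军毅), Zhou Kun (周昆), Tang Tianyi (唐天一), and Wen Jirong (文继荣)
Search all Chinese NLP datasets, with commonly used English NLP datasets included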
LLM training code for Databricks foundation models
Natural language detection
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
General technology for enabling AI capabilities with LLMs and MLLMs
This project reproduces the book Dive Into Deep Learning (https://d2l.ai/), adapting the code from MXNet into PyTorch.
Must-read papers on prompt-based tuning for pre-trained language models.
Data Science Roadmap from A to Z
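A collection of ChatGPT learning resources, continuously updated...
MNBVC (Massive Never-ending BT Vast Chinese corpus), a very large-scale Chinese corpus that targets the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also various niche cultures and even 火...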
State of the Art Natural Language Processing
AdalFlow: The library to build & auto-optimize LLM applications.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-...
Snips Python library to extract meaning from text
The Hugging Face course on Transformers
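A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com
This project converts the MXNet implementation in the original book Dive into Deep Learning (《动手学深度学习》) to a TensorFlow 2.0 implementation; the project has been recognized by Li Mu (李沐)
Chinese NLP solutions (large models, data, models, training, inference)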
Module for automatic summarization of text documents and HTML pages.
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
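Interview experiences compiled by Datawhale members, covering machine learning, CV, NLP, recommendation, development, and more; stars are welcome
Introductory, advanced, and specialized deep learning courses, academic cases, industry practice cases, a deep learning knowledge encyclopedia, and an interview question bank. Courses, cases, and knowledge for deep learning and AI
Curated tutorials and resources for Large Language Models, Text2SQL, Text2DSL, Text2API, Text2Vis, and more.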
Models, data loaders and abstractions for language processing, powered by PyTorch
🛠 All-in-one web-based IDE specialized for machine learning and data science.
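Get started quickly with AI theory and hands-on applications: fundamentals, Transformer, NLP, ML, DL, and competitions. Includes extensive comments and datasets, so that every reader can understand and reproduce the material.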
A Code-First Introduction to NLP course
An unnecessarily tiny implementation of GPT-2 in NumPy.
Superfast AI decision making and intelligent processing of multi-modal data.
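Jiagu, a deep learning natural language processing toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering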
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master...
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.