Topic

nlp

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article proposing a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

Repositories (1462)

bigscience
bigscience bigscience-workshop Shell

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

1k
clean-text
clean-text jfilter Python

🧹 Python package for text cleaning

1k
Summarization-Papers
Summarization-Papers xcfcode TeX

Summarization Papers

1k
Prompt4ReasoningPapers
Prompt4ReasoningPapers zjunlp

[ACL 2023] Reasoning with Language Model Prompting: A Survey

1k
Python-ai-assistant
Python-ai-assistant ggeop Python

Python AI assistant 🧠

1k
CLUECorpus2020
CLUECorpus2020 CLUEbenchmark

Large-scale Pre-training Corpus for Chinese (100 GB Chinese pre-training corpus)

1k
taranis-ai
taranis-ai taranis-ai Python

Taranis AI is an advanced Open-Source Intelligence (OSINT) tool, leveraging Artificial Intelligence to revolutionize information gathering and situati...

1k
splade
splade naver Python

SPLADE: sparse neural search (SIGIR21, SIGIR22)

992
lawyer-llama
lawyer-llama AndrewZhe Python

Chinese legal LLaMA (LLaMA for the Chinese legal domain)

992
Jackrong-llm-finetuning-guide
Jackrong-llm-finetuning-guide R6410418 Jupyter Notebook
988
K-BERT
K-BERT autoliuweijie Python

Source code of K-BERT (AAAI2020)

987
QANet
QANet localminimum Python

A TensorFlow implementation of QANet for machine reading comprehension

985
langkit
langkit whylabs Jupyter Notebook

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & securi...

985
soynlp
soynlp lovit Python

A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.

983
plato-research-dialogue-system
plato-research-dialogue-system uber-archive Python

This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.

980
YouTokenToMe
YouTokenToMe VKCOM C++

Unsupervised text tokenizer focused on computational efficiency

977
TinyLLaVA_Factory
TinyLLaVA_Factory TinyLLaVA Python

A Framework of Small-scale Large Multimodal Models

977
keras-hub
keras-hub keras-team Python

Pretrained model hub for Keras 3.

976
Llama-2-Open-Source-LLM-CPU-Inference
Llama-2-Open-Source-LLM-CPU-Inference kennethleungty Python

Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

976
bert_language_understanding
bert_language_understanding brightmart Python

Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN

966
rasa-ui
rasa-ui paschmann JavaScript

Rasa UI is a frontend for the Rasa Framework

966
SurveyX
SurveyX IAAR-Shanghai TeX

Academic Survey Paper Generation.

965
wikipedia2vec
wikipedia2vec wikipedia2vec Python

A tool for learning vector representations of words and entities from Wikipedia

964
gector
gector grammarly Python

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

964
Transformers-for-NLP-2nd-Edition
Transformers-for-NLP-2nd-Edition Denis2054 Jupyter Notebook

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus secti...

961
bolt
bolt huawei-noah C++

Bolt is a deep learning library with high performance and heterogeneous flexibility.

957
WEB_KG
WEB_KG lixiang0 Python

Crawls Chinese-language Baidu Baike pages, extracts triples, and builds a Chinese knowledge graph

956
pyresparser
pyresparser OmkarPathak Python

A simple resume parser used for extracting information from resumes

955
NLP-Tutorials
NLP-Tutorials MorvanZhou Python

Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com

952
weibo-analysis-and-visualization
weibo-analysis-and-visualization HUANGZHIHAO1994 Python

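Scrapes Weibo data with Python, then analyzes and visualizes the Weibo text: LDA (tree map), relationship graph, word cloud, time trends (line chart), heat map, dictionary-based sentiment analysis (pie chart and 3D bar chart), word-vector neural net...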

949
awesome-japanese-nlp-resources
awesome-japanese-nlp-resources taishi-i

A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese

946
StarryDivineSky
StarryDivineSky wuwenjie1992

A curated selection of 10K+ projects covering machine learning, deep learning, NLP, GNN, recommender systems, biomedicine, computer vision, front-end and back-end development, and more.

933
awesome-sentiment-analysis
awesome-sentiment-analysis xiamx

😀😄😂😭 A curated list of Sentiment Analysis methods, implementations and misc. 😥😟😱😤

930
Bert-Multi-Label-Text-Classification
Bert-Multi-Label-Text-Classification lonePatient Python

This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.

924
LightAutoML
LightAutoML sberbank-ai-lab Python

LAMA - automatic model creation framework

924
jieba-rs
jieba-rs messense Rust

The Jieba Chinese Word Segmentation Implemented in Rust

923
jcseg
jcseg lionsoul2014 Java

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keywords extractio...

922
mindnlp
mindnlp mindspore-lab Python

MindSpore + 🤗Huggingface: Run any Transformers/Diffusers model on MindSpore with seamless compatibility and acceleration.

918
pointer_summarizer
pointer_summarizer atulkum Python

PyTorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"

915
awesome-gcn
awesome-gcn Jiakui

Resources for graph convolutional networks

913
iowncode
iowncode anupamchugh Swift

A curated collection of iOS, ML, AR resources sprinkled with some UI additions

909
TextGAN-PyTorch
TextGAN-PyTorch williamSYSU Python

TextGAN is a PyTorch framework for Generative Adversarial Networks (GANs) based text generation models.

909
langfun
langfun google Python

OO for LLMs

909
self-attentive-parser
self-attentive-parser nikitakit Python

High-accuracy NLP parser with models for 11 languages.

907
DISC-LawLLM
DISC-LawLLM FudanDISC Python

[Chinese legal LLM] DISC-LawLLM: an intelligent legal system powered by large language models (LLMs) to provide a wide range of legal services.

906
cltk
cltk cltk Python

The Classical Language Toolkit

905
similarities
similarities shibing624 Python

Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of records; developed in Python 3, ...

902
curated-transformers
curated-transformers explosion Python

🤖 A PyTorch library of curated Transformer models and their composable components

896
datacamp-python-data-science-track
datacamp-python-data-science-track AmoDinho Python

All the slides, accompanying code, and exercises are stored in this repo. 🎈

891
Chatito
Chatito rodrigopivi TypeScript

🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!

888