Most popular NLP repositories and open-source projects

bigscience bigscience-workshop Shell

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

1k 101 1k

clean-text jfilter Python

🧹 Python package for text cleaning

1k 82 1k

Summarization-Papers xcfcode TeX

Summarization Papers

1k 147 1k

Prompt4ReasoningPapers zjunlp

[ACL 2023] Reasoning with Language Model Prompting: A Survey

1k 67 1k

Python-ai-assistant ggeop Python

Python AI assistant 🧠

1k 246 1k

CLUECorpus2020 CLUEbenchmark

Large-scale Pre-training Corpus for Chinese (100 GB)

1k 83 1k

taranis-ai taranis-ai Python

Taranis AI is an advanced Open-Source Intelligence (OSINT) tool, leveraging Artificial Intelligence to revolutionize information gathering and situati...

1k 129 1k

splade naver Python

SPLADE: sparse neural search (SIGIR21, SIGIR22)

992 96 992
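SPLADE represents queries and documents as sparse, vocabulary-sized term-weight vectors, so relevance becomes a dot product over shared terms (the same shape of computation an inverted index supports). The sketch below assumes a toy four-word vocabulary and made-up logits standing in for a real BERT masked-language-model head. It uses the max-pooled log(1 + ReLU) weighting that, as we understand it, later SPLADE versions adopted. It is an illustration only, not the naver/splade API.

```python
import math

# Toy vocabulary and logits: made-up numbers for illustration only.
VOCAB = ["neural", "search", "sparse", "cat"]

def splade_vector(token_logits: list[list[float]]) -> dict[str, float]:
    """Map per-token logits (rows = input tokens, columns = VOCAB) to a sparse
    term -> weight dict using max pooling of log(1 + ReLU(logit))."""
    vector = {}
    for j, term in enumerate(VOCAB):
        weight = max(math.log1p(max(row[j], 0.0)) for row in token_logits)
        if weight > 0.0:  # zero weights are dropped, which keeps the vector sparse
            vector[term] = weight
    return vector

def sparse_dot(q: dict[str, float], d: dict[str, float]) -> float:
    """Relevance score: dot product over the terms both vectors share."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the smaller vector
    return sum(w * d[t] for t, w in q.items() if t in d)

query = splade_vector([[2.0, -1.0, 0.5, -3.0],
                       [0.0, 1.5, -0.2, -1.0]])
doc = splade_vector([[1.0, 1.0, 0.0, 0.0]])
score = sparse_dot(query, doc)
print(query)
print(score)
```

Because negative logits are clipped to zero, the term "cat" never appears in either vector, and only "neural" and "search" contribute to the score.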

lawyer-llama AndrewZhe Python

Chinese legal LLaMA (LLaMA for the Chinese legal domain)

992 131 992

Jackrong-llm-finetuning-guide R6410418 Jupyter Notebook

988 175 988

K-BERT autoliuweijie Python

Source code of K-BERT (AAAI2020)

987 216 987

QANet localminimum Python

A TensorFlow implementation of QANet for machine reading comprehension

985 298 985

langkit whylabs Jupyter Notebook

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & securi...

985 73 985

soynlp lovit Python

A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.

983 184 983

plato-research-dialogue-system uber-archive Python

This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.

980 189 980

YouTokenToMe VKCOM C++

Unsupervised text tokenizer focused on computational efficiency

977 109 977
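YouTokenToMe's tokenizer is based on byte pair encoding (BPE), which builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols. The sketch below is the textbook merge loop in plain Python, assuming whitespace-pretokenized words and no end-of-word marker. It does not reproduce YouTokenToMe's optimized C++ implementation or its API.

```python
from collections import Counter

def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merges by repeatedly fusing the most frequent adjacent pair."""
    vocab = Counter(tuple(w) for w in words)  # word as a tuple of characters -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Highest count wins; ties are broken deterministically by comparing the pairs.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(apply_bpe("lowest", merges))
```

Replaying merges in learned order at encoding time is what makes segmentation consistent with training; frequent stems like "low" end up as single tokens while rarer suffixes stay split.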

TinyLLaVA_Factory TinyLLaVA Python

A Framework of Small-scale Large Multimodal Models

977 99 977

keras-hub keras-team Python

Pretrained model hub for Keras 3.

976 334 976

Llama-2-Open-Source-LLM-CPU-Inference kennethleungty Python

Running Llama 2 and other open-source LLMs locally on CPU for document Q&A

976 207 976

bert_language_understanding brightmart Python

Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN

966 211 966

rasa-ui paschmann JavaScript

Rasa UI is a frontend for the Rasa Framework

966 325 966

SurveyX IAAR-Shanghai TeX

Academic Survey Paper Generation.

965 96 965

wikipedia2vec wikipedia2vec Python

A tool for learning vector representations of words and entities from Wikipedia

964 101 964

gector grammarly Python

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

964 219 964
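The "Tag, Not Rewrite" idea is that a model predicts one edit tag per source token instead of generating the corrected sentence from scratch. The sketch below only shows how predicted tags turn into corrected text. The tag strings are assumptions modeled on GECToR's basic transformations (keep, delete, replace, append); the real tag set also includes further transformations, and this is not the grammarly/gector code.

```python
def apply_tags(tokens: list[str], tags: list[str]) -> list[str]:
    """Apply one edit tag per source token and return the corrected tokens.

    Assumed tag formats: $KEEP, $DELETE, $REPLACE_<token>, $APPEND_<token>.
    """
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)
        elif tag == "$DELETE":
            continue
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])  # swap in the new token
        elif tag.startswith("$APPEND_"):
            out.extend([token, tag[len("$APPEND_"):]])  # keep token, insert one after it
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out

tokens = ["She", "likes", "read", "book", "book"]
tags = ["$KEEP", "$APPEND_to", "$KEEP", "$REPLACE_books", "$DELETE"]
print(" ".join(apply_tags(tokens, tags)))
```

Framing correction as tagging lets a non-autoregressive encoder predict all edits in parallel, which is the main speed argument for this approach over sequence-to-sequence rewriting.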

Transformers-for-NLP-2nd-Edition Denis2054 Jupyter Notebook

Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus secti...

961 358 961

bolt huawei-noah C++

Bolt is a deep learning library with high performance and heterogeneous flexibility.

957 163 957

WEB_KG lixiang0 Python

Crawls Chinese Baidu Baike pages, extracts triples, and builds a Chinese knowledge graph.

956 195 956

pyresparser OmkarPathak Python

A simple resume parser used for extracting information from resumes

955 448 955

NLP-Tutorials MorvanZhou Python

Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com

952 315 952

weibo-analysis-and-visualization HUANGZHIHAO1994 Python

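Scrapes Weibo data with Python and analyzes and visualizes Weibo text: LDA (tree map), relationship graphs, word clouds, time trends (line charts), heat maps, dictionary-based sentiment analysis (pie charts and 3D bar charts), word-vector neural net...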

949 142 949

awesome-japanese-nlp-resources taishi-i

A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese

946 40 946

StarryDivineSky wuwenjie1992

A curated selection of 10K+ projects covering machine learning, deep learning, NLP, GNNs, recommender systems, biomedicine, machine vision, front-end and back-end development, and more.

933 146 933

awesome-sentiment-analysis xiamx

😀😄😂😭 A curated list of Sentiment Analysis methods, implementations and misc. 😥😟😱😤

930 163 930

Bert-Multi-Label-Text-Classification lonePatient Python

This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.

924 207 924
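Multi-label classification differs from ordinary multi-class classification in that a text can carry several labels at once, so the usual approach swaps a single softmax for an independent sigmoid per label, trained with binary cross-entropy. The sketch below shows that decision rule and loss in plain Python. The label names, logits, and 0.5 threshold are hypothetical, and this is not the repository's PyTorch code.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(logits: list[float], labels: list[str], threshold: float = 0.5) -> list[str]:
    """Each label gets its own sigmoid, so any number of labels (including none) can fire."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]

def bce_with_logits(logits: list[float], targets: list[int]) -> float:
    """Mean binary cross-entropy computed from raw logits in a numerically stable form."""
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

labels = ["label_a", "label_b", "label_c"]  # hypothetical label names
logits = [2.0, -1.0, 0.0]                     # hypothetical classifier outputs
print(predict_labels(logits, labels))
print(bce_with_logits(logits, [1, 0, 1]))
```

The stable loss form avoids computing log(sigmoid(z)) directly, which underflows for large negative logits.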

LightAutoML sberbank-ai-lab Python

LAMA - automatic model creation framework

924 98 924

jieba-rs messense Rust

The Jieba Chinese word segmentation library, implemented in Rust

923 60 923

jcseg lionsoul2014 Java

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extractio...

922 211 922
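MMSEG builds on maximum matching: at each position it prefers the longest dictionary word, then adds chunk-level disambiguation rules on top. The sketch below shows only the plain forward maximum matching step, not full MMSEG and not Jcseg's Java API, using a hypothetical four-word dictionary. For 研究生命起源 ("researching the origin of life"), greedy matching produces 研究生 / 命 / 起源 instead of 研究 / 生命 / 起源, the kind of ambiguity MMSEG's extra rules are meant to resolve.

```python
def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest dictionary
    word, falling back to a single character when nothing matches."""
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        # Try candidate lengths from longest to shortest; size 1 always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

dictionary = {"研究", "研究生", "生命", "起源"}  # hypothetical toy dictionary
print(forward_max_match("研究生命起源", dictionary))
```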