Most popular NLP repositories and open source projects

caiss ChunelFeng C++

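A simple, easy-to-use, cross-platform, multi-language, high-performance retrieval engine for similar vectors, similar words, and similar sentences. Stars and forks welcome. Build together! Power another!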

547 66 547

camel_tools CAMeL-Lab Python

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

546 88 546

pinferencia underneathall Python

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

545 83 545

LMaaS-Papers txsun1997

Awesome papers on Language-Model-as-a-Service (LMaaS)

545 32 545

happy-transformer EricFillion Python

Happy Transformer makes it easy to fine-tune and perform inference with NLP Transformer models.

545 69 545

Mengzi Langboat

Mengzi Pretrained Models

544 62 544

m3tl JayYip Jupyter Notebook

BERT for Multitask Learning

544 123 544

japanese-pretrained-models rinnakk Python

Code for producing Japanese pretrained models provided by rinna Co., Ltd.

543 40 543

codequestion neuml Python

🔎 Semantic search for developers

543 46 543

chinese_dictionary guotong1988

Synonym list, antonym list, negation word list

542 200 542

ai-web-extensions adamlui JavaScript

🤖 AI browser extensions & userscripts to augment your web experience

540 58 540

nlp-notebook jasoncao11 Python

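Implementations of common NLP tasks, including new word discovery and PyTorch-based word embeddings, Chinese text classification, entity recognition, text summarization, sentence similarity, triple extraction, pretrained models, and more.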

536 113 536

Wordless BLKSerene Python

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

536 82 536

HiRAG hhy-huang Python

[EMNLP'25 findings] This is the official repo for the paper, HiRAG: Retrieval-Augmented Generation with Hierarchical Knowledge.

535 83 535

MedCAT CogStack Python

Medical Concept Annotation Tool

531 118 531

php-text-analysis yooper PHP

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language

531 91 531

Giveme5W1H fhamborg HTML

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

530 85 530

text_summurization_abstractive_methods theamrzaki Jupyter Notebook

Multiple implementations of abstractive text summarization, using Google Colab

530 219 530

poplar synyi TypeScript

A web-based annotation tool for natural language processing (NLP)

529 140 529

similarity-search-kit ZachNagengast Swift

🔎 SimilaritySearchKit is a Swift package providing on-device text embeddings and semantic search functionality for iOS and macOS applications.

524 51 524

awesome-tensorflow-2 Amin-Tgz

👉 TensorFlow 2.x resources such as tutorials, blogs, code, and videos

524 101 524

headlines udibr Jupyter Notebook

Automatically generate headlines for short articles

524 145 524

pytextclassifier shibing624 Python

pytextclassifier is a toolkit for text classification. Implementations of text classification models including LR, XGBoost, TextCNN, FastText, TextRNN, and BERT, ready to use out of the box.

523 77 523

WebShop princeton-nlp Python

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

523 99 523

Deep-Semantic-Similarity-Model airalcorn2 Python

My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.micros...

521 182 521

examples towhee-io Jupyter Notebook

Analyze the unstructured data with Towhee, such as reverse image search, reverse video search, audio classification, question and answer systems, mole...

520 123 520

German-NLP adbar

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

520 66 520

Goopt jokenox JavaScript

🔍 Search Engine for a Procedural Simulation of the Web with GPT-3.

519 37 519

Teamlinker Teamlinker TypeScript

Teamlinker is a team collaboration platform that integrates multi-functional modules. Users can process tasks in parallel, including six functional mo...

519 34 519

language_tool_python jxmorris12 Python

A free, non-AI Python grammar checker 📝✅

518 71 518

fugashi polm C++

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

518 39 518

python-stanford-corenlp stanfordnlp Python

Python interface to CoreNLP using a bidirectional server-client interface.

518 104 518

OmniNet subho406 Python

Official Pytorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agraw...

514 59 514

awesome-llms-fine-tuning Curated-Awesome-Lists

Explore a comprehensive collection of resources, tutorials, papers, tools, and best practices for fine-tuning Large Language Models (LLMs). Perfect fo...

513 75 513

mergoo Leeroo-AI Python

A library for easily merging multiple LLM experts and efficiently training the merged LLM.

511 33 511

XPretrain microsoft Python

Multi-modality pre-training

511 35 511

edgar-crawler lefterisloukas Python

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean stru...

511 131 511

BertSimilarity Brokenwind Python

Computing the similarity of two sentences with Google's BERT algorithm. Sentence similarity with BERT. Semantic similarity computation. Text similarity computation.

509 68 509

machine-learning-articles Mybridge

Monthly Series - Top 10 Machine Learning Articles

507 39 507

prodigy-recipes explosion Jupyter Notebook

🍳 Recipes for Prodigy, our fully scriptable annotation tool

507 113 507

agency neurocult Go

🕵️‍♂️ Library designed for developers eager to explore the potential of Large Language Models (LLMs) and other generative AI through a clean, effectiv...

506 35 506