Topic: nlp

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

Repositories (1462)

spacy-layout
explosion/spacy-layout · Python

📚 Process PDFs, Word documents and more with spaCy

886 stars
uniem
wangyuxinwhy/uniem · Python

unified embedding model

877 stars
pypostal
openvenues/pypostal · C

Python bindings to libpostal for fast international address parsing/normalization

874 stars
LLM-Finetuning-Toolkit
georgian-io/LLM-Finetuning-Toolkit · Python

Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.

872 stars
DeepGit
zamalali/DeepGit · Python

Deep research agent to help you find the best GitHub repositories 🕵️!

866 stars
transformers-tutorials
abhimishra91/transformers-tutorials · Jupyter Notebook

GitHub repo with tutorials to fine-tune transformers for different NLP tasks

864 stars
ontogpt
monarch-initiative/ontogpt · Jupyter Notebook

LLM-based ontological extraction tools, including SPIRES

860 stars
Natural-Language-Processing-Specialization
amanjeetsahu/Natural-Language-Processing-Specialization · Jupyter Notebook

This repo contains my coursework, assignments, and slides for the Natural Language Processing Specialization by deeplearning.ai on Coursera

857 stars
dataset-viewer
huggingface/dataset-viewer · Python

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.

856 stars
language-detection
patrickschur/language-detection · PHP

A language detection library for PHP. Detects the language from a given text string.

855 stars
bookcorpus
soskek/bookcorpus · Python

Crawl BookCorpus

855 stars
ifopt
ethz-adrl/ifopt · C++

An Eigen-based, light-weight C++ Interface to Nonlinear Programming Solvers (Ipopt, Snopt)

853 stars
spacy-streamlit
explosion/spacy-streamlit · Python

👑 spaCy building blocks and visualizers for Streamlit apps

853 stars
PIXIU
The-FinAI/PIXIU · Jupyter Notebook

This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and eva...

852 stars
AutoCoder
bin123apple/AutoCoder · Python

We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April...

849 stars
magpie
magpie-align/magpie · Python

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pi...

848 stars
byaldi
AnswerDotAI/byaldi · Python

Use late-interaction multi-modal models such as ColPali in just a few lines of code.

847 stars
catalyst
curiosity-ai/catalyst · C#

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box supp...

846 stars
aiva
kengz/aiva · JavaScript

AIVA (A.I. Virtual Assistant): General-purpose virtual assistant for developers.

845 stars
seq2seq-chatbot
tensorlayer/seq2seq-chatbot · Python

Chatbot in 200 lines of code using TensorLayer

842 stars
inltk
goru001/inltk · Python

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need

840 stars
hate-speech-and-offensive-language
t-davidson/hate-speech-and-offensive-language · Jupyter Notebook

Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017

839 stars
Daily-LLM
zkywsg/Daily-LLM · Python

🔥 Machine learning / deep learning / Python / large models / multimodal / LLM / algorithm interviews / NLP tutorial

836 stars
lightNLP
smilelight/lightNLP · Python

A deep learning framework for natural language processing, built on PyTorch and torchtext.

835 stars
ChatIE
cocacola-lab/ChatIE · Python

The online version is temporarily unavailable because we cannot afford the key. You can clone and run it locally. Note: we set a default OpenAI key. If k...

828 stars
MemN2N-tensorflow
carpedm20/MemN2N-tensorflow · Python

"End-To-End Memory Networks" in Tensorflow

826 stars
BERT-keras
Separius/BERT-keras · Python

Keras implementation of BERT with pre-trained weights

815 stars
Robby-chatbot
yvann-ba/Robby-chatbot · Python

AI chatbot 🤖 for chatting with CSV, PDF, and TXT files 📄 and YouTube videos 🎥 | using LangChain 🦜 | OpenAI | Streamlit ⚡

813 stars
TextClassification-Keras
ShawnyXiao/TextClassification-Keras · Python

Text classification models implemented in Keras, including: FastText, TextCNN, TextRNN, TextBiRNN, TextAttBiRNN, HAN, RCNN, RCNNVariant, etc.

812 stars
PURE
princeton-nlp/PURE · Python

[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812

811 stars
MiNLP
XiaoMi/MiNLP · Scala

XiaoMi Natural Language Processing Toolkits

811 stars
lingua
pemistahl/lingua · Kotlin

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

807 stars
OCTIS
MIND-Lab/OCTIS · Python

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at EACL 2021 demo track)

802 stars
CodeT5
salesforce/CodeT5 · Python

Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

800 stars
IncarnaMind
junruxiong/IncarnaMind · Python

Connect and chat with your multiple documents (PDF and TXT) through GPT-3.5, GPT-4 Turbo, Claude and local open-source LLMs

800 stars
trankit
nlp-uoregon/trankit · Python

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

795 stars
long-range-arena
google-research/long-range-arena · Python

Long Range Arena for Benchmarking Efficient Transformers

788 stars
RocketQA
PaddlePaddle/RocketQA · Python

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

785 stars
lstm-char-cnn-tensorflow
carpedm20/lstm-char-cnn-tensorflow · Python

in progress

779 stars
PatrickStar
Tencent/PatrickStar · Python

PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone.

776 stars
lexpredict-lexnlp
LexPredict/lexpredict-lexnlp · Jupyter Notebook

LexNLP by LexPredict

775 stars
OpenAttack
thunlp/OpenAttack · Python

An Open-Source Package for Textual Adversarial Attack.

774 stars
RAG-FiT
IntelLabs/RAG-FiT · Python

Framework for enhancing LLMs for RAG tasks using fine-tuning.

769 stars
RLSeq2Seq
yaserkl/RLSeq2Seq · Python

Deep Reinforcement Learning For Sequence to Sequence Models

768 stars
awesome-qa
seriousran/awesome-qa

😎 A curated list of Question Answering (QA) resources

767 stars
awesome_deep_learning_interpretability
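oneTaken/awesome_deep_learning_interpretability

Highly cited and top-conference papers from recent years on the interpretability of neural network models in deep learning (with code)

766 stars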
mordecai
openeventdata/mordecai · Python

Full text geoparsing as a Python library

763 stars
babyai
mila-iqia/babyai · Python

BabyAI platform. A testbed for training agents to understand and execute language commands.

760 stars
dbpedia-spotlight
dbpedia-spotlight/dbpedia-spotlight · Scala

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text.

759 stars
text-dedup
ChenghaoMou/text-dedup · Python

All-in-one text de-duplication

753 stars