Natural language processing (NLP) is a field of computer science that studies how computers and humans interact. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced results in the fields of language modeling, parsing, and natural-language tasks.
📚 Process PDFs, Word documents and more with spaCy
unified embedding model
Python bindings to libpostal for fast international address parsing/normalization
Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.
Deep research agent to help you find the best GitHub repositories 🕵️!
GitHub repo with tutorials to fine-tune transformers for different NLP tasks
LLM-based ontological extraction tools, including SPIRES
This repo contains my coursework, assignments, and slides for the Natural Language Processing Specialization by deeplearning.ai on Coursera
Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
A language detection library for PHP. Detects the language from a given text string.
Crawl BookCorpus
An Eigen-based, light-weight C++ Interface to Nonlinear Programming Solvers (Ipopt, Snopt)
👑 spaCy building blocks and visualizers for Streamlit apps
This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and eva...
We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April...
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pi...
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box supp...
AIVA (A.I. Virtual Assistant): General-purpose virtual assistant for developers.
Chatbot in 200 lines of code using TensorLayer
Natural Language Toolkit for Indic Languages aims to provide out of the box support for various NLP tasks that an application developer might need
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
🔥 Machine learning/deep learning/Python/large models/multimodal/LLM/deeplearning/Python/Algorithm interview/NLP Tutorial
A deep learning framework for natural language processing based on PyTorch and torchtext.
The online version is temporarily unavailable because we cannot afford the key. You can clone and run it locally. Note: we set a default OpenAI key. If k...
"End-To-End Memory Networks" in Tensorflow
Keras implementation of BERT with pre-trained weights
AI chatbot 🤖 for chatting with CSV, PDF, TXT files 📄 and YouTube videos 🎥 | using Langchain🦜 | OpenAI | Streamlit ⚡
Text classification models implemented in Keras, including: FastText, TextCNN, TextRNN, TextBiRNN, TextAttBiRNN, HAN, RCNN, RCNNVariant, etc.
[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
XiaoMi Natural Language Processing Toolkits
The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.
Connect and chat with your multiple documents (pdf and txt) through GPT 3.5, GPT-4 Turbo, Claude and Local Open-Source LLMs
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Long Range Arena for Benchmarking Efficient Transformers
🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.
in progress
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone.
LexNLP by LexPredict
An Open-Source Package for Textual Adversarial Attack.
Framework for enhancing LLMs for RAG tasks using fine-tuning.
Deep Reinforcement Learning For Sequence to Sequence Models
😎 A curated list of the Question Answering (QA)
Highly cited and top-conference papers from recent years on the interpretability of neural network models in deep learning (with code)
Full text geoparsing as a Python library
BabyAI platform. A testbed for training agents to understand and execute language commands.
DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text.
All-in-one text de-duplication