Most popular NLP repositories and open source projects

Natural language processing (NLP) is a field of computer science that studies the interaction between computers and human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of intelligence now called the Turing test. More modern techniques, such as deep learning, have produced state-of-the-art results in language modeling, parsing, and many other natural-language tasks.

ZZZ-RETIRED__openstt

RETIRED - OpenSTT is now retired. If you would like more information o...

11   145   145  

ml-nlp-paper-discussions

📄 A repo containing notes and discussions for our weekly NLP/ML paper...

9   145   145  

Guide-to-Swift-Strings-Sample-Code

Xcode Playground Sample Code for the Flight School Guide to Swift Stri...

10   145   145  

pubmed-rct

PubMed 200k RCT dataset: a large dataset for sequential sentence class...

40   145   145  

summarus

Models for automatic abstractive summarization

22   145   145  

ASTRA

Self-training with Weak Supervision (NAACL 2021)

19   145   145  

fcc_nn_research

(somewhat) cleaned-up notebooks used in researching public comments fo...

24   143   143  

BREDS

"Bootstrapping Relationship Extractors with Distributional Semantics"...

38   143   143  

jProcessing

Japanese Natural Language Processing Libraries

31   142   142  

Scattertext-PyData

Notebooks for the Seattle PyData 2017 talk on Scattertext

53   142   142  

indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an aut...

55   142   142  

dialogflow-ruby-client

Ruby SDK for Dialogflow

29   141   141  

Lango

Language Lego

15   141   141  

hubot-natural

Natural Language Processing Chatbot for RocketChat

44   140   140  

are-16-heads-really-better-than-1

Code for the paper "Are Sixteen Heads Really Better than One?"

14   140   140  

UnilmChatchitRobot

Unilm for Chinese Chitchat Robot. A compliment-style chitchat robot project based on the Unilm model.

27   140   140  

spokestack-python

Spokestack is a library that allows a user to easily incorporate a voi...

14   139   139  

NL2SQL-RULE

Content Enhanced BERT-based Text-to-SQL Generation https://arxiv.org/a...

42   139   139  

matilda

MATILDA: Multi-AnnoTator multi-language Interactive Lightweight Dialog...

31   138   138  

fnc-1-baseline

A baseline implementation for FNC-1

103   138   138  

w2n

Convert number words (e.g. twenty one) to numeric digits (21)

62   138   138  

getlang

Natural language detection package in pure Go

20   138   138  

stanza-old

Stanford NLP group's shared Python tools.

34   137   137  

MnemonicReader

A PyTorch implementation of Mnemonic Reader for the Machine Comprehens...

40   137   137  

Echo

Python package containing all custom layers used in Neural Networks (C...

29   137   137  

kaggle-quora-dup

Solution to Kaggle's Quora Duplicate Question Detection Competition

51   137   137  

RDRPOSTagger

A fast and accurate POS and morphological tagging toolkit (EACL 2014)

49   137   137  

NLPnote

Gitbook Address: https://app.gitbook.com/@nlpgroup/s/nlpnote/

69   137   137  

clojure-dsl-resources

A curated list of Clojure resources for dealing with domain-specific l...

2   136   136  

NLP

Natural Language Processing For Everyone

100   136   136  

python-sutime

Python wrapper for Stanford CoreNLP's SUTime

40   135   135  

steppy

Lightweight Python library for fast and reproducible experimentation...

32   134   134  

ID-CNN-CWS

Source code and corpora for the paper "Iterated Dilated Convolutions for C...

41   133   133  

FusionNet-NLI

An example for applying FusionNet to Natural Language Inference

38   133   133  

word-checker

🇨🇳🇬🇧 Chinese and English word spelling corrector. (Detection of commonly miswritten Chinese characters, ...

35   133   133  

nlp_estimator_tutorial

Educational material on using the TensorFlow Estimator framework for t...

53   132   132  

ruijin_round1

Ruijin Hospital MMC AI-Assisted Knowledge Graph Construction Competition, preliminary round

30   132   132  

Lenta.Ru-News-Dataset

Corpus of Russian news articles collected from Lenta.Ru

20   132   132  

clam

Quickly turn command-line applications into RESTful webservices with a...

17   132   132  

Question-Answering

TensorFlow implementation of Match-LSTM and Answer pointer for the pop...

70   131   131  

chinese-law-bert-similarity

BERT Chinese similarity

31   131   131  

NegBio

📰 High-performance tool for negation and uncertainty detecti...

35   131   131  

awesome-bert-japanese

📝 A list of pre-trained BERT models for Japanese with word/subword to...

7   131   131  

TAKG

The official implementation of ACL 2019 paper "Topic-Aware Neural Keyp...

31   130   130  

nlp-gym

NLPGym - A toolkit to develop RL agents to solve NLP tasks.

12   130   130  

R-text-data

List of textual data sources to be used for text mining in R

14   130   130  

emotion_dataset

😄 Dataset for Emotion Classification

16   130   130  

neural-question-generation

PyTorch implementation of Paragraph-level Neural Question Generation...

31   129   129  

JapaneseTokenizers

Aims to make JapaneseTokenizer as easy to use as possible

21   128   128  

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven app...

45   128   128