Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have advanced the state of the art in language modeling, parsing, and many other natural-language tasks.
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
📄 A repo containing notes from our weekly NLP/ML paper discussions.
Xcode Playground Sample Code for the Flight School Guide to Swift Strings
PubMed 200k RCT dataset: a large dataset for sequential sentence classification.
Models for automatic abstractive summarization
Self-training with Weak Supervision (NAACL 2021)
(somewhat) cleaned-up notebooks used in researching public comments for FCC Proceeding 17-108 (Net Neutrality Repeal)
Python implementation of "Bootstrapping Relationship Extractors with Distributional Semantics" (Batista et al., EMNLP 2015)
Japanese Natural Language Processing Libraries
Notebooks for the Seattle PyData 2017 talk on Scattertext
INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to colle...
Ruby SDK for Dialogflow
Language Lego
Natural Language Processing Chatbot for RocketChat
Code for the paper "Are Sixteen Heads Really Better than One?"
Unilm for Chinese Chitchat Robot. A praise-giving ("kuakua"-style) chitchat robot project based on the Unilm model.
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.
Content Enhanced BERT-based Text-to-SQL Generation https://arxiv.org/abs/1910.07179
MATILDA: Multi-AnnoTator multi-language Interactive Lightweight Dialogue Annotator
A baseline implementation for FNC-1
Convert number words (e.g., twenty-one) to numeric digits (21)
Natural language detection package in pure Go
Stanford NLP group's shared Python tools.
A PyTorch implementation of Mnemonic Reader for the Machine Comprehension task
Python package containing custom layers for neural networks (compatible with PyTorch, TensorFlow, and MegEngine)
Solution to Kaggle's Quora Duplicate Question Detection Competition
A fast and accurate POS and morphological tagging toolkit (EACL 2014)
GitBook address: https://app.gitbook.com/@nlpgroup/s/nlpnote/
A curated list of Clojure resources for dealing with domain-specific languages.
Natural Language Processing For Everyone
Python wrapper for Stanford CoreNLP's SUTime
Lightweight Python library for fast and reproducible experimentation :microscope:
Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation"
An example for applying FusionNet to Natural Language Inference
🇨🇳🇬🇧 Chinese and English word spelling corrector (detects commonly miswritten Chinese characters, detects and corrects Chinese spelling errors, and checks English word spelling)
Educational material on using the TensorFlow Estimator framework for text classification
Preliminary round of the Ruijin Hospital MMC AI-Assisted Knowledge Graph Construction Competition
Corpus of Russian news articles collected from Lenta.Ru
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line app...
TensorFlow implementation of Match-LSTM and Answer pointer for the popular SQuAD dataset.
Chinese text similarity with BERT
:newspaper: High-performance tool for negation and uncertainty detection in radiology reports
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
The official implementation of ACL 2019 paper "Topic-Aware Neural Keyphrase Generation for Social Media Language"
NLPGym - A toolkit to develop RL agents to solve NLP tasks.
List of textual data sources to be used for text mining in R
:smile: Dataset for Emotion Classification
PyTorch implementation of Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks
Aims to make JapaneseTokenizer as easy to use as possible
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages oth...
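The number-words entry above converts text such as "twenty one" into 21. A converter like this is a small, self-contained algorithm. The following is a minimal Python sketch of one way to write it. It assumes English cardinal numbers only, with no ordinals, decimals, or fractions. The function name `words_to_number` and the word tables are hypothetical and are not taken from that project.

```python
# Hypothetical sketch (not the listed project's code): English number words -> int.
# Assumes well-formed cardinal numbers; ordinals, decimals, and fractions are not handled.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(text: str) -> int:
    """Convert a phrase like 'twenty one' or 'three hundred and five' to an int."""
    total = 0    # sum of completed scale groups (thousands, millions, ...)
    current = 0  # value of the group currently being built (below 1000)
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue  # skip the connective in "three hundred and five"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100  # "three hundred" -> 300 within the current group
        elif word in SCALES:
            total += current * SCALES[word]  # close the group at this scale
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current


print(words_to_number("twenty one"))  # 21
```

The sketch relies on two accumulators. "hundred" multiplies only the group being built. "thousand", "million", and "billion" close that group and add it to the running total. As a result, "one million two hundred thousand" yields 1200000.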
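The phrase-detection entry above does not say how it scores candidate phrases. One widely used data-driven technique comes from word2vec's phrase detection. It scores each adjacent word pair as (count(ab) - delta) / (count(a) * count(b)) and joins pairs that score above a threshold. Repeating the merge pass lets phrases grow beyond two words, which is one way to get phrases of arbitrary size. The sketch below illustrates that technique only and is not the listed project's method. The function names, `delta`, `threshold`, and the toy corpus are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sentence = List[str]


def score_bigrams(sentences: List[Sentence], delta: float) -> Dict[Tuple[str, str], float]:
    """Score adjacent pairs as (count(ab) - delta) / (count(a) * count(b)).

    delta discounts rare pairs so that pairs seen only once never qualify.
    """
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter((a, b) for sent in sentences for a, b in zip(sent, sent[1:]))
    return {
        (a, b): (n - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n in bigrams.items()
    }


def merge_pass(sentences: List[Sentence], threshold: float, delta: float) -> List[Sentence]:
    """One left-to-right pass joining high-scoring adjacent pairs with '_'."""
    scores = score_bigrams(sentences, delta)
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and scores.get((sent[i], sent[i + 1]), 0.0) > threshold:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2  # both tokens consumed by the new phrase
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def detect_phrases(sentences: List[Sentence], threshold: float,
                   passes: int = 2, delta: float = 1.0) -> List[Sentence]:
    """Repeat merge passes so that phrases can grow past two tokens."""
    for _ in range(passes):
        sentences = merge_pass(sentences, threshold, delta)
    return sentences


corpus = [
    "i love new york city".split(),
    "new york city never sleeps".split(),
    "we moved to new york city".split(),
    "the city was quiet".split(),
    "a new car".split(),
]
result = detect_phrases(corpus, threshold=0.1, passes=2)
print(result[0])  # ['i', 'love', 'new_york_city']
```

On this toy corpus, the first pass joins "new york" and the second pass joins "new_york city", so the three-word phrase emerges from repeated pair merges. Pairs seen only once, such as "new car", stay separate because their count does not exceed `delta`.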