Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and other natural-language tasks.
NLP pipeline using word2vec (preprocessing/embedding/prediction/clustering)
[EMNLP 2019] Mixture Content Selection for Diverse Sequence Generation (Question Generation / Abstractive Summarization)
My solution to Kaggle Quora Question Pairs competition (Top 2%, Private LB log loss 0.13497).
Code for NAACL 2019 paper: Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions
TOEIC (Test of English for International Communication) solving using the pytorch-pretrained-BERT model.
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Self-attention and text classification
This is a list of open-source projects at Microsoft Research NLP Group
Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime.
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset
Streaming tweets with Spark, language detection & sentiment analysis, dashboard with Kibana
Unofficial implementation of Perceiver IO
Multilingual Retrieval on Yelp Search Engine ⚡
English-French Machine Language Translation in Tensorflow
A PureScript, browser-based implementation of LDA topic modeling.
Interpretable Models for NLP using PyTorch
NLP framework in Python for entity recognition and relationship extraction
A collection of PyTorch notebooks for learning and practicing deep learning
Text summarizer for golang using LexRank
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/foli...
Source code for "Head-Driven Phrase Structure Grammar Parsing on Penn Treebank" published at ACL 2019
R wrapper for fastText
The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports
Botfuel SDK to build highly conversational chatbots
One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.
A natural language event parser for Java and Android.
Applying NLP transfer learning techniques to predict Tweet stance toward a topic
R package to Embed All the Things! using StarSpace
[AKBC 19] Improving Relation Extraction by Pre-trained Language Representations
Lemmatizer for text in English. Inspired by Python's nltk.corpus.reader.wordnet.morphy
Build AI-powered semantic search applications in JavaScript
Big Data Analytics Using Interactive Workflows
PyTorch tutorials A to Z
Uses a distributed word representation to find words along the hyperchord of two input words.
Implementation of ULMFit algorithm for text classification via transfer learning
Tensorflow and Keras implementation of state-of-the-art research in Dialog System NLU
Relation Extraction using Deep Learning (CNN)
Excel Integration with spaCy. Training NER using Excel/XLSX from PDF, DOCX, PPT, PNG or JPG.
NLP research experiments, built on PyTorch within the AllenNLP framework.
Methods about Deep Learning for Text Matching
:no_entry: ARCHIVED :no_entry: Accesses the Monkeylearn API for Text Classifiers and Extractors
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
This is where I put all my work in Natural Language Processing
Cross-Lingual Alignment of Contextual Word Embeddings
Tokenizers and lemmatizers for Go
LockeBot: a demonstration of implementing a basic question answering bot with use of Rasa and a database
Implementations of keyword extraction algorithms, including TextRank, TF-IDF, and a combination of both
Document classification using LSTM + self-attention
word2vec++ is a library and set of tools implementing Distributed Representations of Words (word2vec), written in C++11 from scratch
:pencil2: Hunspell extension for spaCy 2.0.