Natural language processing (NLP) is a field of computer science that studies how computers interact with human language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For the latest Indic-BERT v2, check: https://github.com/AI4Bharat/...
Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
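All NLP you Need Here. Currently includes PyTorch implementations of 15 NLP demos (much of the code is borrowed from other open-source projects; it started as a personal project and was later open-sourced).
A hanzi similarity tool: computes similarity between Chinese characters, with an algorithm for visually similar characters. Can be used for correcting handwritten Chinese character recognition, text obfuscation, and so on.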
Komputation is a neural network framework for the Java Virtual Machine written in Kotlin and CUDA C.
NLP library designed for reproducible experimentation management
LDA topic modeling for node.js
Data Augmentation for NLP.
ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Pyth...
Machine Learning and NLP: Text Classification using python, scikit-learn and NLTK
Text analysis with networks.
RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.
This repo consists of multiple machine-learning-based projects, each with a frontend
CNN/Daily Mail Reading Comprehension Task
This repository is a collection of research papers on deep learning, computer vision, and NLP.
The most powerful API testing platform
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
A modern, interlingual wordnet interface for Python
Neural Sentiment Classification
Japanese text normalizer for mecab-neologd
RNNSharp is a deep recurrent neural network toolkit that is widely used for many different kinds of tasks, such as sequence labeling, sequence-to-...
Web scraping and related analytics using Python tools
:owl: Snow Owl Terminology Server - a production-ready, scalable, FHIR Terminology Service compliant server that supports SNOMED CT International and...
Implementation of Chinese ChatGPT
State-of-the-art, lightweight NLP tools for the Turkish language. Developed by VNGRS.
A Topic Modeling System Toolkit (ACL 2024 Demo)
LanguageCrunch NLP server docker image
NLP for humans. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
The code to reproduce results from paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" https://arxiv.org/abs/1909.04761
A Repo to store the Google Colaboratory Notebooks that I have created and shared
spaCy pipeline object for negating concepts in text
Data Science E-books, Interview Resources and Cheat-sheets
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity R...
PyTorch implementation of batched bi-RNN encoder and attention-decoder.
A curated list of NLP resources for Hungarian
Agentic AI platform that harnesses Visual LLM Chaining to build proactive digital assistants
Tutorial: Natural Language Processing in Python
A Package of Keyphrase Extraction and Social Tag Suggestion
Deep learning with text doesn't have to be scary.
Important paper implementations for Question Answering using PyTorch
Fuzzy and semantic search for captioned YouTube videos.
A web-based document annotation tool, powered by GPT-4 :rocket:
ExtremeBERT is a toolkit that accelerates the pretraining of customized language models on customized datasets, described in the paper “ExtremeBERT: A...
📛 Fuzzy Name Matching with Machine Learning
Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a document-keyphrase...
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:
Implementation of the paper: Text Segmentation as a Supervised Learning Task