Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
🧹 Python package for text cleaning
Summarization Papers
[ACL 2023] Reasoning with Language Model Prompting: A Survey
Python AI assistant 🧠
Large-scale pre-training corpus for Chinese (100G)
Taranis AI is an advanced Open-Source Intelligence (OSINT) tool, leveraging Artificial Intelligence to revolutionize information gathering and situati...
SPLADE: sparse neural search (SIGIR21, SIGIR22)
Chinese legal LLaMA (LLaMA for the Chinese legal domain)
Source code of K-BERT (AAAI2020)
A TensorFlow implementation of QANet for machine reading comprehension
🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & securi...
A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.
Unsupervised text tokenizer focused on computational efficiency
A Framework of Small-scale Large Multimodal Models
Pretrained model hub for Keras 3.
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN
Rasa UI is a frontend for the Rasa Framework
Academic Survey Paper Generation.
A tool for learning vector representations of words and entities from Wikipedia
Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus secti...
Bolt is a deep learning library with high performance and heterogeneous flexibility.
Crawls Chinese-language Baidu Baike pages, extracts triples, and builds a Chinese knowledge graph
A simple resume parser used for extracting information from resumes
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
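Scrapes Weibo data with Python, then analyzes and visualizes the Weibo text: LDA (tree diagram), relationship graph, word cloud, time trends (line chart), heat map, dictionary-based sentiment analysis (pie chart and 3D bar chart), word-vector neural net...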
A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese
A curated selection of more than 10K projects, covering machine learning, deep learning, NLP, GNN, recommender systems, biomedicine, computer vision, front-end and back-end development, and more.
😀😄😂😭 A curated list of Sentiment Analysis methods, implementations and misc. 😥😟😱😤
This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.
LAMA - automatic model creation framework
The Jieba Chinese Word Segmentation Implemented in Rust
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, as well as keywords extractio...
MindSpore + 🤗Huggingface: Run any Transformers/Diffusers model on MindSpore with seamless compatibility and acceleration.
PyTorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"
Resources for graph convolutional networks
A curated collection of iOS, ML, AR resources sprinkled with some UI additions
TextGAN is a PyTorch framework for Generative Adversarial Networks (GANs) based text generation models.
OO for LLMs
High-accuracy NLP parser with models for 11 languages.
[Chinese legal LLM] DISC-LawLLM: an intelligent legal system powered by large language models (LLMs) to provide a wide range of legal services.
The Classical Language Toolkit
Similarities: a toolkit for similarity calculation and semantic search. It supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of records, developed in Python 3, ...
🤖 A PyTorch library of curated Transformer models and their composable components
All the slides, accompanying code, and exercises are stored in this repo. 🎈
🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!