Topic

parser

Repositories (1381)

MinerU
MinerU opendatalab Python

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

60.9k
marked
marked markedjs JavaScript

A markdown parser and compiler. Built for speed.

36.8k
swc
swc swc-project Rust

Rust-based platform for the Web

33.4k
cheerio
cheerio cheeriojs TypeScript

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

30.3k
postcss
postcss postcss TypeScript

Transforming styles with JS plugins

29k
tree-sitter
tree-sitter tree-sitter Rust

An incremental parsing system for programming tools

25k
loguru
loguru Delgan Python

Python logging made (stupidly) simple

23.8k
vector
vector vectordotdev Rust

A high-performance observability data pipeline.

21.7k
oxc
oxc oxc-project Rust

⚓ A collection of high-performance JavaScript tools.

20.9k
PHP-Parser
PHP-Parser nikic PHP

A PHP parser written in PHP

17.4k
parsedown
parsedown erusev PHP

Better Markdown Parser in PHP

15k
go
go json-iterator Go

A high-performance 100% compatible drop-in replacement of "encoding/json"

13.9k
jsoup
jsoup jhy Java

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

11.4k
craftinginterpreters
craftinginterpreters munificent HTML

Repository for the book "Crafting Interpreters"

10.7k
nom
nom rust-bakery Rust

Rust parser combinator framework

10.4k
terser
terser terser JavaScript

🗜 JavaScript parser, mangler and compressor toolkit for ES6+

9.3k
sqlglot
sqlglot tobymao Python

Python SQL Parser and Transpiler

9.2k
Dolphin
Dolphin bytedance Python

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

8.9k
sh
sh mvdan Go

A shell parser, formatter, and interpreter with bash and zsh support; includes shfmt

8.7k
dasel
dasel TomWright Go

Select, put and delete data from JSON, TOML, YAML, XML, INI, HCL and CSV files with a single tool. Also available as a go mod.

7.9k
lightningcss
lightningcss parcel-bundler Rust

An extremely fast CSS parser, transformer, bundler, and minifier written in Rust.

7.5k
MegaParse
MegaParse QuivrHQ Python

File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

7.4k
boa
boa boa-dev Rust

Boa is an embeddable JavaScript engine written in Rust.

7.2k
esprima
esprima jquery TypeScript

ECMAScript parsing infrastructure for multipurpose analysis

7.1k
pdfminer.six
pdfminer.six pdfminer Python

Community-maintained fork of pdfminer - we fathom PDF

7k
MailKit
MailKit jstedfast C#

A cross-platform .NET library for IMAP, POP3, and SMTP.

6.8k
astexplorer
astexplorer fkling JavaScript

A web tool to explore the ASTs generated by various parsers.

6.5k
javaparser
javaparser javaparser Java

Java 1-25 Parser and Abstract Syntax Tree for Java with advanced analysis functionalities.

6.1k
JSqlParser
JSqlParser JSQLParser Java

JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor Patte...

5.9k
lark
lark lark-parser Python

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

5.9k
remarkable
remarkable jonschlinkert JavaScript

Markdown parser, done right. Commonmark support, extensions, syntax plugins, high speed - all in one. Gulp and metalsmith plugins available. Used by F...

5.8k
parser
parser postlight JavaScript

📜 Extract meaningful content from the chaos of a web page

5.8k
jsonparser
jsonparser buger Go

One of the fastest alternative JSON parsers for Go that does not require a schema

5.6k
ohm
ohm ohmjs JavaScript

A library and language for building parsers, interpreters, compilers, etc.

5.5k
body-parser
body-parser expressjs JavaScript

Node.js body parsing middleware

5.5k
AngleSharp
AngleSharp AngleSharp C#

😇 The ultimate angle brackets parser library parsing HTML5, MathML, SVG and CSS to construct a DOM based on the official W3C specifications.

5.5k
LIEF
LIEF lief-project C++

LIEF - Library to Instrument Executable Formats (C++, Python, Rust)

5.4k
picocli
picocli remkop Java

Picocli is a modern framework for building powerful, user-friendly, GraalVM-enabled command line apps with ease. It supports colors, autocompletion, s...

5.4k
tsdoc
tsdoc microsoft TypeScript

A doc comment standard for TypeScript

4.9k
globalize
globalize globalizejs JavaScript

A JavaScript library for internationalization and localization that leverages the official Unicode CLDR JSON data

4.8k
htmlparser2
htmlparser2 fb55 TypeScript

The fast & forgiving HTML and XML parser

4.8k
json_repair
json_repair mangiucugna Python

Repair malformed JSON from LLMs, APIs, logs, and user input in Python.

4.7k
sweet-core
sweet-core sweet-js JavaScript

Sweeten your JavaScript.

4.6k
chumsky
chumsky zesterer Rust

[Chumsky has moved to Codeberg!] Write expressive, high-performance parsers with ease.

4.5k
ExcelDataReader
ExcelDataReader ExcelDataReader C#

Lightweight and fast library written in C# for reading Microsoft Excel files

4.4k
node-csv
node-csv adaltas JavaScript

Full-featured CSV parser with a simple API, tested against large datasets.

4.3k
bhai-lang
bhai-lang DulLabs TypeScript

A toy programming language written in TypeScript

4.1k
dev-blog
dev-blog nixzhu

Translations, development insights, and study notes

3.9k
snoop
snoop snooppr Python

Snoop: an open-source intelligence gathering tool (OSINT world)

3.9k
parse5
parse5 inikulin TypeScript

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.

3.9k