Explore a comprehensive collection of basic theories, applications, papers, and best practices about Large Language Models (LLMs) in genomes.
Base Classes and Functions for Mass Spectrometry and Proteomics
Kun-peng: an ultra-fast, low-memory footprint and accurate taxonomy classifier for all
PANDORA :computer:
Bayesian genotyper for structural variants
Plasmid and primer design software
BWA-MEME: Faster BWA-MEM2 using learned-index
Fast indexing and search of discontinuous motifs in protein structures
[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Cross-type Biomedical Named Entity Recognition with Deep Multi-task Learning (Bioinformatics'19)
Tools for single-cell data processing
Open-source usearch
Comparative Genomics Toolkit 3
APBS - software for biomolecular electrostatics and solvation
Python library for working with ontologies and ontology associations
Gene fusion detection and visualization
Parser and database to index the terpene profile of different strains of Cannabis from online databases
An open-access bioinformatics text
BLASR: The PacBio® long read aligner
An app store for scientific workflows, tools, notebooks, and services
MyGene.info: A BioThings API for gene annotations
Extract 3D contacts (.pairs) from sequencing alignments
Bioinformatics toolkits for manipulating sequence, alignment, and phylogenetic tree files
Nature Communications | BASALT (Binning Across a Series of Assemblies Toolkit) for binning and refinement of short- and long-read sequencing data
A high-performance, pure Rust toolkit for standardizing and preparing biomolecular systems (proteins & nucleic acids). It heals missing atoms, resolve...
Cute tricks for SIMD vectorized binary encoding and decoding of nucleotides, in Rust.
PyMOL extension to color AlphaFold structures by confidence (pLDDT).
Ultrafast, comprehensive peptide identification for mass spectrometry–based proteomics
SquiggleKit: A toolkit for manipulating nanopore signal data
Software for biomolecular electrostatics and solvation calculations
Explainability techniques for Graph Networks, applied to a synthetic dataset and an organic chemistry task. Code for the workshop paper "Explainabilit...
BWT construction and search
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
Conversational & memory-enabled AI research partner for multi-omics analysis. From biological idea to full research paper.
An NGS read trimming tool that is specific, sensitive, and speedy. (production)
CCS: Generate Highly Accurate Single-Molecule Consensus Reads (HiFi Reads)
197 bioinformatics & life science skills for Claude Code and AI agents — BixBench 92.0% accuracy. RNA-seq, single-cell, drug discovery, proteomics, an...
Apache cTAKES is a Natural Language Processing (NLP) platform for clinical text.
:rocket: seqfu - Sequence Fastx Utilities
Calculation of interatomic interactions in molecular structures
Current Challenges and Best Practice Protocols for Microbiome Analysis using Amplicon and Metagenomic Sequencing
PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
A tool for cell instance aware segmentation in densely packed 3D volumetric images
BioT5 (EMNLP 2023) and BioT5+ (ACL 2024 Findings)
A bioinformatic toolkit to align genome assemblies into pangenome graphs
Get assembly statistics from FASTA and FASTQ files
A fast 23andMe DNA parser and inferrer for Python
Bio4j abstract model and general entry point to the project
MOLGENIS - for scientific data: management, exploration, integration and analysis.
Mining CRISPRs in Environmental Datasets
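The "BWT construction and search" entry above refers to the Burrows-Wheeler transform and the FM-index backward search built on it. The following is a minimal sketch of both ideas. It is not taken from that project or any other tool listed here. It assumes naive suffix sorting, which is fine for short strings but far too slow for genomes, and it assumes a "$" sentinel that sorts before every other character in the text.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via naive suffix sorting (illustration only).

    Assumes the sentinel does not occur in text and sorts before all text
    characters ("$" is ASCII 36, below A/C/G/T and lowercase letters).
    """
    assert sentinel not in text
    s = text + sentinel
    # Suffix array: start positions of all suffixes in lexicographic order.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # Each BWT character is the one preceding its suffix (s[-1] wraps to the sentinel).
    return "".join(s[i - 1] for i in sa)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text by FM-index backward search."""
    alphabet = sorted(set(bwt_str))
    # C[c]: number of characters in the text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)
    # occ[c][i]: number of occurrences of c in bwt_str[:i].
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    # Half-open range [lo, hi) of sorted suffixes that start with the processed suffix of pattern.
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("ACGTACGT")
    print(b, count_occurrences(b, "ACG"))
```

Backward search reads the pattern from right to left. At each step it narrows the range of sorted suffixes using only the C table and the occ counts, so the original text is never scanned again. Real indexers replace the naive sort with linear-time suffix-array construction and replace the dense occ table with sampled rank structures to save memory.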