Web Resources
Human Gene/Genome Annotations
- CCDS The consensus protein-coding regions among NCBI, Ensembl, and Sanger (Havana) annotations
- GENCODE The Encyclopedia of Genes and Gene Variants
Human Variome Resources
- Variation DBs
- ExAC (Exome Aggregation Consortium) Exome variation data from >60k individuals
- 1000 Genomes Project Catalog of 60 million variant sites (SNV, CNV, SV), 2535 individuals from 26 populations
- UK10K Sequencing 10,000 people (4,000 healthy, 6,000 disease) in England
- Genomics England Sequencing 100,000 people in England focusing on patients with a rare disease and their families and patients with cancer.
- DiscovEHR Collaboration between the Regeneron Genetics Center (WES) and Geisinger Health System (EHR); provides VCF data from 50,000 MyCode participants
- European Variation Archive The most comprehensive archive, organized by study (includes clinical variants)
- NCBI Variation Variation DBs (dbSNP, dbVar, dbGaP, ClinVar)
- iJGVD Integrative Japanese Genome Variation Database
- HGV Database The HGV database is a fully searchable online database of genome variations published in peer-reviewed Data Reports in Human Genome Variation
- Variant functional effect estimation
- Eigen Assigns functional importance scores to genetic variants in coding and noncoding regions (human only, unsupervised integration)
- CADD: Combined Annotation Dependent Depletion, a tool for scoring the deleteriousness of SNVs and indels (human only, supervised integration)
- GWAVA: Genome Wide Annotation of VAriants a tool which aims to predict the functional impact of non-coding genetic variants (human only, supervised integration)
- VEP Variant Effect Predictor by EBI (very easy to install and use)
- Condel Variant effect score by integration of SIFT, PolyPhen-2, MutationAssessor, MAPP, Logre
- SIFT (Sorting Intolerant From Tolerant substitutions)
- SIFT4G SIFT for many genomes
- PolyPhen-2 (Polymorphism Phenotyping v2) for human coding region only
- RegulomeDB Exploring DNA functional elements for noncoding variants (by Stanford, Snyder lab)
- HaploReg Exploring DNA functional elements for noncoding variants (by MIT, Kellis lab)
Human Genotype-to-Phenotype Resources
- CGD Clinical Genomic Database
- HGMD The human gene mutation database (The professional version of DB is commercial. The public version of DB is not downloadable.)
- OMIM Germline mutations for genetic diseases
- Roche Cancer Genome Database (RCGDB) Germline/somatic mutations for cancer collected from diverse resources (not downloadable)
- IDbase Human Immunodeficiency-causing mutation database
- UK Biobank Genotype and extensive phenotype data for ~500k UK people
- GWASdb includes moderate SNPs (p-value < 10^-3) with manual curation from original papers; manually mapped ~1600 GWAS traits to ~500 HPO terms, ~440 DO terms, ~230 DOLite terms
- European Genome-phenome Archive(EGA) Raw data of GWAS, WGS, Exome-seq. A great resource for meta-analysis
- dbGaP The database of Genotypes and Phenotypes (GWAS, WGS, Exome-seq...)
- NCBI ClinVar Human variations and their relations to human health (does not include unreviewed data from GWAS)
- GWAS catalog now maintained by EBI
- PheGenI Phenotype-Genotype Integrator
- Genome-wide Repository of Associations between SNPs and Phenotypes (GRASP) Better than GWAS catalog, including eQTLs and QTLs
- COGS nature resources Collaborative Oncological Gene-environment Study (COGS): association study using ~211,000 SNPs (iCOGS) for breast, ovarian, and prostate cancers.
- DistiLD Diseases and Traits in Linkage Disequilibrium Blocks
- Personal Genome Project
- Human Functional Genomics Project Raw data are available from BBMRI-NL data infrastructure
- NCBI GTEx (Genotype-Tissue Expression) browser eQTL data download and analysis
- PGC GWAS raw data download; contains links to other GWAS raw data
- DECIPHER Developmental Diseases to Phenotypes database with public patients (very useful for rare disease genetics research)
Human Pathway/Signature genes and Interactome DBs
- Pathway DBs
- Pathguide.org A very comprehensive list of pathway and network databases
- Gene Ontology by Gene Ontology Consortium
- KEGG pathways and many more
- Biocyc includes Metacyc, Ecocyc, Humancyc, Aracyc, Yeastcyc
- Reactome A manually curated and peer-reviewed pathway DB
- Pathway Interaction Database (PID) Human pathways curated by NCI-Nature/imported from BioCarta/Reactome
- CORUM Comprehensive Resource of Mammalian Protein Complexes
- NetPath A database for signaling pathways (cancer/immune signaling pathways)
- SIGNOR 11000 manually-annotated causal relationships between proteins that participate in signal transduction
- UniProt-GOA by EBI (support multi-species annotation)
- UniPathway a fully manually curated resource of metabolic pathways (cross-linked to KEGG, MetaCyc)
- Signature Gene Set DBs
- MSigDB License required for redistribution
- GeneSigDB
- DSigDB Drug signature database for gene set analysis
- L1000CDS2 Returns 50 signature genes for each LINCS L1000 data set using the Characteristic Direction (CD) method
- CREEDS CRowd Extracted Expression of Differential Signatures: Signature gene sets from GEO selected by crowdsourcing project using CD method
- Interactome DBs
- iRefWeb a web interface to PPIs consolidated from 10 public DBs (BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI, OPHID (predicted PPIs))
- STRING Known and predicted PPI
- Human Reference Interactome Project Y2H-based human protein interactions
Epigenome and Cistrome Resources
- Epigenomics Consortium projects
- ENCODE Encyclopedia of DNA Elements project (human)
- Roadmap Epigenomics NIH Roadmap Epigenomics project home
- International Human Epigenome Consortium (IHEC) The umbrella organization for international epigenomic efforts
- 4D Nucleome To understand the principles behind the 3D organization of the nucleus in space and time (the 4th dimension)
- TF binding motif DB
- CIS-BP (Catalog of Inferred Sequence Binding Preferences) >300 species, >250 TF families, >160,000 TFs. CIS-BP collects data from >25 sources, including other databases such as HOCOMOCO, JASPAR, UniPROBE, and TRANSFAC
- Promoter DB
- EPD Eukaryotic Promoter Database; Databases of experimentally validated (by either publication or in-house assay) promoters in various organisms
- Enhancer DB
- Enhancer Atlas Human enhancers based on >=3 independent high-throughput experimental datasets (contains 2,534,123 enhancers for 76 cell lines and 29 tissues)
- dbSUPER contains 82,234 super-enhancers in 102 human and 25 mouse tissue/cell types
- Transcriptional Start Site (TSS) DB
- DBTSS contains 491 million TSS tag sequences for 20 tissues and 7 cell cultures in human and mouse
- ChIP-seq/DNase-seq DB
- Cistrome DB the most comprehensive DB for ChIP-seq and DNase-seq data
- Enhancer-Promoter Interaction DB
- JEME Computationally inferred EPI networks for 935 human primary cells, tissues, and cell lines
miRNA Regulome Resources
- microRNA list and expression atlas
- miRBase miRNA database by Manchester University
- microRNA.org download miRNA expression atlas for human, mouse, rat
- microRNAome microRNA RNA-seq based atlas for 46 primary cell types and 42 cancer or immortalized cell lines
- microRNA-target links (Gold standard)
- miRWalk2.0 Validated links from 4 databases and text mining; predicted links from 13 prediction data sets
- miRTarBase Experimental-based microRNA-target links (most popular)
- microRNA-disease
- Human microRNA Disease Database(HMDD) Manually curated microRNA-disease links (most comprehensive)
- PhenomiR DB for dysregulated miRNA in diseases
- dbDEMC DB for dysregulated miRNA in Cancer
- miRGator data for miRNA expression, miRNA-mRNA paired expression profile, miRNA perturbation experiments...
- Target prediction software
- TargetScan (executable), PITA (executable), miRanda (executable)
- miRmap target prediction by multiple algorithms; executable, precalculated, many other related data
- miRDB Pre-calculated miRNA-target associations (based on SVM), not executable
- CLIP-seq database
- StarBase DB for CLIP-seq data
lncRNA Regulome Resources
- FANTOM-CAT An atlas of human long non-coding RNAs with accurate 5' ends
- NONCODE Integrative annotation of long noncoding RNAs
- lncRNAdb a reference DB for long noncoding RNAs
- RAIN RNA–protein Association and Interaction Networks Intro to RAIN
- NPInter ncRNA interaction database (ncRNA and other molecules)
- RAID RNA-associated interaction DB (very comprehensive)
- LncRNADisease a DB for lncRNA associated diseases
- ncFANs a web server for functional annotation of ncRNA
- LincSNP a DB of disease-associated SNP in human lncRNA and their TFBS
- POSTAR a DB of RNA binding protein binding sites in human and mouse transcriptome (experimental and computational methods)
Transcriptome Resources
- Data deposit servers
- SRA Sequence Read Archive by NCBI
- ENA European Nucleotide Archive by EBI
- GEO Gene Expression Omnibus (for processed data only)
- AtGenExpress Arabidopsis gene expression DB by Weigel lab (there are unpublished non-GEO data here)
- ImmGen Immunological Genome Project Ontogenet TF-module networks based on ImmGen data
- Expression Atlas
- TISSUE Tissue Expression Database based on text mining (by Lars Jensen)
- EBI Gene Expression Atlas Gene expression atlas for many organisms collected from various experiments
- Human Cell/tissue-specific gene expression map for 369 different cell and tissue types with 5,372 human samples from GEO
- Illumina Human Body Map Project (HBM) RNA-seq data for 16 human tissues
- GXD The mouse Gene Expression Database (by MGI)
Single Cell Genomics Resources
- Awesome single cell List of software packages for single-cell data analysis, including RNA-seq, ATAC-seq, etc (GitHub)
- Single Cell Portal scRNA-seq database by Broad Institute
- scRNASeqDB scRNA-seq database by UTHSC
- conquer A repository of consistently processed, analysis-ready single-cell RNA-seq data sets
- Jinglebells A repository of standardized single cell RNA-Seq datasets for analysis and visualization at the single cell level
- SCPortalen human and mouse single-cell centric database
- 10X Genomics Datasets by 10X Genomics
Phenome/Diseasome Resources
- DisGeNET MetaDB for disease genes and variants (very comprehensive and open license)
- Open Targets Another very comprehensive DB for disease target (mostly protein coding genes) and related evidence
- denovo-db a compendium of human de novo variants
- DISEASES gene-disease association from text mining (GHR, Uniprot, textmining)
- GHR Genetics Home Reference (by NLM)
- Disease Ontology Disease ontology files FUNDO DOLite_term-to-genes map
- Human Phenotype Ontology
- OMIM Human disease DB (needs License to distribute)
- OrphaData Open database for rare diseases and orphan drug (by Orphanet)
- GAD Genetic Association Database: archive of human genetic association studies of complex diseases and disorders (includes summary data extracted from published candidate gene and GWAS studies).
- UMLS Unified Medical Language System
- ICD International Classification of Diseases by WHO
- DGA Disease and Gene Annotation, an integrative set of disease-to-gene, gene-to-gene, disease-to-disease relationships
- GenomeRNAi v12 contains 168 human RNAi, 181 D. melanogaster RNAi screen data sets
- OGEE Online GEne Essentiality database
- Human-Mouse Disease Connection a part of MGI
Chemical Biology and Drug Research Resources
- Drug and Bioactive chemical DBs
- Drugable.com by National Library of Medicine, ~1 million chemicals, ~7000 structural pockets, ~4 million drug-protein interactions by docking models
- PubChem A DB of drug structures and functions by NCBI
- ChEMBL A DB of drug structures and functions by EBI
- Drugs@FDA A DB for FDA approved drugs
- DailyMed High-quality information about marketed drugs by NLM
- SuperDrug A DB of 3D structures of drugs
- Clinical Trial Information
- ClinicalTrials.gov DB for clinical trials conducted around the world
- Drug Target DBs
- A curated drug-target map by curation of ChEMBL database, DrugCentral database, canSAR knowledge base (Gold Standard drug-target)
- DGIdb An integrated Drug-Gene Interaction DB (CancerCommons, ChEMBL, CIViC, Clearity Foundation, DoCM, DrugBank, Guide to Pharmacology, MyCancerGenome, PharmGKB, Targeted Agents in Lung Cancer, TDG, TEND, TTD); see the help page to download batch data files
- KEGG DRUG contains information about only approved drugs
- STITCH DB for known and predicted chemical-protein interaction
- Drugbank A major DB of drug/target
- Therapeutic Target Database (TTD) A major DB of drug/target
- MATADOR Manually Annotated Targets and Drugs Online Resource
- IUPHAR/BPS Guide to Pharmacology A DB of in-depth information of drug targets and ligands
- PDSP Ki DB data warehouse for published and internally-derived Ki, or affinity of drugs at targets
- Drug signature, Pharmacogenomics, Toxicogenomics DBs
- DSigDB Drug signature database for gene set analysis
- CLUE The expanded CMap including 1.3M L1000 profiles for 27,927 perturbagens (476,251 expressions)
- Connectivity Map (CMap) 7,000 expression profiles representing 1,309 compounds
- LINCS Library of Integrated Network-based Cellular Signatures (former Connectivity Map)
- The Comparative Toxicogenomics database(CTD) The major DB of chemical-disease links from literature curation
- TG-GATE Toxicogenomics data for >150 chemicals in rats and the primary cultured hepatocytes of rats and humans
- Chemical Effects in Biological Systems(CEBS) an integrated public repository for toxicogenomics data
- PharmGKB The Pharmacogenomics Knowledgebase
- SIDER Side Effect Resource
- Drug-Gene Interaction DBs
- MOSAIC Chemical-genetic interactions in yeast (covers >13000 compounds)
Cancer Biology Resources
- Cancer Genomics Research Gateway
- NCI Office of Cancer Genomics OCG is dedicated to supporting cancer genomics research by sharing molecular data from its programs to enhance understanding of cancer.
- NCI Genomic Data Commons GDC provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies.
- CTD2 data portal Data Portal of Cancer Target Discovery and Development program which strives to functionally validate discoveries from large-scale genomic initiatives.
- ICGC data portal raw data from ICGC and TCGA
- Synapse GENIE The largest public cancer genome data by AACR (see AACR GENIE project)
- Cancer Program Resource Gateway by Broad
- Processed Genomics data and web server
- cBioPortal Data sets from published studies including TCGA
- MethHC A database of DNA Methylation and gene expression in Human Cancer (use Pan-cancer data)
- Cancer Genome Analysis software
- Cancer genes and mutations DBs
- CGC Cancer Gene Census
- TSGene Literature-curated tumor suppressor genes (~1000 coding, ~200 non-coding); the v2 paper also provides ~300 oncogenes in its supplement
- NCG The Network of Cancer Genes; a manually curated repository of cancer genes from the literature (1571 cancer genes by v5)
- COSMIC Catalog Of Somatic Mutations In Cancer
- CIViC A knowledgebase for expert-crowdsourcing the clinical interpretation of variants in cancer
- DoCM A database of curated mutations in cancer
- Cancer Pharmacogenomics
- Genomics of Drug Sensitivity in Cancer (GDSC) The largest public DB for drug sensitivity of cancer cell line and biomarkers
- Cancer Cell Line Encyclopedia (CCLE) by Broad-Novartis, 1000 cancer cell lines, ~1200 compounds and their combinations
- Cancer cell essential genes
- GenomeCRISPR A database for high-throughput CRISPR/Cas9 screening experiments
- Achilles Project shRNA-based screen for 216 cancer cell lines (v2.4.3) and CRISPR-based screen for 33 cancer cell lines (v3.3.8)
- COLT-cancer database shRNA-based essential gene profiles for 70 breast, pancreatic, ovarian cancer cell lines
- Data for survival predictions
- Synapse TCGA-Pancancer survival prediction analysis-ready TCGA data for survival prediction
- Cancer Radiomics Resources
- TCIA The Cancer Imaging Archive
- Hugo Aerts Harvard lab radiomics page; GitHub Radiomics tools; Get PyRadiomics
- IBEX Imaging Biomarker Explorer
- 3D Slicer open source software platform for medical image informatics
- Radiomics tutorial by Hugo Aerts Very inspiring!
Metagenome DBs and tools
- Metagenomic data central DB
- EBI Metagenomics (EMG) by EBI, UK
- MG-RAST by Argonne National Laboratory, US
- IMG/MG by Joint Genome Institute of DOE, US
- iMicrobe by Gordon and Betty Moore Foundation, U of Arizona
- Human microbiome
- Integrated Reference Catalog of the Human Gut Microbiome ~9.9M genes
- Human Microbiome Project (HMP)
- The integrative HMP Microbiome-host interactions during disease progression (longitudinal studies on pregnancy, IBD, T2D)
- MetaHIT Metagenomics of the human intestinal tract
- Huttenhower Lab A great resource for analysis tools
Proteome Resources
- Human Proteome Database
- Human Proteome Map 85 samples from 17 adult tissues, 6 primary hematopoietic cells and 7 fetal tissues
- ProteomicsDB >10,000 raw data files from 60 human tissues, 147 cell lines, and 13 body fluids
- The Human Protein Atlas The tissue-based map of human proteome based on Immunohistochemistry (for 32 different tissues and organs)
- Open stand-alone software for mass spectra database search (search engines)
- MSblender A combined search engine
- MS-GFDB: Its successor MS-GF+ is faster and more sensitive for high-resolution MS data.
- X!TANDEM
- Comet: the direct descendant of Crux, which is an academic version of the commercial software SEQUEST
- MyriMatch
- OMSSA Due to budgetary constraints NCBI has discontinued OMSSA. Historical binaries are available from here.
- Protein localization and Secretome DB
- Vesiclepedia A DB for all types of Extracellular Vesicles (includes Exocarta)
- Exocarta A DB for Exosome
- EVpedia A DB for Extracellular Vesicles with many analysis software
Other Resources
- Academic society
- ASHG American Society of Human Genetics
- AACR American Association for Cancer Research
- IHMC The International Human Microbiome Consortium
- KSBI Korean Society of Bioinformatics
- KCA Korean Cancer Association
- KOGO Korea Genome Organization
- KSMCB Korean Society of Molecular and Cellular Biology
- KSBMB Korean Society of Biochemistry and Molecular Biology
- Cool software
- REVIGO Visualize GO enrichment summary
- UpSetR Shiny App Visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries; R package is also available from github
- VENNY Drawing Venn diagram
- Data-driven Omics companies
- Arivale Health coaching for wellness based on multi-omics analysis
- Human Longevity Inc.
- Personalis Genome-guided Medicine
- Calico Aging-related disease research company
- Numedii New Indications for Medicines
- Enterome Microbiome analysis for healthcare and drug development
- Second Genome Microbiome company
- Seres Health Microbiome company
- Vedanta Biosciences Microbiome company
- Machine Learning
- scikit-learn Open software for Machine Learning
- Machine Learning by Andrew Ng
- An Introduction To Statistical Learning Free textbook and lecture notes
- Neuroscience
- Allen Brain Atlas Data Portal Integrative gene expression and neuroanatomical database
- Brainmap.org Published functional and structural neuroimaging (by functional MRI) database
- Others
- TEDMED TED talks on medicine and healthcare problems
- Retraction watch
- Conference.city Conference search site
- Yoonsup Choi's Blog