Web Resources
Human Gene/Genome Annotations
- CCDS The consensus protein-coding regions among NCBI, Ensembl, and Sanger (Havana) annotations
 - GENCODE The Encyclopedia of Genes and Gene Variants
 
Human Variome Resources
- Variation DBs
- ExAC (Exome Aggregation Consortium) Exome variation data from >60k individuals
 - 1000 Genomes Project Catalog of 60 million variant sites (SNV, CNV, SV), 2535 individuals from 26 populations
 - UK10K Sequencing 10,000 people (4,000 healthy, 6,000 disease) in England
 - Genomics England Sequencing 100,000 people in England focusing on patients with a rare disease and their families and patients with cancer.
 - DiscovEHR Collaboration between the Regeneron Genetics Center (WES) and Geisinger Health System (EHR); provides VCF files from 50,000 MyCode participants
 - European Variation Archive Most comprehensive and organized by studies (includes clinical variants)
 - NCBI Variation Variation DBs (dbSNP, dbVar, dbGaP, ClinVar)
 - iJGVD Integrative Japanese Genome Variation Database
 - HGV Database The HGV database is a fully searchable online database of genome variations published in peer-reviewed Data Reports in Human Genome Variation
 
- Variant functional effect estimation
- Eigen Assigns functional importance scores to genetic variants in coding and noncoding regions (human only, unsupervised integration)
 - CADD: Combined Annotation Dependent Depletion, a tool for scoring the deleteriousness of SNVs and indels (human only, supervised integration)
 - GWAVA: Genome Wide Annotation of VAriants, a tool that aims to predict the functional impact of non-coding genetic variants (human only, supervised integration)
 - VEP Variant Effect Predictor by EBI (very easy to install and use)
 - Condel Variant effect score by integration of SIFT, PolyPhen-2, MutationAssessor, MAPP, Logre
 - SIFT (Sorting Intolerant From Tolerant substitutions)
 - SIFT4G SIFT for many genomes
 - PolyPhen-2 (Polymorphism Phenotyping v2) for human coding region only
 - RegulomeDB Exploring DNA functional elements for noncoding variants (by Stanford, Snyder lab)
 - HaploReg Exploring DNA functional elements for noncoding variants (by MIT, Kellis lab)
 
Disease/Phenotype Resources
- DisGeNET MetaDB for disease genes and variants (very comprehensive and open license)
 - Open Targets Another very comprehensive DB for disease target (mostly protein-coding genes) and related evidence
 - denovo-db a compendium of human de novo variants
 - DISEASES gene-disease associations from text mining (GHR, UniProt, text mining)
 - GHR Genetics Home Reference (by NLM)
 - Disease Ontology Disease ontology files; FUNDO; DOLite_term-to-genes map
 - Human Phenotype Ontology
 - OMIM Human disease DB (license required for redistribution)
 - OrphaData Open database for rare diseases and orphan drugs (by Orphanet)
 - GAD Genetic Association Database: archive of human genetic association studies of complex diseases and disorders (includes summary data extracted from published candidate gene and GWAS studies).
 - UMLS Unified Medical Language System
 - ICD International Classification of Diseases by WHO
 - DGA Disease and Gene Annotation, an integrative set of disease-to-gene, gene-to-gene, disease-to-disease relationships
 - GenomeRNAi v12 contains 168 human RNAi, 181 D. melanogaster RNAi screen datasets
 - OGEE Online GEne Essentiality database
 - Human-Mouse Disease Connection a part of MGI
 
Human Genotype-to-Phenotype Resources
- QTL depositories
- GTEx Portal eQTL for ~50 different tissue types in humans
 
- GWAS resources
- PheGenI Phenotype-Genotype Integrator: for a query trait, it returns GWAS loci collected from all available data resources (very convenient for building a single GWAS data set for each trait)
 - GWAS catalog Disease-associated variants; now also provides GWAS summary statistics
 - Genome-wide Repository of Associations between SNPs and Phenotypes (GRASP) Better than GWAS catalog; includes eQTLs and other QTLs
 - GWASdb includes moderate SNPs (p-value < 10^-3) with manual curation from original papers; manually mapped ~1600 GWAS traits to ~500 HPO terms, ~440 DO terms, ~230 DOLite terms
 - DistiLD Diseases and Traits in Linkage Disequilibrium Blocks
 
- Genotype raw data depositories
- Human Functional Genomics Project Raw data are available from BBMRI-NL data infrastructure
 - UK Biobank Genotype and extensive phenotype data for ~500k UK people
 - European Genome-phenome Archive (EGA) Raw data of GWAS, WGS, Exome-seq. A great resource for meta-analysis
 - dbGaP The database of Genotypes and Phenotypes (GWAS, WGS, Exome-seq...)
 
- Clinical/Disease variant databases
- CGD Clinical Genomic Database
 - HGMD The human gene mutation database (The professional version of DB is commercial. The public version of DB is not downloadable.)
 - OMIM Germline mutations for genetic diseases
 - Roche Cancer Genome Database (RCGDB) Germline/somatic mutations for cancer collected from diverse resources (not downloadable)
 - IDbase Human Immunodeficiency-causing mutation database
 - NCBI ClinVar Human variations and their relation to human health (does not include unreviewed data from GWAS)
 
- Others
- COGS nature resources Collaborative Oncological Gene-environment Study (COGS): association study using ~211,000 SNPs (iCOGS) for breast, ovarian, and prostate cancers.
 - Personal Genome Project
 - DECIPHER DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources; includes openly shared patient records (very useful for rare disease genetics research)
 
Human Pathway/Signature genes and Interactome DBs
- Pathway DBs
- Pathguide.org A very comprehensive list of pathway and network databases
 - Gene Ontology by Gene Ontology Consortium
 - KEGG pathways and many more
 - BioCyc includes MetaCyc, EcoCyc, HumanCyc, AraCyc, YeastCyc
 - Reactome A manually curated and peer-reviewed pathway DB
 - Pathway Interaction Database (PID) Human pathways curated by NCI-Nature/imported from BioCarta/Reactome
 - CORUM Comprehensive Resource of Mammalian Protein Complexes
 - NetPath A database for signaling pathways (cancer/immune signaling pathways)
 - SIGNOR 11000 manually-annotated causal relationships between proteins that participate in signal transduction
 - UniProt-GOA by EBI (support multi-species annotation)
 - UniPathway a fully manually curated resource of metabolic pathways (cross-linked to KEGG, MetaCyc)
 
- Signature Gene Set DBs
- MSigDB License required for redistribution
 - GeneSigDB
 - DSigDB Drug signature database for gene set analysis
 - L1000CDS2 Return 50 signature genes for each LINCS L1000 data set using Characteristic Direction (CD) method
 - CREEDS CRowd Extracted Expression of Differential Signatures: Signature gene sets from GEO selected by crowdsourcing project using CD method
 
- Interactome DBs
- iRefWeb a web interface to PPIs consolidated from 10 public DBs (BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI, OPHID (predicted PPIs))
 - STRING Known and predicted PPI
 - Human Reference Interactome Project Y2H-based human protein interactions
 
Regulome Resources
- TF and motif DB
- The Human Transcription Factors 2765 putative TFs and 1639 confident TFs by manual curation
 - CIS-BP (Catalog of Inferred Sequence Binding Preferences) >300 species, >250 TF families, >160,000 TFs. CIS-BP collects data from >25 sources, including other databases such as HOCOMOCO, JASPAR, UniPROBE, and TRANSFAC
 
- Epigenomics Consortium projects
- ENCODE Encyclopedia of DNA Elements project (human)
 - Roadmap Epigenomics NIH Roadmap Epigenomics project home
 - International Human Epigenome Consortium (IHEC) The umbrella organization for international epigenomic efforts
 - 4D Nucleome To understand the principles behind the 3D organization of the nucleus in space and time (the 4th dimension)
 
- Promoter DB
- EPD Eukaryotic Promoter Database; Databases of experimentally validated (by either publication or in-house assay) promoters in various organisms
 
- Enhancer DB
- Enhancer Atlas Human enhancers based on >=3 independent high-throughput experimental datasets (contains 2,534,123 enhancers for 76 cell lines and 29 tissues)
 - dbSUPER contains 82,234 super-enhancers in 102 human and 25 mouse tissue/cell types
 - HEDD Human Enhancer Disease Database (~2.8M enhancers from ENCODE, FANTOM5, RoadMap and annotations for disease, target, variant, conservation)
 - DiseaseEnhancer manual curation of disease-associated enhancers
 
- Transcriptional Start Site (TSS) DB
- DBTSS contains 491 million TSS tag sequences for 20 tissues and 7 cell cultures in human and mouse
 
- ChIP-seq/DNase-seq DB
- Cistrome DB the most comprehensive DB for ChIP-seq and DNase-seq data
 
- Enhancer-Promoter Interaction DB
- JEME Computationally inferred EPI networks for 935 human primary cells, tissues, and cell lines
 
- microRNA list and expression atlas
- miRBase miRNA database by Manchester University
 - microRNA.org download miRNA expression atlas for human, mouse, rat
 - microRNAome microRNA RNA-seq based atlas for 46 primary cell types and 42 cancer or immortalized cell lines
 
- microRNA-target links (Gold standard)
- miRWalk2.0 Validated links from 4 databases and text mining; predicted links from 13 prediction data sets
 - miRTarBase Experimental-based microRNA-target links (most popular)
 
- microRNA-disease
- Human microRNA Disease Database (HMDD) Manually curated microRNA-disease links (most comprehensive)
 - PhenomiR DB for dysregulated miRNA in diseases
 - dbDEMC DB for dysregulated miRNA in Cancer
 - miRGator data for miRNA expression, miRNA-mRNA paired expression profile, miRNA perturbation experiments...
 
- miRNA Target predictions
- mirDIP >150M human miRNA-target predictions collected from 30 resources with integrative score
 
- CLIP-seq database
- StarBase DB for CLIP-seq data
 
- lncRNA Resources
- FANTOM-CAT An atlas of human long non-coding RNAs with accurate 5' ends
 - NONCODE Integrative annotation of long noncoding RNAs
 - lncRNAdb a reference DB for long noncoding RNAs
 - RAIN RNA–protein Association and Interaction Networks Intro to RAIN
 - NPInter ncRNA interaction database (ncRNA and other molecules)
 - RAID RNA-associated interaction DB (very comprehensive)
 - LncRNADisease a DB for lncRNA associated diseases
 - ncFANs a web server for functional annotation of ncRNA
 - LincSNP a DB of disease-associated SNP in human lncRNA and their TFBS
 - POSTAR a DB of RNA binding protein binding sites in human and mouse transcriptome (experimental and computational methods)
 
Single Cell Genomics Resources
- Awesome single cell List of software packages for single-cell data analysis, including RNA-seq, ATAC-seq, etc (GitHub)
 - Single Cell Portal scRNA-seq database by Broad Institute
 - scRNASeqDB scRNA-seq database by UTHSC
 - conquer A repository of consistently processed, analysis-ready single-cell RNA-seq data sets
 - Jinglebells A repository of standardized single cell RNA-Seq datasets for analysis and visualization at the single cell level
 - SCPortalen human and mouse single-cell centric database
 - 10X Genomics Datasets by 10X Genomics
 
Chemical Biology and Drug Research Resources
- Drug and Bioactive chemical DBs
- Drug Repurposing Hub a best-in-class drug screening collection of >3,000 clinical drugs and their annotation (structure, MoA, protein targets)
 - Drugable.com by National Library of Medicine, ~1 million chemicals, ~7000 structural pockets, ~4 million drug-protein interactions by docking model
 - PubChem A DB of drug structures and functions by NCBI
 - ChEMBL A DB of drug structures and functions by EBI
 - Drugs@FDA A DB for FDA approved drugs
 - DailyMed High-quality information about marketed drugs by NLM
 - SuperDrug A DB containing 3D structures of drugs
 
- Clinical Trial Information
- ClinicalTrials.gov DB for clinical trials conducted around the world
 
- Drug Target DBs
- A curated drug-target map by curation of ChEMBL database, DrugCentral database, canSAR knowledge base (Gold Standard drug-target)
 - DGIdb An integrated Drug-Gene Interaction DB (CancerCommons, ChEMBL, CIViC, Clearity Foundation, DoCM, DrugBank, Guide To Pharmacology, MyCancerGenome, PharmGKB, Targeted Agents in Lung Cancer, TDG, TEND, TTD); see the help page to download batch data files
 - KEGG DRUG contains information about only approved drugs
 - STITCH DB for known and predicted chemical-protein interaction
 - Drugbank A major DB of drug/target
 - Therapeutic Target Database (TTD) A major DB of drug/target
 - MATADOR Manually Annotated Targets and Drugs Online Resource
 - IUPHAR/BPS Guide to Pharmacology A DB of in-depth information of drug targets and ligands
 - PDSP Ki DB data warehouse for published and internally-derived Ki, or affinity of drugs at targets
 
- Drug signature, Pharmacogenomics, Toxicogenomics DBs
- DSigDB Drug signature database for gene set analysis
 - CLUE The expanded CMap including 1.3M L1000 profiles for 27,927 perturbagens (476,251 expressions)
 - iLINCS Integrated System to Analyze LINCS and other data
 - Connectivity Map (CMap) 7,000 expression profiles representing 1,309 compounds
 - The Comparative Toxicogenomics Database (CTD) The major DB of chemical-disease links from literature curation
 - TG-GATEs Toxicogenomics data for >150 chemicals in rats and the primary cultured hepatocytes of rats and humans
 - Chemical Effects in Biological Systems (CEBS) an integrated public repository for toxicogenomics data
 - PharmGKB The Pharmacogenomics Knowledgebase
 - SIDER Side Effect Resource
 
- Drug-Gene Interaction DBs
- MOSAIC Chemical-genetic interactions in yeast (covers >13,000 compounds)
 
Cancer Biology Resources
- Cancer Somatic Mutations DBs
- COSMIC (The Catalogue Of Somatic Mutations In Cancer) By Sanger with expert curation
 - DoCM A database of functional variants validated in cancer
 - CIViC A knowledgebase for expert-crowdsourcing the clinical interpretation of variants in cancer
 
- Cancer Somatic Mutation Visualization
- Proteinpaint Exploring genomic alteration in pediatric cancer
 
- Cancer Gene DBs
- CGC (Cancer Gene Census) A catalog of genes with mutations that are causally implicated in cancer (by COSMIC)
 - 125 mutation-based drivers see Supplementary Table S2A (71 TSG and 54 OG by 20/20 rule)
 - TSGene Literature-curated 1217 human TSGs (1018 protein-coding and 199 non-coding genes) and 320 protein-coding oncogenes
 - CCGD (Candidate Cancer Gene Database) A database of cancer driver genes from transposon-based forward genetic screens in mice
 - 77 Cancer Genes by amplification and overexpression see Supplementary Table S2
 - NCG (The Network of Cancer Genes) (~500) CGC + (~1000) candidate genes from panel sequencing, WES, and WGS studies
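
The "20/20 rule" mentioned above for the 125 mutation-based drivers can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the commonly cited form of the rule (a gene is called an oncogene if more than 20% of its recorded mutations are missense changes at recurrent positions, and a tumor suppressor if more than 20% are inactivating). The class name, field names, and the choice to check the TSG criterion first when both thresholds are exceeded are assumptions made for this example; any minimum mutation counts used in the original study are not modeled.

```python
from dataclasses import dataclass


@dataclass
class GeneMutationCounts:
    """Hypothetical per-gene summary of recorded somatic mutations."""
    gene: str
    total: int               # all recorded mutations in the gene
    recurrent_missense: int  # missense mutations at recurrently mutated positions
    inactivating: int        # e.g. nonsense, frameshift, splice-site mutations


def classify_20_20(counts: GeneMutationCounts, threshold: float = 0.20) -> str:
    """Return "OG", "TSG", or "unclassified" under a simplified 20/20 rule."""
    if counts.total == 0:
        return "unclassified"
    og_score = counts.recurrent_missense / counts.total
    tsg_score = counts.inactivating / counts.total
    # Assumption: when both scores pass the threshold, prefer the TSG label.
    if tsg_score > threshold:
        return "TSG"
    if og_score > threshold:
        return "OG"
    return "unclassified"


if __name__ == "__main__":
    # Made-up counts for illustration only.
    examples = [
        GeneMutationCounts("GENE_A", total=100, recurrent_missense=60, inactivating=2),
        GeneMutationCounts("GENE_B", total=100, recurrent_missense=3, inactivating=45),
        GeneMutationCounts("GENE_C", total=100, recurrent_missense=5, inactivating=5),
    ]
    for ex in examples:
        print(ex.gene, classify_20_20(ex))
```

The counts in the usage block are invented; real inputs would come from a somatic mutation catalog such as COSMIC.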
 
- Cancer Genomics Research Gateway
- ICGC data portal raw data from ICGC and TCGA
 - TARGET (Therapeutically Applicable Research To Generate Effective Treatments) Childhood Cancer Genome Project by NCI
 - PedPanCan (A Pan-Cancer Study of Childhood Cancers) by multiple institutes including St. Jude Children's Research Hospital
 - NCI Office of Cancer Genomics OCG is dedicated to supporting cancer genomics research by sharing molecular data from its programs to enhance understanding of cancer.
 - NCI Genomic Data Commons GDC provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies.
 - CTD2 data portal Data Portal of Cancer Target Discovery and Development program which strives to functionally validate discoveries from large-scale genomic initiatives.
 - Synapse GENIE The largest public cancer genome data by AACR (see AACR GENIE project)
 - Cancer Program Resource Gateway by Broad
 
- Cancer Genomics Data Analysis Cloud Platforms
- ISB-CGC Cancer Genomics Cloud by ISB
 - WebMeV Analysis of large genomic data, particularly for RNASeq and microarray data (TCGA, GEO, or user-uploaded).
 
- Tumor Microenvironment Analysis tools
- TIMER Web server for a comprehensive TME analysis
 - xCell Tumor cellular heterogeneity analysis web server; R package is also available from github
 
- Cancer Pharmacogenomics
- PharmacoDB Integrative database for cancer pharmacogenomics (CCLE, GDSC, CTRP, and more)
 - CTRP The Cancer Therapeutics Response Portal (~550 drugs x ~890 cell lines)
 - Genomics of Drug Sensitivity in Cancer (GDSC) (~250 drugs x ~1110 cell lines)
 - Cancer Cell Line Encyclopedia (CCLE) (~20 drugs x ~1060 cell lines)
 
- Cancer cell essential genes
- GenomeCRISPR A database for high-throughput CRISPR/Cas9 screening experiments
 - Achilles Project shRNA-based screen for 216 cancer cell lines (v2.4.3) and CRISPR-based screen for 33 cancer cell lines (v3.3.8)
 - COLT-cancer database shRNA-based essential gene profiles for 70 breast, pancreatic, ovarian cancer cell lines
 
Metagenome DBs and tools
- Metagenomic data central DB
- EBI Metagenomics (EMG) by EBI, UK
 - MG-RAST by Argonne National Laboratory, US
 - IMG/M by Joint Genome Institute of DOE, US
 - iMicrobe by Gordon and Betty Moore Foundation, U of Arizona
 
- Human microbiome
- Integrated Reference Catalog of the Human Gut Microbiome ~9.9M genes
 - Human Microbiome Project (HMP)
 - The integrative HMP Microbiome-host interactions during disease progression (longitudinal studies on pregnancy, IBD, T2D)
 - MetaHIT Metagenomics of the human intestinal tract
 - Huttenhower Lab A great resource for analysis tools
 
Proteome Resources
- Human Proteome Database
- Human Proteome Map 85 samples from 17 adult tissues, 6 primary hematopoietic cells and 7 fetal tissues
 - ProteomicsDB >10,000 raw data files from 60 human tissues, 147 cell lines, and 13 body fluids
 - The Human Protein Atlas The tissue-based map of human proteome based on Immunohistochemistry (for 32 different tissues and organs)
 
- Open stand-alone software for mass spectra database search (search engines)
- MSblender A combined search engine
 - MS-GFDB: Its successor MS-GF+ is faster and more sensitive for high-resolution MS data.
 - X!TANDEM
 - Comet: the direct descendant of Crux, which is an academic version of the commercial software SEQUEST
 - MyriMatch
 - OMSSA Due to budgetary constraints NCBI has discontinued OMSSA. Historical binaries are available from here.
 
- Protein localization and Secretome DB
- Vesiclepedia A DB for all types of Extracellular Vesicles (includes Exocarta)
 - Exocarta A DB for Exosome
 - EVpedia A DB for Extracellular Vesicles with many analysis software
 
Other Resources
- Academic society
- ASHG American Society of Human Genetics
 - AACR American Association for Cancer Research
 - IHMC The International Human Microbiome Consortium
 - KSBI Korean Society of Bioinformatics
 - KCA Korean Cancer Association
 - KOGO Korea Genome Organization
 - KSMCB Korean Society of Molecular and Cellular Biology
 - KSBMB Korean Society of Biochemistry and Molecular Biology
 
- Cool software
- REVIGO Visualize GO enrichment summary
 - UpSetR Shiny App Visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries; R package is also available from github
 - VENNY Drawing Venn diagram
 
- Data-driven Omics companies
- Arivale Health coaching for wellness based on multi-omics analysis
 - Human Longevity Inc.
 - Personalis Genome-guided Medicine
 - Calico Aging-related disease research company
 - Numedii New Indications for Medicines
 - Enterome Microbiome analysis for healthcare and drug development
 - Second Genome Microbiome company
 - Seres Health Microbiome company
 - Vedanta Biosciences Microbiome company
 
- Machine Learning
- scikit-learn Open-source software for machine learning
 - Machine Learning by Andrew Ng
 - An Introduction To Statistical Learning Free textbook and lecture notes
 
- Neuroscience
- Allen Brain Atlas Data Portal Integrative gene expression and neuroanatomical database
 - Brainmap.org Published functional and structural neuroimaging (by functional MRI) database
 
- Others
- TEDMED TED talks on medicine and healthcare problems
 - Retraction watch
 - Conference.city Conference search site
 - Yoonsup Choi's Blog