Web Resources
Knowledgebase, GATEWAY DBs, Genome/Gene Annotations
- Wikigenes Wiki for Life Science
- Scitable by Nature Learn Science at Nature Education
- Plant Physiology web
- EBI
- NCBI
- GOLD Genomes Online Database; genome project statistics and download links
- UCSC genome browser home
- ENSEMBL genome browser home
- CCDS The consensus protein-coding regions among NCBI, Ensembl, and Sanger (Havana) annotation
- GENCODE The Encyclopedia of Genes and Gene Variants
- DBTSS Transcription start site DB (with tissue specific information)
- EPD The Eukaryotic Promoter Database
- TiProD The tissue-specific Promoter DB
Genome Sequencing/Re-sequencing Consortium Projects
- Genome 10K Sequencing >16,000 vertebrates
- i5K genome initiative Sequencing >5,000 insects and other arthropods
- Bird 10K project Sequencing >10,000 bird species
- 1000 Plants Sequencing 1,000 plant species
- 1000 Genomes Sequencing 1,000 healthy people from various populations
- UK10K Sequencing 10,000 people (4,000 healthy, 6,000 disease) in England
- Genomics England Sequencing 100,000 people in England focusing on patients with a rare disease and their families and patients with cancer.
NGS data public depositories
- SRA Sequence Read Archive by NCBI
- ENA European Nucleotide Archive by EBI
- GEO Gene Expression Omnibus (for processed data only)
- ENCODE Encyclopedia of DNA Elements project (human)
NGS data analysis tools
- BEDOPS A suite for common genome analysis tasks with high scalability and flexibility
- BEDTools A suite for BED (Browser Extensible Data) and GFF (General Feature Format) formats
- SAMtools A suite for SAM (Sequence Alignment/Map) format
- Homer A suite of tools for Motif Discovery and NGS (ChIP-Seq, RNA-Seq, DNase-Seq, Hi-C). Excellent documentation!
- F-Seq A Feature Density Estimator for High-Throughput Sequence Tags
- IDR Reproducibility and automatic thresholding of ChIP-seq data
The Encyclopedia of DNA Elements (ENCODE) Data links
- ENCODE data summary List all released/approved experiments
- ENCODE Encyclopedia of DNA Elements project (human)
- ENCODE explore Access to collected papers exploring ENCODE data
- UWencode ENCODE (human and mouse) data browser and download site
- RNA Dashboard DB for raw transcriptome data from ENCODE
- Mouse ENCODE ENCODE for mouse
- modENCODE ENCODE for animal models (worm, fly)
Epigenomics, Cis-regulatory regions Resources
- GREAT Genomic Regions Enrichment of Annotations Tool; Predict functions for cis-regulatory regions
- CistromeMap A knowledgebase for ChIP-Seq and DNase-Seq studies in mouse and human
- Roadmap Epigenomics NIH Roadmap Epigenomics project home
- BLUEPRINT epigenome Epigenome maps of >100 different blood cell types
- Epigenie An informative web community for epigenetics-related research
- EpGenSys European network to bring together epigenetic and systems biology
- Gene Regulation Info A very useful site for epigenetics and TF-DNA interaction studies (by Dr. Vladimir Teif)
Genomic Variation DBs
- Human genomics variations
- ExAC (Exome Aggregation Consortium) Exome variation data from >60k individuals
- European Variation Archive Most comprehensive and organized by study (includes clinical variants)
- NCBI Variation Variation DBs (dbSNP, dbVar, dbGaP, ClinVar)
- 1000 Genomes Project Catalog of 60 million variant sites (SNV, CNV, SV), 2535 individuals from 26 populations
- Complete Genomics Highly accurate public WGS data for 69 human genomes, and more
- Exome Variant Server by NHLBI GO Exome sequencing project (ESP)
- iJGVD Integrative Japanese Genome Variation Database
- HGV Database The HGV database is a fully searchable online database of genome variations published in peer-reviewed Data Reports in Human Genome Variation
- Human disease associated genomics variations
- CGD Clinical Genomic Database
- HGMD The human gene mutation database (The professional version of DB is commercial. The public version of DB is not downloadable.)
- COSMIC DB for somatic mutations for cancer (largely by manual curation)
- TCGA Germline/somatic mutations for cancer are available as Mutation Annotation Format (MAF) files.
- OMIM Germline mutations for genetic diseases
- Roche Cancer Genome Database (RCGDB) Germline/somatic mutations for cancer collected from diverse resources (not downloadable)
- IDbase Human Immunodeficiency-causing mutation database
- Arabidopsis genomics variations
- AtPolyDB Everything about Arabidopsis natural variants (by Magnus Nordborg, GMI)
- RegMap panel Regional Mapping Project for Arabidopsis natural variants (by Joy Bergelson, U of Chicago)
- 1001 Genomes Project Genetic variation of natural populations of Arabidopsis (by Detlef Weigel, MPI)
Mutation Effect Prediction Tools
- VEP Variant Effect Predictor by EBI (very easy to install and use)
- Condel Variant effect score by integration of SIFT, PolyPhen-2, MutationAssessor, MAPP, Logre
- SIFT (Sorting Intolerant From Tolerant substitutions)
- SIFT4G SIFT for many genomes
- PolyPhen-2 (Polymorphism Phenotyping v2)
- EvoD Evolutionary Diagnosis method
- RegulomeDB Exploring DNA functional elements for noncoding variants (by Stanford, Snyder lab)
- HaploReg Exploring DNA functional elements for noncoding variants (by MIT, Kellis lab)
Genotype-to-Phenotype Resources
- UK10K Exome sequencing data for both healthy and disease populations
- UK Biobank Genotype and extensive phenotype data for ~500k UK people
- GWASdb includes moderate SNPs (p-value < 10^-3) with manual curation from original papers; manually mapped ~1600 GWAS traits to ~500 HPO terms, ~440 DO terms, ~230 DOLite terms
- European Genome-phenome Archive (EGA) Raw data of GWAS, WGS, Exome-seq. A great resource for meta-analysis
- dbGaP The database of Genotypes and Phenotypes (GWAS, WGS, Exome-seq...)
- NCBI ClinVar Human variations and their relation to human health (does not include unreviewed data from GWAS)
- GWAS Central contains SNPs for any p-value
- GWAS catalog now maintained by EBI
- PheGenI Phenotype-Genotype Integrator
- Genome-wide Repository of Associations between SNPs and Phenotypes (GRASP) Better than GWAS catalog, including eQTLs and QTLs
- COGS nature resources Collaborative Oncological Gene-environment Study (COGS): Association study using ~211,000 SNPs (iCOGS) for breast, ovarian, and prostate cancers.
- DistiLD Diseases and Traits in Linkage Disequilibrium Blocks
- Personal Genome Project
- NCBI GTEx (Genotype-Tissue Expression) browser eQTL data download and analysis
- GWAPP A web-based tool for GWAS in Arabidopsis
- PGC GWAS raw data download contains links for other GWAS raw data
- DECIPHER Developmental Diseases to Phenotypes database with public patients (very useful for rare disease genetics research)
Genotype-to-Expression (eQTL) Databases
- GTEx Portal DB Portal for GTEx project
- GTEx eQTL browser genotype-to-tissue expression
Pathway Annotation DBs
- Gene Ontology by Gene Ontology Consortium
- KEGG pathways and many more
- BioCyc includes MetaCyc, EcoCyc, HumanCyc, AraCyc, YeastCyc
- Reactome A manually curated and peer-reviewed pathway DB
- Pathway Interaction Database (PID) Human pathways curated by NCI-Nature/imported from BioCarta/Reactome
- CORUM Comprehensive Resource of Mammalian Protein Complexes
- NetPath A database for signaling pathways (cancer/immune signaling pathways)
- UniProt-GOA by EBI (support multi-species annotation)
- UniPathway a fully manually curated resource of metabolic pathways (cross-linked to KEGG, MetaCyc)
- Mapman Metabolic pathway databases
- Plantcyc Plant metabolic network databases
- Gramene A curated DB for grasses
- agriGO A GO database for the agricultural community
- AgBase Curated DB for functional analysis of agricultural animals and plants
Protein/Gene Interaction DBs
- PPIs by curation
- iRefWeb a web interface to PPIs consolidated from 10 public DBs (BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI, OPHID (predicted PPIs))
- IntAct
- BIND the Biomolecular Interaction Network Database
- BioGRID
- HPRD Human Protein Reference Database
- MINT Molecular Interaction DB
- DIP Database of Interacting Proteins
- MPact Representation of Interaction Data at MIPS
- MPPI Mammalian PPI DB at MIPS
- Inferred gene interactions
TF Regulation DBs
-TFBS motif model DB
- CIS-BP (Catalog of Inferred Sequence Binding Preferences) >300 species, >250 TF families, >160,000 TFs. CIS-BP collects data from >25 sources, including other databases such as HOCOMOCO, JASPAR, UniPROBE, and TRANSFAC
-Tools for MOTIF discovery and searching
- MEME Suite has everything for motif based sequence analysis
-TF-target DB
- TRED a transcriptional regulatory element database (contains curated TF-target links for 36 TF families)
- ORegAnno DNA regulatory regions, TFBS, regulatory variants
-TF ChIP DB
- hmChIP TF-ChIP DB for human and mouse
-Plant TF DB
- AGRIS Arabidopsis Gene Regulatory Information Server (by OSU)
- PlnTFDB Plant TF database by University of Potsdam, Germany
- PlantTFDB Plant TF database by Peking University, China
-Others
- Gene Regulation Info A very useful site for epigenetics and TF-DNA interaction studies (by Dr. Vladimir Teif)
miRNA DBs and target prediction tools
-microRNA list and expression atlas
- miRBase miRNA database by Manchester University
- microRNA.org download miRNA expression atlas for human, mouse, rat
-microRNA-target links (Gold standard)
- miRWalk2.0 Validated links from 4 databases and text mining, predicted links from 13 prediction data sets
- miRTarBase Manually curated microRNA-target links, miRNA-mRNA paired expression profiles, miRNA-disease links
- miRecords Manually curated microRNA-target links + predicted links (by 11 computational algorithms)
- miRTex Text mining system for miRNA-target, miRNA-gene/gene-miRNA regulation
- mirSel microRNA-target links by text mining
- Comir Combinatorial miRNA target prediction tool
-microRNA-disease
- Human microRNA Disease Database (HMDD) Manually curated microRNA-disease links
- miR2Disease Manually curated microRNA-target links and microRNA-disease links
- PhenomiR A knowledgebase of miRNA expression in disease and biological processes
- miRGator data for miRNA expression, miRNA-mRNA paired expression profile, miRNA perturbation experiments...
-Target prediction software
- TargetScan executable
- PITA executable
- miRanda executable
- miRmap target prediction by multiple algorithms, executable, precalculated, many other related data
- miRDB Pre-calculated miRNA-target associations (based on SVM), not executable
-CLIP-seq database
- StarBase DB for CLIP-seq data
-Plant microRNA DB
- Carrington Lab Resource Various DBs for plant miRNA
- NONCODE Integrative annotation of long noncoding RNAs
- lncRNAdb a reference DB for long noncoding RNAs
- NPInter ncRNA interaction database (ncRNA and other molecules)
- LncRNADisease a DB for lncRNA associated diseases
- ncFANs a web server for functional annotation of ncRNA
Gene Expression DBs (Microarray/RNA-seq)
- GEO
- AtGenExpress Arabidopsis gene expression DB by Weigel lab (there are unpublished non-GEO data here)
- Connectivity Map (CMap) 7,000 expression profiles representing 1,309 compounds
- LINCS Library of Integrated Network-based Cellular Signatures (former Connectivity Map)
- ImmGen Immunological Genome Project
- Ontogenet TF-module networks based on ImmGen data
- TISSUE Tissue Expression Database based on text mining (by Lars Jensen)
- EBI Gene Expression Atlas Gene expression atlas for many organisms collected from various experiments
- Human Cell/tissue-specific gene expression map for 369 different cell and tissue types with 5,372 human samples from GEO
- Illumina Human Body Map Project (HBM) RNA-seq data for 16 human tissues
- GXD The mouse Gene Expression Database (by MGI)
- FlyAtlas fly gene expression in 25 tissues (17 adult and 8 larval)
Mass Spectrometer or Immunohistochemistry Proteomics Resources
- Human Proteome Database
- Human Proteome Map 85 samples from 17 adult tissues, 6 primary hematopoietic cells and 7 fetal tissues
- ProteomicsDB >10,000 raw data files from 60 human tissues, 147 cell lines, and 13 body fluids
- The Human Protein Atlas The tissue-based map of human proteome based on Immunohistochemistry (for 32 different tissues and organs)
- Open MS proteomics data analysis suite
- Seattle Proteome Center (SPC)
- Trans-Proteomic Pipeline An open-source suite of proteomic data analysis tools
- MaxQuant All-in-one shotgun proteomics data analysis suite (runs on Windows)
- Open standalone software for spectral database search (search engines)
- MSblender A combined search engine
- MS-GFDB: Its successor MS-GF+ is faster and more sensitive for high resolution MS data.
- X!TANDEM
- Comet: the direct descendant of Crux, which is an academic version of the commercial software SEQUEST
- MyriMatch
- OMSSA Due to budgetary constraints NCBI has discontinued OMSSA. Historical binaries are available from here.
-Raw spectra databases
- PRoteomics IDentification Database (PRIDE) by EBI
- PeptideAtlas by Seattle Proteome Center
- Human Plasma Proteome Project 1929 human plasma proteins (by FDR<1%) and 91 MS experiment raw data
Protein localization and Secretome DB
- Vesiclepedia A DB for all types of Extracellular Vesicles (includes Exocarta)
- Exocarta A DB for Exosome
- EVpedia A DB for Extracellular Vesicles with many analysis tools
- SUBA SUBcellular location DB for Arabidopsis proteins
Phenotype/Disease Annotation DBs
- OMIM Human disease DB
- DISEASES gene-disease association from text mining
- Disease Ontology Disease ontology files FUNDO DOLite_term-to-genes map
- Human Phenotype Ontology
- OrphaData Open database for rare diseases and orphan drugs (by Orphanet)
- GAD Genetic Association Database: archive of human genetic association studies of complex diseases and disorders (includes summary data extracted from published candidate gene and GWAS studies).
- UMLS Unified Medical Language System
- ICD International Classification of Diseases by WHO
- DGA Disease and Gene Annotation, an integrative set of disease-to-gene, gene-to-gene, disease-to-disease relationships
- GenomeRNAi v12 contains 168 human RNAi, 181 D. melanogaster RNAi screen data sets
- OGEE Online GEne Essentiality database
- Human-Mouse Disease Connection a part of MGI
Drug/Bio-active chemical DBs
- Drugable.com by National Library of Medicine, ~1 million chemicals, ~7000 structural pockets, ~4 million drug-protein interactions by docking model
- PubChem A DB of drug structures and functions by NCBI
- ChEMBL A DB of drug structures and functions by EBI
- Drugs@FDA A DB for FDA approved drugs
- DailyMed High-quality information about marketed drugs by NCBI
- SuperDrug A DB contains 3D-structures of drugs
Drug-Target relationship/ Chemical genomics DBs
- DGIdb An integrated Drug-Gene Interaction DB (CancerCommons, ChEMBL, CIViC, Clearity Foundation, DoCM, DrugBank, Guide to Pharmacology, MyCancerGenome, PharmGKB, Targeted Agents in Lung Cancer, TDG, TEND, TTD); see Help to download batch data files
- KEGG DRUG contains information about only approved drugs
- STITCH DB for known and predicted chemical-protein interaction
- Drugbank A major DB of drug/target
- Therapeutic Target Database (TTD) A major DB of drug/target
- MATADOR Manually Annotated Targets and Drugs Online Resource
- IUPHAR/BPS Guide to Pharmacology A DB of in-depth information of drug targets and ligands
- PDSP Ki DB data warehouse for published and internally-derived Ki, or affinity of drugs at targets
- Yeast Fitness DB Chemical genomics test for ~400 chemicals (Science 320:362)
Clinical Trials and Pharmaco/Toxicogenomics DBs
- ClinicalTrials.gov DB for clinical trials conducted around the world
- The Comparative Toxicogenomics Database (CTD) The major DB of chemical-disease links from literature curation
- TG-GATE Toxicogenomics data for >150 chemicals in rats and the primary cultured hepatocytes of rats and humans
- Chemical Effects in Biological Systems (CEBS) an integrated public repository for toxicogenomics data
- PharmGKB The Pharmacogenomics Knowledgebase
- SIDER Side Effect Resource
Cancer Genome/Cell Line Biology DBs
-Catalog of Cancer genes and mutations
- TSGene Literature-curated tumor suppressor genes (~1000 coding, ~200 non-coding); the v2 paper also provides ~300 oncogenes in its supplement
- NCG The Network of Cancer Genes; a manually curated repository of cancer genes from the literature (1571 cancer genes by v5)
- COSMIC Catalog Of Somatic Mutations In Cancer
- CGC Cancer Gene Census
-Cancer Genomics Data Portals
- Synapse TCGA-Pancancer The official resource for hosting analysis-ready TCGA Pan Cancer data
- cBioPortal Data sets from published studies including TCGA
- CGHub TCGA data portal by UCSC
- TCGA The Cancer Genome Atlas project home
- TumorPortal Pan-cancer data set from many tumor types.
- ICGC data portal raw data from ICGC and TCGA
- MethHC A database of DNA Methylation and gene expression in Human Cancer (use Pan-cancer data)
-Data for survival predictions
- Synapse TCGA-Pancancer survival prediction analysis-ready TCGA data for survival prediction
-Cancer Genomics Data Analysis Web server
-Cancer chemical genomics
- Genomics of Drug Sensitivity in Cancer (GDSC) The largest public DB for drug sensitivity of cancer cell line and biomarkers
- Cancer Cell Line Encyclopedia (CCLE) by Broad-Novartis, 1000 cancer cell lines, ~1200 compounds and their combinations
- DTP human tumor cell line screen by NCI-60
- Developmental Therapeutics Program by NCI (contains NCI-60 human tumor cell-line screen data)
- NCI60 mutation data
- GKS Cancer Cell Line Data genomic profiles for 300 cell lines
-Cancer cell essential genes
- Achilles Project shRNA-based essential gene profiles for 216 cancer cell lines
- COLT-cancer database shRNA-based essential gene profiles for 70 breast, pancreatic, ovarian cancer cell lines
Stem Cell Biology DBs
- Stemformatics Datasets and bioinformatics tools for stem cell research
- SCDE The Stem Cell Discovery Engine
- ESCAPE Embryonic Stem Cell Atlas of Pluripotency Evidence (Many stem cell related networks)
Metagenome DBs and tools
-Metagenomic data central DB
- EBI Metagenomics (EMG) by EBI, UK
- MG-RAST by Argonne National Laboratory, US
- IMG/MG by Joint Genome Institute of DOE, US
- iMicrobe by Gordon and Betty Moore Foundation, U of Arizona
-Human microbiome
- Integrated Reference Catalog of the Human Gut Microbiome ~9.9M genes
- Human Microbiome Project (HMP)
- MetaHIT Metagenomics of the human intestinal tract
- Huttenhower Lab A great resource for analysis tools
- Knight Lab Another great source for analysis tools
- Knights Lab
- Relman Lab Read his articles!
Bacterial Antibiotics DBs
- Antibiotic Resistance Genes Database (ARGD)
- BacMet Antibacterial biocide and metal resistance genes database
Organism-centric DBs: Microbes
- Microme A resource for bacterial metabolism
- BEI resources supporting infectious disease research, providing high quality cultures and reagents for microbiology including mutant strains
- PortEco Portal for E. coli
- Ecoli Phenome DB Mutant phenome data for 324 drug/stress conditions
- Pseudomonas Genome Database
- Bactome DB Mutant phenome data for 119 drug/stress conditions (id: pseudo, pw:Haeussler)
- Saccharomyces Genome Database
- Candida Genome Database
- Xanthobase Xanthomonas oryzae pv. oryzae Genome Database
- IMG Integrated Microbial Genomes (includes metagenome data)
Organism-centric DBs: Animals
- WormBase
- FlyBase
- DGRP2 Drosophila Genetic Reference Panel 2 (Fly HapMap)
- MGI
- IMPC International Mouse Phenotype Consortium Portal site
- Sanger mouse resource portal
Organism-centric DBs: Plants
- All Plants
- AgriGO GO analysis for the agricultural community
- Gramene
- PlantGDB Plant Genome DB
- Plant Genome Duplication Database (PGDD) provides orthologs and paralogs by between- and within-genome duplication detection
- Arabidopsis
- Rice (Oryza sativa)
- RGAP Rice Genome Annotation Project by MSU (Go get the part list here!)
- Maize (Zea mays)
- Barley (Hordeum vulgare L.)
- Wheat (Triticum aestivum)
- Tomato (Solanum lycopersicum)
- Sol Genomics Network for Tomato
- Soybean (Glycine max)
- Soybase
- Soybean HAPMAP SNP catalogs and LD map
Genome Engineering Resources
- Addgene Plasmids for Genome Engineering
- Zhang Lab Feng Zhang at MIT (CRISPR resource, Optic control)
- Joung Lab Keith Joung at Harvard (TALEN resource, CRISPR resource)
Data-driven Omics companies
- Arivale Health coaching for wellness based on multi-omics analysis
- Human Longevity Inc.
- Personalis Genome-guided Medicine
- Calico Aging-related disease research company
- Numedii New Indications for Medicines
- Enterome Microbiome analysis for healthcare and drug development
- Second Genome Microbiome company
- Seres Health Microbiome company
- Vedanta Biosciences Microbiome company
Other Resources
- Machine Learning
- Machine Learning by Andrew Ng
- An Introduction To Statistical Learning Free textbook and lecture notes
- Academic society
- KSBSB Korean Society of Bioinformatics and Systems Biology
- KGO Korea Genome Organization
- KSMCB Korean Society of Molecular and Cellular Biology
- KSBMB Korean Society of Biochemistry and Molecular Biology
- Other Systems Biology Links
- DREAM Dialogue for Reverse Engineering Assessments and Methods
- Sage Bionetworks
- Assay depot Online marketplace for pharmaceutical research service
- CAGI Critical Assessment of Genome Interpretation
- Neuroscience
- Allen Brain Atlas Data Portal Integrative gene expression and neuroanatomical database
- Brainmap.org Published functional and structural neuroimaging (by functional MRI) database
- Others
- Retraction watch
- Conference.city Conference search site
- Yoonsup Choi's Blog