Variation information

SNV In VARAdb, we obtained 577,098,938 genome-wide human variations from dbSNP release 151 and provide genetic annotations for each variation. Each variation is scored according to its annotated records in nine annotation categories: risk SNP, eQTL, motif change, conservation, enhancer, promoter, TF binding, ATAC accessible region and Hi-C. The score is the number of categories with which the variation is associated. Using this score, users can prioritize variations of interest.
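The scoring rule described above can be sketched as a simple count. This is a minimal illustration, not VARAdb's own code: the category keys and the dictionary layout of a variation's annotations are hypothetical names chosen for the example.

```python
# The nine annotation categories named in the text.
# The key spellings below are hypothetical, chosen only for this sketch.
CATEGORIES = (
    "risk_snp", "eqtl", "motif_change", "conservation", "enhancer",
    "promoter", "tf_binding", "atac", "hic",
)


def annotation_score(annotations):
    """Count the categories in which a variation has at least one record.

    `annotations` maps a category key to a list of records (assumed layout).
    Missing keys and empty lists both count as "not annotated".
    """
    return sum(1 for category in CATEGORIES if annotations.get(category))


# A made-up variation with records in three of the nine categories.
example = {
    "risk_snp": ["GWAS record"],
    "eqtl": ["eQTL record"],
    "promoter": [],
    "hic": ["interaction record"],
}
print(annotation_score(example))
```

Under this reading, the score ranges from 0 to 9, and sorting variations by score in descending order puts the most broadly annotated ones first.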
Common SNP Common SNPs are also obtained from dbSNP release 151. Here, we provide files covering 79,482,384 common SNPs for users to download. The columns of the download file are the same as those of the variation download file.
Risk SNP Genome-wide association studies provide a large amount of data associating genetic variants with diseases and phenotypes. We collected 1,515,001 risk SNPs associated with diseases, traits or phenotypes from 5 sources: the NHGRI GWAS Catalog, GWASdb v2.0, GRASP v2.0, GAD, and Johnson and O'Donnell.
LD SNP SNPs in linkage disequilibrium (LD) may share similar regulatory information associated with a phenotype. We used VCFtools (v0.1.13) and PLINK (v1.9) to calculate LD for common SNPs based on the 1000 Genomes Project phase 3. Using an LD threshold of r2 = 0.8 and a 200 kb window between variants, we obtained LD SNPs for 5 super-populations (AFR: African, AMR: Ad Mixed American, EAS: East Asian, EUR: European, SAS: South Asian).
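The effect of the two thresholds can be illustrated with a small filter over precomputed pairwise LD values. This is only a sketch: the tuple layout is hypothetical, and treating r2 = 0.8 as an inclusive lower bound (r2 >= 0.8) is an assumption, since the text gives only the threshold value.

```python
def filter_ld_pairs(pairs, r2_min=0.8, window_bp=200_000):
    """Keep SNP pairs that lie within the window and reach the r2 threshold.

    Each pair is a hypothetical tuple (chrom, pos_a, pos_b, r2) describing
    two SNPs on the same chromosome.
    Assumption: the threshold is inclusive, i.e. r2 >= r2_min is kept.
    """
    return [
        (chrom, pos_a, pos_b, r2)
        for chrom, pos_a, pos_b, r2 in pairs
        if abs(pos_a - pos_b) <= window_bp and r2 >= r2_min
    ]


# Made-up pairs for illustration only.
pairs = [
    ("chr1", 1_000, 5_000, 0.95),    # kept: close and in strong LD
    ("chr1", 1_000, 300_000, 0.90),  # dropped: more than 200 kb apart
    ("chr1", 1_000, 2_000, 0.50),    # dropped: r2 below 0.8
    ("chr2", 100, 200_100, 0.80),    # kept: exactly on both boundaries
]
ld_pairs = filter_ld_pairs(pairs)
print(ld_pairs)
```

In practice this filtering would be applied separately to each of the 5 super-populations, because LD structure differs between them.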
Somatic mutation Somatic mutations also play vital and specific roles in tumor development. We downloaded somatic mutations from OncoBase, including 388,861 somatic mutations in 36 cancer types from TCGA and 7,999,912 somatic mutations in 57 cancer types from ICGC.
Variant-drug pair From PharmGKB, we obtained 3,652 human variant-drug-gene pairs, which include 2,351 variants, 921 genes and 710 chemicals. Some results have been validated with evidence in PharmGKB.
eQTL Correlations between genotype and tissue-specific gene expression levels can help interpret the effects of variants on genes. We obtained 16,489,663 significant SNP-gene pairs (FDR≤0.05, including 3,052,986 SNPs and 18,126 genes) in 48 human tissues from GTEx project version 7, 5,596,894 cis-eQTLs (FDR≤0.05, including 1,370,558 SNPs and 17,353 genes) across 33 cancer types from PancanQTL, and 4,613,715 pairs from 12 studies in HaploReg (914,358 SNPs, 20,331 genes).
Motif change The OncoBase database employed motifbreakR to measure the effects of somatic mutations on transcription factor binding motifs, using 2,817 position weight matrices (PWMs) of transcription factors from 4 resources: ENCODE, FactorBook, HOCOMOCO and Homer. We added OncoBase's motifbreakR results to our database to score and predict the effects of our variations (neutral, weak or strong).
Variation conservation We obtained phastCons scores calculated from multiple alignments of 100 vertebrate genomes from UCSC and used the bigWigAverageOverBed tool to measure the conservation of each variation.
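bigWigAverageOverBed expects a BED file with at least four columns, the fourth being a unique name, and it is typically run as `bigWigAverageOverBed in.bw in.bed out.tab`. The sketch below shows one way a single-nucleotide variation could be turned into such a BED row. It assumes the input position is 1-based, as in dbSNP-style coordinates. The function name is hypothetical, and this is not necessarily how VARAdb prepared its input.

```python
def snv_to_bed4(chrom, pos, name):
    """Format a 1-based SNV position as a 0-based, half-open BED4 line.

    BED intervals start at 0 and exclude the end coordinate, so a single
    base at 1-based position `pos` becomes [pos - 1, pos).
    """
    if pos < 1:
        raise ValueError("pos must be a 1-based coordinate (>= 1)")
    return f"{chrom}\t{pos - 1}\t{pos}\t{name}"


# Made-up variation identifier and position, for illustration only.
line = snv_to_bed4("chr1", 100, "rs_example")
print(line)
```

For a one-base interval, the mean score reported by the tool is simply the phastCons value at that base.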


Regulatory information

Super enhancer Super enhancers are large clusters of enhancers with a higher degree of enrichment for TFs, higher levels of transcription and stronger cell type specificity. We downloaded 331,146 super enhancers from SEdb, identified from H3K27ac ChIP-seq samples. Detailed information is available in SEdb.
Typical enhancer We downloaded 6,629,274 typical enhancers from SEdb, identified from H3K27ac ChIP-seq samples. Detailed information is available in SEdb.
Active enhancer We obtained 877,962 active enhancers from HACER (GRO-seq/PRO-seq enhancers) and the FANTOM5 project. Detailed information is available in the HACER database.
Disease enhancer We obtained 1,535 disease enhancers from DiseaseEnhancer and EnDisease 2.0. Detailed information is available in the respective databases.
Validated enhancer We obtained 1,416 experimentally validated enhancers from VISTA Enhancer Browser and ENdb. Detailed information is available in the respective databases.
TF ChIP-seq We collected a total of 7,734 TF ChIP-seq samples from ENCODE, ReMap, Cistrome, ChIP-Atlas and GTRD, covering 761 TFs. In VARAdb, we focus on the TF binding regions that contain variations (particularly common SNPs).
Promoter We considered 2 types of promoters: the first is the set of promoter-relevant states from the ChromHMM core 15-state model, and the second is defined as the region from 2 kb upstream to 1 kb downstream of transcription start sites (TSSs) in the GENCODE (50) basic gene annotation file of release 33 (URL: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/GRCh37_mapping/gencode.v33lift37.basic.annotation.gtf.gz).
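The TSS-based promoter definition can be sketched as follows. The text does not say how strand is handled, so the strand-aware behaviour here is an assumption: "upstream" is taken relative to the direction of transcription. The function name and the choice of 1-based inclusive coordinates (as in GTF files) are also assumptions made for illustration.

```python
def promoter_window(tss, strand, upstream=2000, downstream=1000):
    """Return (start, end) of a promoter around a TSS, 1-based inclusive.

    Assumption: "upstream" follows the direction of transcription, so on
    the minus strand the upstream side has the larger coordinates.
    The start is clipped at 1 so the window never leaves the chromosome.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end


# Made-up TSS positions, for illustration only.
print(promoter_window(10_000, "+"))  # window extends 2 kb to the left
print(promoter_window(10_000, "-"))  # window extends 2 kb to the right
```

A variation would then be assigned to a promoter if its position falls inside the returned window for any annotated TSS.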
Chromatin state Chromatin states (such as enhancers, promoters, insulators and heterochromatin) are necessary for analyzing regulatory information. Roadmap used ChromHMM v1.10, which is based on a multivariate Hidden Markov Model, to calculate chromatin states across 127 epigenomes from combinations of chromatin marks. We added the ChromHMM core 15 states, derived from 5 chromatin marks (H3K4me3, H3K4me1, H3K36me3, H3K27me3, H3K9me3), to VARAdb.
Histone modification From ENCODE and Roadmap, we obtained histone modifications (H3K36me3, H3K4me1, H3K4me3, H3K79me2, H4K20me1 and H3K9ac) from 686 ChIP-seq samples.


Related genes

Variation proximal gene To obtain genes related to variations, we used the ROSE method to predict genes, including the closest gene, overlapping genes and proximal genes.
Enhancer proximal gene To obtain genes related to tissue/cell-specific enhancers, we also used the ROSE method to predict genes, including the closest gene, overlapping genes and proximal genes. The enhancers come from 808 FANTOM5 biosamples.
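The three gene categories can be illustrated with a simplified sketch. This is not the ROSE implementation: the 50 kb proximal window, the use of the region midpoint for the closest gene, and the tuple layout for genes are all assumptions made for this example.

```python
def related_genes(region_start, region_end, genes, window=50_000):
    """Assign overlapping, proximal and closest genes to a region.

    `genes` is a list of hypothetical tuples (name, gene_start, gene_end, tss).
    Assumptions: a gene is "overlapping" if its body intersects the region,
    "proximal" if it does not overlap but its TSS lies within `window` bp
    of the region, and "closest" if its TSS is nearest the region midpoint.
    """
    center = (region_start + region_end) // 2
    overlapping = [
        name for name, g_start, g_end, _ in genes
        if g_start <= region_end and g_end >= region_start
    ]
    proximal = [
        name for name, _, _, tss in genes
        if name not in overlapping
        and region_start - window <= tss <= region_end + window
    ]
    closest = min(genes, key=lambda g: abs(g[3] - center))[0] if genes else None
    return {"overlapping": overlapping, "proximal": proximal, "closest": closest}


# Made-up genes; a single-base variation is a region with start == end.
genes = [
    ("GENE_A", 900, 2_000, 900),
    ("GENE_B", 30_000, 40_000, 30_000),
    ("GENE_C", 500_000, 510_000, 500_000),
]
result = related_genes(1_000, 1_000, genes)
print(result)
```

The same function applies to enhancers by passing the enhancer's start and end coordinates instead of a single position.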
Enhancer target gene We downloaded enhancer target genes predicted by the Lasso algorithm from a published study (PMID:24670764).
Variation gene For variation-related genes, we also considered other resources. We integrated genes from eQTLs and GWAS-reported genes.

Chromatin accessibility

ATAC Open chromatin can be identified using ATAC-seq and DNase-seq. Open chromatin regions are reported to be enriched for multiple regulatory elements and to harbor variations that regulate distal genes, resulting in heterogeneity. We cataloged the accessible regions from ATAC-seq (99 samples from Cistrome and 23 cancer types from TCGA), as well as DNase-seq (narrowPeak BED files of 243 samples from ENCODE).
DHS Open chromatin can be identified using ATAC-seq and DNase-seq. Open chromatin regions are reported to be enriched for multiple regulatory elements and to harbor variations that regulate distal genes, resulting in heterogeneity. We cataloged the accessible regions from ATAC-seq (99 samples from Cistrome and 23 cancer types from TCGA), as well as DNase-seq (narrowPeak BED files of 243 samples from ENCODE).

Chromatin interaction

Hi-C Regulatory elements such as enhancers are anchored to promoter regions via chromatin looping to affect gene transcription. VARAdb contains chromatin interaction data from 3 databases, and we downloaded human chromatin interaction data for 6 experiment types (ChIA-PET, IM-PET, Hi-C, 3C, 4C and 5C) from 4DGenome. 1,114,278 Hi-C results.
ChIA-PET Same description as above. 682,526 ChIA-PET results.
IM-PET Same description as above. 1,844,553 IM-PET results.
3C Same description as above. 144 3C results.
4C Same description as above. 298 4C results.
5C Same description as above. 6,019 5C results.

Related websites

HaploReg
RegulomeDB
rSNPBase
3DSNP
OncoBase
GTEx
Ensembl
dbSNP
ClinVar
dbVar
1000 Genomes
DisGeNET