Introduction

CRAVAT is a web server with a simple interface for performing cancer-related analysis of variants. To cite CRAVAT, please use this article.

CRAVAT currently employs three analysis tools: CHASM, SNVGet, and VEST. For more information on these tools, refer to the Analysis Tools chapter.

For instructions on using CRAVAT, refer to the How to Use chapter.

For help interpreting the reports produced by CRAVAT, refer to Output Report.

How to Cite

To cite CRAVAT, please use the following literature:

Douville C, Carter H, Kim R, Niknafs N, Diekhans M, Stenson PD, Cooper DN, Ryan M, Karchin R (2013). CRAVAT: Cancer-Related Analysis of VAriants Toolkit. Bioinformatics, 29(5):647-648.

To cite CHASM, please use the following literature:

Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, Kinzler KW, Vogelstein B, Karchin R (2009). Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res, 69(16):6660-7.

To cite VEST, please use the following literature:

Douville C, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, Karchin R (2015). Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-indel). Human Mutation, doi: 10.1002/humu.22911.
Carter H, Douville C, Stenson P, Cooper D, Karchin R (2013). Identifying Mendelian disease genes with the Variant Effect Scoring Tool. BMC Genomics, 14(Suppl 3):S3.

To cite SNVBox, please use the following literature:

Wong WC, Kim D, Carter H, Diekhans M, Ryan M, Karchin R (2011). CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer. Bioinformatics, 27(15):2147-2148.

Analysis Tools of CRAVAT

CRAVAT currently employs three analysis tools, CHASM, SNVGet, and VEST:

CHASM (Cancer-specific High-throughput Annotation of Somatic Mutations) is a method that predicts the functional significance of somatic missense variants observed in the genomes of cancer cells, allowing variants to be prioritized in subsequent functional studies based on the probability that they confer increased fitness to a cancer cell. CHASM uses a machine learning method called Random Forest to distinguish between driver and passenger somatic missense variation. The Random Forest is trained on a positive class of drivers curated from the COSMIC database and a negative class of passengers generated in silico according to passenger base substitution frequencies estimated for a specific tumor type. Each variant is represented by a list of features, including amino acid substitution properties, alignment-based estimates of conservation at the variant position, predicted local structure, and annotations from the UniProt Knowledgebase. Only missense mutations are analyzed by CHASM. For more information on CHASM, please visit http://wiki.chasmsoftware.org and refer to this and this article.
VEST is a method that predicts the functional effect of a variant. The classifier and null distribution for VEST were updated on November 12, 2012, so VEST results obtained before November 13, 2012 might differ from those obtained after that date. For more information on VEST, please visit http://wiki.chasmsoftware.org and refer to this article.
SNVGet retrieves selected predictive features for a variant. Features can be broadly categorized into 3 types:
- Amino Acid Substitution features
- Protein-based position-specific features
- Exon-specific features
Only missense mutations are analyzed by SNVGet. For more information on SNVBox (database made with SNVGet), please visit http://wiki.chasmsoftware.org and refer to this article.
For more information on CRAVAT, please refer to this article.

User Account

CRAVAT provides user accounts. You can create an account, retrieve or change your password, and check the status of your jobs and download their results through the "My Jobs" page. Your username is your email address.

Create a CRAVAT Account:

There are two ways to create a CRAVAT account:

When you submit a job for the first time, CRAVAT creates an account with your email and a temporary password, and sends this account information to you as part of the result notification email.
A CRAVAT account can be created by clicking "Log-In" > "Create an account" on the top menu.

Retrieve Your Username

Your username is your email.

Retrieve Your Password

If you have forgotten your password, click "Log-In" > "Forgot password?", type your username (your email), and click "Submit". A temporary password will be sent to you.

Change Your Password

To change your password, first log in, and then click "My Profile" > "Change password". In the "Change Password" pop-up window, type your current password, your new password, and again your new password. Click "Submit".

My Jobs Page

After logging in, click "My Jobs" on the top menu to open the My Jobs page in a new browser tab. This page shows your past and current jobs with their parameters and status (success, fail, running, or in-queue). You can download the result files from this page by clicking "Here" in the "Result file" column.

Submitting an Analysis Job

Input

Prepare the mutations you wish to score either as a text of amino-acid residue substitutions or as a text of genomic-coordinate variants, in the following formats:

Comment lines: All the lines that start with ">", "#", or "!" will be ignored as comments.
In the examples below, UID is a string that uniquely identifies each variant-sample pair. A UID must not contain a comma.
Genomic-coordinate format (separated by a tab or a space):
```
				# UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional)
				TR1	chr17	7577506	-	G	T	TCGA-02-0231
				TR2	chr10	123279680	-	G	A	TCGA-02-3512
				TR3	chr13	49033967	+	C	A	TCGA-02-3532
				TR4	chr7	116417505	+	G	T	TCGA-02-1523
				TR5	chr7	140453136	-	T	A	TCGA-02-0023
				TR6	chr17	37880998	+	G	T	TCGA-02-0252
				Ins1 chr17	37880998	+	-	T	TCGA-02-0252
				Del1 chr17	37880998	+	A	-	TCGA-02-0252
				CSub1 chr2	39871235	+	ATGCT	GA	TCGA-02-0252
				
```
Position is a 1-based open coordinate. For insertions, use "-" as the reference base; for deletions, use "-" as the alternate base. In the above example, Ins1 inserts "T" between positions 37880997 and 37880998, Del1 deletes the "A" at position 37880998, and CSub1 changes "ATGCT" at positions 39871235 through 39871239 to "GA". If your sequencing results carry no strand information, they are most likely all reported on the + strand. Make sure that the reference base you report matches the base at the reported position in the hg19 reference sequence (or hg18 if you checked the hg18 checkbox).

* The old format for indels, in which you specify the base before the insertion/deletion location, is still supported. However, if this old format is used in any row of your input, your entire input will be treated as being in the old format.
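
The genomic-coordinate format described above can be checked locally before a job is submitted. The following is a minimal sketch, not CRAVAT's own parser: the function name `parse_genomic_line`, the returned dictionary keys, and the restriction of bases to A, C, G, and T are assumptions made for illustration. It handles only the current format, not the old indel format.

```python
import re

COMMENT_PREFIXES = (">", "#", "!")
# Assumption: bases are upper-case A/C/G/T, or "-" for an insertion/deletion.
BASES = re.compile(r"-|[ACGT]+")

def parse_genomic_line(line):
    """Parse one genomic-coordinate line; return None for comments and blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith(COMMENT_PREFIXES):
        return None
    fields = stripped.split()  # fields may be separated by tabs or spaces
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(fields)}: {line!r}")
    uid, chrom, pos, strand, ref, alt = fields[:6]
    if "," in uid:
        raise ValueError("UID must not contain a comma")
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be + or -, got {strand!r}")
    if not BASES.fullmatch(ref) or not BASES.fullmatch(alt):
        raise ValueError(f"unexpected base string in {line!r}")
    if ref == "-" and alt == "-":
        raise ValueError("reference and alternate cannot both be '-'")
    return {
        "uid": uid,
        "chrom": chrom,
        "pos": int(pos),  # 1-based position
        "strand": strand,
        "ref": ref,
        "alt": alt,
        "sample": fields[6] if len(fields) == 7 else None,  # optional column
    }

record = parse_genomic_line("TR1\tchr17\t7577506\t-\tG\tT\tTCGA-02-0231")
insertion = parse_genomic_line("Ins1 chr17\t37880998\t+\t-\tT\tTCGA-02-0252")
```

The sketch does not verify that the reference base matches hg19 or hg18; that check requires the reference sequence itself.
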
Amino-acid residue substitution format (separated by a tab or a space):
```
						# UID / Transcript / AA change / Sample ID (optional)
						TR1	NM_001126116.1	D127Y	TCGA-02-0231
						TR2	NM_001144919.1	R162Q	TCGA-02-3512
						TR3	NM_000321.2	Q702K	TCGA-02-3532
						TR4	NM_000245.2	A1108S	TCGA-02-1523
						TR5	NM_004333.4	V600E	TCGA-02-0023
						TR6	NM_001005862.1	G746V	TCGA-02-0252
						
```
The transcript identifier can be from NCBI RefSeq (NM accessions), CCDS, or Ensembl (ENST accessions). RefSeq and CCDS accessions can be specified without version numbers. The format of the "AA change" column is (reference AA)(AA position)(alternate AA), without the parentheses. The reference and alternate AAs must each be a single residue from the 20 standard amino acids.
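
The "AA change" column can be validated with a regular expression. This is a minimal sketch under the rules stated above (one-letter codes for the 20 standard amino acids, a position, and a single alternate residue); the function name `parse_aa_change` is hypothetical and not part of CRAVAT.

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_CHANGE = re.compile(rf"([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])")

def parse_aa_change(text):
    """Split a change such as 'V600E' into (reference, position, alternate)."""
    match = AA_CHANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {text!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt

change = parse_aa_change("V600E")
```
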

VCF format v4.0 and above is supported by CRAVAT. CRAVAT converts VCF format input to CRAVAT format and uses the converted input for analysis and annotation. Only CHROM, POS, ID, REF, ALT, GT, and sample name fields will be preserved in the conversion (The ID field in VCF format will become the UID field in CRAVAT format. If there are multiple samples in the VCF format input, the sample name will be added to ID in the ID -> UID conversion to differentiate the same variant from different samples).

CRAVAT runs on a queuing system with two separate queues for small and large jobs, so that small jobs are not held up behind longer-running ones. Currently, small jobs are defined as those with 25,000 or fewer mutations. 25,000 mutations take approximately 1 hour for any analysis, although your job may finish earlier if the server is not heavily loaded. As of March 4, 2014, the largest single job CRAVAT has processed had 4 million mutations.

Analysis

Choose an analysis type:

Cancer driver analysis: This analysis predicts whether the submitted variants are cancer drivers or not.
Pathogenicity analysis: This analysis predicts whether the submitted variants will have any pathogenic effect on their translated proteins or not.
Gene annotation only: This analysis provides GeneCard and PubMed information on the genes containing the submitted variants.

When an analysis type is chosen, the options for analysis programs appear. Multiple analysis programs can be chosen, and if any of the programs needs a cancer tissue type to be specified, a list box for selecting the cancer type also appears.
Currently, the following tissue types can be chosen at CRAVAT.

Name	Full name	Source	Date
Bladder	Bladder Urothelial Carcinoma	BLCA (TCGA)	Jun 2013
Blood-Lymphocyte	Chronic Lymphocytic Leukemia	CLL (ICGC)	Mar 2013
Blood-Myeloid	Acute Myeloid Leukemia	LAML (TCGA)	Jun 2013
Brain-Cerebellum	Medulloblastoma	MB (mixed source)	Dec 2010
Brain-Glioblastoma-Multiforme	Glioblastoma Multiforme	GBM (TCGA)	Jun 2013
Brain-Lower-Grade-Glioma	Brain Lower Grade Glioma	LGG (TCGA)	Jun 2013
Breast	Breast Invasive Carcinoma	BRCA (TCGA)	Jun 2012
Cervix	Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma	CESC (TCGA)	Jun 2013
Colon	Colon Adenocarcinoma	COAD (TCGA)	Jun 2013
Head and Neck	Head and Neck Squamous Cell Carcinoma	HNSC (TCGA)	Jun 2013
Kidney-Chromophobe	Kidney Chromophobe	KICH (TCGA)	Jun 2013
Kidney-Clear-Cell	Kidney Renal Clear Cell Carcinoma	KIRC (TCGA)	Jun 2013
Kidney-Papillary-Cell	Kidney Renal Papillary Cell Carcinoma	KIRP (TCGA)	Jun 2013
Liver-Nonviral	Hepatocellular Carcinoma (Secondary to Alcohol and Adiposity)	HCCA (ICGC)	Mar 2013
Liver-Viral	Hepatocellular Carcinoma (Viral)	HCCV (ICGC)	Mar 2013
Lung-Adenocarcinoma	Lung Adenocarcinoma	LUAD (TCGA)	Jun 2013
Lung-Squamous Cell	Lung Squamous Cell Carcinoma	LUSC (TCGA)	Jun 2013
Melanoma	Melanoma	ML (Yardena Samuels lab)	Dec 2011
Other	General purpose	OV (TCGA)	Jun 2013
Ovary	Ovarian Serous Cystadenocarcinoma	OV (TCGA)	Jun 2013
Pancreas	Pancreatic Cancer	PNCC (ICGC)	Mar 2013
Prostate-Adenocarcinoma	Prostate Adenocarcinoma	PRAD (TCGA)	Jun 2013
Rectum	Rectum Adenocarcinoma	READ (TCGA)	Jun 2013
Skin	Skin Cutaneous Melanoma	SKCM (TCGA)	Jun 2013
Stomach	Stomach Adenocarcinoma	STAD (TCGA)	Jun 2013
Thyroid	Thyroid Carcinoma	THCA (TCGA)	Jun 2013
Uterus	Uterine Corpus Endometrioid Carcinoma	UCEC (TCGA)	Jun 2013

Lastly, check "Include gene annotation" if you want the result email to include the GeneCard and PubMed annotation of the genes containing the submitted variants.

Submit

Enter your email address (not needed if you have logged in). If you want a machine-processing-friendly, tab-separated text version of the CRAVAT analysis report in addition to the default Microsoft Excel version, check "Include text reports for machine processing". Then click "SUBMIT". When all the analyses are complete, an email with the reports will be sent to you. If you have logged in, you can check the status and history of your jobs on the 'My Jobs' page, where you can also download your result by clicking 'Here' in the 'Result file' column.

RESTful Web Service

With CRAVAT's RESTful web service, you can submit jobs and check their status without using a browser.

Jobs

Job submission via POST
URL: http://www.cravat.us/CRAVAT/rest/service/submit
Method: POST
Consumes: Multipart/form-data
Produces: a JSON object, notable fields of which are as follows.
- status: "submitted" for successful job submission, "submissonfailed" for an error in the job submission
- errormsg: If there was any error during the job submission, the error message is written here.
- jobid: The job ID of the submitted job. This job ID can be used later to check the status of the job with the "status" method explained below.
Form data parameters (* = essential parameters):
- analyses: "CHASM", "SnvGet", "VEST", "CHASM;VEST", "CHASM;SnvGet", "VEST;SnvGet", or "CHASM;VEST;SnvGet"
- chasmclassifier: classifier name for CHASM analysis
- *email: email of the submitter
- functionalannotation: "on" or "off". GeneCards and PubMed annotation.
- hg18: "on" or "off". Input mutations are in hg18 coordinates or not.
- *inputfile: Input mutation file. This is from the file input element in the POST form.
- mupitinput: "on" or "off". MuPIT input format returned or not.
- tsvreport: "on" or "off". Text format reports returned or not.
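
The POST parameters above can be assembled into a multipart/form-data request with the Python standard library alone. This is a minimal sketch rather than an official client: the helper name `build_post_request`, the placeholder email address, and the upload filename `input.txt` are assumptions. The request is only built here; sending it is shown in the trailing comment.

```python
import uuid
import urllib.request

SUBMIT_URL = "http://www.cravat.us/CRAVAT/rest/service/submit"

def build_post_request(input_text, email, analyses="VEST", **options):
    """Build (but do not send) a multipart/form-data job submission."""
    boundary = uuid.uuid4().hex
    fields = {"email": email, "analyses": analyses, **options}
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    # The mutation file goes in the "inputfile" part; the filename is arbitrary.
    chunks.append(
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="inputfile"; filename="input.txt"\r\n'
        "Content-Type: text/plain\r\n"
        "\r\n"
        f"{input_text}\r\n"
    )
    chunks.append(f"--{boundary}--\r\n")
    request = urllib.request.Request(
        SUBMIT_URL, data="".join(chunks).encode("utf-8"), method="POST"
    )
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request

request = build_post_request(
    "TR1\tchr17\t7577506\t-\tG\tT\n", "user@example.org", tsvreport="on"
)
# To submit for real (requires network access):
#   import json
#   with urllib.request.urlopen(request) as response:
#       reply = json.load(response)  # inspect reply["status"], reply["jobid"]
```
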
Job submission via GET
URL: http://www.cravat.us/CRAVAT/rest/service/submit
Method: GET
Produces: a JSON object, notable fields of which are as follows.
- status: "submitted" for successful job submission, "submissonfailed" for an error in the job submission
- errormsg: If there was any error during the job submission, the error message is written here.
- jobid: The job ID of the submitted job. This job ID can be used later to check the status of the job with the "status" method explained below.
Query parameters (* = essential parameters):
- analyses: "CHASM", "SnvGet", "VEST", "CHASM;SnvGet", or "VEST;SnvGet"
- chasmclassifier: classifier name for CHASM analysis
- *email: email of the submitter
- functionalannotation: "on" or "off". GeneCards and PubMed annotation.
- hg18: "on" or "off". Input mutations are in hg18 coordinates or not.
- *mutations: a string with mutations, the format of which is the same as described in the "Input" section above.
- mupitinput: "on" or "off". MuPIT input format returned or not.
- tsvreport: "on" or "off". Text format reports returned or not.
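
For the GET variant, the query parameters can be percent-encoded into a URL. The sketch below uses `urllib.parse.urlencode`; the helper name `build_submit_url` and the placeholder email are hypothetical. How the server expects several mutation lines to be separated inside the `mutations` string is not stated here, so the example passes a single line.

```python
from urllib.parse import urlencode

SUBMIT_URL = "http://www.cravat.us/CRAVAT/rest/service/submit"

def build_submit_url(mutations, email, analyses="VEST", **options):
    """Return a GET submission URL with all parameters percent-encoded."""
    params = {"email": email, "analyses": analyses, "mutations": mutations}
    params.update(options)
    return SUBMIT_URL + "?" + urlencode(params)

url = build_submit_url("TR1 chr17 7577506 - G T", "user@example.org", hg18="off")
```
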
Job status checking
URL: http://www.cravat.us/CRAVAT/rest/service/status
Method: GET
Produces: a JSON object, notable fields of which are as follows.
- status: "running" for still running, "success" for successful completion, "jobfailed" for failed
- errormsg: Error message if the job failed.
- resultfileurl: If the job completed successfully, the URL of the result file.
Query parameters (* = essential parameter):
- *jobid: The job ID to query.
Example: http://www.cravat.us/CRAVAT/rest/service/status?jobid=test@20140204_102423
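
A client typically polls the status method until the job finishes. The following is a minimal sketch: the function names, the 60-second interval, and the polling limit are assumptions, and any status other than "success" or "jobfailed" is treated as "not finished yet".

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

STATUS_URL = "http://www.cravat.us/CRAVAT/rest/service/status"

def fetch_status(jobid):
    """Query the status method once and return the decoded JSON object."""
    url = STATUS_URL + "?" + urlencode({"jobid": jobid})
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def wait_for_job(jobid, fetch=fetch_status, interval=60, max_polls=120,
                 sleep=time.sleep):
    """Poll until the job succeeds; return the result file URL."""
    for _ in range(max_polls):
        reply = fetch(jobid)
        status = reply.get("status")
        if status == "success":
            return reply.get("resultfileurl")
        if status == "jobfailed":
            raise RuntimeError(reply.get("errormsg") or "job failed")
        sleep(interval)  # still running (or queued); wait and ask again
    raise TimeoutError(f"job {jobid} did not finish after {max_polls} polls")
```

The `fetch` and `sleep` arguments exist so that the loop can be exercised without network access, for example by passing a function that returns canned replies.
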

Single Variant

Single variant Web API
URL: http://www.cravat.us/CRAVAT/rest/service/query
Method: GET
Produces: a JSON object, notable fields of which are as follows.
- Chromosome: Chromosome of the variant
- Position: Position of the variant
- Strand: DNA strand on which the variant lies
- Reference base: Base(s) at the variant position in the reference genome (hg18 or hg19)
- Alternate base: Sequence of the variant
- Hugo symbol: Gene symbol from HUGO in which the variant resides
- Sequence ontology transcript: Transcript used to get the most severe sequence ontology. If more than one transcript has the most severe sequence ontology, the longest RefSeq transcript (if not, the longest Ensembl one, or the longest CCDS one, in this order) is chosen.
- Protein sequence change: Protein sequence change for the Sequence ontology column
- Sequence ontology: Sequence Ontology annotation. See Sequence Ontology section below. When more than one sequence ontology is found due to multiple transcript mapping, the most severe consequence is reported, according to the order of FI, FD, SG, SS, SL, II, ID, CS, MS, and SY.
- Sequence ontology all transcripts: Sequence ontology for each transcript mapped to the variant position. An asterisk is assigned to the transcript that was used to get the most severe sequence ontology.
- ExAC total allele frequency: Total allele frequency from ExAC
- ExAC allele frequency (African/African American): ExAC allele frequency in African and African American population
- ExAC allele frequency (Latino): ExAC allele frequency in Latino population
- ExAC allele frequency (East Asia): ExAC allele frequency in East Asian population
- ExAC allele frequency (Finish): ExAC allele frequency in Finnish population
- ExAC allele frequency (Non-Finnish European): ExAC allele frequency in Non-Finnish European population
- ExAC allele frequency (Other): ExAC allele frequency in Other population
- ExAC allele frequency (South Asian): ExAC allele frequency in South Asian population
- 1000 Genomes allele frequency: Allele frequency from the 1000 Genomes project
- ESP6500 allele frequency (European American): Allele frequency in the European American population, from ESP6500
- ESP6500 allele frequency (African American): Allele frequency in the African American population, from ESP6500
- Transcript in COSMIC: COSMIC Transcript that is mapped to the input variant
- Protein sequence change in COSMIC: Protein sequence change caused by the variant, according to COSMIC
- Occurrences in COSMIC [exact nucleotide change]: How many times the variant is observed in COSMIC
- Occurrences in COSMIC by primary sites [exact nucleotide change]: How many times the mutation is observed in COSMIC, grouped by primary sites
- Mappability Warning: Warning codes for whether the mutation's mapping is reliable or not. See Mappability section below.
- Driver Genes: Cancer driver gene hits (oncogenes and tumor suppressor genes) according to Vogelstein et al.
- TARGET: TARGET drug association DB hits
- dbSNP: dbSNP record which has the mutation
Query parameters (* = essential parameter):
- *mutation: The chromosome, position, strand direction, reference base, and alternate base of the variant, separated by underscores (chromosome_position_strand_refBase_altBase)
Example: http://www.cravat.us/CRAVAT/rest/service/query?mutation=chr22_30421786_+_A_T
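
The `mutation` parameter can be built by joining the five fields with underscores. This is a minimal sketch with a hypothetical helper name, `build_query_url`. The "+" strand character is deliberately left unescaped so that the result matches the example URL above; whether the server also accepts the percent-encoded form `%2B` is not stated here.

```python
from urllib.parse import quote

QUERY_URL = "http://www.cravat.us/CRAVAT/rest/service/query"

def build_query_url(chrom, position, strand, ref, alt):
    """Return the single-variant query URL for one variant."""
    mutation = "_".join([chrom, str(position), strand, ref, alt])
    # Keep "+" literal, mirroring the documented example URL.
    return QUERY_URL + "?mutation=" + quote(mutation, safe="+")

url = build_query_url("chr22", 30421786, "+", "A", "T")
# To fetch the annotation (requires network access):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as response:
#       annotation = json.load(response)
```
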

Downloadable Results

Upon successful submission and analysis, you will receive a link to your results via email. If you have logged in, you can also check the status and history of your jobs on the 'My Jobs' page and download a result by clicking 'Here' in the 'Result file' column. Results remain available for 30 days from the date of submission. They are delivered as one zip-compressed file containing several report files, including an MS Excel spreadsheet and optional tab-separated text files. There are three levels of analysis: variant, codon, and gene. The spreadsheet has each level as a tab, and the tab-separated text files have each level as a separate .tsv file. The SNVGet analysis result also appears as a separate tab or file. The result of the analysis at each level is shown as a table, and the columns of each table are explained below.
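
If the text reports were requested, the .tsv files can be read straight from the downloaded zip file. The sketch below is an illustration only: the function name `read_tsv_reports` is hypothetical, the member file names inside the archive are not assumed (every member ending in `.tsv` is read), and rows are returned as raw lists because the exact header layout of the reports is not described here.

```python
import csv
import io
import zipfile

def read_tsv_reports(zip_source):
    """Return {member name: list of rows} for every .tsv file in the archive."""
    reports = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".tsv"):
                continue  # skip the Excel report and any other files
            with archive.open(name) as handle:
                text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                reports[name] = list(csv.reader(text, delimiter="\t"))
    return reports
```

`zip_source` can be a path to the downloaded file or an open binary file object.
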

Variant Analysis Result

Column	Meaning
Input line number	Line number from the input file
ID	Unique ID of a mutation input line
Chromosome	Chromosome of the mutation
Position	Position of the mutation
Strand	DNA strand on which the mutation lies
Reference base(s)	Base(s) at the mutation position in the reference genome (hg18 or hg19)
Alternate base(s)	Sequence of the mutation
Sample ID	ID of the sample from which the mutation was observed
HUGO symbol	Gene symbol from HUGO in which the mutation resides
Sequence ontology	Sequence Ontology annotation. See Sequence Ontology section below. When more than one sequence ontology is found due to multiple transcript mapping, the most severe consequence is reported, according to the order of FI, FD, SG, SS, SL, II, ID, CS, MS, and SY.
Protein sequence change	Protein sequence change for the Sequence ontology column.
QUAL	Phred-scaled quality score for the assertion made in the alternate bases. This column appears only with a VCF-format input.
FILTER	PASS if the mutation position passed all filters. Otherwise, a semicolon-separated list of codes for filters that fail (e.g. "q10;s50"). This column appears only with a VCF-format input.
Zygosity	"hom" or "het" depending on whether the alternate allele is present on both chromosomes or only one of them, respectively. This column appears only with a VCF-format input.
CHASM cancer driver p-value (missense)	Empirically-derived p-value of the CHASM cancer driver score. Only missense mutations are considered.
CHASM cancer driver FDR (missense)	Benjamini-Hochberg false discovery rate. Only missense mutations are considered.
VEST pathogenicity p-value (non-silent)	Empirically-derived p-value of the VEST pathogenicity score. Only non-silent mutations are considered.
VEST pathogenicity FDR (non-silent)	Benjamini-Hochberg false discovery rate. Only non-silent mutations are considered.
Mappability Warning	Warning codes for whether the mutation's mapping is reliable or not. See Mappability section below.
Driver Genes	Cancer driver gene hits (oncogenes and tumor suppressor genes) according to Vogelstein et al.
TARGET	TARGET drug association DB hits
dbSNP	dbSNP record which has the mutation
1000 Genomes allele frequency	Allele frequency from the 1000 Genomes project
ESP6500 allele frequency (average)	Average allele frequency from ESP6500
ExAC total allele frequency	Total allele frequency from ExAC
Occurrences in COSMIC by primary sites [exact nucleotide change]	How many times the mutation is observed in COSMIC, grouped by primary sites
Number of samples in study having the exact nucleotide change	Number of samples in study having the exact nucleotide change
MuPIT Link	If the mutation falls on a known protein structure or a homology model (see here), it can be visualized with MuPIT by clicking the link in this column.
GeneCards summary	Information on the gene containing the mutation, pulled from GeneCards
Number of retrieved articles from PubMed	Number of the records retrieved in PubMed, using the name of the gene which contains the mutation and "cancer" as keywords. First, the keywords are searched in MeSH terms. If nothing is found, title and abstract of literature are searched. If nothing is still found, the keywords are searched without restriction on their appearance.
PubMed search term	Link to the PubMed search result with the mutation's gene name and "cancer" as keywords
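
The FDR columns above are described as Benjamini-Hochberg false discovery rates. As an illustration of that procedure in general, the sketch below converts a list of p-values into Benjamini-Hochberg adjusted values. It is a textbook implementation with a hypothetical function name, not CRAVAT's code, and CRAVAT's exact handling of ties or missing values may differ.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With these four p-values, the two smallest each receive an adjusted value of about 0.02 and the two largest about 0.04.
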

Variant Additional Details Result

Column	Meaning
Input line number	Line number from the input file
ID	Unique ID of a mutation input line
Chromosome	Chromosome of the mutation
Position	Position of the mutation
Strand	DNA strand on which the mutation lies
Reference base(s)	Base(s) at the mutation position in the reference genome (hg18 or hg19)
Alternate base(s)	Sequence of the mutation
Sample ID	ID of the sample from which the mutation was observed
HUGO symbol	Gene symbol from HUGO in which the mutation resides
Sequence ontology	Sequence Ontology annotation. See Sequence Ontology section below. When more than one sequence ontology is found due to multiple transcript mapping, the most severe consequence is reported, according to the order of FI, FD, SG, SS, SL, II, ID, CS, MS, and SY.
Sequence ontology transcript	Transcript used to get the most severe sequence ontology. If more than one transcript has the most severe sequence ontology, the longest RefSeq transcript (if not, the longest Ensembl one, or the longest CCDS one, in this order) is chosen.
Sequence ontology transcript strand	The strand (+ or -) of the transcript used to get the sequence ontology
Protein sequence change	Protein sequence change for the Sequence ontology column
Sequence ontology all transcripts	Sequence ontology for each transcript mapped to the variant position. An asterisk is assigned to the transcript that was used to get the most severe sequence ontology.
CHASM cancer driver score transcript	Transcript used to get the CHASM cancer driver score
Cancer missense driver score (1 - CHASM score)	1 - CHASM cancer driver score. Closer to 1 means that the mutation is more likely a cancer driver.
CHASM cancer driver p-value (missense)	Empirically-derived p-value of the CHASM cancer driver score. Only missense mutations are considered.
CHASM cancer driver FDR (missense)	Benjamini-Hochberg false discovery rate. Only missense mutations are considered.
Cancer missense driver score of all transcripts	Cancer missense driver score (1 - CHASM score) and p-value of each transcript that has mapping to the input variant. Format is Transcript:Protein sequence change(Cancer missense driver score:CHASM cancer driver p-value). An asterisk is assigned to the transcript that has the highest cancer missense driver score.
VEST pathogenicity score transcript	Transcript used to get VEST pathogenicity score
VEST pathogenicity score (missense)	VEST pathogenicity score for missense variants
VEST pathogenicity score (frameshift indels)	VEST pathogenicity score for frameshift indels
VEST pathogenicity score (inframe indels)	VEST pathogenicity score for inframe indels
VEST pathogenicity score (stop-gain)	VEST pathogenicity score for stop-gain variants
VEST pathogenicity score (stop-loss)	VEST pathogenicity score for stop-loss variants
VEST pathogenicity score (splice site)	VEST pathogenicity score for splice site variants
VEST pathogenicity score and p-value of all transcripts (non-silent)	VEST pathogenicity score and p-value of each transcript that has mapping to the input variant. Format is Transcript:Protein sequence change(VEST pathogenicity score:VEST pathogenicity p-value). An asterisk is assigned to the transcript that has the highest VEST pathogenicity score.
ESP6500 allele frequency (European American)	Allele frequency in the European American population, from ESP6500
ESP6500 allele frequency (African American)	Allele frequency in the African American population, from ESP6500
ExAC allele frequency (Latino)	ExAC allele frequency in Latino population
ExAC allele frequency (African/African American)	ExAC allele frequency in African and African American population
ExAC allele frequency (East Asian)	ExAC allele frequency in East Asian population
ExAC allele frequency (Finnish)	ExAC allele frequency in Finnish population
ExAC allele frequency (Non-Finnish European)	ExAC allele frequency in Non-Finnish European population
ExAC allele frequency (Other)	ExAC allele frequency in Other population
ExAC allele frequency (South Asian)	ExAC allele frequency in South Asian population
Transcript in COSMIC	COSMIC Transcript that is mapped to the input variant
Protein sequence change in COSMIC	Protein sequence change caused by the variant, according to COSMIC
Occurrences in COSMIC [exact nucleotide change]	How many times the variant is observed in COSMIC

Variant Non-coding Result

Non-coding regions are regions in the genome that are not in a protein coding portion of a gene. This includes UTR, intron, non-coding RNA, and intergenic regions.

Column	Meaning
Input line number	Line number from the input file
ID	Unique ID of a mutation input line
Chromosome	Chromosome of the mutation
Position	Position of the mutation
Strand	DNA strand on which the mutation lies
Reference base(s)	Base(s) at the mutation position in the reference genome (hg18 or hg19)
Alternate base(s)	Sequence of the mutation
Sample ID	ID of the sample from which the mutation was observed
HUGO symbol	Gene symbol from HUGO in which the mutation resides
Sequence ontology	Sequence Ontology annotation. See Sequence Ontology section below. When more than one sequence ontology is found due to multiple transcript mapping, the most severe consequence is reported, according to the order of FI, FD, SG, SS, SL, II, ID, CS, MS, and SY.
QUAL	Phred-scaled quality score for the assertion made in the alternate bases. This column appears only with a VCF-format input.
FILTER	PASS if the mutation position passed all filters. Otherwise, a semicolon-separated list of codes for filters that fail (e.g. "q10;s50"). This column appears only with a VCF-format input.
Zygosity	"hom" or "het" depending on whether the alternate allele is present on both chromosomes or only one of them, respectively. This column appears only with a VCF-format input.
Mappability Warning	Warning codes for whether the mutation's mapping is reliable or not. See Mappability section below.
dbSNP	dbSNP record which has the mutation
1000 Genomes allele frequency	Allele frequency from the 1000 Genomes project
ESP6500 allele frequency (average)	Average allele frequency from ESP6500
ExAC total allele frequency	Total allele frequency from ExAC
Occurrences in COSMIC by primary sites [exact nucleotide change]	How many times the mutation is observed in COSMIC, grouped by primary sites
Number of samples in study having the exact nucleotide change	Number of samples in study having the exact nucleotide change

Gene Level Analysis Result

Column	Meaning
HUGO Symbol	Gene symbol from HUGO in which the mutation resides
Sequence ontology	Sequence Ontology annotation. See Sequence Ontology section below.
Cancer missense driver score (1-CHASM score)	The most cancer-driving CHASM cancer driver score found among the variants in the gene. The closer the score is to 1, the more strongly cancer-driving that variant is predicted to be.
VEST pathogenicity score (non-silent)	The most pathogenic VEST pathogenicity score found among the variants in the gene. The closer the score is to 1, the more pathogenic that variant is predicted to be.
VEST pathogenicity composite p value (non-silent)	Composite p-value based on Stouffer's Z-score method
VEST pathogenicity FDR (non-silent)	Composite FDR based on Stouffer's Z-score method
Driver Genes	Cancer driver gene hits (oncogenes and tumor suppressor genes) according to Vogelstein et al.
TARGET	TARGET drug association DB hits
Occurrences in COSMIC [gene mutated]	How many times any mutation in the gene is observed in COSMIC
Occurrences in COSMIC by primary sites [gene mutated]	How many times any mutation in the gene is observed in COSMIC, grouped by primary sites
Number of samples in study having the gene mutated	Number of samples in study having the gene mutated
MuPIT Link	If the mutations in the gene fall on a known protein structure, they can be visualized with MuPIT by clicking the link in this column.
GeneCards summary (from http://www.genecards.org)	Information on the gene containing the mutation, pulled from GeneCards
Number of retrieved articles from PubMed	Number of the records retrieved in PubMed, using the name of the gene which contains the mutation and "cancer" as keywords. First, the keywords are searched in MeSH terms. If nothing is found, title and abstract of literature are searched. If nothing is still found, the keywords are searched without restriction on their appearance.
PubMed search term	Link to the PubMed search result with the mutation's gene name and "cancer" as keywords
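
The composite p-value above is described as based on Stouffer's Z-score method. The sketch below shows the unweighted, one-sided form of that method for combining several p-values into one. It is a generic illustration with a hypothetical function name; any weighting or other details of CRAVAT's own calculation are not stated here and are not reproduced.

```python
from statistics import NormalDist

def stouffer_p_value(p_values):
    """Combine one-sided p-values with the unweighted Stouffer Z-score method."""
    normal = NormalDist()
    # Convert each p-value to a Z-score; inv_cdf requires 0 < p < 1.
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(z_scores) / len(z_scores) ** 0.5
    return 1.0 - normal.cdf(combined_z)

composite = stouffer_p_value([0.05, 0.05])
```

Two variants that each have a p-value of 0.05 combine to a composite p-value of roughly 0.01, which shows how consistent moderate evidence across variants strengthens the gene-level result.
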

SNVBox Analysis Result

Column	Meaning
Input line number	Line number from the input file
ID	Unique ID of a variant input line
Chromosome	Chromosome of the mutation
Position	Position of the mutation
Strand	DNA strand on which the mutation lies
Reference base(s)	Base(s) at the mutation position in the reference genome (hg18 or hg19)
Alternate base(s)	Sequence of the mutation
Sample ID	ID of the sample from which the mutation was observed
HUGO Symbol	Gene symbol from HUGO in which the mutation resides
Sequence ontology	Sequence Ontology annotation. See Sequence Ontology section below.
Sequence ontology transcript	Transcript used to get the most severe sequence ontology. If more than one transcript has the most severe sequence ontology, the longest RefSeq transcript (if not, the longest Ensembl one, or the longest CCDS one, in this order) is chosen.
Protein sequence change	Position and amino acid changed by the variant, in the representative transcript

To understand the other columns of the SNVBox analysis result table, please refer to this document for a comprehensive explanation.

Input Errors Result

Column	Meaning
Input line number	Line number from the input file
Input line UID	Unique ID of the input line
Gene	The gene the input occurs on
Error	The reason this input line caused an error
Input Line	The variant from the input file that caused the error

Sequence Ontology Codes

Code	Meaning
SY	Synonymous Variant
SL	Stop Lost
SG	Stop Gained
MS	Missense Variant
II	Inframe Insertion
FI	Frameshift Insertion
ID	Inframe Deletion
FD	Frameshift Deletion
CS	Complex Substitution

The source of Sequence Ontology terms is here.
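
The result tables state that when a variant maps to several transcripts, the most severe consequence is reported in the order FI, FD, SG, SS, SL, II, ID, CS, MS, SY. That selection rule can be sketched as follows; the function name `most_severe` is hypothetical. Note that SS appears in the severity order but is not listed in the code table above.

```python
# Severity order as given in the result column descriptions, most severe first.
SEVERITY_ORDER = ["FI", "FD", "SG", "SS", "SL", "II", "ID", "CS", "MS", "SY"]

def most_severe(codes):
    """Return the most severe sequence ontology code among the given codes."""
    # list.index raises ValueError for a code outside the documented order.
    return min(codes, key=SEVERITY_ORDER.index)

reported = most_severe(["MS", "SY", "SG"])
```
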

Mappability Codes

Code	Meaning
A75	The hg19 reference genome has more than 1 location with the 75 mer sequence from the query position
ACR	ACRO1 (Human acromeric satellite)
ALC	ALR/Alpha
BSR	Beta satellite repeat/beta
CAT	(CATTC)n
CHM	Chromosome M
CNR	Centromeric Repeat
GAA	(GAATG)n
GAG	(GAGTG)n
HMI	High artifact island
LMI	Low artifact island
LSU	Large subunit rRNA Hsa
snR	Small nuclear RNA
SSU	Small subunit rRNA Hsa
STL	Satellite repeat
TAR	TAR1
TII	HSATII (Human satellite II DNA)
TLM	Telomeric repeat

The source of the mappability tags is here.

CRAVAT Galaxy Tool

A Galaxy Tool for querying CRAVAT is available at https://toolshed.g2.bx.psu.edu/view/insilicosolutions/cravat/9e29dd2972ab. CRAVAT input format is used.
