CRAVAT is a web server with a simple interface for cancer-related analysis of variants. To cite CRAVAT, please use
the article listed below.
CRAVAT currently employs three analysis tools, CHASM, SNVGet, and VEST.
For more information on these tools, refer to the Analysis Tools chapter.
For instructions on using CRAVAT, refer to the How to Use chapter.
For help interpreting CRAVAT reports, refer to the Output Report chapter.
To cite CRAVAT, please use the following literature:
- Douville C, Carter H, Kim R, Niknafs N, Diekhans M, Stenson PD, Cooper DN, Ryan M, Karchin R (2013).
CRAVAT: Cancer-Related Analysis of VAriants Toolkit
Bioinformatics, 29(5):647-648.
To cite CHASM, please use the following literature:
- Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, Kinzler KW, Vogelstein B, Karchin R (2009)
Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations
Cancer Res, 69(16):6660-7.
To cite VEST, please use the following literature:
- Douville C, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, Karchin R (2015)
Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-indel)
Human Mutation, doi: 10.1002/humu.22911.
- Carter H, Douville C, Stenson P, Cooper D, Karchin R (2013)
Identifying Mendelian disease genes with the Variant Effect Scoring Tool
BMC Genomics, 14(Suppl 3):S3.
To cite SNVBox, please use the following literature:
- Wong WC, Kim D, Carter H, Diekhans M, Ryan M, Karchin R (2011).
CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer
Bioinformatics, 27(15):2147-2148.
CRAVAT currently employs three analysis tools, CHASM, SNVGet, and VEST:
-
CHASM (Cancer-specific High-throughput Annotation of Somatic Mutations) is a method that
predicts the functional significance of somatic missense variants observed in the genomes
of cancer cells, allowing variants to be prioritized in subsequent functional studies, based on the
probability that they confer increased fitness to a cancer cell. CHASM uses a machine learning method called
Random Forest to distinguish between driver and passenger somatic missense variation. The Random Forest is
trained on a positive class of drivers curated from the COSMIC database and a negative class of passengers,
generated in silico, according to passenger base substitution frequencies estimated for a specific tumor type.
Each variant is represented by a list of features, including amino acid substitution properties,
alignment-based estimates of conservation at the variant position, predicted local structure and annotations
from the UniProt Knowledgebase. Only missense mutations are analyzed by CHASM.
For more information on CHASM, please visit
http://wiki.chasmsoftware.org and
refer to the CHASM articles cited above.
-
VEST is a method that predicts the functional effect of a variant.
The classifier and null distribution for VEST were updated on November 12, 2012,
so VEST results obtained before November 13, 2012 may differ from those obtained after that date.
For more information on VEST, please visit http://wiki.chasmsoftware.org
and refer to
this article.
-
SNVGet retrieves selected predictive features for a variant. Features can be broadly
categorized into 3 types:
-
Amino Acid Substitution features
-
Protein-based position-specific features
-
Exon-specific features
Only missense mutations are analyzed by SNVGet.
For more information on SNVBox (database made with SNVGet), please visit
http://wiki.chasmsoftware.org and
refer to
this article.
-
CRAVAT provides user account functionality. You can create a user account, retrieve or change your password, check the status of your jobs, and
retrieve their results through the "My Jobs" page. Your username is your email.
Create a CRAVAT Account:
There are two ways to create a CRAVAT account:
- When you submit a job for the first time, CRAVAT will create an account with your email and
a temporary password and this account information will be sent to you as a part of the result
notification email.
-
A CRAVAT account can be created by clicking "Log-In" > "Create an account" on the top menu.
Retrieve Your Username
Your username is your email.
Retrieve Your Password
If you forgot your password, click "Log-In" > "Forgot password?", type your username (your email) and click "Submit".
A temporary password will be sent to you.
Change Your Password
To change your password, first log in, and then click "My Profile" > "Change password".
In the "Change Password" pop-up window, type your current password, your new password, and again your new password. Click "Submit".
My Jobs Page
After logging in, click "My Jobs" on the top menu to open the My Jobs page in a new browser tab.
This page shows your past and current jobs with their parameters and status (success, fail, running, or in-queue).
You can download the result files from this page by clicking "Here" in the "Result file" column.
Choose an analysis type:
-
Cancer driver analysis: This analysis predicts whether the submitted variants
are cancer drivers or not.
-
Pathogenicity analysis: This analysis predicts whether the submitted variants
will have any pathogenic effect on their translated proteins or not.
-
Gene annotation only: This analysis provides GeneCard and PubMed information on
the genes containing the submitted variants.
When an analysis type is chosen, the options for analysis programs will show up.
Multiple analysis programs can be chosen, and if any of the programs needs a cancer tissue type
to be specified, a list box for selecting the cancer type will also appear.
Currently, the following tissue types can be chosen at CRAVAT.
Name | Full name | Source | Date |
Bladder | Bladder Urothelial Carcinoma | BLCA (TCGA) | Jun 2013 |
Blood-Lymphocyte | Chronic Lymphocytic Leukemia | CLL (ICGC) | Mar 2013 |
Blood-Myeloid | Acute Myeloid Leukemia | LAML (TCGA) | Jun 2013 |
Brain-Cerebellum | Medulloblastoma | MB (mixed source) | Dec 2010 |
Brain-Glioblastoma-Multiforme | Glioblastoma Multiforme | GBM (TCGA) | Jun 2013 |
Brain-Lower-Grade-Glioma | Brain Lower Grade Glioma | LGG (TCGA) | Jun 2013 |
Breast | Breast Invasive Carcinoma | BRCA (TCGA) | Jun 2012 |
Cervix | Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma | CESC (TCGA) | Jun 2013 |
Colon | Colon Adenocarcinoma | COAD (TCGA) | Jun 2013 |
Head and Neck | Head and Neck Squamous Cell Carcinoma | HNSC (TCGA) | Jun 2013 |
Kidney-Chromophobe | Kidney Chromophobe | KICH (TCGA) | Jun 2013 |
Kidney-Clear-Cell | Kidney Renal Clear Cell Carcinoma | KIRC (TCGA) | Jun 2013 |
Kidney-Papillary-Cell | Kidney Renal Papillary Cell Carcinoma | KIRP (TCGA) | Jun 2013 |
Liver-Nonviral | Hepatocellular Carcinoma (Secondary to Alcohol and Adiposity) | HCCA (ICGC) | Mar 2013 |
Liver-Viral | Hepatocellular Carcinoma (Viral) | HCCV (ICGC) | Mar 2013 |
Lung-Adenocarcinoma | Lung Adenocarcinoma | LUAD (TCGA) | Jun 2013 |
Lung-Squamous Cell | Lung Squamous Cell Carcinoma | LUSC (TCGA) | Jun 2013 |
Melanoma | Melanoma | ML (Yardena Samuels lab) | Dec 2011 |
Other | General purpose | OV (TCGA) | Jun 2013 |
Ovary | Ovarian Serous Cystadenocarcinoma | OV (TCGA) | Jun 2013 |
Pancreas | Pancreatic Cancer | PNCC (ICGC) | Mar 2013 |
Prostate-Adenocarcinoma | Prostate Adenocarcinoma | PRAD (TCGA) | Jun 2013 |
Rectum | Rectum Adenocarcinoma | READ (TCGA) | Jun 2013 |
Skin | Skin Cutaneous Melanoma | SKCM (TCGA) | Jun 2013 |
Stomach | Stomach Adenocarcinoma | STAD (TCGA) | Jun 2013 |
Thyroid | Thyroid Carcinoma | THCA (TCGA) | Jun 2013 |
Uterus | Uterine Corpus Endometrioid Carcinoma | UCEC (TCGA) | Jun 2013 |
Lastly, check "Include gene annotation" if you want the result email to include
the GeneCard and PubMed annotation of the genes containing the submitted variants.
Enter your email address (not needed if you have logged in). If you want to receive a machine-readable, tab-separated
text version of the CRAVAT analysis report in addition to the default Microsoft Excel version,
check "Include text reports for machine processing". Then click "SUBMIT". When all the analyses
are complete, an email with the reports will be sent to you. If you have logged in, you can check the status and history of your jobs on the 'My Jobs' page, where you can also
download your results by clicking 'Here' in the 'Result file' column.
With CRAVAT's RESTful web service, you can submit jobs and check their status without using a browser.
-
Job submission via POST
URL: http://www.cravat.us/CRAVAT/rest/service/submit
Method: POST
Consumes: Multipart/form-data
Produces: a JSON object, notable fields of which are as follows.
- status: "submitted" for successful job submission, "submissonfailed" for an error in the job submission
- errormsg: If there was any error during the job submission, the error message is written here.
- jobid: The Job ID of the submitted job. This job ID can be used to check the status of the job later using "status" method which is explained below.
Form data parameters (* = essential parameters):
- analyses: "CHASM", "SnvGet", "VEST", "CHASM;VEST", "CHASM;SnvGet", "VEST;SnvGet", or "CHASM;VEST;SnvGet"
- chasmclassifier: classifier name for CHASM analysis
- *email: email of the submitter
- functionalannotation: "on" or "off". GeneCards and PubMed annotation.
- hg18: "on" or "off". Input mutations are in hg18 coordinates or not.
- *inputfile: Input mutation file. This is from the file input element in the POST form.
- mupitinput: "on" or "off". MuPIT input format returned or not.
- tsvreport: "on" or "off". Text format reports returned or not.
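The POST submission above can be sketched with Python's standard library as follows. This is only an illustration under stated assumptions: the helper names `build_submit_request` and `read_submit_response`, the file name, and the placeholder email are hypothetical, and the request is built but not sent.

```python
import json
import uuid
import urllib.request

SUBMIT_URL = "http://www.cravat.us/CRAVAT/rest/service/submit"


def build_submit_request(email, filename, file_bytes, **options):
    """Build (but do not send) a multipart/form-data POST for job submission.

    `options` carries the optional form fields listed above, for example
    analyses="CHASM;VEST" or tsvreport="on". Hypothetical helper.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain text fields: the required email plus any optional parameters.
    for name, value in {"email": email, **options}.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(head.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The required input mutation file goes in the "inputfile" field.
    file_head = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="inputfile"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    )
    chunks.append(file_head.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(
        SUBMIT_URL, data=b"".join(chunks), headers=headers, method="POST"
    )


def read_submit_response(json_text):
    """Return the job ID from the JSON reply, or raise with the error message."""
    reply = json.loads(json_text)
    if reply.get("status") != "submitted":
        raise RuntimeError(reply.get("errormsg") or "submission failed")
    return reply["jobid"]
```

To actually submit, the request could be passed to `urllib.request.urlopen` and the decoded response body to `read_submit_response`.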
-
Job submission via GET
URL: http://www.cravat.us/CRAVAT/rest/service/submit
Method: GET
Produces: a JSON object, notable fields of which are as follows.
- status: "submitted" for successful job submission, "submissonfailed" for an error in the job submission
- errormsg: If there was any error during the job submission, the error message is written here.
- jobid: The Job ID of the submitted job. This job ID can be used to check the status of the job later using "status" method which is explained below.
Query parameters (* = essential parameters):
- analyses: "CHASM", "SnvGet", "VEST", "CHASM;SnvGet", or "VEST;SnvGet"
- chasmclassifier: classifier name for CHASM analysis
- *email: email of the submitter
- functionalannotation: "on" or "off". GeneCards and PubMed annotation.
- hg18: "on" or "off". Input mutations are in hg18 coordinates or not.
- *mutations: a string with mutations, the format of which is the same as described in the "Input" section above.
- mupitinput: "on" or "off". MuPIT input format returned or not.
- tsvreport: "on" or "off". Text format reports returned or not.
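A GET submission URL could be assembled as in the sketch below. The function name `build_submit_url` is hypothetical, and the content of `mutations` must follow the input format described elsewhere in this documentation; the sketch only shows how the query parameters are combined and percent-encoded.

```python
from urllib.parse import urlencode

SUBMIT_URL = "http://www.cravat.us/CRAVAT/rest/service/submit"


def build_submit_url(email, mutations, **options):
    """Return a GET submission URL (hypothetical helper).

    `email` and `mutations` are the required parameters; `options` holds the
    optional ones, e.g. analyses="CHASM;SnvGet" or hg18="on".
    """
    params = {"email": email, "mutations": mutations, **options}
    # urlencode percent-encodes reserved characters such as "@" and ";".
    return SUBMIT_URL + "?" + urlencode(params)
```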
-
Job status checking
URL: http://www.cravat.us/CRAVAT/rest/service/status
Method: GET
Produces: a JSON object, notable fields of which are as follows.
- status: "running" for still running, "success" for successful completion, "jobfailed" for failed
- errormsg: Error message if the job failed.
- resultfileurl: If the job completed successfully, the URL of the result file.
Query parameters (* = essential parameter):
- *jobid: The job ID to query.
Example: http://www.cravat.us/CRAVAT/rest/service/status?jobid=test@20140204_102423
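Status checking lends itself to a simple polling loop. The sketch below is an assumption-laden illustration: the function names, the polling interval, and the injectable `fetch` argument are not part of CRAVAT, and only the `status`, `errormsg`, and `resultfileurl` fields described above are used.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

STATUS_URL = "http://www.cravat.us/CRAVAT/rest/service/status"


def status_url(jobid):
    # "@" is left unescaped so the URL matches the documented example.
    return STATUS_URL + "?" + urlencode({"jobid": jobid}, safe="@")


def interpret_status(json_text):
    """Map a status reply to (done, result_url); raise if the job failed."""
    reply = json.loads(json_text)
    status = reply.get("status")
    if status == "success":
        return True, reply.get("resultfileurl")
    if status == "jobfailed":
        raise RuntimeError(reply.get("errormsg") or "job failed")
    return False, None  # "running" (or anything else): keep waiting


def wait_for_result(jobid, interval_s=60, fetch=None):
    """Poll until the job finishes and return the result file URL."""
    if fetch is None:
        fetch = lambda url: urllib.request.urlopen(url).read().decode("utf-8")
    while True:
        done, result_url = interpret_status(fetch(status_url(jobid)))
        if done:
            return result_url
        time.sleep(interval_s)
```

Passing a custom `fetch` makes the loop easy to exercise without network access.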
-
Single variant Web API
URL: http://www.cravat.us/CRAVAT/rest/service/query
Method: GET
Produces: a JSON object, notable fields of which are as follows.
- Chromosome: Chromosome of the variant
- Position: Position of the variant
- Strand: DNA strand on which the variant lies
- Reference base: Base(s) at the variant position in the reference genome (hg18 or hg19)
- Alternate base: Sequence of the variant
- Hugo symbol: Gene symbol from HUGO in which the variant resides
- Sequence ontology transcript: Transcript used to get the most severe
sequence ontology.
If more than one transcript has the most severe sequence ontology,
the longest RefSeq transcript is chosen (if there is none, the longest Ensembl one,
then the longest CCDS one, in this order).
- Protein sequence change: Protein sequence change for the Sequence ontology column
- Sequence ontology: Sequence Ontology annotation.
See Sequence Ontology section below.
When more than one sequence ontology is found due to multiple transcript mapping,
the most severe consequence is reported, according to the order of
FI, FD, SG, SS, SL, II, ID, CS, MS, and SY.
- Sequence ontology all transcripts: Sequence ontology for each transcript mapped to the variant position.
An asterisk is assigned to the transcript that was used to get the most severe sequence ontology.
- ExAC total allele frequency: Total allele frequency from
ExAC
- ExAC allele frequency (African/African American): ExAC
allele frequency in African and African American population
- ExAC allele frequency (Latino): ExAC
allele frequency in Latino population
- ExAC allele frequency (East Asia): ExAC
allele frequency in East Asian population
- ExAC allele frequency (Finish): ExAC
allele frequency in Finnish population
- ExAC allele frequency (Non-Finnish European): ExAC
allele frequency in Non-Finnish European population
- ExAC allele frequency (Other): ExAC
allele frequency in Other population
- ExAC allele frequency (South Asian): ExAC
allele frequency in South Asian population
- 1000 Genomes allele frequency: Allele frequency from the 1000 Genomes project
- ESP6500 allele frequency (European American): Allele frequency in the European American population,
from ESP6500
- ESP6500 allele frequency (African American): Allele frequency in the African American population,
from ESP6500
- Transcript in COSMIC: COSMIC Transcript that is mapped to the input variant
- Protein sequence change in COSMIC: Protein sequence change caused by the variant, according to COSMIC
- Occurrences in COSMIC [exact nucleotide change]: How many times the variant is observed in COSMIC
- Occurrences in COSMIC by primary sites [exact nucleotide change]: How many times the mutation is observed in COSMIC, grouped by primary sites
- Mappability Warning: Warning codes for whether the mutation's mapping is reliable or not. See Mappability section below.
- Driver Genes: Cancer driver gene hits (oncogenes and tumor suppressor genes) according to Vogelstein et al.
- TARGET: TARGET drug association DB hits
- dbSNP: dbSNP record which has the mutation
Query parameters (* = essential parameter):
- *mutation: The chromosome, position, strand direction, reference base, and alternate base of the variant, separated by underscores (chromosome_position_strand_refBase_altBase)
Example: http://www.cravat.us/CRAVAT/rest/service/query?mutation=chr22_30421786_+_A_T
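The underscore-separated `mutation` parameter can be composed programmatically. In this sketch the function names are hypothetical, and the "+" sign is left unescaped only to reproduce the documented example URL; whether the service also accepts a percent-encoded strand is not stated here.

```python
from urllib.parse import urlencode

QUERY_URL = "http://www.cravat.us/CRAVAT/rest/service/query"


def mutation_key(chrom, position, strand, ref_base, alt_base):
    """Join the five fields as chromosome_position_strand_refBase_altBase."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return "_".join([chrom, str(position), strand, ref_base, alt_base])


def query_url(chrom, position, strand, ref_base, alt_base):
    key = mutation_key(chrom, position, strand, ref_base, alt_base)
    # Assumption: keep "+" literal, as in the example URL above.
    return QUERY_URL + "?" + urlencode({"mutation": key}, safe="+")
```

For instance, `query_url("chr22", 30421786, "+", "A", "T")` reproduces the example URL given above.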
Upon successful submission and analysis, you will receive a link to your results via email. The results will be available for 30 days from the date of submission.
If you have logged in, you can also check the status and history of your jobs on the 'My Jobs' page and
download your results there by clicking 'Here' in the 'Result file' column. The results are
delivered as one zip-compressed file containing several report files, including an MS Excel spreadsheet and optional tab-separated text files.
There are three levels of analysis: variant, codon, and gene. The spreadsheet has each level as a tab, and the tab-separated text files have
each level as a separate .tsv file. The SNVGet analysis result also appears as a separate tab or file. The result of the analysis at each level is shown as a table, and the columns of the tables are explained below.
Column | Meaning |
Input line number | Line number from the input file |
ID | Unique ID of a mutation input line |
Chromosome | Chromosome of the mutation |
Position | Position of the mutation |
Strand | DNA strand on which the mutation lies |
Reference base(s) | Base(s) at the mutation position in the reference genome (hg18 or hg19) |
Alternate base(s) | Sequence of the mutation |
Sample ID | ID of the sample in which the mutation was observed |
HUGO symbol | Gene symbol from HUGO in which the mutation resides |
Sequence ontology | Sequence Ontology annotation. See Sequence Ontology section below. When more than one sequence ontology is found due to multiple transcript mapping, the most severe consequence is reported, according to the order of FI, FD, SG, SS, SL, II, ID, CS, MS, and SY. |
Protein sequence change | Protein sequence change for the Sequence ontology column. |
QUAL | Phred-scaled quality score for the assertion made in the alternate bases. This column appears only with a VCF-format input. |
FILTER | PASS if the mutation position passed all filters. Otherwise, a semicolon-separated list of codes for filters that fail (e.g. "q10;s50"). This column appears only with a VCF-format input. |
Zygosity | "hom" or "het" depending on whether the alternate allele is present on both chromosomes or only one of them, respectively. This column appears only with a VCF-format input. |
CHASM cancer driver p-value (missense) | Empirically-derived p-value of the CHASM cancer driver score. Only missense mutations are considered. |
CHASM cancer driver FDR (missense) | Benjamini-Hochberg false discovery rate. Only missense mutations are considered. |
VEST pathogenicity p-value (non-silent) | Empirically-derived p-value of the VEST pathogenicity score. Only non-silent mutations are considered. |
VEST pathogenicity FDR (non-silent) | Benjamini-Hochberg false discovery rate. Only non-silent mutations are considered. |
Mappability Warning | Warning codes for whether the mutation's mapping is reliable or not. See Mappability section below. |
Driver Genes | Cancer driver gene hits (oncogenes and tumor suppressor genes) according to Vogelstein et al. |
TARGET | TARGET drug association DB hits |
dbSNP | dbSNP record which has the mutation |
1000 Genomes allele frequency | Allele frequency from the 1000 Genomes project |
ESP6500 allele frequency (average) | Average allele frequency from ESP6500 |
ExAC total allele frequency | Total allele frequency from ExAC |
Occurrences in COSMIC by primary sites [exact nucleotide change] | How many times the mutation is observed in COSMIC, grouped by primary sites |
Number of samples in study having the exact nucleotide change | Number of samples in study having the exact nucleotide change |
MuPIT Link | If the mutation falls on a known protein structure or a homology model (see here), it can be visualized with MuPIT by clicking the link in this column. |
GeneCards summary | Information on the gene containing the mutation, pulled from GeneCards |
Number of retrieved articles from PubMed | Number of the records retrieved in PubMed, using the name of the gene which contains the mutation and "cancer" as keywords. First, the keywords are searched in MeSH terms. If nothing is found, title and abstract of literature are searched. If nothing is still found, the keywords are searched without restriction on their appearance. |
PubMed search term | Link to the PubMed search result with the mutation's gene name and "cancer" as keywords |
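The FDR columns above are described as Benjamini-Hochberg false discovery rates. For readers unfamiliar with the procedure, the following is a generic sketch of the standard Benjamini-Hochberg adjustment applied to a list of p-values; it is not CRAVAT's own code, and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller p-value from receiving a larger adjusted value than a bigger one.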
Column | Meaning |
Input line number | Line number from the input file |
ID | Unique ID of a mutation input line |
Chromosome | Chromosome of the mutation |
Position | Position of the mutation |
Strand | DNA strand on which the mutation lies |
Reference base(s) | Base(s) at the mutation position in the reference genome (hg18 or hg19) |
Alternate base(s) | Sequence of the mutation |
Sample ID | ID of the sample in which the mutation was observed |
HUGO symbol | Gene symbol from HUGO in which the mutation resides |
Sequence ontology | Sequence Ontology annotation. See Sequence Ontology section below. When more than one sequence ontology is found due to multiple transcript mapping, the most severe consequence is reported, according to the order of FI, FD, SG, SS, SL, II, ID, CS, MS, and SY. |
Sequence ontology transcript | Transcript used to get the most severe sequence ontology. If more than one transcript has the most severe sequence ontology, the longest RefSeq transcript is chosen (if there is none, the longest Ensembl one, then the longest CCDS one, in this order). |
Sequence ontology transcript strand | The strand (+ or -) of the transcript used to get the sequence ontology |
Protein sequence change | Protein sequence change for the Sequence ontology column |
Sequence ontology all transcripts | Sequence ontology for each transcript mapped to the variant position. An asterisk is assigned to the transcript that was used to get the most severe sequence ontology. |
CHASM cancer driver score transcript | Transcript used to get the CHASM cancer driver score |
Cancer missense driver score (1 - CHASM score) | 1 - CHASM cancer driver score. Closer to 1 means that the mutation is more likely a cancer driver. |
CHASM cancer driver p-value (missense) | Empirically-derived p-value of the CHASM cancer driver score. Only missense mutations are considered. |
CHASM cancer driver FDR (missense) | Benjamini-Hochberg false discovery rate. Only missense mutations are considered. |
Cancer missense driver score of all transcripts | Cancer missense driver score (1 - CHASM score) and p-value of each transcript that has mapping to the input variant. Format is Transcript:Protein sequence change(Cancer missense driver score:CHASM cancer driver p-value). An asterisk is assigned to the transcript that has the highest cancer missense driver score. |
VEST pathogenicity score transcript | Transcript used to get VEST pathogenicity score |
VEST pathogenicity score (missense) | VEST pathogenicity score for missense variants |
VEST pathogenicity score (frameshift indels) | VEST pathogenicity score for frameshift indels |
VEST pathogenicity score (inframe indels) | VEST pathogenicity score for inframe indels |
VEST pathogenicity score (stop-gain) | VEST pathogenicity score for stop-gain variants |
VEST pathogenicity score (stop-loss) | VEST pathogenicity score for stop-loss variants |
VEST pathogenicity score (splice site) | VEST pathogenicity score for splice site variants |
VEST pathogenicity score and p-value of all transcripts (non-silent) | VEST pathogenicity score and p-value of each transcript that has mapping to the input variant. Format is Transcript:Protein sequence change(VEST pathogenicity score:VEST pathogenicity p-value). An asterisk is assigned to the transcript that has the highest VEST pathogenicity score. |
ESP6500 allele frequency (European American) | Allele frequency in the European American population, from ESP6500 |
ESP6500 allele frequency (African American) | Allele frequency in the African American population, from ESP6500 |
ExAC allele frequency (Latino) | ExAC allele frequency in Latino population |
ExAC allele frequency (African/African American) | ExAC allele frequency in African and African American population |
ExAC allele frequency (East Asian) | ExAC allele frequency in East Asian population |
ExAC allele frequency (Finnish) | ExAC allele frequency in Finnish population |
ExAC allele frequency (Non-Finnish European) | ExAC allele frequency in Non-Finnish European population |
ExAC allele frequency (Other) | ExAC allele frequency in Other population |
ExAC allele frequency (South Asian) | ExAC allele frequency in South Asian population |
Transcript in COSMIC | COSMIC Transcript that is mapped to the input variant |
Protein sequence change in COSMIC | Protein sequence change caused by the variant, according to COSMIC |
Occurrences in COSMIC [exact nucleotide change] | How many times the variant is observed in COSMIC |
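The per-transcript columns above use the format Transcript:Protein sequence change(score:p-value). The sketch below parses one such entry. Several points are assumptions not confirmed by this documentation: that the asterisk sits directly before or after the transcript name, that scores and p-values are plain decimal numbers, and the example values themselves; how multiple entries are separated in a cell is not stated, so only a single entry is handled.

```python
import re

# One entry: optional asterisk, transcript, ":", protein change, "(score:p-value)".
ENTRY = re.compile(
    r"^(?P<star1>\*?)(?P<transcript>[^:*]+)(?P<star2>\*?):"
    r"(?P<change>[^()]*)\((?P<score>[^:()]+):(?P<pvalue>[^:()]+)\)$"
)


def parse_transcript_entry(text):
    """Parse a single per-transcript entry into a dictionary (hypothetical helper)."""
    m = ENTRY.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {text!r}")
    return {
        "transcript": m["transcript"],
        "protein_change": m["change"],
        "score": float(m["score"]),
        "p_value": float(m["pvalue"]),
        # True when the entry carries the asterisk marking the reported transcript.
        "selected": bool(m["star1"] or m["star2"]),
    }
```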
Non-coding regions are regions in the genome that are not in a protein coding portion of a gene. This includes UTR, intron, non-coding RNA, and intergenic regions.
Column | Meaning |
Input line number | Line number from the input file |
ID | Unique ID of a mutation input line |
Chromosome | Chromosome of the mutation |
Position | Position of the mutation |
Strand | DNA strand on which the mutation lies |
Reference base(s) | Base(s) at the mutation position in the reference genome (hg18 or hg19) |
Alternate base(s) | Sequence of the mutation |
Sample ID | ID of the sample in which the mutation was observed |
HUGO symbol | Gene symbol from HUGO in which the mutation resides |
Sequence ontology | Sequence Ontology annotation. See Sequence Ontology section below. When more than one sequence ontology is found due to multiple transcript mapping, the most severe consequence is reported, according to the order of FI, FD, SG, SS, SL, II, ID, CS, MS, and SY. |
QUAL | Phred-scaled quality score for the assertion made in the alternate bases. This column appears only with a VCF-format input. |
FILTER | PASS if the mutation position passed all filters. Otherwise, a semicolon-separated list of codes for filters that fail (e.g. "q10;s50"). This column appears only with a VCF-format input. |
Zygosity | "hom" or "het" depending on whether the alternate allele is present on both chromosomes or only one of them, respectively. This column appears only with a VCF-format input. |
Mappability Warning | Warning codes for whether the mutation's mapping is reliable or not. See Mappability section below. |
dbSNP | dbSNP record which has the mutation |
1000 Genomes allele frequency | Allele frequency from the 1000 Genomes project |
ESP6500 allele frequency (average) | Average allele frequency from ESP6500 |
ExAC total allele frequency | Total allele frequency from ExAC |
Occurrences in COSMIC by primary sites [exact nucleotide change] | How many times the mutation is observed in COSMIC, grouped by primary sites |
Number of samples in study having the exact nucleotide change | Number of samples in study having the exact nucleotide change |
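The QUAL and FILTER columns above follow VCF conventions. As a small illustration, a Phred-scaled score Q corresponds to an error probability of 10^(-Q/10), and a FILTER value is either PASS or a semicolon-separated list of failed filter codes. The function names below are hypothetical.

```python
def phred_to_error_probability(qual):
    """Convert a Phred-scaled quality score to the probability the call is wrong."""
    return 10 ** (-qual / 10)


def failed_filters(filter_value):
    """Return the list of failed filter codes; empty when the value is PASS."""
    if filter_value == "PASS":
        return []
    return [code for code in filter_value.split(";") if code]
```

A QUAL of 30, for example, corresponds to an error probability of about 1 in 1000.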
Column | Meaning |
HUGO Symbol | Gene symbol from HUGO in which the mutation resides |
Sequence ontology | Sequence Ontology annotation. See Sequence Ontology section below. |
Cancer missense driver score (1-CHASM score) | Most cancer-driving CHASM cancer driver score found in the gene. The closer to 1, the more cancer-driving the gene's variant is. |
VEST pathogenicity score (non-silent) | Most pathogenic VEST pathogenicity score found in the gene. The closer to 1, the more pathogenic the gene's variant is. |
VEST pathogenicity composite p value (non-silent) | Composite p-value based on Stouffer's Z-score method |
VEST pathogenicity FDR (non-silent) | Composite FDR based on Stouffer's Z-score method |
Driver Genes | Cancer driver gene hits (oncogenes and tumor suppressor genes) according to Vogelstein et al. |
TARGET | TARGET drug association DB hits |
Occurrences in COSMIC [gene mutated] | How many times any mutation in the gene is observed in COSMIC |
Occurrences in COSMIC by primary sites [gene mutated] | How many times any mutation in the gene is observed in COSMIC, grouped by primary sites |
Number of samples in study having the gene mutated | Number of samples in study having the gene mutated |
MuPIT Link | If the mutations in the gene fall on a known protein structure, they can be visualized with MuPIT by clicking the link in this column. |
GeneCards summary (from http://www.genecards.org) | Information on the gene containing the mutation, pulled from GeneCards |
Number of retrieved articles from PubMed | Number of the records retrieved in PubMed, using the name of the gene which contains the mutation and "cancer" as keywords. First, the keywords are searched in MeSH terms. If nothing is found, title and abstract of literature are searched. If nothing is still found, the keywords are searched without restriction on their appearance. |
PubMed search term | Link to the PubMed search result with the mutation's gene name and "cancer" as keywords |
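The composite p-value column above refers to Stouffer's Z-score method. The following is a generic sketch of the unweighted form of that method, combining one-sided p-values; whether CRAVAT applies weights or other adjustments is not stated here, so this should be read as an illustration rather than CRAVAT's implementation. The function name and the clamping constant `eps` are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_composite_p(p_values, eps=1e-12):
    """Combine one-sided p-values with the unweighted Stouffer Z-score method."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    normal = NormalDist()
    # Clamp to (0, 1) so the inverse CDF is defined, then convert to Z-scores.
    z_scores = [normal.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in p_values]
    # The sum of k standard normal Z-scores has standard deviation sqrt(k).
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - normal.cdf(combined_z)
```

Several moderately small p-values in the same gene combine into a smaller composite p-value than any one of them alone.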
Column | Meaning |
Input line number | Line number from the input file |
ID | Unique ID of a variant input line |
Chromosome | Chromosome of the mutation |
Position | Position of the mutation |
Strand | DNA strand on which the mutation lies |
Reference base(s) | Base(s) at the mutation position in the reference genome (hg18 or hg19) |
Alternate base(s) | Sequence of the mutation |
Sample ID | ID of the sample in which the mutation was observed |
HUGO Symbol | Gene symbol from HUGO in which the mutation resides |
Sequence ontology | Sequence Ontology annotation. See Sequence Ontology section below. |
Sequence ontology transcript | Transcript used to get the most severe sequence ontology. If more than one transcript has the most severe sequence ontology, the longest RefSeq transcript is chosen (if there is none, the longest Ensembl one, then the longest CCDS one, in this order). |
Protein sequence change | Position and amino acid changed by the variant, in the representative transcript |
To understand the other columns of the SNVBox analysis result table, please refer to this document for a comprehensive explanation.
Code | Meaning |
SY | Synonymous Variant |
SL | Stop Lost |
SG | Stop Gained |
MS | Missense Variant |
II | Inframe Insertion |
FI | Frameshift Insertion |
ID | Inframe Deletion |
FD | Frameshift Deletion |
CS | Complex Substitution |
The source of Sequence Ontology terms is here.
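When a variant maps to several transcripts, the most severe consequence is reported according to the order FI, FD, SG, SS, SL, II, ID, CS, MS, SY. The sketch below shows one way to apply that ranking to a list of codes; the function name is hypothetical, and note that SS appears in the severity order but not in the code table above.

```python
# Severity order as documented, from most to least severe.
SEVERITY_ORDER = ["FI", "FD", "SG", "SS", "SL", "II", "ID", "CS", "MS", "SY"]
_RANK = {code: index for index, code in enumerate(SEVERITY_ORDER)}


def most_severe(codes):
    """Return the most severe known code, or None if none is recognized."""
    known = [code for code in codes if code in _RANK]
    if not known:
        return None
    return min(known, key=_RANK.__getitem__)
```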
Code | Meaning |
A75 | The hg19 reference genome has more than one location with the 75-mer sequence from the query position |
ACR | ACRO1 (Human acromeric satellite) |
ALC | ALR/Alpha |
BSR | Beta satellite repeat/beta |
CAT | (CATTC)n |
CHM | Chromosome M |
CNR | Centromeric Repeat |
GAA | (GAATG)n |
GAG | (GAGTG)n |
HMI | High artifact island |
LMI | Low artifact island |
LSU | Large subunit rRNA Hsa |
snR | Small nuclear RNA |
SSU | Small subunit rRNA Hsa |
STL | Satellite repeat |
TAR | TAR1 |
TII | HSATII (Human satellite II DNA) |
TLM | Telomeric repeat |
The source of the mappability tags is here.