**Philcolson** · 03-05-2013, 11:57 AM

I seem to be in a similar predicament. I am unable to extract some necessary data from MAF files, such as the forward-strand read counts for the tumor and normal samples.

**m_two** · 03-07-2013, 02:17 PM

VCF files should soon begin to trickle out for many TCGA cases.

You can find them using the "Find Archives" GUI:

http://tcga-data.nci.nih.gov/tcga/findArchives.htm

or via the protected-access data portal:

https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/tcga4yeo/tumor/

The DP4 field will provide high-quality, strand-specific read counts for reads supporting the reference and variant alleles:

##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
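As a rough sketch of how those four numbers could be pulled out of a VCF record, the snippet below splits the FORMAT column and one sample column and reads the DP4 entry. The example record is made up for illustration, and the column order (normal sample first, tumor second) is an assumption; in a real file, check the `#CHROM` header line to see which sample is which.

```python
from typing import Dict, Optional

# Order of the four DP4 values, following the ##FORMAT description above.
DP4_KEYS = ("ref_fwd", "ref_rev", "alt_fwd", "alt_rev")


def parse_dp4(format_field: str, sample_field: str) -> Optional[Dict[str, int]]:
    """Return strand-specific counts for one sample, or None if DP4 is absent."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "DP4" not in keys:
        return None
    idx = keys.index("DP4")
    if idx >= len(values):
        return None  # trailing sample fields may be omitted
    parts = values[idx].split(",")
    if len(parts) != 4 or "." in parts:
        return None  # missing or malformed value
    return dict(zip(DP4_KEYS, (int(p) for p in parts)))


# Made-up record, NOT real TCGA data.
# Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
line = "1\t12345\t.\tC\tT\t.\tPASS\t.\tGT:DP:DP4\t0/0:30:16,14,0,0\t0/1:42:12,10,11,9"
fields = line.split("\t")
normal = parse_dp4(fields[8], fields[9])
tumor = parse_dp4(fields[8], fields[10])

# Forward-strand alt reads in tumor, forward-strand ref reads in normal
print(tumor["alt_fwd"], normal["ref_fwd"])
```

This covers the "forward tumor reads" and "forward normal reads" asked about earlier in the thread, provided the caller actually wrote DP4 into the file.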

Here is the latest TCGA VCF filespec:

https://wiki.nci.nih.gov/display/TCGA/TCGA+Variant+Call+Format+%28VCF%29+1.1+Specification

**Richard Finney** · 03-07-2013, 02:55 PM

Just want to clarify (if any of this is wrong, please correct me):

The MAF files are (mostly) public. Level 3 MAF files contain verified mutations; Level 2 files may contain unvalidated ones (source: https://wiki.nci.nih.gov/display/TCG...ion-Validation ).

The VCF files are "protected".

The MAF 2.3 columns are:

$ head -1 genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.3.2.0.somatic.maf
#version 2.3
$ head -2 genome.wustl.edu_BRCA.IlluminaGA_DNASeq.Level_2.3.2.0.somatic.maf | grep -v "#" | tr "\t" "\n" | nl
1 Hugo_Symbol
2 Entrez_Gene_Id
3 Center
4 NCBI_Build
5 Chromosome
6 Start_position
7 End_position
8 Strand
9 Variant_Classification
10 Variant_Type
11 Reference_Allele
12 Tumor_Seq_Allele1
13 Tumor_Seq_Allele2
14 dbSNP_RS
15 dbSNP_Val_Status
16 Tumor_Sample_Barcode
17 Matched_Norm_Sample_Barcode
18 Match_Norm_Seq_Allele1
19 Match_Norm_Seq_Allele2
20 Tumor_Validation_Allele1
21 Tumor_Validation_Allele2
22 Match_Norm_Validation_Allele1
23 Match_Norm_Validation_Allele2
24 Verification_Status
25 Validation_Status
26 Mutation_Status
27 Sequencing_Phase
28 Sequence_Source
29 Validation_Method
30 Score
31 BAM_file
32 Sequencer
33 Tumor_Sample_UUID
34 Matched_Norm_Sample_UUID
35 chromosome_name_WU
36 start_WU
37 stop_WU
38 reference_WU
39 variant_WU
40 type_WU
41 gene_name_WU
42 transcript_name_WU
43 transcript_species_WU
44 transcript_source_WU
45 transcript_version_WU
46 strand_WU
47 transcript_status_WU
48 trv_type_WU
49 c_position_WU
50 amino_acid_change_WU
51 ucsc_cons_WU
52 domain_WU
53 all_domains_WU
54 deletion_substructures_WU
55 annotation_errors_WU
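If you would rather address those columns by name than by number, here is a minimal Python sketch that does roughly what the `head | grep | tr` pipeline does: skip the `#version` comment line, treat the next line as the header, and key every row by column name. The tiny MAF text is made up and truncated to five columns just to keep the example short.

```python
import csv
import io


def read_maf(handle):
    """Yield one dict per MAF row, keyed by the column names in the header."""
    # Drop comment lines such as "#version 2.3" before the header
    data_lines = (ln for ln in handle if not ln.startswith("#"))
    # MAF is plain tab-separated text; do not treat quote characters specially
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    yield from reader


# Made-up, heavily truncated MAF content for illustration only.
sample = (
    "#version 2.3\n"
    "Hugo_Symbol\tChromosome\tStart_position\tReference_Allele\tTumor_Seq_Allele2\n"
    "GENE1\t1\t12345\tC\tT\n"
)

rows = list(read_maf(io.StringIO(sample)))
print(rows[0]["Hugo_Symbol"], rows[0]["Start_position"])
```

With a real file you would pass `open(path)` instead of the `io.StringIO` object.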

VCF files will contain the DP4 (fwd/rev) information, which is not in the MAF files.

The easiest way for me to deal with the TCGA data warehouse is to spider the site and collect the file names into a file for wget to use.
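A minimal sketch of that idea, using only the Python standard library: parse one directory-index page, keep the links that end in a given suffix, and build a list of absolute URLs. The HTML page and the `example.org` base URL below are made up; the real site layout is not shown in this thread, so treat the parsing as an assumption about a typical index page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_links(index_html, base_url, suffix=".maf"):
    """Return absolute URLs for links on the page that end with suffix."""
    parser = LinkCollector()
    parser.feed(index_html)
    return [urljoin(base_url, h) for h in parser.hrefs if h.endswith(suffix)]


# Made-up directory listing and base URL, for illustration only.
index_html = """<html><body>
<a href="../">Parent Directory</a>
<a href="example_center.somatic.maf">example_center.somatic.maf</a>
<a href="README.txt">README.txt</a>
</body></html>"""
base_url = "https://example.org/tumor/brca/"

urls = list_links(index_html, base_url)
url_list_text = "\n".join(urls) + "\n"
print(url_list_text, end="")
```

Saving `url_list_text` to a file such as `maf_urls.txt` (a hypothetical name) lets `wget -i maf_urls.txt` fetch everything on the list.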

**m_two** · 03-07-2013, 03:08 PM

For TCGA there should only be Level 2 MAF files. Level 3 sequence data would involve significantly mutated genes, domains, coding elements, and mutation hotspots.

The official MAF filespec headers can be found at http://goo.gl/6Mv1T.
Only the first 34 columns are defined in the filespec.
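To separate the spec-defined columns from whatever a center appends, a small sketch along these lines should do. The cutoff of 34 comes from the statement above; the placeholder column names in the example are hypothetical stand-ins for a real header line.

```python
SPEC_COLUMN_COUNT = 34  # number of columns defined in the filespec, per the post above


def split_header(header_line):
    """Split a tab-separated MAF header into (spec columns, extra columns)."""
    columns = header_line.rstrip("\n").split("\t")
    return columns[:SPEC_COLUMN_COUNT], columns[SPEC_COLUMN_COUNT:]


# Hypothetical header: 34 placeholder names followed by two center-specific ones.
placeholder = ["spec_col_%d" % i for i in range(1, SPEC_COLUMN_COUNT + 1)]
header_line = "\t".join(placeholder + ["chromosome_name_WU", "start_WU"]) + "\n"

spec_cols, extra_cols = split_header(header_line)
print(len(spec_cols), extra_cols)
```

Anything in `extra_cols` (the `_WU` columns in the WashU file listed earlier, for instance) is center-specific and may differ between submitters.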

TCGA VCF files should contain the AD or DP4 (fwd/rev) information which is not in most MAF files. The exact contents will depend on software support.
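Since a given file may carry AD, DP4, both, or neither, code that reads these files probably wants a fallback. The sketch below computes a variant allele fraction from AD when it is present and otherwise sums the strands in DP4. It assumes AD lists the reference depth first and the alternate depths after it, as in the general VCF convention; the sample strings are made up.

```python
def alt_fraction(format_field, sample_field):
    """Return alt reads / (ref + alt reads) from AD or DP4, or None if unavailable."""
    sample = dict(zip(format_field.split(":"), sample_field.split(":")))

    ad = sample.get("AD", "").split(",")
    dp4 = sample.get("DP4", "").split(",")

    if len(ad) >= 2 and all(x.isdigit() for x in ad):
        # Assumed layout: ref depth first, then one depth per alt allele
        depths = [int(x) for x in ad]
        ref, alt = depths[0], sum(depths[1:])
    elif len(dp4) == 4 and all(x.isdigit() for x in dp4):
        # DP4 layout: ref-forward, ref-reverse, alt-forward, alt-reverse
        ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4)
        ref, alt = ref_fwd + ref_rev, alt_fwd + alt_rev
    else:
        return None

    total = ref + alt
    return alt / total if total else None


# Made-up sample columns for illustration only.
print(alt_fraction("GT:AD", "0/1:30,10"))
print(alt_fraction("GT:DP4", "0/1:12,10,11,9"))
print(alt_fraction("GT", "0/1"))
```

Note that DP4 counts only high-quality bases, so the two routes will not necessarily agree on the same site.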

TCGA data analysis details
