  • Comparing exome sequencing data

    Hi folks!

    I recently got my exome sequencing raw data from Macrogen. Along with the raw data, they have also provided an analysis file in Excel format listing all the SNPs with all the relevant information. I have a few questions regarding those files.

    1. How do I filter through the Excel file and find the relevant mutations?
    2. For each sample I have an Excel file for germline DNA and one for tumor DNA. How do I compare the two and identify true somatic mutations?

    I am a newbie and not yet comfortable handling raw data. In the meantime, though, I want to see whether meaningful information can be extracted from the Excel files.

    I would really appreciate any input from the community.

    Thanks,

    Sadiq

  • #2
    Just to add to the above post:

    Each Excel file has the following columns for each SNV:

    chr_name
    chr_start
    chr_end
    ref_base
    alt_base
    hom_het
    snp_quality
    tot_depth
    alt_depth
    region
    gene
    change
    annotation
    dbSNP135_full
    dbSNP135_common
    1000G_2010Nov_allele_freq
    1000G_2011Oct_allele_freq
    SCS : Clinical Significance
    CLN : Variant is Clinical(LSDB,OMIM,TPA,Diagnostic)
    OMIM : Variant has OMIM ID
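With these columns, a first pass at question 1 can be done outside Excel. The sketch below assumes each sheet has been saved from Excel as tab-delimited text with the column names listed above. It keeps calls with adequate quality and depth, drops variants flagged as common in dbSNP (these are more likely to be germline polymorphisms), and reports the alternate-allele fraction alt_depth / tot_depth. The thresholds MIN_QUALITY and MIN_DEPTH are hypothetical, and so is the convention that dbSNP135_common is blank for non-common variants; check both against your own files.

```python
import csv
import io

# Hypothetical cut-offs for illustration only; tune them for your own data.
MIN_QUALITY = 20.0
MIN_DEPTH = 10


def read_variants(handle):
    """Read a tab-separated export of one SNV sheet into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def allele_fraction(row):
    """Fraction of reads supporting the alternate allele (alt_depth / tot_depth)."""
    total = int(row["tot_depth"])
    return int(row["alt_depth"]) / total if total else 0.0


def passes_filters(row, min_quality=MIN_QUALITY, min_depth=MIN_DEPTH):
    """Keep well-supported calls that are not flagged as common in dbSNP.

    Assumption: dbSNP135_common is blank (or '-'/'.') when a variant is not common.
    """
    if float(row["snp_quality"]) < min_quality:
        return False
    if int(row["tot_depth"]) < min_depth:
        return False
    return row["dbSNP135_common"].strip() in ("", "-", ".")


if __name__ == "__main__":
    # Toy input with made-up values, using a subset of the sheet's columns.
    sheet = io.StringIO(
        "chr_name\tchr_start\tsnp_quality\ttot_depth\talt_depth\tdbSNP135_common\n"
        "chr1\t1000\t99\t40\t18\t\n"
        "chr1\t2000\t12\t5\t5\trs0000\n"
    )
    for row in read_variants(sheet):
        if passes_filters(row):
            print(row["chr_name"], row["chr_start"], round(allele_fraction(row), 2))
```

On the toy input only the first row survives, printed as `chr1 1000 0.45`. Filters like these only remove obviously weak or common calls; they do not establish that what remains is real.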



    • #3
      Originally posted by sadiqsaleem09
      Hi folks!

      I recently got my exome sequencing raw data from Macrogen. Along with the raw data, they have also provided an analysis file in Excel format listing all the SNPs with all the relevant information. I have a few questions regarding those files.

      1. How do I filter through the Excel file and find the relevant mutations?
      2. For each sample I have an Excel file for germline DNA and one for tumor DNA. How do I compare the two and identify true somatic mutations?
      For point 1) Relevant to what exactly?
      For point 2) I'd go back and ask for the VCF files and then use standard tools such as BEDTools or vcftools to do the comparisons.
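Until the VCF files arrive, the crudest tumour/normal comparison is a set difference: list the tumour calls that do not appear among the matched normal's calls. The sketch below assumes both sheets are exported as tab-delimited text and identifies a call by chr_name, chr_start, ref_base and alt_base. It is only a naive filter: a site missing from the normal sheet may simply have been poorly covered or filtered in the normal, which is why callers built for tumour/normal pairs look at the read evidence in both samples instead.

```python
import csv
import io


def read_variants(handle):
    """Read a tab-separated export of one SNV sheet into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def variant_key(row):
    """Identify a call by chromosome, start position and alleles."""
    return (row["chr_name"], row["chr_start"], row["ref_base"], row["alt_base"])


def tumor_only(tumor_rows, normal_rows):
    """Return tumour calls whose key is absent from the matched normal's calls."""
    normal_keys = {variant_key(row) for row in normal_rows}
    return [row for row in tumor_rows if variant_key(row) not in normal_keys]


if __name__ == "__main__":
    header = "chr_name\tchr_start\tref_base\talt_base\n"
    # Toy data with made-up positions.
    tumor = read_variants(io.StringIO(header + "chr1\t1000\tA\tG\nchr2\t500\tC\tT\n"))
    normal = read_variants(io.StringIO(header + "chr1\t1000\tA\tG\n"))
    for row in tumor_only(tumor, normal):
        print(variant_key(row))
```

Here only `('chr2', '500', 'C', 'T')` is reported, because the chr1 call is shared with the normal sample.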

      Given that you appear to be doing tumour/normal comparisons, were the genotype calls done with software that is designed to handle the biases in tumour cells? (VarScan and SomaticSniper spring to mind.)

      If you have the BAM files or mpileup output, then you might want to look at approaches specifically designed for interrogating tumour/normal pairs (SomaticSniper, JointSNVMix).

      I'm not really sure Excel files are the best way of handling and interrogating exome data.



      • #4
        Thanks Bukowski!!

        Of course Excel is not the best format to work with, but since the other files are so large that they take a long time to upload, I was wondering whether I could analyse these Excel files in the meantime.

        1) By relevant, I meant whether the SNVs are true mutations or not.

        The information manual provided to me by the company mentioned that SAMtools was used to detect SNPs and indels, so I am assuming they used pileup and varFilter for that purpose.

        I have the FASTQ, BAM and BAI files for each sample (germline/tumor pair).

        I will check out the tools you mentioned for specifically interrogating the pairs. How are these tools different from BEDTools or vcftools?

          • #6
            If you are looking to get the intersecting or non-intersecting SNPs between tumor and normal, vcftools or BEDTools can do that. The good news is that your file is fairly close to BED format already, so you could try bedtools intersectBed.
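To feed the sheets to intersectBed, each row needs to become a BED line: chromosome, a 0-based start, and an exclusive end. The sketch below assumes chr_start and chr_end in the exported sheet are 1-based and inclusive (common for SAMtools-derived annotation, but not confirmed by the file description), so the BED start is chr_start - 1 and the BED end is chr_end. The fourth column simply carries "ref>alt" as a label.

```python
import csv
import io
import sys


def read_variants(handle):
    """Read a tab-separated export of one SNV sheet into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def to_bed_lines(rows):
    """Yield one BED line per call.

    Assumption: chr_start/chr_end are 1-based and inclusive, so the BED start
    (0-based) is chr_start - 1 and the BED end (exclusive) is chr_end.
    """
    for row in rows:
        start = int(row["chr_start"]) - 1
        end = int(row["chr_end"])
        name = f'{row["ref_base"]}>{row["alt_base"]}'
        yield f'{row["chr_name"]}\t{start}\t{end}\t{name}'


if __name__ == "__main__":
    # Toy data with a made-up position; in practice read each exported sheet
    # and redirect the output to files such as tumor.bed and normal.bed.
    sheet = io.StringIO(
        "chr_name\tchr_start\tchr_end\tref_base\talt_base\n"
        "chr1\t1000\t1000\tA\tG\n"
    )
    for line in to_bed_lines(read_variants(sheet)):
        sys.stdout.write(line + "\n")
```

With tumor.bed and normal.bed written this way, `intersectBed -a tumor.bed -b normal.bed -v` reports tumour intervals with no overlap in the normal, and `-u` reports those that do overlap. The overlap is purely positional, so a different alternate allele at the same position still counts as shared.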



            • #7
              Thank you, swbarnes2.

              Are there any Windows-based open-source platforms or software packages for running these tools?
              Thanks


                • #9
                  Originally posted by sadiqsaleem09
                  Thanks Bukowski!!

                  Of course Excel is not the best format to work with, but since the other files are so large that they take a long time to upload, I was wondering whether I could analyse these Excel files in the meantime.

                  1) By relevant, I meant whether the SNVs are true mutations or not.
                  Define 'true'. The question is still very open; I would think a 'true' mutation is one that you have gone back and verified with another genotyping method.

                  'Genuine somatic mutations' would be verifiable changes between your tumour/normal pairs.

                  'Causative mutations' would be somatic changes presumably driving the tumour phenotype.

                  Assuming you're talking about the latter (again, I'm a bioinformatician, not a biologist), I guess you're looking for mutations in known cancer genes. Perhaps compare them against a database such as COSMIC?
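A first, gene-level version of that comparison is to keep only candidate somatic calls whose gene column names a known cancer gene. The sketch below assumes you have built a set of gene symbols yourself, for example from a gene list downloaded from COSMIC (such as its Cancer Gene Census); the small set in the code is a hypothetical stand-in. It also assumes that a cell listing several genes separates them with commas or semicolons. Matching at gene level says nothing about whether the specific change has been seen before; COSMIC also records individual mutations for that.

```python
import re


def split_genes(value):
    """Split a 'gene' cell into symbols; assumes comma/semicolon separators."""
    return {g.strip() for g in re.split(r"[,;]", value) if g.strip()}


def in_known_genes(rows, known_genes):
    """Keep rows whose 'gene' cell names at least one gene in known_genes."""
    return [row for row in rows if split_genes(row.get("gene", "")) & known_genes]


if __name__ == "__main__":
    # Hypothetical gene set; in practice build it from a curated resource such as COSMIC.
    known = {"TP53", "KRAS"}
    # Toy candidate rows; only the 'gene' column is used here.
    candidates = [{"gene": "GENE_X"}, {"gene": "TP53"}, {"gene": "KRAS;GENE_Y"}]
    for row in in_known_genes(candidates, known):
        print(row["gene"])
```

The toy run prints `TP53` and `KRAS;GENE_Y`. In practice the input rows would be the tumour-only calls that survived earlier filtering.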
