  • Using SomaticSniper

    Hi

    I am using SomaticSniper to identify single nucleotide positions that differ between tumor and normal. The tool is at http://genome.wustl.edu/software/somaticsniper

    It takes a tumor bam and a normal bam and compares the two to determine the differences. It outputs a file in a format very similar to Samtools consensus format.

    My question: if I am only interested in a given region, how can I analyze just that region instead of the whole genome?
    The BAM files are very large.

    Thanks

  • #2
    Hi ardmore,

    This would be a nice thing for us to add to somaticsniper, but, in the meantime, you should be able to get around it by using named pipes.

    Something along the lines of:

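    # create the two named pipes
    mkfifo normal_region.bam
    mkfifo tumor_region.bam
    # stream each region into its pipe in the background (region queries need indexed input BAMs)
    samtools view -u normal.bam chr:start-stop > normal_region.bam &
    samtools view -u tumor.bam chr:start-stop > tumor_region.bam &

    # somaticsniper reads the two pipes as if they were BAM files
    bam-somaticsniper -f your_reference.fa tumor_region.bam normal_region.bam sniper_output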

    will work.

    Alternatively, you could just use samtools view and physically extract your region of interest. Admittedly, both methods are cumbersome for multiple regions, and it would be better to integrate this into the program.
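For the "physically extract" route over several regions, here is a minimal sketch. It assumes a POSIX shell, coordinate-sorted and indexed input BAMs, and the placeholder file names used above. The helper names region_tag and run_regions are hypothetical; the only tool invocations are samtools view -b -o and bam-somaticsniper -f with the positional arguments shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: turn "chr1:1000-2000" into "chr1_1000-2000"
# so the region string can be used in a file name.
region_tag() {
    printf '%s\n' "$1" | tr ':' '_'
}

# Hypothetical driver: for each region, write region-only BAMs to disk,
# then run SomaticSniper on that tumor/normal pair.
# Usage: run_regions ref.fa tumor.bam normal.bam REGION [REGION ...]
run_regions() {
    ref="$1"; tumor="$2"; normal="$3"
    shift 3
    for region in "$@"; do
        tag=$(region_tag "$region")
        # -b writes compressed BAM; each command finishes before the next starts
        samtools view -b -o "tumor_${tag}.bam" "$tumor" "$region" || return 1
        samtools view -b -o "normal_${tag}.bam" "$normal" "$region" || return 1
        bam-somaticsniper -f "$ref" "tumor_${tag}.bam" "normal_${tag}.bam" \
            "sniper_${tag}.txt" || return 1
    done
}
```

For example, `run_regions your_reference.fa tumor.bam normal.bam chr1:1000-2000 chr7:5000-9000` would leave one output file per region. Because every samtools command completes before somaticsniper starts, there is no need to wait on background jobs as there is with named pipes.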



    • #3
      When I run mkfifo, I get an error:
      mkfifo: cannot create fifo ...bam File exists
      Can I skip this command?
      I mean, can I run the samtools commands directly?



      • #4
        You can run the samtools commands directly, but you should replace -u with -b, and you will need to wait until they have finished writing to disk before you launch the somaticsniper command.

        If you can get the named pipes to work, you will not have to write your region-only BAM files to disk and you can launch somaticsniper immediately. Depending on the size of your region, writing the files to disk first may be quite a bit slower. On the other hand, named pipes tend to be a little tricky and much easier to screw up.
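The "File exists" error in #3 means something already sits at that path, typically the pipe created by an earlier run. An existing FIFO can simply be reused, so mkfifo only needs to run when the path is free. A small sketch, assuming a POSIX shell; ensure_fifo is a hypothetical helper name. It deliberately refuses to touch a regular file, so a region BAM written with -b is never deleted by mistake.

```shell
#!/bin/sh
# Hypothetical helper: make sure a named pipe exists at the given path.
ensure_fifo() {
    path="$1"
    if [ -p "$path" ]; then
        return 0    # a pipe left over from an earlier run can be reused as-is
    elif [ -e "$path" ]; then
        # a regular file (e.g. a BAM written with -b) is in the way: do not delete it
        echo "ensure_fifo: $path exists and is not a FIFO" >&2
        return 1
    fi
    mkfifo -- "$path"
}
```

Used in place of the two mkfifo lines (`ensure_fifo normal_region.bam`, `ensure_fifo tumor_region.bam`), the named-pipe commands can be re-run without the error.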



        • #5
          ernfrid,
          do you think SomaticSniper will be effective if the tumor genome has 40x coverage but is made up of, say, 20% tumor cells and 80% normal cells (surrounding fibroblasts, lymphocytes, etc.), and the normal genome has 40x coverage (from PBMCs)?
          Thanks!
          Robert



          • #6
            Hi Robert,

            If your tumor purity is only 20% and we assume that most of the somatic mutations are heterozygous then we're expecting that 10% of the reads should show the variant. SomaticSniper has less than 10% power to detect mutations at such low purities. I would not recommend its use for any sample with purity less than 40% (it should have ~80% power at that purity level). Furthermore, I'm not certain that 40X depth is going to be enough to really detect many mutations at purities as low as you describe.
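To see why depth matters at low purity, here is a rough sketch under a simple binomial read-sampling assumption. This is not SomaticSniper's own model; it only asks how likely it is to see at least k variant reads at depth n when the expected variant allele fraction is f. The function name prob_at_least and any particular choice of k are hypothetical, for illustration only.

```shell
#!/bin/sh
# P(X >= k) for X ~ Binomial(n, f): chance of at least k variant reads
# at depth n when each read carries the variant with probability f.
# Usage: prob_at_least n f k    (prints the probability to 4 decimals)
prob_at_least() {
    awk -v n="$1" -v f="$2" -v k="$3" 'BEGIN {
        below = 0    # running sum of P(X = i) for i < k
        c = 1        # binomial coefficient C(n, i), updated incrementally
        for (i = 0; i < k && i <= n; i++) {
            below += c * f^i * (1 - f)^(n - i)
            c = c * (n - i) / (i + 1)
        }
        p = 1 - below
        if (p < 0) p = 0    # guard against tiny negative rounding error
        printf "%.4f\n", p
    }'
}
```

Comparing, say, `prob_at_least 40 0.10 4` (40x, 20% purity, heterozygous) with `prob_at_least 40 0.20 4` (40x, 40% purity) shows how quickly the chance of even a handful of supporting reads drops as the expected allele fraction falls.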



            • #7
              Does anyone know if SomaticSniper can be used with a Unix system?



              • #8
                The pre-compiled binary is for Linux. However, the only dependency is samtools, so if you can compile samtools on your Unix system, you should be able to compile SomaticSniper from source and get it to run. I confess that we only use Linux, so I have no experience attempting this. Current links can be found on the SomaticSniper Genome Modeling Tools page here: http://gmt.genome.wustl.edu/somatic-sniper/current/



                • #9
                  Originally posted by ernfrid:
                  Hi Robert,

                  If your tumor purity is only 20% and we assume that most of the somatic mutations are heterozygous then we're expecting that 10% of the reads should show the variant. SomaticSniper has less than 10% power to detect mutations at such low purities. I would not recommend its use for any sample with purity less than 40% (it should have ~80% power at that purity level). Furthermore, I'm not certain that 40X depth is going to be enough to really detect many mutations at purities as low as you describe.
                  The coverage for my samples is around 50X, and I have a germline and tumor pair. Do you think SomaticSniper also works in this case? I don't quite understand the part "If your tumor purity is only 20% and we assume that most of the somatic mutations are heterozygous"; could you explain a bit?

                  Thanks,
                  EL



                  • #10
                    So the answer is that it depends on the purity of your tumor. I'll try to explain what I meant previously.

                    Let's say your tumor purity is 20%. This means only 20% of the cells in the sample are actually from the tumor. If a mutation were homozygous and present in every single cell of the tumor, then you would expect 20% of your reads to show the mutation. However, most somatic mutations are likely to be heterozygous. Thus, even if they are in every cell of the tumor, you would expect no more than half of the tumor-derived reads, i.e. no more than 10% of all reads, to contain these mutations. This is where the 10% number in my earlier post came from. So the lower your purity, the harder it is going to be to detect the mutations, especially if your depth is not high.

                    Many solid tumors have poor purity and it is difficult to know the purity before you begin a project. Pathology can usually provide some estimate although it is not always accurate. If you expect your tumor to be pure then SomaticSniper should do a good job at finding mutations that are present at a high abundance in the tumor sample. It will not do a good job if the purity is very low. Note that the same reasoning/caution applies to mutations that are present in only a fraction of the tumor cells.
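The arithmetic above can be written out explicitly. This is a simplified sketch assuming diploid tumor and normal cells, no copy-number change at the site, and no variant reads from the normal cells; the symbols are introduced here for illustration: p is tumor purity, c the fraction of tumor cells carrying the mutation (c = 1 if it is in every tumor cell, smaller for the subclonal case mentioned above), m the number of mutant alleles per tumor cell (1 for heterozygous, 2 for homozygous), and d the read depth.

```latex
\[
\mathrm{VAF} = p \cdot c \cdot \frac{m}{2},
\qquad
\mathbb{E}[\text{variant reads}] = d \cdot \mathrm{VAF}
\]
\[
p = 0.2,\; c = 1,\; m = 1:\quad
\mathrm{VAF} = 0.2 \cdot 1 \cdot \tfrac{1}{2} = 0.1,
\qquad
40 \cdot 0.1 = 4,\quad 50 \cdot 0.1 = 5
\]
\[
p = 0.4,\; c = 1,\; m = 1:\quad
\mathrm{VAF} = 0.2,
\qquad
40 \cdot 0.2 = 8
\]
```

So at 20% purity a clonal heterozygous mutation is expected to show up in only about 4 of 40 reads (or 5 of 50), versus about 8 of 40 at 40% purity.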



                    • #11
                      Thanks ernfrid for your reply!

                      I ran SomaticSniper and got the results. I am not sure about the format of the output: should I treat it as plain text or as VCF? One column gives the somatic score; what's the cut-off for this score, 100 or 50?
                      On the command line, I used the "-q 7 -Q 20" options. Is -Q used to set the somatic score threshold? When I checked the output file, I found it detected 23,298 SNPs. I don't think they are all true positives; what parameters should I use to filter out the false positives?

                      Thanks again.



                      • #12
                        Does SomaticSniper 1.0.3 detect indels?
                        I ran it (parameters -q 1 -Q 15 -s 0.0001) on a WGS sample, and it didn't report a single indel.
                        Basically, an indel would not have a single [GCTA] base in the 5th column (ALT) of a VCF, right?
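A quick way to check this on a VCF, assuming a tab-delimited file that follows the VCF specification (REF is column 4, ALT is column 5, multiple ALT alleles are comma-separated). Note that a deletion can have a single-base ALT (for example REF "AT", ALT "A"), so REF has to be checked too. count_non_snv is a hypothetical helper name.

```shell
#!/bin/sh
# Count VCF records whose REF or any ALT allele is not a single base,
# i.e. indels and other non-SNV records. Header lines start with '#';
# records with no ALT allele (".") are skipped.
count_non_snv() {
    awk -F '\t' '
        /^#/ { next }
        $5 == "." { next }
        {
            is_snv = (length($4) == 1)
            n = split($5, alts, ",")
            for (i = 1; i <= n; i++)
                if (alts[i] !~ /^[ACGTNacgtn]$/) is_snv = 0
            if (!is_snv) count++
        }
        END { print count + 0 }
    ' "$1"
}
```

Running `count_non_snv sniper_output.vcf` on SomaticSniper VCF output should print 0, consistent with the answer that it does not call indels.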



                        • #13
                          No, as you've observed, SomaticSniper does not detect indels.



                          • #14
                            Hello everybody,

                            I have been using SomaticSniper for a few days on paired WGS data (28x and 27x).
                            My tumor sample is almost 100% tumor and my normal sample is around 90% normal.

                            After generating the classic output, I wanted to annotate my variants (with Annovar).
                            In the classic output, the variant base is not reported, so it is impossible to annotate the variants (with Annovar, at least).
                            In the VCF output, this information is reported, but there are fewer details than in the classic output.
                            So I would need to generate both outputs to get everything, but as far as I understand, it is impossible to generate both simultaneously...

                            My second question concerns the parameters. I only specified -q 20 -Q 20.
                            Do you think I am too stringent on mapping quality?
                            Maybe I should increase the mean base quality to 30? I read that some people use 40.

                            In the raw output, I get 218,507 SNVs.
                            I then required a difference of at least 15% in variant allele frequency between the two samples, less than 20% variant reads in the normal sample, >=10x depth in the normal sample, and >=6x depth in the tumor sample.
                            In the end, 55,507 variants remain. After the annotation, I will remove the variants reported in dbSNP 129 and use what remains.
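The filters described above can be expressed as a single per-site test. This is a minimal sketch: passes_filters is a hypothetical name, the 15% difference is assumed to mean tumor VAF minus normal VAF, and extracting the per-sample depth and variant read counts from the SomaticSniper output is left out, since that depends on the output format chosen.

```shell
#!/bin/sh
# Hypothetical filter implementing the thresholds from this post:
#   tumor depth >= 6, normal depth >= 10,
#   normal VAF < 0.20, tumor VAF - normal VAF >= 0.15.
# Arguments: tumor_depth tumor_variant_reads normal_depth normal_variant_reads
# Exit status: 0 = keep the site, 1 = drop it.
passes_filters() {
    awk -v td="$1" -v tv="$2" -v nd="$3" -v nv="$4" 'BEGIN {
        if (td < 6 || nd < 10) exit 1    # depth thresholds (also avoids dividing by zero)
        tvaf = tv / td
        nvaf = nv / nd
        if (nvaf >= 0.20) exit 1          # too much variant support in the normal
        if (tvaf - nvaf < 0.15) exit 1    # tumor/normal difference too small
        exit 0
    }'
}
```

For example, `passes_filters 30 10 20 0` keeps a site with 10 of 30 variant reads in the tumor and none of 20 in the normal, while `passes_filters 30 10 20 5` drops it because 25% of the normal reads carry the variant.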

                            Do you have any advice for what I intend to do?

                            Thank you for your help,
                            Jane



                            • #15
                              What kind of information is missing from the VCF output? The FORMAT column contains almost everything. Your mapping quality threshold is OK; increase it only if you want a smaller number of variants.
                              I think your pipeline is correct.
