  • Samtools mpileup for SOLiD data - calls wrong and too few SNPs/indels

    Hello,

    I simulated SOLiD paired-end reads with dwgsim and mapped them in color space with BWA (version 0.5). When I run the Samtools mpileup command, I get wrong and strange output. With the same BAM file, other callers such as FreeBayes or GATK give the results I expect (roughly 18,000 indels/SNPs in the exome).

    I added the CS:Z: and CQ:Z: tags and read groups to my SAM file because GATK BQSR needs them, and that works well, so I don't think this is the problem.

    Here are my inputs:
    SAM file TEST_SOLID_5x_header.sam:
    ....
    @SQ SN:chr21 LN:48129895
    @SQ SN:chr22 LN:51304566
    @SQ SN:chrX LN:155270560
    @SQ SN:chrY LN:59373566
    @RG ID:five_fold_test PL:solid PU:test_unit LB:solid_test SM:five_fold_test
    @PG ID:bwa PN:bwa VN:0.5.9-r26-dev
    chr1_110292753_110292997_0_1_0_0_0:0:0_1:0:0_0 97 chr1 110292780 37 73M = 110293024 277 TTTGGGAAAGAGGTAAAATAAATAGGTGGTTACTGGGGAGGCTCCAACACAGCCAGAAGGGACACTGTTTGCT ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]WY]]]]T]]]]]Z]]]L//////////// RG:Z:five_fold_test XT:A:U CM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:73 CS:Z:A210010020022201300033003320110103121000220322010111123012202002111211001320 CQ:Z:FGHFGHHEGGEGGFHBGHBHHH?HHFHBHGFFEHGGCECBADBAC+EDC=74E>AD:7A5@##############

    -> convert to BAM file -> sort -> index

    Samtools mpileup:

    samtools mpileup -uf unmutiert_ucsc_chr1-22XY.cs.fa TEST_SOLID_5x_header_sorted.bam | bcftools view -bvcg - > var.raw.bcf

    This is the output I got:
    [mpileup] 1 samples in 1 input files
    <mpileup> Set max per-file depth to 8000
    [afs] 0:2107.299 1:27.577 2:29.124

    I expect roughly 18,000 VCF entries, but Samtools calls only 45. Example:
    chr1 36931497 . gga g 7.57 . INDEL;DP=3;VDB=0.0295;AF1=0.5336;CI95=0.5,1;DP4=1,0,1,1;MQ=24;FQ=-25.5;PV4=1,0.13,1,0.2 GT:PL:GQ 0/1:44,0,9:13
    chr1 197093832 . tacaca taca 14.4 . INDEL;DP=2;VDB=0.0588;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=29;FQ=-40.5 GT:PL:GQ 1/1:53,6,0:10
    chr1 205131047 . tata t 19.3 . INDEL;DP=2;VDB=0.059
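
    When comparing callers, a quick tally of record types can help. The following is a minimal Python sketch, not part of Samtools; the function name tally_variants is hypothetical. It assumes text VCF as input (not the binary BCF written by the command above), variants-only output, and that indels carry the INDEL flag in the INFO column, as in the records shown.

```python
from collections import Counter

def tally_variants(vcf_lines):
    """Count INDEL and non-INDEL records in an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split()
        if len(fields) < 8:
            continue  # no INFO column, so not a usable record
        info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
        # Assumption: with variants-only output, anything not flagged INDEL is a SNP.
        counts["INDEL" if "INDEL" in info_keys else "SNP"] += 1
    return counts

# The two complete records quoted in this post.
example = [
    "chr1 36931497 . gga g 7.57 . INDEL;DP=3;VDB=0.0295;AF1=0.5336;CI95=0.5,1;"
    "DP4=1,0,1,1;MQ=24;FQ=-25.5;PV4=1,0.13,1,0.2 GT:PL:GQ 0/1:44,0,9:13",
    "chr1 197093832 . tacaca taca 14.4 . INDEL;DP=2;VDB=0.0588;AF1=1;CI95=0.5,1;"
    "DP4=0,0,0,2;MQ=29;FQ=-40.5 GT:PL:GQ 1/1:53,6,0:10",
]
counts = tally_variants(example)
print(counts["INDEL"], counts["SNP"])  # 2 0
```

    For a real file, passing an open file handle works the same way, since the function only iterates over lines.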

    Any suggestions?

    Best regards

  • #2
    I suspect that BWA is not handling color space correctly. You'll notice that support for it was dropped in the most recent versions. I am not sure about the following, but I suspect that what it outputs is the so-called "double-encoded" version of color space, where color 0 is written as A, 1 as C, and so on. If so, this would obviously make any SNP calling go haywire unless you are using a double-encoded reference in Samtools. Even then, I am not sure you would get good results.

    As I said I could be wrong about the above. But I do suggest using more recent tools that are made to work with native colorspace.
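
    To make the double-encoding idea concrete, here is a minimal Python sketch contrasting it with real decoding of colors into bases. The function names are hypothetical, and the code assumes the color string contains only the digits 0 to 3 (no "." for missing calls). It is an illustration of the two representations, not a description of what BWA does internally.

```python
BASES = "ACGT"

def double_encode(colors):
    """Relabel each color as a letter (0=A, 1=C, 2=G, 3=T); nothing is decoded."""
    return "".join(BASES[int(c)] for c in colors)

def decode_colors(primer_base, colors):
    """Decode colors to bases, starting from the known primer base.

    With A, C, G, T numbered 0..3, a color is the XOR of two adjacent bases,
    so the next base is the previous base XOR the color.
    """
    current = BASES.index(primer_base)
    decoded = []
    for c in colors:
        current ^= int(c)
        decoded.append(BASES[current])
    return "".join(decoded)

colors = "21001002002220"  # first 14 colors of the CS:Z: tag quoted in post #1
print(double_encode(colors))       # GCAACAAGAAGGGA
print(decode_colors("A", colors))  # GTTTGGGAAAGAGG
```

    Comparing the SEQ column of a record against both strings is one way to check which representation a given BAM file actually contains.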

    • #3
      Also, now that I look more closely, it appears that your reference file is 'unmutiert_ucsc_chr1-22XY.cs.fa'. What is the format of that file? Color-space? double-encoded? Something else?
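
      One quick way to answer this is to look at which symbols the file contains. The following is a minimal Python sketch with a hypothetical function name; it skips N and "." padding and inspects only the first symbols it finds. Note the limitation: digits indicate native color space, but letters alone cannot distinguish base space from double-encoded color space.

```python
import io

def classify_reference(handle, max_symbols=100_000):
    """Classify a FASTA stream by the alphabet of its first informative symbols."""
    seen = set()
    examined = 0
    for line in handle:
        if line.startswith(">"):
            continue  # FASTA header line
        for ch in line.strip().upper():
            if ch in "N.":
                continue  # padding or unknown symbols carry no information
            seen.add(ch)
            examined += 1
        if examined >= max_symbols:
            break  # enough sequence inspected
    if not seen:
        return "no informative symbols"
    if seen <= set("0123"):
        return "digits: native color space"
    if seen <= set("ACGT"):
        return "letters: base space or double-encoded"
    return "mixed or unexpected alphabet"

# Tiny in-memory examples standing in for real reference files.
print(classify_reference(io.StringIO(">chr1\nNNNNNNNN\nacgtACGT\n")))
print(classify_reference(io.StringIO(">chr1\n0123321001\n")))
```

      For a real reference, pass an open file handle instead of the in-memory examples.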
