  • Samtools mpileup for SOLiD data - calls wrong and too few SNPs/INDELs

    Hello,

    I simulated SOLiD paired-end reads with dwgsim and used BWA (version 0.5.x) to map them in color space. When I use the mpileup command from Samtools, I get hardly any output, and what I do get looks wrong and strange. I used the same BAM file with other callers such as FreeBayes and GATK, and there I got the expected results (roughly 18,000 INDELs/SNPs in the exome).

    I added CS:Z: and CQ:Z: tags and read groups to my SAM file because GATK BQSR needs them, and that works well, so I don't think this is the problem.

    Here are my inputs:
    SAM file TEST_SOLID_5x_header.sam:
    ....
    @SQ SN:chr21 LN:48129895
    @SQ SN:chr22 LN:51304566
    @SQ SN:chrX LN:155270560
    @SQ SN:chrY LN:59373566
    @RG ID:five_fold_test PL:solid PU:test_unit LB:solid_test SM:five_fold_test
    @PG ID:bwa PN:bwa VN:0.5.9-r26-dev
    chr1_110292753_110292997_0_1_0_0_0:0:0_1:0:0_0 97 chr1 110292780 37 73M = 110293024 277 TTTGGGAAAGAGGTAAAATAAATAGGTGGTTACTGGGGAGGCTCCAACACAGCCAGAAGGGACACTGTTTGCT ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]WY]]]]T]]]]]Z]]]L//////////// RG:Z:five_fold_test XT:A:U CM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:73 CS:Z:A210010020022201300033003320110103121000220322010111123012202002111211001320 CQ:Z:FGHFGHHEGGEGGFHBGHBHHH?HHFHBHGFFEHGGCECBADBAC+EDC=74E>AD:7A5@##############

    -> converted to a BAM file, sorted, and indexed

    Samtools mpileup:

    samtools mpileup -uf unmutiert_ucsc_chr1-22XY.cs.fa TEST_SOLID_5x_header_sorted.bam | bcftools view -bvcg - > var.raw.bcf

    I got this output:
    [mpileup] 1 samples in 1 input files
    <mpileup> Set max per-file depth to 8000
    [afs] 0:2107.299 1:27.577 2:29.124

    I expect roughly 18,000 VCF entries, but samtools calls only 45. Example:
    chr1 36931497 . gga g 7.57 . INDEL;DP=3;VDB=0.0295;AF1=0.5336;CI95=0.5,1;DP4=1,0,1,1;MQ=24;FQ=-25.5;PV4=1,0.13,1,0.2 GT:PL:GQ 0/1:44,0,9:13
    chr1 197093832 . tacaca taca 14.4 . INDEL;DP=2;VDB=0.0588;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=29;FQ=-40.5 GT:PL:GQ 1/1:53,6,0:10
    chr1 205131047 . tata t 19.3 . INDEL;DP=2;VDB=0.059
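    To compare the samtools result with the FreeBayes and GATK call sets on an equal footing, it can help to count records the same way for each caller. A minimal Python sketch, assuming a tab-separated text VCF (for example a converted copy of var.raw.bcf): a record counts as an INDEL if its INFO field carries the INDEL flag (as in the samtools output above) or if any ALT allele differs in length from REF. classify_vcf is a hypothetical name.

```python
import sys
from collections import Counter
from typing import Iterable


def classify_vcf(lines: Iterable[str]) -> Counter:
    """Count SNP, INDEL and OTHER records in tab-separated VCF text.

    Hypothetical helper for illustration; header lines ('#') and
    malformed lines with fewer than 8 columns are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # e.g. a truncated record
        ref, alt, info = fields[3], fields[4], fields[7]
        if alt == ".":
            continue  # no alternate allele, so not a variant call
        alts = alt.split(",")
        if "INDEL" in info.split(";") or any(len(a) != len(ref) for a in alts):
            counts["INDEL"] += 1
        elif len(ref) == 1:
            counts["SNP"] += 1
        else:
            counts["OTHER"] += 1  # e.g. multi-base substitutions
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_vcf.py calls.vcf  (script name is hypothetical)
    with open(sys.argv[1]) as fh:
        print(dict(classify_vcf(fh)))
```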

    Any suggestions?

    Best regards

  • #2
    I suspect that BWA is not handling color space correctly; note that support for it was dropped in the most recent versions. While I am not sure about the following, I suspect that what it outputs is the so-called "double-encoded" form of color space, in which each color is written as a base letter (0=A, 1=C, 2=G, 3=T). If so, this would obviously make any SNP calling go haywire unless you are using a double-encoded reference in Samtools, and even then I am not sure you would get good results.

    As I said, I could be wrong about the above, but I do suggest using more recent tools that are designed to work with native color space.
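    To make the double-encoding idea concrete: colors between adjacent bases are computed with the standard SOLiD 2-base code (A=0, C=1, G=2, T=3, color = XOR of the two base codes), and each digit is then rewritten as a letter (0=A, 1=C, 2=G, 3=T, with '.' for a missing color assumed to become N). The result looks like DNA, but its letters are colors rather than bases, so comparing it position by position with a base-space reference, as a base-space caller would, produces mismatches that have nothing to do with real variants. A minimal Python sketch under those assumptions; all function names are hypothetical.

```python
# Hypothetical illustration of double-encoded color space.
# Assumes the standard SOLiD 2-base encoding (A=0, C=1, G=2, T=3, XOR).
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}
# Double encoding: each color digit written as a letter ('.' = missing color).
DOUBLE_ENCODE = {"0": "A", "1": "C", "2": "G", "3": "T", ".": "N"}


def to_colors(seq: str) -> str:
    """Colors between adjacent bases (no primer base), via XOR of base codes."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def double_encode(colors: str) -> str:
    """Rewrite color digits as letters so they look like a DNA sequence."""
    return "".join(DOUBLE_ENCODE[c] for c in colors)


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of positions that differ over the shorter of two strings."""
    n = min(len(a), len(b))
    return sum(x != y for x, y in zip(a, b)) / n if n else 0.0


if __name__ == "__main__":
    seq = "TTTGGGAAAGAGGTAAAATAAATAGG"  # start of the SEQ field in the post
    de = double_encode(to_colors(seq))
    print(de)
    # The double-encoded string is one letter shorter; compare from base 2.
    print(mismatch_fraction(seq[1:], de))
```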

    • #3
      Also, now that I look more closely, your reference file appears to be 'unmutiert_ucsc_chr1-22XY.cs.fa'. What format is that file in? Color space? Double-encoded? Something else?
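      One quick way to start answering that is to look at the characters in the sequence lines: a color-space FASTA typically holds digits 0-3 after an initial base, while base-space and double-encoded files both use A/C/G/T/N. A minimal Python sketch of that heuristic follows; sniff_fasta_alphabet and guess_format are hypothetical names, and the approach is an assumption rather than a feature of samtools or BWA.

```python
import sys
from collections import Counter


def sniff_fasta_alphabet(path: str, max_residues: int = 1_000_000) -> Counter:
    """Count sequence characters in the first part of a FASTA file."""
    counts: Counter = Counter()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                continue  # header line
            counts.update(line.strip().upper())  # upper() folds soft-masking
            if sum(counts.values()) >= max_residues:
                break  # a sample is enough for this heuristic
    return counts


def guess_format(counts: Counter) -> str:
    """Heuristic guess from the alphabet alone (hypothetical helper)."""
    colors = sum(counts[c] for c in "0123.")
    bases = sum(counts[c] for c in "ACGTN")
    if colors and colors >= bases:
        return "color space"
    if bases:
        # Double encoding reuses A/C/G/T, so the alphabet cannot tell these apart.
        return "base space or double-encoded color space"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sniff_ref.py reference.fa  (script name is hypothetical)
    print(guess_format(sniff_fasta_alphabet(sys.argv[1])))
```

      Because double encoding uses the same letters as base space, a "base space or double-encoded" result still has to be resolved from how the file was generated.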
