Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • strange mapping results bwa + SOLiD

    Hi,
    I used bwa to map a pair-end SOLiD data. samtools flagstat showed 66% mapped reads, but only 0.17% properlly mapped.
    Here are some detail and my cmd lines:
    my input file: SOLID_F3.csfasta + SOLID_F3_QV.qual
    SOLID_R3.csfasta + SOLID_R3_QV.qual
    (1) make color-space reference: bwa index -p hg18_cs.fa -a bwtsw -c hg18.fa
    (2) convert solid to fastq: perl solid2fastq.pl SOLID_ test
    this resulted test.read1.fastq.gz test.read2.fastq.gz test.single.fastq.gz
    unzip them
    (3) bwa aln -t 4 -c hg18_cs.fa test.read1.fastq > test_aln_sa1.sai
    bwa aln -t 4 -c hg18_cs.fa test.read2.fastq > test_aln_sa2.sai
    (4) bwa sampe hg18_cs.fa test_aln_sa1.sai test_aln_sa2.sai test.read1.fastq test.read2.fastq > aln.sam
    (5) samtools view -bt hg18_cs.fa -o aln.bam aln.sam
    [samopen] SAM header is present: 25 sequences.

    (6) $ samtools flagstat aln.bam
    The resutl shows mapped (66.64%), properly paired (0.15%)

    Can anyone tell me where I did wrong?

    Thanks!

  • #2
    Are you using the paired end or mate pair protocol? Was the insert size estimate from BWA around what you expect?

    Comment


    • #3
      Hi Nil,

      The data is paired end, one is 50bp while the other is 25bp. The original file name is actually F3 and F5-P2: I changed F5-P2 to R3 to make it recognizable by solid2fastq.pl provided in bwa. The insert size looks fine for me. Here are some example lines (when using ):
      [infer_isize] (25, 50, 75) percentile: (150, 160, 173)
      [infer_isize] low and high boundaries: 104 and 219 for estimating avg and std
      [infer_isize] inferred external isize from 100442 pairs: 161.585 +/- 17.497
      [infer_isize] skewness: 0.147; kurtosis: 0.337; ap_prior: 1.00e-05
      [infer_isize] inferred maximum insert size: 285 (7.05 sigma)
      [bwa_sai2sam_pe_core] time elapses: 22.68 sec
      [bwa_sai2sam_pe_core] changing coordinates of 73 alignments.
      [bwa_sai2sam_pe_core] align unmapped mate...
      [bwa_paired_sw] 20 out of 79202 Q17 singletons are mated.
      [bwa_paired_sw] 316 out of 125430 Q17 discordant pairs are fixed.
      [bwa_sai2sam_pe_core] time elapses: 14.42 sec
      [bwa_sai2sam_pe_core] refine gapped alignments... 4.12 sec
      [bwa_sai2sam_pe_core] print alignments... 1.31 sec
      [bwa_sai2sam_pe_core] 89915392 sequences have been processed.
      [bwa_sai2sam_pe_core] convert to sequence coordinate...
      [infer_isize] (25, 50, 75) percentile: (150, 160, 173)
      [infer_isize] low and high boundaries: 104 and 219 for estimating avg and std
      [infer_isize] inferred external isize from 97115 pairs: 161.534 +/- 17.614
      [infer_isize] skewness: 0.137; kurtosis: 0.339; ap_prior: 1.00e-05
      [infer_isize] inferred maximum insert size: 286 (7.05 sigma)

      Comment


      • #4
        Originally posted by Hit View Post
        Hi Nil,

        The data is paired end, one is 50bp while the other is 25bp. The original file name is actually F3 and F5-P2: I changed F5-P2 to R3 to make it recognizable by solid2fastq.pl provided in bwa. The insert size looks fine for me. Here are some example lines (when using ):
        [infer_isize] (25, 50, 75) percentile: (150, 160, 173)
        [infer_isize] low and high boundaries: 104 and 219 for estimating avg and std
        [infer_isize] inferred external isize from 100442 pairs: 161.585 +/- 17.497
        [infer_isize] skewness: 0.147; kurtosis: 0.337; ap_prior: 1.00e-05
        [infer_isize] inferred maximum insert size: 285 (7.05 sigma)
        [bwa_sai2sam_pe_core] time elapses: 22.68 sec
        [bwa_sai2sam_pe_core] changing coordinates of 73 alignments.
        [bwa_sai2sam_pe_core] align unmapped mate...
        [bwa_paired_sw] 20 out of 79202 Q17 singletons are mated.
        [bwa_paired_sw] 316 out of 125430 Q17 discordant pairs are fixed.
        [bwa_sai2sam_pe_core] time elapses: 14.42 sec
        [bwa_sai2sam_pe_core] refine gapped alignments... 4.12 sec
        [bwa_sai2sam_pe_core] print alignments... 1.31 sec
        [bwa_sai2sam_pe_core] 89915392 sequences have been processed.
        [bwa_sai2sam_pe_core] convert to sequence coordinate...
        [infer_isize] (25, 50, 75) percentile: (150, 160, 173)
        [infer_isize] low and high boundaries: 104 and 219 for estimating avg and std
        [infer_isize] inferred external isize from 97115 pairs: 161.534 +/- 17.614
        [infer_isize] skewness: 0.137; kurtosis: 0.339; ap_prior: 1.00e-05
        [infer_isize] inferred maximum insert size: 286 (7.05 sigma)
        I am not sure if BWA supports the SOLiD paired end protocol, since it assumes that the two ends are on opposite strands, which is not the case for paired end. That is why it is getting upset. You could email the BWA mailing list, PM the user "lh3" (the author of BWA), or how "lh3" reads this.

        Comment


        • #5
          Check out what Nils mentioned to confirm if that's the reason. Let us know what
          you find.

          Also, I'd like to mention that BFast has a development branch called bfast2. It
          allows you to perform the mapping step (match) using bwa's algorithm. After
          that you'd perform the normal bfast process (localalign, postprocess).

          In addition, bfast2 has also a hybrid mode that allows you to perform matches
          using different algorithms on each end of your reads. This is particularly
          useful for PE(50+25) since bfast match does not perform well with 25bp reads.
          -drd

          Comment


          • #6
            Thank you, Nils and Drio.

            I searched online and got the following information:

            As explained in this file:
            Thermo Fisher Scientific enables our customers to make the world healthier, cleaner and safer. Delivering technology, pharmaceutical and biotechnology services.


            my case should be the second one, where F3 tag is forward and F5-P2 tag is reverse.

            However, I saw this post in the BWA mailing list:


            In Heng's reply, it's said "For Illumina reads, only Forward-reverse pairs are set bit 2; for SOLiD reads, R3-forward-F3-forward or F3-reverse-R3-reverse pairs are set bit 2."

            So I'm guessing my case is: it is solid data but it's forward-reverse, so bwa correctly mapped most of them (66%) but failed to recognize the reads as perporly paired -- this is just my guessing. I would greatly appreciate if anyone can help me figure this out.

            Comment


            • #7
              Originally posted by Hit View Post
              Thank you, Nils and Drio.

              I searched online and got the following information:

              As explained in this file:
              Thermo Fisher Scientific enables our customers to make the world healthier, cleaner and safer. Delivering technology, pharmaceutical and biotechnology services.


              my case should be the second one, where F3 tag is forward and F5-P2 tag is reverse.

              However, I saw this post in the BWA mailing list:


              In Heng's reply, it's said "For Illumina reads, only Forward-reverse pairs are set bit 2; for SOLiD reads, R3-forward-F3-forward or F3-reverse-R3-reverse pairs are set bit 2."

              So I'm guessing my case is: it is solid data but it's forward-reverse, so bwa correctly mapped most of them (66%) but failed to recognize the reads as perporly paired -- this is just my guessing. I would greatly appreciate if anyone can help me figure this out.
              Note that the read-rescue code in "bwa sampe" depends on the correct orientation, so it is not just the proper-pair bit being different.

              Comment


              • #8
                Hi All,
                Can anybody tell me how to run the script solid2fastq.pl.

                I have a cs.fasta file, I have to calls the SNPs and want to use MAQ for that. However I need the files in fastq. Using BWA I did get a colour space index. But after that if i try to run
                perl solid2fastq.pl file.sc.fasta out.cs.fa it is not working
                the cs.fasta is a F3 file.
                It tell me it cannot open the file.

                Many thanks

                Comment


                • #9
                  Originally posted by tamara_0021 View Post
                  Hi All,
                  Can anybody tell me how to run the script solid2fastq.pl.

                  I have a cs.fasta file, I have to calls the SNPs and want to use MAQ for that. However I need the files in fastq. Using BWA I did get a colour space index. But after that if i try to run
                  perl solid2fastq.pl file.sc.fasta out.cs.fa it is not working
                  the cs.fasta is a F3 file.
                  It tell me it cannot open the file.

                  Many thanks
                  Check it here:
                  Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                  drio's answer.

                  Comment


                  • #10
                    ok, thanks, I will try.... but tell me what does this instruction means
                    <in.title> is the string showed in the `# Title:' line of a
                    ".csfasta" read file. Then <in.title>F3.csfasta is read sequence
                    file and <in.title>F3_QV.qual is the quality file. If
                    <in.title>R3.csfasta is present, this script assumes reads are
                    paired; otherwise reads will be regarded as single-end.

                    The read name will be <out.prefix>anel_x_y/[12] with `1' for R3
                    tag and `2' for F3. Usually you may want to use short <out.prefix>


                    I want to know what is it actually asking ?

                    Many Thanks for the link

                    Comment


                    • #11
                      I have posted a patch plus an update perl script to the bwa malinglist so bwa now can handle PE data as well as MP.

                      Comment


                      • #12
                        Hi Brugger,
                        I saw your mail in the bwa mailing list about your patch.
                        Do you know if the later version of BWA incorporates this?
                        What I have installed is 0.5.9-r16

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM
                        • seqadmin
                          Techniques and Challenges in Conservation Genomics
                          by seqadmin



                          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                          Avian Conservation
                          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                          03-08-2024, 10:41 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, Yesterday, 06:37 PM
                        0 responses
                        11 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, Yesterday, 06:07 PM
                        0 responses
                        10 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-22-2024, 10:03 AM
                        0 responses
                        51 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-21-2024, 07:32 AM
                        0 responses
                        68 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X