
  • #16
    Nope, I calculate the score as matches - mismatches - gap bases.
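A minimal sketch of the scoring rule stated above, in Python. The function name and argument names are illustrative only and are not taken from any tool discussed in this thread.

```python
def alignment_score(matches: int, mismatches: int, gap_bases: int) -> int:
    """Score as described in the post: matches minus mismatches minus gap bases."""
    return matches - mismatches - gap_bases


# Example: 95 matching bases, 3 mismatches, 2 bases inside gaps.
print(alignment_score(95, 3, 2))  # 90
```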

    Comment


    • #17
      Also, if I can venture a suggestion: when I implemented the blat MapQ value using S1 - S2, I noticed a big effect on MinMapQ, the MapQ threshold needed to achieve 99% specificity. With longer reads you need a lower threshold, and as the error rate decreases you need a lower threshold as well. I've been wondering whether, instead of using S1 to normalize the value, you could normalize by some combination of the error rate and the read length. I think you can approximate MapQ better by combining these two components than by trying to summarize them in S1.
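To make the two normalizations concrete, here is a rough Python sketch. It is not the implementation used in this thread: the scale factor of 250, the cap, and the "expected score" formula `read_len * (1 - 2 * error_rate)` are all assumptions chosen only for illustration (the latter follows from a matches-minus-mismatches score if errors are treated as mismatches only).

```python
def mapq_by_best_score(s1: float, s2: float, scale: float = 250.0, cap: int = 250) -> int:
    """Hypothetical MapQ: gap between best (s1) and second-best (s2) score, normalized by s1."""
    if s1 <= 0:
        return 0
    q = scale * (s1 - s2) / s1
    return max(0, min(cap, int(round(q))))


def mapq_by_expected_score(s1: float, s2: float, read_len: int, error_rate: float,
                           scale: float = 250.0, cap: int = 250) -> int:
    """Hypothetical alternative: normalize by a score expected from read length and error rate."""
    # Assumption: with score = matches - mismatches, a correctly placed read of
    # length L and per-base error rate e scores about L * (1 - 2e).
    expected = read_len * (1.0 - 2.0 * error_rate)
    if expected <= 0:
        return 0
    q = scale * (s1 - s2) / expected
    return max(0, min(cap, int(round(q))))


print(mapq_by_best_score(100, 50))
print(mapq_by_expected_score(90, 45, read_len=100, error_rate=0.05))
```

The second function keeps the S1 - S2 gap but divides by a quantity that depends only on read length and error rate, so the threshold would not shift with the score of each individual hit.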
      Last edited by aleferna; 08-18-2010, 07:20 AM.

      Comment


      • #18
        Here's the behavior of a simple Blat MapQ value
        Attached Files

        Comment


        • #19
          I now recall how I chose the parameters for blat. My focus was on reads of >=500bp, and for these, blat -fastMap is similar to default blat in accuracy but tens of times faster. However, for the shorter reads you are focusing on, default blat is much more accurate than blat -fastMap (though much slower). Your table largely agrees with mine for default blat.

          Actually, for 454 I would highly recommend ssaha2. Ssaha2 was designed from day one for mapping sequencing data and calling SNPs, and it has been thoroughly validated. Blat, although one of the best tools for mapping ESTs, was not originally intended for SNP finding and has not been heavily evaluated for it. From what I have heard, blat does not refine the final alignment, which may leave gaps positioned suboptimally and cause problems for indel finding. The default blat mode is also much slower and less accurate than ssaha2. In my view, it is a common mistake to overlook the superiority of ssaha2 for longer reads. The 1000 Genomes Project chooses every program for a reason.
          Last edited by lh3; 08-18-2010, 07:58 AM.

          Comment


          • #20
            @Adamo

            Sorry if I'm spamming you; I don't understand how private messages work here. Send me a message at afer at kth.se and I can send you the script to join the two BWA files, if you still want to use BWA.

            Comment


            • #21
              @Heng Li

              Well, I don't care too much about SNPs; what I work with actually resembles ChIP-seq more. All I need to know is the position, not the alignment. I like BWA because I need to work with both 454 and HiSeq data and compare them, and BWA seems able to handle both. Does ssaha2 handle high throughput?

              Comment


              • #22
                Originally posted by aleferna View Post
                @Heng Li

                Well, I don't care too much about SNPs; what I work with actually resembles ChIP-seq more. All I need to know is the position, not the alignment. I like BWA because I need to work with both 454 and HiSeq data and compare them, and BWA seems able to handle both. Does ssaha2 handle high throughput?
                Ssaha2 is designed for high throughput sequencing. As I said, it is usually faster than blat, although less easy to use, I would say.

                Comment


                • #23
                  Originally posted by query View Post
                  What is the best tool available to map 454 reads to a reference genome? What is the method used by gs reference Mapper (analysis tool that comes with 454) and does it do a decent job of mapping and identifying variants?
                  You may wish to try the mapper in NextGENe. It is especially robust for detecting indels, using a 3-step process. You can obtain a free, time-limited trial on the SoftGenetics web site.

                  Comment


                  • #24
                    @Adamo

                    Here is the script that I've been using. DISCLAIMER: I made this for my own data and it has not been tested on regular sequence data, so please read the code and make sure you understand what the script does before using it. It is tuned to join BWASW Z=100 with ALN N=4 SAM files.

                    Also, it's a Python script, but the system wouldn't upload it with the .py extension.
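For readers without access to the attachment, the following is a rough Python sketch of one way to join two SAM outputs, following the rule described elsewhere in this thread: use the long-aligner record, and fall back to the short-aligner record when the read cannot be found (or is unmapped) in the long-aligner output. It is not the attached script. The function names are hypothetical, only the first record per read name is kept, and the mapped test relies on the SAM FLAG bit 0x4 (read unmapped).

```python
def parse_sam(lines):
    """Map read name -> first alignment line, skipping '@' header lines."""
    records = {}
    for line in lines:
        if line.startswith("@"):
            continue
        name = line.split("\t", 1)[0]
        records.setdefault(name, line)  # assumption: keep only the first record per read
    return records


def is_mapped(line):
    """True when SAM FLAG bit 0x4 (read unmapped) is not set."""
    flag = int(line.split("\t")[1])
    return not (flag & 0x4)


def join_sam(long_lines, short_lines):
    """Prefer the long-aligner record; fall back to the short-aligner record."""
    long_recs = parse_sam(long_lines)
    short_recs = parse_sam(short_lines)
    names = list(long_recs) + [n for n in short_recs if n not in long_recs]
    merged = {}
    for name in names:
        long_rec = long_recs.get(name)
        if long_rec is not None and is_mapped(long_rec):
            merged[name] = long_rec
        else:
            # Missing or unmapped in the long-aligner output: use the short aligner
            # if it has the read, otherwise keep whatever the long aligner reported.
            merged[name] = short_recs.get(name, long_rec)
    return merged
```

Header lines are dropped here for brevity; a real merge would also need to carry over one set of `@` header lines and decide what to do with reads that have several records.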
                    Attached Files

                    Comment


                    • #25
                      Originally posted by aleferna View Post
                      @Adamo

                      Here is the script that I've been using. DISCLAIMER: I made this for my own data and it has not been tested on regular sequence data, so please read the code and make sure you understand what the script does before using it. It is tuned to join BWASW Z=100 with ALN N=4 SAM files.

                      Also, it's a Python script, but the system wouldn't upload it with the .py extension.
                      Ok, I didn't notice you'd posted here!
                      Thanks a lot, I'm gonna see what's in it now.

                      Comment


                      • #26
                        Instead of using Z=100 on the whole data set, it might be a better (meaning faster) idea to first align the data set with Z=1 (the default value) and then realign only the reads that do not satisfy your alignment criteria with a higher value for Z. This should speed up the process if you assume that a large fraction of the reads will map to the reference.
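The selection step of this two-pass idea could be sketched in Python as below. This is only an illustration under stated assumptions: the "alignment criteria" are taken to be "mapped (FLAG bit 0x4 not set) with MAPQ at or above a threshold", the threshold of 20 is arbitrary, and each read is assumed to have one record in the first-pass SAM output. The function name is hypothetical.

```python
def split_by_criteria(sam_lines, min_mapq=20):
    """Split first-pass SAM records into accepted lines and read names to realign."""
    accepted = []
    retry_names = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # SAM column 2: FLAG
        mapq = int(fields[4])   # SAM column 5: MAPQ
        if (flag & 0x4) or mapq < min_mapq:
            retry_names.append(fields[0])  # unmapped or low confidence: realign
        else:
            accepted.append(line)
    return accepted, retry_names
```

The names in `retry_names` would then be used to pull the corresponding reads out of the original input and run only those through the slower, higher-Z pass.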

                        Comment


                        • #27
                          Originally posted by aleferna View Post
                          The first time I ran BWA with the long aligner, I didn't realize that there was a short/long option, and since I have both read lengths in my library I was very disappointed in BWA. I started testing algorithm after algorithm and finally reviewed BWA again. This time I made a small script that just joins 2 SAM files, one from the short aligner and one from the long aligner. It chooses the alignment from the short aligner if it cannot find the read in the long aligner output; this was the winning combination.

                          I've mentioned this chart in another thread, but here you can see that BWA is the only one that can cover the full range of read sizes in 454 datasets (or in 100bp Solexa data after you remove the paired-end adapters!)



                          Moreover, I know using Z=100 seems a bit of an overkill, but with 454 data and a decent computer BWA takes just a few minutes, and I did measure Z=1, 10, 25, 50, 100, 250 and even 500. Z=100 seems to be the peak; beyond it I cannot squeeze any more specificity out of the algorithm, but you do see a change from Z=10 to Z=100.
                          Looking at your chart, you actually get better sensitivity for longer reads with low error rates using the default settings instead of using Z=100. Any idea what causes a higher Z-best value to result in lower sensitivity?

                          Comment


                          • #28
                            Originally posted by lh3 View Post
                            Ssaha2 is designed for high throughput sequencing. As I said, it is usually faster than blat, although less easy to use, I would say.
                            Actually, I couldn't install it on Ubuntu. After extraction, I could see the files (read me, ssaha2, ssaha2build, ssaha snp). However, when I typed the command into the terminal, it told me that the command could not be found. This has bothered me for a week.

                            My RNA-seq data is from a species whose genome has not been sequenced, but the zebrafish genome may be suitable because these samples are fish that are close relatives of zebrafish. The goal is to analyze SNPs and recombination in hybrids and their parents. Does anyone have any ideas?

                            I really appreciate your help!
                            Last edited by boyzoe; 08-23-2010, 07:35 AM.

                            Comment


                            • #29
                              Originally posted by boyzoe View Post
                              Actually, I couldn't install it on Ubuntu. After extraction, I could see the files (read me, ssaha2, ssaha2build, ssaha snp). However, when I typed the command into the terminal, it told me that the command could not be found. This has bothered me for a week.
                              Try:

                              ./ssaha2

                              (assuming the file is in the current directory, which is what the dot means in Unix). If you tried this:

                              ssaha2

                              it would look for an installed copy of ssaha2 on the system path, but it would not try the current directory. At least, that is how recent versions of Ubuntu are configured.
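The lookup behavior described above can be reproduced with a small Python sketch using the standard library's `shutil.which`, which searches a list of directories for a bare command name but checks a name containing a directory component directly. The tool name `demo_tool` and the temporary directories are stand-ins made up for this illustration, and the sketch assumes a Unix-like system.

```python
import os
import shutil
import stat
import tempfile


def lookup_demo():
    """Show that a bare command name is resolved only via the search path."""
    with tempfile.TemporaryDirectory() as tool_dir, tempfile.TemporaryDirectory() as empty_dir:
        tool = os.path.join(tool_dir, "demo_tool")  # hypothetical executable
        with open(tool, "w") as fh:
            fh.write("#!/bin/sh\necho stand-in\n")
        os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

        # Bare name, search path does not contain the tool's directory: not found.
        not_on_path = shutil.which("demo_tool", path=empty_dir)
        # Bare name, tool's directory added to the search path: found.
        on_path = shutil.which("demo_tool", path=tool_dir)
        # Explicit path (like ./name in a shell): checked directly, no search.
        explicit = shutil.which(tool)
        return not_on_path is None, on_path is not None, explicit is not None


print(lookup_demo())
```

In a shell, the equivalent choices are to run the program with an explicit path or to add its directory to the `PATH` environment variable.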

                              Comment


                              • #30
                                Originally posted by robs View Post
                                Looking at your chart, you actually get better sensitivity for longer reads with low error rates using the default settings instead of using Z=100. Any idea what causes a higher Z-best value to result in lower sensitivity?
                                @robs

                                You mean like 200bp at 0% error, where Z=100 gives 97.29% and the default gives 97.30%?

                                Comment
