
  • The best software for mapping SOLiD reads?

    Hi All,

    Could you recommend software for mapping SOLiD reads?

    I do not care about the running time and splice mapping. My purpose is to map as many reads as possible to the reference genome.

    I tested a few programs, including corona, socs, and bfast (with default parameters). It seems that corona can map ~55% of reads, while socs maps ~25% and bfast maps ~35%.

    Could you give me some suggestions?

    Thank you!

    -Cuncong

  • #2
    BFAST is giving me the highest % mapped reads so I am surprised by your numbers. Did you use all ten indexes?


    • #3
      Originally posted by cczhong
      Hi All,

      Could you recommend software for mapping SOLiD reads?

      I do not care about the running time and splice mapping. My purpose is to map as many reads as possible to the reference genome.

      I tested a few programs, including corona, socs, and bfast (with default parameters). It seems that corona can map ~55% of reads, while socs maps ~25% and bfast maps ~35%.

      Could you give me some suggestions?

      Thank you!

      -Cuncong
      You should see approximately 55-60% of all bases map with BFAST when mapping human genomic DNA sequenced on SOLiD v2 & v3. As for software for SOLiD (I am the author of BFAST), my other three favorites are:
      BWA
      Mosaik
      MAQ

      Keep checking back every once in a while since new software for SOLiD is always being developed


      • #4
        hum... it seems that I have used only one index for BFAST. I will try the whole set of them and report the mapped fraction as soon as I get the result. Thank you!


        • #5
          Originally posted by cczhong
          hum... it seems that I have used only one index for BFAST. I will try the whole set of them and report the mapped fraction as soon as I get the result. Thank you!
          Great! What organism are you sequencing? Depending on the polymorphism rate of the organism (e.g., mouse), you may need to increase the sensitivity, since the ten indexes were designed for human resequencing. I look forward to your results.


          • #6
            Compared to corona (50.4), BFAST produced significantly more unique alignments (41% vs. 68%, human data, ungapped). If you plan to use it, I would suggest filtering reads on quality first, since it reports far too many alignments from low-quality reads. Filtering by mapping quality helps to some extent, but in general reads with an average QV < 10 just increase the noise.
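
A quality pre-filter along these lines could look roughly like the sketch below. It assumes each read's qualities are available as space-separated integers (as in SOLiD `.qual` files); the record layout, the function names, and the example reads are hypothetical and are not part of BFAST or corona.

```python
from statistics import mean

def average_qv(qual_line):
    """Mean quality value of one read, given space-separated integer QVs."""
    values = [int(v) for v in qual_line.split()]
    return mean(values) if values else 0.0

def filter_reads(records, min_avg_qv=10):
    """Yield (name, colors, qual_line) records whose mean QV is at least min_avg_qv."""
    for name, colors, qual_line in records:
        if average_qv(qual_line) >= min_avg_qv:
            yield name, colors, qual_line

# Two made-up reads: one of decent quality, one below the QV 10 cutoff.
reads = [
    ("read_1", "T0120312", "30 28 25 31 27 22 20"),
    ("read_2", "T3321001", "5 4 8 6 3 9 7"),
]
kept = [name for name, _, _ in filter_reads(reads)]
print(kept)
```

Only `read_1` survives here. Whether a mean, a minimum, or a count of low-QV positions is the better criterion is a judgment call that the post does not settle.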


            • #7
              The new corona-replacement software from ABI/Lifetech, called 'Bioscope', is supposed to give better mapping results. I do not have good numbers from it at the moment, but I will try to get at least a bfast vs. bioscope comparison done within the week.


              • #8
                We just got Bioscope; it would be great if you could share the results of bfast vs. bioscope.


                • #9
                  Mapping software for SOLiD data

                  Hi. Although it is quite slow (but much faster if you use the precompiled binaries built with Intel's compiler rather than recompiling from scratch yourself), I would recommend SHRiMP. It does local alignment and has excellent sensitivity. You will need around 16 GB for 4 million 50bp reads vs. the human genome.

                  Regards

                  Alessandro



                  Originally posted by cczhong
                  Hi All,

                  Could you recommend software for mapping SOLiD reads?

                  I do not care about the running time and splice mapping. My purpose is to map as many reads as possible to the reference genome.

                  I tested a few programs, including corona, socs, and bfast (with default parameters). It seems that corona can map ~55% of reads, while socs maps ~25% and bfast maps ~35%.

                  Could you give me some suggestions?

                  Thank you!

                  -Cuncong


                  • #10
                    speed of bfast

                    We tried mapping 25bp SOLiD reads against the E. coli genome. The bfast mapping is very slow (more than 20 times slower than bwa). Perhaps this is caused by the masks we used or by other settings. Can anyone give an estimate of how many million SOLiD reads per hour can be mapped on a typical CPU using bfast (assuming we allow up to 3 errors, including short indels)?


                    • #11
                      I've installed and run data using Bioscope v1.0 for the last few months. With the progressive mapping approach, we are receiving significantly better alignment results with an average alignment read length of 40bp from a 50bp run. We have been consistently obtaining 70-80% mapped reads on human and mouse data.

                      I haven't run BFAST much on SOLiD data so can't provide a comment there.

                      I'll have to post a few caveats that I ran into when installing Bioscope. There are a few counterintuitive issues when installing on a decent-size cluster (i.e., ~500 nodes).
                      Last edited by rdeborja; 05-19-2010, 01:40 PM. Reason: forgot to mention species of reads mapped


                      • #12
                        Originally posted by rdeborja
                        I've installed and run data using Bioscope v1.0 for the last few months. With the progressive mapping approach, we are receiving significantly better alignment results with an average alignment read length of 40bp from a 50bp run. We have been consistently obtaining 70-80% mapped reads on human and mouse data.

                        I haven't run BFAST much on SOLiD data so can't provide a comment there.

                        I'll have to post a few caveats that I ran into when installing Bioscope. There are a few counterintuitive issues when installing on a decent-size cluster (i.e., ~500 nodes).
                        I was told at the SOLiD User's Summit in September of last year that BioScope used a "seeded extension" method -- which might be characterized as a "progressive mapping approach". But, if only for historical reasons, I suppose it should be distinguished from the "progressive extension" methodology deployed by Global SETs. Since Global SETs was sold, (BioScope is "free" as in "free beer" -- well, as long as you own a SOLiD) and the one person I heard talk about it seemed to have a low opinion of Global SETs -- I might well be accused of being pedantic for even bringing it up.

                        Anyway, I'm with you. We seem to be getting very high levels of mapping using v3+ chemistry and BioScope 1.x. Generally 70-80% with both mate pair and fragment runs on small to moderate complexity genomes. (yeast, rice, chicken). Okay, the rice and chicken used mate pair and the yeast was fragment.

                        Of course, the rub is that validating the mapping positions would be, well, non-trivial.

                        --
                        Phillip


                        • #13
                          Originally posted by ech
                          We tried mapping 25bp SOLiD reads against the E. coli genome. The bfast mapping is very slow (more than 20 times slower than bwa). Perhaps this is caused by the masks we used or by other settings. Can anyone give an estimate of how many million SOLiD reads per hour can be mapped on a typical CPU using bfast (assuming we allow up to 3 errors, including short indels)?
                          BFAST first looks for candidate alignment locations (CALs) against the indexes/SA and then performs Smith-Waterman (SW) alignment against those. You cannot specify the number of mismatches directly, but you can tweak the SW parameters (I haven't had to tweak them).

                          That being said, you should be able to map 10M reads (50bp, with the recommended default human indexes) in about 5 hours on a typical 8-core machine with 32G of RAM. These numbers can change depending on your hardware, particularly the storage system.
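
To put that figure into the "reads per hour" terms asked for, here is a small arithmetic sketch. The per-core number assumes the work scales evenly across all 8 cores, which is an assumption of the sketch and not something stated above; the function name is illustrative.

```python
def reads_per_hour(total_reads, hours, cores=1):
    """Return (machine throughput, per-core throughput) in reads per hour."""
    per_machine = total_reads / hours
    return per_machine, per_machine / cores

# 10M reads in about 5 hours on an 8-core machine.
machine, per_core = reads_per_hour(10_000_000, 5, cores=8)
print(f"{machine:,.0f} reads/hour on the machine, {per_core:,.0f} per core")
```

That works out to roughly 2 million reads per hour for the whole machine, or about a quarter of a million per core under the even-scaling assumption.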

                          Yes bwa is faster and generates pretty good alignments for CS data. But Bfast is the only open source aligner I know that truly works in CS natively. In addition, the reported alignments (SAM) come with very useful CS information (namely, original CS calls/quals, CS correction, etc...)
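
For readers new to color space (CS): each SOLiD color encodes the transition between two adjacent bases. The sketch below shows the standard di-base encoding, using the two-bit codes A=0, C=1, G=2, T=3 combined with XOR. The function names are illustrative and this is not BFAST code; for simplicity the read here starts with its own first base, whereas real reads start with the last base of the primer.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"

def encode_colors(bases):
    """Encode a nucleotide string as its first base plus one color per adjacent pair."""
    colors = [str(BASE_TO_BITS[a] ^ BASE_TO_BITS[b]) for a, b in zip(bases, bases[1:])]
    return bases[0] + "".join(colors)

def decode_colors(cs_read):
    """Decode a color-space read (leading base followed by colors) back to bases."""
    bases = [cs_read[0]]
    for color in cs_read[1:]:
        # Each color, XORed with the previous base, gives the next base.
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ int(color)])
    return "".join(bases)

cs = encode_colors("ATGGCA")
print(cs)                       # A31031
print(decode_colors(cs))        # ATGGCA
print(decode_colors("A32031"))  # one wrong color: ATCCGT
```

The last line shows why handling CS natively matters: a single miscalled color changes every decoded base downstream of it, so decoding reads to bases before alignment turns one sequencing error into many mismatches.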

                          I also like the fact you get small indels (up to 20 bp) with BF.

                          Give bfast a try, you won't regret it.
                          -drd


                          • #14
                            Comparing aligners for SOLiD data

                            Originally posted by westerman
                            The new corona-replacement software from ABI/Lifetech, called 'Bioscope', is supposed to give better mapping results. I do not have good numbers from it at the moment, but I will try to get at least a bfast vs. bioscope comparison done within the week.
                            What happened to your comparison? The result would be very interesting to know because no one seems to compare those two tools that both claim to be the best for SOLiD data. BFAST is the only popular aligner I haven't tried yet because no one in my institution has any experience with it.
                            So far, I have used BWA version 0.5.8, BioScope 1.2, and NovoalignCS beta on a set of ~60 million reads from a human transcriptome project (of course not representative, but unbiased). BWA was fastest (3.5 h on 8 CPUs) but only mapped 34% of the reads with default parameters. I will try -l 25 -n 8 as recommended in a thread and see if it gets better. The BioScope WT pipeline was slower (due to merging reads mapped to genome, filters, and splice junctions, in total 10 h on 16 CPUs) and mapped 79% to the genome. NovoalignCS mapped 57% and was by far the slowest (took almost a week on 16 CPUs), but it's beta after all.
                            I agree that installing BioScope is complicated and I dislike the fact that it's quite inscrutable how the programs inside work. Using BWA is easy but the conversion into pseudo-colorspace has the huge disadvantage of missing color space sequences in the sam/bam file (there is no CS tag). For NovoalignCS, parameters that work well for test data will have to be adjusted for real world data in order to make it really comparable.
                            Another question is if the number of mapped reads is a good criterion ...
                            Edit:
                            BWA -l 25 -n 8 took 20h and mapped 50%.
                            With the new defaults, NovoalignCS improved speed by almost 50% to 75h but unfortunately, the mapping rate became a bit lower.
                            BFAST took 25h on 8 CPUs and mapped 69% using the option to output only reads that have a unique best-scoring alignment (this underestimates the number of reads that could be mapped, in contrast to BWA, where a random best alignment is output).
                            The only disadvantage for BFAST is big files: for the human genome, I ended up with 10 index files of 12 GB that have to be read into memory for mapping, then there is a temp file for each for storing the matches, each almost 5 GB => a lot of I/O going on, in which our cluster is not that great.
                            After learning that BioScope does a lot of hardclipping and ungapped alignment (see below), I think the winner in the category "gapped aligners for SOLiD" with criteria "highest number of mapped reads" is quite obvious!
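
Since the runs above used different CPU counts, it may help to put them on a common CPU-hour scale. The sketch below does that for the three runs where both wall-clock time and CPU count are stated; the tuned BWA run and the NovoalignCS runs are left out because one of the two numbers is missing or only approximate. The table layout and function name are just for illustration.

```python
runs = [
    # (aligner, wall-clock hours, CPUs, percent of reads mapped)
    ("BWA 0.5.8 (defaults)", 3.5, 8, 34),
    ("BioScope 1.2 WT pipeline", 10, 16, 79),
    ("BFAST (unique best only)", 25, 8, 69),
]

def cpu_hours(wall_hours, cpus):
    """Total CPU time, assuming all CPUs were busy for the whole run."""
    return wall_hours * cpus

for name, hours, cpus, mapped in runs:
    print(f"{name:<28}{cpu_hours(hours, cpus):>8.0f} CPU-h{mapped:>6}% mapped")
```

The mapped percentages are still not directly comparable: as noted, BioScope hard-clips and aligns ungapped, and the BFAST figure counts only reads with a unique best-scoring alignment.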
                            Last edited by epigen; 07-13-2010, 09:35 AM. Reason: BWA and NovoalignCS improved with changed settings; added BFAST


                            • #15
                              Originally posted by epigen
                              What happened to your comparison? The result would be very interesting to know because no one seems to compare those two tools that both claim to be the best for SOLiD data. BFAST is the only popular aligner I haven't tried yet because no one in my institution has any experience with it.
                              So far, I have used BWA version 0.5.8, BioScope 1.2, and NovoalignCS beta on a set of ~60 million reads from a human transcriptome project (of course not representative, but unbiased). BWA was fastest (3.5 h on 8 CPUs) but only mapped 34% of the reads with default parameters. I will try -l 25 -n 8 as recommended in a thread and see if it gets better. The BioScope WT pipeline was slower (due to merging reads mapped to genome, filters, and splice junctions, in total 10 h on 16 CPUs) and mapped 79% to the genome. NovoalignCS mapped 57% and was by far the slowest (took almost a week on 16 CPUs), but it's beta after all.
                              I agree that installing BioScope is complicated and I dislike the fact that it's quite inscrutable how the programs inside work. Using BWA is easy but the conversion into pseudo-colorspace has the huge disadvantage of missing color space sequences in the sam/bam file (there is no CS tag). For NovoalignCS, parameters that work well for test data will have to be adjusted for real world data in order to make it really comparable.
                              Another question is if the number of mapped reads is a good criterion ...
                              (Author of BFAST here :P). Mapped reads really only tells you when something went wrong (low mapping rate), since a high mapping rate can be achieved by aligning everything to chromosome 1 position 1. What matters are the variant calls produced at the end of the day. Unfortunately, this involves post-alignment filtering and then plugging the alignments into variant calling (i.e. many places to go wrong and add bias). I would be happy to help you set up BFAST for your comparison.

                              Some useful links for your own edification:
                              (BWA author) http://lh3lh3.users.sourceforge.net/NGSalign.shtml
                              (BFAST author) http://www.nilshomer.com/index.php?title=NGS_Alignment
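
The point about mapping rate can be made concrete with a toy simulation. This is only a sketch on made-up data: the "truth" positions are random, and the degenerate "aligner" is the chromosome 1, position 1 example from the post, not any real tool.

```python
import random

def mapping_rate(alignments):
    """Fraction of reads that were assigned any position at all."""
    return sum(pos is not None for pos in alignments) / len(alignments)

def accuracy(alignments, truth):
    """Fraction of reads placed at their true origin."""
    return sum(a == t for a, t in zip(alignments, truth)) / len(truth)

random.seed(0)
# Simulated truth: each read comes from a random (chromosome, position >= 2).
truth = [(f"chr{random.randint(1, 22)}", random.randint(2, 1_000_000))
         for _ in range(1000)]

# A degenerate "aligner" that places every read at chr1 position 1.
degenerate = [("chr1", 1)] * len(truth)

print(mapping_rate(degenerate))     # 1.0
print(accuracy(degenerate, truth))  # 0.0
```

Every read is "mapped", yet none is placed correctly, which is why the mapping rate mostly serves as a warning sign when it is low and says little about alignment quality when it is high.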
