  • How to align SOLiD data?

    Hey all,

    I've only ever looked at Illumina data, and we are being given exome data done with SOLiD sequencing. I am curious as to opinions regarding the optimal way to align this data. I figure once I get it into SAM format I can do what I normally do. Would NovoalignCS be a good option? I do not have a licensed version but I am not in a rush and can probably get by with the free downloadable version. Any other opinions? This is a one time thing to my knowledge so I just want to find something that will work and implement: knowing all of the caveats of each method is not important to me in this case. Thanks a bunch!

  • #2
    If time is not a constraint, and you want something easy, then use NovoalignCS.


    • #3
      Originally posted by nilshomer
      If time is not a constraint, and you want something easy, then use NovoalignCS.
      Alright, sounds good. Thanks!


      • #4
        BWA

        BWA is good, also. I get very good results with SOLID data.

        Beware that you must provide the right "color space" parameters both when indexing the genome (the very first step) and during the alignment itself. There's also a separate "csfasta to fastq" step; the BWA package contains a Perl script to do this. Beware, also, that SOLiD is "transition based": each call encodes the transition between two adjacent bases, so once you get a bad call, the rest of the read is bad when it is translated to bases. BWA "clips" the read.


        • #5
          Alright, alright, now I'm intrigued.

          I had no idea there were color space parameters. Could you (or anyone) provide a link that explains this aspect and how I need to incorporate it into the alignment? Also, how do I determine the right settings (i.e., can I look at the data and figure it out, or do I need to contact the people who ran the instrument)?

          Now that I'm curious, might as well do it right. Any links would be appreciated. Is BWA the predominantly used aligner for SOLiD data?


          • #6
            Bioscope/Lifescope is the aligner for SOLiD data that produces the most mapped reads (I think about 70-90%). The reads will often be end-clipped to find a match. Other programs seem to be phasing out colour-space support, and have much lower mapping proportions (about 30-60%).


            • #7
              Novoalign is still an excellent choice if you had to pick a colourspace aligner.
              If this is not 5500 data, that is where I would start.


              • #8
                richardfinney: I have been having trouble getting BWA to perform well on color-space data. It would be most appreciated if you could share the settings that you use.


                • #9
                  With NovoalignCS and setting t = 150, I'm uniquely aligning up to 50% of the read sequences, although only about 30-35% are aligning as pairs. I'm probably fine with this, and thanks for the help, everyone. That said, I'm not sure if there's an easy way from looking at the files to see if it's from the 5500, and I googled around for bio/lifescope and could not find a downloadable aligner anywhere.
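For anyone wanting to check figures like these on their own output, here is a rough Python sketch that counts mapped and properly paired reads from the FLAG column of a SAM file. The FLAG bits used (0x1 paired, 0x2 proper pair, 0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification; the function name and the toy records are hypothetical. Note that it counts mapped reads, not uniquely mapped ones, since how uniqueness is reported depends on the aligner.

```python
def mapping_summary(sam_lines):
    """Count primary records that are mapped and properly paired."""
    total = mapped = proper = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue                      # skip blank lines and header lines
        flag = int(line.split("\t")[1])   # FLAG is the second SAM column
        if flag & 0x100 or flag & 0x800:
            continue                      # skip secondary/supplementary records
        total += 1
        if not flag & 0x4:                # 0x4 set means the read is unmapped
            mapped += 1
            if flag & 0x1 and flag & 0x2:
                proper += 1               # paired and aligned as a proper pair
    return {"reads": total, "mapped": mapped, "properly_paired": proper}


# Toy records (made up): one proper pair, one read whose mate is unmapped.
toy_sam = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t73\tchr1\t500\t37\t50M\t=\t500\t0\t*\t*",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\t*\t*",
]

summary = mapping_summary(toy_sam)
print(summary)  # {'reads': 4, 'mapped': 3, 'properly_paired': 2}
```

On a real file you would pass an open file handle instead of the list, for example `mapping_summary(open("aligned.sam"))`, where the file name is a placeholder.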


                  • #10
                    Illumina definitely maps more reads. A quality drop of one base in the middle of the read is still usable with Illumina. Because SOLiD is transition based, you are lost if you "miss" a base pair. Bioscope (at least the free old one installed on BIOWULF at NIH) tries to align these anyway and they're often junk; you get these runt reads spread out all over the place. BWA will "soft clip" (I believe that's what it's called) some of these reads, but most it will just assign as unmapped. The unmapped percentage of reads for SOLiD BWA versus Illumina BWA is much higher. I'm not an expert and I can't rule out wetlab folks just not being as good with SOLiD samples, but I ~suspect~ that the SOLiD techniques are just plain trickier. Bioscope goes to great pains to try and hide this situation. BWA is better because it "files the unmapped reads into the unmapped directory". However, the alignments that are good look right, and the SNPs discovered make sense and can often enough be verified. In that sense SOLiD is quite good and usable; just don't expect 97% mapping of reads.

                    I use no special parameters to BWA. I think they don't affect the results too much. The defaults are fine. If anyone knows better, please let us know hereabouts. The parameters to make BWA work with SOLID are documented in the BWA documentation. BWA also provides a perl script to convert SOLID CSFASTA/QUAL files to fastq for input into BWA. The target (i.e. genome) indexing for COLORSPACE/SOLID is not the same as for non-color space (e.g. Illumina).


                    • #11
                      Originally posted by Richard Finney
                      Illumina definitely maps more reads. A quality drop of one base in the middle of the read is still usable with Illumina. Because SOLiD is transition based, you are lost if you "miss" a base pair. Bioscope (at least the free old one installed on BIOWULF at NIH) tries to align these anyway and they're often junk; you get these runt reads spread out all over the place. BWA will "soft clip" (I believe that's what it's called) some of these reads, but most it will just assign as unmapped. The unmapped percentage of reads for SOLiD BWA versus Illumina BWA is much higher. I'm not an expert and I can't rule out wetlab folks just not being as good with SOLiD samples, but I ~suspect~ that the SOLiD techniques are just plain trickier. Bioscope goes to great pains to try and hide this situation. BWA is better because it "files the unmapped reads into the unmapped directory". However, the alignments that are good look right, and the SNPs discovered make sense and can often enough be verified. In that sense SOLiD is quite good and usable; just don't expect 97% mapping of reads.

                      I use no special parameters to BWA. I think they don't affect the results too much. The defaults are fine. If anyone knows better, please let us know hereabouts. The parameters to make BWA work with SOLID are documented in the BWA documentation. BWA also provides a perl script to convert SOLID CSFASTA/QUAL files to fastq for input into BWA. The target (i.e. genome) indexing for COLORSPACE/SOLID is not the same as for non-color space (e.g. Illumina).
                      3-4 years of SOLiD and people still don't get it. I know colourspace is the minority, but it can be managed relatively easily.
                      Mapping percentages should not be compared between platforms. SOLiD does minimal filtering prior to alignment, while Illumina does generous amounts of filtering. This leads to vastly different mapping percentages.

                      You are not lost if you miss a base as long as you are aligning in colourspace. Translating out of colourspace prior to alignment is not the proper way to handle CS data. Align with known CS tools, then use more common tools for the basespace output.
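A small Python sketch of why comparing in colourspace copes with a missed call: the reference is converted to colours and the read is compared colour by colour. It assumes the standard two-base code (colour = XOR of the 2-bit codes of adjacent bases, A=0, C=1, G=2, T=3). The sequences and function names are made up, and real colourspace aligners do far more than this (quality values, indels, and so on).

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def to_colors(seq):
    """Colour string for the transitions in seq (one shorter than seq)."""
    codes = [BASES.index(b) for b in seq]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def mismatch_positions(a, b):
    """Indices at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "TACGGTCAT"                  # hypothetical reference stretch
ref_colors = to_colors(reference)        # "31301213"

# Case 1: a read with a single miscalled colour (index 3: 0 -> 2).
read_with_error = "31321213"
error_hits = mismatch_positions(ref_colors, read_with_error)

# Case 2: a read from a sample carrying a real SNP (G -> A at base index 4).
snp_colors = to_colors("TACGATCAT")      # "31323213"
snp_hits = mismatch_positions(ref_colors, snp_colors)

print(error_hits)  # [3]    one isolated colour mismatch
print(snp_hits)    # [3, 4] two adjacent colour mismatches
```

In colourspace the miscall stays a single mismatch instead of corrupting the rest of the read, while a real SNP shows up as two adjacent colour changes, which is the pattern colourspace-aware tools can use to tell the two apart.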
                      Last edited by Guest; 01-29-2012, 05:20 PM.


                      • #12
                        Thanks. I'm no expert on SOLiD/colorspace and I'm fine with being schooled on this. I'm just reporting my experiences with SOLiD on Bioscope and BWA. I actually don't know exactly why there are so many hard-clipped runt reads using SOLiD Bioscope; I'm just guessing, having stared at it too long. I still think the bias/noise/wackiness, whatever it is, with SOLiD is manageable, and the long reads that do map are good reads. It is a perfectly usable system, and I'm sure the SOLiD folks are working on addressing any issues and improving their processes.
