  • Split read mapping

    Mosaik does split-read mapping for structural variation, but does anyone know of any other program that does split-read mapping?
    Thank you.
    From Antwerp
    hi1

  • #2
    Originally posted by bosTau2
    Mosaik does split-read mapping for structural variation, but does anyone know of any other program that does split-read mapping?
    Thank you.
    From Antwerp
    hi1
    Split read mapping? Please be more specific.


    • #3
      Split read mapping: a read is mapped to two separate locations because of possible structural variation.
      -------- A ----------- break --------------- B -----------------
      |==============||=====================|

      This mapping makes sense for reads longer than 50-76 bp, or for 454 reads, given sufficient coverage.
      Split reads should be flagged with 256 in SAM, so any split read should have a SAM flag greater than 256. So far I have not seen any split reads.

      Mosaik does this, and BC specializes in this area, but I think the released version does not. I thought ssaha did this, but other people told me it does not.
      hi1


      • #4
        Hi there,

        We have been using the split read methodology quite a bit with MOSAIK.

        We have a new version out that makes this available to the masses. In addition to MOSAIK, we used some external code for our split-read alignments.

        Briefly, our process is as follows:

        1. Align the reads against a reference sequence, but remember to store the unaligned reads (-rur parameter).

        e.g. "-rur ChrX_unaligned.fq" will store the unaligned reads in the specified fastq file.

        2. Build a new read archive using the unaligned reads from step 1.

        3. We align the reads as normal, but instead of requiring the entire read to align, we specify that we want to align at least X bp of a read (-min X).

        Normally, MOSAIK will count the unaligned portions of the read as mismatches. In this case, this is not what we want - so we deactivate that using the -mmal parameter.

        e.g. If I wanted to align at least 32 bp of a read, I would add "-min 32 -mmal" to my MosaikAligner command line.

        These reads didn't align to the reference for a reason. One of those reasons is that they align to a non-contiguous span. A good example is a read that aligns to the end of one exon and the beginning of another.

        4. Using some in-house programs, we take those alignments, trim off the parts that aligned, create a new read archive, and align the reads yet again.

        You can easily do something similar with the MosaikTools C++ or Perl API. Or you could export the reads into some other format and work from there.

        The reads that aligned to two significant regions are prime candidates for split-read structural variations.
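
The trimming step relies on in-house programs that are not shown in this thread. As a rough sketch only, assuming each partial alignment is summarized by the 0-based, half-open interval of the read that aligned, the leftover pieces could be extracted as below. The function name `unaligned_remainders` and the `min_len` cutoff are hypothetical and are not part of MOSAIK.

```python
def unaligned_remainders(read_seq, aln_start, aln_end, min_len=32):
    """Return (offset, subsequence) for read pieces outside [aln_start, aln_end).

    Pieces shorter than min_len are dropped, on the assumption that they are
    too short to realign with any confidence.
    """
    if not 0 <= aln_start <= aln_end <= len(read_seq):
        raise ValueError("aligned interval lies outside the read")
    pieces = []
    # Unaligned prefix, if long enough to be worth realigning.
    if aln_start >= min_len:
        pieces.append((0, read_seq[:aln_start]))
    # Unaligned suffix, if long enough to be worth realigning.
    if len(read_seq) - aln_end >= min_len:
        pieces.append((aln_end, read_seq[aln_end:]))
    return pieces


# Toy read: the first 40 bases aligned, the remaining 36 did not.
read = "A" * 40 + "C" * 36
print(unaligned_remainders(read, 0, 40))
```

The returned offsets make it possible to relate a second alignment back to its position in the original read.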

        Cheers,

        // Michael


        • #5
          You may also try "bwa bwasw" with the default settings. You will see two or more alignments for a chimeric read. However, by default it probably works better for >150-200bp reads. It will miss some hits for shorter reads.

          PS: SAM flag 256 is not for split reads. Actually, SAM does not specify how split reads should be represented. In addition, bwasw identifies chimeric reads, not really split reads. It simply does local alignment. Two non-overlapping pieces on a read can be aligned on different strands or to different chromosomes.
          Last edited by lh3; 10-22-2009, 06:39 PM.
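
To make the point about flag 256 concrete: the SAM FLAG field is a bitwise combination, so 256 (0x100, "not primary alignment") has to be tested with a bitwise AND rather than with a numeric comparison such as "greater than 256". A small sketch follows; the bit values are those of the SAM specification, while the short names and the helpers `decode_flag` and `is_not_primary` are made up for illustration.

```python
# FLAG bits as defined in the SAM specification; the labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "not_primary",
    0x200: "qc_fail",
    0x400: "duplicate",
}


def decode_flag(flag):
    """List the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_not_primary(flag):
    """True only if the 0x100 (256) bit is set."""
    return bool(flag & 0x100)


print(decode_flag(272))                   # 272 = 256 + 16
print(is_not_primary(1024), 1024 > 256)   # a duplicate-flagged primary record
```

A record with flag 1024 is numerically greater than 256 but does not have the 256 bit set, which is why a numeric comparison gives the wrong answer.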


          • #6
            Originally posted by lh3
            You may also try "bwa bwasw" with the default settings. You will see two or more alignments for a chimeric read. However, by default it probably works better for >150-200bp reads. It will miss some hits for shorter reads.

            PS: SAM flag 256 is not for split reads. Actually, SAM does not specify how split reads should be represented. In addition, bwasw identifies chimeric reads, not really split reads. It simply does local alignment. Two non-overlapping pieces on a read can be aligned on different strands or to different chromosomes.
            So, does bwa bwasw (formerly misnamed as bwtsw?) not produce more than one alignment for each chunk of a read?
            And, is there a way to force bwasw to apply the mismatch and indel cutoffs to the entire read -- in other words, not identify chimeric reads?


            • #7
              BWT-SW is a different program, published last year by a Hong Kong group. The BWA-SW algorithm was previously named dBWT-SW, but people complained that it was hard to pronounce.

              Reporting local hits is the right thing to do for reads longer than 200 bp. Long reads are vulnerable to SVs and to misassemblies in the reference. We do not always know whether the unaligned part is due to an SV/misassembly or to low-quality bases. If it is due to an SV, forcing the entire read to align will lead to spurious variants; if it is due to low-quality bases, discarding them does little harm. You may reduce the mismatch/gap penalty to get longer aligned segments, based on the error profile of your reads, but forcing the entire read to align is not an option.
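
The effect of the mismatch penalty on the length of the aligned segment can be illustrated with a toy, gap-free local alignment. This is only a sketch and not how BWA-SW works internally: it assumes the read and the reference are already placed on the same diagonal, and `best_local_segment`, the sequences and the scores are all made up for illustration.

```python
def best_local_segment(read, ref, match=1, mismatch=3):
    """Best-scoring ungapped segment as (score, start, end), end exclusive.

    Uses a running score that resets to zero whenever it drops to zero or
    below, which is the one-dimensional analogue of local alignment.
    """
    best = (0, 0, 0)
    score, start = 0, 0
    for i, (a, b) in enumerate(zip(read, ref)):
        score += match if a == b else -mismatch
        if score <= 0:
            score, start = 0, i + 1   # restart the segment after position i
        elif score > best[0]:
            best = (score, start, i + 1)
    return best


ref = "ACGTACGTACGTACGTA"
read = "ACGTACGTACTTCCGTA"   # mismatches at positions 10 and 12

print(best_local_segment(read, ref, mismatch=3))  # stricter penalty
print(best_local_segment(read, ref, mismatch=1))  # more lenient penalty
```

With the stricter penalty the best segment stops before the two mismatches and the tail of the read is left unaligned; with the lenient penalty the segment extends across them to cover the whole read.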


              • #8
                Hi Heng,

                That helps - very good point that assemblies may have chimeric sequence in them, so even if you expect no SV in your reads, local alignments are appropriate for long reads.

                But what about the number of alignments? Does bwasw look for the best local alignment for each chunk of a read, and only report one alignment for each chunk? I.e. is each base of a read involved in only one alignment (and is then clipped out of all other alignments)? Or can one stretch of a read be matched to different locations in the reference, thus appear on different lines of the bwasw output SAM file?

                ~Joe


                • #9
                  Thanks, M and H.
                  Mosaik and BWA split reads will be useful for SV as well as for RNA-seq, in which a read can map to separate locations, I think.
                  Similar to Joe's questions: in Mosaik and BWA, how are these split reads represented in SAM? And what are the mapping qualities for these reads?

                  Another question:
                  >PS: SAM flag 256 is not for split reads.
                  (from SAMtools) 256 : the alignment is not primary (a read having split hits may have multiple primary alignment records)
                  How do we interpret this if it is not for split-read mapping?

                  Mosaik and BWA have very nice features, but the manuals do not even mention split-read mapping. It would be good to have these features described in the manuals, since it is not obvious how to use them. Slightly different, but Pindel also does split-read mapping, though purely for SV detection.

                  hi1
                  not from Antwerp.


                  • #10
                    BWA does as follows:

                    In BWA-SW, we say two alignments are distinct if the length of the
                    overlapping region on the query is less than half of the length of the
                    shorter query segment. We aim to find a set of distinct alignments which
                    maximizes the sum of scores of each alignment in the set. This problem
                    can be solved by dynamic programming, but as in our case a read is
                    usually aligned entirely, a greedy approximation would work well. In the
                    practical implementation, we sort the local alignments based on their
                    alignment scores, scan the sorted list from the best one and keep an
                    alignment if it is distinct from all the kept alignments with larger
                    scores; if alignment a_2 is rejected because it is not distinctive
                    from a_1, we regard a_2 to be a suboptimal alignment to a_1 and
                    use this information to approximate the mapping quality.
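
The greedy selection described above can be sketched in a few lines. This is an illustration of the description only, not BWA-SW's actual code: the `Aln` record, the function names and the example hits are hypothetical, and query intervals are assumed to be half-open.

```python
from collections import namedtuple

# qstart/qend: half-open interval on the read; ref: free-text location label.
Aln = namedtuple("Aln", "qstart qend score ref")


def query_overlap(a, b):
    """Number of read bases shared by two alignments."""
    return max(0, min(a.qend, b.qend) - max(a.qstart, b.qstart))


def is_distinct(a, b):
    """Distinct if the overlap is under half the shorter query segment."""
    shorter = min(a.qend - a.qstart, b.qend - b.qstart)
    return query_overlap(a, b) < shorter / 2


def select_distinct(alignments):
    """Greedy pass over alignments sorted by decreasing score.

    Returns the kept alignments and, for each kept alignment, the rejected
    alignments that were not distinct from it (its suboptimal hits).
    """
    kept = []
    suboptimal = {}
    for aln in sorted(alignments, key=lambda x: x.score, reverse=True):
        blocker = next((k for k in kept if not is_distinct(aln, k)), None)
        if blocker is None:
            kept.append(aln)
            suboptimal[aln] = []
        else:
            suboptimal[blocker].append(aln)
    return kept, suboptimal


hits = [
    Aln(0, 120, 110, "chr1:1000"),
    Aln(110, 250, 130, "chr5:7000"),
    Aln(5, 118, 90, "chr2:300"),
]
kept, suboptimal = select_distinct(hits)
for aln in kept:
    print(aln.ref, "suboptimal hits:", [s.ref for s in suboptimal[aln]])
```

In this toy input the two kept alignments overlap by only 10 bases on the read, so both are reported, while the third hit covers nearly the same read bases as the chr1 hit and is recorded as its suboptimal alignment.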

                    A chimeric read will occupy two or more lines in SAM. Effectively identifying chimera and conveniently reporting chimera are important features of bwasw. They are documented in the bwa manual page as well as FAQ on its home page. In practical applications, you just need to use the default option. (Actually bwasw is designed in a way that internal parameters are adjusted automatically based on the input length and the error rate, and therefore the default option works for most inputs with different characteristics).
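
Since a chimeric read occupies two or more SAM lines, candidates can be pulled out by grouping alignment records by read name. The sketch below assumes single-end reads (for paired-end data both mates share a name and would need to be separated first); the helper name `reads_with_multiple_hits` and the SAM records are made-up toy data, not real bwasw output.

```python
from collections import defaultdict


def reads_with_multiple_hits(sam_lines):
    """Map read name -> list of (rname, pos, strand, cigar) for reads that
    have two or more mapped alignment lines."""
    hits = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4:
            continue  # unmapped record
        strand = "-" if flag & 0x10 else "+"
        hits[qname].append((rname, pos, strand, cigar))
    return {q: h for q, h in hits.items() if len(h) >= 2}


toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t30\t60M40S\t*\t0\t0\t*\t*",
    "r1\t16\tchr2\t500\t30\t60S40M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t300\t60\t100M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(reads_with_multiple_hits(toy_sam))
```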

                    Nonetheless, pindel still has its advantage. An aligner specifically designed for split reads (not chimeric reads in general) is able to identify shorter matches and should achieve higher sensitivity.


                    • #11
                      What are the disadvantages of the SR (split read) method in sequencing, and how can they be avoided?

                      SR is popular now, but I don't know its disadvantages or how to avoid them. I would really appreciate it if you could help me with this. Thank you!



                      Originally posted by snownebula
                      We have been using the split read methodology quite a bit with MOSAIK. [...]


                      • #12
                        Since MosaikText doesn't properly deal with clipping when converting to SAM/BAM format, I wouldn't recommend it for this application. Without soft clipping, you're losing the necessary information to get the portion of the read not included in the alignment. Furthermore, without hard clipping information, you're losing the information to even know that a portion of the read didn't align in the first place. You're going to have to realign every single read to its own reference sequence alignment just to get back the unaligned portion of the read, which seems completely absurd.
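
The difference between the two kinds of clipping can be seen by reading the ends of a CIGAR string: soft-clipped bases (S) are still present in the SAM SEQ field and can be recovered, whereas hard-clipped bases (H) are absent from SEQ and only their count is known. A minimal sketch, with a hypothetical helper `clipped_ends` and made-up records:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_ends(cigar, seq):
    """Report hard/soft clip lengths and recoverable soft-clipped bases
    at the left and right end of an alignment."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    result = {}
    for side, ordered in (("left", ops), ("right", ops[::-1])):
        hard = soft = 0
        for length, op in ordered:
            if op == "H":
                hard += length
            elif op == "S":
                soft += length
            else:
                break  # reached the aligned part of the read
        if side == "left":
            bases = seq[:soft]
        else:
            bases = seq[len(seq) - soft:]
        result[side] = {"hard": hard, "soft": soft, "soft_bases": bases}
    return result


# Soft clip: the four unaligned bases are still in SEQ.
print(clipped_ends("6M4S", "ACGTACGGTT")["right"])
# Hard clip: SEQ holds only the six aligned bases; the rest is gone.
print(clipped_ends("6M4H", "ACGTAC")["right"])
```

With neither kind of clipping recorded, even the clip length is unavailable, which is the situation described in the post above.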

                        In this day and age, with literally hundreds of alignment programs available and a mature, widely used standard alignment format, I can't see learning an API for an aging alignment program myself. But that's what you're in for if you want to use Mosaik for this task. Just wanted to qualify snownebula's enthusiastic post. SAM/BAM is not really an option with Mosaik for this task, and it took me days to figure this out.


                        • #13
                          Hi there,

                          MOSAIK v2.0 supports soft clipping. The source code can be downloaded here, https://github.com/wanpinglee/MOSAIK.


                          Wan-Ping


                          • #14
                            Originally posted by wanpinglee
                            Hi there,

                            MOSAIK v2.0 supports soft clipping. The source code can be downloaded here, https://github.com/wanpinglee/MOSAIK.


                            Wan-Ping
                            Great news!
