**nilshomer** · 08-07-2009, 08:44 AM

Originally posted by bosTau2

Mosaik does split read mapping for structural variation, but does anyone know of any other program that does split read mapping?
Thank you.
From Antwerp
hi1

Split read mapping? Please be more specific.

**bosTau2** · 08-07-2009, 09:47 AM

Split read mapping: a read is mapped to two separate locations because of possible structural variation.
-------- A ----------- break --------------- B -----------------
|==============||=====================|

This mapping makes sense for reads longer than 50-76 bp, or for 454 reads, given sufficient coverage.
Split reads should be flagged with 256 in SAM, so any split read should have a SAM flag greater than 256. So far I have not seen any split reads.

Mosaik does this, and BC is specialized in this area, but I think the released version does not. I thought ssaha did this, but other people told me it does not.
hi1

**snownebula** · 10-22-2009, 06:10 PM

Hi there,

We have been using the split read methodology quite a bit with MOSAIK.

We have a new version out that makes this available to the masses. In addition to MOSAIK, we used some external code for our split-read alignments.

Briefly, our process is as follows:

1. Align the reads against a reference sequence, but remember to store the unaligned reads (-rur parameter).

e.g. "-rur ChrX_unaligned.fq" will store the unaligned reads in the specified fastq file.

2. Build a new read archive using the unaligned reads from step 1.

3. We align the reads as normal, but instead of requiring the entire read to align, we specify that we want to align at least X bp of a read (-min X).

Normally, MOSAIK will count the unaligned portions of the read as mismatches. In this case, this is not what we want - so we deactivate that using the -mmal parameter.

e.g. If I wanted to align at least 32 bp of a read, I would add "-min 32 -mmal" to my MosaikAligner command line.

These reads didn't align to the reference for a reason. One of those reasons is that they align to a non-contiguous span. A good example is a read that aligns from the end of one exon to the beginning of another exon.

4. Using some in-house programs, we take those alignments, trim off the parts that aligned, create a new read archive, and align the reads yet again.

You can easily do something similar with the MosaikTools C++ or Perl API. Or you could export the reads into some other format and work from there.

The reads that aligned to two significant regions are prime candidates for split-read structural variations.
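The trimming in step 4 can be sketched in a few lines of Python. This is only an illustration, not the in-house code mentioned above: the function name `unaligned_flanks`, the 0-based half-open read coordinates, and the `min_len` cutoff are all assumptions made for the example.

```python
from typing import List, Tuple


def unaligned_flanks(read: str, aln_start: int, aln_end: int,
                     min_len: int = 20) -> List[Tuple[int, str]]:
    """Return (offset, sequence) for each unaligned flank of a read.

    aln_start and aln_end are 0-based, half-open coordinates of the part
    of the read that already aligned. Flanks shorter than min_len are
    dropped because they are unlikely to align uniquely on their own.
    """
    if not 0 <= aln_start <= aln_end <= len(read):
        raise ValueError("aligned interval must lie within the read")
    flanks = []
    if aln_start >= min_len:                 # leftover before the aligned part
        flanks.append((0, read[:aln_start]))
    if len(read) - aln_end >= min_len:       # leftover after the aligned part
        flanks.append((aln_end, read[aln_end:]))
    return flanks


# A made-up 76 bp read whose first 40 bases aligned in the partial pass.
read = "A" * 40 + "C" * 36
for offset, seq in unaligned_flanks(read, 0, 40):
    print(offset, len(seq))
```

The returned flanks are what would go into the new read archive for the final alignment pass. Keeping the offset makes it possible to stitch the two alignments back onto the original read afterwards.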

Cheers,

// Michael

**lh3** · 10-22-2009, 06:33 PM

You may also try "bwa bwasw" with the default settings. You will see two or more alignments for a chimeric read. However, by default it probably works better for >150-200bp reads. It will miss some hits for shorter reads.

PS: SAM flag 256 is not for split reads. Actually, SAM does not specify how split reads should be represented. In addition, bwasw identifies chimeric reads, not really split reads. It simply does local alignment. Two non-overlapping pieces on a read can be aligned on different strands or to different chromosomes.
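For anyone checking flags by hand, here is a small Python sketch that decodes the FLAG field into its bits. The bit values follow the SAM specification; the short names and the `decode_flag` helper are my own labels for the example. Note that 0x800 (supplementary alignment) was only added in a later revision of the spec, after this thread.

```python
from typing import List

# FLAG bits from the SAM specification; the names are informal labels.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "not_primary",      # decimal 256
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",    # added in a later SAM spec revision
}


def decode_flag(flag: int) -> List[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(272))  # 256 + 16
print(decode_flag(512))  # larger than 256, but bit 256 is not set
```

The second call shows why "flag greater than 256" is not a reliable test: the value has to be checked bitwise (`flag & 0x100`), not compared numerically.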

**jnfass** · 10-23-2009, 12:32 PM

So, does bwa bwasw (formerly misnamed as bwtsw?) not produce more than one alignment for each chunk of read?
And, is there a way to force bwasw to apply the mismatch and indel cutoffs to the entire read -- in other words, not identify chimeric reads?

**lh3** · 10-23-2009, 12:48 PM

BWT-SW is a different program that was published last year by a Hong Kong group. The BWA-SW algorithm was previously named dBWT-SW, but people complained that it was hard to pronounce.

Reporting local hits is the right thing to do for reads longer than 200bp. Long reads are fragile to SVs and to misassemblies in the reference. We do not always know whether the unaligned part is due to SV/misassembly or to low-quality bases. If it is due to SV, forcefully aligning the entire read will lead to spurious variants; if it is due to low-quality bases, discarding them does not do much harm. You may reduce the mismatch/gap penalty to get longer aligned segments based on the error profile of your reads, but forcefully aligning the entire read is not an option.

**jnfass** · 10-23-2009, 01:38 PM

Hi Heng,

That helps - very good point that assemblies may have chimeric sequence in them, so even if you expect no SV in your reads, local alignments are appropriate for long reads.

But what about the number of alignments? Does bwasw look for the best local alignment for each chunk of a read, and only report one alignment for each chunk? I.e. is each base of a read involved in only one alignment (and is then clipped out of all other alignments)? Or can one stretch of a read be matched to different locations in the reference, thus appear on different lines of the bwasw output SAM file?

~Joe

**bosTau2** · 10-23-2009, 02:19 PM

Thanks, M and H.
Mosaik and BWA split reads will be useful for SV as well as RNA-seq, in which a read can be mapped to separate locations, I think.
Similar to Joe's questions: in Mosaik and BWA, how will these split reads be represented in SAM? And what are the mapping qualities for these reads?

Another question:
>PS: SAM flag 256 is not for split reads.
(from SAMtools) 256 : the alignment is not primary (a read having split hits may have multiple primary alignment records)
How do we interpret this if it is not for split read mapping?

Mosaik and BWA have very nice features, but the manuals do not even mention split read mapping. It would be good to have these features described in the manuals, since it is not obvious how to use them. Slightly different, but Pindel also uses split reads, though purely for SV detection.

hi1
not from Antwerp.

**lh3** · 10-23-2009, 04:14 PM

BWA-SW does the following:

In BWA-SW, we say two alignments are distinct if the length of the
overlapping region on the query is less than half of the length of the
shorter query segment. We aim to find a set of distinct alignments which
maximizes the sum of scores of each alignment in the set. This problem
can be solved by dynamic programming, but as in our case a read is
usually aligned entirely, a greedy approximation would work well. In the
practical implementation, we sort the local alignments based on their
alignment scores, scan the sorted list from the best one and keep an
alignment if it is distinct from all the kept alignments with larger
scores; if alignment a_2 is rejected because it is not distinctive
from a_1, we regard a_2 to be a suboptimal alignment to a_1 and
use this information to approximate the mapping quality.
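The greedy step described above can be sketched in Python as follows. This is an illustration of the description only, not the bwa source: the `Hit` class, the 0-based half-open query coordinates, and the function names are assumptions, and the mapping-quality calculation itself is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    q_start: int  # 0-based start on the query (read), inclusive
    q_end: int    # end on the query, exclusive
    score: int    # local alignment score


def is_distinct(a: Hit, b: Hit) -> bool:
    """Distinct if the query overlap is less than half the shorter segment."""
    overlap = max(0, min(a.q_end, b.q_end) - max(a.q_start, b.q_start))
    shorter = min(a.q_end - a.q_start, b.q_end - b.q_start)
    return overlap < shorter / 2


def greedy_select(hits: List[Hit]) -> Tuple[List[Hit], List[Tuple[Hit, Hit]]]:
    """Scan hits from best to worst score, keeping those distinct from all
    kept hits. A rejected hit is recorded as suboptimal to the kept hit
    that it overlaps."""
    kept: List[Hit] = []
    suboptimal: List[Tuple[Hit, Hit]] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        blocker = next((k for k in kept if not is_distinct(hit, k)), None)
        if blocker is None:
            kept.append(hit)
        else:
            suboptimal.append((hit, blocker))
    return kept, suboptimal


# Made-up hits on a 200 bp read: two pieces of a chimera plus a weaker
# hit that covers almost the same query bases as the first piece.
hits = [Hit(0, 100, 90), Hit(90, 200, 80), Hit(10, 95, 60)]
kept, suboptimal = greedy_select(hits)
print(kept)
print(suboptimal)
```

In this example the first two hits overlap by only 10 bases on the read, so both are kept and would be reported on separate SAM lines. The third is rejected and paired with the first hit, which is the information the description says is used to approximate mapping quality.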

A chimeric read will occupy two or more lines in SAM. Effectively identifying chimeras and conveniently reporting them are important features of bwasw. They are documented in the bwa manual page as well as in the FAQ on its home page. In practical applications, you just need to use the default options. (bwasw is actually designed so that its internal parameters are adjusted automatically based on the input length and the error rate, so the default options work for most inputs with different characteristics.)
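To pull such reads out of the output, one option is to group SAM lines by read name. A minimal Python sketch, assuming single-end reads (paired-end mates also share a name, so they would need extra handling) and using made-up SAM records; the function name `multi_line_reads` is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def multi_line_reads(sam_lines: Iterable[str]) -> Dict[str, List[Tuple[str, int, str]]]:
    """Map read name -> [(RNAME, POS, CIGAR), ...] for every read that has
    more than one mapped alignment line."""
    by_read = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):   # skip blanks and headers
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if flag & 0x4:                                  # skip unmapped records
            continue
        by_read[qname].append((rname, pos, cigar))
    return {q: recs for q, recs in by_read.items() if len(recs) > 1}


# Illustrative records only, not real aligner output.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t30\t60M40S\t*\t0\t0\t*\t*",
    "r1\t16\tchr2\t500\t30\t60S40M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t300\t60\t100M\t*\t0\t0\t*\t*",
]
print(multi_line_reads(sam))
```

Here `r1` comes back with two pieces on different chromosomes and strands, which is the chimeric case. Whether the pieces form a clean split (adjacent on the read, same strand, same chromosome) would need a further check on the clipping and positions.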

Nonetheless, pindel still has its advantage. An aligner specifically designed for split reads (not chimeric reads in general) is able to identify shorter matches and should achieve higher sensitivity.

**hada** · 06-10-2010, 04:10 AM

What are the disadvantages of the SR (split read) method in sequencing, and how can they be avoided?

SR is popular now, but I don't know its disadvantages or how to avoid them. I would really appreciate it if you could help me with this. Thank you!

**delphi_ote** · 02-01-2011, 09:28 AM

Since MosaikText doesn't properly deal with clipping when converting to SAM/BAM format, I wouldn't recommend it for this application. Without soft clipping, you're losing the necessary information to get the portion of the read not included in the alignment. Furthermore, without hard clipping information, you're losing the information to even know that a portion of the read didn't align in the first place. You're going to have to realign every single read to its own reference sequence alignment just to get back the unaligned portion of the read, which seems completely absurd.
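To make the point about clipping concrete: when soft clips are present in the CIGAR, the unaligned ends can be read straight out of SEQ, because soft-clipped bases (S) stay in SEQ while hard-clipped bases (H) do not. A short Python sketch, with a hypothetical helper name and made-up sequences:

```python
import re
from typing import Tuple

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_ends(cigar: str, seq: str) -> Tuple[str, str]:
    """Return the (left, right) soft-clipped portions of SEQ.

    Hard clips are skipped because those bases are absent from SEQ.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    # Slice from an explicit index so right == 0 yields an empty string.
    return seq[:left], seq[len(seq) - right:]


print(clipped_ends("5S10M3S", "AAAAA" + "C" * 10 + "GGG"))
print(clipped_ends("18M", "C" * 18))
```

With no clipping in the CIGAR (the second call), there is nothing to recover, which is the situation described above.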

In this day and age, with literally hundreds of alignment programs available and a mature, widely used standard alignment format, I can't see learning an API for an aging alignment program myself. But that's what you're in for if you want to use Mosaik for this task. I just wanted to qualify snownebula's enthusiastic post. SAM/BAM is not really an option with Mosaik for this task, and it took me days to figure this out.

**wanpinglee** · 04-25-2011, 10:04 AM

Hi there,

MOSAIK v2.0 supports soft clipping. The source code can be downloaded here, https://github.com/wanpinglee/MOSAIK.

Wan-Ping

**delphi_ote** · 04-30-2011, 12:59 PM

Great news!

Split read mapping