SEQanswers

Old 08-07-2009, 01:08 AM   #1
bosTau2
Member
 
Location: Antwerp, BE or Cambridge, UK

Join Date: Nov 2008
Posts: 12
Split read mapping

Mosaik does split-read mapping for structural variation, but does anyone know of any other program that does split-read mapping?
Thank you.
From Antwerp
hi1
Old 08-07-2009, 08:44 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by bosTau2 View Post
Mosaik does split-read mapping for structural variation, but does anyone know of any other program that does split-read mapping?
Thank you.
From Antwerp
hi1
Split read mapping? Please be more specific.
Old 08-07-2009, 09:47 AM   #3
bosTau2
Member
 
Location: Antwerp, BE or Cambridge, UK

Join Date: Nov 2008
Posts: 12

Split read mapping: a read is mapped to two separate locations because of possible structural variation.
-------- A ----------- break --------------- B -----------------
|==============||=====================|

This mapping makes sense for reads longer than 50-76 bp, or for 454 reads, with sufficient coverage.
Split reads should be flagged with 256 in SAM, so any split read should have a SAM flag of 256 or greater. So far I have not seen any split reads.

Mosaik does this, and BC is specialized in this area, but I think the released version does not. I thought ssaha did this, but other people told me it does not.
hi1
Old 10-22-2009, 06:10 PM   #4
snownebula
Junior Member
 
Location: Boston, MA

Join Date: Oct 2009
Posts: 9

Hi there,

We have been using the split read methodology quite a bit with MOSAIK.

We have a new version out that makes this available to the masses. In addition to MOSAIK, we used some external code for our split-read alignments.

Briefly, our process is as follows:

1. Align the reads against a reference sequence, but remember to store the unaligned reads (-rur parameter).

e.g. "-rur ChrX_unaligned.fq" will store the unaligned reads in the specified fastq file.

2. Build a new read archive using the unaligned reads from step 1.

3. We align the reads as normal, but instead of requiring the entire read to align, we specify that we want to align at least X bp of a read (-min X).

Normally, MOSAIK will count the unaligned portions of the read as mismatches. In this case, this is not what we want - so we deactivate that using the -mmal parameter.

e.g. If I wanted to align at least 32 bp of a read, I would add "-min 32 -mmal" to my MosaikAligner command line.

These reads didn't align to the reference for a reason. One of those reasons is that they align to a non-contiguous span. A good example is a read that aligns to the end of one exon and the beginning of another.

4. Using some in-house programs, we take those alignments, trim off the parts that aligned, create a new read archive, and align the reads yet again.

You can easily do something similar with the MosaikTools C++ or Perl API. Or you could export the reads into some other format and work from there.

The reads that aligned to two significant regions are prime candidates for split-read structural variations.
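
The trimming step described above (take a partial alignment, cut away the part that aligned, and re-align what is left) might look roughly like this in Python. This is only a minimal sketch, not the in-house code mentioned in the post: the function name `unaligned_remainders`, the `min_len` threshold, and the 0-based half-open read coordinates are all assumptions made for illustration, and nothing here calls MOSAIK itself.

```python
def unaligned_remainders(read_seq, aln_start, aln_end, min_len=32):
    """Return the parts of a read left over after removing the aligned span.

    aln_start/aln_end are 0-based, half-open coordinates on the read
    (an assumption of this sketch). Pieces shorter than min_len are
    dropped because they are too short to re-align reliably.
    Each result is (offset_in_original_read, sequence).
    """
    if not 0 <= aln_start <= aln_end <= len(read_seq):
        raise ValueError("aligned span lies outside the read")
    left = read_seq[:aln_start]
    right = read_seq[aln_end:]
    pieces = []
    if len(left) >= min_len:
        pieces.append((0, left))
    if len(right) >= min_len:
        pieces.append((aln_end, right))
    return pieces


# Hypothetical 76 bp read whose first 40 bp aligned in the partial pass.
read = "A" * 40 + "C" * 36
for offset, piece in unaligned_remainders(read, 0, 40):
    print(offset, len(piece))
```

The remaining pieces would then be written out (for example as FASTQ) and aligned again; a read whose remainder also aligns somewhere significant is the kind of candidate described above.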

Cheers,

// Michael
Old 10-22-2009, 06:33 PM   #5
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

You may also try "bwa bwasw" with the default settings. You will see two or more alignments for a chimeric read. However, by default it probably works better for >150-200bp reads. It will miss some hits for shorter reads.

PS: SAM flag 256 is not for split reads. Actually, SAM does not specify how split reads should be represented. In addition, bwasw identifies chimeric reads, not really split reads. It simply does local alignment. Two non-overlapping pieces on a read can be aligned on different strands or to different chromosomes.
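
To make the point about flag 256 concrete, here is a small Python sketch that decodes a SAM FLAG value into its individual bits. The bit values are the ones defined for the SAM FLAG field; the short labels and the helper name `decode_flag` are my own, chosen for illustration.

```python
# SAM FLAG bits; labels are informal descriptions, not official names.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "not_primary",  # decimal 256
    0x200: "fails_qc",
    0x400: "duplicate",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(decode_flag(256))   # ['not_primary']
print(decode_flag(272))   # ['reverse_strand', 'not_primary']
print(decode_flag(1024))  # ['duplicate']
```

Note that the FLAG is a bit field, so "greater than 256" is not a meaningful test: 1024 is greater than 256 but does not have the 256 bit set. Later revisions of the SAM specification added a separate supplementary-alignment bit (0x800) for chimeric alignments; it is left out here because it postdates this discussion.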

Last edited by lh3; 10-22-2009 at 06:39 PM.
Old 10-23-2009, 12:32 PM   #6
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88

Quote:
Originally Posted by lh3 View Post
You may also try "bwa bwasw" with the default settings. You will see two or more alignments for a chimeric read. However, by default it probably works better for >150-200bp reads. It will miss some hits for shorter reads.

PS: SAM flag 256 is not for split reads. Actually, SAM does not specify how split reads should be represented. In addition, bwasw identifies chimeric reads, not really split reads. It simply does local alignment. Two non-overlapping pieces on a read can be aligned on different strands or to different chromosomes.
So, does bwa bwasw (formerly misnamed as bwtsw?) not produce more than one alignment for each chunk of read?
And, is there a way to force bwasw to apply the mismatch and indel cutoffs to the entire read -- in other words, not identify chimeric reads?
Old 10-23-2009, 12:48 PM   #7
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

BWT-SW is a different program, published last year by a Hong Kong group. The BWA-SW algorithm was previously named dBWT-SW, but people complained that it was hard to pronounce.

Reporting local hits is the right thing for reads longer than 200bp. Long reads are sensitive to SVs and to misassemblies in the reference. We do not always know whether the unaligned part is due to an SV/misassembly or to low-quality bases. If it is due to an SV, forcing the entire read to align will lead to spurious variants; if it is due to low-quality bases, discarding them does not do much harm. You may reduce the mismatch/gap penalty to get longer aligned segments based on the error profile of your reads, but forcing the entire read to align is not an option.
Old 10-23-2009, 01:38 PM   #8
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88

Hi Heng,

That helps - very good point that assemblies may have chimeric sequence in them, so even if you expect no SV in your reads, local alignments are appropriate for long reads.

But what about the number of alignments? Does bwasw look for the best local alignment for each chunk of a read, and only report one alignment for each chunk? I.e. is each base of a read involved in only one alignment (and then clipped out of all other alignments)? Or can one stretch of a read be matched to different locations in the reference, and thus appear on different lines of the bwasw output SAM file?

~Joe
Old 10-23-2009, 02:19 PM   #9
bosTau2
Member
 
Location: Antwerp, BE or Cambridge, UK

Join Date: Nov 2008
Posts: 12

Thanks, M and H.
Mosaik and BWA split reads will be useful for SV as well as RNA-seq, in which a read can be mapped to separate locations, I think.
Similar to Joe's questions: in Mosaik and BWA, how will these split reads be represented in SAM? And how are mapping qualities assigned to these reads?

Another question:
>PS: SAM flag 256 is not for split reads.
(from SAMtools) 256: the alignment is not primary (a read having split hits may have multiple primary alignment records)
How do we interpret this if it is not for split read mapping?

Mosaik and BWA have very nice features, but the manuals do not even mention split read mapping. It would be good to have these features described in the manuals, since it is not obvious how to use them. On a slightly different note, Pindel does split-read mapping, but it is purely for SV detection.

hi1
not from Antwerp.
Old 10-23-2009, 04:14 PM   #10
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

BWA does as follows:

In BWA-SW, we say two alignments are distinct if the length of the
overlapping region on the query is less than half of the length of the
shorter query segment. We aim to find a set of distinct alignments which
maximizes the sum of scores of each alignment in the set. This problem
can be solved by dynamic programming, but as in our case a read is
usually aligned entirely, a greedy approximation would work well. In the
practical implementation, we sort the local alignments based on their
alignment scores, scan the sorted list from the best one and keep an
alignment if it is distinct from all the kept alignments with larger
scores; if alignment a_2 is rejected because it is not distinct
from a_1, we regard a_2 to be a suboptimal alignment to a_1 and
use this information to approximate the mapping quality.
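
The greedy selection described in the paragraph above can be sketched in a few lines of Python. This is an illustration of the idea as stated, not the actual BWA-SW source: the dictionary keys (`q_start`, `q_end`, `score`, `name`) and the function names are assumptions, and query coordinates are taken to be 0-based, half-open.

```python
def query_overlap(a, b):
    """Length of the overlap between two alignments on the query (read)."""
    return max(0, min(a["q_end"], b["q_end"]) - max(a["q_start"], b["q_start"]))


def is_distinct(a, b):
    """Distinct if the query overlap is less than half the shorter segment."""
    shorter = min(a["q_end"] - a["q_start"], b["q_end"] - b["q_start"])
    return query_overlap(a, b) < shorter / 2


def greedy_distinct(alignments):
    """Scan alignments from best score down, keeping the distinct ones.

    Returns (kept, suboptimal) where suboptimal maps the index of a kept
    alignment to the rejected alignments that overlapped it.
    """
    kept = []
    suboptimal = {}
    for aln in sorted(alignments, key=lambda x: x["score"], reverse=True):
        # First already-kept, higher-scoring alignment this one clashes with.
        blocker = next(
            (i for i, k in enumerate(kept) if not is_distinct(aln, k)), None
        )
        if blocker is None:
            kept.append(aln)
        else:
            suboptimal.setdefault(blocker, []).append(aln)
    return kept, suboptimal


# Hypothetical local alignments for one 100 bp read.
hits = [
    {"name": "A", "q_start": 0, "q_end": 60, "score": 55},
    {"name": "B", "q_start": 55, "q_end": 100, "score": 40},
    {"name": "C", "q_start": 5, "q_end": 58, "score": 30},
]
kept, suboptimal = greedy_distinct(hits)
print([h["name"] for h in kept])
print({i: [h["name"] for h in v] for i, v in suboptimal.items()})
```

Here A and B cover mostly different parts of the read and are both kept (a chimeric read), while C overlaps A almost entirely and is recorded as a suboptimal hit of A, which is the information the text says is used to approximate mapping quality.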

A chimeric read will occupy two or more lines in SAM. Identifying chimeras effectively and reporting them conveniently are important features of bwasw. They are documented in the bwa manual page as well as in the FAQ on its home page. In practical applications, you just need to use the default options. (Actually, bwasw is designed so that internal parameters are adjusted automatically based on the input length and the error rate, and therefore the default options work for most inputs with different characteristics.)
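
Since a chimeric read shows up as several SAM lines sharing one read name, a simple way to pull out candidates is to group mapped records by QNAME. The Python sketch below does this on plain SAM text lines; the function name and the example records are made up for illustration, and it assumes single-end reads (for paired-end data, both mates share a QNAME, so the first/second-in-pair bits would have to be taken into account as well).

```python
from collections import defaultdict


def reads_with_multiple_hits(sam_lines):
    """Group mapped SAM records by read name; return reads with >= 2 hits.

    Each hit is reported as (reference name, 1-based position, strand).
    """
    hits = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        rname, pos = fields[2], int(fields[3])
        if flag & 0x4:
            continue  # unmapped record
        strand = "-" if flag & 0x10 else "+"
        hits[qname].append((rname, pos, strand))
    return {q: h for q, h in hits.items() if len(h) >= 2}


# Made-up records: r1 has two pieces on different chromosomes and strands.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t30\t60M40S\t*\t0\t0\t*\t*",
    "r1\t16\tchr5\t500\t30\t60S40M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t60\t100M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(reads_with_multiple_hits(example))
```

This only flags candidates; whether the pieces really cover non-overlapping parts of the read still has to be checked from the CIGAR strings.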

Nonetheless, Pindel still has its advantages. An aligner specifically designed for split reads (not chimeric reads in general) is able to identify shorter matches and should achieve higher sensitivity.
Old 06-10-2010, 04:10 AM   #11
hada
Junior Member
 
Location: Shenzhen, China

Join Date: Jun 2010
Posts: 1
What are the disadvantages of the SR (split read) method in sequencing, and how can they be avoided?

SR is popular now, but I don't know its disadvantages or how to avoid them. I would really appreciate it if you could help me with this. Thank you!



Old 02-01-2011, 08:28 AM   #12
delphi_ote
Junior Member
 
Location: Champaign, IL

Join Date: Oct 2010
Posts: 9

Since MosaikText doesn't properly deal with clipping when converting to SAM/BAM format, I wouldn't recommend it for this application. Without soft clipping, you're losing the necessary information to get the portion of the read not included in the alignment. Furthermore, without hard clipping information, you're losing the information to even know that a portion of the read didn't align in the first place. You're going to have to realign every single read to its own reference sequence alignment just to get back the unaligned portion of the read, which seems completely absurd.
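
To illustrate why the clipping information matters, here is a minimal Python sketch that reads a SAM CIGAR string and recovers the clipped ends of a read. With soft clipping (S) the unaligned bases are still present in the SEQ field and can be extracted directly; with hard clipping (H) only their count is known. The function name and the returned dictionary layout are assumptions made for this example.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_parts(cigar, seq):
    """Report hard-clip lengths and soft-clipped bases at each read end.

    seq is the SAM SEQ field, which contains soft-clipped bases but not
    hard-clipped ones.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    result = {"hard_left": 0, "hard_right": 0, "soft_left": "", "soft_right": ""}
    i = 0
    if ops and ops[0][1] == "H":        # hard clip can only be outermost
        result["hard_left"] = ops[0][0]
        i = 1
    if i < len(ops) and ops[i][1] == "S":
        result["soft_left"] = seq[: ops[i][0]]
    j = len(ops) - 1
    if j > 0 and ops[j][1] == "H":
        result["hard_right"] = ops[j][0]
        j -= 1
    if j > i and ops[j][1] == "S":
        result["soft_right"] = seq[len(seq) - ops[j][0]:]
    return result


# Soft clipping: the unaligned ends are recoverable from SEQ.
print(clipped_parts("5S10M3S", "AAAAA" + "CCCCCCCCCC" + "GGG"))
# Hard clipping: only the number of missing bases is known.
print(clipped_parts("5H10M", "CCCCCCCCCC"))
```

If a converter writes neither S nor H operations, both calls would report nothing clipped, which is exactly the loss of information described above.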

In this day, with literally hundreds of alignment programs available and a mature, widely used standard alignment format, I can't see learning an API for an aging alignment program myself. But that's what you're in for if you want to use Mosaik for this task. Just wanted to qualify snownebula's enthusiastic post. SAM/BAM is not really an option with Mosaik for this task, and it took me days to figure this out.
Old 04-25-2011, 10:04 AM   #13
wanpinglee
Junior Member
 
Location: boston, ma

Join Date: Oct 2009
Posts: 1

Hi there,

MOSAIK v2.0 supports soft clipping. The source code can be downloaded here, https://github.com/wanpinglee/MOSAIK.


Wan-Ping
Old 04-30-2011, 12:59 PM   #14
delphi_ote
Junior Member
 
Location: Champaign, IL

Join Date: Oct 2010
Posts: 9

Quote:
Originally Posted by wanpinglee View Post
Hi there,

MOSAIK v2.0 supports soft clipping. The source code can be downloaded here, https://github.com/wanpinglee/MOSAIK.


Wan-Ping
Great news!