SEQanswers

Old 06-18-2010, 05:17 AM   #1
Calle
Junior Member
 
Location: Sweden

Join Date: Feb 2009
Posts: 5
Default bfast: how does it recognize SOLiD mate pairs

I have read through the bfast manual and although I generally find it very informative, I think it is unclear how SOLiD mate-pairs should be treated in the solid2fastq step. My SOLiD reads files have the following format: F3 and R3 mates are on the same rows of two separate files.

What are the requirements for successful fastq conversion of such files, keeping mate-pair information in the conversion process?

Should the names of mates be identical or is it ok to keep the F3 and R3s?

Should the pairs be placed next to each other prior to conversion in one file, as is exemplified in Figure 4.4 (depicting the resulting fastq file for Illumina reads)?

Regards

//Carl
Old 06-18-2010, 07:54 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Calle View Post
I have read through the bfast manual and although I generally find it very informative, I think it is unclear how SOLiD mate-pairs should be treated in the solid2fastq step. My SOLiD reads files have the following format: F3 and R3 mates are on the same rows of two separate files.

What are the requirements for successful fastq conversion of such files, keeping mate-pair information in the conversion process?

Should the names of mates be identical or is it ok to keep the F3 and R3s?

Should the pairs be placed next to each other prior to conversion in one file, as is exemplified in Figure 4.4 (depicting the resulting fastq file for Illumina reads)?

Regards

//Carl
Paired end or mate pair reads must have the same read name, so you have to strip off the trailing F3/R3 etc. The pairs/mates should be successive in the file.
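The naming and ordering rule above can be illustrated with a minimal Python sketch. It is not a replacement for solid2fastq and is not taken from the BFAST manual; it only shows stripping the trailing tag so that both ends share one name, and emitting the two ends of each pair as successive records. The record layout, the `_F3`/`_R3`/`_F5-P2` suffix pattern, and the function names `strip_tag`, `interleave`, and `to_fastq` are assumptions made for illustration.

```python
import re
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (read name, sequence, quality string).
Record = Tuple[str, str, str]

# Assumed naming convention: the read name ends in "_F3", "_R3" or "_F5-P2".
_TAG = re.compile(r"_(F3|R3|F5-P2)$")


def strip_tag(name: str) -> str:
    """Remove the trailing end tag so both ends of a pair share one name."""
    return _TAG.sub("", name)


def interleave(first_end: Iterable[Record],
               second_end: Iterable[Record]) -> Iterator[Record]:
    """Yield the two ends of each pair as successive records.

    Both inputs are assumed to be in the same order (same row = same pair).
    """
    for a, b in zip_longest(first_end, second_end):
        if a is None or b is None:
            raise ValueError("the two inputs contain different numbers of reads")
        name_a, name_b = strip_tag(a[0]), strip_tag(b[0])
        if name_a != name_b:
            raise ValueError(f"reads out of sync: {a[0]} vs {b[0]}")
        yield (name_a, a[1], a[2])
        yield (name_b, b[1], b[2])


def to_fastq(records: Iterable[Record]) -> str:
    """Render records as four-line FASTQ entries."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)
```

With toy input such as `[("1_10_100_F3", "T0123", "!!!!")]` and `[("1_10_100_R3", "G3210", "####")]`, both output records carry the name `1_10_100` and sit next to each other. The sketch raises an error rather than guessing when the two files fall out of sync.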
Old 08-24-2010, 06:12 AM   #3
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101
Question aligning paired end data with differing lengths

I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other. How does this affect the usage of aligners that are designed to map reads of equal length?
I'll have to work with such data soon and I'm just thinking about the complications. Unless I'm mistaken, BWA does not work with different read sizes. As to BFAST, the indexes for the genome are different for lengths <40 and >40.
Will I have to apply a trick like adding 15 "." characters and corresponding quality scores to the 35 bp reads, or trim the 50 bp reads to 35 bp?
I hope you can enlighten me so I don't have to resort to using BioScope.
Old 08-24-2010, 06:41 AM   #4
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Quote:
Originally Posted by epigen View Post
I just learned that the new ABI SOLiD mate pairs have
differing lengths: 50 bp for the first read and 35 bp for the other. How does
this affect the usage of aligners that are designed to map reads of equal
length? I'll have to work with such data soon and I'm just thinking about the
complications. Unless I'm mistaken, BWA does not work with different read
sizes.
BWA deals with different read lengths without problems.

Quote:
As to BFAST, the indexes for the genome are different for lengths <40
and >40. Will I have to apply a trick like adding 15 "." characters and corresponding quality
scores to the 35 bp reads, or trim the 50 bp reads to 35 bp? I hope you can
enlighten me so I don't have to resort to using BioScope.
You can process both ends of the read using the recommended indexes.

Also, try Bioscope. I haven't used it since version 1.0, but I've heard it
is much friendlier now.
__________________
-drd
Old 08-25-2010, 10:54 AM   #5
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by epigen View Post
I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other.
Haven't heard about that but I suppose it could be done. Aside from a shorter run time and slightly less cost, I do not see the advantage of using 35 bp reads. Currently we are doing mate-pair runs of 50bp F3 with 50bp R3 and paired-end runs of 50bp F3 with 25bp F5. Personally I'd like to see the paired-end go up to 35bp since 25bp gets into 'noise' territory.

Quote:
I hope you can enlighten me so I don't have to resort to using BioScope.
I'll agree with 'drio' that Bioscope has become a lot more friendly. Or perhaps I have just gotten used to it. Like any tool with lots of 'blades' to handle the various tasks people may wish to do, Bioscope can seem intimidating.
Old 08-25-2010, 10:27 PM   #6
bpetersen
Member
 
Location: Germany

Join Date: Mar 2010
Posts: 20
Default

Quote:
Originally Posted by epigen View Post
I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other.
The reads you are talking about are not the SOLiD mate-pairs, but PAIRED-END. These are different from the mate-pairs, because library prep is the same as with fragment, but you get more data by additionally sequencing 25 or 35 bp from the other end of the fragment. Hope this helps!
Old 09-07-2010, 05:58 AM   #7
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101
Default BFAST indexes for SOLiD 50+35 paired end reads

Quote:
Originally Posted by bpetersen View Post
The reads you are talking about are not the SOLiD mate-pairs, but PAIRED-END. These are different from the mate-pairs, because library prep is the same as with fragment, but you get more data by additionally sequencing 25 or 35 bp from the other end of the fragment. Hope this helps!
Thanks for correcting me, indeed we have paired end of 50+35 bp.
We have decided to use both BioScope and BFAST.
For BFAST, I have the indexes for 50 bp already. Should I create additional indexes for the 35 bp ends or will it work well with the ones recommended for 50 bp? In this thread http://seqanswers.com/forums/showthread.php?t=3535 Nils and David had different recommendations for 35 bp and it seems the 50 bp indexes work better than the 25 bp indexes. Has anyone explicitly compared the performance?
Old 09-07-2010, 07:42 AM   #8
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by epigen View Post
Thanks for correcting me, indeed we have paired end of 50+35 bp.
We have decided to use both BioScope and BFAST.
For BFAST, I have the indexes for 50 bp already. Should I create additional indexes for the 35 bp ends or will it work well with the ones recommended for 50 bp? In this thread http://seqanswers.com/forums/showthread.php?t=3535 Nils and David had different recommendations for 35 bp and it seems the 50 bp indexes work better than the 25 bp indexes. Has anyone explicitly compared the performance?
We haven't tested the indexes on the 35bp end. We have found that running BWA on the 35bp and BFAST on the 50bp end works very well. That is why we incorporated parts of BWA into BFAST, creating a hybrid version.
Old 09-07-2010, 08:18 AM   #9
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101
Default 50+35 bp SOLiD recipe

Quote:
Originally Posted by nilshomer View Post
We haven't tested the indexes on the 35bp end. We have found that running BWA on the 35bp and BFAST on the 50bp end works very well. That is why we incorporated parts of BWA into BFAST, creating a hybrid version.
Great that you already have experience to share! And now I know the true reason why BWA was incorporated in BFAST. I'll try that as soon as I get the data. Two questions come to mind right now:
1. I assume that the indexes needed for bwaaln are the same as the ones that bwa index creates, right?
2. Which file do I have to specify with which parameter for bfast localalign:
-1 matches_from_bfastmatch_50bp -2 matches_from_bwaaln_35bp?

You might want to include the 50+35 bp SOLiD procedure in the manual. I find the examples given there very helpful and I'm sure other users would appreciate a "cooking recipe" for this, too, because 50+35 bp seems to be becoming a standard for new SOLiD machines.

Thank you very much again Nils.
Best,
Barbara
Old 09-08-2010, 04:44 AM   #10
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

1. Use bfast match for the 50bp tag to create your bmf file. Do the same for the second tag, but using bfast bwaaln. bfast match will use the BFAST indexes and bfast bwaaln will use the BWA indexes.

2. Yes. -1 50bp bmf -2 35bp bmf.
__________________
-drd
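The order of the hybrid steps described above can be written down as a small Python sketch. It uses only the subcommand names and the `-1`/`-2` options mentioned in this thread; every other required option (reference, input reads, index settings, output redirection) is deliberately left out and has to be taken from each subcommand's own help output. The function name `plan_hybrid_run` and the file names are hypothetical.

```python
from typing import List


def plan_hybrid_run(bmf_50bp: str, bmf_35bp: str) -> List[List[str]]:
    """Return the three steps of the 50+35 bp recipe as argument lists.

    Only the pieces named in the thread are filled in. The lists are NOT
    complete command lines: reference, reads and index options are omitted
    on purpose because they are not given in the thread.
    """
    return [
        # Step 1: 50 bp end, searched with the BFAST indexes -> bmf_50bp
        ["bfast", "match"],
        # Step 2: 35 bp end, searched with the BWA indexes -> bmf_35bp
        ["bfast", "bwaaln"],
        # Step 3: local alignment takes both match files (as stated in the thread)
        ["bfast", "localalign", "-1", bmf_50bp, "-2", bmf_35bp],
    ]


if __name__ == "__main__":
    for step in plan_hybrid_run("reads.50bp.bmf", "reads.35bp.bmf"):
        print(" ".join(step))
```

Keeping the plan as data rather than running it directly makes it easy to print and review the steps before filling in the missing options.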
Old 09-08-2010, 05:40 AM   #11
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101
Default bfast bwaaln parameters

Thanks David!
Can I just use the defaults in bfast bwaaln? I was wondering about two parameters because those F5-P2 reads are already 35 bp:
"-l INT seed length [32]"
"-q INT quality threshold for read trimming down to 35bp [0]"

Best,
Barbara
Old 09-09-2010, 01:03 AM   #12
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 199
Default

Quote:
Originally Posted by drio View Post

Also, try Bioscope. I haven't used it since version 1.0, but I've heard it
is much friendlier now.
I constantly have to fight with the big 'friendly' giant to make it work the way I want. I think documentation has improved substantially though. They have changed to BAM as input for post mapping.
RAM requirements are going up and 1.3 is coming soon.
I dislike the lack of community support though.


@Nils: I wasn't aware that part of BWA had been incorporated into BFAST. How does that work?
When we input paired-end data, does it automatically use BWA for the 35 bp end and BFAST for the 50 bp end?
Old 09-09-2010, 08:24 AM   #13
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by KevinLam View Post
I constantly have to fight with the big 'friendly' giant to make it work the way I want. I think documentation has improved substantially though. They have changed to BAM as input for post mapping.
RAM requirements are going up and 1.3 is coming soon.
I dislike the lack of community support though.


@Nils: I wasn't aware that part of BWA had been incorporated into BFAST. How does that work?
When we input paired-end data, does it automatically use BWA for the 35 bp end and BFAST for the 50 bp end?
The "bwa aln" command is incorporated into BFAST as the "bfast bwaaln" command, to support short reads (i.e. 35bp reads). The output format is BFAST-compatible (BMF), so it can be seamlessly input into "bfast localalign" and moved through the pipeline. Theoretically, you could run "bfast bwaaln" on both ends and it could go through the rest of the pipeline.
Old 09-09-2010, 06:18 PM   #14
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Quote:
Originally Posted by epigen View Post
Thanks David!
Can I just use the defaults in bfast bwaaln? I was wondering about two parameters because those F5-P2 reads are already 35 bp:
"-l INT seed length [32]"
"-q INT quality threshold for read trimming down to 35bp [0]"

Best,
Barbara
Yes, the default values have yielded good results on 35bp reads. Sample your
data (for quicker testing) and try different options.
__________________
-drd
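One way to sample the data for quicker testing is sketched below in Python. It assumes an interleaved FASTQ in which the two ends of a pair are successive four-line records with the same name (the layout described earlier in this thread) and keeps or drops each pair as a unit. The function names `read_records` and `sample_pairs`, the `fraction` parameter, and the fixed seed are assumptions for illustration, not part of BFAST.

```python
import random
from typing import Iterable, Iterator, List


def read_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into four-line records."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield block
            block = []
    if block:
        raise ValueError("truncated FASTQ record at end of input")


def sample_pairs(lines: Iterable[str], fraction: float,
                 seed: int = 0) -> Iterator[List[str]]:
    """Yield roughly `fraction` of the read pairs, both ends together."""
    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    records = read_records(lines)
    for first in records:
        second = next(records, None)
        if second is None or first[0] != second[0]:
            raise ValueError("expected two successive records with the same name")
        if rng.random() < fraction:
            yield first
            yield second
```

Usage would be along the lines of opening the FASTQ file, passing the file object to `sample_pairs(handle, 0.01)`, and writing each returned record back out with `"\n".join(record)`. Sampling whole pairs keeps the subset valid for paired alignment.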