**naxin** · 06-08-2012, 01:51 PM

Originally posted by swbarnes2

The other possibility: double-check that your aln command line was right. If you accidentally put a typo in one of your fastq names and one fastq doesn't actually get aligned, sampe proceeds anyway and returns crazy large insert sizes. So try running samse on each of your individual fastqs. You want to know that they are working, and you want to know if the two files are in sync.

Thanks a lot!

I am trying it right now.

**swNGS** · 06-09-2012, 10:05 AM

I had a very similar problem which was very helpfully fixed using Trimmomatic and TrimGalore as detailed in this thread:
http://seqanswers.com/forums/showthread.php?t=19874
The author of TrimGalore was particularly accommodating in modifying the script to allow different trimming of R1 and R2.

**bwubb** · 06-07-2013, 08:10 AM

Does discarding the size estimate affect anything with the read data, the quality, or any potential variant calls?

I am trying to determine if I should use the -A option for all of my data or if there is a way to dynamically determine that sampe will take forever and the -A option should be used.

Thanks.

**dGho** · 07-10-2013, 11:10 AM

Originally posted by rskr

I have seen it when one read of a pair was quality filtered but the other was not. The missing read then gets replaced with whatever was next in the file, so the pairs no longer match.

1.1 1.2
2.1 2.2
3.1 3.2
4.1 5.2 <--4.2 was omitted, they are no longer in parity.
5.1 6.2

I have a question regarding using the -A option in the case above. If the reads are out of sync, as is the case between 4.1 and 5.2, bwa will not perform SW on the unmapped mate. What happens after that? Will 5.1 and 6.2 also be thrown away because they do not match, and so on? I guess what I am asking is: is it dangerous to use -A and force bwa to throw away unmatched pairs? Are we losing important data by doing this? And does the mismatch carry on to all the reads after it?

**swbarnes2** · 07-10-2013, 01:21 PM

-A should really only be used if you know that your files are lined up right, and you know that the insert sizes won't properly match what bwa is expecting.

Fix your fastqs. You can pull out the singletons, align them separately, then combine the bams.
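
One way to pull out the singletons can be sketched in Python. This is only an illustration, not a tool mentioned in this thread: the function names and output file names are made up, and it assumes well-formed 4-line FASTQ records, read names that match between the two files once a trailing /1 or /2 is removed, and shared reads that appear in the same relative order in both files. It keeps every read name in memory, so RAM use grows with the number of reads.

```python
def parse_fastq(path):
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():
                break  # end of file
            # Header looks like '@name/1' or '@name 1:N:0:...'
            name = record[0][1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, "".join(record)


def split_pairs(fq1, fq2, out_prefix):
    """Write paired and singleton reads to separate files.

    Returns {"1": (n_paired, n_single), "2": (n_paired, n_single)}.
    Output names (<prefix>_paired_1.fastq, ...) are hypothetical.
    """
    ids1 = {name for name, _ in parse_fastq(fq1)}
    ids2 = {name for name, _ in parse_fastq(fq2)}
    shared = ids1 & ids2  # reads that still have a mate

    counts = {}
    for mate, path in (("1", fq1), ("2", fq2)):
        paired_path = f"{out_prefix}_paired_{mate}.fastq"
        single_path = f"{out_prefix}_single_{mate}.fastq"
        n_paired = n_single = 0
        with open(paired_path, "w") as paired, open(single_path, "w") as single:
            for name, record in parse_fastq(path):
                if name in shared:
                    paired.write(record)
                    n_paired += 1
                else:
                    single.write(record)
                    n_single += 1
        counts[mate] = (n_paired, n_single)
    return counts
```

The two paired files would then go to sampe, while the singleton files can be aligned separately and the resulting bams combined.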

**dGho** · 07-11-2013, 04:43 AM

Originally posted by swbarnes2

-A should really only be used if you know that your files are lined up right, and you know that the insert sizes won't properly match what bwa is expecting.

Fix your fastqs. You can pull out the singletons, align them separately, then combine the bams.

Thank you so much for the advice. As tempting as it is to use -A as a quick solution, I am not completely comfortable with the idea because I don't completely understand what is being tossed away: the true "orphaned" read, or a read that does have a mate but simply does not line up correctly with its mate due to the presence of these singleton "orphans".

I am looking for more details on this but haven't found any yet. If anyone can confirm that only the true singletons are ignored, then I guess -A would be a good solution. In the meantime, I think swbarnes2's advice is the safest.

**dGho** · 07-11-2013, 05:53 AM

Originally posted by swbarnes2

-A should really only be used if you know that your files are lined up right, and you know that the insert sizes won't properly match what bwa is expecting.

Fix your fastqs. You can pull out the singletons, align them separately, then combine the bams.

Does anyone know how to pull out the singletons from paired-end reads that are split across two fastq files (read1.fastq and read2.fastq)? I haven't found a tool that does this yet. Is this something I should write a script for?

**bwubb** · 07-11-2013, 06:00 AM

Originally posted by swbarnes2

-A should really only be used if you know that your files are lined up right, and you know that the insert sizes won't properly match what bwa is expecting.

Fix your fastqs. You can pull out the singletons, align them separately, then combine the bams.

Could you describe a method for identifying singletons between one read.fq file and its mate? Thanks.

**rskr** · 07-11-2013, 07:01 AM

Originally posted by dGho

I have a question regarding using the -A option in the case above. If the reads are out of sync, as is the case between 4.1 and 5.2, bwa will not perform SW on the unmapped mate. What happens after that? Will 5.1 and 6.2 also be thrown away because they do not match, and so on? I guess what I am asking is: is it dangerous to use -A and force bwa to throw away unmatched pairs? Are we losing important data by doing this? And does the mismatch carry on to all the reads after it?

Any results from files that are out of parity are invalid (in addition to being a waste of time waiting for the results). What happens when the files are in parity and the mate simply doesn't map is a different question.
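
Whether two files are in parity can be checked before aligning. The following is a rough Python sketch, not something taken from bwa or from this thread: the function names are made up, and it assumes 4-line FASTQ records whose read names match between the two files once a trailing /1 or /2 is removed.

```python
import itertools


def read_ids(path):
    """Yield the read ID of each 4-line FASTQ record, without the /1 or /2 suffix."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0 or not line.strip():
                continue  # only header lines carry the read name
            # Header looks like '@name/1' or '@name 1:N:0:...'
            name = line[1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(fq1, fq2):
    """Return (record_number, id1, id2) for the first out-of-sync record, or None.

    If one file is shorter, the missing side is reported as None.
    """
    pairs = itertools.zip_longest(read_ids(fq1), read_ids(fq2))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return n, a, b
    return None
```

On files like the 1.1/1.2 ... 4.1/5.2 example quoted earlier in the thread, this would report record 4 as the first mismatch. It streams both files, so memory use stays small.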

**dGho** · 07-11-2013, 08:50 AM

Answering my own question: if anyone else is looking for a way to remove singletons, check out this thread. I am trying it out now. azneto shared his script for making sure that two fastqs are in sync. It seems to use a whole lot of RAM, though.

http://seqanswers.com/forums/showthread.php?t=17974

**dGho** · 08-13-2013, 10:18 AM

I just wanted to confirm that azneto's script worked well. It removed singletons and ordered the two fastq files so the reads were synchronized. Running bwa sampe on the resulting fastqs produced no errors and had runtimes that fell within the expected range.
