SEQanswers

Old 03-19-2011, 09:46 AM   #1
cedance
Senior Member
 
Location: Germany

Join Date: Feb 2011
Posts: 108
Fastq: Paired end reads and mapping

Hi,
I took a pair of paired-end read files (PE1.fq and PE2.fq), mapped them to my reference database with bwa, and obtained output in SAM format. Then I split the aligned and unaligned reads from the SAM file using ViewSam from Picard.

However, I see that for some reads, one end is mapped/aligned and the other end is unaligned. Is this normal?

The problem comes at a later stage. Even after the reads are split into aligned and unaligned files, their flags still mark them as paired, so if you try to convert either split file (Aligned.sam or Unaligned.sam) back to FASTQ with Picard's SamToFastq, it fails with an error saying that the other end of the paired read cannot be found.

Because of this, I wrote my own script to extract the FASTQ records. However, I would like to know how common it is for only one end of a pair to map.

Thank you.
Old 03-20-2011, 09:30 AM   #2
Yilong Li
Member
 
Location: WTSI

Join Date: Dec 2010
Posts: 41

Try running Picard's FixMateInformation (or something like that) after ViewSam to correct the mate-pair information in the resulting SAM files.

I guess it is not unexpected to see fragments where only one read has been mapped - this could happen for example when the other read comes from some inserted sequence that is not present in the reference genome. We do see such read pairs in our own data.
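To see how many pairs in a SAM file have only one end mapped, the FLAG field (column 2) can be inspected directly. The following is a minimal sketch rather than anything posted in this thread: it relies only on the FLAG bits defined in the SAM specification (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped), and the function names `classify` and `count_pair_states` are hypothetical.

```python
# FLAG bits as defined in the SAM specification.
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def classify(flag):
    """Label the mapping state of a read and its mate from the FLAG value."""
    if not flag & PAIRED:
        return "unpaired"
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both_unmapped"
    if read_unmapped or mate_unmapped:
        return "one_end_mapped"
    return "both_mapped"


def count_pair_states(sam_lines):
    """Count records per mapping state, skipping header and malformed lines."""
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        label = classify(int(fields[1]))
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Passing an open file handle, for example `count_pair_states(open("aln.sam"))` with a hypothetical file name, gives per-record counts; each half-mapped pair contributes two records to `one_end_mapped`.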
Old 03-20-2011, 09:35 AM   #3
cedance
Senior Member
 
Location: Germany

Join Date: Feb 2011
Posts: 108

Li,

Thanks for your reply. I already tried fixing the mate information and duplicates, but Picard still could not extract the FASTQ for pairs where one read was missing, and it ended in an error. I guess SamToFastq checks a flag that says whether a read is paired (and if it is, the other read must exist), but ViewSam doesn't necessarily rewrite this information for the reads that were separated. Anyway, I just extracted the FASTQ files from the SAM file myself; it wasn't difficult.

However, it's nice to know that this can happen. Thanks once again!
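The script mentioned above was not posted, so the following is only a sketch of how such an extraction could look, not the poster's actual code. It assumes standard 11-column SAM records, uses FLAG bits from the SAM specification, and reverse-complements reads stored on the reverse strand (0x10) so that the output matches the original read orientation. The function names and the `/1`, `/2` name suffixes are assumptions made for illustration.

```python
# FLAG bits as defined in the SAM specification.
REVERSE = 0x10          # SEQ is stored reverse-complemented
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(line):
    """Convert one SAM alignment line to a four-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]
    if flag & REVERSE:
        # Restore the orientation in which the read was sequenced.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & FIRST_IN_PAIR:
        suffix = "/1"
    elif flag & SECOND_IN_PAIR:
        suffix = "/2"
    else:
        suffix = ""
    return "@%s%s\n%s\n+\n%s\n" % (name, suffix, seq, qual)


def sam_to_fastq(sam_lines):
    """Yield FASTQ records for primary alignments and unmapped reads."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & (SECONDARY | SUPPLEMENTARY):
            continue  # avoid emitting the same read twice
        if fields[9] == "*" or fields[10].rstrip("\n") == "*":
            continue  # sequence or qualities not stored
        yield sam_record_to_fastq(line)
```

Because each record is converted independently, this approach does not fail when the other end of a pair is absent from the file, which is the case that tripped up SamToFastq here.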
Old 06-17-2011, 03:43 PM   #4
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44
FixMateInformation complains about incorrect mate pair information

I was trying to use SRMA to perform local realignment of my RNA-seq data. It complained about some reads having bad mate information. I think this is because the mates of some of my reads mapped to chromosomes that are not in my BAM reference file (long story). I was hoping to use FixMateInformation to correct this issue, but I get the same complaint from Picard when I try to run it. Is there a way to override this check in Picard?

java -jar picard-tools-1.47/FixMateInformation.jar INPUT=HS0639_7.bam OUTPUT=HS0639_7.bam
[Fri Jun 17 16:24:03 PDT 2011] net.sf.picard.sam.FixMateInformation INPUT=[HS0639_7.rmdup.broad.sort.bam] OUTPUT=HS0639_7.rmdup.broad.sort.matefix.bam TMP_DIR=/tmp/rmorin VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
INFO 2011-06-17 16:24:03 FixMateInformation Sorting input into queryname order.
[Fri Jun 17 16:24:03 PDT 2011] net.sf.picard.sam.FixMateInformation done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=758054912
Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 938, Read name SOLEXA3_60:2:4:1712:630, Mate Alignment start should be 0 because reference name = *.
at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:334)
at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:469)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:450)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:417)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:629)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:607)
at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:146)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:158)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:118)
at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:74)
Old 06-17-2011, 10:20 PM   #5
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Try this option:
Code:
VALIDATION_STRINGENCY=SILENT
.
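For scripted pipelines, the option can simply be appended to the same style of command line shown in the earlier post. The helper below is a hypothetical sketch: the jar path is copied from that post, the BAM file names are placeholders, and only the `INPUT`, `OUTPUT` and `VALIDATION_STRINGENCY` arguments that appear in this thread are used.

```python
def fix_mate_command(jar, bam_in, bam_out, stringency="SILENT"):
    """Build the argument list for a Picard FixMateInformation run.

    stringency is passed through as VALIDATION_STRINGENCY; the log in the
    earlier post shows STRICT as the value in effect when none was given.
    """
    return [
        "java", "-jar", jar,
        "INPUT=" + bam_in,
        "OUTPUT=" + bam_out,
        "VALIDATION_STRINGENCY=" + stringency,
    ]


# Placeholder file names; the jar path is the one quoted in the thread.
cmd = fix_mate_command(
    "picard-tools-1.47/FixMateInformation.jar",
    "input.bam",
    "output.matefix.bam",
)
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Writing to a different output file than the input, as in the placeholder names, avoids overwriting the original BAM if the run fails partway.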

Last edited by nilshomer; 06-17-2011 at 10:23 PM.
Old 06-18-2011, 05:12 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

Quote:
Originally Posted by cedance
However, I see that for some reads, one end is mapped/aligned and the other end is unaligned. Is this normal?

Yes, normal but undesirable. It can and does happen for good reasons.

For example, with poor quality reads (or if you are mapping a different strain) one read might match within the thresholds, but the other might be too different.

Another example: if you are mapping against an unfinished genome, one read might map to a contig while its partner falls in the unassembled region off the end of that contig.
Old 06-18-2011, 10:48 AM   #7
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44

Thanks Nils.
Does VALIDATION_STRINGENCY=SILENT apply to running SRMA or will this only work for Picard tools? In other words, I'm wondering if FixMateInformation is a prerequisite for running SRMA successfully, or if I can simply run SRMA.

Thanks.
Ryan
Old 06-18-2011, 12:33 PM   #8
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

It should work with all tools that use the Picard library (like SRMA). No, FixMateInformation is not a prerequisite for SRMA. In fact, you will lose mate information with SRMA. See: http://sourceforge.net/apps/mediawik...ng_information