SEQanswers

Old 11-01-2012, 07:35 AM   #1
nr23
Member
 
Location: Ireland

Join Date: Oct 2012
Posts: 42
Samtools flagstat - low % reads mapping

Hi,

I'm working with RNA-Seq data and using Bowtie and TopHat to align 65 bp paired-end (PE) reads to a reference genome. My reads were sequenced from X. laevis, and I'm attempting to map them first to X. tropicalis, because the X. laevis genome is still a draft version.

After trimming and filtering, I am left with 31M × 2 = 62M reads, but running samtools on my accepted_hits.bam file shows that only 12M reads have mapped in total. I'm completely confused about why the number of mapped reads is so low. I've tried fine-tuning the TopHat options (the -r and -N values) and using differently trimmed reads, but have seen little improvement on roughly 20% mapping success.

In addition, almost none of my reads pair properly (samtools flagstat 'properly paired' = 0.01%).
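
For reference, the roughly 20% figure is simply the mapped count divided by the number of reads that went into the aligner. A minimal sketch of that arithmetic, assuming the 12M count refers to individual reads (flagstat counts alignment records, so reads with multiple hits could inflate it); the function name is illustrative:

```python
def mapping_rate(mapped_reads: int, input_reads: int) -> float:
    """Return the percentage of input reads that were mapped."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return 100.0 * mapped_reads / input_reads


# Figures quoted in the post: 31M x 2 = 62M reads in, 12M mapped.
input_reads = 31_000_000 * 2
mapped_reads = 12_000_000
print(f"{mapping_rate(mapped_reads, input_reads):.1f}% mapped")  # 19.4% mapped
```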

Any help would be hugely appreciated,

Thanks
Old 11-01-2012, 08:08 AM   #2
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

How have you trimmed your reads? Have you looked for adaptor sequence in your reads?
Old 11-01-2012, 08:18 AM   #3
nr23
Member
 
Location: Ireland

Join Date: Oct 2012
Posts: 42

I've trimmed the reads using fastq_quality_trimmer, fastq_quality_filter and fastx_trimmer.

One of the problems I've had is that the RNA fragment size is ~130 bp (after adapter removal), so my 100 bp reads overlap considerably. I've been using fastx_trimmer to cut the reads to 65 bp to ensure no overlap, but they don't seem to be pairing properly in mapping.
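
The overlap arithmetic behind this can be sketched as follows. This is only an illustration, assuming every fragment is exactly 130 bp (in practice fragment sizes are spread around the mean) and that the TopHat -r option is the expected inner distance between mates; the function names are made up for the example:

```python
def mate_overlap(fragment_len: int, read_len: int) -> int:
    """Bases covered by both mates; 0 if the mates do not overlap."""
    return max(0, 2 * read_len - fragment_len)


def inner_distance(fragment_len: int, read_len: int) -> int:
    """Gap between the inner ends of the mates; negative when they overlap."""
    return fragment_len - 2 * read_len


FRAGMENT_LEN = 130  # approximate mean fragment size quoted in the post

for read_len in (100, 65):
    print(
        f"{read_len} bp reads: overlap {mate_overlap(FRAGMENT_LEN, read_len)} bp, "
        f"inner distance {inner_distance(FRAGMENT_LEN, read_len)} bp"
    )
```

With a 130 bp fragment, 100 bp mates share 70 bp, while 65 bp mates just meet with an inner distance of 0. Any fragment shorter than 130 bp would still give overlapping 65 bp mates.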

I haven't checked for adapters. I ran the .txt files through FastQC and there were no over-represented sequences.

N
Old 11-01-2012, 08:37 AM   #4
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

That's what I thought.

Even at 65 bp you may still have overlap and/or adaptor sequence.

Is it critical that you have paired-end data? I had a similar situation with some paired-end data. I simply dispensed with the second set of reads and treated the rest as single-end reads. With that amount of overlap, it's probably going to be impossible for TopHat to get the insert size right.

Also try adaptor trimming with a trimmer that can handle variable lengths of adaptor sequence; I have used cutadapt with great success. Then try realigning as single-end reads and you should get better results.

Otherwise... make a new library.
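
To illustrate what "variable lengths of adaptor sequence" means: when a fragment is shorter than the read, the adaptor starts partway through the read, and the read may end before the adaptor does, so only an adaptor prefix is present. Below is a minimal sketch of 3' trimming that handles both cases. It uses exact matching only and is not cutadapt's algorithm (cutadapt aligns with error tolerance). The adaptor shown is the common start of Illumina TruSeq adaptors, which is an assumption about this library, and `min_overlap` is a hypothetical parameter:

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed adaptor; check what your library prep used


def trim_3prime_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove an adaptor, or a truncated prefix of it, from the 3' end of a read."""
    # Case 1: the full adaptor occurs inside the read; drop it and everything after.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adaptor, so only an adaptor
    # prefix is present at the 3' end. Try the longest prefix first.
    max_len = min(len(adapter) - 1, len(read))
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adaptor found: return the read unchanged.
    return read


print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT"))  # full adaptor inside the read
print(trim_3prime_adapter("ACGTACGTAGATCG"))            # only the first 6 adaptor bases
print(trim_3prime_adapter("ACGTACGT"))                  # no adaptor
```

A short `min_overlap` will occasionally remove genuine bases that happen to match the adaptor start, which is one reason real trimmers expose this as a setting.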
Old 11-01-2012, 09:01 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079

If your reads overlap significantly, you may want to try this as an alternative; it stitches the two ends together.

http://bioinformatics.oxfordjournals...tr507.full.pdf

Updated citation:

Tanja Magoč and Steven L. Salzberg. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics (2011) 27(21): 2957-2963.
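
The idea of stitching two overlapping mates can be sketched as below. This is a toy illustration, not FLASH's method: it requires an exact match between the 3' end of read 1 and the start of the reverse complement of read 2, whereas FLASH tolerates mismatches and scores candidate overlaps. It also does not handle fragments shorter than the read length. The function names and the `min_overlap` default are assumptions:

```python
from typing import Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge two mates into one fragment, or return None if no overlap is found."""
    # Put read 2 on the same strand as read 1.
    mate = reverse_complement(read2)
    # Try the longest candidate overlap first, down to min_overlap.
    for k in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None


fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"     # 30 bp toy fragment
read1 = fragment[:20]                           # forward mate
read2 = reverse_complement(fragment[10:])       # reverse mate, 10 bp overlap
print(merge_pair(read1, read2) == fragment)     # True
```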

Last edited by GenoMax; 11-01-2012 at 09:04 AM.
Old 11-01-2012, 09:04 AM   #6
nr23
Member
 
Location: Ireland

Join Date: Oct 2012
Posts: 42

I ran the 65 bp trimmed reads through FLASH (http://genomics.jhu.edu/software/FLASH/index.shtml) to confirm that, post-trim, there's no overlap.

As I understand it, Bowtie and TopHat map the two reads of a pair independently, so I would expect that dispensing with half of my reads would give the same % of mapped reads. Maybe I'm wrong, though?

My primary concern is that the % of reads mapped is so low. I'm less concerned about the pairing of the reads (I'm interested in differential expression rather than resolving isoforms etc.), but I can't help feeling that the two are linked...
All times are GMT -8.

