SEQanswers


nr23 11-01-2012 07:35 AM

Samtools flagstat - low % reads mapping
 
Hi,

I'm working with RNA-Seq data and using bowtie and tophat to align 65 bp PE reads to a reference genome. My reads were sequenced from X. laevis, and I'm attempting to first map them to X. tropicalis (the X. laevis genome is still a draft version).

After trimming and filtering my reads, I am left with 31M pairs (31 × 2 = 62M reads), but running samtools flagstat on my accepted_hits.bam file shows that only 12M reads have mapped in total. I'm completely confused about why so few reads map. I've tried fine-tuning the options in tophat (-r value, -N value) and using differently trimmed reads, but have seen little improvement on roughly 20% mapping success.
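
For reference, the percentage quoted here follows directly from the two counts. A minimal sketch of the arithmetic, using the rounded figures from this post (the function name is only illustrative):

```python
def percent_mapped(mapped_reads, total_reads):
    """Return mapped reads as a percentage of all input reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Rounded counts quoted in the post: 62M reads after filtering, 12M mapped.
rate = percent_mapped(12_000_000, 62_000_000)
print(f"{rate:.1f}% of reads mapped")  # prints "19.4% of reads mapped"
```

One caveat, as far as I understand tophat's output: accepted_hits.bam holds alignment records only, and a read that maps to several places can appear more than once, so the flagstat count is best treated as an approximation of the number of mapped reads.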

In addition, almost none of my reads pair properly (samtools flagstat reports 'properly paired' = 0.01%).
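
As background on what the 'properly paired' figure measures: in the SAM format each alignment carries a bitwise FLAG, and bit 0x2 marks a read that the aligner considers part of a properly aligned pair. A minimal sketch of decoding the relevant bits (the helper name and the example FLAG values are illustrative, not taken from this dataset):

```python
# Bit values defined by the SAM format specification.
PAIRED = 0x1         # read is part of a pair
PROPER_PAIR = 0x2    # aligner considers the pair properly aligned
UNMAPPED = 0x4       # the read itself is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped


def describe_flag(flag):
    """Decode the pairing-related bits of a SAM FLAG into a dict."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "mapped": not (flag & UNMAPPED),
        "mate_mapped": not (flag & MATE_UNMAPPED),
    }


# Illustrative FLAG values: 99 is a properly paired first mate,
# 73 is a mapped read whose mate is unmapped, 77 is an unmapped pair.
for flag in (99, 73, 77):
    print(flag, describe_flag(flag))
```

A 'properly paired' rate near zero therefore means that almost no records have bit 0x2 set, which is a separate question from whether the individual reads mapped at all.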

Any help would be hugely appreciated,

Thanks

chadn737 11-01-2012 08:08 AM

How have you trimmed your reads? Have you looked for adaptor sequence in your reads?

nr23 11-01-2012 08:18 AM

I've trimmed the reads using fastq_quality_trimmer, fastq_quality_filter and fastx_trimmer.

One of the problems I've had is that the RNA fragment size is ~130 bp (post adapter removal) and my 100bp reads therefore overlap considerably. I've been using fastx_trimmer to cut the reads to 65bp to ensure no overlap - but they don't seem to be pairing properly in mapping.
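
The overlap arithmetic behind this choice can be sketched as follows, assuming every fragment is exactly the ~130 bp figure quoted (a real library has a spread of fragment sizes). The function names are illustrative; the inner distance is the quantity that tophat's -r option describes.

```python
def mate_overlap(fragment_len, read_len):
    """Bases covered by both mates of a pair (0 if they do not overlap)."""
    return max(0, min(fragment_len, 2 * read_len - fragment_len))


def inner_distance(fragment_len, read_len):
    """Gap between the mates' inner ends; negative when the mates overlap."""
    return fragment_len - 2 * read_len


FRAGMENT_LEN = 130  # approximate fragment size quoted in the post
for read_len in (100, 65):
    print(
        f"read length {read_len}: "
        f"overlap {mate_overlap(FRAGMENT_LEN, read_len)} bp, "
        f"inner distance {inner_distance(FRAGMENT_LEN, read_len)} bp"
    )
```

Under that assumption the untrimmed 100 bp mates share 70 bp, while the 65 bp mates meet end to end with an inner distance of 0.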

I haven't checked for adapters - I ran the .txt files through fastqc and there were no over-represented sequences.

N

chadn737 11-01-2012 08:37 AM

That's what I thought.

Even at 65 bp you may still have overlap and/or adaptor sequence.
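
To illustrate the point: ~130 bp is only the typical fragment size, so shorter fragments in the same library still give overlapping mates at 65 bp, and any fragment shorter than the read length makes the read run into adaptor. A sketch with hypothetical fragment lengths (not measured from this library):

```python
def adaptor_bases(fragment_len, read_len):
    """Bases at the 3' end of a read that come from adaptor, not insert."""
    return max(0, read_len - fragment_len)


def mates_overlap(fragment_len, read_len):
    """True when the two mates cover at least one common base."""
    return 2 * read_len > fragment_len


READ_LEN = 65
for fragment_len in (50, 100, 130, 160):  # hypothetical fragment sizes
    print(
        f"fragment {fragment_len} bp: "
        f"mates overlap = {mates_overlap(fragment_len, READ_LEN)}, "
        f"adaptor bases per read = {adaptor_bases(fragment_len, READ_LEN)}"
    )
```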

Is it critical that you have paired end data? I had a similar situation with some paired end data. I simply dispensed with the second set of reads and treated it as single end reads. With that amount of overlap, it's probably going to be impossible for tophat to get the insert size right.

Also try adaptor trimming with a trimmer that can handle variable lengths of adaptor sequence; I have used cutadapt with great success. Then try realigning without your paired end and you should have better results.
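
As a rough illustration of what "variable lengths of adaptor sequence" means, the sketch below removes a 3' adaptor whether it appears in full or only as a partial prefix at the end of the read. It uses exact matching only and is not cutadapt's algorithm (cutadapt uses error-tolerant alignment). The adaptor string is the common Illumina TruSeq prefix and is an assumption, since the library kit is not stated in this thread; the function name and min_overlap default are likewise hypothetical.

```python
ADAPTOR = "AGATCGGAAGAGC"  # assumed adaptor; check against your own kit


def trim_3prime_adaptor(read, adaptor=ADAPTOR, min_overlap=3):
    """Remove a full or partial 3' adaptor using exact matching only."""
    # Case 1: the whole adaptor occurs somewhere in the read.
    pos = read.find(adaptor)
    if pos != -1:
        return read[:pos]
    # Case 2: only the first k bases of the adaptor fit at the read's end.
    # Try the longest possible partial match first.
    longest = min(len(adaptor) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    # No adaptor found: return the read unchanged.
    return read


# Made-up reads: full adaptor, partial adaptor, and no adaptor.
for read in ("ACGTACGTAGATCGGAAGAGCTTT", "ACGTACGTAGATC", "ACGTACGT"):
    print(read, "->", trim_3prime_adaptor(read))
```

A fixed-length trimmer such as fastx_trimmer cannot do this, because the number of adaptor bases differs from read to read.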

Otherwise... make a new library.

GenoMax 11-01-2012 09:01 AM

If your reads are overlapping significantly you may want to try this as an alternative to stitch the two ends together.

http://bioinformatics.oxfordjournals...tr507.full.pdf

Updated citation:

Tanja Magoč and Steven L. Salzberg. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics (2011) 27(21): 2957-2963.
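
The idea of stitching overlapping mates can be sketched as below. This is a simplified exact-match illustration, not FLASH's actual algorithm (which scores candidate overlaps and tolerates mismatches); the function names, the minimum overlap and the example sequence are all assumptions made for the sketch.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge read1 with the reverse complement of read2 at the longest
    exact suffix/prefix overlap; return None if no overlap is found."""
    mate = reverse_complement(read2)
    for k in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None


# Made-up 30 bp fragment sequenced as two 20 bp mates sharing 10 bp.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
print(merge_pair(read1, read2) == fragment)
```

Pairs for which no overlap is found are left unmerged, which is also how the absence of overlap after trimming would show up.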

nr23 11-01-2012 09:04 AM

I ran the 65bp trimmed reads through FLASH (http://genomics.jhu.edu/software/FLASH/index.shtml) to confirm that, post trim, there's no overlap.

As I understand it, bowtie and tophat map the two reads of a pair independently, so I would expect that dispensing with half of my reads would give the same % of mapped reads. Maybe I'm wrong, though?

My primary concern is that the % of reads mapped is so low, I'm less concerned about the pairing of the reads (I'm interested in differential expression rather than resolving isoforms etc) but can't help but feel that the two are linked...

