SEQanswers





09-28-2021, 12:21 PM   #1
rachpetersen
Junior Member
 
Location: Denver, CO

Join Date: Sep 2021
Posts: 2
Paired-end RNA-seq: large discrepancy in number of forward versus reverse reads

Hi all,

I am having some trouble interpreting the output of the bamtools stats command. I am working with paired-end RNA-seq data generated from olive baboon vaginal swabs (so we expect some bacterial contamination). I mapped the reads to the olive baboon reference genome with STAR and then ran bamtools stats to see how many reads map. The output is perplexing: the samples consistently have a much higher proportion of forward-strand reads than reverse-strand reads, even though the numbers of R1 and R2 reads are nearly equal. Here is the output for one file:

**********************************************
Stats for BAM file(s):
**********************************************

Total reads: 30135821
Mapped reads: 1274337 (4.22865%)
Forward strand: 29498803 (97.8862%)
Reverse strand: 637018 (2.11382%)
Failed QC: 0 (0%)
Duplicates: 0 (0%)
Paired-end reads: 30135821 (100%)
'Proper-pairs': 1272676 (4.22313%)
Both pairs mapped: 1272676 (4.22313%)
Read 1: 15067605
Read 2: 15068216
Singletons: 1661 (0.00551171%)


Does anyone have any idea what might be going on here? I've looked at the raw reads, and the R1 and R2 files contain approximately the same number of reads.

Thank you in advance for your advice!
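In the paired-end output above, Forward strand plus Reverse strand (29,498,803 + 637,018) equals Total reads (30,135,821), and the reverse-strand count is roughly half of Mapped reads. That pattern is consistent with every record lacking the SAM reverse-strand flag (0x10) being tallied as forward, including unmapped reads, which have no strand at all. The Python sketch below counts strand only for mapped records by checking the 0x4 (unmapped) and 0x10 (reverse) FLAG bits from the SAM specification. It assumes SAM text lines as input (any iterable of lines, such as a file holding samtools view -h output); the function name strand_counts and the toy records are illustrative and not part of bamtools.

```python
from collections import Counter
from typing import Iterable

FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped
FLAG_REVERSE = 0x10  # SAM FLAG bit: SEQ is reverse complemented


def strand_counts(sam_lines: Iterable[str]) -> Counter:
    """Tally SAM records as 'unmapped', 'forward', or 'reverse'."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & FLAG_UNMAPPED:
            counts["unmapped"] += 1  # strand is meaningless for unmapped reads
        elif flag & FLAG_REVERSE:
            counts["reverse"] += 1
        else:
            counts["forward"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy records: one mapped proper pair and one unmapped pair.
    toy = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t255\t50M\t=\t300\t250\t*\t*",
        "r1\t147\tchr1\t300\t255\t50M\t=\t100\t-250\t*\t*",
        "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
        "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(dict(strand_counts(toy)))  # {'forward': 1, 'reverse': 1, 'unmapped': 2}
```

For a properly paired library, one mate of each mapped pair is normally forward and the other reverse, so once unmapped records are separated out the two strands should come out close to even.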
09-29-2021, 10:22 AM   #2
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 466

Sorry, I have no idea what could be happening here. Have you checked the quality of both the forward and reverse reads with FastQC, fastp, or a similar tool?
10-11-2021, 11:39 AM   #3
rachpetersen
Junior Member
 
Location: Denver, CO

Join Date: Sep 2021
Posts: 2

Thanks for your response, luc!

I tried mapping the R1 and R2 files separately in single-end mode and, interestingly, got very high mapping percentages. Here is the bamtools stats output for the R1 and R2 files mapped separately, for the same sample whose paired-end output I posted originally:

R1 single-end mapping
**********************************************
Stats for BAM file(s):
**********************************************

Total reads: 99555409
Mapped reads: 96735523 (97.1675%)
Forward strand: 51635582 (51.8662%)
Reverse strand: 47919827 (48.1338%)
Failed QC: 0 (0%)
Duplicates: 0 (0%)
Paired-end reads: 0 (0%)

R2 single-end mapping
**********************************************
Stats for BAM file(s):
**********************************************

Total reads: 99786078
Mapped reads: 97634672 (97.844%)
Forward strand: 50328230 (50.4361%)
Reverse strand: 49457848 (49.5639%)
Failed QC: 0 (0%)
Duplicates: 0 (0%)
Paired-end reads: 0 (0%)


I'm a little confused by this, because the R1 and R2 files contain the same number of sequences, yet this output reports a different number of "Total reads" for each. I also tried sorting the sequences by name in each file before rerunning the paired-end mapping, but that did not fix either the low mapping rate or the excess of forward-strand over reverse-strand reads.

Any advice would be greatly appreciated! Thanks in advance for your help!
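On the differing Total reads values: bamtools stats counts alignment records rather than distinct reads, so if secondary (0x100) or supplementary (0x800) alignments are written, for example for multimapping reads, the totals can exceed the number of sequences in the FASTQ files and need not agree between R1 and R2. A minimal Python sketch, assuming SAM text lines as input and using a hypothetical function name, record_summary, that separates all records from primary records and distinct read names:

```python
from typing import Dict, Iterable

FLAG_SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def record_summary(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Count all records, primary records, and distinct read names (QNAME)."""
    total = 0
    primary = 0
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        total += 1
        names.add(qname)
        if not (flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY)):
            primary += 1  # one primary record is expected per read
    return {"records": total, "primary": primary, "distinct_names": len(names)}


if __name__ == "__main__":
    # Hypothetical single-end toy data: q1 maps to two loci, q2 once, q3 is unmapped.
    toy = [
        "q1\t0\tchr1\t100\t3\t50M\t*\t0\t0\t*\t*",
        "q1\t256\tchr2\t500\t3\t50M\t*\t0\t0\t*\t*",
        "q2\t16\tchr1\t900\t255\t50M\t*\t0\t0\t*\t*",
        "q3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(record_summary(toy))  # {'records': 4, 'primary': 3, 'distinct_names': 3}
```

When unmapped reads are also written to the BAM, the primary count for a single-end run should match the number of reads in the corresponding FASTQ file. Keeping every read name in a set is memory-hungry at tens of millions of reads; it is included here only to make the comparison explicit.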
10-12-2021, 06:17 AM   #4
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 511

Your reads have been processed in some manner (e.g., filtered by quality) so that R1 and R2 are no longer properly paired. For unprocessed reads, the numbers of R1 and R2 reads should be identical and you should have zero singletons; the stats in your first post indicate otherwise. Sorting by name does not help because, as soon as the first singleton is encountered, all subsequent reads are out of register (mispaired).
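This diagnosis can be checked directly on the FASTQ files by walking R1 and R2 in lockstep and stopping at the first pair of records whose names disagree. A minimal Python sketch, assuming uncompressed four-line FASTQ records and mates that share the first whitespace-delimited token of the header (optionally ending in /1 or /2); the function names read_names and first_mismatch are hypothetical.

```python
import io
from itertools import islice, zip_longest
from typing import Iterator, Optional, TextIO, Tuple


def read_names(handle: TextIO) -> Iterator[str]:
    """Yield the mate-independent name of each 4-line FASTQ record."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        header = record[0].rstrip("\n")
        parts = header[1:].split()
        if not header.startswith("@") or not parts:
            raise ValueError(f"malformed FASTQ header: {header!r}")
        name = parts[0]  # drop the comment field, e.g. "1:N:0:ATCACG"
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # strip older-style mate suffixes
        yield name


def first_mismatch(
    r1: TextIO, r2: TextIO
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record index, R1 name, R2 name) at the first disagreement, or None."""
    pairs = zip_longest(read_names(r1), read_names(r2))  # None pads the shorter file
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None


if __name__ == "__main__":
    # Hypothetical toy files: read "b" was removed from R2 only.
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n@c/1\nACGT\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@c/2\nTTTT\n+\nIIII\n")
    print(first_mismatch(r1, r2))  # (1, 'b', 'c')
```

For gzip-compressed files, gzip.open(path, "rt") returns a text handle that can be passed in directly. A result of None means the two files are in register; anything else points at the record where the pairing breaks.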

Apparently your aligner constrains the R2 search space based on the R1 alignment; because R1 and R2 are mispaired, it cannot find a match for R2 in that space.

The solution is to fix the pairing with BBTools Repair (repair.sh), then repeat the alignment. Report back whether or not that solved your problem.
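BBTools Repair is the tool suggested here. To show the idea behind re-pairing, and not BBTools' actual implementation, here is a minimal Python sketch that matches R1 and R2 records by name, keeps matched pairs in a consistent order, and sets reads without a mate aside as singletons. It assumes uncompressed four-line FASTQ records whose mates share a name apart from an optional /1 or /2 suffix, and it loads all of R1 into memory, so for files with tens of millions of reads the dedicated tool is the practical choice. The function names mate_name, records, and repair are hypothetical.

```python
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO, Tuple


def mate_name(header: str) -> str:
    """Mate-independent read name from a FASTQ header line."""
    parts = header[1:].split()
    if not header.startswith("@") or not parts:
        raise ValueError(f"malformed FASTQ header: {header!r}")
    name = parts[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, record text) for each 4-line FASTQ record."""
    while True:
        lines = list(islice(handle, 4))
        if not lines:
            return
        if len(lines) != 4:
            raise ValueError("truncated FASTQ record")
        # make sure every line ends with a newline so records concatenate cleanly
        text = "".join(ln if ln.endswith("\n") else ln + "\n" for ln in lines)
        yield mate_name(lines[0]), text


def repair(r1: TextIO, r2: TextIO) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return ((R1 text, R2 text) pairs in R2 order, singleton record texts)."""
    pending: Dict[str, str] = dict(records(r1))  # every R1 record held in memory
    pairs: List[Tuple[str, str]] = []
    singletons: List[str] = []
    for name, text in records(r2):
        mate = pending.pop(name, None)
        if mate is None:
            singletons.append(text)  # R2 read whose R1 mate is missing
        else:
            pairs.append((mate, text))
    singletons.extend(pending.values())  # R1 reads whose R2 mate is missing
    return pairs, singletons


if __name__ == "__main__":
    # Hypothetical toy files: "b" survives only in R1, "d" only in R2.
    r1 = io.StringIO("@a/1\nA\n+\nI\n@b/1\nC\n+\nI\n@c/1\nG\n+\nI\n")
    r2 = io.StringIO("@a/2\nT\n+\nI\n@c/2\nT\n+\nI\n@d/2\nT\n+\nI\n")
    pairs, singletons = repair(r1, r2)
    print(len(pairs), len(singletons))  # 2 2
```

Writing the first and second element of each pair to two new files, and the singletons to a third, yields R1/R2 files that are back in register and ready to realign.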

Last edited by HESmith; 10-12-2021 at 06:20 AM.
10-13-2021, 06:01 PM   #5
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 466

I very much agree. Thanks HESmith!

Tags
bamtools, paired end mapping, rna-seq





All times are GMT -8.

