SEQanswers

09-20-2017, 10:35 AM   #1
mkareta
Junior Member
 
Location: SD, USA

Join Date: May 2017
Posts: 6
Bowtie2 not giving the proper # reads when combining replicates

Hi Folks,

I was hoping that someone could shed some light on an issue that is driving me crazy. I am aligning trimmed paired-end reads (from cutadapt 1.9.1, PE mode) with Bowtie 2 v2.3.2. I aligned each biological replicate separately, and then aligned all replicates together in a single run to generate a concatenated file. When peak calling with MACS2, I noticed that the concatenated file did not have the expected number of peaks. Since I thought this could be an issue with piping to samtools v1.5 to make BAM files, I kept things simple and realigned, writing the output to a SAM file instead (line breaks added for clarity):

#rep1
bowtie2 -p 4 -q -x hg19 \
-1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz \
-2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz \
-S rep1.sam

#rep2
bowtie2 -p 4 -q -x hg19 \
-1 rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz \
-2 rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz \
-S rep2.sam

#rep3
bowtie2 -p 4 -q -x hg19 \
-1 rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
-2 rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
-S rep3.sam

To make the concatenated alignment, I ran with all fastq files:

#rep cat
bowtie2 -p 4 -q -x hg19 \
-1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz,rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz,rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
-2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz,rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz,rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
-S cat.sam

However, when I check the line counts, I observe this:

wc -l *.sam
44436462 rep1.sam
45941920 rep2.sam
46204354 rep3.sam
44436462 cat.sam <-!!!

Shouldn't the concatenated file be the sum of the replicates, 44436462 + 45941920 + 46204354 = 136582736 (minus a few lines for duplicated headers)? It's also noteworthy that cat.sam has exactly the same number of lines as the first replicate. I checked whether there were any errant spaces in my comma-separated fastq lists, and there aren't any. It's almost as if bowtie2 stops after reading a certain number of fastq files.
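Since `wc -l` also counts the `@` header lines, a slightly cleaner comparison is on alignment records only. Here is a minimal sketch of that check, assuming the SAM files are in the current directory; the function names `count_sam_records` and `sum_sam_records` are just illustrative helpers, not part of bowtie2 or samtools:

```shell
# Count alignment records in a SAM file, skipping header lines.
# SAM header lines start with '@' (e.g. @HD, @SQ, @PG).
# If samtools is available, `samtools view -c file.sam` is an alternative.
count_sam_records() {
    # grep -c exits non-zero when the count is 0, so tolerate that.
    grep -vc '^@' "$1" || true
}

# Sum the alignment-record counts of all files given as arguments.
sum_sam_records() {
    total=0
    for f in "$@"; do
        n=$(count_sam_records "$f")
        total=$((total + n))
    done
    echo "$total"
}

# Usage with the files from this post (uncomment to run):
# expected=$(sum_sam_records rep1.sam rep2.sam rep3.sam)
# observed=$(count_sam_records cat.sam)
# echo "expected=$expected observed=$observed"
```

If the two numbers disagree, the combined run really did produce fewer alignment records, and the difference is not just a matter of header lines.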

Yes, I know that I could use samtools merge on my sorted BAMs to get a concatenated file. However, I have always been leery of putting on a new header. In theory, aligning all files in one run should avoid this, so any ideas why it is not working?
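If the worry is only about the header, one workaround that avoids both re-aligning and writing a new header is to keep the header of the first per-replicate SAM and append the alignment records from all of them. This is only a sketch. It assumes every file was aligned against the same index (so the `@SQ` lines are identical) and that the output does not need to be sorted. `concat_sam` is a made-up helper name, not a samtools or bowtie2 command:

```shell
# Concatenate unsorted SAM files, keeping a single header.
concat_sam() {
    # Header lines (starting with '@') are taken from the first file only.
    grep '^@' "$1" || true
    # Alignment records are taken from every file, in the order given.
    for f in "$@"; do
        grep -v '^@' "$f" || true
    done
}

# Usage (uncomment to run):
# concat_sam rep1.sam rep2.sam rep3.sam > merged.sam
```

Note that the `@PG` line in the result would describe only the first replicate's command line, which may or may not matter downstream.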

Thanks
09-20-2017, 10:53 AM   #2
mkareta
Junior Member
 
Location: SD, USA

Join Date: May 2017
Posts: 6

I should also note that I have not found any hidden characters in my script using either of the following:

sed -n 'l' alignment_cmd.sh
cat -A alignment_cmd.sh
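The same kind of check can be scripted so that it fails loudly instead of relying on eyeballing the output. This is a rough sketch, assuming a POSIX-style grep; `has_hidden_chars` and `has_broken_continuation` are illustrative names, not existing tools:

```shell
# True (exit 0) if the file contains any byte that is not printable ASCII,
# a space, or a tab, for example a carriage return from Windows line endings.
has_hidden_chars() {
    LC_ALL=C grep -q '[^[:print:][:blank:]]' "$1"
}

# True (exit 0) if any line ends in a backslash followed by spaces or tabs,
# which stops the backslash from acting as a line continuation.
has_broken_continuation() {
    LC_ALL=C grep -q '\\[[:blank:]][[:blank:]]*$' "$1"
}

# Usage (uncomment to run):
# has_hidden_chars alignment_cmd.sh && echo "hidden characters found"
# has_broken_continuation alignment_cmd.sh && echo "whitespace after a backslash"
```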