#1
Junior Member
Location: SD, USA
Join Date: May 2017
Posts: 6
Hi Folks,
I was hoping that someone could shed some light on an issue that is driving me crazy. I am aligning trimmed reads (from cutadapt 1.9.1, PE mode) using bowtie2 v2.3.2. I aligned each of my biological replicates separately and then aligned all replicates together to generate a concatenated file. When peak calling with macs2, I noticed that the concatenated file did not have the expected number of peaks. Since I thought this could be an issue with piping to samtools v1.5 to make BAM files, I kept things simple and realigned, sending the output to a SAM file instead (line breaks added for clarity):

#rep1
bowtie2 -p 4 -q -x hg19 \
-1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz \
-2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz \
-S rep1.sam

#rep2
bowtie2 -p 4 -q -x hg19 \
-1 rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz \
-2 rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz \
-S rep2.sam

#rep3
bowtie2 -p 4 -q -x hg19 \
-1 rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
-2 rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
-S rep3.sam

To make the concatenated alignment, I ran bowtie2 with all of the fastq files:

#rep cat
bowtie2 -p 4 -q -x hg19 \
-1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz,rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz,rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
-2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz,rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz,rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
-S cat.sam

However, when I check the line counts of the output files, I observe this:

wc -l *.sam
44436462 rep1.sam
45941920 rep2.sam
46204354 rep3.sam
44436462 cat.sam <-!!!

Shouldn't the concatenated file be the sum of the replicates, 44436462 + 45941920 + 46204354 = 136582736 (minus a few lines for duplicated headers)? It's also noteworthy that cat.sam has exactly the same number of lines as the first replicate. I checked whether there were any errant spaces in my comma-separated fastq lists, and there aren't any.

It's almost as if bowtie2 stops after reading a certain number of fastq files. Yes, I know that I could use samtools merge on my sorted BAMs to get a concatenated file. However, I have always been leery about putting on a new header. In theory, aligning all of the files in one run should avoid this, so any ideas why it is not working?

Thanks
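Since wc -l also counts header lines, a fairer comparison is to count only the alignment records. Below is a minimal sketch of that check, assuming plain-text SAM files in which header lines start with '@'. The function names count_records and sum_records are made up for illustration and are not part of bowtie2 or samtools.

```shell
#!/bin/sh
# Hypothetical helpers for comparing SAM record counts; not part of any tool.

# Count alignment records in one SAM file.
# Header lines start with '@'; every other line is one record.
count_records() {
    grep -vc '^@' "$1"
}

# Sum the record counts of several SAM files.
sum_records() {
    total=0
    for f in "$@"; do
        n=$(count_records "$f")
        total=$((total + n))
    done
    printf '%s\n' "$total"
}

# Only runs when the files from the post are present in the current directory.
if [ -e rep1.sam ] && [ -e rep2.sam ] && [ -e rep3.sam ] && [ -e cat.sam ]; then
    echo "expected records: $(sum_records rep1.sam rep2.sam rep3.sam)"
    echo "observed records: $(count_records cat.sam)"
fi
```

If the two numbers still differ by roughly two replicates' worth of records, the header lines are not what accounts for the gap.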
#2
Junior Member
Location: SD, USA
Join Date: May 2017
Posts: 6
I should also note that I did not find any hidden characters in my script using either of the following:

sed -n 'l' alignment_cmd.sh
cat -A alignment_cmd.sh
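Another sanity check along the same lines is to split each comma-separated list the way a reader of the command would, then confirm the number of entries and that every entry is a readable file. This is only a sketch: count_entries and check_list are hypothetical helper names, and passing this check would not by itself explain the line counts above.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a comma-separated file list.

# Print how many entries a comma-separated list contains.
count_entries() {
    printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

# Print one line per entry, flagging entries that are not readable files.
check_list() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r f; do
        if [ -r "$f" ]; then
            printf 'ok      %s\n' "$f"
        else
            printf 'MISSING %s\n' "$f"
        fi
    done
}

# The -1 list from the combined run in the first post.
mate1="rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz,rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz,rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz"

echo "entries in -1 list: $(count_entries "$mate1")"
check_list "$mate1"
```

Running the same two functions on the -2 list should report the same number of entries, in the same replicate order.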