  • Bowtie2 not giving the proper # reads when combining replicates

    Hi Folks,

    I was hoping that someone could shed some light on an issue that is driving me crazy. I am aligning trimmed reads (cutadapt 1.9.1, PE mode) with bowtie2 v2.3.2. I aligned each biological replicate separately and then aligned all replicates together to generate a concatenated file. When peak calling with macs2, I noticed that the concatenated file did not give the expected number of peaks. Since I thought this could be an issue with piping to samtools v1.5 to make BAM files, I kept things simple and realigned, writing the output to a SAM file instead (line breaks added for clarity):

    #rep1
    bowtie2 -p 4 -q -x hg19 \
    -1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz \
    -2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz \
    -S rep1.sam

    #rep2
    bowtie2 -p 4 -q -x hg19 \
    -1 rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz \
    -2 rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz \
    -S rep2.sam

    #rep3
    bowtie2 -p 4 -q -x hg19 \
    -1 rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
    -2 rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
    -S rep3.sam
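Since the three replicate runs differ only in the file prefix, they can be wrapped in a loop that also keeps bowtie2's alignment summary. bowtie2 writes that summary to stderr, and it starts with an "N reads; of these:" line (N counts pairs in paired-end mode). Comparing that number for the concatenated run against the per-replicate runs shows whether bowtie2 read all of its input or stopped early. This is a minimal sketch: align_rep and reads_in_log are hypothetical helper names, and the REP.log files are an assumption, not part of the original workflow.

```shell
# align_rep REP (hypothetical helper): align REP_{1,2,3}_{1,2}.fastq.gz to
# hg19 exactly as in the commands above, writing REP.sam and saving bowtie2's
# stderr (which carries the alignment summary) to REP.log.
align_rep() {
    rep=$1
    bowtie2 -p 4 -q -x hg19 \
        -1 "${rep}_1_1.fastq.gz,${rep}_2_1.fastq.gz,${rep}_3_1.fastq.gz" \
        -2 "${rep}_1_2.fastq.gz,${rep}_2_2.fastq.gz,${rep}_3_2.fastq.gz" \
        -S "${rep}.sam" 2> "${rep}.log"
}

# reads_in_log LOG (hypothetical helper): print N from the summary's
# "N reads; of these:" line, i.e. how many reads/pairs bowtie2 processed.
reads_in_log() {
    awk '/reads; of these:/ { print $1; exit }' "$1"
}
```

For example, run `for rep in rep1 rep2 rep3; do align_rep "$rep"; done`, then `reads_in_log rep1.log`. If the concatenated run's count matches rep1 alone rather than the sum of all three, the shortfall is already on the input side, before any SAM or BAM conversion.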

    To make the concatenated alignment, I ran bowtie2 with all of the FASTQ files:

    #rep cat
    bowtie2 -p 4 -q -x hg19 \
    -1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz,rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz,rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
    -2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz,rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz,rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
    -S cat.sam
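Typing nine comma-separated paths per mate by hand makes it easy for the -1 and -2 lists to drift apart. A minimal sketch that builds both lists from the files on disk and refuses to print them unless mate 1 and mate 2 line up file-for-file, assuming the <rep>_<lane>_<mate>.fastq.gz naming used above; mate_files and pair_lists are hypothetical helpers:

```shell
# mate_files DIR MATE (hypothetical helper): list every
# rep*_*_MATE.fastq.gz in DIR, one path per line, in the shell's sorted
# glob order. Unmatched globs are skipped instead of printed literally.
mate_files() {
    for f in "$1"/rep*_*_"$2".fastq.gz; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# pair_lists DIR (hypothetical helper): print the comma-joined -1 list on
# the first line and the -2 list on the second. Returns non-zero if the two
# lists do not correspond file-for-file once the _1/_2 mate tag is removed.
pair_lists() {
    m1=$(mate_files "$1" 1)
    m2=$(mate_files "$1" 2)
    if [ "$(printf '%s\n' "$m1" | sed 's/_1\.fastq\.gz$//')" != \
         "$(printf '%s\n' "$m2" | sed 's/_2\.fastq\.gz$//')" ]; then
        echo "mate-1 and mate-2 file lists do not match" >&2
        return 1
    fi
    printf '%s\n' "$m1" | paste -sd, -
    printf '%s\n' "$m2" | paste -sd, -
}
```

Running `pair_lists .` in the FASTQ directory prints the two lists; each line can be passed as the -1 or -2 value (for example captured with `sed -n 1p` and `sed -n 2p`), so both mates are guaranteed to list the same files in the same order.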

    However, when I check the line counts, I see this:

    wc -l *.sam
    44436462 rep1.sam
    45941920 rep2.sam
    46204354 rep3.sam
    44436462 cat.sam <-!!!

    Shouldn't the concatenated file be the sum of the replicates, 44436462 + 45941920 + 46204354 = 136582736 lines (minus a few lines for the duplicated headers)? It's also noteworthy that it has exactly the same number of lines as the first replicate. I checked for errant spaces in my comma-separated FASTQ lists, and there aren't any. It's almost as if bowtie2 stops after reading a certain number of FASTQ files.
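`wc -l` also counts the header lines (@HD, @SQ, @PG), so a cleaner comparison counts only alignment records. With bowtie2's defaults (no --no-unal), every mate gets one record whether or not it aligned, so each run should contain two records per input pair. A sketch in plain awk so that no extra tools are assumed; `samtools view -c file.sam` should report the same record count. sam_records and check_cat are hypothetical names:

```shell
# sam_records FILE (hypothetical helper): count non-header lines, i.e.
# alignment records, in a SAM file. Header lines all start with '@'.
sam_records() {
    awk '!/^@/ { n++ } END { print n + 0 }' "$1"
}

# check_cat CAT REP... (hypothetical helper): sum the record counts of the
# replicate SAM files, print that next to the concatenated file's count,
# and return non-zero if the two differ.
check_cat() {
    cat_file=$1
    shift
    total=0
    for f in "$@"; do
        total=$(( total + $(sam_records "$f") ))
    done
    got=$(sam_records "$cat_file")
    echo "replicates: $total  concatenated: $got"
    [ "$total" -eq "$got" ]
}
```

Usage: `check_cat cat.sam rep1.sam rep2.sam rep3.sam`. Because headers are excluded, the two numbers should match exactly rather than "minus a few lines".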

    Yes, I know that I could use samtools merge on my sorted BAMs to get a concatenated file. However, I have always been leery of putting on a new header. In theory, aligning all of the files together should avoid that, so any ideas why it isn't working?
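One way to keep a single header without samtools merge is to concatenate the gzipped FASTQ files per mate and give bowtie2 one -1 file and one -2 file. gzip allows several compressed members back to back and `gzip -dc` reads all of them; that bowtie2's own reader does the same is an assumption worth checking on a small subset against its "reads; of these:" summary. The mate-1 and mate-2 files must be concatenated in the same order so pairs stay in step. concat_mate and fastq_reads are hypothetical helpers:

```shell
# concat_mate OUT FILE... (hypothetical helper): write OUT as the byte-wise
# concatenation of the given .fastq.gz files. Each input stays a separate
# gzip member inside OUT; OUT must not be one of the inputs.
concat_mate() {
    out=$1
    shift
    cat "$@" > "$out"
}

# fastq_reads FILE.gz (hypothetical helper): count FASTQ records (4 lines
# each) across every gzip member in the file.
fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}
```

For example, build all_1.fastq.gz from the nine mate-1 files and all_2.fastq.gz from the nine mate-2 files, both in the order listed in the concatenated command above; `fastq_reads all_1.fastq.gz` should then equal the sum of `fastq_reads` over the nine mate-1 inputs.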

    Thanks

  • #2
    I should also note that I have not found any hidden characters in my script using both of these commands:

    sed -n 'l' alignment_cmd.sh
    cat -A alignment_cmd.sh
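Besides inspecting the script text, it can help to see exactly which arguments the shell hands to bowtie2, so that any splitting or truncation of the command line shows up directly. A sketch with a hypothetical show_args helper: temporarily put show_args in place of bowtie2 in the script and check that the -1 and -2 values each arrive as a single argument containing all nine paths. Running the script with `bash -x` gives a similar trace.

```shell
# show_args ARG... (hypothetical helper): print each argument on its own
# numbered line inside angle brackets, so stray whitespace or an argument
# split into several words is easy to spot.
show_args() {
    i=0
    for a in "$@"; do
        i=$((i + 1))
        printf '%d <%s>\n' "$i" "$a"
    done
}
```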
