  • Bowtie2 not giving the proper # reads when combining replicates

    Hi Folks,

    I was hoping that someone could shed some light on an issue that is driving me crazy. I am aligning trimmed reads (from cutadapt 1.9.1, PE mode) with Bowtie 2 v2.3.2. I aligned each biological replicate separately, then aligned all replicates together to generate a concatenated file. When peak calling with MACS2, I noticed that the concatenated file did not have the expected number of peaks. Since I thought this could be an issue with piping to samtools v1.5 to make BAM files, I kept things simple and realigned, writing the output to a SAM file instead (line breaks added for clarity):

    #rep1
    bowtie2 -p 4 -q -x hg19 \
    -1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz \
    -2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz \
    -S rep1.sam

    #rep2
    bowtie2 -p 4 -q -x hg19 \
    -1 rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz \
    -2 rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz \
    -S rep2.sam

    #rep3
    bowtie2 -p 4 -q -x hg19 \
    -1 rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
    -2 rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
    -S rep3.sam

    To make the concatenated alignment, I ran with all fastq files:

    #rep cat
    bowtie2 -p 4 -q -x hg19 \
    -1 rep1_1_1.fastq.gz,rep1_2_1.fastq.gz,rep1_3_1.fastq.gz,rep2_1_1.fastq.gz,rep2_2_1.fastq.gz,rep2_3_1.fastq.gz,rep3_1_1.fastq.gz,rep3_2_1.fastq.gz,rep3_3_1.fastq.gz \
    -2 rep1_1_2.fastq.gz,rep1_2_2.fastq.gz,rep1_3_2.fastq.gz,rep2_1_2.fastq.gz,rep2_2_2.fastq.gz,rep2_3_2.fastq.gz,rep3_1_2.fastq.gz,rep3_2_2.fastq.gz,rep3_3_2.fastq.gz \
    -S cat.sam
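
Before comparing outputs, it may help to know how many read pairs actually go into each run. The following is a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines) and that `gzip` and `awk` are available; `count_fastq_reads` is a hypothetical helper name, not part of Bowtie 2.

```shell
# Count reads in a gzipped FASTQ file, assuming four lines per record.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Example usage with file names from the post (not run here):
# for f in rep1_1_1.fastq.gz rep1_2_1.fastq.gz rep1_3_1.fastq.gz; do
#     echo "$f $(count_fastq_reads "$f")"
# done
```

With default Bowtie 2 settings I would expect roughly two SAM records per input pair (one per mate, including unaligned reads), so these counts give a rough target for each SAM file.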

    However, when I check the line counts, I observe this:

    wc -l *.sam
    44436462 rep1.sam
    45941920 rep2.sam
    46204354 rep3.sam
    44436462 cat.sam <-!!!

    Shouldn't the concatenated file be the sum of the replicates, 44436462 + 45941920 + 46204354 = 136582736 (minus a few lines for duplicated headers)? It's also noteworthy that cat.sam has exactly the same number of lines as the first replicate. I checked for errant spaces in my comma-separated FASTQ lists, and there aren't any. It's almost as if bowtie2 stops after reading a certain number of FASTQ files.
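
Since `wc -l` also counts header lines, a slightly cleaner comparison counts only alignment records. This is a minimal sketch, assuming plain-text SAM files; header lines start with `@`, and the SAM specification does not allow read names to start with `@`. The function names are hypothetical helpers.

```shell
# Count alignment records in a SAM file, skipping '@' header lines.
count_sam_records() {
    awk '!/^@/ { n++ } END { print n + 0 }' "$1"
}

# Sum the record counts of several SAM files.
# The body runs in a subshell so its variables do not leak into the caller.
sum_sam_records() (
    total=0
    for f in "$@"; do
        n=$(count_sam_records "$f")
        total=$((total + n))
    done
    echo "$total"
)

# Example usage with the file names from the post (not run here):
# echo "expected: $(sum_sam_records rep1.sam rep2.sam rep3.sam)"
# echo "observed: $(count_sam_records cat.sam)"
```

If the observed count still equals the count for rep1.sam alone, that would support the idea that only the first replicate's files were read.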

    Yes, I know that I could use samtools merge on my sorted BAMs to get a concatenated file. However, I have always been leery about putting on a new header. In theory, aligning all files in a single run should avoid this, so any ideas why it does not?

    Thanks

  • #2
    I should also note that I checked my script for hidden characters with both of the following commands and found none:

    sed -n 'l' alignment_cmd.sh
    cat -A alignment_cmd.sh
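
Another way to rule out typing artifacts in the long lists is to build them from the file names rather than by hand. This is a minimal sketch; `join_by_comma` is a hypothetical helper, and the glob patterns assume the naming scheme shown in the first post.

```shell
# Join all arguments with commas and no spaces.
# The body runs in a subshell so the IFS change does not leak into the caller.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Example usage (not run here); globs expand in sorted order, so mates stay paired:
# m1=$(join_by_comma rep?_?_1.fastq.gz)
# m2=$(join_by_comma rep?_?_2.fastq.gz)
# bowtie2 -p 4 -q -x hg19 -1 "$m1" -2 "$m2" -S cat.sam
```

Echoing `$m1` and `$m2` before running the alignment makes it easy to confirm that all nine files appear in each list.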
