
  • bowtie paired-end versus single-end

    Hi guys,

    I'm totally new to NGS and have two FASTQ files from a paired-end Illumina ChIP-seq experiment. I am using Bowtie to align them: first I aligned only one FASTQ file, then both together. The percentage of reads with at least one reported alignment is very different in each case: 83% when I used only one FASTQ file, compared to only about 50% when I used both (as I should have). Can anyone explain why?

    Here are the results:


    ./bowtie -t -p 4 --sam --chunkmbs 1000 hg19/hg19 reads/V400can_chipseq_1.fastq > results/v400.sam

    # reads processed: 15749589
    # reads with at least one reported alignment: 13044920 (82.83%)
    # reads that failed to align: 2704669 (17.17%)
    Reported 13044920 alignments to 1 output stream(s)
    Time searching: 00:32:23

    AS OPPOSED TO:

    ./bowtie -t -p 4 --sam --chunkmbs 1000 -m 1 hg19/hg19 -1 reads/V400can_chipseq_1.fastq -2 reads/V400can_chipseq_2.fastq > results/v400_paired.sam

    # reads processed: 15749589
    # reads with at least one reported alignment: 7813517 (49.61%)
    # reads that failed to align: 7480588 (47.50%)
    # reads with alignments suppressed due to -m: 455484 (2.89%)
    Reported 7813517 paired-end alignments to 1 output stream(s)
    Time searching: 01:44:03


    Thanks a mil in advance,

    NZ
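A quick sanity check on the two summaries above, as a minimal Python sketch. The counts are copied from the Bowtie output in this post; the variable and function names are just illustrative. It shows that each summary accounts for every processed read, and that in the paired-end run 49.61% is the aligned share while 47.50% is the share that failed to align.

```python
# Counts copied from the two Bowtie summaries in the post above.
TOTAL = 15_749_589

single_end = {"aligned": 13_044_920, "failed": 2_704_669}
paired_end = {
    "aligned": 7_813_517,
    "failed": 7_480_588,
    "suppressed_by_m": 455_484,
}


def percentages(counts, total):
    """Return each count as a percentage of total, rounded to two decimals."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


single_pct = percentages(single_end, TOTAL)
paired_pct = percentages(paired_end, TOTAL)

# Every processed read should fall into exactly one category.
single_accounted = sum(single_end.values()) == TOTAL
paired_accounted = sum(paired_end.values()) == TOTAL

print(single_pct, single_accounted)
print(paired_pct, paired_accounted)
```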

  • #2
    I don't use Bowtie much, but there's a setting for expected insert size, and I think Bowtie behaves very badly with pairs that are too far from that insert size. Crank up the maximum insert size, and try again.


    • #3
      Originally posted by swbarnes2
      I don't use Bowtie much, but there's a setting for expected insert size, and I think Bowtie behaves very badly with pairs that are too far from that insert size. Crank up the maximum insert size, and try again.
      The maximum insert size option is -X, and the default is only 250 (i.e. the total size, including both reads plus the insert, has to be 250 or less). I set mine to 1000 so nothing should be excluded.
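To make the "both reads + insert" definition concrete, here is a small Python sketch that computes the outer fragment length of a pair and compares it with a maximum insert size. The function names are hypothetical, and it assumes ungapped alignments, so a read covers exactly its own length on the reference. The example coordinates come from the SRR424618.10 records posted further down this thread (mates at 42794368 and 42794395 on chr20, both 76M).

```python
def fragment_length(left_start, right_start, right_read_len):
    """Outer fragment length in bases.

    Measured from the leftmost base of the upstream mate to the rightmost
    base of the downstream mate, using 1-based start positions.
    Assumes ungapped alignments.
    """
    return right_start + right_read_len - left_start


def within_max_insert(left_start, right_start, right_read_len, max_insert=250):
    """True if the whole fragment fits within the maximum insert size."""
    return fragment_length(left_start, right_start, right_read_len) <= max_insert


# Pair SRR424618.10 from the SAM excerpt later in the thread.
frag = fragment_length(42794368, 42794395, 76)
print(frag)                                                   # 103
print(within_max_insert(42794368, 42794395, 76))              # True
# A hypothetical pair whose mates start 400 bp apart would exceed 250.
print(within_max_insert(1000, 1400, 76))                      # False
print(within_max_insert(1000, 1400, 76, max_insert=1000))     # True
```

The computed 103 matches the insert size column in that SAM record.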


      • #4
        Did you try inspecting reads that mapped the first time, and not the second?

        Again, I don't use Bowtie, but the first time you used one FASTQ file and the second time you used two. Is it normal for the total number of reads to be the same?

        Also, rather than

        > results/v400_paired.sam
        consider

        | samtools view -bSh - > v400_paired.bam
        You can convert a subset of that file to .sam later to eyeball things.
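One way to find the reads that mapped the first time but not the second is to compare read names between the two SAM files. Below is a minimal Python sketch on made-up toy records; the function name is hypothetical, and it assumes read names are identical in both files (if the aligner adds /1 or /2 suffixes in paired mode, those would need stripping first).

```python
def mapped_read_names(sam_lines):
    """Collect names of reads whose FLAG does not have the 0x4 (unmapped) bit."""
    names = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & 0x4:
            names.add(name)
    return names


# Toy records (not real data): read2 maps alone but not as part of a pair.
single_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr2\t500\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
paired_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t99\tchr1\t100\t255\t4M\t=\t150\t54\tACGT\tIIII",
    "read1\t147\tchr1\t150\t255\t4M\t=\t100\t-54\tACGT\tIIII",
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

lost = mapped_read_names(single_sam) - mapped_read_names(paired_sam)
print(sorted(lost))  # ['read2']
```

On real files you would pass open file handles instead of lists, since the function only iterates over lines.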


        • #5
          Thanks for all the replies!
          I will have a look at the insert size issue. True, the total number of reads is reported as the same in both cases, and this is Bowtie's own output. Could it be that Bowtie counts each pair as one read when it reports on paired-end data?


          • #6
            What are you aligning to, a full genome or genomic scaffolds?
            It makes sense that if you map PE data to scaffolds (which are not one continuous sequence), a lot of pairs will fail to map whenever the insert size puts the second mate beyond the end of the scaffold that the first mate maps to.

            If you do not care about your insert size (i.e. you are not trying to re-sequence large regions of the genome) and you have genomic scaffolds, I would concatenate the PE files and map in single-end mode.
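As a rough illustration of the "concatenate and map single-end" idea, here is a minimal Python sketch that appends the two mate files into one FASTQ. The function name and toy file names are hypothetical; it assumes uncompressed FASTQ files that each end with a newline, and it does nothing to make read names unique across mates.

```python
import os
import shutil
import tempfile


def concatenate_fastq(mate_paths, out_path):
    """Append the given FASTQ files, in order, into out_path."""
    with open(out_path, "wb") as out:
        for path in mate_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Toy demonstration with two one-record files (made-up reads).
workdir = tempfile.mkdtemp()
mate1 = os.path.join(workdir, "toy_1.fastq")
mate2 = os.path.join(workdir, "toy_2.fastq")
combined = os.path.join(workdir, "toy_combined.fastq")

with open(mate1, "w") as fh:
    fh.write("@read1/1\nACGT\n+\nIIII\n")
with open(mate2, "w") as fh:
    fh.write("@read1/2\nTTGA\n+\nIIII\n")

concatenate_fastq([mate1, mate2], combined)

with open(combined) as fh:
    combined_text = fh.read()

# Each FASTQ record is four lines long.
record_count = combined_text.count("\n") // 4
print(record_count)  # 2
```

The combined file can then be given to the aligner as a single-end input, at the cost of losing the pairing information.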


            • #7
              Honestly, that seems about right. Bowtie2 made improvements to paired-end alignment, so you may want to check that out. Paired-end specific options: http://bowtie-bio.sourceforge.net/bo...ed-end-options


              • #8
                I am aligning against the indexed hg19 downloaded from the Bowtie website. I'll read that link. Below is the beginning of the SAM file Bowtie gives me. Does anybody know what the 0 in the insert size position means?

                SRR424618.6 HWIUSI-EAS523_0001:5:1:999:17802 77 * 0 0 * * 0 0 NGGCTTTAGTCAAAGTACAGAAGACATTAGAAGAAAATTGCAGAAACAGGCTGGGTTTGCANGCATGAATNCGNCA #''''52)+.88633AAAAAAAAAAAA7AA7AAA7A72A8AAAAAA7AA########################### XM:i:1
                SRR424618.6 HWIUSI-EAS523_0001:5:1:999:17802 141 * 0 0 * * 0 0 NCAAACACCTGGTTGGCTATCTCCAATAACTGTGACGTATTCATGCCTGCAAACCCAGCNNNNNNNNNCANNNNNC #***('**+'::4:20*523AAA7AAAAAA############################################## XM:i:1
                SRR424618.10 99 chr20 42794368 255 76M = 42794395 103 NATGGAACCACCTCAGGGCCTTGGTATTGCTGTTCCCTCTACCTGTAATGCCCTTCCTCCAGATACCTACNTGGCT #'**'0.0..AAAAA8AA77::85:AAAAA############################################## XA:i:1 MD:Z:0C69A5 NM:i:2
                SRR424618.10 147 chr20 42794395 255 76M = 42794368 -103 TNNNNNTCNNNNNNNNNGTAATGCCCTTCCTCCAGATACCTACATGGCTCACCCTCTTGCCGTCTTCAAGCCTTTN ############################################################################ XA:i:1 MD:Z:1G0C0T0G0T2C0C0T0C0T0A0C0C0T58A0 NM:i:15
                SRR424618.9 163 chr13 99753904 255 76M = 99753933 105 NAGACCAGCCGGAGCAACAAAAAATTAGCTAGGCATGGTGGTGCATGCCAGTGGTCCCANNNNNNNNNGANNNNNG #''**00222AAAAAAAAAAA27*7626667AAAA######################################### XA:i:1 MD:Z:0G58G0C0T0A0C0T0T0T0G2G0G0G0T0G0A0 NM:i:16
                SRR424618.9 83 chr13 99753933 255 76M = 99753904 -105 TAGGCNTGGTGGTGCATGCCAGTGGTCCCAGCTACTTTGGAGGGTGAGATGTGAAGATCCCCTGAGCCCAGGAGTN ##################AAAA7AAA896:820*+*7AAAAAAAAAAAAAAAAAA8AAAAAAAAAA20.),*'*'# XA:i:1 MD:Z:5A69T0 NM:i:2


                • #9
                  Did you look at the binary flags? 77 means that this is the first read of a pair and that neither read of the pair mapped.

                  141 means the same thing for the second read of the pair. Notice how neither has a mapping position either? The quality turns to junk towards the end of the reads, which might be part of the problem.
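The flag values can be decoded bit by bit. The sketch below is a small Python helper (the function and label names are my own, the bit values are those defined in the SAM specification) applied to the six flags that appear in the SAM excerpt above.

```python
# Bit values as defined in the SAM specification; labels are informal.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (77, 141, 99, 147, 163, 83):
    print(flag, decode_flag(flag))
# 77  -> paired, unmapped, mate_unmapped, first_in_pair
# 141 -> paired, unmapped, mate_unmapped, second_in_pair
# 99  -> paired, proper_pair, mate_reverse_strand, first_in_pair
```

So 77 and 141 together describe one pair in which neither mate aligned, which is also why the position and insert size columns of those records are 0.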


                  • #10
                    Oh, thanks for that. I had started to figure out some of the flag numbers, but these are new to me. If I align only the first of the FASTQ files with the -m 1 option, it gives 70.66% reads with at least one alignment.
                    The second FASTQ file gives 59.07% reads with at least one alignment.
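These two per-file rates give a rough feel for what to expect from the paired run. The Python sketch below is only back-of-the-envelope arithmetic under simplifying assumptions that are mine, not Bowtie's documented behaviour: that a pair is reported only when both mates would align on their own, and, for the lower estimate, that the two mates succeed independently. The observed rate is the 49.61% from the first post.

```python
# Rates reported in this thread (all runs with -m 1).
mate1_rate = 0.7066          # first FASTQ file alone
mate2_rate = 0.5907          # second FASTQ file alone
observed_pair_rate = 0.4961  # paired-end run from the first post

# Assumption: a pair cannot do better than its worse mate.
rough_ceiling = min(mate1_rate, mate2_rate)

# Assumption: mates align independently of each other.
independent_estimate = mate1_rate * mate2_rate

print(f"rough ceiling:        {rough_ceiling:.2%}")
print(f"independent estimate: {independent_estimate:.2%}")
print(f"observed:             {observed_pair_rate:.2%}")
```

Under these assumptions the observed paired-end rate falls between the two figures, so a large part of the drop may simply reflect the lower success rate of the second mate.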
