  • reventropy
    Junior Member
    • Apr 2014
    • 7

    High discordant alignments

    I've set up a Galaxy workflow for paired-end, first-strand RNA-seq, and I've gotten some odd summary results from the TopHat2 alignment. At least I think they're odd, as I'm new to this.

    Left reads:
    Input : 218685181
    Mapped : 193500858 (88.5% of input)
    of these: 14727362 ( 7.6%) have multiple alignments (40016 have >20)
    Right reads:
    Input : 218685181
    Mapped : 196263585 (89.7% of input)
    of these: 14724480 ( 7.5%) have multiple alignments (40380 have >20)
    Unpaired reads:
    Input : 5950944
    Mapped : 5300035 (89.1% of input)
    of these: 227937 ( 4.3%) have multiple alignments (142 have >20)
    89.1% overall read mapping rate.

    Aligned pairs: 173668750
    of these: 13863688 ( 8.0%) have multiple alignments
    170432898 (98.1%) are discordant alignments
    1.5% concordant pair alignment rate.
    Here's the flagstat output


    490744296 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    490744296 + 0 mapped (100.00%:-nan%)
    486148534 + 0 paired in sequencing
    241299292 + 0 read1
    244849242 + 0 read2
    523372 + 0 properly paired (0.11%:-nan%)
    443477134 + 0 with itself and mate mapped
    42671400 + 0 singletons (8.78%:-nan%)
    418612688 + 0 with mate mapped to a different chr
    312416516 + 0 with mate mapped to a different chr (mapQ>=5)
    For the number of reads mapped, the number of concordant pairs seems extremely low. I'm wondering if I missed a parameter in TopHat or Bowtie. Notably, I have not set a read group identifier in Bowtie (is that necessary?), nor could I figure out how to from the Bowtie documentation. I also wonder if something could be awry with my FASTQ files, as they were concatenated from a split dataset. Here are the first few reads from the forward and reverse files, respectively.

    @HW-ST997:217:C3KKGACXX:4:1101:1432:2038 1:N:0:TGACCA
    TTCATCTTTAGATAATGAATTATATCCAAGATCAGACTGGCCACCTGTACTAGATCTATCATCAGTAGCATATACTTTGATTAAACCCG
    +
    FF00B<<FFFFFFBBFFFBFIFBBF0BBFFFFBFFFFIF<FFF<FBFF7BBBB<<B<''<B<BBB<<BBBBBFFFBBF<<B<7B7<BBB
    @HW-ST997:217:C3KKGACXX:4:1101:1474:2051 1:N:0:TGACCA
    GAGGGAGTATAGGGCTGTGACTAGTATGTTGAGTCCTGTAAGTAGGAGAGTGATATTTGATCAGGAGAACGTGGTTACTAGCACAGAGA
    +
    FIFIIBFBBFFFIIFFFFFFFFFFFBFFIIIFFFIIIFFFFFFFFFBF<BBBBF0BFFFBFFBFFFFFFFBFBFBFB<BBBBBBBBBFB
    @HW-ST997:217:C3KKGACXX:4:1101:1451:2106 1:N:0:TGACCA
    ACTGGGAAACGTTCACGCTGGGTCCAGCATTTGCCATGGACAAGATGCCAGGACCCGTATGCTTCAGGATGAAGTTCTTGTCATCAAAT
    +
    FIIFFBBFFFFFFBB7<7BBFFF77BBFFIFFFIFBFFFIFFIIF<B<0<BB7BBBBB<BBBBBBBB0BBBB0<7<BBBB0'0B<B<BB




    @HW-ST997:217:C3KKGACXX:4:1101:1452:2018 2:N:0:TGACCA
    TTACCCCCATACTCCTTACACTATTCCTCATCANCCNACTAAAAATATTAAACACAAACTACCACCTACCTCCCTCACCAAAGCCCATA
    +
    FFFFFFFF7FFFIIIIIFFFFFFFIIFFFFFFB#0B#07<FFFIFFFFIFBFFIFFFFFFFFBFF<BB<BFFFFB<BBBBBFBFFB<BB
    @HW-ST997:217:C3KKGACXX:4:1101:1474:2051 2:N:0:TGACCA
    AGTCATTCTCATAATCGCCCACGGGCTTACATCNTCNTTACTATTCTGCCTAGCAAACTCAAACTACGAACGCACTCACAGTCGCATCA
    +
    FFFIIFFFIIFIIFFBFBFFFIIIIFFFIFFFF#0<#07<BBFFFBBFBFFBBFFFFFBFFFFFFFFFFFFFBBBBFFBFFBBBFBBFB
    @HW-ST997:217:C3KKGACXX:4:1101:1409:2234 2:N:0:TGACCA
    ATCTCAGAAAAGAAGACATGGAATATGCCCTGNNTANACTGGATGACACCAAATTCCGCTCTCATGAGGGTGAAACTTCCTACATCCGA
    +
    <BFFFIFFIIIBBFFBFBBFFFFF7FFFFFII##07#07BFFBFFBFFFIFFFBF7BBFFBBBBBBB<BB0<B<'7<BBBBBBBBBBB<
    Thanks in advance for any help!

    -Jeremy
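As a sanity check on the summary above, the reported percentages can be reproduced from the raw counts. This is a minimal sketch; the assumption (not stated in the post) is that the concordant rate is the number of non-discordant aligned pairs divided by the number of input pairs.

```python
# Counts copied from the TopHat2 align summary in the post above.
input_pairs = 218_685_181
aligned_pairs = 173_668_750
discordant_pairs = 170_432_898

# Pairs that aligned and were not flagged as discordant.
concordant_pairs = aligned_pairs - discordant_pairs

# Discordant share of the aligned pairs.
discordant_pct = 100 * discordant_pairs / aligned_pairs

# Assumed denominator: all input pairs, not just the aligned ones.
concordant_pct = 100 * concordant_pairs / input_pairs

print(concordant_pairs)
print(f"{discordant_pct:.1f}% discordant")
print(f"{concordant_pct:.1f}% concordant")
```

Under that assumption, only about 3.2 million of roughly 219 million input pairs aligned concordantly, which agrees with the 98.1% and 1.5% figures in the summary.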
  • yueluo
    Member
    • Aug 2013
    • 82

    #2
    What options did you use when running TopHat/Bowtie?
    Since you are using stranded data, you might want to check the '--library-type' option.

    • reventropy
      Junior Member
      • Apr 2014
      • 7

      #3
      Thanks for the response, yueluo. I ran it through a Galaxy wrapper, but I selected the first-strand option, so the wrapper should be passing the command on to Bowtie. I just spoke with a colleague who informed me that my paired-end reads appear to be out of order.

      For instance:

      Read1-forward:
      1101:1432:2038 1:N:0:TGACCA
      Read1-reverse:
      1101:1452:2018 2:N:0:TGACCA

      This may have happened when I concatenated the files, or it might just be how I received the sequencing data. Do you have any ideas about how I can re-sort by coordinates?
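One way to confirm the problem before remapping is to walk both FASTQ files in parallel and compare read names. The sketch below is illustrative only: `read_fastq`, `read_id`, `first_mismatch`, `repair`, and `toy_fastq` are hypothetical helper names, the sequences and qualities are placeholders, and the in-memory re-pairing would not scale to a dataset of this size, where a dedicated re-pairing tool would be a better fit. It assumes Illumina-style headers in which both mates share everything before the first space.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines."""
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) == 4:
            yield tuple(record)
            record = []


def read_id(header):
    """Return the header up to the first space, which both mates share."""
    return header.split(" ", 1)[0]


def first_mismatch(fwd_lines, rev_lines):
    """Return the 0-based index of the first out-of-sync pair, or None."""
    pairs = zip_longest(read_fastq(fwd_lines), read_fastq(rev_lines))
    for index, (fwd, rev) in enumerate(pairs):
        if fwd is None or rev is None or read_id(fwd[0]) != read_id(rev[0]):
            return index
    return None


def repair(fwd_lines, rev_lines):
    """Keep only reads present in both files, in forward-file order."""
    rev_by_id = {read_id(rec[0]): rec for rec in read_fastq(rev_lines)}
    kept_fwd, kept_rev = [], []
    for rec in read_fastq(fwd_lines):
        mate = rev_by_id.get(read_id(rec[0]))
        if mate is not None:
            kept_fwd.append(rec)
            kept_rev.append(mate)
    return kept_fwd, kept_rev


PREFIX = "@HW-ST997:217:C3KKGACXX:4:1101:"


def toy_fastq(coords, mate):
    """Build FASTQ lines with real headers but placeholder bases/qualities."""
    lines = []
    for xy in coords:
        lines += [f"{PREFIX}{xy} {mate}:N:0:TGACCA", "ACGT", "+", "FFFF"]
    return lines


# Header coordinates taken from the first reads shown in the original post.
fwd = toy_fastq(["1432:2038", "1474:2051", "1451:2106"], 1)
rev = toy_fastq(["1452:2018", "1474:2051", "1409:2234"], 2)

print(first_mismatch(fwd, rev))
kept_fwd, kept_rev = repair(fwd, rev)
print([read_id(rec[0]) for rec in kept_fwd])
```

With real data, the two arguments could be open file handles instead of lists, since both functions only iterate over lines.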

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        I suggest you go back to the raw files, and map them without modifying them in any way. If you want to merge multiple datasets, you can do that after you have the sam/bam files.

        • reventropy
          Junior Member
          • Apr 2014
          • 7

          #5
          Originally posted by Brian Bushnell
          I suggest you go back to the raw files, and map them without modifying them in any way. If you want to merge multiple datasets, you can do that after you have the sam/bam files.
          After looking into this some more, I'm not sure there is a way to feed multiple files into the Galaxy TopHat2 wrapper. Fortunately, it looks like they have a tool specifically for combining paired-end read files (which I swear I looked for before). We'll see if this works. As a backup, we'll run another instance of TopHat2 from the command line.

          You suggest not modifying them in any way. Does this include trimming/clipping and other QC measures? I am worried about this as it seems that if a read has enough low scoring bases, then it might be cut from say the forward file but not the reverse, leading again to misalignment.

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            Originally posted by reventropy
            You suggest not modifying them in any way. Does this include trimming/clipping and other QC measures? I am worried about this as it seems that if a read has enough low scoring bases, then it might be cut from say the forward file but not the reverse, leading again to misalignment.
            That's exactly why I made the suggestion; there are a lot of poorly-written tools that break read pairing, and that's usually the culprit.

            If you need to do quality or adapter trimming, I can suggest BBDuk, which is made to handle single or paired files, keeping reads together. It's extremely fast and uses a better quality-trimming algorithm than most alternatives, as well as being more sensitive in adapter trimming (you can specify the number of mismatches allowed). You can also use it for contaminant removal (phiX, E. coli, various spike-ins or vectors).
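To illustrate what "keeping reads together" means in practice, here is a minimal sketch of a pair-aware quality filter. It is not BBDuk's algorithm: it uses a simple mean-quality threshold, assumes Phred+33 quality encoding, and the names `mean_phred` and `filter_pairs` are hypothetical. The point is only that the decision is made per pair, so the two output files can never drift out of order.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_pairs(pairs, min_mean_q=20):
    """Split (fwd, rev) record pairs into kept pairs and surviving singletons.

    A pair is kept only if both mates pass. If exactly one mate passes,
    it goes to a separate singleton list instead of staying in the
    paired output, so paired files remain in sync.
    """
    kept, singletons = [], []
    for fwd, rev in pairs:
        fwd_ok = mean_phred(fwd[3]) >= min_mean_q
        rev_ok = mean_phred(rev[3]) >= min_mean_q
        if fwd_ok and rev_ok:
            kept.append((fwd, rev))
        elif fwd_ok:
            singletons.append(fwd)
        elif rev_ok:
            singletons.append(rev)
    return kept, singletons


# Toy records: (header, sequence, separator, quality). 'I' is Q40, '#' is Q2.
good = "IIII"
bad = "####"
pairs = [
    (("@r1 1", "ACGT", "+", good), ("@r1 2", "ACGT", "+", good)),
    (("@r2 1", "ACGT", "+", good), ("@r2 2", "ACGT", "+", bad)),
    (("@r3 1", "ACGT", "+", bad), ("@r3 2", "ACGT", "+", bad)),
]

kept, singletons = filter_pairs(pairs)
print(len(kept), len(singletons))
```

A tool that filters each file independently would instead drop `@r2 2` from the reverse file only, shifting every later reverse read by one position relative to its mate.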
