  • High discordant alignments

    I've set up a Galaxy workflow for paired-end, first-stranded RNA-seq, and I've gotten some odd summary results from the TopHat2 alignment. At least I think they're odd, as I'm new to this.

    Left reads:
    Input : 218685181
    Mapped : 193500858 (88.5% of input)
    of these: 14727362 ( 7.6%) have multiple alignments (40016 have >20)
    Right reads:
    Input : 218685181
    Mapped : 196263585 (89.7% of input)
    of these: 14724480 ( 7.5%) have multiple alignments (40380 have >20)
    Unpaired reads:
    Input : 5950944
    Mapped : 5300035 (89.1% of input)
    of these: 227937 ( 4.3%) have multiple alignments (142 have >20)
    89.1% overall read mapping rate.

    Aligned pairs: 173668750
    of these: 13863688 ( 8.0%) have multiple alignments
    170432898 (98.1%) are discordant alignments
    1.5% concordant pair alignment rate.
    Here's the flagstat output


    490744296 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    490744296 + 0 mapped (100.00%:-nan%)
    486148534 + 0 paired in sequencing
    241299292 + 0 read1
    244849242 + 0 read2
    523372 + 0 properly paired (0.11%:-nan%)
    443477134 + 0 with itself and mate mapped
    42671400 + 0 singletons (8.78%:-nan%)
    418612688 + 0 with mate mapped to a different chr
    312416516 + 0 with mate mapped to a different chr (mapQ>=5)
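The percentages in the two summaries can be recomputed from the raw counts as a sanity check. The sketch below uses only numbers quoted above; the variable names are illustrative, and the choice of denominators (input pairs for the concordant rate, "paired in sequencing" for flagstat) is inferred from the reported figures rather than taken from the tools' documentation.

```python
# Counts copied from the TopHat2 align summary (these are pairs).
input_pairs = 218_685_181
aligned_pairs = 173_668_750
discordant_pairs = 170_432_898

concordant_pairs = aligned_pairs - discordant_pairs
discordant_pct = 100 * discordant_pairs / aligned_pairs
# Assumption: the concordant rate is taken relative to all input pairs.
concordant_pct = 100 * concordant_pairs / input_pairs

# Counts copied from samtools flagstat (these are reads, not pairs).
paired_in_sequencing = 486_148_534
properly_paired = 523_372
both_mates_mapped = 443_477_134
mate_on_other_chr = 418_612_688

properly_paired_pct = 100 * properly_paired / paired_in_sequencing
other_chr_pct = 100 * mate_on_other_chr / both_mates_mapped

print(f"discordant among aligned pairs: {discordant_pct:.1f}%")
print(f"concordant among input pairs:   {concordant_pct:.1f}%")
print(f"properly paired reads:          {properly_paired_pct:.2f}%")
print(f"mate on a different chromosome: {other_chr_pct:.1f}%")
```

The last figure is the most telling one: when nearly all reads whose mate is also mapped have that mate on a different chromosome, the mates are most likely not true partners, which points at the input files rather than at an aligner parameter.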
    For the number of reads mapped, the number of concordant pairs seems extremely low. I'm wondering if I missed a parameter in TopHat or Bowtie? Notably, I have not set a read group identifier in Bowtie (necessary?), nor could I figure out how to from the Bowtie documentation. I also wonder if something could be awry with my FASTQ files, as they have been concatenated from a split dataset. Here are the first few reads from the forward and reverse data respectively.

    @HW-ST997:217:C3KKGACXX:4:1101:1432:2038 1:N:0:TGACCA
    TTCATCTTTAGATAATGAATTATATCCAAGATCAGACTGGCCACCTGTACTAGATCTATCATCAGTAGCATATACTTTGATTAAACCCG
    +
    FF00B<<FFFFFFBBFFFBFIFBBF0BBFFFFBFFFFIF<FFF<FBFF7BBBB<<B<''<B<BBB<<BBBBBFFFBBF<<B<7B7<BBB
    @HW-ST997:217:C3KKGACXX:4:1101:1474:2051 1:N:0:TGACCA
    GAGGGAGTATAGGGCTGTGACTAGTATGTTGAGTCCTGTAAGTAGGAGAGTGATATTTGATCAGGAGAACGTGGTTACTAGCACAGAGA
    +
    FIFIIBFBBFFFIIFFFFFFFFFFFBFFIIIFFFIIIFFFFFFFFFBF<BBBBF0BFFFBFFBFFFFFFFBFBFBFB<BBBBBBBBBFB
    @HW-ST997:217:C3KKGACXX:4:1101:1451:2106 1:N:0:TGACCA
    ACTGGGAAACGTTCACGCTGGGTCCAGCATTTGCCATGGACAAGATGCCAGGACCCGTATGCTTCAGGATGAAGTTCTTGTCATCAAAT
    +
    FIIFFBBFFFFFFBB7<7BBFFF77BBFFIFFFIFBFFFIFFIIF<B<0<BB7BBBBB<BBBBBBBB0BBBB0<7<BBBB0'0B<B<BB




    @HW-ST997:217:C3KKGACXX:4:1101:1452:2018 2:N:0:TGACCA
    TTACCCCCATACTCCTTACACTATTCCTCATCANCCNACTAAAAATATTAAACACAAACTACCACCTACCTCCCTCACCAAAGCCCATA
    +
    FFFFFFFF7FFFIIIIIFFFFFFFIIFFFFFFB#0B#07<FFFIFFFFIFBFFIFFFFFFFFBFF<BB<BFFFFB<BBBBBFBFFB<BB
    @HW-ST997:217:C3KKGACXX:4:1101:1474:2051 2:N:0:TGACCA
    AGTCATTCTCATAATCGCCCACGGGCTTACATCNTCNTTACTATTCTGCCTAGCAAACTCAAACTACGAACGCACTCACAGTCGCATCA
    +
    FFFIIFFFIIFIIFFBFBFFFIIIIFFFIFFFF#0<#07<BBFFFBBFBFFBBFFFFFBFFFFFFFFFFFFFBBBBFFBFFBBBFBBFB
    @HW-ST997:217:C3KKGACXX:4:1101:1409:2234 2:N:0:TGACCA
    ATCTCAGAAAAGAAGACATGGAATATGCCCTGNNTANACTGGATGACACCAAATTCCGCTCTCATGAGGGTGAAACTTCCTACATCCGA
    +
    <BFFFIFFIIIBBFFBFBBFFFFF7FFFFFII##07#07BFFBFFBFFFIFFFBF7BBFFBBBBBBB<BB0<B<'7<BBBBBBBBBBB<
    Thanks in advance for any help!

    -Jeremy

  • #2
    What options did you use when running TopHat/Bowtie?
    Since you are using stranded data, you might want to check the '--library-type' option.


    • #3
      Thanks for the response, yueluo. I ran it through a Galaxy wrapper, but I selected the first-strand option, so the wrapper should be passing the option on to TopHat. I just spoke with a colleague who informed me that my paired-end reads appear to be out of order.

      For instance:

      Read1-forward:
      1101:1432:2038 1:N:0:TGACCA
      Read1-Reverse
      1101:1452:2018 2:N:0:TGACCA

      This may have happened when I concatenated the files, or it might just be how I received the sequencing data. Do you have any ideas about how I can re-sort by coordinates?
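One way to confirm this, and to re-pair small files, is to compare read names record by record. The following is a minimal sketch, not a replacement for a dedicated tool: the function names (`read_fastq`, `read_id`, `first_mismatch`, `repair`) are made up for illustration, the headers are the ones posted in this thread, and the sequences are truncated toy strings. It assumes 4-line FASTQ records and Casava 1.8-style headers, where the mate number follows a space.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def read_id(header):
    """Read name without the mate-specific part ('1:N:0:...' or '2:N:0:...')."""
    return header.split()[0]


def first_mismatch(r1_handle, r2_handle):
    """Return (record_index, id1, id2) for the first out-of-sync pair, or None."""
    pairs = zip_longest(read_fastq(r1_handle), read_fastq(r2_handle))
    for i, (rec1, rec2) in enumerate(pairs):
        id1 = read_id(rec1[0]) if rec1 else None
        id2 = read_id(rec2[0]) if rec2 else None
        if id1 != id2:
            return i, id1, id2
    return None


def repair(r1_handle, r2_handle):
    """Yield (rec1, rec2) for reads present in both files, in R1 order.

    Holds all of R2 in memory, so it only suits small files.
    """
    r2_by_id = {read_id(rec[0]): rec for rec in read_fastq(r2_handle)}
    for rec1 in read_fastq(r1_handle):
        rec2 = r2_by_id.get(read_id(rec1[0]))
        if rec2 is not None:
            yield rec1, rec2


# Headers from the thread; sequences and qualities shortened for the demo.
R1_TEXT = (
    "@HW-ST997:217:C3KKGACXX:4:1101:1432:2038 1:N:0:TGACCA\nTTCA\n+\nFF00\n"
    "@HW-ST997:217:C3KKGACXX:4:1101:1474:2051 1:N:0:TGACCA\nGAGG\n+\nFIFI\n"
)
R2_TEXT = (
    "@HW-ST997:217:C3KKGACXX:4:1101:1452:2018 2:N:0:TGACCA\nTTAC\n+\nFFFF\n"
    "@HW-ST997:217:C3KKGACXX:4:1101:1474:2051 2:N:0:TGACCA\nAGTC\n+\nFFFI\n"
)

mismatch = first_mismatch(io.StringIO(R1_TEXT), io.StringIO(R2_TEXT))
repaired = list(repair(io.StringIO(R1_TEXT), io.StringIO(R2_TEXT)))
print(mismatch)
print([read_id(r1[0]) for r1, _ in repaired])
```

With hundreds of millions of reads the in-memory dictionary in `repair` is not practical, so for real data the check is the useful part; if it reports a mismatch, going back to the original files is usually safer than repairing.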


      • #4
        I suggest you go back to the raw files, and map them without modifying them in any way. If you want to merge multiple datasets, you can do that after you have the sam/bam files.
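As a rough illustration of merging after mapping, the sketch below builds a `samtools merge` command for two per-file alignments. The helper name and the directory names are hypothetical; `accepted_hits.bam` is the file name TopHat writes by default. The command is only printed here, not run.

```python
import shlex


def samtools_merge_command(output_bam, input_bams):
    """Build a `samtools merge <out.bam> <in1.bam> <in2.bam> ...` command."""
    if len(input_bams) < 2:
        raise ValueError("need at least two BAM files to merge")
    return ["samtools", "merge", output_bam, *input_bams]


# Hypothetical output directories, one per TopHat2 run on an unmodified file pair.
cmd = samtools_merge_command(
    "merged.bam",
    ["part1_tophat/accepted_hits.bam", "part2_tophat/accepted_hits.bam"],
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True), assuming samtools is on PATH.
```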


        • #5
          After looking into this some more, I'm not sure there is a way to feed multiple files into the Galaxy TopHat2 wrapper. Fortunately, it looks like they have a tool specifically for combining paired-end read files (which I swear I looked for before). We'll see if this works. As a backup, we'll run another instance of TopHat2 from the command line.

          You suggest not modifying them in any way. Does this include trimming/clipping and other QC measures? I am worried about this as it seems that if a read has enough low scoring bases, then it might be cut from say the forward file but not the reverse, leading again to misalignment.


          • #6
            Originally posted by reventropy:
            You suggest not modifying them in any way. Does this include trimming/clipping and other QC measures? I am worried about this as it seems that if a read has enough low scoring bases, then it might be cut from say the forward file but not the reverse, leading again to misalignment.
            That's exactly why I made the suggestion; there are a lot of poorly-written tools that break read pairing, and that's usually the culprit.

            If you need to do quality or adapter trimming, I can suggest BBDuk, which is made to handle single or paired files, keeping reads together. It's extremely fast and uses a better quality-trimming algorithm than most alternatives, as well as being more sensitive in adapter trimming (you can specify the number of mismatches allowed). You can also use it for contaminant removal (phiX, E. coli, various spike-ins or vectors).
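To make "keeping reads together" concrete, here is a minimal sketch of pair-aware filtering: a pair is kept only if both mates pass, so the two output files can never drift out of sync. This is not BBDuk's algorithm; the function names, the mean-quality criterion, and the threshold of 20 are assumptions chosen for illustration. Qualities are decoded as Phred+33, which matches the quality strings posted in this thread (they contain characters such as `#` and `'`).

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_pairs(pairs, min_mean_q=20):
    """Yield (rec1, rec2) only when BOTH mates pass; otherwise drop both.

    Each record is a (header, sequence, quality) tuple.
    """
    for rec1, rec2 in pairs:
        if mean_phred(rec1[2]) >= min_mean_q and mean_phred(rec2[2]) >= min_mean_q:
            yield rec1, rec2


# Toy data: in the second pair only mate 2 is poor ('#' is Q2),
# yet both mates are removed together.
pairs = [
    (("@r1 1:N:0", "ACGT", "IIII"), ("@r1 2:N:0", "TGCA", "FFFF")),
    (("@r2 1:N:0", "ACGT", "IIII"), ("@r2 2:N:0", "NNNN", "####")),
    (("@r3 1:N:0", "GGCC", "FFFF"), ("@r3 2:N:0", "CCGG", "IIII")),
]

kept = list(filter_pairs(pairs))
print([rec1[0].split()[0] for rec1, _ in kept])
```

A filter that processed the forward and reverse files independently would have kept `@r2` in the forward file only, which is exactly the kind of mis-pairing discussed above.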
