  • bob-loblaw
    Member
    • Jun 2012
    • 59

    Bowtie2 detecting human transcripts that STAR misses

    Hi everyone,

    I'm having the problem mentioned in the title and it's not making any sense to me. I run STAR on my RNA-Seq dataset, then look at the leftover transcripts, usually by blasting some of them. Often they are still mostly human (they get aligned to hg20 using bowtie2). I can't understand this at all: STAR, being a spliced aligner, should be aligning far more than bowtie2 does. I was thinking it could indicate human DNA contamination, but even then shouldn't STAR still align contiguous sequences? Here are two such reads that weren't aligned by STAR but are by Bowtie2 (they're not paired-end, so these are two different reads). I'd hate to stop using STAR; I love that speed.

    TATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGT

    ACCTTCTAGTGGTGTTTACTTGAGACCTTTTGTCATTTAATGTGTGCTGAATAAATGCCAGCACCCCTGAGTAGAAAGCAATCATGTACCTGCAGATGGTC

    Hopefully someone can point me in the right direction!
    Thanks!
  • mikep
    Member
    • Feb 2011
    • 45

    #2
    Did you mean you look at the leftover reads (as opposed to transcripts)?

    Also, what's the quality like on those reads, and what do the bowtie alignments look like?


    • bob-loblaw
      Member
      • Jun 2012
      • 59

      #3
      Originally posted by mikep:
      Did you mean you look at the leftover reads (as opposed to transcripts)?

      Also, what's the quality like on those reads, and what do the bowtie alignments look like?
      Yeah, the leftover reads are what I meant. The quality varies a bit; there are some bad ones in there, but plenty of good ones too. The quality on all of these reads should be enough to allow an accurate alignment.

      The alignments look fine. As I said in the previous post, I blasted a lot of these reads first and they were hitting human sequences, so that's when I decided to run bowtie2. So I think the bowtie2 alignments are accurate, or relatively accurate anyway. I just don't understand why STAR didn't detect these.


      • mikep
        Member
        • Feb 2011
        • 45

        #4
        Well, I dunno what bowtie2 is doing, but that first sequence you posted above has a 100% hit to various bacterial sequences, and no hits to human using megablast, so I'd be rather glad STAR ain't aligning it. The 2nd seems to hit some random stretch of the hg not associated with any gene, and it looks chimeric, and it needs blast against nr, finding no hits with megablast vs hg.

        I'd be not worrying about them. What % of your reads fall in this category?

        Any chance your username comes from Arrested Development?


        • bob-loblaw
          Member
          • Jun 2012
          • 59

          #5
          Originally posted by mikep:
          Well, I dunno what bowtie2 is doing, but that first sequence you posted above has a 100% hit to various bacterial sequences, and no hits to human using megablast, so I'd be rather glad STAR ain't aligning it. The 2nd seems to hit some random stretch of the hg not associated with any gene, and it looks chimeric, and it needs blast against nr, finding no hits with megablast vs hg.

          I'd be not worrying about them. What % of your reads fall in this category?

          Any chance your username comes from Arrested Development?

          Oh sorry my bad, that first sequence must be from some other source.

          Well, that's the problem: in some files it's as high as 50%. I've had problems with contamination in this dataset before though, so I wouldn't be surprised if there was more.


          • bob-loblaw
            Member
            • Jun 2012
            • 59

            #6
            Originally posted by mikep:
            Any chance your username comes from Arrested Development?
            And yeah, it comes from Arrested Development. Bob Loblaw's Law Blog.

            You know, come to think of it, I have seen something like this in RNA-Seq datasets before, even published ones, where one sequences the transcriptome of human or mouse or whatever, but not all of it aligns back to the reference database (in my experience sometimes as much as 10 or 15% doesn't). I was never really able to find an answer as to why that was; I always just figured it was chimeric reads and stuff. Perhaps that is the case and bowtie2 is able to align them where STAR is not... or maybe I'm grasping at straws here.


            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #7
              Perhaps STAR has trouble with reads containing sequencing errors. Do the alignments in bowtie2 but not STAR contain lots of mismatches and/or clipping?
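
One way to answer that question is to tally mismatches and clipping per alignment straight from the bowtie2 SAM output. The following is a minimal sketch in Python, assuming tab-separated SAM records with an `NM:i:` tag; the function name `summarize_sam_record` and the example record are made up for illustration and do not come from this dataset.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def summarize_sam_record(line):
    """Return read length, soft-clipped bases and edit distance for one SAM line."""
    fields = line.rstrip("\n").split("\t")
    cigar = fields[5]
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Operations that consume read bases give the read length.
    read_length = sum(n for n, op in ops if op in "MIS=X")
    soft_clipped = sum(n for n, op in ops if op == "S")
    # NM is an optional tag (edit distance to the reference); None if absent.
    edit_distance = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            edit_distance = int(tag[5:])
    return {
        "read_length": read_length,
        "soft_clipped": soft_clipped,
        "clipped_fraction": soft_clipped / read_length if read_length else 0.0,
        "edit_distance": edit_distance,
    }

# Hypothetical record, invented for this example only.
example = "\t".join([
    "read1", "0", "chr1", "1000", "42", "5S40M5S", "*", "0", "0",
    "ACGTACGTAC" * 5, "I" * 50, "AS:i:-12", "NM:i:3",
])
stats = summarize_sam_record(example)
print(stats)
```

Running this over the reads that bowtie2 aligns but STAR does not would show whether they are dominated by high edit distances or by heavy clipping. Note that bowtie2 only soft-clips in `--local` mode.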


              • mikep
                Member
                • Feb 2011
                • 45

                #8
                I normally get about a 10% miss rate with mapping, but I finished a bunch of STAR runs this morning and found a miss rate of 25%.

                If I find anything in it I'll get back, otherwise 'fraid I got nothing.


                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  If you want a higher mapping rate... you might give BBMap a try. It's splice-aware and substantially more sensitive than TopHat.


                  • alexdobin
                    Senior Member
                    • Feb 2009
                    • 161

                    #10
                    hi @bob-loblaw,

                    As @mikep pointed out, the second sequence maps chimerically. You would need to enable chimeric output with --chimSegmentMin 20, and then STAR will output it into Chimeric.out.sam:

                    1 0 chr10 110358273 3 61M40S * 0 0 ACCTTCTAGTGGTGTTTACTTGAGACCTTTTGTCATTTAATGTGTGCTGAATAAATGCCAGCACCCCTGAGTAGAAAGCAATCATGTACCTGCAGATGGTC * NH:i:2 HI:i:1 AS:i:62 NM:i:0 MD:Z:61
                    1 272 chr10 110358218 3 40M61S * 0 0 GACCATCTGCAGGTACATGATTGCTTTCTACTCAGGGGTGCTGGCATTTATTCAGCACACATTAAATGACAAAAGGTCTCAAGTAAACACCACTAGAAGGT * NH:i:2 HI:i:2 AS:i:43 NM:i:0 MD:Z:40
                    I believe this is the same as the BLAST alignment. This is a strange chimeric sequence, with two pieces mapping in the same locus on the opposite strands.
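
The "same locus, opposite strands" observation can be checked from the two records themselves. This is a small Python sketch, not part of STAR: it decodes the two relevant SAM FLAG bits (0x10 for reverse strand, 0x100 for secondary alignment, as defined in the SAM specification) and confirms that the sequence in the second record is the reverse complement of the original read. The helper names `decode_flag` and `reverse_complement` are made up for this example.

```python
FLAG_REVERSE = 0x10     # SAM spec: SEQ is reverse complemented
FLAG_SECONDARY = 0x100  # SAM spec: secondary alignment

def decode_flag(flag):
    """Report the strand and secondary bits of a SAM FLAG value."""
    return {
        "reverse_strand": bool(flag & FLAG_REVERSE),
        "secondary": bool(flag & FLAG_SECONDARY),
    }

def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T, N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

# SEQ field of the first record (FLAG 0), identical to the read as posted.
read = ("ACCTTCTAGTGGTGTTTACTTGAGACCTTTTGTCATTTAATGTGTGCTGAATAAATGCCAGCACCCC"
        "TGAGTAGAAAGCAATCATGTACCTGCAGATGGTC")
# SEQ field of the second record (FLAG 272).
second_segment_seq = ("GACCATCTGCAGGTACATGATTGCTTTCTACTCAGGGGTGCTGGCATTTATTCAGC"
                      "ACACATTAAATGACAAAAGGTCTCAAGTAAACACCACTAGAAGGT")

print(decode_flag(0))
print(decode_flag(272))
print(reverse_complement(read) == second_segment_seq)
```

FLAG 272 is 256 + 16, so the second record is a reverse-strand alignment. Its 40M therefore covers the last 40 bases of the original read, complementing the 61M at the start of the first record (61 + 40 = 101 bases in total).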

                    You can also allow the output of the longer segment into the Aligned.out.sam file by reducing the minimum mapped score/length requirements, e.g. --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0.5:
                    1 0 chr10 110358273 255 63M38S * 0 0 ACCTTCTAGTGGTGTTTACTTGAGACCTTTTGTCATTTAATGTGTGCTGAATAAATGCCAGCACCCCTGAGTAGAAAGCAATCATGTACCTGCAGATGGTC * NH:i:1 HI:i:1 AS:i:62 NM:i:0 MD:Z:63
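
The arithmetic behind those two parameters can be sketched in a few lines of Python. This is an approximation of the filter, not STAR's actual code: it assumes both thresholds are simple fractions of the read length, and it assumes the default value of 0.66 for both `--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread` (check the manual for your STAR version). The function name `passes_length_filters` is made up for this example.

```python
def passes_length_filters(matched_bases, score, read_length,
                          min_match_frac=0.66, min_score_frac=0.66):
    """Approximate STAR's normalized matched-length and score filters.

    The 0.66 defaults are assumed values for --outFilterMatchNminOverLread
    and --outFilterScoreMinOverLread; STAR's exact comparison may differ
    slightly from this simplified version.
    """
    match_frac = matched_bases / read_length
    score_frac = score / read_length
    return match_frac >= min_match_frac and score_frac >= min_score_frac

# Values read off the 63M38S record above: 63 matched bases, AS:i:62, 101-base read.
print(round(63 / 101, 3))  # matched fraction of the read
print(passes_length_filters(63, 62, 101))
print(passes_length_filters(63, 62, 101, min_match_frac=0.5, min_score_frac=0.0))
```

With the assumed defaults the 63-base segment covers only about 62% of the read and is filtered out; with the relaxed values from the post it passes, which is consistent with the record appearing in Aligned.out.sam.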

                    The low mapping rate may be caused by various factors. The Log.final.out file can give you some hints about mapped length, error rate, multi-mappers etc. (if you post it I can have a look at it). You can try to reduce the --outFilterMatchNminOverLread value to check whether only small portions of the reads can be mapped. The most typical reasons for low mappability are:
                    (i) rRNA. Normally they appear as multimappers; make sure that you include unplaced scaffolds in the genome, since one of them contains very highly expressed rRNA loci.
                    (ii) poor sequencing quality of the read ends (then reducing --outFilterMatchNminOverLread will help)
                    (iii) contamination
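
To compare these causes across many samples, the relevant percentages can be pulled out of Log.final.out with a short script. This is a minimal Python sketch, assuming the usual `key | value` layout of that file; the sample text and its numbers are invented for illustration, and the exact key names should be checked against the output of your STAR version.

```python
def parse_star_log(text):
    """Parse 'key | value' lines of a STAR Log.final.out into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats

def percent(stats, key):
    """Convert a value such as '22.00%' to the float 22.0."""
    return float(stats[key].rstrip("%"))

# Invented excerpt in the layout of Log.final.out; not real results.
sample = """\
                          Number of input reads |\t1000000
                        Uniquely mapped reads % |\t60.00%
             % of reads mapped to multiple loci |\t15.00%
                 % of reads unmapped: too short |\t22.00%
                     % of reads unmapped: other |\t3.00%
"""

log = parse_star_log(sample)
for key in ("% of reads mapped to multiple loci",
            "% of reads unmapped: too short",
            "% of reads unmapped: other"):
    print(key, percent(log, key))
```

A large "mapped to multiple loci" share would point toward rRNA, while a large "unmapped: too short" share means only a small part of each read aligned, which is the case where lowering --outFilterMatchNminOverLread helps.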

                    Hopefully, that strange chimeric sequence is not representative of the reads that cannot be mapped - if so, it would mean some strange library making artifact.

                    Cheers
                    Alex
