**mikep** · 08-07-2014, 12:11 AM

Did you mean you looked at the leftover reads (as opposed to transcripts)?

Also, what's the quality like on those reads, and what do the bowtie alignments look like?

**bob-loblaw** · 08-07-2014, 12:59 AM

Yeah, the leftover reads are what I meant. The quality varies a bit; there are some bad reads in there, but plenty of good ones too. Still, the quality of all of these reads should be good enough to allow an accurate alignment.
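One way to put a number on "the quality varies a bit" is to compare mean Phred quality at the start and at the end of each read, because a drop-off at the 3' end is a common reason reads fail length-based alignment filters. The sketch below assumes plain 4-line FASTQ records (no wrapped sequence lines) with Phred+33 encoding; the function names and the 20-base window are hypothetical choices for illustration.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def end_quality_profile(fastq_lines, tail=20, offset=33):
    """Average Phred quality of the first and last `tail` bases over all reads.

    Assumes unwrapped 4-line FASTQ records; reads shorter than `tail` are skipped.
    Returns (mean_head_quality, mean_tail_quality), or None if no read qualified.
    """
    head_scores, tail_scores = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the 4th line of each record is the quality string
            continue
        qual = line.rstrip("\n")
        if len(qual) < tail:
            continue
        head_scores.append(mean_phred(qual[:tail], offset))
        tail_scores.append(mean_phred(qual[-tail:], offset))
    if not head_scores:
        return None
    return (sum(head_scores) / len(head_scores),
            sum(tail_scores) / len(tail_scores))
```

Running it separately on the reads STAR leaves unmapped and on the reads it maps would show whether the leftovers are enriched for poor read ends.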

The alignments look fine. As I said in my previous post, I BLASTed a lot of these reads first and they were hitting human sequences, so that's when I decided to try bowtie2. So I think the bowtie2 alignments are accurate, or relatively so anyway. I just don't understand why STAR didn't detect these.

**mikep** · 08-07-2014, 01:16 AM

Well, I dunno what bowtie2 is doing, but that first sequence you posted above has a 100% hit to various bacterial sequences and no hits to human using megablast, so I'd be rather glad STAR ain't aligning it. The 2nd seems to hit some random stretch of the human genome not associated with any gene, and it looks chimeric; it needs a BLAST against nr, since I find no hits with megablast vs. the human genome.

I wouldn't worry about them. What % of your reads fall into this category?
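To answer "what % of your reads fall in this category" directly, one can compare which read names each aligner reports as mapped. This is a minimal sketch assuming both aligners wrote tab-separated SAM with matching read names; a record counts as mapped when SAM FLAG bit 0x4 is unset. Because STAR does not write unmapped reads to its SAM by default, the total read count is passed in separately (e.g. counted from the FASTQ). The function names are hypothetical.

```python
def mapped_read_names(sam_lines):
    """Names of reads with at least one mapped record (FLAG bit 0x4 unset)."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:
            names.add(qname)
    return names


def percent_bowtie2_only(total_reads, star_sam_lines, bowtie2_sam_lines):
    """Percent of all reads that bowtie2 maps but STAR leaves unmapped."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    star_mapped = mapped_read_names(star_sam_lines)
    bowtie2_mapped = mapped_read_names(bowtie2_sam_lines)
    return 100.0 * len(bowtie2_mapped - star_mapped) / total_reads
```

The set difference also gives the actual read names, which is handy for pulling those reads out and BLASTing a sample of them.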

Any chance your username comes from Arrested Development?

**bob-loblaw** · 08-07-2014, 01:46 AM

Oh sorry my bad, that first sequence must be from some other source.

Well, that's the problem: in some files it's as high as 50%. I've had problems with contamination in this dataset before, though, so I wouldn't be surprised if there was more.

**bob-loblaw** · 08-07-2014, 05:48 AM

And yeah, it comes from Arrested Development. Bob Loblaw's Law Blog.

You know, come to think of it, I have seen something like this in RNA-Seq datasets before, even published ones: you sequence the transcriptome of human or mouse or whatever, but not all of it aligns back to the reference (in my experience, sometimes as much as 10 or 15% doesn't). I was never really able to find out why that was; I always just figured it was chimeric reads and such. Perhaps that is the case and bowtie2 is able to align them where STAR is not... or maybe I'm grasping at straws here.

**Brian Bushnell** · 08-07-2014, 08:43 AM

Perhaps STAR has trouble with reads containing sequencing errors. Do the alignments in bowtie2 but not STAR contain lots of mismatches and/or clipping?
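Brian's question can be checked record by record on the bowtie2 output: soft clipping shows up as S operations in the CIGAR string, and the optional NM:i tag holds the edit distance (mismatches plus inserted/deleted bases). A minimal sketch, assuming tab-separated SAM records; the function name is hypothetical and NM is reported as None if the aligner did not write that tag.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_and_edit_distance(sam_line):
    """Return (soft_clipped_bases, NM edit distance or None) for one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]  # "*" (unavailable) yields no matches, i.e. 0 clipped bases
    soft_clipped = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")
    edit_distance = None
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("NM:i:"):
            edit_distance = int(tag[5:])
    return soft_clipped, edit_distance
```

Tabulating these two numbers for the reads that only bowtie2 aligns would show whether they are dominated by heavily clipped or mismatch-rich local alignments.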

**mikep** · 08-07-2014, 07:16 PM

I normally get about a 10% miss rate with mapping; I finished a bunch of STAR runs this morning and found a miss rate of 25%.

If I find anything in it I'll get back to you; otherwise, 'fraid I got nothing.

**Brian Bushnell** · 08-07-2014, 08:30 PM

If you want a higher mapping rate... you might give BBMap a try. It's splice-aware and substantially more sensitive than Tophat.

**alexdobin** · 08-14-2014, 02:38 PM

hi @bob-loblaw,

As @mikep pointed out, the second sequence maps chimerically. You would need to enable chimeric output with --chimSegmentMin 20, and then STAR will output it into Chimeric.out.sam:

```
1 0 chr10 110358273 3 61M40S * 0 0 ACCTTCTAGTGGTGTTTACTTGAGACCTTTTGTCATTTAATGTGTGCTGAATAAATGCCAGCACCCCTGAGTAGAAAGCAATCATGTACCTGCAGATGGTC * NH:i:2 HI:i:1 AS:i:62 NM:i:0 MD:Z:61
1 272 chr10 110358218 3 40M61S * 0 0 GACCATCTGCAGGTACATGATTGCTTTCTACTCAGGGGTGCTGGCATTTATTCAGCACACATTAAATGACAAAAGGTCTCAAGTAAACACCACTAGAAGGT * NH:i:2 HI:i:2 AS:i:43 NM:i:0 MD:Z:40
```

I believe this is the same as the BLAST alignment. This is a strange chimeric sequence, with two pieces mapping in the same locus on the opposite strands.
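The "same locus, opposite strands" reading can be verified from the FLAG, POS and CIGAR columns of the two Chimeric.out.sam records quoted above. Per the SAM specification, FLAG 272 = 256 (secondary alignment) + 16 (reverse strand), and the reference span of an alignment is its 1-based POS plus the bases consumed by M/D/N/=/X operations. This is a small decoding sketch with hypothetical helper names, not part of STAR itself.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_flag(flag):
    """Decode the SAM FLAG bits relevant to these records."""
    return {
        "unmapped": bool(flag & 0x4),
        "reverse_strand": bool(flag & 0x10),
        "secondary": bool(flag & 0x100),
        "supplementary": bool(flag & 0x800),
    }


def ref_span(pos, cigar):
    """1-based inclusive reference interval covered by an alignment."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos, pos + ref_len - 1


# FLAG, POS and CIGAR of the two chimeric segments on chr10.
segments = [(0, 110358273, "61M40S"), (272, 110358218, "40M61S")]
for flag, pos, cigar in segments:
    strand = "-" if decode_flag(flag)["reverse_strand"] else "+"
    print(strand, ref_span(pos, cigar))
# + (110358273, 110358333)
# - (110358218, 110358257)
```

The 61-base forward segment and the 40-base reverse segment together account for all 101 bases of the read, and they sit only about 15 bp apart on chr10, which fits the "strange chimera" interpretation.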

You can also allow the longer segment to be output into the Aligned.out.sam file by relaxing the minimum mapped score/length requirements, e.g. --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0.5:

```
1 0 chr10 110358273 255 63M38S * 0 0 ACCTTCTAGTGGTGTTTACTTGAGACCTTTTGTCATTTAATGTGTGCTGAATAAATGCCAGCACCCCTGAGTAGAAAGCAATCATGTACCTGCAGATGGTC * NH:i:1 HI:i:1 AS:i:62 NM:i:0 MD:Z:63
```
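The effect of those two options comes down to simple arithmetic: both thresholds are fractions of the read length (sum of mate lengths for paired-end reads). The sketch below approximates the check for the 101-base read above (score 62, 63 matched bases), assuming STAR's documented default of 0.66 for both options and ignoring STAR's internal rounding and its absolute --outFilterScoreMin/--outFilterMatchNmin thresholds. The function name is hypothetical.

```python
def passes_length_normalized_filters(score, n_matched, read_len,
                                     score_min_over_lread=0.66,
                                     match_nmin_over_lread=0.66):
    """Approximate STAR's length-normalized output filters.

    Mirrors --outFilterScoreMinOverLread and --outFilterMatchNminOverLread
    (defaults assumed to be 0.66); rounding details are not reproduced.
    """
    return (score >= score_min_over_lread * read_len
            and n_matched >= match_nmin_over_lread * read_len)


# Defaults: 62 < 0.66 * 101 = 66.66, so the 63M38S alignment is filtered out.
print(passes_length_normalized_filters(62, 63, 101))
# Relaxed: score threshold 0, and 63 >= 0.5 * 101 = 50.5 matched bases.
print(passes_length_normalized_filters(62, 63, 101, 0.0, 0.5))
```

With the defaults this prints False, and with the relaxed values it prints True, matching why the alignment only appears in Aligned.out.sam after the options are changed.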

The low mapping rate may be caused by various factors. The Log.final.out file can give you some hints about mapped length, error rate, multimappers, etc. (if you post it, I can have a look at it). You can try reducing the --outFilterMatchNminOverLread value to check whether only small portions of the reads can be mapped. The most typical reasons for low mappability are:
(i) rRNA. These normally appear as multimappers; make sure that you include the unplaced scaffolds in the genome, since one of them contains very highly expressed rRNA loci.
(ii) poor sequencing quality at the read ends (reducing --outFilterMatchNminOverLread will help in that case)
(iii) contamination
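Before posting Log.final.out, the relevant lines can be pulled out programmatically. In that file each statistic is written as "name |<tab>value", while section headers such as "UNIQUE READS:" have no separator. The key names listed below are recalled from typical STAR output and are an assumption; check them against your own file, since missing keys simply come back as None. The helper names are hypothetical.

```python
def parse_log_final(text):
    """Parse the 'name |<tab>value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def as_percent(value):
    """Convert a value such as '85.12%' to a float."""
    return float(value.rstrip("%"))


# Assumed key names for the hints mentioned above (mapped length, error rate,
# multimappers, and the two main unmapped categories).
KEYS_OF_INTEREST = [
    "Uniquely mapped reads %",
    "Average mapped length",
    "Mismatch rate per base, %",
    "% of reads mapped to multiple loci",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
]


def summarize(stats):
    """Subset of the parsed stats relevant to diagnosing a low mapping rate."""
    return {key: stats.get(key) for key in KEYS_OF_INTEREST}
```

Usage: `summarize(parse_log_final(open("Log.final.out").read()))`. A large "too short" fraction together with a short average mapped length would point toward partially mappable reads (reasons ii or iii), while a high multimapper fraction would point toward rRNA.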

Hopefully, that strange chimeric sequence is not representative of the reads that cannot be mapped; if it is, it would point to some strange library preparation artifact.

Cheers
Alex

Bowtie2 detecting human transcripts that STAR misses