Seqanswers Leaderboard Ad

**dpryan** · 08-14-2015, 12:03 PM

Keep in mind that tophat2 is mapping to the transcriptome first, so if these exons contain repeat regions or are similar to pseudo genes then that's the reason (i.e., tophat2 might only be reporting alignments there due to not looking yet at the whole genome).

**Brian Bushnell** · 08-14-2015, 12:23 PM

When you're mapping RNA-seq data to the genome with BBMap, you should not set the flag "minidentity=0.9" - that will eliminate most of the spliced alignments! And kfilter=85 will also require an 85bp exact match to the reference, which you will not get in virtually any read that spans an intron. I recommend eliminating those flags and adding 'maxindel=100k', like this:

bbmap.sh in=$file outu=$NEW_FILE.bbmap.unmapped.bam outm=$NEW_FILE.bbmap.bam bamscript=$NEW_FILE.bbmap.sh ambiguous=toss maxindel=100k bhist=bhist.$NEW_FILE.bbmap.txt 2>$NEW_FILE.bbmap_stat

**frymor** · 08-17-2015, 10:24 PM

Originally posted by Brian Bushnell View Post

When you're mapping RNA-seq data to the genome with BBMap, you should not set the flag "minidentity=0.9" - that will eliminate most of the spliced alignments! And kfilter=85 will also require an 85bp exact match to the reference, which you will not get in virtually any read that spans an intron. I recommend eliminating those flags and adding 'maxindel=100k'...

Hi Brian,
thanks for the remarks. I am trying now to run the suggested corrections. I was though wondering, how I can control the number of allowed mismatches in the mapping process or the quality of reads if using the kfilter and minidentity parameters is not recommended.

thanks,
Assa

**Brian Bushnell** · 08-18-2015, 12:09 AM

If you are using the latest version, the "subfilter=X" flag will be useful. It removes alignments with more than X mismatches. kfilter and minidentity are nice for DNA mapping, but not for RNA to genome.

**frymor** · 08-18-2015, 03:14 AM

yes I have the newest version. I'll give it a go with this parameter.

what parameter can I use to control the overall read quality?

and

Is there a way to collect splice junction/ as tophat is doing?

**alexdobin** · 08-18-2015, 08:18 AM

Originally posted by frymor View Post

When I took a closer look at the mapped bam files and compared tophat2 and STAR I have noticed the STAR has some problems with long exons. Even though it shows almost everywhere a higher mapping rate, in long exons it seems not to be able to map the reads correctly, while tophat2 can map there quite a lot of them.

What might be the reasons (if any), that STAR doesn't map these regions?

You are using --outFilterMultimapNmax 1, which only allows uniquely mapped reads to be output in the SAM/BAM. No alignments will be reported for multi-mapping reads.
My guess would be that these reads in the long exons are multi-mappers, and thus STAR does not output them.
Please try mapping with multi-mappers allowed and check these regions again.

As far as I can tell, with -g 1 option TopHat will report one random alignment out of all multi-mappers with the same highest score. Also, as @dpryan pointed out, TopHat maps to the transcriptome first, and may consider the alignments unique because of that.

Originally posted by frymor View Post

another difference I can see here is the behaviour around splice junctions, tophat can identify splice junction over a much longer region. Some of them we know to be true, but STAR can't see them. But this is another topic

This caused by --alignIntronMax 1, I replied in more detail at the link above.

In general, when making the first runs and comparisons, I would recommend running STAR with default parameters.

Cheers
Alex

**Brian Bushnell** · 08-18-2015, 02:17 PM

Originally posted by frymor View Post

what parameter can I use to control the overall read quality?

Well, you could use the "subfilter=X" flag to reject mapping sites with more than X substitutions, but normally for RNA-seq, I only set maxindel to an appropriate value for the organism's intron lengths, and leave everything else at default.

Is there a way to collect splice junction/ as tophat is doing?

BBMap does not do that, but you can find the introns by pileup with Samtools (they will be listed as deletions), and you could probably find the exon locations by using Bedtools to list the locations with coverage above a certain threshold, if you treat bases covered by deletion events as not covered. It's very obvious where the introns and exons are when you look at the bam file in IGV, but I've never actually tried to generate annotation from mapping; there's probably a better way than what I'm suggesting.

Topics	Statistics	Last Post
Evaluating Genome Sequencing for ECMO Patients in the NICU by seqadmin Started by seqadmin, 12-17-2024, 10:28 AM	0 responses 33 views 0 likes	Last Post by seqadmin 12-17-2024, 10:28 AM
New Genetic Toolkit Refines Studies on Gene Function and Disease by seqadmin Started by seqadmin, 12-13-2024, 08:24 AM	0 responses 48 views 0 likes	Last Post by seqadmin 12-13-2024, 08:24 AM
Study Links Brain Mechanism to Emotional Responses in Animals and Humans by seqadmin Started by seqadmin, 12-12-2024, 07:41 AM	0 responses 34 views 0 likes	Last Post by seqadmin 12-12-2024, 07:41 AM
Study Identifies Ribosomal RNA Fingerprints as Early Cancer Biomarkers by seqadmin Started by seqadmin, 12-11-2024, 07:45 AM	0 responses 46 views 0 likes	Last Post by seqadmin 12-11-2024, 07:45 AM

Seqanswers Leaderboard Ad

Announcement

STAR vs tophat2 - mapping reads to long exons

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News