
**hrajasim** · 02-20-2013, 04:38 PM

Originally posted by washy

We have questions regarding the TopHat "-M/--prefilter-multihits" option and the Unmapped.bam file.

First, adapter sequences and low-quality bases were removed from the FASTQ files, so all of the reads should map to the mouse genome.
The versions of TopHat2 and Bowtie2 used in this test were v.2.0.3 and v.2.0.0-beta6, respectively.
We ran TopHat with the following options.

# tophat2 -o $output_dir -G $annotation_gtf -p 2 $bowtie2_index $fastq1 $fastq2

From this result, only 60% of the total reads were without the "secondary alignment" flag in the BAM file, which means that 60% of the reads were correctly mapped to the mouse genome.
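For anyone who wants to reproduce this kind of tally, here is a minimal sketch that classifies SAM records by the FLAG column. It relies on the standard SAM flag bits (0x4 = read unmapped, 0x100 = secondary alignment). The function name `count_flags` is hypothetical, and the counts are per alignment record, not per read, so treat the numbers as a rough check only.

```shell
# Hypothetical helper: classify SAM records read from stdin by FLAG (column 2).
# Standard SAM flag bits: 0x4 (4) = read unmapped, 0x100 (256) = secondary alignment.
count_flags() {
  awk -F'\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      flag = $2
      # Test single bits with integer arithmetic so this works in any POSIX awk.
      if (int(flag / 4) % 2)        unmapped++
      else if (int(flag / 256) % 2) secondary++
      else                          primary++
    }
    END {
      printf "primary=%d secondary=%d unmapped=%d\n", primary, secondary, unmapped
    }'
}
```

Typical use would be `samtools view accepted_hits.bam | count_flags`, assuming the alignments are in TopHat's default accepted_hits.bam output.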

On the other hand, many reads (30% of the total) were saved into Unmapped.bam.
Since we did not know why so many reads were saved in Unmapped.bam, we investigated what kinds of reads it contained.
We found that many kinds of mouse repetitive sequences, such as transposable elements, ribosomal proteins, and ribosomal RNAs (rRNAs), were in Unmapped.bam.

From this result, we have three questions regarding the "-M" option and the Unmapped.bam file.

---------------------------------------
Q1. Is the -M option automatically enabled when the -G option is used?

The TopHat manual says that the "-M/--prefilter-multihits" option must be used with the -G/--GTF option, as follows.
------------
(The following options in this section are only used when the transcriptome search was activated with -G/--GTF and/or --transcriptome-index)

http://tophat.cbcb.umd.edu/manual.html

------------

However, we used the -G option but not the -M option.
Even so, repetitive sequences (i.e., multihit reads) were saved into the Unmapped.bam file without the -M option.
Is the "-M" option automatically enabled when the -G option is used?

---------------------------------------
Q2. Are the filtered reads dumped into the Unmapped.bam file if the -M option is used?

We are wondering why so many reads were dumped into the Unmapped.bam file.
Are multihit reads saved into the Unmapped.bam file if the "-M" option is used?

------------
Q3. How can we distinguish between multihit reads and unmapped reads in the Unmapped.bam file?

What kinds of reads are dumped into the Unmapped.bam file?
If the sample was contaminated with bacteria, will the unmapped bacterial reads be saved in the same Unmapped.bam file?
If so, how can we distinguish between multihit reads and unmapped bacterial reads in the Unmapped.bam file?
------------

Thank you for your cooperation.

This clarification will help many in this forum. Awaiting a response myself.
Thanks,

**swaraj** · 05-24-2013, 06:20 AM

1. I have noticed that disabling the -M option allows more reads to map, especially in repeat-rich regions. I do not think -M is automatically enabled with -G, at least in TopHat 2.0.8b, which I am using.

2. Multihit reads can be counted with a combination of samtools and Unix tools. The following command should work; it lists records that appear more than once with the same read name, reference, and sequence.

samtools view file.bam|cut -f 1,3,10|sort|uniq -d|sort -nr > multi.out

3. You can look up the meaning of the flags (column 2 of each SAM/BAM record) at http://picard.sourceforge.net/explain-flags.html
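As an alternative to the cut/sort/uniq approach in point 2, here is a minimal sketch that uses the NH tag (number of reported alignments for the read) to list multihit read names. It assumes the BAM file carries `NH:i:<n>` tags, which TopHat's accepted_hits.bam normally does; check your own file first. The function name `multihit_names` is hypothetical.

```shell
# Hypothetical helper: print the names of reads whose NH tag exceeds 1.
# Reads SAM records from stdin; ASSUMES the aligner writes NH:i:<n> optional tags.
multihit_names() {
  awk -F'\t' '
    /^@/ { next }                        # skip SAM header lines
    {
      # Optional tags start at column 12 in SAM.
      for (i = 12; i <= NF; i++)
        if ($i ~ /^NH:i:/) {
          n = substr($i, 6) + 0          # numeric value after "NH:i:"
          if (n > 1) print $1            # column 1 is the read name
          break
        }
    }' | sort -u                         # one line per read name
}
```

For example, `samtools view accepted_hits.bam | multihit_names | wc -l` would give the number of distinct multihit read names. Records without an NH tag are silently skipped, so a file lacking the tag yields an empty list.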

----------******-----------

Hope it helps.

TopHat "-M" option and Unmapped.bam file