
**Dario1984** · 12-12-2011, 08:00 PM

What does samtools view -c accepted_hits.sam tell you? Counting lines is the wrong way to do it, because a SAM file also contains header lines that are not alignments.
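To illustrate why a raw line count overstates the number of alignments, here is a minimal sketch in plain Python. The tiny SAM text and the function names are made up for the example; the only fact it relies on is that SAM header lines start with `@`, which a read name cannot.

```python
# Hypothetical miniature SAM file: two header lines plus three alignment records.
SAM_TEXT = "\n".join([
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t16\tchr1\t40\t255\t5M\t*\t0\t0\tTTGCA\tIIIII",
    "read3\t0\tchr1\t90\t255\t5M\t*\t0\t0\tGGCAT\tIIIII",
])


def count_lines(sam_text):
    """Count every line, the way `wc -l` would (headers included)."""
    return len(sam_text.splitlines())


def count_alignment_records(sam_text):
    """Count only alignment records by skipping '@' header lines."""
    return sum(
        1 for line in sam_text.splitlines()
        if line and not line.startswith("@")
    )


print("all lines:", count_lines(SAM_TEXT))
print("alignment records:", count_alignment_records(SAM_TEXT))
```

On this toy input the two numbers differ by exactly the number of header lines. On a real file, letting samtools do the count avoids the problem.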

**biznatch** · 12-12-2011, 08:14 PM

I think how it works is as follows:

The bowtie.left_kept_reads.fixmap.log file indicates how many sequences were able to align when the entire sequence was used, in your case, 72bp. But since the reads are from RNA, a lot of them span introns and won't align. After this initial alignment, Tophat breaks the sequence down into smaller pieces (25bp by default) and tries to align these, which will let it align additional sequences.

If you open up one of the bowtie.left_kept_reads_seg#.fixmap.log files, you should see that the first line, "reads processed", is 2928767, i.e. the reads that failed to align in the first attempt.

**Jon_Keats** · 12-14-2011, 02:56 PM

Run samtools flagstat and Picard MappingStats. What you will notice is a difference in the number of mapped reads: samtools will show more than Picard. This is because samtools counts every reported alignment, while Picard counts each aligned read only once.

What you are seeing is an irritating side effect of how TopHat reports alignments. If a read can map to multiple locations, it will report each possible alignment. Therefore, if you input 10 reads and 7 have "at least one reported alignment", you can end up with 12 alignment events (i.e. 120% mapping, but only 70% of the input reads actually mapped).
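The arithmetic in that example can be checked with a short sketch. The read names and the way the 12 alignments are spread over the 7 reads are invented for illustration; only the totals (10 input reads, 7 mapped, 12 alignment events) come from the post.

```python
from collections import Counter

INPUT_READS = 10  # reads given to the aligner (from the example above)

# Hypothetical alignment list, one entry per reported alignment.
# Three reads are multi-mapped, four map exactly once: 12 entries in total.
alignments = [
    "r1", "r1", "r1",
    "r2", "r2",
    "r3", "r3", "r3",
    "r4", "r5", "r6", "r7",
]

per_read = Counter(alignments)

alignment_events = sum(per_read.values())  # every reported alignment
mapped_reads = len(per_read)               # reads with at least one alignment

pct_by_alignments = 100 * alignment_events / INPUT_READS
pct_by_reads = 100 * mapped_reads / INPUT_READS

print("alignment events:", alignment_events, "->", pct_by_alignments, "%")
print("mapped reads:", mapped_reads, "->", pct_by_reads, "%")
```

Dividing alignment events by input reads gives the impossible-looking 120%, while dividing distinct read names by input reads gives the 70% that most people mean by "mapping percentage".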

**biznatch** · 12-14-2011, 06:49 PM

I got this command: samtools view accepted_hits.bam | cut -f 1 | sort | uniq | wc -l

from here: http://vallandingham.me/RNA_seq_diff...xpression.html

It's supposed to count the unique read names, and therefore how many unique reads there are in the BAM file.

**jbrwn** · 12-21-2011, 05:20 PM

Originally posted by biznatch:

I got this command: samtools view accepted_hits.bam | cut -f 1 | sort | uniq | wc -l

from here: http://vallandingham.me/RNA_seq_diff...xpression.html

It's supposed to count the unique read names, and therefore how many unique reads there are in the BAM file.

Right, and for the total read count, count the lines of your FASTQ file and divide by 4.
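A minimal sketch of the divide-by-4 rule, assuming the common layout where each FASTQ record is exactly four lines (name, sequence, `+`, qualities) with no line wrapping. The sample text and the function name are made up for the example.

```python
# Hypothetical two-read FASTQ, four lines per record.
FASTQ_TEXT = (
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTGACCAA\n+\nIIIIIIII\n"
)


def count_fastq_reads(fastq_text):
    """Return the number of reads, assuming strict four-line records."""
    lines = fastq_text.splitlines()
    if len(lines) % 4 != 0:
        # A remainder suggests a truncated file or wrapped sequence lines.
        raise ValueError("line count is not a multiple of 4")
    return len(lines) // 4


print("total reads:", count_fastq_reads(FASTQ_TEXT))
```

Counting lines that start with `@` instead is not safe, because a quality string can also begin with `@`.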

**Dario1984** · 12-21-2011, 06:00 PM

That command is wrong. What you're extracting is the first column from the BAM file which is a read ID, like GAPC:2:86:13315:7719#0

Every read has its own ID, so you're not getting unique reads but just all of the reads you had before.

**jbrwn** · 12-21-2011, 06:03 PM

Originally posted by Dario1984:

That command is wrong. What you're extracting is the first column from the BAM file which is a read ID, like GAPC:2:86:13315:7719#0

Every read has its own ID, so you're not getting unique reads but just all of the reads you had before.

You're right that each read has a unique name, but a single read will sometimes map multiple times, giving you multiple entries in the BAM with the same read name.

**Dario1984** · 12-21-2011, 07:00 PM

Ah, good point. Thanks.

**nrockweiler** · 01-13-2012, 12:31 PM

Reads with the same ID will sometimes map multiple times, giving you multiple entries in the BAM with the same read name.

Correct, but a uniq on the list will give you a set of read IDs that mapped at least 1 time. This list will not include reads that did not align. This is because the BAM file from tophat (v1.3.2) only reports aligned reads.

However, the command does not work for paired-end data. This is because the read1/read2 suffix is stripped from the read ID in the BAM. For example, if the read ID is 'GAPC:2:86:13315:7719#0/1' for read 1 and 'GAPC:2:86:13315:7719#0/2' for read 2 in the fastq files, the read ID in the BAM file will be 'GAPC:2:86:13315:7719#0' for both. Thus, a uniq on the read IDs will erroneously disregard the read1/read2 information and count the two mates as one read.
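One possible workaround for paired-end data, sketched below, is to take the mate identity from the SAM FLAG field instead of the read name: the SAM format defines bit 0x40 as "first in pair" and 0x80 as "second in pair". This assumes the aligner actually sets those bits in its output, which is worth checking for your TopHat version. The sample records and function names are hypothetical.

```python
# Hypothetical paired-end records (header-free, as printed by `samtools view`).
# Columns: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL
SAM_LINES = [
    "frag1\t67\tchr1\t10\t3\t5M\t=\t60\t55\tACGTA\tIIIII",    # mate 1
    "frag1\t131\tchr1\t60\t3\t5M\t=\t10\t-55\tTTGCA\tIIIII",  # mate 2
    "frag1\t323\tchr2\t80\t3\t5M\t=\t60\t0\tACGTA\tIIIII",    # mate 1, second hit
    "frag2\t73\tchr1\t200\t255\t5M\t*\t0\t0\tGGCAT\tIIIII",   # mate 1 only
]


def mate_of(flag):
    """Return 1 or 2 from the FLAG bits, or 0 if neither bit is set."""
    if flag & 0x40:
        return 1
    if flag & 0x80:
        return 2
    return 0


def count_unique_mates(sam_lines):
    """Count distinct (read name, mate) pairs among mapped records."""
    seen = set()
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # record is flagged as unmapped
        seen.add((name, mate_of(flag)))
    return len(seen)


unique_names = len({line.split("\t")[0] for line in SAM_LINES})
print("unique read names:", unique_names)
print("unique mapped mates:", count_unique_mates(SAM_LINES))
```

In this toy input a uniq on the names alone sees two fragments, while keying on name plus mate sees three mapped reads, which is the number to compare against the total read count of both fastq files.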

**jbrwn** · 01-13-2012, 12:36 PM

Originally posted by nrockweiler:

However, the command does not work for paired-end data. This is because the read1/read2 suffix is stripped from the read ID in the BAM. For example, if the read ID is 'GAPC:2:86:13315:7719#0/1' for read 1 and 'GAPC:2:86:13315:7719#0/2' for read 2 in the fastq files, the read ID in the BAM file will be 'GAPC:2:86:13315:7719#0' for both. Thus, a uniq on the read IDs will erroneously disregard the read1/read2 information and count the two mates as one read.

It's a good point, but this thread was meant to answer the original poster's question about single-end reads.

**Tuinhof** · 01-08-2013, 04:39 AM

Originally posted by linsson:

Hi all.

I have a question about tophat (v > 1.3.1). I am running tophat with 72 bp single-end reads from an RNA dataset (an Illumina Genome Analyzer was used); I have both a gff3 file and a reference genome.

What was your command line for running single-end reads?
I only have a working paired-end command line.

The only thing I can come up with is:

tophat -o path/to/file -p 6 --library-type fr-firststrand --b2-very-sensitive --no-coverage-search --GTF file.gtf genome_reference single_read.fastq

Thanks in advance!

tophat: mapping percentages