**lintianfeng** · 12-02-2011, 08:39 AM

I have the same question. Does anyone have any clue? Thanks.

**biznatch** · 12-02-2011, 09:28 AM

See the SAM spec for flags, http://samtools.sourceforge.net/, link at the top right. You can also use this to explain them http://picard.sourceforge.net/explain-flags.html.
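If you'd rather decode flags locally, here is a minimal sketch in bash. The bit positions follow the FLAG table in the SAM spec; the function name `explain_flag` and the short labels are my own, not part of samtools or Picard.

```shell
#!/usr/bin/env bash
# Decode a SAM FLAG integer into the names of the bits that are set.
# Bit order follows the SAM specification (0x1, 0x2, 0x4, ... 0x800);
# the label strings themselves are informal.
explain_flag() {
    local flag=$1
    local names=(paired proper_pair unmapped mate_unmapped reverse mate_reverse
                 read1 read2 secondary qc_fail duplicate supplementary)
    local out=()
    local i
    for i in "${!names[@]}"; do
        # Test bit i of the flag
        if (( flag & (1 << i) )); then
            out+=("${names[i]}")
        fi
    done
    # Join the collected labels with commas
    local IFS=,
    echo "${out[*]}"
}

explain_flag 99    # paired,proper_pair,mate_reverse,read1
explain_flag 147   # paired,proper_pair,reverse,read2
```

Flags 99 and 147 are the usual combination for a properly paired forward/reverse pair, so they make a handy sanity check.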

You don't have to use "--library-type fr-unstranded"; that is the default setting (I know it doesn't say that in the manual, but one of the TopHat devs posted that on here somewhere).

I'm not sure about the -g stuff.

**polyatail** · 12-02-2011, 10:57 AM

> Basically, is -g 1 applying to the PAIR or each member individually?

From what I can tell, -g is passed by TopHat directly to bowtie as the -k (maximum number of alignments reported) and -m (suppress all alignments if >n are found) parameters. As each mate is aligned separately, -g 1 is applied to each member individually. Information from one mate is not used to direct the other mate to the correct location.
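To make the per-mate behaviour concrete, here is a toy model of bowtie's -m rule (suppress every alignment for a read that has more than the limit). This is only a sketch of that one rule, not TopHat's code: the function name `filter_by_m` and the three-column input (read name, mate number, number of candidate alignments) are invented for illustration, and it ignores whatever TopHat does in its later stages.

```shell
#!/usr/bin/env bash
# Toy model of an -m style filter applied to each mate independently.
# Hypothetical input format, one mate per line:
#   read_name  mate_number  candidate_alignment_count
filter_by_m() {
    local limit=$1
    awk -v m="$limit" '{
        if ($3 == 0)      status = "unaligned"
        else if ($3 > m)  status = "suppressed"   # more than m hits: drop them all
        else              status = "reported"
        print $1, $2, status
    }'
}

# pairA: mate 1 is unique, mate 2 has four candidate locations
# pairB: both mates are unique
printf '%s\n' 'pairA 1 1' 'pairA 2 4' 'pairB 1 1' 'pairB 2 1' | filter_by_m 1
```

Under this model, with a limit of 1 the unique mate of pairA is still reported while its multi-mapping partner is dropped, because nothing links the two decisions.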

TopHat splits each read into a number of segments (set by --segment-length) that are aligned independently. Interestingly, it appears the values of -k and -m passed to bowtie in these alignments are twice that specified by -g. Anyone know why that is?

> I wish to have the power to resolve expression differences between genes with similar sequences, and I only want to quantify reads as originating from a gene if that read can ultimately only be attributed to one gene.

My sense is that you're going to bias expression quantification with this approach. Just because one copy of the gene has more unique sequences than a few of its paralogs doesn't necessarily mean it is more highly expressed. Cufflinks is pretty good at estimating expression in situations like this--have your -g 20 (default) runs produced bad results? If there are only a handful of paralogs, you might try identifying unique regions from a multiple alignment and going from there.

> And then got my BAM->SAM, and that upped the number of lines to ~14000, out of a possible 20000 (i.e. 10000 from both parts of the mate pairs), ~75% mapping. That's acceptable.

Each SAM line contains exactly one alignment, so a read that aligns to multiple locations, or that aligns to the same position in more than one way (i.e. with different CIGAR strings), will appear on multiple lines. Can you confirm that the number of lines accurately represents the number of aligned reads?

Perhaps compare the results to:

Code:

samtools view accepted_hits.bam | awk '{print $1}' | sort -u | wc -l
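Since both mates of a pair may share one read name, a count of unique names alone can understate the number of aligned mates. Here is a rough sketch that reports alignment lines, distinct names, and distinct mates side by side, using the first/second-in-pair bits (0x40 and 0x80) of the FLAG column. The function name `count_sam_reads` is made up, and the four demo records are fabricated and truncated to two columns purely to show the counting.

```shell
#!/usr/bin/env bash
# Summarise SAM records read from stdin: alignment lines, distinct read names,
# and distinct mates (read name plus the first/second-in-pair bit of the FLAG).
count_sam_reads() {
    awk -F'\t' '
        /^@/ { next }    # skip header lines
        {
            lines++
            # 0x40 = first in pair, 0x80 = second in pair; 0 means neither bit is set
            mate = (int($2 / 64) % 2) ? 1 : ((int($2 / 128) % 2) ? 2 : 0)
            if (!(($1, mate) in mates)) { mates[$1, mate] = 1; n_mates++ }
            if (!($1 in names))         { names[$1] = 1;       n_names++ }
        }
        END { printf "lines=%d names=%d mates=%d\n", lines, n_names, n_mates }
    '
}

# Demo: r1 has both mates aligned once; r2 has mate 1 aligned twice
# (flag 329 = flag 73 plus the secondary bit).
printf 'r1\t99\nr1\t147\nr2\t73\nr2\t329\n' | count_sam_reads
# prints: lines=4 names=2 mates=3
```

On real data the input would come from `samtools view accepted_hits.bam | count_sam_reads`.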

Hope this helps!

Andrew

**lintianfeng** · 12-03-2011, 12:07 PM

Hello, I'm still not very clear about the -g stuff for paired-end reads. For example, if one mate has only one match to the reference genome but the other mate has several matches, will TopHat report the paired-end read when setting -g 1?

Questions about Top-Hat and Paired End Reads