
**honey** · 02-14-2011, 11:06 AM

aligned reads

In fact, I am also looking for a way to view the aligned reads from TopHat output in IGV or any other browser. Any suggestions would be highly appreciated.

**Simon Anders** · 02-15-2011, 06:49 AM

The newest version of IGV can look at SAM files natively, or more precisely, at sorted BAM files.

So, use samtools to convert the accepted_hits.sam file from TopHat to the binary BAM format, sort the BAM file by position and index it. Then, you can look at it with IGV, but only at high zoom levels. To get a coverage overview that you can see while being zoomed out, you need to create a tdf file from the BAM or SAM file using igvtools.

See the IGV documentation, in particular:
- http://www.broadinstitute.org/software/igv/SAM
- http://www.broadinstitute.org/software/igv/igvtools
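The convert, sort, and index steps described above can be sketched as a small Python helper that only builds the samtools command lines. This is a minimal sketch, not a definitive recipe: the function names `bam_prep_commands` and `run_all` and the output file names are made up for illustration, and the argument order assumes a recent samtools (1.x). Older releases from around the time of this thread used `samtools sort in.bam out_prefix` instead of `-o`.

```python
from pathlib import Path
import subprocess


def bam_prep_commands(sam_path):
    """Build the samtools commands for SAM -> sorted, indexed BAM.

    Assumes samtools 1.x syntax; output names are illustrative.
    """
    sam = Path(sam_path)
    bam = sam.with_suffix(".bam")
    sorted_bam = sam.with_suffix(".sorted.bam")
    return [
        # Convert text SAM to binary BAM.
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # Sort the BAM by genomic position.
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        # Index the sorted BAM so a browser can jump to any region.
        ["samtools", "index", str(sorted_bam)],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(bam_prep_commands("accepted_hits.sam"))` would execute the three steps, provided samtools is on the PATH. The sorted BAM and its index are then the files to load in IGV.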

**honey** · 02-16-2011, 07:01 AM

plotting FPKM values

OK. I want to plot FPKM values from an RNA-seq experiment. I have run Cufflinks. The purpose is just to visualize these values on a reference genome. I searched this forum, and these two posts seem quite relevant:

FPKM confidence interval - cufflinks - SEQanswers

http://seqanswers.com/forums/showthread.php?t=4300


Cufflinks transcripts.expr file - SEQanswers

http://seqanswers.com/forums/showthread.php?t=3961&highlight=cufflinks+read+count


and also from Google:

RACKJ - An Example for RNAseq Analysis

http://rackj.sourceforge.net/RNAseqExample/index.html

I am still new to NGS. The confusion I have:
1. Cufflinks reports three FPKM-related values for each tracking ID (transcript): the FPKM itself, Conf lo (lower bound), and Conf hi (upper bound). Which of the three should I use: the FPKM, an average, or Conf hi?
2. Should FPKM values be taken at the transcript, exon, or gene level?
3. How are FPKM values calculated in the case of overlapping exons?

Thanks.

**Simon Anders** · 02-16-2011, 07:08 AM

Do you want to do some statistical analysis, or do you just want to look at your data for now?

In my opinion, it is always a good idea to look at one's data in its raw form before doing any normalization or preprocessing. This is why I advised you to take your SAM files as they are and look at them with IGV.

Once you calculate summary statistics, such as read count per gene, FPKM, or whatever, a genome browser is no longer the right tool, anyway.

So, please clarify what kind of plot you envision, if you say you want to plot FPKM values. A scatter plot? A plot along chromosomes? A histogram?

And for clarification of your questions, maybe read the cufflinks paper.
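For orientation, FPKM is usually defined as fragments per kilobase of transcript per million mapped fragments. The following is a minimal sketch of that textbook definition only. The function name `fpkm` and the numbers in the usage note are made up for illustration. Cufflinks itself estimates abundances with a statistical model that apportions fragments among overlapping isoforms, so its reported values will generally not match this naive calculation.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase per million mapped fragments.

    Illustrative only; not the estimation procedure Cufflinks uses.
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # (fragments / (length_bp / 1e3)) / (total_mapped_fragments / 1e6)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)
```

As a worked case, `fpkm(500, 2000, 20_000_000)` describes 500 fragments on a 2 kb transcript in a library of 20 million mapped fragments: 250 fragments per kilobase, divided by 20 million-fragment units.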

**honey** · 02-16-2011, 10:50 AM

Thanks Simon,

I used the BAM file generated by TopHat to view the raw reads in IGV. IGV can also show me the coverage. I was thinking of plotting the Cufflinks output FPKM values to visualize the differences between two samples, perhaps as a histogram or scatter plot.
Thanks for your help. I am not sure if that is the correct approach (I am still a newbie to sequencing), but I think it may be a good idea to visualize the normalized FPKM values rather than only reading them from the tables.
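A scatter plot of two samples needs the FPKM values paired up by transcript first. Here is a minimal sketch of that pairing step using only the standard library. The column names `tracking_id` and `FPKM` are assumptions about the Cufflinks output table and may differ between versions, and `read_fpkm` and `paired_log2` are hypothetical helper names. The tiny inline tables are invented data, not real results.

```python
import csv
import io
import math


def read_fpkm(handle, id_col="tracking_id", value_col="FPKM"):
    """Read a tab-delimited table into {id: FPKM}. Column names are assumed."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: float(row[value_col]) for row in reader}


def paired_log2(sample_a, sample_b, pseudocount=1.0):
    """Pair up IDs present in both samples as (id, log2(a + pc), log2(b + pc))."""
    shared = sorted(set(sample_a) & set(sample_b))
    return [
        (tid,
         math.log2(sample_a[tid] + pseudocount),
         math.log2(sample_b[tid] + pseudocount))
        for tid in shared
    ]


# Invented example tables standing in for two Cufflinks runs.
demo_a = read_fpkm(io.StringIO("tracking_id\tFPKM\nT1\t3.0\nT2\t0.0\n"))
demo_b = read_fpkm(io.StringIO("tracking_id\tFPKM\nT1\t7.0\nT3\t1.0\n"))
demo_pairs = paired_log2(demo_a, demo_b)
```

The resulting pairs can be handed to any plotting tool as x and y coordinates. The pseudocount keeps transcripts with an FPKM of zero on the log scale.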

**billstevens** · 04-23-2012, 09:37 AM

Hi,

I'm having trouble converting the sorted file to tdf. Whenever I run it in IGVtools, I get this error.

Error: cannot convert files of type '.bam' to TDF format.
Try specifying the file type with the --fileType parameter.

The IGV tools website also states that it only supports these formats:
.wig, .cn, .snp, .igv, .res, and .gct

Is there a new way to do this now?

**billstevens** · 04-23-2012, 01:35 PM

So I figured out that you use the count command in igvtools to turn the sorted BAM file into a tdf file.
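For anyone else hitting the same error, here is a minimal sketch of what the `igvtools count` call looks like when built from Python. It assumes the usual `igvtools count <input> <output> <genome>` argument order. The helper name `igvtools_count_command`, the file names, and the genome ID `hg19` are placeholders for illustration.

```python
def igvtools_count_command(sorted_bam, genome, tdf_path=None):
    """Build the igvtools command that writes a coverage .tdf from a sorted BAM."""
    # Default output name: append .tdf to the input file name.
    out = tdf_path or sorted_bam + ".tdf"
    return ["igvtools", "count", sorted_bam, out, genome]
```

For example, `igvtools_count_command("accepted_hits.sorted.bam", "hg19")` yields a command list that can be passed to `subprocess.run(..., check=True)` if igvtools is on the PATH.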

As a follow-up, for those doing gene expression analysis: is it worrying when the two conditions look very similar in IGV, but cuffdiff says the differences are significant?

Both my coverage files and my generated .tdf files look very much alike between the two conditions.

**sdriscoll** · 04-23-2012, 04:44 PM

it is worrying. i have seen that myself...i'm often puzzled as to why cuffdiff assigns the expressions it assigns. i'm probably puzzled over the 5% that's expected in their false discovery rate but still, it bothers me when the expression values and/or the differential expression results don't make sense when i review the coverages.

how to get coverage from TopHat output?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News