SEQanswers

02-12-2011, 06:46 PM   #1
joseph
Member
 
Location: ca

Join Date: Feb 2008
Posts: 39
how to get coverage from TopHat output?

Hi,
The latest versions of TopHat no longer generate the coverage.wig file, so I used accepted_hits.bam with samtools and bedtools to generate a bedGraph:

Code:
samtools sort accepted_hits.bam accepted_hits.sorted
genomeCoverageBed -bg -ibam accepted_hits.sorted.bam -g my.genome > accepted_hits.bedgraph
But when I looked at it in the UCSC Genome Browser, some of the junctions did not have any reads aligned to them.
Any help would be appreciated.
Joseph
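A minimal sketch of the same two steps wrapped in a shell function (the name `tophat_bedgraph` is hypothetical). It assumes samtools 1.0 or later, where `sort` takes the output file with `-o` instead of the output prefix used in the post above, and it adds bedtools' `-split` flag so that spliced alignments (CIGAR `N` operations) are counted only over their aligned blocks rather than across the whole intron. With `-split`, an intron spanned only by spliced reads is expected to show zero coverage in the bedGraph; TopHat lists the junctions it found, with their read support, separately in junctions.bed.

```shell
# Hypothetical helper: sorted BAM + bedGraph from TopHat's accepted_hits.bam.
# Assumes samtools >= 1.0 and bedtools (genomeCoverageBed) on the PATH.
tophat_bedgraph() {
  local in_bam=${1:-accepted_hits.bam}
  local prefix=${in_bam%.bam}

  # Sort by coordinate; samtools >= 1.0 names the output file with -o.
  samtools sort -o "${prefix}.sorted.bam" "$in_bam" || return 1

  # -bg    : report depth as bedGraph intervals
  # -split : count spliced reads only over their aligned blocks, so the
  #          intron skipped by a CIGAR N is not counted as coverage
  # With -ibam, chromosome lengths come from the BAM header.
  genomeCoverageBed -bg -split -ibam "${prefix}.sorted.bam" > "${prefix}.bedgraph"
}
```

Usage: `tophat_bedgraph accepted_hits.bam` writes accepted_hits.sorted.bam and accepted_hits.bedgraph next to the input.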

02-14-2011, 10:06 AM   #2
honey
Senior Member
 
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151
aligned reads

In fact, I am also looking for a way to view the reads aligned by TopHat in IGV or any other browser. Any suggestions would be highly appreciated.

02-15-2011, 05:49 AM   #3
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

The newest version of IGV can display SAM files natively, or, more precisely, sorted and indexed BAM files.

So, use samtools to convert the accepted_hits.sam file from TopHat to the binary BAM format, sort the BAM file by position and index it. Then, you can look at it with IGV, but only at high zoom levels. To get a coverage overview that you can see while being zoomed out, you need to create a tdf file from the BAM or SAM file using igvtools.
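The steps in the paragraph above can be sketched as one shell function (the name `sam_to_igv` is hypothetical). It assumes samtools 1.0 or later (older releases used `samtools sort in.bam out_prefix`) and igvtools on the PATH; the genome argument is whatever genome identifier your IGV installation uses, such as hg19.

```shell
# Hypothetical helper: SAM -> sorted, indexed BAM -> TDF coverage track.
# Assumes samtools >= 1.0 and igvtools on the PATH.
sam_to_igv() {
  local sam=$1 genome=$2
  local prefix=${sam%.sam}

  # 1. SAM -> BAM (-b: BAM output; -S: SAM input, ignored by newer samtools)
  samtools view -b -S -o "${prefix}.bam" "$sam" || return 1
  # 2. Sort by genomic position
  samtools sort -o "${prefix}.sorted.bam" "${prefix}.bam" || return 1
  # 3. Index; IGV expects the .bai file next to the BAM
  samtools index "${prefix}.sorted.bam" || return 1
  # 4. Coverage track for zoomed-out views
  igvtools count "${prefix}.sorted.bam" "${prefix}.tdf" "$genome"
}
```

Usage: `sam_to_igv accepted_hits.sam hg19`, then load accepted_hits.sorted.bam and accepted_hits.tdf into IGV.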

See the IGV documentation, in particular:
- http://www.broadinstitute.org/software/igv/SAM
- http://www.broadinstitute.org/software/igv/igvtools

02-16-2011, 06:01 AM   #4
honey
Senior Member
 
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151
plotting FPKM values

OK. I want to plot FPKM values from an RNA-seq experiment that I ran through Cufflinks. The purpose is just to visualize these values on a reference genome. I searched this forum and found these two relevant posts:
http://seqanswers.com/forums/showthread.php?t=4300
http://seqanswers.com/forums/showthr...nks+read+count
and also from Google:
http://rackj.sourceforge.net/RNAseqExample/index.html
I am still new to NGS. My questions are:
1. Cufflinks reports three FPKM values for each tracking ID (transcript): Conf lo (lower bound), Conf hi (upper bound) and FPKM itself. Which of the three should I use: the FPKM value (the average) or Conf hi?
2. Should FPKM values be taken at the transcript, exon or gene level?
3. How are FPKM values calculated when exons overlap?
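For reference, the textbook definition of FPKM is fragments per kilobase of feature per million mapped fragments: FPKM = 10^9 × C / (N × L), where C is the number of fragments assigned to the feature, N the total number of mapped fragments and L the feature length in bases. Cufflinks does not simply count reads per transcript: it estimates abundances with a statistical model that assigns fragments among the isoforms they are compatible with (which is how overlapping exons are handled), and Conf lo/Conf hi bound a confidence interval around the FPKM estimate. The hypothetical `fpkm` function below only illustrates the arithmetic of the definition, not Cufflinks' estimator.

```shell
# Hypothetical helper: FPKM from the textbook definition
#   FPKM = 1e9 * fragments / (total_mapped_fragments * length_bp)
# Arguments: fragments  total_mapped_fragments  length_bp
fpkm() {
  awk -v c="$1" -v n="$2" -v len="$3" \
    'BEGIN { printf "%.4f\n", 1e9 * c / (n * len) }'
}
```

For example, `fpkm 500 20000000 1000` (500 fragments on a 1 kb transcript out of 20 million mapped fragments) prints 25.0000.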



Thanks.

02-16-2011, 06:08 AM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Do you want to do some statistical analysis, or do you just want to look at your data for now?

In my opinion, it is always a good idea to look at one's data in its raw form before doing any normalization or preprocessing. This is why I advised you to take your SAM files as they are and look at them with IGV.

Once you calculate summary statistics, such as read counts per gene or FPKM, a genome browser is no longer the right tool anyway.

So, please clarify what kind of plot you envision, if you say you want to plot FPKM values. A scatter plot? A plot along chromosomes? A histogram?

As for your questions, it may help to read the Cufflinks paper.

Last edited by Simon Anders; 02-16-2011 at 06:11 AM.

02-16-2011, 09:50 AM   #6
honey
Senior Member
 
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151

Thanks Simon,

I used the BAM file generated by TopHat to view the raw reads in IGV, and IGV can also show me the coverage. I was thinking of plotting the FPKM values from Cufflinks to visualize the differences between the two samples, perhaps as a histogram or scatter plot.
Thanks for your help. I am not sure whether that is the correct approach (I am still new to sequencing), but I think it may be a good idea to visualize the normalized FPKM values instead of only reading them off the tables.
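One common way to prepare FPKM values from two samples for a scatter plot or histogram is to put them on a log scale first, since FPKM values span several orders of magnitude. The sketch below assumes a hypothetical tab-separated file with a header line and three columns (ID, FPKM in sample 1, FPKM in sample 2); the pseudocount of 1 avoids taking the log of zero and is an arbitrary choice. The function name `log2_fpkm_pairs` is also hypothetical.

```shell
# Hypothetical helper: log2(FPKM + 1) for two samples plus their difference.
# Input: tab-separated, one header line per file,
#        columns = id, fpkm_sample1, fpkm_sample2.
# Output: id, log2 value 1, log2 value 2, difference (sample2 - sample1).
log2_fpkm_pairs() {
  awk 'BEGIN { FS = "\t" }
       FNR > 1 {
         a = log($2 + 1) / log(2)   # log2 with pseudocount 1
         b = log($3 + 1) / log(2)
         printf "%s\t%.3f\t%.3f\t%.3f\n", $1, a, b, b - a
       }' "$@"
}
```

Usage: `log2_fpkm_pairs fpkm_pairs.tsv > log2_fpkm.tsv`; columns 2 and 3 give the scatter plot axes and column 4 the values for a histogram of log2 fold changes, in whatever plotting tool you prefer.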

04-23-2012, 09:37 AM   #7
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120

Hi,

I'm having trouble converting the sorted BAM file to TDF. Whenever I run it through igvtools, I get this error:

Error: cannot convert files of type '.bam' to TDF format.
Try specifying the file type with the --fileType parameter.

The IGV tools website also states that it only supports these formats:
.wig, .cn, .snp, .igv, .res, and .gct

Is there a new way to do this now?

04-23-2012, 01:35 PM   #8
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120

So I figured out that you use the count command to turn it into a TDF file.
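The error quoted in the previous post presumably comes from asking igvtools to convert a BAM file with toTDF, which (per the format list quoted there) only accepts .wig, .cn, .snp, .igv, .res and .gct files; alignment files go through count instead, which computes coverage and writes a TDF file directly. A small, hypothetical helper (`igv_tdf_subcommand`) that picks the subcommand from the file extension:

```shell
# Hypothetical helper: choose the igvtools subcommand for a TDF conversion.
igv_tdf_subcommand() {
  case "$1" in
    *.sam|*.bam) echo count ;;                          # alignments: compute coverage
    *.wig|*.cn|*.snp|*.igv|*.res|*.gct) echo toTDF ;;   # formats listed above
    *) echo "no TDF route known for: $1" >&2; return 1 ;;
  esac
}
```

Usage, assuming hg19 as the genome: `igvtools "$(igv_tdf_subcommand accepted_hits.sorted.bam)" accepted_hits.sorted.bam accepted_hits.tdf hg19`.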

As a follow-up, for those doing gene expression analysis: is it worrying when the two conditions look very similar in IGV, but cuffdiff reports significant differences?

Both my coverage files and my generated .tdf files look very much alike between the two conditions...

04-23-2012, 04:44 PM   #9
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

It is worrying. I have seen that myself; I'm often puzzled as to why cuffdiff assigns the expression values it does. I'm probably puzzling over the 5% of calls that are expected to be false at their false discovery rate, but still, it bothers me when the expression values and/or the differential expression results don't make sense when I review the coverage.

All times are GMT -8.