  • how to get coverage from TopHat output?

    Hi
    The latest versions of TopHat do not generate the coverage.wig file. I used the file accepted_hits.bam with samtools and bedtools to generate a bedGraph.

    Code:
    samtools sort accepted_hits.bam accepted_hits.sorted
    genomeCoverageBed -bg -ibam accepted_hits.sorted.bam -g my.genome > accepted_hits.bedgraph
    But when I looked at it in the UCSC Genome Browser, I saw that some of the junctions did not have reads aligned to them.
    Any kind of help will be appreciated.
    Joseph
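
One thing that may be worth checking with the commands above is how spliced reads are counted. genomeCoverageBed has a -split option that treats a spliced BAM alignment as separate blocks instead of one interval spanning the intron. The sketch below is only an illustration of that idea in plain Python, not a description of what bedtools does internally. The function names, the 0-based start coordinate, and the choice to count deletions (D) as covered are all assumptions made for the example.

```python
import re
from collections import Counter

# Splits a CIGAR string such as "30M200N20M" into (length, operation) pairs.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(start, cigar):
    """Return 0-based, half-open (start, end) blocks covered by one alignment.

    Assumption for this sketch: M, =, X and D count as covered reference
    bases, while N (the intron skipped by a spliced read) closes the current
    block and opens a new one after the gap.
    """
    blocks = []
    pos = start
    block_start = start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=XD":
            pos += length
        elif op == "N":
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += length
            block_start = pos
        # I, S, H and P do not consume reference bases, so they are skipped.
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks


def bedgraph(chrom, alignments):
    """Build bedGraph-like rows (chrom, start, end, depth) with depth > 0.

    alignments is an iterable of (start, cigar) pairs on a single chromosome.
    """
    delta = Counter()
    for start, cigar in alignments:
        for block_start, block_end in aligned_blocks(start, cigar):
            delta[block_start] += 1
            delta[block_end] -= 1
    rows = []
    depth = 0
    prev = None
    # Positions whose net change is zero are skipped so that neighbouring
    # rows always differ in depth.
    for pos in sorted(p for p, d in delta.items() if d != 0):
        if depth > 0:
            rows.append((chrom, prev, pos, depth))
        depth += delta[pos]
        prev = pos
    return rows


if __name__ == "__main__":
    # One unspliced read and one read spliced across a 200 bp intron.
    for row in bedgraph("chr1", [(100, "50M"), (120, "30M200N20M")]):
        print("\t".join(str(field) for field in row))
```

In this toy example the intron between positions 150 and 350 receives no coverage, which is the behaviour one would expect from split-aware counting.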

  • #2
    aligned reads

    In fact, I am also looking for a way to view the aligned reads from the TopHat output in IGV or any other browser. Any suggestions will be highly appreciated.

    • #3
      The newest version of IGV can look at SAM files natively, or more precisely, at sorted BAM files.

      So, use samtools to convert the accepted_hits.sam file from TopHat to the binary BAM format, sort the BAM file by position and index it. Then, you can look at it with IGV, but only at high zoom levels. To get a coverage overview that you can see while being zoomed out, you need to create a tdf file from the BAM or SAM file using igvtools.

      See the IGV documentation, in particular:
      - http://www.broadinstitute.org/software/igv/SAM
      - http://www.broadinstitute.org/software/igv/igvtools
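
The convert, sort, and index steps described above can be sketched as a small Python helper that builds the samtools command lines. This is a minimal sketch, not an official recipe: the function name and output file naming are made up for illustration, and the `sort -o` form assumes samtools 1.x (older releases used `samtools sort in.bam out_prefix`, as in the first post).

```python
from pathlib import Path


def igv_prep_commands(sam_path):
    """Return the samtools commands that turn a SAM file into a sorted,
    indexed BAM file that IGV can load.

    The output names (<stem>.bam, <stem>.sorted.bam) are an assumption of
    this sketch; any names would do.
    """
    sam = Path(sam_path)
    bam = sam.with_name(sam.stem + ".bam")
    sorted_bam = sam.with_name(sam.stem + ".sorted.bam")
    return [
        # SAM -> BAM (binary) conversion
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # sort by position (samtools 1.x syntax)
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        # write the .bai index next to the sorted BAM
        ["samtools", "index", str(sorted_bam)],
    ]


if __name__ == "__main__":
    for command in igv_prep_commands("accepted_hits.sam"):
        print(" ".join(command))
    # To actually run the steps, something like
    #   subprocess.run(command, check=True)
    # could be used for each command, provided samtools is installed.
```

Returning the commands as lists rather than running them directly makes it easy to inspect them first or to adapt them to an older samtools release.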

      • #4
        plotting FPKM values

        OK. I want to plot FPKM values from an RNA-seq experiment. I have run Cufflinks. The purpose is just to visualize these values on a reference genome. I searched this forum and these two posts are quite important:
        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

        and also from Google:

        I am still new to NGS. The confusion I have:
        1. Cufflinks reports three values for each tracking ID (transcript): FPKM, conf_lo (lower bound), and conf_hi (upper bound). Which of these should be used: the FPKM value, an average, or conf_hi?
        2. Should FPKM values be taken at the transcript, exon, or gene level?
        3. How are FPKM values calculated for overlapping exons?
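
For orientation, the basic definition of FPKM (fragments per kilobase of transcript per million mapped fragments) can be written as a short function. This is only the textbook formula; Cufflinks estimates how fragments are assigned to overlapping transcripts statistically, so its reported values will generally not match a naive count-based calculation. As far as I understand the Cufflinks output, the FPKM column is the point estimate and the two confidence columns bound an interval around it. The function and parameter names below are illustrative.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments: fragments assigned to the feature
    length_bp: feature length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million fragments)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # 100 fragments on a 1 kb transcript in a sample of 1 million fragments
    print(fpkm(100, 1000, 1_000_000))
```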



        Thanks.

        • #5
          Do you want to do some statistical analysis, or do you just want to look at your data for now?

          In my opinion, it is always a good idea to look at one's data in its raw form before doing any normalization or preprocessing. This is why I advised you to take your SAM files as they are and look at them with IGV.

          Once you calculate summary statistics, such as read count per gene, FPKM, or whatever, a genome browser is no longer the right tool, anyway.

          So, please clarify what kind of plot you envision, if you say you want to plot FPKM values. A scatter plot? A plot along chromosomes? A histogram?

          And for clarification of your questions, maybe read the cufflinks paper.
          Last edited by Simon Anders; 02-16-2011, 07:11 AM.

          • #6
            Thanks Simon,

            I used the BAM file generated by TopHat to view the raw reads in IGV. IGV can also show me the coverage. I was thinking of plotting the FPKM values from the Cufflinks output to visualize the differences between two samples, perhaps as a histogram or scatter plot.
            Thanks for your help. I am not sure if that is the correct approach (I am still a newbie to sequencing), but I think it may be a good idea to visualize the normalized FPKM values instead of relying only on what the tables report.
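
A scatter plot of two samples needs one pair of values per transcript or gene. The sketch below shows one way to prepare such pairs from two tab-delimited Cufflinks tracking tables. It assumes the tables have a header row with `tracking_id` and `FPKM` columns; the function names, the pseudocount of 1, and the tiny in-memory tables are made up for illustration.

```python
import csv
import io
import math


def read_fpkm(handle):
    """Map tracking_id -> FPKM from a tab-delimited table with a header row.

    Assumes columns named 'tracking_id' and 'FPKM' are present.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tracking_id"]: float(row["FPKM"]) for row in reader}


def paired_log2_fpkm(sample_a, sample_b, pseudocount=1.0):
    """Return (id, log2(a + pseudocount), log2(b + pseudocount)) for every
    tracking ID present in both samples, sorted by ID.

    The pseudocount avoids log2(0) for features with zero FPKM.
    """
    shared = sorted(sample_a.keys() & sample_b.keys())
    return [
        (
            tid,
            math.log2(sample_a[tid] + pseudocount),
            math.log2(sample_b[tid] + pseudocount),
        )
        for tid in shared
    ]


if __name__ == "__main__":
    # Tiny made-up tables standing in for two tracking files.
    table_a = io.StringIO("tracking_id\tFPKM\ng1\t3.0\ng2\t0\n")
    table_b = io.StringIO("tracking_id\tFPKM\ng1\t7\ng3\t1\n")
    pairs = paired_log2_fpkm(read_fpkm(table_a), read_fpkm(table_b))
    for tid, x, y in pairs:
        print(tid, x, y)
```

The resulting pairs can be handed to any plotting library as x and y coordinates; for real data, open the two tracking files with `open(path, newline="")` instead of the in-memory tables.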

            • #7
              Hi,

              I'm having trouble converting the sorted file to tdf. Whenever I run it in IGVtools, I get this error.

              Error: cannot convert files of type '.bam' to TDF format.
              Try specifying the file type with the --fileType parameter.

              The IGV tools website also states that it only supports these formats:
              .wig, .cn, .snp, .igv, .res, and .gct

              Is there a new way to do this now?

              • #8
                So I figured out that you use the igvtools count command to turn the sorted BAM file into a tdf file.

                As a follow-up, for those doing gene expression analysis, is it worrying when the two conditions look very similar on IGV, but cuffdiff says they are significant?

                Both my coverage files, and my generated .tdf files look so alike between the two conditions....
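
For the TDF conversion mentioned at the start of this post, the igvtools count call takes an input file, an output file, and a genome. A minimal sketch of building that command line in Python follows. The helper name and the output naming are made up, and "hg19" is only a placeholder for whatever genome ID or chromosome sizes file matches the alignment.

```python
def igvtools_count_command(sorted_bam, genome, window_size=None):
    """Build an 'igvtools count' command that writes <sorted_bam>.tdf.

    genome is a genome ID known to IGV or a path to a chromosome sizes file.
    window_size, if given, is passed with -w (the averaging window in bp).
    """
    command = ["igvtools", "count"]
    if window_size is not None:
        command += ["-w", str(window_size)]
    command += [sorted_bam, sorted_bam + ".tdf", genome]
    return command


if __name__ == "__main__":
    # "hg19" is a placeholder genome ID for this example.
    print(" ".join(igvtools_count_command("accepted_hits.sorted.bam", "hg19")))
```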

                • #9
                  it is worrying. i have seen that myself...i'm often puzzled as to why cuffdiff assigns the expressions it assigns. i'm probably puzzled over the 5% that's expected in their false discovery rate but still, it bothers me when the expression values and/or the differential expression results don't make sense when i review the coverages.
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */
