
  • How to identify the number of fragments produced for a given gene in RNA-seq?

    Hi all,

    I have RNA-seq data. To calculate an FPKM value manually, we need to know the number of fragments generated during RNA-seq for a given gene, am I right? Could anyone tell me where to look to get the number of fragments produced for a given gene or any other feature, such as a CDS?

    I followed one procedure; please let me know if my way of counting reads is wrong.

    First I opened the BAM files in IGV, then I counted the number of fragments for a given gene. Did I do this right? Also, please tell me whether this procedure is right for paired-end sequencing: do I have to count the left and right reads separately, or combine them into one fragment?

  • #2
    Hi Muthukumar,

    In principle, you are right. A fragment is represented by a read (single-end) or a read pair (paired-end). Unfortunately, each read or read pair can map to several positions on your annotation, which causes some ambiguity.
    Therefore, there are many ways to count the reads or read pairs for a given gene, transcript, or feature. You may start with the Tuxedo suite pipeline http://www.ncbi.nlm.nih.gov/pubmed/22383036. Other methods include Salmon, featureCounts, RSEM, and many more.

    Cheers,

    Michael
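
To illustrate the point that a fragment corresponds to one read pair, here is a minimal Python sketch that counts each pair once by grouping mates on their shared read name. The input format (a list of `(read_name, gene_id)` tuples), the function name `count_fragments`, and the rule of skipping fragments whose mates hit different genes are all assumptions made for this toy example; real counting tools handle multi-mapping, strandedness, and overlap modes in more detail.

```python
from collections import defaultdict


def count_fragments(alignments):
    """Count fragments per gene, counting each read pair only once.

    `alignments` is an iterable of (read_name, gene_id) tuples, one per
    aligned mate. Both mates of a pair share the same read name.
    Assumption for this sketch: a fragment whose mates are assigned to
    more than one gene is treated as ambiguous and is not counted.
    """
    genes_by_fragment = defaultdict(set)
    for read_name, gene_id in alignments:
        genes_by_fragment[read_name].add(gene_id)

    counts = defaultdict(int)
    for genes in genes_by_fragment.values():
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
    return dict(counts)


# Hypothetical toy data, not taken from any real BAM file.
toy = [
    ("frag1", "geneA"), ("frag1", "geneA"),  # both mates in geneA: 1 fragment
    ("frag2", "geneA"),                      # only one mate aligned: 1 fragment
    ("frag3", "geneA"), ("frag3", "geneB"),  # mates disagree: skipped
    ("frag4", "geneB"), ("frag4", "geneB"),  # both mates in geneB: 1 fragment
]
print(count_fragments(toy))  # {'geneA': 2, 'geneB': 1}
```

Counting the two mates of `frag1` separately would give geneA three "fragments" instead of two, which is the kind of double counting that inflates a manual FPKM.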



    • #3
      @Muthukumar: You don't want to do this by hand. Software packages such as featureCounts and htseq-count do this for one or more aligned BAM files. Both packages require a genomic feature definition file (GFF/GTF). If you are using a model organism, these files are easy to find. Make sure you use one that matches the genome build used for your alignments.



      • #4
        Thank you for answering the question. I am already following the Nature Protocols paper that you mentioned. When I ran cufflinks and cuffdiff, I got a column labelled FPKM. I want to check whether my manually calculated FPKM and the cuffdiff-generated FPKM are the same. Unfortunately, I am not getting exactly the same answer.

        Here is the procedure that I followed to calculate FPKM.

        1. I counted the reads for a specific gene using IGV.
        (Here I want to clarify one doubt. I am using paired-end sequencing data, and for some genes the left and right reads overlap. Should I count such a pair as 2 reads or as 1 read?)

        for instance:

        --------------> (read 1) <-------------- (read L)
        ------------------->(read R)
        ______________Exon1_______|___________________________|____Exon2___________

        For exon 2, should I combine the 2 reads into 1 fragment or count them separately? Please clarify. For my manual calculation I counted them as 2 fragments.

        2. I calculated FPKM using the following formula:

        FPKM = (# of fragments * 10^9) / (length of gene * total no. of reads)

        Is the above formula right?

        One more doubt: my gene of interest contains 17 exons. Not all 17 exons have read fragments, and for a given exon some of the fragments are short and some are long. Should I count the short reads as well? Please correct me.



        • #5
          Actually, the FPKM values are expected to differ, since Cuffdiff applies some extra heuristics.
          As GenoMax posted, you should not count the reads manually; use an accepted tool instead.
          FPKM is usually computed at the transcript level, with the summed length of the transcript's exons as part of the denominator.
          Read length is something that should be controlled for at the alignment step. Therefore, if your aligner reported a read or read pair as mapping there, I would take it as valid. If you doubt it, you might re-align your data with length-filtered reads (e.g. using bbduk.sh from BBMap).
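
For a rough sanity check of the arithmetic only, the plain FPKM formula with exonic length in the denominator can be sketched in Python as below. The function names, the exon coordinates, and the fragment numbers are made up for illustration, and the total is assumed to be the number of mapped fragments in the library. Since Cuffdiff applies its own extra heuristics, this simple calculation is not expected to reproduce its FPKM column exactly.

```python
def merged_exon_length(exons):
    """Total length in bp of the union of exon intervals.

    `exons` is an iterable of (start, end) pairs, 1-based and inclusive.
    Overlapping or adjacent exons are merged so no base is counted twice.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(exons):
        if current_end is None or start > current_end + 1:
            # Close the previous merged block (if any) and start a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def fpkm(fragments, exonic_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    return fragments * 1e9 / (exonic_length_bp * total_mapped_fragments)


# Hypothetical gene with three exons, two of which overlap.
exons = [(100, 599), (1000, 1499), (1400, 1999)]
length = merged_exon_length(exons)
print(length)                         # 1500
print(fpkm(300, length, 20_000_000))  # 10.0
```

Using the full genomic span of the gene (introns included) instead of the merged exon length would give a much smaller value, which is one common reason a manual FPKM disagrees with tool output.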
