  • Cufflinks reporting different FPKMs for the same gene

    Hello,

    I am analyzing some bacterial RNA-seq data with Cufflinks. Since splicing isn't an issue in bacterial RNA-seq, I am using a mapping program that does not take splicing into account. I just want to get FPKMs for my genes and do some differential expression analysis, even though I am aware that Cufflinks was created for eukaryotic RNA-seq. The FAQ states that Cufflinks will work with bacterial RNA-seq provided that I map against a FASTA file of already annotated genes. I know Cufflinks assembles transcripts, but when I feed it my SAM file (generated with the mapping program PerM by mapping reads to a multifasta file of the genes in the genome), Cufflinks reports multiple locations within one gene in separate rows, each with its own FPKM. I just want one FPKM per gene. Does anyone know a way to resolve this issue? Cheers.

  • #2
    Could it be multiple isoforms of the same gene?


    • #3
      Hey Nicolas,

      That's what I thought it might be. However, I looked at the read pileups in IGV, and Cufflinks is just assembling different transcripts de novo for the same gene based on where reads cluster. For example, for one gene that I am mapping reads to, instead of calculating a single FPKM from all reads hitting that gene, Cufflinks splits the gene into thirds based on where reads pile up, calculates a separate FPKM for each of the three regions, and reports them as different "genes" (transcripts). So rather than different isoforms, it looks like it is just splitting up genes based on where reads fall. I am also using a mapping program that is unaware of splicing. I am trying my luck with a few other programs to compare.


      • #4
        Could you post the command you're using?
        Which Cufflinks mode are you using: de novo (the default), with a reference annotation (-G), or RABT (-g)?
        Is your gene completely covered by reads? If not (and if you're using de novo mode), then Cufflinks has no information indicating that the 3 regions actually belong to one single gene...

        Please provide more info.


        • #5
          Hello Nicolas, my command is below:

          cufflinks -N -u seq1_380-380_r1_out.sorted.sam

          It is in default mode, I think. I aligned my reads to a multifasta file of annotated genes in the hope that this would be sufficient for Cufflinks to assign reads only to these genes, but I was wrong: Cufflinks assembled transcripts because no GTF was supplied. I have been searching for how to obtain or generate a reference GTF file for my bacterium, but I cannot seem to find anything. That would probably fix my problem. Do you know how I might generate one from an annotated reference genome in FASTA format? Thank you for your questions! I am still quite new to this.
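
One possible way to get such a GTF, sketched here under assumptions the thread does not confirm: if the multifasta has one entry per gene and reads were mapped to those entries, each entry can be described as a single exon spanning the whole sequence. The function names, the `multifasta` source label, and the `+` strand (entries assumed to be in the sense orientation) are illustrative choices, and whether Cufflinks handles such a per-gene reference well in `-G` mode would need to be checked.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length) for each record in an open FASTA text handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a sequence name")
            # Keep only the first word: aligners usually use it as the reference name.
            name, length = fields[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def gtf_lines(records, source="multifasta"):
    """Yield one GTF 'exon' line per (name, length) record.

    Assumption: each FASTA entry is one complete gene on the forward strand.
    """
    for name, length in records:
        attrs = f'gene_id "{name}"; transcript_id "{name}";'
        yield "\t".join(
            [name, source, "exon", "1", str(length), ".", "+", ".", attrs]
        )


# Made-up two-gene multifasta, used only to show the output shape.
example_fasta = ">geneA some description\nACGT\nAC\n>geneB\nGGG\n"
lines = list(gtf_lines(fasta_lengths(io.StringIO(example_fasta))))
for gtf_line in lines:
    print(gtf_line)
```

With a real file, the `io.StringIO` example would be replaced by `open("genes.fasta")` (a hypothetical filename) and the lines written to a `.gtf` file.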


          • #6
            Does your multifasta file contain one entry per gene?
            If so, it should be easy to count the number of reads mapping to each entry (with samtools idxstats <aln.bam>, for instance). You can then normalize by exon size (here, the gene length) and library depth to get something similar to FPKM.
            I don't think Cufflinks could do what you want, but I am also not sure you really need it!
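
The normalization suggested above can be sketched roughly as follows. This is a minimal illustration, not a replacement for Cufflinks: it assumes the tab-separated `samtools idxstats` layout (reference name, sequence length, mapped count, unmapped count), uses the sum of mapped counts as the library depth, and the function name and example numbers are made up. Note that idxstats counts reads rather than fragments, so for paired-end data the result is closer to RPKM.

```python
import io


def fpkm_from_idxstats(lines):
    """Compute FPKM-like values from `samtools idxstats` output lines.

    value = mapped_reads * 1e9 / (gene_length * total_mapped_reads)
    """
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, length, mapped, _unmapped = line.split("\t")
        # Skip the "*" summary row and any zero-length reference.
        if name == "*" or int(length) == 0:
            continue
        counts[name] = (int(length), int(mapped))

    total_mapped = sum(mapped for _, mapped in counts.values())
    if total_mapped == 0:
        return {name: 0.0 for name in counts}

    return {
        name: mapped * 1e9 / (length * total_mapped)
        for name, (length, mapped) in counts.items()
    }


# Made-up idxstats output for two genes, used only as an example.
example = "geneA\t1000\t500\t0\ngeneB\t2000\t500\t0\n*\t0\t0\t10\n"
fpkm = fpkm_from_idxstats(io.StringIO(example))
for gene, value in fpkm.items():
    print(f"{gene}\t{value:.1f}")
```

In practice the input could come from `samtools idxstats aln.bam > counts.txt` (on a sorted, indexed BAM) and then `fpkm_from_idxstats(open("counts.txt"))`, where `counts.txt` is a hypothetical filename.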


            • #7
              Thank you for the replies, Nicolas. Much appreciated. Good luck with everything.
