  • Transcriptome analysis using TopHat, Cufflinks, and Cuffcompare

    I have just started exploring transcriptomics with NGS. I read the paper "Whole Transcriptome Sequencing Reveals Gene Expression and Splicing Differences in Brain Regions Affected by Alzheimer's Disease" and tried to replicate its temporal lobe results using TopHat v1.2.0 and Cufflinks v0.9.3.

    Normal temporal lobe sample: SRR085471.fastq
    Alzheimer's disease (AD) temporal lobe sample: SRR085473.fastq

    I followed these steps:

    1. Filtered reads with the FASTX-Toolkit, keeping reads in which at least 65% of bases have a quality score of 25 or higher.
    2. Edited the reference GTF file to remove all non-coding entries.

    3. tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085471.fastq (normal temporal lobe sample).

    4. cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_normal.bam (normal temporal lobe FPKM calculation)

    5. tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085473.fastq (Alzheimer temporal lobe sample).

    6. cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_ad.bam (Alzheimer temporal lobe FPKM calculation)

    7. cuffcompare -o normal_ad_comapare -r ../human_gtf/human_protein_coding.gtf -R -s ../../human_chr/ transcripts_normal.gtf transcripts_ad.gtf

    8. cuffdiff -o normal_ad_filt_cuffdiff -p 4 -N -r ../../hg19.fa ../normal_ad_comapare.combined.gtf ../accepted_hits_norm_filt.bam ../accepted_hits_ad.bam
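    The read filter in step 1 can be sketched in Python to make the criterion explicit. This is a minimal illustration, not the FASTX-Toolkit's own code: it assumes the rule is the one `fastq_quality_filter -q 25 -p 65` applies (keep a read if at least 65% of its bases have quality 25 or more) and assumes Phred+33 quality encoding (pass `offset=64` for older Illumina-encoded files). The function names are hypothetical.

```python
def passes_quality_filter(quality: str, min_q: int = 25,
                          min_percent: float = 65.0, offset: int = 33) -> bool:
    """True if at least min_percent% of bases have Phred quality >= min_q."""
    if not quality:
        return False
    # Decode ASCII quality characters into Phred scores (offset is an assumption).
    scores = [ord(ch) - offset for ch in quality]
    good = sum(1 for q in scores if q >= min_q)
    return 100.0 * good / len(scores) >= min_percent


def filter_fastq(lines):
    """Yield 4-line FASTQ records (header, seq, plus, qual) that pass the filter.

    A trailing incomplete record is silently dropped by zip().
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        if passes_quality_filter(qual.rstrip("\n")):
            yield header, seq, plus, qual


records = ["@r1\n", "ACGT\n", "+\n", "III#\n",   # 3 of 4 bases are Q40 -> 75%, kept
           "@r2\n", "ACGT\n", "+\n", "II##\n"]   # 2 of 4 bases are Q40 -> 50%, dropped
for rec in filter_fastq(records):
    print(rec[0].strip())  # @r1
```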

    Unfortunately, I am not able to replicate the FPKM values reported for the top 10 up- and down-regulated genes in the temporal lobe sample.
    My questions are:

    1. Is the protocol I followed correct?

    2. Should I use -N (quartile normalization) for assembly with Cufflinks? Does it affect the final outcome?

    3. I am getting very high FPKM values. What range should FPKM values fall within?

    4. When looking at differential gene expression, should one take the FPKM values for both samples from the gene.expr files produced by Cufflinks and calculate the fold change? Or should one use the gene_exp.diff output from Cuffdiff?

    Apologies for the lengthy post!
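    On questions 3 and 4: FPKM (fragments per kilobase of transcript per million mapped fragments) has no fixed range. Short, highly covered transcripts can reach values in the thousands, and every value scales with the normalizing denominator, so absolute FPKMs are only comparable between runs that use the same normalization. A minimal sketch of the definition and of a naive fold change follows; the function names and the pseudocount are illustrative assumptions. A bare ratio like this is not the same as Cuffdiff's output, which also includes a statistical test of the difference.

```python
import math


def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def log2_fold_change(fpkm_a: float, fpkm_b: float, pseudocount: float = 0.0) -> float:
    """log2(B / A). A pseudocount > 0 avoids dividing by zero for unexpressed genes."""
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))


# Hypothetical numbers: 500 fragments on a 2 kb transcript, 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))    # 25.0
print(log2_fold_change(25.0, 100.0))  # 2.0 (a 4-fold increase)
```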

  • #2
    I assume the authors used older versions of these programs. Check whether there have been major changes in Cufflinks, Cuffdiff, or TopHat since then.



    • #3
      Cufflinks and Cuffdiff

      For sure they used an older version, but the newer releases mainly improve efficiency and automation; they should not flip the results. I would email the authors (they also run genomics cores and work in bioinformatics) to get a sense of what they think.
      I must say this is one of the few published papers that used Cufflinks for RNA-seq.



      • #4
        With Cufflinks, if I use the -N (quartile normalization) option, the FPKM values for a gene are very high; without -N, the values drop drastically.

        Example:

        Gene   FPKM      Normalization
        APOE   4085.26   with -N
        APOE   743.98    without -N

        Using -N improves the robustness of differential expression calls for less abundant genes, but should the difference be this large? I am inclined to accept the values obtained without -N.

        Comments are appreciated!
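        One way to see why -N can raise every FPKM by a similar factor is a toy model in which only the sample-wide denominator changes. This sketch does not reproduce Cufflinks' actual quartile-normalization code: `trimmed_denominator` is a hypothetical stand-in that ignores the top 25% most expressed loci, and the counts are made up. It shows that replacing one denominator with a smaller one multiplies every FPKM in the sample by the same factor; the size of that factor depends on the data (for APOE above it is about 5.5).

```python
def fpkms(counts, lengths_bp, denominator):
    """FPKM for each locus given one sample-wide normalizing denominator."""
    return [c * 1e9 / (length * denominator) for c, length in zip(counts, lengths_bp)]


def total_denominator(counts):
    """Normalize by all mapped fragments."""
    return sum(counts)


def trimmed_denominator(counts, keep_fraction=0.75):
    """Hypothetical stand-in: ignore the top (1 - keep_fraction) most expressed loci."""
    ranked = sorted(counts)
    n_keep = int(len(ranked) * keep_fraction)
    return sum(ranked[:n_keep])


# Made-up counts for 8 loci of equal length; one locus is very highly expressed.
counts = [10, 20, 30, 40, 50, 60, 70, 8000]
lengths = [1000] * len(counts)

plain = fpkms(counts, lengths, total_denominator(counts))      # denominator 8280
trimmed = fpkms(counts, lengths, trimmed_denominator(counts))  # denominator 210
ratios = [t / p for t, p in zip(trimmed, plain)]
print(ratios[0])  # the same factor for every locus, about 39.4 (8280 / 210)
```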



        • #5
          Hi harshinamdar,

          Originally posted by harshinamdar:
          "For cufflinks if I use the -N/--quartile normalization, the FPKM values for a gene are very high and if I don't use -N option the values come down drastically."
          I've noticed the same. I think it is because the normalization ignores the top 25% most highly expressed genes, so the raw expression of each transcript is divided by a much smaller total count than in the non-normalized case.

          "Using -N option improves robustness of differential expression calls for less abundant genes but should the difference be so high? I am kind of inclined more towards accepting the values obtained without using the -N option."
          In my datasets, normalized and non-normalized FPKMs are almost perfectly correlated, so I'm also tempted not to use -N.
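          The near-perfect correlation is what one would expect if normalization mostly changes a sample-wide constant: proportional FPKM vectors have a Pearson correlation of exactly 1, and departures from 1 would come from whatever else differs between the runs. A minimal sketch with made-up values; only the APOE pair (743.98 and 4085.26) comes from this thread, and scaling every gene by that ratio is an assumption. Note that a high within-sample correlation does not guarantee that cross-sample fold changes are unaffected, because the factor can differ between samples.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical FPKMs for four genes without -N (only APOE's value is from the thread).
without_n = [743.98, 21.0, 3.5, 0.8]
scale = 4085.26 / 743.98  # APOE ratio with/without -N, about 5.49
with_n = [v * scale for v in without_n]

print(round(pearson(without_n, with_n), 6))  # 1.0
```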

          All the best
          Dario
