  • Selecting most abundant transcript per gene (Trinity, RSEM)

    Is anyone aware of a program or simple script that selects the most abundant transcript for each gene (for example, based on FPKM values) from a Trinity-assembled transcriptome that has been run through RSEM? I have the RSEM output file RSEM.isoform.results, which looks like this:

    transcript_id        gene_id         length  effective_length  expected_count  TPM   FPKM  IsoPct
    comp1000093_c0_seq1  comp1000093_c0  257     180.57            2.00            0.33  0.31  100.00
    comp1000100_c0_seq1  comp1000100_c0  308     231.21            4.00            0.51  0.49  100.00
    comp1000106_c0_seq1  comp1000106_c0  279     202.37            2.00            0.29  0.28  100.00
    comp135533_c0_seq1   comp135533_c0   233     156.94            0.00            0.00  0.00  0.00
    comp135533_c0_seq2   comp135533_c0   288     211.31            4.00            0.56  0.54  48.65
    comp135533_c0_seq3   comp135533_c0   235     158.90            0.00            0.00  0.00  0.00
    comp135533_c0_seq4   comp135533_c0   426     349.02            7.00            0.60  0.57  51.35

    I also have a FASTA file with all the transcripts.

    I would like to end up with a FASTA file that contains only a single transcript_id per gene_id.
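One possible way to do this is sketched below in Python, using only the standard library. It assumes the RSEM results file is tab-delimited with the header row shown above, and that the first whitespace-separated token of each FASTA header is the transcript_id. The function names are made up for illustration, and ties (including genes where every isoform has FPKM 0) are resolved by keeping the first isoform listed, which may or may not be what you want.

```python
import csv
from typing import Dict, Iterable, Iterator, Tuple


def top_isoform_per_gene(rows: Iterable[dict], key: str = "FPKM") -> Dict[str, str]:
    """Map each gene_id to the transcript_id with the highest value in `key`."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        value = float(row[key])
        gene = row["gene_id"]
        # Strict '>' keeps the first-listed isoform when values tie.
        if gene not in best or value > best[gene][0]:
            best[gene] = (value, row["transcript_id"])
    return {gene: tid for gene, (_, tid) in best.items()}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(results_path: str, fasta_in: str, fasta_out: str, key: str = "FPKM") -> int:
    """Write only the top isoform of each gene to fasta_out; return how many were written."""
    with open(results_path, newline="") as fh:
        # Assumption: RSEM results are tab-delimited with a header row.
        keep = set(top_isoform_per_gene(csv.DictReader(fh, delimiter="\t"), key).values())
    written = 0
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for header, seq in read_fasta(src):
            tokens = header.split()
            # Compare only the first token, since headers may carry extra fields.
            if tokens and tokens[0] in keep:
                dst.write(f">{header}\n{seq}\n")
                written += 1
    return written
```

With the table above, comp135533_c0 would keep comp135533_c0_seq4 (FPKM 0.57). A call would look like filter_fasta("RSEM.isoform.results", "transcripts.fasta", "top_isoforms.fasta"), where the two FASTA file names are placeholders. Passing key="TPM" or key="IsoPct" ranks by a different column.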

  • #2
    Hello, gevieir! You can choose the RSEM.gene.results file.

    • #3
      Hi gevielr,
      I am just wondering how to run RSEM correctly. Sorry that this does not help with your problem; I am still a newbie here.

      I tried to run an RSEM calculation on my transcripts, but it did not work (I ran it using RSEM/1.12.15).

      Firstly I prepared the reference:

      -bash-4.1$ rsem-prepare-reference trinity_out_dir.Trinity.fasta refAB
      rsem-synthesis-reference-transcripts refAB 0 0 trinity_out_dir.Trinity.fasta
      Transcript Information File is generated!
      Group File is generated!
      Chromosome List File is generated!
      Extracted Sequences File is generated!

      rsem-preref refAB.transcripts.fa 0 refAB -l 125
      Refs.makeRefs finished!
      Refs.saveRefs finished!
      refAB.idx.fa is generated!
      refAB.n2g.idx.fa is generated!

      This step produced several output files (refAB.n2g.idx.fa; refAB.seq; refAB.ti; refAB.transcripts.fa and a few more), but I do not know which one to use as the reference in this command:

      rsem-calculate-expression [options] --paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name

      I then ran rsem-calculate-expression as follows:
      -bash-4.1$ rsem-calculate-expression --paired-end A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa AB

      Here is the output I got:

      bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -
      Warning: Same mate file "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq" appears as argument to both -1 and -2
      Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"
      Command: bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa
      [samopen] no @SQ lines in the header.
      [sam_read1] missing header? Abort!
      "bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -" failed! Plase check if you provide correct parameters/options for the pipeline!


      Any suggestions about what went wrong, based on your experience running RSEM?

      Thanks
      Didi

      • #4
        This line looks like your answer:

        Warning: Same mate file "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq" appears as argument to both -1 and -2

        You have put A_1_C4HUHACXX_AGTCAA_L003_R1.fastq instead of A_1_C4HUHACXX_AGTCAA_L003_R2.fastq as the reverse read.
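With eight comma-separated files per mate, this kind of slip is easy to miss by eye. Here is a rough Python sketch of a pre-flight check. It assumes the naming convention seen in the command above, where the two mates of a pair differ only by "_R1" versus "_R2" just before the extension. The function name is invented for this example.

```python
import re
from typing import List, Tuple


def mismatched_mates(upstream: str, downstream: str) -> List[Tuple[str, str]]:
    """Return (R1, R2) pairs whose names differ by more than the R1/R2 tag."""
    left = upstream.split(",")
    right = downstream.split(",")
    if len(left) != len(right):
        raise ValueError(f"{len(left)} upstream files but {len(right)} downstream files")
    bad = []
    for r1, r2 in zip(left, right):
        # Expected mate name: the R1 name with "_R1" (just before a dot) swapped for "_R2".
        expected = re.sub(r"_R1(?=\.)", "_R2", r1)
        if r2 != expected:
            bad.append((r1, r2))
    return bad
```

Run on the two file lists from the command above, this would report the first pair (the R1 file given twice). It would also report the last pair, because the downstream name in the pasted command ends in .fasta rather than .fastq. That may just be how the file is named, but it is worth a look.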

        • #5
          Thanks, kopi-o.

          I have corrected that line, but the run still fails. Here is my log:
          -bash-4.1$ rsem-calculate-expression --paired-end A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa AB
          bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -
          Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"
          Command: bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa
          [samopen] no @SQ lines in the header.
          [sam_read1] missing header? Abort!
          "bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -" failed! Plase check if you provide correct parameters/options for the pipeline!
          -bash-4.1$


          Any suggestions? Did I give RSEM the correct information, or is something wrong with my RSEM installation?

          Thanks
          Didi
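The remaining error says Bowtie could not find an index with the basename "refAB.transcripts.fa". Bowtie 1 looks for index files named after the basename plus suffixes such as .1.ebwt and .rev.1.ebwt, so one thing worth checking is which prefix, if any, has those files next to it. The Python sketch below does only that. It looks in a single directory, the helper name is made up, and it does not diagnose why an index might be missing.

```python
from pathlib import Path
from typing import List

# File suffixes of a Bowtie 1 index; large indexes use the same names ending in "ebwtl".
BOWTIE1_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(basename: str, directory: str = ".") -> List[str]:
    """List the Bowtie 1 index files that are absent for `basename` in `directory`."""
    missing = []
    for suffix in BOWTIE1_SUFFIXES:
        small = Path(directory) / (basename + suffix)
        large = Path(directory) / (basename + suffix + "l")
        if not small.exists() and not large.exists():
            missing.append(small.name)
    return missing
```

For example, comparing missing_index_files("refAB") with missing_index_files("refAB.transcripts.fa") would show whether an index exists under the name that was passed to rsem-prepare-reference (refAB) rather than under the FASTA file name. An empty list means all six files were found.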

          • #6
            There is no software available that does this. I read the isoforms file into R, group the isoforms by the gene names of their BLAST hits, and then choose the isoform I want. The gene_id is unreliable for identifying genes, because multiple IDs can belong to the same gene. You should identify the genes by BLAST matching.
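The post describes doing this in R. As a rough illustration of the same grouping idea, here is a Python sketch instead. It assumes you already have a dictionary mapping each transcript_id to the gene name of its BLAST hit (how that mapping is built is not covered here, and the name hits is hypothetical). The post does not say how the final isoform is chosen, so the sketch only sorts each group by FPKM, highest first, and leaves the choice to you.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_blast_hit(
    isoforms: Iterable[Tuple[str, float]],
    hits: Dict[str, str],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (transcript_id, FPKM) pairs by BLAST-hit gene name, highest FPKM first."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for transcript_id, fpkm in isoforms:
        gene = hits.get(transcript_id)
        if gene is None:
            continue  # transcripts without a BLAST hit are left out of the grouping
        groups[gene].append((transcript_id, fpkm))
    for members in groups.values():
        # Stable sort: isoforms with equal FPKM keep their input order.
        members.sort(key=lambda m: m[1], reverse=True)
    return dict(groups)
```

Because the grouping key is the BLAST hit and not the gene_id, transcripts from different comp IDs that hit the same gene end up in one group, which is the situation the post warns about.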
