  • Richard Barker
    Member
    • Apr 2012
    • 47

    Cuffmerge error: couldn't find a data file for Mt or Pt?

    I'm trying to merge my transcript data from 8 samples, aligned with TopHat and assembled with Cufflinks. Unfortunately, when I run cuffmerge I get warnings that it "couldn't find fasta record" for 'Mt' and 'Pt', and I can't find any reference to these files anywhere in the literature or online.
    Any advice gratefully received.

    richard@ubuntu:~/RNA_seq_analysis/Cuffmerge$ cuffmerge -g arabidopsis_thaliana.TAIR10.60.gtf -s TAIR10_chr_all.fas -p 6 run297_transcript_cuffmerge.txt

    [Thu Aug 9 15:36:29 2012] Beginning transcriptome assembly merge
    -------------------------------------------

    [Thu Aug 9 15:36:29 2012] Preparing output location ./merged_asm/
    [Thu Aug 9 15:36:36 2012] Converting GTF files to SAM
    [15:36:36] Loading reference annotation.
    [15:36:37] Loading reference annotation.
    [15:36:38] Loading reference annotation.
    [15:36:39] Loading reference annotation.
    [15:36:40] Loading reference annotation.
    [15:36:41] Loading reference annotation.
    [15:36:42] Loading reference annotation.
    [15:36:44] Loading reference annotation.
    [Thu Aug 9 15:36:45 2012] Quantitating transcripts
    You are using Cufflinks v2.0.2, which is the most recent release.
    Command line:
    cufflinks -o ./merged_asm/ -F 0.05 -g arabidopsis_thaliana.TAIR10.60.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 6 ./merged_asm/tmp/mergeSam_filejHHJWI
    [bam_header_read] EOF marker is absent.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    File ./merged_asm/tmp/mergeSam_filejHHJWI doesn't appear to be a valid BAM file, trying SAM...
    [15:36:45] Loading reference annotation.
    [15:36:47] Inspecting reads and determining fragment length distribution.
    Processed 26332 loci.
    > Map Properties:
    > Normalized Map Mass: 194074.00
    > Raw Map Mass: 194074.00
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    [15:36:48] Assembling transcripts and estimating abundances.
    Processed 26332 loci.
    [Thu Aug 9 15:44:22 2012] Comparing against reference file arabidopsis_thaliana.TAIR10.60.gtf
    You are using Cufflinks v2.0.2, which is the most recent release.
    No fasta index found for TAIR10_chr_all.fas. Rebuilding, please wait..
    Fasta index rebuilt.
    Warning: couldn't find fasta record for 'Mt'!
    Warning: couldn't find fasta record for 'Pt'!
    [Thu Aug 9 15:44:34 2012] Comparing against reference file arabidopsis_thaliana.TAIR10.60.gtf
    You are using Cufflinks v2.0.2, which is the most recent release.
    Warning: couldn't find fasta record for 'Mt'!
    Warning: couldn't find fasta record for 'Pt'!
  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #2
    Richard,

    This is most likely caused by a mismatch in the reference sequence names between the FASTA file and the GTF file you are using.

    The official TAIR10 genome sequence release names the mitochondrial and plastid (chloroplast) chromosomes ChrM and ChrC respectively. These are the names used in TAIR10_chr_all.fas. It would appear that your reference GTF file, arabidopsis_thaliana.TAIR10.60.gtf, is from a different source and uses different names (Mt and Pt). You will need to make sure the chromosome names match exactly between your FASTA and GTF files.
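A quick way to confirm this kind of mismatch before rerunning cuffmerge is to compare the sequence names on both sides. The Python sketch below collects FASTA record names, taking the first whitespace-delimited word of each header line (the rule samtools faidx uses), and lists every seqid in column 1 of the annotation that has no exact match. Whether Cufflinks applies exactly the same header rule is an assumption, and the script name and file arguments are placeholders.

```python
import sys


def fasta_names(lines):
    """Record names: first whitespace-delimited word after '>' (as samtools faidx keys them)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def annotation_seqids(lines):
    """Seqids from column 1 of a GTF/GFF file, skipping blank and '#' lines."""
    seqids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        seqids.add(line.rstrip("\n").split("\t", 1)[0])
    return seqids


def missing_seqids(fasta_lines, annotation_lines):
    """Annotation seqids that have no FASTA record of exactly the same name."""
    return sorted(annotation_seqids(annotation_lines) - fasta_names(fasta_lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python check_seqids.py genome.fa annotation.gtf
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as annotation:
        for name in missing_seqids(fasta, annotation):
            print(f"no FASTA record for {name!r}")
```

An empty result means every annotated sequence name has a matching FASTA record; any names printed are the ones cuffmerge would be expected to warn about.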


    • Richard Barker
      Member
      • Apr 2012
      • 47

      #3
      Thanks for the swift response (again)!

      Your advice worked perfectly. I searched the directories near where I downloaded the TAIR10 FASTA file and found a TAIR10_GFF3_genes.gff file.

      The following command appears to be working, but what's the difference between a GFF and a GTF file?

      cuffmerge -g TAIR10_GFF3_genes.gff -s TAIR10_chr_all.fas -p 6 run297_transcript_cuffmerge.txt


      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #4
        Originally posted by Richard Barker
        The following command appears to be working, but what's the difference between a GFF and a GTF file?
        See the Cufflinks documentation.
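Briefly, both formats have nine tab-separated columns and differ mainly in column 9. GTF uses `key "value";` pairs (Cufflinks relies on gene_id and transcript_id there), while GFF3 uses `key=value` pairs with percent-encoding and comma-separated multiple values, and links features through ID and Parent. The Python sketch below parses each style; it ignores edge cases such as semicolons inside quoted GTF values and repeated GTF keys, and the attribute values shown are illustrative only.

```python
from urllib.parse import unquote


def parse_gtf_attributes(field):
    """GTF: 'key "value";' pairs. Simplified: assumes no ';' inside quoted values."""
    attrs = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gff3_attributes(field):
    """GFF3: 'key=value' pairs; values are percent-encoded and may be comma-separated lists."""
    attrs = {}
    for item in field.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


# Illustrative column-9 values for the same transcript in each format.
gtf_field = 'gene_id "AT1G01010"; transcript_id "AT1G01010.1";'
gff3_field = "ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1"

print(parse_gtf_attributes(gtf_field))
print(parse_gff3_attributes(gff3_field))
```

The GTF attributes come back as single strings, while the GFF3 values come back as lists because the format allows several comma-separated values per key.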


        • Richard Barker
          Member
          • Apr 2012
          • 47

          #5
          Oops, spoke too soon. Now I get the following error message:

          richard@ubuntu:~/RNA_seq_analysis/Cuffmerge$ cuffmerge -g TAIR10_GFF3_genes.gff -s TAIR10_chr_all.fas -p 6 run297_transcript_cuffmerge.txt

          [Fri Aug 10 07:41:48 2012] Beginning transcriptome assembly merge
          -------------------------------------------

          [Fri Aug 10 07:41:48 2012] Preparing output location ./merged_asm/
          [Fri Aug 10 07:41:52 2012] Converting GTF files to SAM
          [07:41:52] Loading reference annotation.
          [07:41:53] Loading reference annotation.
          [07:41:54] Loading reference annotation.
          [07:41:56] Loading reference annotation.
          [07:41:57] Loading reference annotation.
          [07:41:58] Loading reference annotation.
          [07:41:59] Loading reference annotation.
          [07:42:00] Loading reference annotation.
          [Fri Aug 10 07:42:02 2012] Quantitating transcripts
          You are using Cufflinks v2.0.2, which is the most recent release.
          Command line:
          cufflinks -o ./merged_asm/ -F 0.05 -g TAIR10_GFF3_genes.gff -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 6 ./merged_asm/tmp/mergeSam_filefWraGs
          [bam_header_read] EOF marker is absent.
          [bam_header_read] invalid BAM binary header (this is not a BAM file).
          File ./merged_asm/tmp/mergeSam_filefWraGs doesn't appear to be a valid BAM file, trying SAM...
          [07:42:02] Loading reference annotation.
          [07:42:03] Inspecting reads and determining fragment length distribution.
          Processed 47416 loci.
          > Map Properties:
          > Normalized Map Mass: 194074.00
          > Raw Map Mass: 194074.00
          > Fragment Length Distribution: Truncated Gaussian (default)
          > Default Mean: 200
          > Default Std Dev: 80
          [07:42:05] Assembling transcripts and estimating abundances.
          Processed 47416 loci.
          [Fri Aug 10 07:53:04 2012] Comparing against reference file TAIR10_GFF3_genes.gff
          You are using Cufflinks v2.0.2, which is the most recent release.
          Warning: couldn't find fasta record for 'Chr1'!
          Warning: couldn't find fasta record for 'Chr2'!
          Warning: couldn't find fasta record for 'Chr3'!
          Warning: couldn't find fasta record for 'Chr4'!
          Warning: couldn't find fasta record for 'Chr5'!
          Warning: couldn't find fasta record for 'ChrC'!
          Warning: couldn't find fasta record for 'ChrM'!
          [Fri Aug 10 07:53:20 2012] Comparing against reference file TAIR10_GFF3_genes.gff
          You are using Cufflinks v2.0.2, which is the most recent release.
          Warning: couldn't find fasta record for 'Chr1'!
          Warning: couldn't find fasta record for 'Chr2'!
          Warning: couldn't find fasta record for 'Chr3'!
          Warning: couldn't find fasta record for 'Chr4'!
          Warning: couldn't find fasta record for 'Chr5'!
          Warning: couldn't find fasta record for 'ChrC'!
          Warning: couldn't find fasta record for 'ChrM'!
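Every seqid in the GFF (Chr1 to Chr5, ChrC, ChrM) now fails to match, so the annotation and the FASTA use entirely different naming schemes. Besides downloading a matching annotation, one option is to rewrite column 1 of the annotation so it uses the FASTA record names. In the Python sketch below, NAME_MAP is hypothetical: its placeholder values are not the actual TAIR10 headers and must be filled in from the headers really present in your FASTA (for example, from `grep '>' TAIR10_chr_all.fas`).

```python
import sys

# Hypothetical mapping from annotation seqid to FASTA record name (placeholder values).
NAME_MAP = {"Chr1": "1", "Chr2": "2"}


def rename_seqids(lines, name_map):
    """Yield annotation lines with column 1 renamed; comment lines pass through unchanged."""
    for line in lines:
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        seqid, rest = line.split("\t", 1)
        # Unmapped seqids are kept as-is.
        yield name_map.get(seqid, seqid) + "\t" + rest


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python rename_seqids.py in.gff out.gff
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(rename_seqids(src, NAME_MAP))
```

Renaming the annotation rather than the FASTA avoids rebuilding the genome index, but either direction works as long as the names end up identical.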


          • Richard Barker
            Member
            • Apr 2012
            • 47

            #6
            I found a TAIR10_GFF file (ftp://ftp.arabidopsis.org/home/tair/...enome_release/), which was also near the location where I downloaded my genome FASTA file (ftp://ftp.arabidopsis.org/home/tair/...omosome_files/), and with that file the run completed!
            Thanks for your help!


            • Richard Barker
              Member
              • Apr 2012
              • 47

              #7
              Shouldn't the cuffmerge output have the gene names (the Arabidopsis ATG codes)? What methods are there for adding your genome annotation? I thought that was the reason for using the GFF/GTF files during TopHat and/or cuffmerge.
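In merged.gtf the gene_id and transcript_id values are new XLOC_ and TCONS_ identifiers, so the reference gene names have to be read from other attributes. Assuming merged.gtf carries cuffcompare-style attributes such as gene_name, nearest_ref, and class_code (an assumption; check which keys your own file contains), the Python sketch below tabulates them per transcript and prints '.' where an attribute is absent, which makes unannotated transcripts easy to spot.

```python
import sys

# Attribute keys to report; gene_name, nearest_ref and class_code are assumed, not guaranteed.
KEYS = ("gene_id", "gene_name", "nearest_ref", "class_code")


def gtf_attributes(field):
    """Parse 'key "value";' pairs (simplified: no ';' inside quoted values)."""
    attrs = {}
    for item in field.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip().strip('"')
    return attrs


def reference_links(lines):
    """Map transcript_id -> tuple of KEYS values (None where an attribute is absent)."""
    links = {}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = gtf_attributes(cols[8])
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        # Keep the first exon's attributes for each transcript.
        links.setdefault(tid, tuple(attrs.get(k) for k in KEYS))
    return links


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (placeholder name): python merged_links.py merged_asm/merged.gtf
    with open(sys.argv[1]) as gtf:
        for tid, values in sorted(reference_links(gtf).items()):
            print(tid, *("." if v is None else v for v in values), sep="\t")
```

A class code of "=" marks a transcript whose intron chain matches a reference transcript, while "u" marks an intergenic transcript with no reference gene nearby.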


              • shinigam123
                Junior Member
                • Aug 2017
                • 3

                #8
                I have the same problem. How did you solve it?

                Originally posted by Richard Barker
                Oops, spoke too soon. Now I get the following error message:
                [same cuffmerge command and log as in post #5]


                • Richard Barker
                  Member
                  • Apr 2012
                  • 47

                  #9
                  I used the pipeline that was made in the CyVerse Discovery environment. It's easy to use and really fast!


                  • shinigam123
                    Junior Member
                    • Aug 2017
                    • 3

                    #10
                    Can you tell me what that pipeline is? I don't know it.
                    Regards


                    • Richard Barker
                      Member
                      • Apr 2012
                      • 47

                      #11
                      They have HTprocess, and kallisto if you're in a rush.


                      • shinigam123
                        Junior Member
                        • Aug 2017
                        • 3

                        #12
                        But what was the problem: the input GFF and FASTA files? I need the merged.gtf output without warnings.


                        • vivekkeshri
                          Junior Member
                          • Jan 2019
                          • 3

                          #13
                          Cuffmerge output

                          I am trying to run cuffmerge (cuffmerge -p 5 -g Homo.gtf assemblies.txt), but I can't get FPKM values in the output file (merged.gtf).
                          Please let me know how to solve this problem.


                          • vivekkeshri
                            Junior Member
                            • Jan 2019
                            • 3

                            #14
                            Could someone explain how the cuffdiff -L option [-L/--labels: comma-separated list of condition labels] works? How does it label/merge the BAM files?
                            Thanks
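As the Cufflinks manual describes cuffdiff's positional arguments, each condition is one argument, with its replicate BAM/SAM files joined by commas inside it; -L then supplies one label per condition, in the same order. Under that reading, cuffdiff does not merge the BAMs into a single file but treats the comma-separated files as replicates of the labelled condition. The Python sketch below only assembles such a command line; the GTF path, BAM names, labels, output directory, and thread count are placeholders.

```python
import shlex


def cuffdiff_argv(gtf, groups, outdir="diff_out", threads=6):
    """Build a cuffdiff argv list.

    groups: ordered list of (label, [replicate BAM paths]); the i-th label
    passed to -L names the i-th positional group of BAMs.
    """
    labels = ",".join(label for label, _ in groups)
    argv = ["cuffdiff", "-o", outdir, "-p", str(threads), "-L", labels, gtf]
    for _, bams in groups:
        # One positional argument per condition; replicates are comma-separated.
        argv.append(",".join(bams))
    return argv


# Placeholder file names and labels.
groups = [
    ("control", ["ctrl_rep1.bam", "ctrl_rep2.bam"]),
    ("treated", ["trt_rep1.bam", "trt_rep2.bam"]),
]
print(shlex.join(cuffdiff_argv("merged_asm/merged.gtf", groups)))
```

This prints `cuffdiff -o diff_out -p 6 -L control,treated merged_asm/merged.gtf ctrl_rep1.bam,ctrl_rep2.bam trt_rep1.bam,trt_rep2.bam`, which makes the positional pairing between labels and replicate groups explicit.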
