  • cuffmerge failed

    Any idea what causes this failure? How do I make the FASTA lines the same length, and what does the error mean? Thank you!

    [Wed Sep 28 20:19:22 2011] Beginning transcriptome assembly merge
    -------------------------------------------

    [Wed Sep 28 20:19:22 2011] Preparing output location ./merged_asm/
    [Wed Sep 28 20:19:22 2011] Converting GTF files to SAM
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:19:22] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:19:22] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:19:23] Loading reference annotation.
    [Wed Sep 28 20:19:23 2011] Quantitating transcripts
    cufflinks: /usr/lib64/libz.so.1: no version information available (required by cufflinks)
    You are using Cufflinks v1.1.0, which is the most recent release.
    [bam_header_read] EOF marker is absent.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    File ./merged_asm/tmp/mergeSam_fileu98BBT doesn't appear to be a valid BAM file, trying SAM...
    [20:19:23] Loading reference annotation.
    [20:19:24] Inspecting reads and determining fragment length distribution.
    Processed 4610 loci.
    > Map Properties:
    > Total Map Mass: 27713.00
    > Read Type: 0bp single-end
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    [20:19:24] Assembling transcripts and estimating abundances.
    Processed 4610 loci.
    [Wed Sep 28 20:19:53 2011] Comparing against reference file xxx.gtf
    You are using Cufflinks v1.1.0, which is the most recent release.
    No fasta index found for ../bowtie-0.12.7/genomes/chr.fasta. Rebuilding, please wait..
    Error: sequence lines in a FASTA record must have the same length!
    [FAILED]
    Error: could not execute cuffcompare

    Traceback (most recent call last):
    File "/home/student/yujinhai/cufflinks-1.1.0.Linux_x86_64/cuffmerge", line 573, in ?
    sys.exit(main())
    File "/home/student/yujinhai/cufflinks-1.1.0.Linux_x86_64/cuffmerge", line 556, in main
    compare_meta_asm_against_ref(params.ref_gtf, params.fasta, output_dir+"/transcripts.gtf")
    File "/home/student/yujinhai/cufflinks-1.1.0.Linux_x86_64/cuffmerge", line 406, in compare_meta_asm_against_ref
    tmap = compare_to_reference(gtf_input_file, ref_gtf, fasta_file)
    File "/home/student/yujinhai/cufflinks-1.1.0.Linux_x86_64/cuffmerge", line 342, in compare_to_reference
    exit(1)
    TypeError: 'str' object is not callable

  • #2
    The command line that you ran would be useful, as well as the first few lines of the input files. The warnings during the GTF-to-SAM conversion about libz having no version information are a little odd.

    "how to make fasta have the same length? and what does this mean?"
    Looking at the cufflinks code, it's expecting a fairly standard fasta format with equal-length lines (except for the last line) in each fasta record. Something like this:
    Code:
    >gi|347448407|gb|JN582205.1| Dorylomorpha spinosa voucher KNWR:Ento:4382 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
    AACATTATATTTTATATTTGGTGCCTGAGCAGGAATAGTGGGTACATCCCTAAGAATCCTTATTCGAGCT
    GAACTAGGACATCCAGGATCACTAATTGGAGATGACCAAATTTATAACGTAATTGTAACAGCTCATGCTT
    TTGTGATAATTTTTTTTATAGTAATACCTATTATAATTGGAGGATTCGGGAATTGACTAGTACCCCTAAT
    ACTAGGAGCTCCTGACATAGCATTCCCTCGTATAAACAATATAAGATTTTGAATATTACCCCCATCATTA
    TCCCTTCTACTCCTTAGAAGAATAACTAACAACGGAGCTGGTACCGGATGAACGGTATACCCACCACTAT
    CATCAAACATCGCCCACGAAGGTGCATCAGTTGATTTAGCTATTTTTTCATTACATTTAGCAGGAATTTC
    ATCAATTCTAGGAGCAGTAAATTTTATTACTACAGTAATTAATATACGTTCAACAGGAATTTCATTTGAC
    CGAATACCTTTATTTGTATGGGCAGTAGTAATTACAGCATTATTACTTCTTTTATCATTACCAGTTCTTG
    CAGGAGCCATTACTATACTATTAACAGACCGAAATTTTAATACTTCATTCTTTGACCCGGCTGGAGGAGG
    TGACCCAATTTTATACCAACATTTATTT
    rather than this:
    Code:
    >gi|347448407|gb|JN582205.1| Dorylomorpha spinosa voucher KNWR:Ento:4382 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
    AACATTATATTTTATATTTGGTGCCTGAGCAGGAATAGTGGGTACATCCCTAAGAATCCTTATT
    CGAGCTGAACTAGGACATCCAGGATCACTAATTGGAGATGACCAAATTTATAACGTAATTGTAACAGCTCATGCTT
    TTGTGATAATTTTTTTTATAGTAATACCTATTATAATTGGAGGATTCGG
    GAATTGACTAGTACCCCTAATACTAGGAGCTCCTGACATAGCATTCCCTCGTATAAACAATATAAGATTTTGAATATTACCCCCATCATTA
    TCCCTTCTACTCCTTAGAAGAATAACTAACAACGGAGCTGGTACCGGATGAACGGTATACCCACCAC
    TATCATCAAACATCGCCCACGAAGGTGCATCAGTTGATTTAGCTATTTTTTCATTACATTTAGCAGGAATTTC
    ATCAATTCTAGGAGCAGTAAATTTTATTACTACAGTAATTAATATACGTTCAACAGGAATTTCATTTGA
    CCGAATACCTTTATTTGTATGGGCAGTAGTAATTACAGCATTATTACTTCTTTTATCATTACCAGTTCTTG
    CAGGAGCCATTACTATACTATTAACAGACCGAAATTTTAATACTTC
    ATTCTTTGACCCGGCTGGAGGAGGTGACCCAATTTTATACCAACATTTATTT



    • #3
      Thank you for your reply. I checked my FASTA file, and it seems to be formatted as you described (the lines may not look the same length here, but they actually are):

      >chr
      CCGCGGCGCTGCTCCCGGCGCTCCGCGCCGGGAGACGGGGCGAGTCGCTGCGCTCCCCGCCAGGGAGCCG
      CTGCGCGGCTCGCAGTGGGTCGATTCCCGTTGCCGTCGATCGAGTCGCTTCGCTCCTCTGAGTTTCCGAG
      ATTAGGTTCTCGCCTGCACTTTTCATCGTCCCGTTCGATCCGGTCCCCCGCACCCCAACGGGGCTGGAGA
      AGCGGGAGGGTGTGCCCGACCCGCCGCCCACTCGCCTTCCCGCACCGCTCCATGTCATACCCACAGCATA
      CCACCCGGCACCCTCGAATCCCAAAACAGACGAAAAACTTAAAACACCCATATCTGTTGATAATCAACCT
      TTTTCGAACCTTACAATCTGAAAAACGTGCACAACCCACGTAAAAACTTACTCACCAAGTAATTACCCAA
      ACATGTTGTCAATCAATACCTTTCAGAAACGGCTCGAAAACGGACGAAGCAGACACCCCCACGCCCGCCG
      ACACCCCGGCGCCGGCACTCACCCGACAGGTCGCCGACACCCCACATCACAAACCGGAGACATGTCATCA
      CACCAGGTCACAGCACCATTACGCCCGCGGGTGCCGAGGTCGCATCGATCCCACCCAGAATGGGCAGCAG
      AGATTCAGCAGCGGATCGCGTCGCTCGCGGCGACGCCTTGGGGCGGGAGCGCGTCATCGTGTTCGAGAGA
      TTCTCCGATCAAGCCCGCCACGTGGTCGTCCTCGCCGCCGGCGCCGCCCGCACCCACCACCAGAACTGGC


      So I don't know how to fix it.

      The warnings are really weird; I don't have a clue what is going wrong.




      • #4
        This is a really old thread, but in case somebody has the same problem:
        I solved it by removing the extra newline at the end of my FASTA file.

        Since my FASTA file had only one record, I only had to remove one empty line.
        I don't know whether all empty lines would need to be removed in a multi-record FASTA file.



        • #5
          I was getting this error until I removed all empty lines between separate sequences.
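
          For anyone who would rather script the cleanup than edit by hand, here is a rough sketch. It drops every empty line and rewraps each record to a fixed width, which covers both the trailing empty line and the empty lines between sequences mentioned in this thread. The names `wrap` and `rewrap_fasta` and the default width of 70 are my own choices for illustration, not anything taken from cufflinks. The whole file is held in memory, so a large genome may need a streaming version.

```python
def wrap(seq, width):
    """Split one sequence string into pieces of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def rewrap_fasta(lines, width=70):
    """Return FASTA lines with empty lines dropped and sequences rewrapped."""
    out = []
    chunks = []  # sequence pieces collected for the current record
    for raw in lines:
        line = raw.strip()  # also removes trailing spaces and '\r'
        if not line:
            continue        # skip empty lines anywhere in the file
        if line.startswith(">"):
            out.extend(wrap("".join(chunks), width))  # finish the previous record
            chunks = []
            out.append(line)
        else:
            chunks.append(line)
    out.extend(wrap("".join(chunks), width))          # finish the last record
    return out
```

          Write the result to a new file with something like `open("fixed.fasta", "w").write("\n".join(rewrap_fasta(open("chr.fasta"))) + "\n")` (both file names are placeholders), then point cuffmerge at the new file.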
