  • Cufflinks/Cuffmerge/Cuffdiff error 'subsequence cannot be larger than 16338'

    Hi all,
    I'm doing a DE analysis on cow samples.
    I use the Ensembl UMD3.1 reference genome, which I thought was the latest version.
    However, when I run the Tophat2-Cuffdiff2 pipeline (default parameter settings), I still get this warning:

    Warning: couldn't find fasta record for 'GJ058256.1'!
    This contig will not be bias corrected.
    Warning: couldn't find fasta record for 'GJ058424.1'!
    ......



    (1) What does it mean? Is it a serious problem for my downstream analysis?

    And when I run Tophat2-Cufflinks/Cuffmerge-Cuffdiff2 (default parameter settings), I get this error:
    Warning: couldn't find fasta record for 'GJ060129.1'!
    This contig will not be bias corrected.
    ......

    Error (GFaSeqGet): subsequence cannot be larger than 16338
    Error getting subseq for TCONS_00062149 (1..16448)!


    (2) Why do I get this result? How can I go on with Tophat2-Cufflinks/Cuffmerge-Cuffdiff2?


    Thank you !
    Last edited by super0925; 03-05-2015, 08:39 AM.

    • #3
      I don't have the cow iGenomes set, but my guess is that "GJ058424.1" is in the GTF file but not in the genome sequence file. It appears to be the SH3YL1 gene now.

      Someone else will need to comment on the other error.
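
The guess above (sequence names present in the GTF but absent from the genome FASTA) can be checked directly. The following is a minimal sketch, not part of any Cufflinks tooling: the function names are made up for illustration, and it assumes the FASTA record name is the first word after `>` and that the GTF is tab-separated with the sequence name in column 1.

```python
def gtf_seqnames(lines):
    """Collect the sequence names found in column 1 of a GTF."""
    names = set()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def fasta_seqnames(lines):
    """Collect record names (first word after '>') from a FASTA."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def missing_from_fasta(gtf_lines, fasta_lines):
    """Return GTF sequence names that have no FASTA record, sorted."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_seqnames(fasta_lines))
```

With real files, open handles can be passed in, for example `missing_from_fasta(open("genes.gtf"), open("genome.fa"))`. Any name it returns is a candidate for the "couldn't find fasta record" warning.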

      • #4
        Originally posted by GenoMax
        I don't have the cow iGenomes set, but my guess is that "GJ058424.1" is in the GTF file but not in the genome sequence file. It appears to be the SH3YL1 gene now.

        Someone else will need to comment on the other error.
        Why do I get the warning in (1) and the error in (2)?
        Is the warning in (1) critical for downstream analysis?
        How can I solve it?
        Cheers

        • #5
          Could anyone help?

          The GTF file is genes.gtf from UMD3.1.

          The first column in genes.gtf (I think it is the chromosome) contains:
          1
          10
          11
          12
          13
          14
          15
          16
          17
          18
          19
          2
          20
          21
          22
          23
          24
          25
          26
          27
          28
          29
          3
          4
          5
          6
          7
          8
          9
          GJ058256.1
          GJ058424.1
          GJ058425.1
          GJ058430.1
          GJ058433.1
          GJ058437.1
          GJ058729.1
          GJ059463.1
          GJ059486.1
          GJ059509.1
          GJ059556.1
          GJ059670.1
          GJ060027.1
          GJ060032.1
          GJ060118.1
          GJ060120.1
          GJ060129.1
          MT
          X
          Last edited by super0925; 05-12-2015, 11:17 AM.

            • #7
              Did you get these files from iGenomes or is this something you put together by getting files (seq, annotation etc) from individual sources?

              • #8
                Originally posted by GenoMax
                Did you get these files from iGenomes or is this something you put together by getting files (seq, annotation etc) from individual sources?
                I downloaded them from iGenomes...
                Do you mean my files are abnormal?

                • #9
                  Originally posted by super0925
                  I downloaded them from iGenomes...
                  Do you mean my files are abnormal?
                  No. One of the reasons to get this data from iGenomes is it has (supposedly) been checked for consistency so the kind of thing you have run into does not happen. It is possible that you may have downloaded a flawed version that has since been fixed (you could download a new copy and compare).

                  I hesitate to recommend that you get the missing FASTA sequences from NCBI and append them to your genome.fa file (you will likely need to re-index it). But this may get you past one of the errors.
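
If the append route is tried despite the hesitation above, a sketch along these lines could avoid adding a record twice. The function names are hypothetical, and it assumes the downloaded records already carry headers whose first word matches the GTF sequence name (for example `>GJ058256.1 ...`); headers in another style would need renaming first.

```python
def record_name(header):
    """Return the first word after '>' in a FASTA header line."""
    fields = header[1:].split()
    return fields[0] if fields else ""


def merge_fasta(genome_lines, extra_lines):
    """Yield the genome lines, then only those extra records whose
    names are not already present in the genome."""
    present = set()
    for line in genome_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            present.add(record_name(line))
        yield line
    keep = False
    for line in extra_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = record_name(line)
            keep = name not in present
            if keep:
                present.add(name)
        # Sequence lines follow the decision made at their header.
        if keep and line:
            yield line
```

The merged lines would be written to a new file rather than over the original genome.fa, and the indexes rebuilt afterwards (for example with `samtools faidx` and `bowtie2-build`) before running the pipeline again.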

                  I am not sure how much work you have put into this already but if the new download from iGenomes does have these sequences then you could use that genome.fa file.

                  As for your second error, this thread seems to have some options: https://www.biostars.org/p/57249/
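
One possible reading of the second error, taken only from the message itself, is that the transcript span (1..16448) runs past the end of a sequence that is 16338 bases long. That is an assumption, not a confirmed diagnosis. The sketch below, with hypothetical function names, lists GTF features whose end coordinate exceeds the length of their sequence in the FASTA, which would show whether a merged GTF contains such records.

```python
def fasta_lengths(lines):
    """Map each FASTA record name (first word after '>') to its length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def features_past_end(gtf_lines, lengths):
    """Return (seqname, start, end, seq_length) for every GTF feature
    whose end coordinate is beyond the end of its sequence."""
    hits = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # not a complete GTF record
        seqname, start, end = cols[0], int(cols[3]), int(cols[4])
        # Sequences absent from the FASTA are skipped here; they are
        # the subject of the separate "couldn't find fasta record" warning.
        if seqname in lengths and end > lengths[seqname]:
            hits.append((seqname, start, end, lengths[seqname]))
    return hits
```

Running it on the Cuffmerge output (merged.gtf by default) against genome.fa would show which records, if any, overrun their sequence.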

                  • #10
                    cufflinks warnings: could not find fasta records

                    Dear friends,

                    I am getting the same warnings. I downloaded the Galgal4 reference files from iGenomes. When running cufflinks, I get "warning: couldn't find fasta record for LGE64 ...". I think these are contigs that are present in genes.gtf but not in the genome.fasta. My question is: could these warnings affect my downstream analyses? If so, what should I do to resolve this problem?

                    Any comment would be appreciated.
                    Karim
