  • Cufflinks/Cuffmerge/Cuffdiff error 'subsequence cannot be larger than 16338'

    Hi all,
    I'm doing a DE analysis on cow samples.
    I use the Ensembl UMD3.1 reference genome, which I thought was the latest version.
    However, when I ran the Tophat2-Cuffdiff2 pipeline (default parameter settings), I got this warning:

    Warning: couldn't find fasta record for 'GJ058256.1'!
    This contig will not be bias corrected.
    Warning: couldn't find fasta record for 'GJ058424.1'!
    ......



    (1) What does it mean? Will it cause trouble for my downstream analysis?

    And when I ran the Tophat2-Cufflinks/Cuffmerge-Cuffdiff2 pipeline (default parameter settings), I got this error:
    Warning: couldn't find fasta record for 'GJ060129.1'!
    This contig will not be bias corrected.
    ......

    Error (GFaSeqGet): subsequence cannot be larger than 16338
    Error getting subseq for TCONS_00062149 (1..16448)!


    (2) Why do I get this result? How can I proceed with Tophat2-Cufflinks/Cuffmerge-Cuffdiff2?


    Thank you!
    Last edited by super0925; 03-05-2015, 08:39 AM.

    • #3
      I don't have the cow iGenomes set, but my guess is that "GJ058424.1" is in the GTF file but not in the genome sequence file. It appears to be the SH3YL1 gene now.

      Someone else will need to comment on the other error.
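One way to test that guess is to compare the sequence names in the first column of the GTF with the record names in the FASTA. This is only a minimal sketch: the function names are made up for illustration, the toy data stands in for genes.gtf and genome.fa, and it assumes the FASTA record name is the first word after ">".

```python
import io


def fasta_names(handle):
    """Collect record names (first word after '>') from a FASTA stream."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(handle):
    """Collect the distinct values of column 1 (seqname) from a GTF stream."""
    names = set()
    for line in handle:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_fasta(gtf_handle, fasta_handle):
    """Return the sorted seqnames used in the GTF that have no FASTA record."""
    return sorted(gtf_seqnames(gtf_handle) - fasta_names(fasta_handle))


# Toy stand-ins for genes.gtf and genome.fa (not real annotation data).
toy_gtf = io.StringIO(
    '1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n'
    'GJ058424.1\ttoy\texon\t5\t50\t.\t+\t.\tgene_id "g2";\n'
)
toy_fasta = io.StringIO(">1 toy chromosome\nACGT\n>MT\nACGT\n")
print(missing_from_fasta(toy_gtf, toy_fasta))
```

With real files, the two `io.StringIO` objects would be replaced by `open("genes.gtf")` and `open("genome.fa")`. Any name it prints is a contig that the annotation refers to but the sequence file does not contain.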


      • #4
        Originally posted by GenoMax View Post
        I don't have the cow iGenomes set but my guess is that "GJ058424.1" is in the GTF file but is not in the genome sequence file. It appears to be SH3YL1 gene now.

        Someone else will need to comment on the other error.
        Why do I get the warning in (1) and the error in (2)?
        Is the warning in (1) critical for downstream analysis?
        How can I solve it?
        Cheers


        • #5
          Could anyone help?

          The GTF file is genes.gtf from UMD3.1.

          The first column of genes.gtf (I think it is the chromosome) contains:
          1
          10
          11
          12
          13
          14
          15
          16
          17
          18
          19
          2
          20
          21
          22
          23
          24
          25
          26
          27
          28
          29
          3
          4
          5
          6
          7
          8
          9
          GJ058256.1
          GJ058424.1
          GJ058425.1
          GJ058430.1
          GJ058433.1
          GJ058437.1
          GJ058729.1
          GJ059463.1
          GJ059486.1
          GJ059509.1
          GJ059556.1
          GJ059670.1
          GJ060027.1
          GJ060032.1
          GJ060118.1
          GJ060120.1
          GJ060129.1
          MT
          X
          Last edited by super0925; 05-12-2015, 11:17 AM.


            • #7
              Did you get these files from iGenomes, or is this something you put together by getting files (sequence, annotation, etc.) from individual sources?


              • #8
                Originally posted by GenoMax View Post
                Did you get these files from iGenomes or is this something you put together by getting files (seq, annotation etc) from individual sources?
                I downloaded them from iGenomes...
                Do you mean my files are abnormal?


                • #9
                  Originally posted by super0925 View Post
                  I downloaded them from iGenomes...
                  Do you mean my files are abnormal?
                  No. One of the reasons to get this data from iGenomes is that it has (supposedly) been checked for consistency, so that the kind of problem you have run into does not happen. It is possible that you downloaded a flawed version that has since been fixed (you could download a new copy and compare).

                  I hesitate to recommend that you get the missing FASTA sequences from NCBI and append them to your genome.fa file (you will likely need to re-index it afterwards), but this may get you past one of the errors.
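If someone did go the append route, it could look roughly like the sketch below. It assumes the missing records have already been saved to a local FASTA file; the function names and the file names in the demo are hypothetical. It writes a new combined file rather than changing genome.fa in place, and skips any record whose name is already present.

```python
import os
import tempfile


def record_name(header_line):
    """Return the first word after '>' in a FASTA header line."""
    fields = header_line[1:].split()
    return fields[0] if fields else ""


def append_new_records(genome_path, extra_path, out_path):
    """Copy genome_path to out_path, then append the records from extra_path
    whose names are not already present. Returns the names appended."""
    seen = set()
    appended = []
    with open(out_path, "w") as out:
        last_char = "\n"
        with open(genome_path) as genome:
            for line in genome:
                if line.startswith(">"):
                    seen.add(record_name(line))
                out.write(line)
                last_char = line[-1]
        # Make sure the first appended header starts on its own line.
        if last_char != "\n":
            out.write("\n")
        keep = False
        with open(extra_path) as extra:
            for line in extra:
                if line.startswith(">"):
                    name = record_name(line)
                    keep = name not in seen
                    if keep:
                        seen.add(name)
                        appended.append(name)
                if keep:
                    out.write(line)
    return appended


# Toy demo with temporary files (hypothetical names, not real sequence).
with tempfile.TemporaryDirectory() as tmp:
    genome = os.path.join(tmp, "genome.fa")
    extra = os.path.join(tmp, "missing_contigs.fa")
    combined = os.path.join(tmp, "genome_plus_contigs.fa")
    with open(genome, "w") as fh:
        fh.write(">1\nACGT\n")
    with open(extra, "w") as fh:
        fh.write(">GJ058424.1 toy record\nGGCC\n>1\nTTTT\n")
    print(append_new_records(genome, extra, combined))
```

Any index built from the old genome.fa would not cover the appended records, which is why the re-indexing step matters.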

                  I am not sure how much work you have put into this already but if the new download from iGenomes does have these sequences then you could use that genome.fa file.

                  As for your second error, this thread seems to have some options: https://www.biostars.org/p/57249/
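On the second error: the message asks for the range 1..16448 while naming 16338 as the limit. One possible reading, and it is only a guess, is that a transcript in the merged GTF extends past the end of its reference sequence. A rough sketch to list such features follows; the function names and toy data are invented for illustration, and it assumes standard GTF columns (1-based start in column 4, end in column 5).

```python
import io


def fasta_lengths(handle):
    """Return {record name: sequence length} for a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def features_past_end(gtf_handle, lengths):
    """List GTF features whose end coordinate exceeds the sequence length.

    Features on sequences absent from `lengths` are skipped, since there is
    no length to compare against.
    """
    hits = []
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        seqname, feature = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        seq_len = lengths.get(seqname)
        if seq_len is not None and end > seq_len:
            hits.append((seqname, feature, start, end, seq_len))
    return hits


# Toy data: a 10-base sequence and one exon that runs to position 12.
toy_fasta = io.StringIO(">toy example sequence\nACGTACGT\nAC\n")
toy_gtf = io.StringIO(
    'toy\tCufflinks\texon\t1\t12\t.\t+\t.\ttranscript_id "TCONS_demo";\n'
    'toy\tCufflinks\texon\t3\t8\t.\t+\t.\ttranscript_id "TCONS_demo2";\n'
)
print(features_past_end(toy_gtf, fasta_lengths(toy_fasta)))
```

If the real merged GTF shows a hit for TCONS_00062149, that would at least confirm where the overrun is; how to deal with it is a separate question.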


                  • #10
                    cufflinks warnings: could not find fasta records

                    Dear friends,

                    I am getting the same warnings. I downloaded the Galgal4 reference files from iGenomes. When running cufflinks, I get "warning: couldn't find fasta record for LGE64 ...". I think these are contigs that are present in genes.gtf but not in the genome FASTA. My question is: could these warnings affect my downstream analyses? If so, what should I do to resolve the problem?

                    Any comment would be appreciated.
                    Karim
