  • reads mapping across scaffold boundaries

    Hi,

    I am struggling to assemble a transcriptome from single-end (SE) 454 reads. I used bwa-sw to map the reads and want to use Cufflinks for the assembly. Eventually I get errors like these:
    >Error (GFaSeqGet): end coordinate (994) cannot be larger than sequence length 883
    >Error (GFaSeqGet): subsequence cannot be larger than 472
    >Error getting subseq for CUFF.18.1 (13..673)!
    Here's what I think is happening:
    I have about 12000 scaffolds, most of which are tiny. I have reads that map beyond the end of a scaffold, meaning that a part of the read is not aligned to the scaffold but "hangs over".

    read:                    -----------------
    scaffold: |----------------------|

    If I then try to extract the transcripts with Cufflinks, I get the error, because Cufflinks uses the reference genome, not the assembled reads, to extract the exons.

    This raises three questions for me:
    1. Why do quite a number of reads map to tiny scaffolds? I assume these scaffolds are repetitive DNA, junk, or something else that kept them from being assembled into a bigger scaffold.
    2. Is there a way to extract the transcripts with different software?
    3. Or should I try to exclude either all small scaffolds or all reads mapping beyond scaffold ends from my transcriptome assembly?
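Before deciding on question 3, it may help to count how many alignments actually run past the end of their scaffold. The following is a minimal sketch, assuming plain-text SAM input (for example from samtools view -h) whose @SQ header lines give each scaffold's length; the function names are hypothetical. It computes each alignment's end coordinate as POS plus the reference-consuming CIGAR operations (M, D, N, =, X) minus one and compares that with the scaffold length.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Return the 1-based, inclusive end coordinate of an alignment on the reference."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def find_overhanging_reads(sam_lines):
    """Yield (read, scaffold, end, scaffold_length) for alignments ending past the scaffold."""
    lengths = {}
    for line in sam_lines:
        if line.startswith("@SQ"):
            # Header fields look like SN:<name> and LN:<length>
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            lengths[tags["SN"]] = int(tags["LN"])
            continue
        if line.startswith("@"):
            continue  # other header lines (@HD, @PG, ...)
        cols = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, cigar = cols[0], int(cols[1]), cols[2], int(cols[3]), cols[5]
        if flag & 4 or cigar == "*" or rname not in lengths:
            continue  # unmapped read or no usable alignment
        end = reference_end(pos, cigar)
        if end > lengths[rname]:
            yield qname, rname, end, lengths[rname]
```

For example, on a 100 bp scaffold a 20M alignment starting at position 90 ends at 109 and would be reported, while one starting at position 10 would not. Reads flagged this way could be dropped, or their scaffolds inspected, before running Cufflinks.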

  • #2
    I've got the same issue here... anyone found a solution yet?



    • #3
      Of note, you can eliminate this error by NOT using the -b option in Cufflinks. Also, this bug is exposed with bowtie2 (https://sourceforge.net/tracker/?fun...7&atid=1101606), but not with bowtie1; at least, I've never had it happen with bowtie1.



      • #4
        Error (GFaSeqGet): subsequence cannot be larger than 16571
        Error getting subseq for CUFF.42374.1 (2..16614)!

        I am getting the same error. Has anybody found a solution? Running with or without -b does not work for me.



        • #5
          You could pad all your references with Ns so that nothing hangs off.



          • #6
            Forgive my ignorance, but how do I pad those sequences?
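One way to follow the padding suggestion is a small script that appends a run of Ns to every sequence in the reference FASTA. This is a minimal sketch, not a tested recipe: the function names, the pad length of 1000 and the 60-column line width are arbitrary placeholders, and the pad should be at least as long as the largest overhang you expect. It pads only the 3' end, because the errors above concern end coordinates beyond the sequence length; padding only that end also leaves existing coordinates (and any GTF annotation) unchanged, whereas padding the 5' end would shift every coordinate. Presumably the aligner index would then need to be rebuilt and the reads remapped against the padded reference.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pad_fasta(lines, pad=1000, width=60):
    """Append `pad` Ns to the 3' end of every sequence and return wrapped FASTA text."""
    out = []
    for header, seq in read_fasta(lines):
        padded = seq + "N" * pad
        out.append(">" + header)
        out.extend(padded[i:i + width] for i in range(0, len(padded), width))
    return "\n".join(out) + "\n"
```

Usage might look like: open("genome.fa") as input, write pad_fasta(f) to a new file such as genome.padded.fa (both file names are hypothetical).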



            • #7
              Hi, I have the same problem. It seems that lots of people get this error.

              I use the Ensembl genome (*dna.toplevel.fa) and GTF file.

              If I run Cufflinks (even without -b genome.fa) but then run Cuffmerge with -s genome.fa, I get the error.

              If I omit -b genome.fa in both Cufflinks and Cuffdiff and also omit -s genome.fa in Cuffmerge, I don't get the error. However, I don't know whether the results are accurate.
