Hi,
I am struggling to assemble a transcriptome from single-end (SE) 454 reads. I mapped the reads with bwa-sw and want to use Cufflinks for the assembly. Eventually I get errors like these:
>Error (GFaSeqGet): end coordinate (994) cannot be larger than sequence length 883
>Error (GFaSeqGet): subsequence cannot be larger than 472
>Error getting subseq for CUFF.18.1 (13..673)!
>Error (GFaSeqGet): subsequence cannot be larger than 472
>Error getting subseq for CUFF.18.1 (13..673)!
Here's what I think is happening:
I have about 12,000 scaffolds, most of which are tiny. Some reads map beyond the end of a scaffold, meaning that part of the read is not aligned to the scaffold but "hangs over" its end:
read:                   -----------------
scaffold: |----------------------|
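To find the reads that really hang over a scaffold end, one option is to compute where each alignment ends on the reference from its POS and CIGAR fields and compare that with the scaffold length given by the LN tag of the matching @SQ header line. This is a minimal Python sketch, assuming plain-text SAM records of mapped reads; the function names are hypothetical. Only reference-consuming CIGAR operations count, so bases that bwa-sw soft-clips (S) do not by themselves make a read overhang.

```python
import re

# CIGAR operations that consume reference bases (SAM spec): M, D, N, =, X.
# Soft clips (S), hard clips (H), insertions (I) and padding (P) do not.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")


def reference_end(pos: int, cigar: str) -> int:
    """Return the 1-based, inclusive last reference position covered by an alignment."""
    span = sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in _REF_CONSUMING)
    return pos + span - 1


def overhangs(pos: int, cigar: str, scaffold_len: int) -> bool:
    """True if the aligned (non-clipped) part of the read runs past the scaffold end."""
    return reference_end(pos, cigar) > scaffold_len


if __name__ == "__main__":
    # Hypothetical read aligned at position 860 of an 883 bp scaffold:
    # 100 aligned bases end at 959, so the alignment overhangs.
    print(overhangs(860, "100M35S", 883))  # True
```

If no alignment is flagged this way, the overhang is confined to soft-clipped bases and the out-of-range coordinates are probably coming from somewhere else.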
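When I then try to extract the transcript sequences with Cufflinks, I get the errors above, because Cufflinks takes the exon sequences from the reference genome rather than from the assembled reads, and a transcript that extends past the end of its scaffold cannot be cut out of it.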
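Before extracting sequences, the Cufflinks transcripts that run off a scaffold can be listed by comparing the exon end coordinates in the GTF with the scaffold lengths in a samtools faidx index (.fai). This is a minimal sketch, assuming Cufflinks' transcripts.gtf and the .fai of the same scaffold FASTA; the function names and the scaffold name in the example are hypothetical, and the coordinates are taken from the CUFF.18.1 error above.

```python
import re
from typing import Dict, Iterable, List, Tuple

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_fai_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map sequence name -> length from a samtools faidx (.fai) index (columns 1 and 2)."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def out_of_bounds_exons(
    gtf_lines: Iterable[str], lengths: Dict[str, int]
) -> List[Tuple[str, str, int, int]]:
    """Return (transcript_id, seqname, exon_end, seq_length) for exons ending past their sequence."""
    bad = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        seqname, end = fields[0], int(fields[4])
        seq_len = lengths.get(seqname)
        if seq_len is not None and end > seq_len:
            match = _TRANSCRIPT_ID.search(fields[8])
            tid = match.group(1) if match else "?"
            bad.append((tid, seqname, end, seq_len))
    return bad


if __name__ == "__main__":
    fai = ["scaffold_42\t472\t13\t60\t61\n"]  # hypothetical scaffold name
    gtf = ['scaffold_42\tCufflinks\texon\t13\t673\t1000\t+\t.\t'
           'gene_id "CUFF.18"; transcript_id "CUFF.18.1";\n']
    print(out_of_bounds_exons(gtf, read_fai_lengths(fai)))
    # [('CUFF.18.1', 'scaffold_42', 673, 472)]
```

On real data the two lists would come from opening the GTF and the .fai files; any transcript reported here could be trimmed or dropped before extraction.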
This raises three questions for me:
1. Why do quite a number of reads map to tiny scaffolds? I assume these scaffolds are repetitive DNA, junk, or something else that kept them from being assembled into a bigger scaffold.
2. Is there other software I could use to extract the transcripts?
3. Or should I try to exclude either all small scaffolds or all reads mapping beyond scaffold ends from my transcriptome assembly?
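For the first variant of option 3, excluding small scaffolds, one possible approach is to filter the SAM file before running Cufflinks: drop the @SQ header lines of scaffolds below a length cut-off together with every alignment on them. This is a minimal sketch, assuming a plain-text SAM file whose header precedes the alignments (as the SAM spec requires); `MIN_SCAFFOLD_LEN` and its value are hypothetical and would need tuning for this assembly. Unmapped reads (RNAME `*`) are kept.

```python
from typing import Iterable, Iterator

MIN_SCAFFOLD_LEN = 1000  # hypothetical cut-off, not a recommended value


def drop_small_scaffolds(
    sam_lines: Iterable[str], min_len: int = MIN_SCAFFOLD_LEN
) -> Iterator[str]:
    """Yield SAM lines, removing @SQ headers and alignments for scaffolds shorter than min_len."""
    short = set()
    for line in sam_lines:
        if line.startswith("@SQ"):
            # Parse tags such as SN:scaffold_1 and LN:883 into a dict.
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            if int(tags["LN"]) < min_len:
                short.add(tags["SN"])
                continue
            yield line
        elif line.startswith("@"):
            yield line  # other header lines (@HD, @PG, ...) pass through
        else:
            rname = line.split("\t", 3)[2]  # third column is RNAME
            if rname not in short:
                yield line
```

Because the cut-off is arbitrary, it may help to count how many reads each threshold removes before deciding between this and dropping only the overhanging reads.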