Hi all,
I performed a de novo assembly of 454 reads using the GS assembler (Newbler, version 2.3). The program can report alternative splicing isoforms: more than one transcript (isotig) can be assembled for each gene (isogroup). I got the summary listed below:
That would be fine if they were indeed splice variants. However, we put only one clone from each gene into each deep well for sequencing, so ideally there should be only one isotig assembled in every isogroup. I had a look at the report file (454IsotigsLayout.txt), which contains entries like this:
The only difference between isotig00022 and isotig00023 is that the former has one additional contig (00016), which is only 4 nt long! I don't think there is an exon of only 4 nucleotides...
My question is: should I trust the isoforms generated by Newbler? If yes, we must have picked more than one clone by mistake for the genes with multiple assembled transcripts. If no, what can I do to "optimize" the results? Can I pairwise align all the isotigs in one isogroup and collapse similar isotigs (like isotig00022 and isotig00023) into one?
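To make the collapsing idea concrete, here is a minimal sketch of one possible approach. Note that it does not align sequences: it compares the contig paths from the layout file and treats two isotigs as the same if they differ only by very short contigs. The data are typed in by hand from the isogroup00004 excerpt, and the function name `collapse_by_short_contigs` and the `min_len=10` cutoff are my own assumptions, not anything Newbler provides.

```python
# Contig lengths (bp) and isotig paths for isogroup00004, typed in by hand
# from the 454IsotigsLayout.txt excerpt. Each path is a tuple of
# (contig, orientation) pairs; '+' stands for >>>>> and '-' for <<<<<.
contig_len = {"00016": 4, "00110": 1329, "00133": 700,
              "00017": 1074, "00134": 16, "00156": 196}

isotigs = {
    "isotig00022": (("00016", "+"), ("00017", "+"), ("00156", "-")),
    "isotig00023": (("00017", "+"), ("00156", "-")),
    "isotig00024": (("00110", "-"), ("00156", "-")),
    "isotig00025": (("00133", "+"), ("00134", "+")),
    "isotig00026": (("00016", "+"), ("00017", "+")),
    "isotig00027": (("00110", "+"),),
}


def collapse_by_short_contigs(isotigs, contig_len, min_len=10):
    """Group isotigs whose paths are identical once contigs shorter
    than min_len (a hypothetical cutoff) are ignored."""
    groups = {}
    for name, path in isotigs.items():
        # Drop every step whose contig is shorter than the cutoff.
        key = tuple(step for step in path if contig_len[step[0]] >= min_len)
        groups.setdefault(key, []).append(name)
    return sorted(sorted(names) for names in groups.values())


for names in collapse_by_short_contigs(isotigs, contig_len):
    print(names)
```

With this cutoff, isotig00022 and isotig00023 fall into one group and the other four isotigs stay separate. Whether 10 nt is a sensible threshold is a judgment call, and a real pairwise alignment would still be needed to catch isotigs that differ inside a contig rather than by a whole contig.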
This is my first time working with next-generation sequencing data... Any suggestions are appreciated.
Thanks in advance!
Code:
NumIsotigsInIsogroup  NumIsogroups
1                     35
2                     12
3                     11
5                     1
6                     1
7                     1
12                    1
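To put the table above into totals, a small sketch that only does the arithmetic on the numbers shown (the variable names are mine):

```python
# Isotigs-per-isogroup summary copied from the table above:
# {number of isotigs in an isogroup: number of such isogroups}
summary = {1: 35, 2: 12, 3: 11, 5: 1, 6: 1, 7: 1, 12: 1}

total_isogroups = sum(summary.values())
total_isotigs = sum(n * count for n, count in summary.items())
multi_isotig = sum(count for n, count in summary.items() if n > 1)

print(f"isogroups: {total_isogroups}")
print(f"isotigs: {total_isotigs}")
print(f"isogroups with >1 isotig: {multi_isotig} "
      f"({100 * multi_isotig / total_isogroups:.1f}%)")
```

That is 62 isogroups and 122 isotigs in total, with 27 isogroups (about 44%) containing more than one isotig.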
Code:
...
>isogroup00004  numIsotigs=6  numContigs=6
Length :         4   1329    700   1074     16    196  (bp)
Contig :     00016  00110  00133  00017  00134  00156  Total:
isotig00022  >>>>>                >>>>>         <<<<<    1274
isotig00023                       >>>>>         <<<<<    1270
isotig00024         <<<<<                       <<<<<    1525
isotig00025                >>>>>         >>>>>            716
isotig00026  >>>>>                >>>>>                  1078
isotig00027         >>>>>                                1329
...
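As a cross-check on which contigs each isotig contains, here is a rough sketch that recovers the contig set from the number of arrows and the Total alone. It assumes the Total column is simply the sum of the contig lengths, which holds for this excerpt but is not something I have confirmed for Newbler in general. The helper name `contigs_matching` is made up for illustration.

```python
from itertools import combinations

# Contig names and lengths (bp) in the column order of the layout excerpt.
contigs = ["00016", "00110", "00133", "00017", "00134", "00156"]
lengths = [4, 1329, 700, 1074, 16, 196]

# (number of contigs in the isotig, Total) as listed in the layout.
rows = {
    "isotig00022": (3, 1274),
    "isotig00023": (2, 1270),
    "isotig00024": (2, 1525),
    "isotig00025": (2, 716),
    "isotig00026": (2, 1078),
    "isotig00027": (1, 1329),
}


def contigs_matching(n, total):
    """Return every set of n contigs (kept in column order) whose
    lengths sum to total. Assumes Total = sum of contig lengths."""
    return [tuple(contigs[i] for i in idx)
            for idx in combinations(range(len(contigs)), n)
            if sum(lengths[i] for i in idx) == total]


for name, (n, total) in rows.items():
    print(name, contigs_matching(n, total))
```

For this isogroup every isotig has exactly one matching contig set, and isotig00022 and isotig00023 come out as 00016 + 00017 + 00156 versus 00017 + 00156, so they differ only by the 4 nt contig.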