dovah 09-27-2016 01:45 AM

unique isoform detection using IsoSeq
Hi all,

I ran Iso-Seq on my RNA data (D. melanogaster). My aim is to see how many isoforms per gene there are in my sample. I am following the tutorial on GitHub.

The Iso-Seq pipeline has completed, and I now have output in two folders: Cluster (for consensus isoforms) and Classify (for "real" sequenced isoforms).

I would appreciate some guidance in interpreting the contents of these folders, since this is my first experience with Iso-Seq. I think the unique full-length (non-chimeric) isoforms in the Classify folder are those reported in the file isoseq_flnc.fa, and the unique full-length consensus isoforms in the Cluster folder are those reported in the file polished_high_qv_consensus_isoforms.fastq.
Is this correct?

How can I link the isoforms to their genes? Should I BLAST them?

Many thanks for the help!

Magdoll 10-28-2016 09:25 AM


Yes, you are correct in your interpretation. The best way to proceed to visualizing isoforms:

-- take the HQ isoform output (polished_high_qv_consensus_isoforms.fastq) and align it to your genome. You can use a collapse script to remove some redundancy (there will be a little):

-- to compare it with existing annotation, use matchAnnot.

there's also a lightweight script that can parse matchAnnot results for you:
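
Once each isoform has a gene assignment from the annotation comparison, counting isoforms per gene is a simple tabulation. Here is a minimal Python sketch, assuming the results have already been reduced to a two-column tab-separated table of isoform ID and gene ID. That table layout, the function name isoforms_per_gene, and the example IDs are hypothetical and are not matchAnnot's actual output format:

```python
from collections import defaultdict
import io


def isoforms_per_gene(lines):
    """Count distinct isoform IDs per gene.

    Assumes each non-empty, non-comment line is 'isoform_id<TAB>gene_id'.
    This layout is an assumption for illustration only.
    """
    genes = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        isoform_id, gene_id = line.split("\t")[:2]
        # A set ignores an isoform listed twice for the same gene.
        genes[gene_id].add(isoform_id)
    return {gene: len(isoforms) for gene, isoforms in genes.items()}


# Made-up example table; replace with open("your_table.tsv") for real data.
example = io.StringIO(
    "isoform_1\tgeneA\n"
    "isoform_2\tgeneA\n"
    "isoform_3\tgeneB\n"
)
counts = isoforms_per_gene(example)
for gene, n in sorted(counts.items()):
    print(gene, n)
```

The function accepts any iterable of lines, so an open file handle can be passed in directly.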

I don't check seqanswers often enough, so apologies for the late response.

Please consider joining the public Iso-Seq Google Group. Responses there are usually much faster. You can certainly post on both SEQanswers and the Google Group at the same time. !forum/smrt_isoseq

