I have a novel plant genome in an advanced draft stage and 8 lanes of Illumina paired-end RNA-Seq data. The RNA-Seq data is mostly from other strains, and the read lengths differ between lanes.
The goal is to get as many "true" genes/transcripts as possible. Hence my questions:
* Should I combine all filtered reads into giant 20+ GB X_1 and X_2 files and run them through tophat in one go, giving a rough --mate-inner-dist?
* Or is it better to run each lane separately with a more accurate --mate-inner-dist value, then combine and sort the alignments for cufflinks?
* What is the "good practice" for obtaining long gene models while keeping the number of artifacts low?
* Does anybody have good experience with an alternative to the bowtie-based spliced read mapper whose output can be used by cufflinks?
BTW, I am using the latest tophat (1.1.4), cufflinks, etc.
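To make the per-lane option concrete, here is a minimal sketch of what I mean by a "more accurate" value. It assumes the inner distance is the mean fragment length minus the length of both mates. The lane names, read lengths, fragment sizes, file names, and the `genome_index` name are all placeholders for illustration, not values from my data, and `-o` is assumed here to be tophat's output-directory option.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lane:
    name: str             # hypothetical lane label
    read_length: int      # bases per mate after filtering
    fragment_length: int  # estimated mean fragment size, adapters excluded


def mate_inner_dist(lane: Lane) -> int:
    """Inner distance between mates: fragment length minus both reads."""
    return lane.fragment_length - 2 * lane.read_length


def tophat_command(lane: Lane, index: str = "genome_index") -> List[str]:
    """Build one tophat invocation per lane (index and file names are assumed)."""
    return [
        "tophat",
        "--mate-inner-dist", str(mate_inner_dist(lane)),
        "-o", f"tophat_{lane.name}",  # assumed output-directory option
        index,
        f"{lane.name}_1.fastq",
        f"{lane.name}_2.fastq",
    ]


# Placeholder lanes, only to show how the value changes with read length.
lanes = [Lane("lane1", 36, 200), Lane("lane2", 76, 300)]
for lane in lanes:
    print(" ".join(tophat_command(lane)))
```

With the combined-file approach, a single rough value would have to stand in for all of these per-lane numbers.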