I read in a number of papers that STAR 2-pass (making use of an initial round of alignments to define splice junctions) improves the accuracy of alignment, but I can't find a simple recipe for running STAR in this mode.
From reading the manual it sounds like I have to run STAR with --runMode alignReads, take the SJ.out.tab file that run produces, redo the genomeGenerate step (?!) with that file passed as the --sjdbFileChrStartEnd parameter, then rerun the alignment against that new genome.
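To make the steps above concrete, here is a rough Python sketch of the three commands as I understand them from the manual. It only builds and prints the command lines, it does not run STAR. The function name and all file paths (sample1.fastq, star_index, genome.fa, star_out) are placeholders I made up, and I'm assuming the default STAR behaviour of writing the junctions to <prefix>SJ.out.tab.

```python
from pathlib import Path


def two_pass_commands(fastq, genome_dir, genome_fasta, out_dir):
    """Build the STAR command lines for the manual two-pass procedure.

    Returns a list of three argument lists:
    first-pass alignment, genome regeneration, second-pass alignment.
    """
    out = Path(out_dir)
    pass1_prefix = str(out / "pass1_")
    pass2_prefix = str(out / "pass2_")
    # STAR appends "SJ.out.tab" to whatever --outFileNamePrefix is given.
    sj_tab = pass1_prefix + "SJ.out.tab"
    # Hypothetical directory for the regenerated index.
    new_genome = str(out / "genome_pass2")

    first_pass = [
        "STAR", "--runMode", "alignReads",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", pass1_prefix,
    ]
    regenerate = [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", new_genome,
        "--genomeFastaFiles", genome_fasta,
        # Junctions discovered in the first pass.
        "--sjdbFileChrStartEnd", sj_tab,
    ]
    second_pass = [
        "STAR", "--runMode", "alignReads",
        "--genomeDir", new_genome,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", pass2_prefix,
    ]
    return [first_pass, regenerate, second_pass]


if __name__ == "__main__":
    # Placeholder inputs, for illustration only.
    for cmd in two_pass_commands("sample1.fastq", "star_index",
                                 "genome.fa", "star_out"):
        print(" ".join(cmd))
```

Presumably the GTF would also have to be passed again (--sjdbGTFfile) in the genomeGenerate step to keep the annotated junctions, but I left that out of the sketch because I'm not sure.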
I have already built the genome with Ensembl GTF annotations and since I have ~100 samples to align (all Human), I'm curious:
1) Does the 2-pass approach really give much improvement when I have already built the genome with annotations?
2) In this scenario do people really align, regenerate the genome, and realign 100 times? Or is it sufficient to align the first sample, regenerate the genome once, and then align all the other samples once against that genome?
What's best practice basically?