I am working on a plant whose genome has been sequenced, but the quality of the gene annotation is not as good as I expected. I am trying to use STAR and Cufflinks to annotate novel transcripts from paired-end RNA-seq data. With the default settings I got many alternatively spliced transcripts, some of which don't look real. Therefore, for each novel splice junction I would like to apply a filter: at least ten reads (uniquely mapped or multi-mapped) must support the junction. I don't know how to do that.

What I am currently doing is 2-pass mapping with STAR. After the 1st pass, I remove the unsupported junctions (those with fewer than 10 reads) from *SJ.out.tab and use this modified *SJ.out.tab as "annotated" junctions (the --sjdbFileChrStartEnd value) for the 2nd pass. Will this step improve the mapping and give a more reliable list of novel transcripts?
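For the filtering step itself, here is a minimal sketch, assuming the standard SJ.out.tab column layout described in the STAR manual (column 6 = annotated flag, column 7 = uniquely mapped reads crossing the junction, column 8 = multi-mapping reads crossing the junction). It keeps a junction when unique + multi-mapped reads reach the threshold. The script name, the `filter_sj_lines` function and the `keep_annotated` option (which leaves junctions already in the 1st-pass annotation untouched) are my own hypothetical choices, not part of STAR.

```python
import sys
from typing import Iterable, List

# SJ.out.tab columns (1-based, per the STAR manual):
#  1 chromosome, 2 intron start, 3 intron end, 4 strand, 5 intron motif,
#  6 annotated (0/1), 7 unique reads, 8 multi-mapping reads, 9 max overhang
ANNOTATED_COL = 5  # 0-based index of column 6
UNIQUE_COL = 6     # 0-based index of column 7
MULTI_COL = 7      # 0-based index of column 8


def filter_sj_lines(lines: Iterable[str], min_reads: int = 10,
                    keep_annotated: bool = True) -> List[str]:
    """Keep junctions with unique + multi-mapped support >= min_reads.

    If keep_annotated is True, junctions flagged as annotated (column 6 == 1)
    are kept regardless of read support (hypothetical convenience option).
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        annotated = fields[ANNOTATED_COL] == "1"
        support = int(fields[UNIQUE_COL]) + int(fields[MULTI_COL])
        if (annotated and keep_annotated) or support >= min_reads:
            kept.append(line.rstrip("\n") + "\n")
    return kept


def main(argv: List[str]) -> None:
    # Hypothetical usage: python filter_sj.py in.SJ.out.tab out.SJ.out.tab
    if len(argv) != 3:
        sys.exit("usage: filter_sj.py IN.SJ.out.tab OUT.SJ.out.tab")
    with open(argv[1]) as src:
        kept = filter_sj_lines(src)
    with open(argv[2], "w") as dst:
        dst.writelines(kept)


if __name__ == "__main__":
    main(sys.argv)
```

The output keeps the SJ.out.tab format unchanged (only rows are dropped), so the filtered file can be passed to --sjdbFileChrStartEnd in the same way as the original one.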
Thanks!