04-16-2009, 02:26 PM   #6
Location: pasadena, ca

Hi Warren,

ERANGE currently assumes a fixed read length and doesn't handle mixed read lengths well. It's a topic that's been on my mind a lot lately, and I plan on fixing it.

Some of these problems relate directly to read length.

When reads are shorter than 32 bp, so few reads cross splices that we need explicit splice junctions in order to recover the known splices (notice that the TopHat developers are very happy to recover 80% of the known splices that ERANGE sees), and de novo splice discovery has a very high false-positive rate.

As reads get into the 40-75 bp range, you can map novel splices with reasonable confidence.

But as reads get longer, an increasing fraction of reads cross more than one splice. One of the upcoming versions of ERANGE will deal with that.
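
To get a feel for how read length changes the picture, here is a toy calculation (not anything ERANGE does internally). It places a read at every possible start along a made-up spliced transcript and counts how many exon-exon junctions each placement spans. The exon lengths, the `min_anchor` parameter and the function name are all assumptions chosen for illustration.

```python
from bisect import bisect_left, bisect_right


def junction_crossing_fractions(exon_lengths, read_length, min_anchor=1):
    """Return {junctions_crossed: fraction of read placements}.

    Toy model: reads start uniformly at every position of a spliced
    transcript built from `exon_lengths` (hypothetical values). A junction
    counts as crossed only if at least `min_anchor` bases of the read lie
    on each side of it.
    """
    # Junction positions in transcript coordinates (boundary before base j).
    junctions = []
    pos = 0
    for length in exon_lengths[:-1]:
        pos += length
        junctions.append(pos)

    n_placements = sum(exon_lengths) - read_length + 1
    if n_placements <= 0:
        raise ValueError("read is longer than the transcript")

    counts = {}
    for start in range(n_placements):
        end = start + read_length  # exclusive end of the read
        lo = bisect_left(junctions, start + min_anchor)
        hi = bisect_right(junctions, end - min_anchor)
        crossed = max(hi - lo, 0)
        counts[crossed] = counts.get(crossed, 0) + 1

    return {k: v / n_placements for k, v in sorted(counts.items())}


if __name__ == "__main__":
    toy_exons = [100, 30, 100]  # hypothetical exon lengths in bp
    for read_length in (25, 50, 75):
        print(read_length, junction_crossing_fractions(toy_exons, read_length))
```

In this toy transcript a 25 bp read can never span both junctions, because the middle exon alone is 30 bp, while longer reads start to cross two junctions at once. Real transcripts and real anchor requirements will of course give different numbers.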

If your reads are long enough, then ERANGE now supports a splice-junction-free way of mapping splices (which is described in the ERANGE help file). Essentially, just map the reads on the regular genome with bowtie, explicitly saving the unmapped reads in a separate file. Then map those reads with blat (yes, it's slow) and only import those that map well onto the genome and that have an intron.
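
As a rough sketch of the last step, the filter below reads blat's PSL output and keeps only alignments that cover most of the read and contain at least one intron-sized gap on the genome. This is not ERANGE's own import code. The function names and the thresholds (`min_match_fraction`, `min_intron`, `max_intron`) are assumptions for illustration, and reads with several competing blat hits are not resolved here.

```python
def parse_psl_line(line):
    """Parse one PSL alignment line; return None for header or blank lines."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21 or not f[0].isdigit():
        return None
    return {
        "matches": int(f[0]),
        "rep_matches": int(f[2]),
        "q_name": f[9],
        "q_size": int(f[10]),
        "t_name": f[13],
        # blockSizes and tStarts are comma-separated with a trailing comma.
        "block_sizes": [int(x) for x in f[18].split(",") if x],
        "t_starts": [int(x) for x in f[20].split(",") if x],
    }


def target_gaps(aln):
    """Genomic gap lengths between consecutive aligned blocks."""
    sizes, starts = aln["block_sizes"], aln["t_starts"]
    return [starts[i + 1] - (starts[i] + sizes[i]) for i in range(len(sizes) - 1)]


def keep_alignment(aln, min_match_fraction=0.95, min_intron=50, max_intron=500000):
    """True if the read maps well and has at least one intron-sized gap.

    All three thresholds are illustrative assumptions, not ERANGE defaults.
    """
    match_fraction = (aln["matches"] + aln["rep_matches"]) / aln["q_size"]
    if match_fraction < min_match_fraction:
        return False
    return any(min_intron <= gap <= max_intron for gap in target_gaps(aln))


def spliced_alignments(lines, **thresholds):
    """Yield parsed alignments from PSL lines that pass keep_alignment."""
    for line in lines:
        aln = parse_psl_line(line)
        if aln is not None and keep_alignment(aln, **thresholds):
            yield aln
```

You would feed this the PSL file that blat writes for the unmapped reads (bowtie can save those with its `--un` option), for example `spliced_alignments(open("unmapped.psl"))`, where the file name is a placeholder.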

alim