Old 11-07-2012, 07:25 AM   #2
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224

Be very careful with all of this.

There can be a lot of reasons for seeing reads outside of annotated genes that have nothing to do with real biology. They might be artifacts of your library prep. Even if they are real, you have no guarantee that your coverage is deep enough to accurately determine an FPKM value. If you don't have enough coverage to determine the length of the transcribed region, then the K part of FPKM (the per-kilobase length normalization) could lead to biased expression values.

If you just extend GTF regions by an arbitrary number of bases not informed by the biology of your system, you will create a lot of problems. Every gene gets extended by the same absolute amount, but that amount is a much larger fraction of a short gene's length than of a long gene's. If the extension picks up few or no additional reads, this will underestimate the expression of short genes to a greater degree than that of longer genes. Plus, what are you doing to ensure that your extensions don't create unwanted overlaps with other annotated genes?
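
To put rough numbers on the length effect, here is a minimal Python sketch. It assumes the standard FPKM definition (fragments per kilobase of feature length per million mapped fragments) and the worst case where the extension adds no reads at all. The function names, gene lengths, and the 1000 bp extension are made up for illustration and are not taken from any particular tool or dataset.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """FPKM = fragments / (length in kb) / (mapped fragments in millions)."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def relative_drop(length_bp, extension_bp):
    """Fraction by which FPKM falls when the feature length grows by
    extension_bp but the fragment count stays the same (assumption:
    the extended region contains no reads)."""
    return extension_bp / (length_bp + extension_bp)


def extension_overlaps(gene_end, next_gene_start, extension_bp):
    """True if extending a gene past gene_end reaches the next annotated
    gene on the same chromosome (1-based inclusive coordinates, as in GTF)."""
    return gene_end + extension_bp >= next_gene_start


if __name__ == "__main__":
    extension = 1000  # hypothetical fixed extension in bp
    for name, length in [("short gene", 500), ("long gene", 10000)]:
        drop = relative_drop(length, extension)
        print(f"{name} ({length} bp): FPKM drops by {drop:.1%}")

    # Hypothetical neighbouring genes 600 bp apart: a 1000 bp extension collides.
    print(extension_overlaps(gene_end=5000, next_gene_start=5600,
                             extension_bp=extension))
```

With these made-up lengths, the same extension cuts the short gene's FPKM by about two thirds but the long gene's by less than a tenth. Real data will sit somewhere in between, depending on how many reads actually fall in the extended region, which is exactly why the extension size needs a biological justification.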

Make sure you have a good reason for looking at regions outside of annotated coding regions before you start modifying those annotations.