11-30-2011, 09:45 AM  #1
Member
Location: Detroit, MI, USA Join Date: Nov 2011
Posts: 10

Problem with agreement between cow annotation/build is screwing up my cufflinks run
I mapped my RNA-Seq data using TopHat, with the build (.ebwt) and annotation (.gtf) of the cow (Bos taurus) genome that are offered on the Bowtie website.

It seems that the build and the annotation differ somewhat, and this is affecting the transition from TopHat to Cufflinks. I get an error message when I try to run Cufflinks:

"Error: sort order of reads in BAMs must be the same"

I tried sorting, to no avail, and also tried reordering the .gtf file to make sure the chromosomes were in sync. However, there are some unmapped regions that are included in the Bowtie build but are not in the annotation. Here is a portion of the header of my BAM, which shows a bunch of these regions:

@SQ SN:7180002026074 LN:2973
@SQ SN:7180002026116 LN:16115
@SQ SN:7180002026122 LN:6940
@SQ SN:7180002026132 LN:11037
@SQ SN:7180002026137 LN:6826
@SQ SN:7180002026154 LN:29991
@SQ SN:7180002026155 LN:14744
@SQ SN:7180002026160 LN:14503
@SQ SN:7180002026163 LN:8520
@SQ SN:7180002026164 LN:4564
@SQ SN:7180002026165 LN:12396
@SQ SN:7180002026166 LN:2711
@SQ SN:7180002026167 LN:7838
@SQ SN:7180002026168 LN:2523
@SQ SN:7180002026169 LN:3411
@SQ SN:7180002026170 LN:33473
@SQ SN:7180002026174 LN:6660
@SQ SN:7180002026177 LN:6771
@SQ SN:7180002026182 LN:5593
@SQ SN:7180002026183 LN:5759
@SQ SN:7180002026184 LN:4565
@SQ SN:7180002026194 LN:9952
@SQ SN:7180002026199 LN:6586
@SQ SN:7180002026202 LN:8074
@SQ SN:7180002026209 LN:7446
@SQ SN:7180002026216 LN:25672
@SQ SN:7180002026232 LN:37008
@SQ SN:7180002026250 LN:55516
@SQ SN:7180002026254 LN:11038
@SQ SN:7180002026256 LN:5833
@SQ SN:7180002026264 LN:25023
@SQ SN:7180002026278 LN:7842
@SQ SN:Chr1 LN:158337067
@SQ SN:Chr10 LN:104305016
@SQ SN:Chr11 LN:107310763
@SQ SN:Chr12 LN:91163125
@SQ SN:Chr13 LN:84240350
@SQ SN:Chr14 LN:84648390
@SQ SN:Chr15 LN:85296676
@SQ SN:Chr16 LN:81724687
@SQ SN:Chr17 LN:75158596
@SQ SN:Chr18 LN:66004023
@SQ SN:Chr19 LN:64057457
@SQ SN:Chr2 LN:137060424
@SQ SN:Chr20 LN:72042655
@SQ SN:Chr21 LN:71599096
@SQ SN:Chr22 LN:61435874
@SQ SN:Chr23 LN:52530062
@SQ SN:Chr24 LN:62714930
@SQ SN:Chr25 LN:42904170
@SQ SN:Chr26 LN:51681464
@SQ SN:Chr27 LN:45407902
@SQ SN:Chr28 LN:46312546
@SQ SN:Chr29 LN:51505224
@SQ SN:Chr3 LN:121430405
@SQ SN:Chr4 LN:120829699
@SQ SN:Chr5 LN:121191424
@SQ SN:Chr6 LN:119458736
@SQ SN:Chr7 LN:112638659
@SQ SN:Chr8 LN:113384836
@SQ SN:Chr9 LN:105708250
@SQ SN:ChrX LN:148823899
@PG ID:TopHat VN:1.3.3 CL:./tophat --num-threads 6 --mate-inner-dist 191 --mate-std-dev 51 --solexa1.3-quals -o tophat_out_Bos2 --keep-tmp -G /Users/Papio/Annotations/Bos_taurus.UMD3.1.64.gtf /Users/Papio/bowtie0.12.7/indexes/b_taurus Cow.fq_1 Cow.fq_2

It seems the best course would be to remove these regions so that the build and annotation are in sync. So my question is: how do you remove these regions using samtools? I've tried to figure it out, but I'm still unclear.

Thank you
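
For anyone landing here with the same question, one possible way to drop those regions is to filter the alignments as SAM text. The sketch below is only an illustration, not a samtools feature: the function names `keep_reference` and `filter_sam` are made up here, and it assumes that every sequence worth keeping has a name starting with "Chr", as in the header above. It removes the matching @SQ header lines, the reads aligned to those sequences, and reads whose mate sits on one of them. Unmapped reads (reference name "*") are dropped as well.

```python
def keep_reference(name, prefix="Chr"):
    """Return True for sequences to keep (assumption: names starting with 'Chr')."""
    return name.startswith(prefix)


def filter_sam(lines, keep=keep_reference):
    """Yield SAM lines, dropping @SQ entries and alignments on unwanted sequences."""
    for line in lines:
        if line.startswith("@"):
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "@SQ":
                # The SN: tag of an @SQ line carries the sequence name.
                name = next(f[3:] for f in fields[1:] if f.startswith("SN:"))
                if not keep(name):
                    continue
            yield line
            continue
        fields = line.split("\t")
        rname, rnext = fields[2], fields[6]  # SAM columns 3 (RNAME) and 7 (RNEXT)
        if not keep(rname):
            continue
        # Also drop reads whose mate lies on a removed sequence, so that no
        # record points at a reference that is missing from the new header.
        if rnext not in ("=", "*") and not keep(rnext):
            continue
        yield line
```

To run it on a real file, the generator could be wrapped as `sys.stdout.writelines(filter_sam(sys.stdin))` in a small script and placed between `samtools view -h in.bam` and `samtools view -bS - > out.bam` (file names are placeholders). Whether this alone clears the Cufflinks sort-order error is not something this thread confirms.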
12-01-2011, 07:06 PM  #2
Member
Location: Detroit, MI, USA Join Date: Nov 2011
Posts: 10

Never mind, problem solved. I built the genome from Ensembl and used the annotation from there.
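
Taking the index and the annotation from one source avoids the mismatch in the first place. A quick way to check agreement before running Cufflinks is to compare the sequence names in the BAM header with those in the first column of the GTF. The sketch below is a rough illustration with made-up function names (`header_sequences`, `gtf_sequences`, `compare_names`). It assumes tab-separated header lines, such as those printed by `samtools view -H`, and a standard tab-separated GTF. Ensembl GTFs generally name chromosomes `1`, `2`, ... without a prefix, so a `Chr1` versus `1` difference is one of the things such a check would surface.

```python
def header_sequences(sam_header_lines):
    """Collect sequence names from the SN: tags of @SQ header lines, in order."""
    names = []
    for line in sam_header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ":
            names.extend(f[3:] for f in fields[1:] if f.startswith("SN:"))
    return names


def gtf_sequences(gtf_lines):
    """Collect the distinct seqname values (GTF column 1), in file order."""
    names, seen = [], set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        name = line.split("\t", 1)[0]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


def compare_names(sam_header_lines, gtf_lines):
    """Return (names only in the BAM header, names only in the GTF)."""
    in_bam = header_sequences(sam_header_lines)
    in_gtf = gtf_sequences(gtf_lines)
    bam_set, gtf_set = set(in_bam), set(in_gtf)
    only_bam = [n for n in in_bam if n not in gtf_set]
    only_gtf = [n for n in in_gtf if n not in bam_set]
    return only_bam, only_gtf
```

If both returned lists are empty, the two files at least agree on which sequences exist. The order of `header_sequences(...)` and `gtf_sequences(...)` can then be compared directly if sort order is the concern.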

