Problem with agreement between cow annotation/build is screwing up my cufflinks run

mmcgo002


11-30-2011, 10:45 AM

I mapped my RNA-Seq data with TopHat, using the build (.ebwt) and annotation (.gtf) of the cow (Bos taurus) genome offered on the Bowtie website.

It seems that the genome build and the annotation differ somewhat, and this is affecting the transition from TopHat to Cufflinks.

I am getting an error message when I try to run cufflinks: "Error: sort order of reads in BAMs must be the same"

I tried sorting the BAM, to no avail, and also tried reordering the .gtf file so that its chromosomes were in the same order.

However, the Bowtie build includes some unplaced regions (the numeric contig names below) that are not in the annotation.

Here is a portion of the header of my bam which shows a bunch of these regions:

@SQ SN:7180002026074 LN:2973
@SQ SN:7180002026116 LN:16115
@SQ SN:7180002026122 LN:6940
@SQ SN:7180002026132 LN:11037
@SQ SN:7180002026137 LN:6826
@SQ SN:7180002026154 LN:29991
@SQ SN:7180002026155 LN:14744
@SQ SN:7180002026160 LN:14503
@SQ SN:7180002026163 LN:8520
@SQ SN:7180002026164 LN:4564
@SQ SN:7180002026165 LN:12396
@SQ SN:7180002026166 LN:2711
@SQ SN:7180002026167 LN:7838
@SQ SN:7180002026168 LN:2523
@SQ SN:7180002026169 LN:3411
@SQ SN:7180002026170 LN:33473
@SQ SN:7180002026174 LN:6660
@SQ SN:7180002026177 LN:6771
@SQ SN:7180002026182 LN:5593
@SQ SN:7180002026183 LN:5759
@SQ SN:7180002026184 LN:4565
@SQ SN:7180002026194 LN:9952
@SQ SN:7180002026199 LN:6586
@SQ SN:7180002026202 LN:8074
@SQ SN:7180002026209 LN:7446
@SQ SN:7180002026216 LN:25672
@SQ SN:7180002026232 LN:37008
@SQ SN:7180002026250 LN:55516
@SQ SN:7180002026254 LN:11038
@SQ SN:7180002026256 LN:5833
@SQ SN:7180002026264 LN:25023
@SQ SN:7180002026278 LN:7842
@SQ SN:Chr1 LN:158337067
@SQ SN:Chr10 LN:104305016
@SQ SN:Chr11 LN:107310763
@SQ SN:Chr12 LN:91163125
@SQ SN:Chr13 LN:84240350
@SQ SN:Chr14 LN:84648390
@SQ SN:Chr15 LN:85296676
@SQ SN:Chr16 LN:81724687
@SQ SN:Chr17 LN:75158596
@SQ SN:Chr18 LN:66004023
@SQ SN:Chr19 LN:64057457
@SQ SN:Chr2 LN:137060424
@SQ SN:Chr20 LN:72042655
@SQ SN:Chr21 LN:71599096
@SQ SN:Chr22 LN:61435874
@SQ SN:Chr23 LN:52530062
@SQ SN:Chr24 LN:62714930
@SQ SN:Chr25 LN:42904170
@SQ SN:Chr26 LN:51681464
@SQ SN:Chr27 LN:45407902
@SQ SN:Chr28 LN:46312546
@SQ SN:Chr29 LN:51505224
@SQ SN:Chr3 LN:121430405
@SQ SN:Chr4 LN:120829699
@SQ SN:Chr5 LN:121191424
@SQ SN:Chr6 LN:119458736
@SQ SN:Chr7 LN:112638659
@SQ SN:Chr8 LN:113384836
@SQ SN:Chr9 LN:105708250
@SQ SN:ChrX LN:148823899
@PG ID:TopHat VN:1.3.3 CL:./tophat --num-threads 6 --mate-inner-dist 191 --mate-std-dev 51 --solexa1.3-quals -o tophat_out_Bos2 --keep-tmp -G /Users/Papio/Annotations/Bos_taurus.UMD3.1.64.gtf /Users/Papio/bowtie-0.12.7/indexes/b_taurus Cow.fq_1 Cow.fq_2
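
One way to see exactly which sequence names disagree is to compare the `@SQ` names in the BAM header with the first column of the GTF. The sketch below is only an illustration under stated assumptions: the function names are made up, the header lines are assumed to come from `samtools view -H`, and the GTF is assumed to be tab-delimited with the sequence name in column 1. Naming differences (for example a `Chr` prefix present in one file but not the other) would show up in the same report as missing contigs.

```python
def sam_header_names(header_lines):
    """Collect reference names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Fields are whitespace-separated tags such as SN:Chr1 and LN:158337067.
        for field in line.split()[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect sequence names from column 1 of a tab-delimited GTF."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(header_lines, gtf_lines):
    """Return (names only in the BAM header, names only in the GTF)."""
    bam = sam_header_names(header_lines)
    gtf = gtf_seqnames(gtf_lines)
    return sorted(bam - gtf), sorted(gtf - bam)


# Two header lines from the post plus one hypothetical GTF record.
header = ["@SQ SN:7180002026074 LN:2973", "@SQ SN:Chr1 LN:158337067"]
gtf = ['Chr1\tprotein_coding\texon\t1\t100\t.\t+\t.\tgene_id "g1";']
only_in_bam, only_in_gtf = compare_names(header, gtf)
print("only in BAM header:", only_in_bam)
print("only in GTF:", only_in_gtf)
```

Anything listed under "only in BAM header" is a candidate for removal or renaming before running Cufflinks.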

It seems the best course would be to remove these regions so that the build and the annotation are in sync. So my question is: how do you remove these regions using samtools? I've tried to figure it out, but I'm still unclear.
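
A minimal sketch of one way to do this, assuming the alignments are first dumped to SAM text with `samtools view -h` and that the sequences to keep all start with `Chr`, as in the header above. The function name `keep_chromosomes` and the `prefix` parameter are hypothetical. It drops the `@SQ` lines for other sequences and any read that maps, or whose mate maps, to one of them. It does not update the flags of a kept read whose mate was dropped, and it also discards unmapped reads (RNAME `*`).

```python
def keep_chromosomes(lines, prefix="Chr"):
    """Yield SAM lines restricted to references whose name starts with `prefix`."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@SQ"):
            # Keep the @SQ line only if its SN: name has the wanted prefix.
            fields = line.split("\t")
            name = next((f[3:] for f in fields[1:] if f.startswith("SN:")), "")
            if name.startswith(prefix):
                yield line
        elif line.startswith("@"):
            yield line  # other header lines (@HD, @PG, ...) pass through
        else:
            cols = line.split("\t")
            rname, rnext = cols[2], cols[6]  # read reference, mate reference
            mate_ok = rnext in ("=", "*") or rnext.startswith(prefix)
            if rname.startswith(prefix) and mate_ok:
                yield line


# Tiny hand-made example; the read records are invented for illustration.
sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:7180002026074\tLN:2973",
    "@SQ\tSN:Chr1\tLN:158337067",
    "r1\t0\tChr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\t7180002026074\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t99\tChr1\t200\t255\t4M\t7180002026074\t10\t0\tACGT\tIIII",
]
filtered = list(keep_chromosomes(sam))
for row in filtered:
    print(row)
```

The filtered text could then be converted back to BAM with `samtools view -bS` and re-sorted. Whether this alone resolves the Cufflinks error is not certain, since the sequence names in the annotation would still need to match those left in the header.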

Thank you
Tags: annotation, cow, cufflinks, rnaseq, tophat
mmcgo002


12-01-2011, 08:06 PM

Never mind, problem solved. I built the genome from the Ensembl sequence and used the annotation from Ensembl as well, so the two now agree.