  • Problem with agreement between cow annotation/build is screwing up my cufflinks run

    I mapped my RNA-Seq data using TopHat, along with the build (.ebwt) and annotation (.gtf) of the cow (Bos taurus) genome that is offered on the Bowtie website.

    It seems that the build and the annotation differ somewhat, and this issue is affecting the transition from TopHat to Cufflinks.

    I am getting an error message when I try to run cufflinks: "Error: sort order of reads in BAMs must be the same"

    I tried sorting to no avail, and also tried to organize the .gtf file to make sure the chromosomes were in sync.

    However, there are some unmapped regions that are included in the bowtie build but are not in the annotation.

    Here is a portion of the header of my bam which shows a bunch of these regions:

    @SQ SN:7180002026074 LN:2973
    @SQ SN:7180002026116 LN:16115
    @SQ SN:7180002026122 LN:6940
    @SQ SN:7180002026132 LN:11037
    @SQ SN:7180002026137 LN:6826
    @SQ SN:7180002026154 LN:29991
    @SQ SN:7180002026155 LN:14744
    @SQ SN:7180002026160 LN:14503
    @SQ SN:7180002026163 LN:8520
    @SQ SN:7180002026164 LN:4564
    @SQ SN:7180002026165 LN:12396
    @SQ SN:7180002026166 LN:2711
    @SQ SN:7180002026167 LN:7838
    @SQ SN:7180002026168 LN:2523
    @SQ SN:7180002026169 LN:3411
    @SQ SN:7180002026170 LN:33473
    @SQ SN:7180002026174 LN:6660
    @SQ SN:7180002026177 LN:6771
    @SQ SN:7180002026182 LN:5593
    @SQ SN:7180002026183 LN:5759
    @SQ SN:7180002026184 LN:4565
    @SQ SN:7180002026194 LN:9952
    @SQ SN:7180002026199 LN:6586
    @SQ SN:7180002026202 LN:8074
    @SQ SN:7180002026209 LN:7446
    @SQ SN:7180002026216 LN:25672
    @SQ SN:7180002026232 LN:37008
    @SQ SN:7180002026250 LN:55516
    @SQ SN:7180002026254 LN:11038
    @SQ SN:7180002026256 LN:5833
    @SQ SN:7180002026264 LN:25023
    @SQ SN:7180002026278 LN:7842
    @SQ SN:Chr1 LN:158337067
    @SQ SN:Chr10 LN:104305016
    @SQ SN:Chr11 LN:107310763
    @SQ SN:Chr12 LN:91163125
    @SQ SN:Chr13 LN:84240350
    @SQ SN:Chr14 LN:84648390
    @SQ SN:Chr15 LN:85296676
    @SQ SN:Chr16 LN:81724687
    @SQ SN:Chr17 LN:75158596
    @SQ SN:Chr18 LN:66004023
    @SQ SN:Chr19 LN:64057457
    @SQ SN:Chr2 LN:137060424
    @SQ SN:Chr20 LN:72042655
    @SQ SN:Chr21 LN:71599096
    @SQ SN:Chr22 LN:61435874
    @SQ SN:Chr23 LN:52530062
    @SQ SN:Chr24 LN:62714930
    @SQ SN:Chr25 LN:42904170
    @SQ SN:Chr26 LN:51681464
    @SQ SN:Chr27 LN:45407902
    @SQ SN:Chr28 LN:46312546
    @SQ SN:Chr29 LN:51505224
    @SQ SN:Chr3 LN:121430405
    @SQ SN:Chr4 LN:120829699
    @SQ SN:Chr5 LN:121191424
    @SQ SN:Chr6 LN:119458736
    @SQ SN:Chr7 LN:112638659
    @SQ SN:Chr8 LN:113384836
    @SQ SN:Chr9 LN:105708250
    @SQ SN:ChrX LN:148823899
    @PG ID:TopHat VN:1.3.3 CL:./tophat --num-threads 6 --mate-inner-dist 191 --mate-std-dev 51 --solexa1.3-quals -o tophat_out_Bos2 --keep-tmp -G /Users/Papio/Annotations/Bos_taurus.UMD3.1.64.gtf /Users/Papio/bowtie-0.12.7/indexes/b_taurus Cow.fq_1 Cow.fq_2
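
    One way to see exactly where the build and the annotation disagree is to compare the reference names on the @SQ lines with the names in the first column of the .gtf. The following is a minimal sketch, not something from the original post: the function names are hypothetical and the one-line GTF record is made up for illustration.

```python
def sq_names(sam_header_text):
    """Return the reference names from the @SQ lines, in header order."""
    names = []
    for line in sam_header_text.splitlines():
        fields = line.split()  # real headers are tab-separated; split() also accepts spaces
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def gtf_seqnames(gtf_text):
    """Return the set of sequence names in column 1 of a GTF, skipping comment lines."""
    names = set()
    for line in gtf_text.splitlines():
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_annotation(sam_header_text, gtf_text):
    """List header references that never appear in the annotation."""
    annotated = gtf_seqnames(gtf_text)
    return [name for name in sq_names(sam_header_text) if name not in annotated]


# Two @SQ lines copied from the header above, plus one made-up GTF record.
header = "@SQ SN:7180002026074 LN:2973\n@SQ SN:Chr1 LN:158337067\n"
gtf = 'Chr1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n'

print(missing_from_annotation(header, gtf))
```

    Note that the comparison is an exact string match, so a name such as Chr1 would also be reported if the annotation happens to call the same chromosome 1.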

    It seems the best course would be to remove these regions so that the build and annotation are in sync. So my question is: how do you remove these regions using samtools? I've tried to figure it out, but I'm still unclear.

    Thank you

  • #2
    Never mind--problem solved--I built the genome from Ensembl and used the annotation from there.
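
    For anyone who still wants to drop the extra regions rather than rebuild the index, here is a rough sketch of the filtering idea on SAM text (for example, the output of samtools view -h). It is not the fix used in this thread, and whether it clears the Cufflinks error is untested. The function name filter_sam and the sample records are hypothetical.

```python
def filter_sam(lines, keep):
    """Yield SAM lines restricted to the reference names in `keep`.

    @SQ header lines for other references are dropped. An alignment is kept
    only if its own reference (RNAME) is in `keep` and its mate's reference
    (RNEXT) is '=', '*', or also in `keep`. Reads with RNAME '*' are dropped.
    """
    keep = set(keep)
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                fields = line.split()
                name = next((f[3:] for f in fields[1:] if f.startswith("SN:")), None)
                if name not in keep:
                    continue
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        rname, rnext = fields[2], fields[6]
        if rname not in keep:
            continue
        if rnext not in ("=", "*") and rnext not in keep:
            continue
        yield line


# Made-up records: one read on Chr1, one on a scaffold absent from the annotation.
sam = [
    "@SQ\tSN:7180002026074\tLN:2973\n",
    "@SQ\tSN:Chr1\tLN:158337067\n",
    "r1\t0\tChr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\t7180002026074\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]

for kept in filter_sam(sam, {"Chr1"}):
    print(kept, end="")
```

    Dropping a read whenever its mate sits on a removed reference keeps pairs together, at the cost of losing some reads that did map to an annotated chromosome.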
