  • Cufflinks error - nonsense gene merge

    Hello All,
    I am new to bioinformatics and am trying to use Cufflinks for the first time. My input to Cufflinks is an accepted_hits.sam file with ~19 million reads, generated (apparently without error) by TopHat. When I run Cufflinks (cufflinks -o Results accepted_hits.sam), I first get a "Counting hits in map" message and then "Error:nonsense gene merge. Exiting". By iteratively truncating my input file, I found that Cufflinks apparently does not like one line (around the 4 millionth) in the input file.

    My accepted_hits.sam file around the error point looks like this (below). Nine reads are shown; Cufflinks seems to produce the error at the 6th read shown. I tried removing just this one line from accepted_hits.sam, but I still get the same error (perhaps from some later line). I also tried Cufflinks with other output files from TopHat and consistently get the same error. (Cufflinks does run fine with the supplied test file.)

    Thanks for any help with this

    HWI-EAS288_8_2_20_941_1818_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111450969 255 64M * 0 0 ATAGCATCTTCCCAGCTTCCATCTCCCTACAGTCCATCNTATTCAAGTCTTTAGCTATTTTGGA B@BBBB@BBBBABABA@@@A@@AA@??B;=7:=@?>A;%;>AB@?6?:?==?=@?@>?=@?>>8 NM:i:2
    HWI-EAS288_8_2_117_405_131_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111450982 255 42M * 0 0 AGCTTCCATCTCCCTACAGTCCATCATATTCAAGTCTTTAGC <:A>9=,/8;297=;=;1208778=:2-2650'462-3586? NM:i:0
    HWI-EAS288_8_1_9_672_1871_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111530035 255 62M * 0 0 GGGAAACATGGTGAAACCCTGTTTCTACTAAAAATACAAAAATTAGCCAGCTGTGGTGGCAA 6CCBBCCCCCB>BBBCCBAC@BCCCBBBCB@A>BCBBB@BACBAB<ABBB>B<>@??;%8@; NM:i:1
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M17834N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M17840N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M5744N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_56_390_1555_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111643298 255 42M * 0 0 CAGCAAACCACCATGGCCCACATTTACCTATGTAACAAATCA BCBCCCCCCCBBCA;>C73-?CCBC@CBBBBCACBCCACBCC NM:i:1
    HWI-EAS288_8_2_50_1601_1261_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111809773 255 53M * 0 0 CTATTCTATACCATTCCATTCCATTCCATTCCATTCCATTCCATGCCATTCCA BCBBCCBAC:CCCCBBACCBCBBBBBB@BBBBBB?BAAB@BA6>ABBAAABB1 NM:i:2
    HWI-EAS288_8_1_100_1688_1517_0 16 gi|51511724|ref|NC_000008.9|NC_000008 111829914 255 76M * 0 0 AGCCTTCAGTCTGTGGCCAAAGGCCCAAGGGTCCCCAGCGAACCACTGGTGTAAGTCCAAGAGTCCGAAGGCTGAG =+,:9===?9;AAA>?B=??A=>9>?A>A?A@BAAABBAAA?ABABBBAABABBBABBBBBBBBBABBBBBBBBBB NM:i:0
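The manual truncation described in the post above can be automated with a binary search over prefix lengths. The following is only a sketch under two assumptions: that the failure is monotone (once a prefix of the file triggers the error, every longer prefix does too), and that an empty input does not fail. The `fails` callback is hypothetical; in practice it would write the first `n` lines to a temporary file, run Cufflinks on it, and return True if the run exits with the error.

```python
from typing import Callable


def first_failing_line(total_lines: int, fails: Callable[[int], bool]) -> int:
    """Return the smallest n such that the first n lines trigger the failure.

    `fails(n)` is a caller-supplied (hypothetical) check on the first n lines.
    Returns 0 if even the complete file does not fail.
    """
    if not fails(total_lines):
        return 0
    # Invariant: the first `lo` lines pass, the first `hi` lines fail.
    lo, hi = 0, total_lines
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For a file of ~19 million lines this needs roughly 25 runs of the check instead of an open-ended manual search. It only locates the first offending line; later bad lines would need a repeat run after removing it.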

  • #2
    I think the CIGAR field in the three lines below caused the problem: the final operation, 536870911M, claims a match far longer than the 21-base read.

    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M17834N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M17840N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M5744N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    Xi Wang
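The suspicion about the CIGAR field can be checked mechanically. The SAM specification requires that the lengths of the read-consuming CIGAR operations (M, I, S, =, X) add up to the length of the SEQ field, so a record like the three above fails that test. Below is a minimal sketch of such a scan, assuming tab-separated SAM input; the function names are made up for illustration and are not part of any existing tool.

```python
import re
from typing import Iterable, Iterator, Tuple

# CIGAR operations that consume bases of the read, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Number of read bases the CIGAR string claims to cover."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def find_inconsistent_records(
    lines: Iterable[str],
) -> Iterator[Tuple[int, str, int, int]]:
    """Yield (line_number, read_name, cigar_length, seq_length) for suspect records."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):  # header line, not an alignment
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        cigar, seq = fields[5], fields[9]
        if cigar == "*" or seq == "*":  # nothing to compare
            continue
        claimed = cigar_query_length(cigar)
        if claimed != len(seq):
            yield lineno, fields[0], claimed, len(seq)
```

Passing an open file handle for accepted_hits.sam to find_inconsistent_records would list every suspect line in a single pass, which also covers the case where removing one line only moves the error to a later one.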


    • #3
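      #!/usr/bin/perl
      # Drop SAM records whose spliced CIGAR contains an implausibly long match (M) block.
      # Reads SAM from a file argument or STDIN and writes the kept lines to STDOUT.

      use warnings;
      use strict;

      while (<>) {
          chomp;
          # Pass header lines (starting with '@') through unchanged.
          if (/^\@/) { print "$_\n"; next; }
          my @parts = split /\t/;
          # Column 6 is the CIGAR string. Skip the record if either M block
          # around the intron (N) is longer than 100 bases.
          if ( defined( $parts[5] )
              && $parts[5] =~ /(\d+)M\d+N(\d+)M/
              && ( $1 > 100 || $2 > 100 ) ) {
              next;
          }
          print "$_\n";
      }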

      I have run into this problem before. You can use this Perl script to filter your reads.
