  • Cufflinks error - nonsense gene merge

    Hello All,
    I am new to bioinformatics and am trying to use Cufflinks for the first time. My input to Cufflinks is an accepted_hits.sam file with ~19 million reads, generated (apparently without error) by Tophat. When I run Cufflinks (cufflinks -o Results accepted_hits.sam), I first get a "Counting hits in map" message and then "Error:nonsense gene merge. Exiting". By iteratively truncating my input file, I found that Cufflinks apparently does not like one line (around the 4 millionth) in the input file.

    My accepted_hits.sam file at the error point looks like this (below). Nine reads are shown; Cufflinks seems to generate the error message at the 6th read shown. I have tried eliminating just this one line from the accepted_hits.sam file, but I still get the same error (perhaps from some later line). I have also tried Cufflinks with Tophat output from different files and consistently get this same error. (Cufflinks does run fine with the supplied test file.)

    Thanks for any help with this.

    HWI-EAS288_8_2_20_941_1818_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111450969 255 64M * 0 0 ATAGCATCTTCCCAGCTTCCATCTCCCTACAGTCCATCNTATTCAAGTCTTTAGCTATTTTGGA B@BBBB@BBBBABABA@@@A@@AA@??B;=7:=@?>A;%;>AB@?6?:?==?=@?@>?=@?>>8 NM:i:2
    HWI-EAS288_8_2_117_405_131_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111450982 255 42M * 0 0 AGCTTCCATCTCCCTACAGTCCATCATATTCAAGTCTTTAGC <:A>9=,/8;297=;=;1208778=:2-2650'462-3586? NM:i:0
    HWI-EAS288_8_1_9_672_1871_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111530035 255 62M * 0 0 GGGAAACATGGTGAAACCCTGTTTCTACTAAAAATACAAAAATTAGCCAGCTGTGGTGGCAA 6CCBBCCCCCB>BBBCCBAC@BCCCBBBCB@A>BCBBB@BACBAB<ABBB>B<>@??;%8@; NM:i:1
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M17834N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M17840N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M5744N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_56_390_1555_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111643298 255 42M * 0 0 CAGCAAACCACCATGGCCCACATTTACCTATGTAACAAATCA BCBCCCCCCCBBCA;>C73-?CCBC@CBBBBCACBCCACBCC NM:i:1
    HWI-EAS288_8_2_50_1601_1261_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111809773 255 53M * 0 0 CTATTCTATACCATTCCATTCCATTCCATTCCATTCCATTCCATGCCATTCCA BCBBCCBAC:CCCCBBACCBCBBBBBB@BBBBBB?BAAB@BA6>ABBAAABB1 NM:i:2
    HWI-EAS288_8_1_100_1688_1517_0 16 gi|51511724|ref|NC_000008.9|NC_000008 111829914 255 76M * 0 0 AGCCTTCAGTCTGTGGCCAAAGGCCCAAGGGTCCCCAGCGAACCACTGGTGTAAGTCCAAGAGTCCGAAGGCTGAG =+,:9===?9;AAA>?B=??A=>9>?A>A?A@BAAABBAAA?ABABBBAABABBBABBBBBBBBBABBBBBBBBBB NM:i:0
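The truncation search described in this post can be done as a bisection instead of by hand. The following is only a sketch under stated assumptions: `first_failing_prefix` and the `fails` callback are hypothetical names, not part of Cufflinks or Tophat, and the search assumes that once a prefix of the file triggers the error, every longer prefix does too, and that an empty input does not.

```python
from typing import Callable, Sequence


def first_failing_prefix(
    records: Sequence[str], fails: Callable[[Sequence[str]], bool]
) -> int:
    """Return the smallest n such that fails(records[:n]) is True.

    Assumes failure is monotone (a failing prefix stays failing when
    extended) and that the empty prefix passes. Raises ValueError if the
    full input does not fail at all.
    """
    if not fails(records):
        raise ValueError("the full input does not trigger the failure")
    # Invariant: records[:lo] passes, records[:hi] fails.
    lo, hi = 0, len(records)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(records[:mid]):
            hi = mid
        else:
            lo = mid
    return hi  # 1-based position of the first record that causes a failure
```

In practice `fails` would have to write the prefix to a temporary SAM file, run Cufflinks on it, and look for the error message. For ~19 million reads this needs about 25 runs rather than a long manual search.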

  • #2
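    I think the CIGAR field in the 3 lines below caused the problem. The read is only 21 bases long, yet each CIGAR ends with 536870911M (2^29 - 1), which cannot be a real match length.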

    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M17834N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M17840N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    HWI-EAS288_8_2_79_444_2024_0 0 gi|51511724|ref|NC_000008.9|NC_000008 111637724 1 22M5744N536870911M * 0 0 GCAGCAACAGCGGCAGCGGCA ABAAAB@@@>AAABAB?ABAA NM:i:2 XS:A:+ NS:i:2
    Xi Wang
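One way to find every such record, rather than spotting them by eye, is to compare the number of read bases implied by the CIGAR string with the actual length of the sequence field. This is a minimal sketch, assuming a tab-separated SAM file with the standard column order; the function names are made up for illustration. It treats M, I, S, = and X as the operations that consume read bases, as in the SAM specification.

```python
import re
from typing import Iterable, Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations that consume bases of the read


def cigar_query_length(cigar: str) -> int:
    """Sum the lengths of CIGAR operations that consume read bases."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def inconsistent_records(lines: Iterable[str]) -> Iterator[Tuple[int, str, str, int]]:
    """Yield (line_no, read_name, cigar, seq_len) where CIGAR and SEQ disagree."""
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        cigar, seq = fields[5], fields[9]
        if cigar == "*" or seq == "*":  # nothing to compare
            continue
        if cigar_query_length(cigar) != len(seq):
            yield line_no, fields[0], cigar, len(seq)
```

Passing an open file handle for accepted_hits.sam to `inconsistent_records` would list the suspicious lines without loading the whole file into memory.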



    • #3
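      #!/usr/bin/perl
      # Filter a SAM file read from STDIN (or from files named on the command
      # line), dropping spliced alignments with an implausibly long match block.

      use warnings;
      use strict;

      while (<>) {
          chomp;
          # Pass header lines (starting with '@') through untouched.
          if (/^@/) {
              print "$_\n";
              next;
          }
          my @parts = split /\t/;
          # Field 6 (index 5) is the CIGAR string. Skip the read if either match
          # block flanking the intron (N) is longer than 100 bases.
          if ( defined $parts[5]
              && $parts[5] =~ /(\d+)M\d+N(\d+)M/
              && ( $1 > 100 || $2 > 100 ) ) {
              next;
          }
          print "$_\n";
      }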

      I have faced this problem before. You can use the Perl script above to filter your reads.
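A fixed cutoff of 100 bases only suits short reads. As an alternative sketch, not the poster's method, a filter can drop any record in which a single match block is longer than the read itself. It assumes tab-separated SAM input, and `drop_oversized_matches` is a hypothetical helper name.

```python
import re
from typing import Iterable, Iterator

MATCH_BLOCK = re.compile(r"(\d+)M")


def drop_oversized_matches(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines, skipping records with an M block longer than SEQ."""
    for line in lines:
        if line.startswith("@"):  # keep header lines as they are
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 11 and fields[9] != "*":
            seq_len = len(fields[9])
            # A match block cannot cover more bases than the read contains.
            if any(int(n) > seq_len for n in MATCH_BLOCK.findall(fields[5])):
                continue
        yield line
```

Used as a stream filter, `sys.stdout.writelines(drop_oversized_matches(sys.stdin))` reads accepted_hits.sam on standard input and writes the cleaned file to standard output.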
