
  • [Cufflinks] Multi-Hit error

    Hi, I'm running the latest build of Cufflinks (v1.0.3) on an Illumina paired-end RNA-seq library and it's throwing an error.

    [chiakhb@pegasus biasCorrect_multiRead]$ cufflinks --output-dir brain_l8 --num-threads 10 --GTF-guide ~/ref_genome/hg18/misc/hg18.gtf --frag-bias-correct ~/ref_genome/hg18/misc/hg18.fa --multi-read-correct /mnt/ScratchPool/burt/MAQC_HBR/Tophat/brain_l8/accepted_hits.bam
    cufflinks: /lib64/libz.so.1: no version information available (required by cufflinks)
    You are using Cufflinks v1.0.3, which is the most recent release.
    [17:57:40] Loading reference annotation.
    [17:57:44] Inspecting reads and determining fragment length distribution.
    > Processed 135078 loci. [*************************] 100%
    > Map Properties:
    > Total Map Mass: 7131868.26
    > Number of Multi-Reads: 1 (with 458350 total hits)
    > Read Type: 54bp x 54bp
    > Fragment Length Distribution: Empirical (learned)
    > Estimated Mean: 207.08
    > Estimated Std Dev: 23.29
    [18:00:08] Assembling transcripts and initializing abundances for multi-read correction.
    > Processing Locus chr1:227643666-227710711 [** ] 8%
    ERROR: Multi-Hit not found (227634990,227635040).

    Has anyone encountered a similar error?

    Regards
    -burt

  • #2
    I got the same error using the GFF guide option.

    Here's what the error looks like:



    chr9:130812841-130820923 Finding a maximum matching to collapse scaffolds
    chr9:130812841-130820923 Will collapse 5 scaffolds
    chr9:130812841-130820923 Starting new collapse round
    chr9:130812841-130820923 Calculating scaffold densities
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    intron: [130820321-130820447], 0
    chr9:130812841-130820923 Creating compatibility graph
    chr9:130812841-130820923 Performing final collapse round
    Extracted 8 contiguous transfrags from 8 scaffolds
    chr9:130812841-130820923 Starting new collapse round
    Convergence reached in 14 iterations
    Importance sampling posterior distribution

    ERROR: Multi-Hit not found (123409527,123409562).


    Brute-force deleting the entry should probably help, but I would like to know why the program crashes when it cannot find this multi-hit instead of just ignoring it.

    Thanks
    Shrutii



    • #3
      I get the same Multi-Hit error as well while analyzing Illumina paired end data.
      I'm using cufflinks 1.1.0 linked against Boost version 104700 on bam files generated by tophat 1.3.3.
      I'm running command line: cufflinks -p 4 -u -b Sscrofa10.2.fa accepted_hits.bam

      Half of my 10 samples run just fine, 4 fail with the Multi-hit error, and one generates a segmentation fault, all during the MLE step. I have found that I only get these errors when using the -u/--multi-read-correct option (with or without the -b option).



      • #4
        This line in the output

        Number of Multi-Reads: 1 (with 458350 total hits)

        indicates that there is only one multi-mapping read, with a huge number of different mappings.
        This could be caused by using the same query name for all reads (or a large fraction of them) in the input file. Please note that the SAM specification assumes that the string in the query name field identifies a read, the only exception being "*" if the query name is unknown.

        Please check the input file and make sure that the query name (first column in SAM output) is either "*" or really something that identifies a read.
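One way to run this check is sketched below. It is a minimal sketch, not part of Cufflinks or samtools: the function names are made up for illustration, and it assumes the alignments are available as plain SAM text. It counts how many alignment records share each query name. Paired-end mates share a name and true multi-mapped reads appear several times, so small counts are normal; a single name covering most of the file would match the situation described above.

```python
import sys
from collections import Counter


def count_query_names(sam_lines):
    """Count alignment records per query name (first SAM column)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        qname = line.split("\t", 1)[0]
        counts[qname] += 1
    return counts


def summarize(counts, top=5):
    """Return totals plus the most frequent query names."""
    return {
        "records": sum(counts.values()),
        "distinct_names": len(counts),
        "most_common": counts.most_common(top),
    }


# Only runs when a SAM file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        report = summarize(count_query_names(handle))
    print("records:", report["records"])
    print("distinct query names:", report["distinct_names"])
    for name, n in report["most_common"]:
        print(n, name, sep="\t")
```

To use it, convert the BAM to SAM text first (for example `samtools view accepted_hits.bam > accepted_hits.sam`) and pass that file as the only argument.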



        • #5
          I am bumping this months-old thread because I have a similar problem and need some help. I also get the "Multi-Hit not found" error message on one of my test samples when using Cufflinks. I am using the --GTF-guide, --frag-bias-correct, --multi-read-correct and --upper-quartile-norm parameters.

          Here is the console output:

          Code:
          You are using Cufflinks v1.3.0, which is the most recent release.
          [13:19:01] Loading reference annotation.
          [13:19:02] Inspecting reads and determining fragment length distribution.
          > Processed 66947 loci.                        [*************************] 100%
          > Map Properties:
          >	Upper Quartile: 21.00
          >	Number of Multi-Reads: 386605 (with 950358 total hits)
          >	Fragment Length Distribution: Empirical (learned)
          >	              Estimated Mean: 161.13
          >	           Estimated Std Dev: 42.61
          [13:21:07] Assembling transcripts and initializing abundances for multi-read correction.
          > Processed 66947 loci.                        [*************************] 100%
          [13:35:17] Loading reference annotation and sequence.
          [13:35:27] Learning bias parameters.
          > Processing Locus scaffold_4:12410956-1241332 [***********              ]  44%
          WARNING: Multi-Hit not found (12414511,12416689).
          
          WARNING: Multi-Hit not found (12414511,12416689).
          
          WARNING: Multi-Hit not found (12414511,12416689).
          
          WARNING: Multi-Hit not found (12414511,12416689).
          > Processed 32442 loci.                        [*************************] 100%
          [13:37:20] Re-estimating abundances with bias and multi-read correction.
          > Processing Locus scaffold_4:12410956-1241332 [***********              ]  44%
          WARNING: Multi-Hit not found (12414511,12416689).
          
          WARNING: Multi-Hit not found (12414511,12416689).
          
          WARNING: Multi-Hit not found (12414511,12416689).
          
          WARNING: Multi-Hit not found (12414511,12416689).
          > Processing Locus scaffold_4:12413851-1241839 [***********              ]  44%
          WARNING: Multi-Hit not found (12414511,12416689).
          
          WARNING: Multi-Hit not found (12414511,12416689).
          > Processed 32442 loci.                        [*************************] 100%
          When looking in the GTF file I cannot find a gene spanning those two exact coordinates. Question: Are these warning messages anything I should worry about? And why do they occur only for some samples?
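Looking for a feature that overlaps the reported interval, rather than one that spans it exactly, can be done with a short script. The following is only a sketch under stated assumptions: the helper names are hypothetical, the two numbers in the warning are assumed to be start and end positions on the same scaffold as the locus being processed (the log does not say which sequence they refer to), and no attempt is made to correct for a possible 0-based versus 1-based offset.

```python
def parse_gtf_line(line):
    """Return (seqname, feature, start, end, attributes) or None for comments/short lines."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    seqname, _source, feature, start, end = fields[:5]
    return seqname, feature, int(start), int(end), fields[8]


def overlapping_features(gtf_lines, seqname, start, end):
    """Collect GTF records on `seqname` that overlap the closed interval [start, end]."""
    hits = []
    for line in gtf_lines:
        record = parse_gtf_line(line)
        if record is None:
            continue
        name, _feature, f_start, f_end, _attrs = record
        # Two closed intervals overlap when each starts before the other ends.
        if name == seqname and f_start <= end and f_end >= start:
            hits.append(record)
    return hits
```

For the warning above, a call would look like `overlapping_features(open("annotation.gtf"), "scaffold_4", 12414511, 12416689)`, where `annotation.gtf` stands in for whatever file was passed to --GTF-guide.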



          • #6
            Originally posted by glados
            I am bumping this months-old thread because I have a similar problem and need some help. I also get the "Multi-Hit not found" error message on one of my test samples when using Cufflinks. I am using the --GTF-guide, --frag-bias-correct, --multi-read-correct and --upper-quartile-norm parameters.


            When looking in the GTF file I cannot find a gene spanning those two exact coordinates. Question: Are these warning messages anything I should worry about? And why do they occur only for some samples?

            Hi glados,
            I am using Cufflinks 2.0.2 with GTF guide, frag-bias-correct, multi-read-correct and total-hits-norm on strand-specific human RNA-Seq data.
            I am getting the same "Multi-Hit not found" warning. Did you find an answer to your question? As in your case, Cufflinks doesn't terminate; it just prints warnings.
            Should I be worried about these? I read somewhere that this warning/error is generated when using multi-read-correct. Is that right?

            Code:
            You are using Cufflinks v2.0.2, which is the most recent release.
            [11:44:02] Loading reference annotation.
            [11:44:06] Inspecting reads and determining fragment length distribution.
            > Processed 155919 loci.                       [*************************] 100%
            > Map Properties:
            >	Normalized Map Mass: 57703206.56
            >	Raw Map Mass: 57703206.56
            >	Number of Multi-Reads: 1548338 (with 3999071 total hits)
            >	Fragment Length Distribution: Empirical (learned)
            >	              Estimated Mean: 157.64
            >	           Estimated Std Dev: 32.05
            [12:03:00] Assembling transcripts and initializing abundances for multi-read correction.
            > Processed 155919 loci.                       [*************************] 100%
            [14:23:07] Loading reference annotation and sequence.
            [14:23:45] Learning bias parameters.
            > Processing Locus chr16:222845-223709         [********                 ]  32%
            WARNING: Multi-Hit not found (226782,227067).
            
            WARNING: Multi-Hit not found (226782,227067).
            
            WARNING: Multi-Hit not found (226782,227067).
            
            WARNING: Multi-Hit not found (226782,227067).
            > Processing Locus chr16:226678-227520         [********                 ]  32%
            WARNING: Multi-Hit not found (226782,227067).
            
            WARNING: Multi-Hit not found (226782,227067).
            > Processing Locus chr8:6844699-6847243        [*********************    ]  85%
            WARNING: Multi-Hit not found (6854483,6855187).
            
            WARNING: Multi-Hit not found (6854483,6855187).
            
            WARNING: Multi-Hit not found (6854483,6855187).
            
            WARNING: Multi-Hit not found (6854483,6855187).
            > Processing Locus chr8:6854287-6856724        [*********************    ]  85%
            WARNING: Multi-Hit not found (6854483,6855187).
            
            WARNING: Multi-Hit not found (6854483,6855187).
            > Processed 26008 loci.                        [*************************] 100%
            [14:41:54] Re-estimating abundances with bias and multi-read correction.
            > Processing Locus chr16:222845-223709         [********                 ]  32%
            WARNING: Multi-Hit not found (226782,227067).
            
            WARNING: Multi-Hit not found (226782,227067).
            
            WARNING: Multi-Hit not found (226782,227067).
            
            WARNING: Multi-Hit not found (226782,227067).
            > Processing Locus chr16:576846-577407         [********                 ]  32%
            WARNING: Multi-Hit not found (226782,227067).
            
            WARNING: Multi-Hit not found (226782,227067).
            > Processing Locus chr8:6844699-6847243        [*********************    ]  85%
            WARNING: Multi-Hit not found (6854483,6855187).
            
            WARNING: Multi-Hit not found (6854483,6855187).
            
            WARNING: Multi-Hit not found (6854483,6855187).
            
            WARNING: Multi-Hit not found (6854483,6855187).
            > Processing Locus chr8:6863802-6866346        [*********************    ]  85%
            WARNING: Multi-Hit not found (6854483,6855187).
            
            WARNING: Multi-Hit not found (6854483,6855187).
            > Processed 26008 loci.                        [*************************] 100%
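Since the same message is printed many times, it can help to reduce a log like the one above to its distinct coordinate pairs. The snippet below is a rough sketch, assuming the message format is exactly what appears in this thread ("WARNING: Multi-Hit not found (a,b)." or the "ERROR:" variant from older versions). The function name is illustrative and nothing here is provided by Cufflinks itself.

```python
import re
from collections import Counter

# Pattern based only on the messages quoted in this thread.
MULTI_HIT_RE = re.compile(r"(WARNING|ERROR): Multi-Hit not found \((\d+),(\d+)\)")


def summarize_multi_hit_messages(log_lines):
    """Count how often each (level, first, second) coordinate pair is reported."""
    counts = Counter()
    for line in log_lines:
        match = MULTI_HIT_RE.search(line)
        if match:
            level, first, second = match.groups()
            counts[(level, int(first), int(second))] += 1
    return counts
```

Running it over a saved copy of the console output (for example `summarize_multi_hit_messages(open("cufflinks.log"))`, with a hypothetical log file name) shows how few distinct pairs are actually involved, which makes it easier to look them up in the BAM or GTF file.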



            • #7
              Hello. I did not get any answers to that question, but I did not receive this warning in Cufflinks 2, so for me at least it's not a problem anymore. Hope you figure it out!
