  • Is this segemehl error due to memory?

    Hi, I am fairly new to RNA-seq.
    I am trying to analyze my data using segemehl but am running into the following error. (I've cut and pasted the last part of the output.)
    [SEGEMEHL] Fri Jul 24 19:16:53 2015: 1637977 reads in thread 0.
    [SEGEMEHL] Fri Jul 24 19:16:53 2015: 1637824 reads in thread 1.
    [SEGEMEHL] Fri Jul 24 19:16:53 2015: 1637824 reads in thread 2.
    [SEGEMEHL] Fri Jul 24 19:16:53 2015: 1637824 reads in thread 3.
    segemehl.x: libs/biofiles.c:1160: bl_fastxAddMate: Assertion `bl_fastaCheckMateID(f, n, descr, descrlen)' failed.

    My job command is
    segemehl.x --silent -i hg19.idx -d human_hg19.fa -q READ1 -p READ2 -O -o sege.sam -u unmap.sam -D 1 -t 4

    One of my questions: if I submit the job per chromosome to reduce the memory load, how can segemehl map reads that align to different chromosomes?

    I read in another post that I should use the full reference file, but that would significantly increase mapping time and memory requirements.
    How do I find the right balance?

    Thank you in advance

  • #2
    There is no "right balance". You need to map to the full reference if you want correct results.

    I can't advise you on that error message, but your command certainly looks strange. Is that the actual command, or are you substituting "READ1" and "READ2" for the filenames?
    Last edited by Brian Bushnell; 07-27-2015, 09:30 AM.



    • #3
      If this error occurs, segemehl cannot assign mate2 to mate1. Are the reads in both your files in correct order? Do they have matching read ids (at least the beginning of the id)? Do you have the same number of reads in the mate1 file and the mate2 file?
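The three questions above can be checked with a short script. This is only a minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines) and optionally gzip-compressed files; `read_ids`, `core_id` and `check_pairs` are hypothetical helper names, and the ID comparison is a simplified stand-in rather than segemehl's own check.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the header (without the leading '@') of each 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:  # every 4th line starts a new record
                yield line.rstrip("\n")[1:]


def core_id(header):
    """Reduce a header to the part both mates are expected to share (assumption)."""
    parts = header.split()
    token = parts[0] if parts else ""
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def check_pairs(path1, path2):
    """Return (records_matched, first_problem), where first_problem is None if all is well."""
    n = 0
    for id1, id2 in zip_longest(read_ids(path1), read_ids(path2)):
        if id1 is None or id2 is None:
            # One file ran out of records before the other.
            return n, f"files differ in length after {n} records"
        if core_id(id1) != core_id(id2):
            # Same position, different read: order or content is out of sync.
            return n, f"record {n + 1}: {id1!r} vs {id2!r}"
        n += 1
    return n, None
```

Calling `check_pairs("mate1.fastq", "mate2.fastq")` (placeholder file names) stops at the first record where the two files disagree, which is usually enough to tell whether the files differ in length or are merely out of order.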

      If you did adapter clipping and/or quality trimming, make sure that you process both files together rather than in two separate calls. You can use bbduk to trim paired-end reads without losing the mate1-mate2 connection.
      ecSeq Bioinformatics is Europe’s leading provider of hands-on bioinformatics workshops and professional data analysis in the field of Next-Generation Sequencing (NGS).



      • #4
        Thank you for the reply.

        Brian Bushnell: Yes, READ1 and READ2 stand in for the actual FASTQ file names.
        Is there anything else in my command that looks strange? Please let me know.

        ecSeq Bioinformatics: I was using AlienTrimmer, and I believe it does not do read ID matching. I am fairly sure that is the problem.
        Thank you.



        • #5
          Hi Him26,
          Have you solved the problem? I'm using segemehl and have run into the same error. I did not trim my FASTQ files at all, and I have checked that the reads in both files are in the correct order. I would really appreciate any help.
          Thank you.



          • #6
            Nope

            I got caught up with other issues and have not followed up on this matter. Sorry about that. Do let me know if you find out anything.



            • #7
              Segemehl tries to find the two mates that belong together by checking the fastq identifiers.

              The two identifiers have to be one of the following:
              1. completely identical,
              2. identical up to the first whitespace (i.e. they share the same leading substring), or
              3. identical apart from a '/1' or a '/2' at their ends
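The three rules above can be written out as a small function. This is a sketch of one reading of the rules, not segemehl's actual implementation; `mate_ids_match` and `first_token` are hypothetical names, and applying rule 3 to the part before the first whitespace is an assumption.

```python
def first_token(identifier):
    """Return everything before the first whitespace, or '' for an empty identifier."""
    parts = identifier.split()
    return parts[0] if parts else ""


def mate_ids_match(header1, header2):
    """Return the name of the first rule that accepts the pair, or None."""
    # Drop surrounding whitespace and the leading '@' of a FASTQ header.
    id1 = header1.strip().lstrip("@")
    id2 = header2.strip().lstrip("@")

    # Rule 1: completely identical.
    if id1 == id2:
        return "identical"

    # Rule 2: identical up to the first whitespace.
    tok1, tok2 = first_token(id1), first_token(id2)
    if tok1 and tok1 == tok2:
        return "shared prefix"

    # Rule 3 (assumed to apply to the first token): identical apart from /1 or /2.
    suffixes = ("/1", "/2")
    if tok1.endswith(suffixes) and tok2.endswith(suffixes) and tok1[:-2] == tok2[:-2]:
        return "/1 and /2 suffix"

    return None
```

For example, `mate_ids_match("@r1 1:N:0:ATCACG", "@r1 2:N:0:ATCACG")` is accepted by rule 2, `mate_ids_match("@r1/1", "@r1/2")` by rule 3, and `mate_ids_match("@r1", "@r2")` returns `None`, which is the situation that would trigger the assertion in the original post.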