  • Is my reference too big for maq?

    I am trying to convert my reference FASTA file to bfa using maq with the fasta2bfa command.

    I am getting a segmentation fault.

    My script has worked before with older references, but I just downloaded the new one from Ensembl and it doesn't work with this one.

    I have noticed that it is a little bigger than usual.

    My reference is 11 GB.

    Could it be that maq cannot handle the file size, or should I run my program on a machine with more RAM?

  • #2
    Run fasta2bfa and, in another terminal, check the memory usage every 15 to 20 seconds with the command free -m (in megabytes) or free -g (in gigabytes). If the memory usage approaches the total RAM of your machine just before the segmentation fault, that's your problem.


    • #3
      According to my network administrator, the cluster node I am using has 32 GB of RAM.

      So I don't think the cluster is the problem; it looks more like a memory-addressing limitation in the C++ code.

      What other tests do you think I can run?

      I will try your suggestion.

      I did try a little experiment, though: I split the file, and even 7 GB is too big. :S


      • #4
        Hello,

        Apparently it is not a memory problem: I checked the memory usage just before the program crashed and there was plenty free.

        The program crashes just after line 166 of the fasta2bfa.c source file.

        Can somebody help me out?


        • #5
          Originally posted by luisczul
          I am trying to convert my reference FASTA file to bfa using maq with the fasta2bfa command.

          My script has worked before with older references, but I just downloaded the new one from Ensembl and it doesn't work with this one.

          In another forum there was a thread about someone having problems indexing the latest human assembly with blat. The problem was that the length of the new haplotype chromosomes pushed the overall genome length above what could be handled by a 32-bit pointer. There may be a similar issue with maq.

          Since the extra haplotype sequences are mostly poly-N (to keep the positions the same as the originals), you could either delete them altogether or remove the Ns and see if things start working again.
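
One way to test the 32-bit idea before editing the reference is to count the sequence characters directly. The C sketch below is only an illustration: `fasta_tally`, `tally_init`, `tally_feed`, `fits_int32` and `fits_uint32` are hypothetical names, not part of maq or blat, and nothing in this thread establishes which integer width maq actually uses. It tallies all residues and the N residues separately, so the total can be compared with the signed limit (2,147,483,647) and the unsigned limit (4,294,967,295) both with and without the Ns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of maq or blat. */
typedef struct {
    uint64_t total;     /* sequence characters seen, Ns included */
    uint64_t n_count;   /* how many of those were N or n */
    int in_header;      /* currently inside a '>' header line */
    int at_line_start;  /* next character begins a new line */
} fasta_tally;

void tally_init(fasta_tally *t)
{
    assert(t != NULL);
    t->total = 0;
    t->n_count = 0;
    t->in_header = 0;
    t->at_line_start = 1;
}

/* Feed the next chunk of FASTA text; state carries over between chunks,
   so a header split across two chunks is still skipped correctly. */
void tally_feed(fasta_tally *t, const char *buf, size_t len)
{
    assert(t != NULL && buf != NULL);
    for (size_t i = 0; i < len; ++i) {
        char c = buf[i];
        if (c == '\n') {
            t->in_header = 0;
            t->at_line_start = 1;
            continue;
        }
        if (t->at_line_start && c == '>')
            t->in_header = 1;
        t->at_line_start = 0;
        /* Skip header text and whitespace; everything else is sequence. */
        if (t->in_header || c == '\r' || c == ' ' || c == '\t')
            continue;
        t->total++;
        if (c == 'N' || c == 'n')
            t->n_count++;
    }
}

/* 1 if the length fits in a signed 32-bit integer, else 0. */
int fits_int32(uint64_t len)  { return len <= (uint64_t)INT32_MAX; }

/* 1 if the length fits in an unsigned 32-bit integer, else 0. */
int fits_uint32(uint64_t len) { return len <= (uint64_t)UINT32_MAX; }
```

For a real reference you would read the file in chunks (for example with `fread`) and pass each chunk to `tally_feed`, then check `fits_int32(t.total)` and `fits_int32(t.total - t.n_count)`. If only the second one passes, removing the Ns is worth trying.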


          • #6
            Solved

            static void ma_fasta2csfa_core(FILE *fpout, FILE *fpin)
            {
                seq_t seq;
                int i, c1, c2, c;
                char name[256], comment[4096];
                INIT_SEQ(seq);

            So finally the problem got solved.

            The reference file that I downloaded had very long header lines.

            As you can see in the code I pasted from maq, the name buffer is limited to char name[256]. When a header went over that limit, it caused a segmentation fault.

            Cheers, hope this helps somebody.
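
A quick way to check a reference for this before running fasta2bfa is to measure its longest header line. The C sketch below is an assumption-laden illustration, not maq code: `longest_header` and `header_fits` are hypothetical names, and `NAME_BUF_SIZE` simply mirrors the `char name[256]` in the excerpt above. The thread does not say whether maq copies the whole header line or only its first word into `name`, so the check is deliberately conservative and measures the full line after the `>`.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the char name[256] buffer in the maq excerpt quoted above. */
#define NAME_BUF_SIZE 256

/* Hypothetical helper: length of the longest header line in FASTA text,
   counting the characters after '>' up to the end of the line
   (a trailing '\r' is not counted). Returns 0 if there is no header. */
size_t longest_header(const char *fasta)
{
    size_t longest = 0;
    int at_line_start = 1;
    const char *p = fasta;
    assert(fasta != NULL);
    while (*p != '\0') {
        if (at_line_start && *p == '>') {
            size_t len = 0;
            ++p; /* skip the '>' itself */
            while (*p != '\0' && *p != '\n' && *p != '\r') {
                ++len;
                ++p;
            }
            if (len > longest)
                longest = len;
            at_line_start = 0;
            continue; /* p now sits on the line terminator or the end */
        }
        at_line_start = (*p == '\n');
        ++p;
    }
    return longest;
}

/* 1 if a header of this length fits in the buffer with room for the
   terminating '\0', else 0. */
int header_fits(size_t len)
{
    return len < NAME_BUF_SIZE;
}
```

This version takes the FASTA text as one in-memory string for brevity; for an 11 GB reference you would apply the same logic while reading the file line by line. If `header_fits(longest_header(text))` is 0, shortening the header lines in the FASTA file before conversion is one workaround consistent with the diagnosis above.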
