  • multiple runs and maq

    How do you run an alignment (the ./maq map command) in maq when the same sample has been run on more than one lane of a Solexa machine?
    Can you simply append the reads, like this?

    ./maq map in.read1.bfq in.read2.bfq in.read3.bfq etc.


    Cheers
    L

  • #2
    Layla,

    If you have tried that command by now, you know it does not work.

    At most 2 bfq files can be given, and they are assumed to contain paired-end reads. If only one file is given, the reads are treated as single-end.

    So, if you have all your *sequence.txt files, use a for loop on these fastq files:

    for file in *sequence.txt
    do
        maq fastq2bfq "$file" "$file.bfq"
        maq map "$file.map" genome.bfa "$file.bfq"
    done

    • #3
      Zee,

      Would it be possible to parallelize this over several CPU cores in a simple manner? Kind of like PBS for cluster jobs, but locally.

      cheers
      D

      • #4
        Dakl,

        I use novoalign for most of my multi-core jobs, but it should be possible to do something similar with maq.

        If you have a large database to search you might run into some problems. I would split my files into batches of N-1 files each, where N is the number of CPUs; I like to keep one free for system, I/O, etc.


        ls *.fastq | split -l <N-1> BATCH


        Then, for each batch, run a loop:

        for file in `cat BATCH...`
        do
            maq fastq2bfq "$file" "$file.bfq"
            maq map "$file.map" genome.bfa "$file.bfq"
        done &

        And don't forget the "&", which places each loop in the background.
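
One thing to watch with the batch approach: the number of background loops equals the number of batch files that split produces, which is not necessarily N-1. As an alternative, here is a minimal sketch that caps the number of simultaneous jobs using xargs -P (available in GNU and BSD xargs, but not required by POSIX). The function name map_parallel and the variables MAQ, REF and JOBS are made up for this illustration; setting MAQ=echo gives a dry run that only prints the commands.

```shell
# Hypothetical helper, not part of maq. Override MAQ, REF and JOBS as needed.
MAQ=${MAQ:-maq}
REF=${REF:-genome.bfa}
JOBS=${JOBS:-3}   # e.g. number of CPUs minus one

# Reads fastq file names on stdin (one per line) and runs at most
# $JOBS convert-and-map pipelines at the same time.
map_parallel() {
  xargs -P "$JOBS" -I{} sh -c '
    "$1" fastq2bfq "$3" "$3.bfq" && "$1" map "$3.map" "$2" "$3.bfq"
  ' sh "$MAQ" "$REF" {}
}

# Usage (commented out so that pasting the block runs nothing):
#   ls *.fastq | map_parallel
```

Each file is converted and then mapped inside one child shell, so the map step only starts if fastq2bfq succeeded for that file.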

        • #5
          Hi zee,

          Thank you for your reply, but I have been receiving a bizarre error:

          Assertion failed: (fp_bfa), function ma_match, file match.cc, line 516

          for file in `ls *.bfq`
          do
          ./maq map $file.map genome.bfa $file.bfq
          done

          From the paired end experiment I have a total of 3 pairs stored in one folder:
          s_1.bfq s_2.bfq
          t_1.bfq t_2.bfq
          u_1.bfq u_2.bfq

          I don't understand how/where the ./maq map loop looks at s_1.bfq s_2.bfq, then t_1.bfq t_2.bfq, and then u_1.bfq u_2.bfq.

          Thank you for your help

          L

          • #6
            OK, it is a simple change to the following:

            Code:
            for base in s t u; do
              ./maq map "$base.map" genome.bfa "${base}_1.bfq" "${base}_2.bfq"
            done
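
For context: in the loop from post #5, $file already ends in .bfq, so $file.bfq expands to names like s_1.bfq.bfq, and nothing in that loop ever passes the two mates together. If you would rather not type the lane prefixes by hand, a sketch along these lines derives them from the *_1.bfq files. It assumes the _1/_2 naming shown in this thread; the function name map_pairs and the variables MAQ and REF are invented for the example (MAQ=echo prints the commands instead of running them).

```shell
# Hypothetical helper, not part of maq. Override MAQ and REF as needed.
MAQ=${MAQ:-./maq}
REF=${REF:-genome.bfa}

# Finds every *_1.bfq in the current directory, derives the matching
# *_2.bfq, and maps each pair into its own .map file.
map_pairs() {
  local first base
  for first in *_1.bfq; do
    [ -e "$first" ] || continue        # the glob matched nothing
    base=${first%_1.bfq}               # s_1.bfq -> s
    if [ ! -e "${base}_2.bfq" ]; then
      echo "skipping $first: no ${base}_2.bfq found" >&2
      continue
    fi
    "$MAQ" map "$base.map" "$REF" "${base}_1.bfq" "${base}_2.bfq"
  done
}

# Usage (commented out so that pasting the block runs nothing):
#   map_pairs
```

Skipping a file whose mate is missing is a choice made for this sketch; you may prefer to stop with an error instead.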

            • #7
              Thanks, Zee. Since multiple runs are carried out to increase the number of reads, why is a separate .map file created for each pair (a total of 3)? Isn't the purpose to merge all the pairs and generate a single .map file to increase genome coverage?

              Actually, while writing this: is this where I can use the ./maq merge command?

              Cheers
              L

              • #8
                You got it ... after you've generated all the maps, use maq merge to combine them into one map, from which you can generate a pileup, consensus, etc ...
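
As a rough sketch of that last step, assuming the s.map, t.map and u.map files from the loop earlier in the thread: maq merge takes the output map first, followed by the input maps. The wrapper name merge_maps and the MAQ variable are made up for this example, and requiring at least two inputs is just a sanity check added here, not a documented maq rule.

```shell
# Hypothetical wrapper, not part of maq. Set MAQ=echo for a dry run.
MAQ=${MAQ:-maq}

# merge_maps OUT.map IN1.map IN2.map [...]
merge_maps() {
  local out
  out=$1
  shift
  if [ "$#" -lt 2 ]; then
    echo "merge_maps: need at least two input maps" >&2
    return 1
  fi
  "$MAQ" merge "$out" "$@"
}

# Usage (commented out so that pasting the block runs nothing):
#   merge_maps all.map s.map t.map u.map
#   maq assemble all.cns genome.bfa all.map      # consensus
#   maq pileup genome.bfa all.map > all.pileup   # pileup
```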

                • #9
                  Hi all,

                  Since Maq is optimized for ~2M reads as input, I managed to do the following:

                  Code:
                  time maq fastq2bfq -n 2000000 ../50a_fastq.single.fastq 50a
                  This creates several bfq files containing the reads. I then use the Perl module Parallel::ForkManager to fork the mapping processes. The script below reads one bfq file name per line from its input.

                  Code:
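                  #!/usr/bin/perl -w

                  use strict;
                  use Parallel::ForkManager;

                  # Reads one bfq file name per line from STDIN (or from files named as arguments).
                  my $pm = Parallel::ForkManager->new(4); # number of parallel processes is 4
                  while (<>) {
                      chomp;

                      # Forks; the parent gets the child's pid and moves on to the next line:
                      my $pid = $pm->start and next;

                      # Child only: map this bfq file against the reference.
                      qx/ maq match -c $_.map ~\/hg18\/hg18.bfa $_/;

                      $pm->finish; # Terminates the child process
                  }
                  $pm->wait_all_children; # Do not exit until every child has finished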
