Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bwa sampe on multiple samples at once

    I'm looking to loop through many samples using the bwa sampe command.

    Here is what the command looks like for one sample:

    bwa sampe ref.fasta read1.sai read2.sai read1.fastq.gz read2.fastq.gz | samtools view -bS - > sample.bam

    I have read1.sai, read2.sai, and ref.fasta in the same directory, but not fastq files. I have no problem using a for loop on singleton files. I've tried different things with paired-end reads on the above command but nothing seems to work (I'm not a seasoned programmer). Could someone possibly provide an example for the above command? Apologies if this question is obvious or answered elsewhere.

  • #2
    You are using short file names in your command. This will only work, if the files are located in the current working directory. If your FASTQ files reside in another directory, you must pass their full path (either relative to the working directory or absolute).

    Comment


    • #3
      Thanks for the suggestion. Would it be possible to provide a quick example?

      Comment


      • #4
        Are you asking for an example for @piet's comment or for the actual loop?

        For @piet's comment:

        Replace /provide_path_to/ part with real or relative paths from your computer.

        Code:
        $ bwa sampe ref.fasta read1.sai read2.sai /provide_path_to/read1.fastq.gz /provide_path_to/read2.fastq.gz | samtools view -bS - > sample.bam

        Comment


        • #5
          Sorry, I meant for the actual loop.

          Comment


          • #6
            See this: https://www.biostars.org/p/98222/

            You would need to change the actual command and parsing criteria based on your file names. If you post example file names we can help with the commands.

            @piet has another example for grabbing the basenames from files which you can then use in your command: https://www.biostars.org/p/176887/#176889
            Last edited by GenoMax; 02-15-2016, 12:09 PM.

            Comment


            • #7
              1. fastq naming: taxon86-READ1.fastq.gz + taxon86-READ2.fastq.gz and taxon24-READ1.fastq.gz + taxon24-READ2.fastq.gz, etc.

              2. sai file naming: taxon86-read1.sai + taxon86-read2.sai and taxon24-read1.sai + taxon24-read2.sai, etc.

              3. sai files are in the current working directory where bwa sampe command is executed

              4. fastq files are in a different directory: /Users/juanc/projectscx/reads/taxon*/clean/

              (* is used to denote multiple directories under the "reads" directory corresponding to the different sample names)
              Last edited by juanc; 02-15-2016, 06:52 PM.

              Comment


              • #8
                I am going to assume that you are in the directory where you have the sai files. Since I don't have your files use the following at your own risk :-)

                NOTE: How many of these sample directories are there (I am using only one of these directories I have called it Sample_ONE).

                Step 1: Grab a listing of the Read1 files for one sample

                Code:
                $ ls -1 /Users/juanc/projectscx/reads/sample_ONE/clean/*READ1*.fastq.gz > sample_ONE_read1_list
                Step 2: See if the following works

                Code:
                for i in $(cat sample_ONE_read1_list | sed s'/\-READ1.fastq.gz//'); 
                do bwa sampe ref.fasta $i\-READ1.fastq.sai $i\-READ2.fastq.sai /Users/juanc/projectscx/reads/sample_ONE/clean/$i\-READ1.fastq.gz /Users/juanc/projectscx/reads/sample_ONE/clean/$i\-READ2.fastq.gz | samtools view -bS - >  $i\.bam; done

                Comment


                • #9
                  I updated my previous post.
                  Last edited by juanc; 02-15-2016, 05:04 PM.

                  Comment


                  • #10
                    Disclaimers in the last post still apply

                    Step 1: Grab a listing of the Read1 files

                    Code:
                    $ ls -1r /Users/juanc/projectscx/reads/taxon*/clean/*READ1*.fastq.gz > taxon_read1_list
                    Step 2: Verify that the taxon_read1_list looks good

                    Code:
                    $ more taxon_read1_list
                    Step 3: I think the following should work
                    Code:
                    for i in $(cat taxon_read1_list | sed s'/\-READ1.fastq.gz//'); 
                    do bwa sampe ref.fasta $i\-READ1.sai $i\-READ2.sai /Users/juanc/projectscx/reads/$i/clean/$i\-READ1.fastq.gz /Users/juanc/projectscx/reads/$i/clean/$i\-READ2.fastq.gz | samtools view -bS - >  $i\.bam; done

                    Comment


                    • #11
                      So I executed the commands and the output report was telling me that it was looking for the .sai files in the "clean" directories. I decided to do steps 1-4 (below) in those directories and move over output files to a different directory for downstream analysis (not shown). Steps 3-4 were generously provided by @GenoMax.

                      Step 1: Generate alignments for READ1
                      Code:
                      for i in /Users/juanc/projectscx/reads/taxon*/clean/*READ1.fastq.gz;
                      do bwa aln /Users/juanc/projectscx/mapping/ref.fasta $i > ${i%READ1.fastq.gz}read1.sai; done
                      Step 2: Generate alignments for READ2
                      Code:
                      for i in /Users/juanc/projectscx/reads/taxon*/clean/*READ2.fastq.gz;
                      do bwa aln /Users/juanc/projectscx/mapping/ref.fasta $i > ${i%READ2.fastq.gz}read2.sai; done
                      Step 3: Create a list of READ1.fastq files
                      Code:
                      ls -1r /Users/juanc/projectscx/reads/taxon*/clean/*READ1*.fastq.gz > taxon_read1_list
                      Step 4: Join read pairs and convert sam to bam
                      Code:
                      for i in $(cat taxon_read1_list | sed s'/\-READ1.fastq.gz//');
                      do bwa sampe /Users/juanc/projectscx/mapping/ref.fasta $i\-READ1.sai $i\-READ2.sai $i\-READ1.fastq.gz $i\-READ2.fastq.gz | samtools view -bS - >  $i\.bam; done
                      Last edited by juanc; 02-16-2016, 10:37 AM.

                      Comment


                      • #12
                        Reviving old thread, new problem!

                        Heya,
                        Not sure if anyone will still be monitoring this thread, but I have the same question and am running into a different problem with my bashing:
                        So I started by running bwa aln for R1 and R2:
                        Code:
                        for i in /Users/xxx/Desktop/Index_align/Fastq/*R1_001.fastq.gz; \
                        do bwa aln -t 16 /Users/xxx/Desktop/Index_align/GRCh37_latest_genomic.fna $i > ${i%R1_001.fastq.gz}read1.sai; done
                        Code:
                        for i in /Users/xxx/Desktop/Index_align/Fastq/*R2_001.fastq.gz; \
                        do bwa aln -t 16 /Users/xxx/Desktop/Index_align/GRCh37_latest_genomic.fna $i > ${i%R2_001.fastq.gz}read1.sai; done
                        Making the list:
                        Code:
                        ls -lr /Users/xxx/Desktop/Index_align/Fastq/fastq_run4/*R1_001.fastq.gz > index_read1_list
                        However running the following attempt for join read pairs and convert sam to bam:
                        Code:
                        for i in $(cat index_read1_list | sed s'/\-R1_001.fastq.gz//');
                        do bwa sampe /Users/xxx/Desktop/Index_align/GRCh37_latest_genomic.fna $i\-read1.sai $i\-read2.sai $i\-R1_001.fastq.gz $i\-R2_001.fastq.gz | samtools view -bS - >  $i\.bam; done
                        throws errors for not being able to find the file names:
                        [bwa_sai2sam_pe_core] fail to open file '-rwx-------read1.sai' : No such file or directory
                        etc.
                        Which seems to point to the list specifying the wrong files? Any ideas how to fix this? Thanks!

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Techniques and Challenges in Conservation Genomics
                          by seqadmin



                          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                          Avian Conservation
                          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                          03-08-2024, 10:41 AM
                        • seqadmin
                          The Impact of AI in Genomic Medicine
                          by seqadmin



                          Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                          02-26-2024, 02:07 PM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 03-14-2024, 06:13 AM
                        0 responses
                        33 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-08-2024, 08:03 AM
                        0 responses
                        72 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-07-2024, 08:13 AM
                        0 responses
                        81 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-06-2024, 09:51 AM
                        0 responses
                        68 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X