  • Merge multiple *.sai files in bwa?

    I have Illumina paired-end reads from a single individual, spread across 3 lanes as follows:

    C09FNACXX_EGLOB-140092_GTCAGT_L008_R1_001.fastq
    C09FNACXX_EGLOB-140092_GTCAGT_L008_R1_002.fastq
    C09FNACXX_EGLOB-140092_GTCAGT_L008_R2_001.fastq
    C09FNACXX_EGLOB-140092_GTCAGT_L008_R2_002.fastq

    D0H2WACXX_EGLOB-140092_GTCAGT_L001_R1_001.fastq
    D0H2WACXX_EGLOB-140092_GTCAGT_L001_R1_002.fastq
    D0H2WACXX_EGLOB-140092_GTCAGT_L001_R2_001.fastq
    D0H2WACXX_EGLOB-140092_GTCAGT_L001_R2_002.fastq

    D0H2WACXX_EGLOB-140092_GTCAGT_L002_R1_001.fastq
    D0H2WACXX_EGLOB-140092_GTCAGT_L002_R1_002.fastq
    D0H2WACXX_EGLOB-140092_GTCAGT_L002_R2_001.fastq
    D0H2WACXX_EGLOB-140092_GTCAGT_L002_R2_002.fastq

    They all represent one individual (140092).
    I am mapping these with BWA.
    Question:
    Are R1_001 and R2_001 in each lane the read pairs?
    If yes, I will probably be able to run the aln command as follows:
    bwa aln refrencegenome.fa C09FNACXX_EGLOB-140092_GTCAGT_L008_R1_001.fastq > aln_L008_1_1.sai
    bwa aln refrencegenome.fa C09FNACXX_EGLOB-140092_GTCAGT_L008_R1_002.fastq > aln_L008_1_2.sai
    bwa aln refrencegenome.fa C09FNACXX_EGLOB-140092_GTCAGT_L008_R2_001.fastq > aln_L008_2_1.sai
    bwa aln refrencegenome.fa C09FNACXX_EGLOB-140092_GTCAGT_L008_R2_002.fastq > aln_L008_2_2.sai
    .
    ..
    ...
    If yes, then how can I merge them in bwa sampe?
    The original command is: bwa sampe ref.fa aln1.sai aln2.sai R1.fq R2.fq > aln.sam
    Here is my confusion: I don't have just aln1.sai aln2.sai R1.fq R2.fq; instead I will have
    aln1_1.sai aln1_2.sai aln2_1.sai aln2_2.sai for each lane.
    So how can I merge all these *.sai files into a single final_aln.sam?

    Maybe I should treat R1_001 and R2_001 as paired reads and run the bwa sampe command as follows:
    bwa sampe ref.fa aln_L008_1_1.sai aln_L008_2_1.sai L008_R1_001.fastq L008_R2_001.fastq > aln_L008_1.sam
    bwa sampe ref.fa aln_L008_1_2.sai aln_L008_2_2.sai L008_R1_002.fastq L008_R2_002.fastq > aln_L008_2.sam
    .
    ..
    ..
    then convert them to BAM files
    and merge them with samtools like this:
    samtools merge final.bam aln_L008_1.bam aln_L008_2.bam aln_L001_1.bam aln_L001_2.bam aln_L002_1.bam aln_L002_2.bam
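
The per-chunk approach described above can be sketched as a dry run that only prints the commands, so nothing is executed until the output has been reviewed. This is a minimal sketch, not a tested pipeline: `ref.fa`, the output names, and the `print_commands` helper are assumptions for illustration, and the prefixes are taken from the file names listed in the post. Note that `samtools merge` expects its input BAM files to be sorted; the sort step is left out here because its command-line syntax differs between samtools versions.

```shell
#!/bin/sh
# Dry run: print one bwa/samtools command per step instead of executing it.
# ref, prefixes and chunks are assumptions based on the file names in the post.
ref="ref.fa"
prefixes="C09FNACXX_EGLOB-140092_GTCAGT_L008
D0H2WACXX_EGLOB-140092_GTCAGT_L001
D0H2WACXX_EGLOB-140092_GTCAGT_L002"
chunks="001 002"

# Hypothetical helper: pairs R1/R2 files that share the same lane and chunk number.
print_commands() {
    bams=""
    for p in $prefixes; do
        for c in $chunks; do
            r1="${p}_R1_${c}.fastq"
            r2="${p}_R2_${c}.fastq"
            base="${p}_${c}"
            # Align each mate file on its own, then pair the two .sai files.
            echo "bwa aln $ref $r1 > ${base}_R1.sai"
            echo "bwa aln $ref $r2 > ${base}_R2.sai"
            echo "bwa sampe $ref ${base}_R1.sai ${base}_R2.sai $r1 $r2 > ${base}.sam"
            echo "samtools view -bS ${base}.sam > ${base}.bam"
            bams="$bams ${base}.bam"
        done
    done
    # One merge at the end over every per-chunk BAM (sort them first).
    echo "samtools merge final.bam$bams"
}

print_commands
```

Each `bwa sampe` call receives exactly two `.sai` files and the two FASTQ files they came from, so the `.sai` files themselves are never merged; the merging happens at the BAM stage.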


    I hope the question is clear.
    I will be thankful to anyone who could help me with this question.
    Cheers, Hossein.
    Last edited by hosseinv; 09-09-2012, 10:38 PM.
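
On the question of whether R1_001 and R2_001 are read pairs: one way to check, rather than assume, is to compare the read IDs of the two files record by record. The sketch below is only an illustration under stated assumptions: the helper names `read_ids` and `mates_in_sync` are made up, the demo FASTQ records are invented, and the ID is assumed to be the header text before the first space, with any trailing `/1` or `/2` removed.

```shell
#!/bin/sh
# Print the ID of every FASTQ record (line 1 of each 4-line record), keeping
# only the part before the first whitespace and dropping a trailing /1 or /2.
read_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Exit status 0 only if both files list the same read IDs in the same order.
mates_in_sync() {
    ids1=$(mktemp) || return 2
    ids2=$(mktemp) || return 2
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    if cmp -s "$ids1" "$ids2"; then sync_status=0; else sync_status=1; fi
    rm -f "$ids1" "$ids2"
    return "$sync_status"
}

# Demo with two invented records per file (not real data).
demo=$(mktemp -d)
printf '%s\n' '@read1 1:N:0:GTCAGT' 'ACGT' '+' 'IIII' \
              '@read2 1:N:0:GTCAGT' 'TTGA' '+' 'IIII' > "$demo/R1_001.fastq"
printf '%s\n' '@read1 2:N:0:GTCAGT' 'CCGT' '+' 'IIII' \
              '@read2 2:N:0:GTCAGT' 'GGAT' '+' 'IIII' > "$demo/R2_001.fastq"

if mates_in_sync "$demo/R1_001.fastq" "$demo/R2_001.fastq"; then
    echo "R1_001 and R2_001 are in sync"
else
    echo "R1_001 and R2_001 are NOT in sync"
fi
```

If the check passes for every R1/R2 file with the same lane and chunk number, those files can be given to `bwa sampe` as a pair.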

  • #2
    Or you could just concatenate the R1 fastq files into one file and repeat, in the same order, for the R2 fastq files. Then you have two fastq files for the one individual and can follow the normal mapping process without having to merge alignments later.
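
The concatenation suggested here could look roughly like the following. It is a minimal sketch using invented two-record demo files in a temporary directory; the file names, `sample_R1.fastq`/`sample_R2.fastq`, and `ref.fa` are assumptions, and the `bwa` commands are shown as comments only.

```shell
#!/bin/sh
# Tiny made-up FASTQ chunks standing in for the real lane files.
work=$(mktemp -d)
for mate in R1 R2; do
    printf '%s\n' "@read1" "ACGT" "+" "IIII" > "$work/L001_${mate}_001.fastq"
    printf '%s\n' "@read2" "TTGA" "+" "IIII" > "$work/L001_${mate}_002.fastq"
done

# Concatenate the chunks in one fixed order, and use the same order for R1 and
# R2 so that the Nth record of each combined file still belongs to the same pair.
cat "$work/L001_R1_001.fastq" "$work/L001_R1_002.fastq" > "$work/sample_R1.fastq"
cat "$work/L001_R2_001.fastq" "$work/L001_R2_002.fastq" > "$work/sample_R2.fastq"

# Each FASTQ record is 4 lines, so the record counts of the two files should match.
n1=$(( $(wc -l < "$work/sample_R1.fastq") / 4 ))
n2=$(( $(wc -l < "$work/sample_R2.fastq") / 4 ))
echo "R1 records: $n1, R2 records: $n2"

# The combined files would then go through the usual steps (not run here):
#   bwa aln ref.fa sample_R1.fastq > sample_R1.sai
#   bwa aln ref.fa sample_R2.fastq > sample_R2.sai
#   bwa sampe ref.fa sample_R1.sai sample_R2.sai sample_R1.fastq sample_R2.fastq > sample.sam
```

Listing the files explicitly, instead of relying on a wildcard, makes it easy to see that R1 and R2 are concatenated in the same order.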


    • #3
      I've heard that combining fastq files is not really recommended, as there might be issues like fragment length penalties and ... so it is best to merge BAM files. So I think you agree with the approach I wrote above, yes?


      • #4
        Originally posted by hosseinv
        I've heard that combining fastq files is not really recommended, as there might be issues like fragment length penalties and ... so it is best to merge BAM files. So I think you agree with the approach I wrote above, yes?
        Could you elaborate on this fragment length penalty issue? I've only seen this referenced with respect to Novoalign, and you're using bwa.

        What I suggested is the most parsimonious solution, and I certainly wouldn't have suggested it if I wouldn't do it myself.


        • #5
          Hi Bukowski. As I mentioned, I had only heard about that, but you are right: it happens with Novoalign. It shouldn't be the same problem with BWA.

          Could you please point me to a website or manual that describes the BWA and SAMtools options in detail? The pages at http://bio-bwa.sourceforge.net/ and http://samtools.sourceforge.net/ are sometimes unclear.

          Thanks again for your attention.

          Hosseinv
