Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge sam/bam file

    Hello all,
    I have pair-end reads (1.fq and 2.fq ) data and mapped them with 'bwa aln', I don't want to generate alignments in the SAM format in paired-end way with 'bwa sampe', but generate sam format with 'bwa samse' respectively, and then merge the two sam file

    bwa aln reference.fa 1.fq > 1.sai
    bwa aln reference.fa 2.fq > 2.sai
    bwa samse reference.fa 1.sai 1.fq > 1.sam
    bwa samse reference.fa 2.sai 2.fq > 2.sam

    How to merge 1.sam and 2.sam?

    can someone give me suggestions? thanks!

  • #2
    Originally posted by genelab View Post
    Hello all,
    I have pair-end reads (1.fq and 2.fq ) data and mapped them with 'bwa aln', I don't want to generate alignments in the SAM format in paired-end way with 'bwa sampe', but generate sam format with 'bwa samse' respectively, and then merge the two sam file

    bwa aln reference.fa 1.fq > 1.sai
    bwa aln reference.fa 2.fq > 2.sai
    bwa samse reference.fa 1.sai 1.fq > 1.sam
    bwa samse reference.fa 2.sai 2.fq > 2.sam

    How to merge 1.sam and 2.sam?

    can someone give me suggestions? thanks!
    How do you want them merged (what is your desired result)? What is your motivation for not using "bwa sampe"?

    Comment


    • #3
      Originally posted by nilshomer View Post
      How do you want them merged (what is your desired result)? What is your motivation for not using "bwa sampe"?

      I want them merged into one sam file which contains the mapping results of both the pair-ends,
      or
      Should i convert the sam two bam, and then merge the two?

      Comment


      • #4
        Originally posted by genelab View Post
        I want them merged into one sam file which contains the mapping results of both the pair-ends,
        or
        Should i convert the sam two bam, and then merge the two?
        Again, why not use "sampe"? Alternatively, if each paired-read read has the same name, you could sort by read name and run "samtools fixmate".

        Comment


        • #5
          Originally posted by nilshomer View Post
          Again, why not use "sampe"? Alternatively, if each paired-read read has the same name, you could sort by read name and run "samtools fixmate".
          I had also used the "sampe" to get the sam result containing both read mapping information, however, I found many mapped read records contain softly clipped alignment, such as the sam records having cigar "59M16S", 32M43S" or "3S45M27S".
          These reads are consided to be mapped reads in "sampe result". but these reads will not have "mapped records" if i use "samse".

          In addtion, i don't need the pair-end information, and just need the mapping information.

          Thanks!

          Comment


          • #6
            Originally posted by genelab View Post
            I had also used the "sampe" to get the sam result containing both read mapping information, however, I found many mapped read records contain softly clipped alignment, such as the sam records having cigar "59M16S", 32M43S" or "3S45M27S".
            These reads are consided to be mapped reads in "sampe result". but these reads will not have "mapped records" if i use "samse".

            In addtion, i don't need the pair-end information, and just need the mapping information.

            Thanks!
            Did you try "samtools merge"?

            Comment


            • #7
              Originally posted by nilshomer View Post
              Did you try "samtools merge"?
              "samtools merge" can merge the bam file, thanks!

              Comment


              • #8
                Originally posted by genelab View Post
                I had also used the "sampe" to get the sam result containing both read mapping information, however, I found many mapped read records contain softly clipped alignment, such as the sam records having cigar "59M16S", 32M43S" or "3S45M27S".
                These reads are consided to be mapped reads in "sampe result". but these reads will not have "mapped records" if i use "samse".

                In addtion, i don't need the pair-end information, and just need the mapping information.

                Thanks!

                You can use sampe with -s option to disable Smith-Waterman alignments of unmapped mates. I think that's where you are getting those clipped sequences.

                Comment


                • #9
                  Originally posted by hl450 View Post
                  You can use sampe with -s option to disable Smith-Waterman alignments of unmapped mates. I think that's where you are getting those clipped sequences.

                  This is a great suggestion, sampe with -s option got the same bam records as the "samtools merge" result merged from two separate end sam/bam files.

                  Can i ask other question, what is the reason of the soft clipping? Our data is RNA-Seq fq reads, is the soft clipped come from the exon junctions?


                  Thanks for your great help!

                  Comment


                  • #10
                    I think sam files can be merged as :
                    cat sam1 sam2 > sam

                    and then sort it.

                    Am i right?

                    Comment


                    • #11
                      You'll end up with both sam file headers if you just use cat. You could do this:

                      Code:
                      cat sam1 <(grep -v '^@' sam2) > merged_sam.sam
                      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                      Salk Institute for Biological Studies, La Jolla, CA, USA */

                      Comment


                      • #12
                        Originally posted by genelab View Post
                        Can i ask other question, what is the reason of the soft clipping? Our data is RNA-Seq fq reads, is the soft clipped come from the exon junctions?
                        I'm not sure if it's the only case but yes. BWA is not an RNA-seq mapper in a strict sense. It doesn't attempt to discover reads that align across splice junctions. It's also not a "local" aligner otherwise it wouldn't fail to align reads across exons that are shorter than the read lengths (which I've seen myself).

                        Not sure if you're aware of it but BWA has a new aligner packaged with it called 'bwa mem'. This aligner is more powerful for aligning RNA-seq data to a genome. It will soft-clip reads as well. If these aligners don't soft clip reads you'll end up losing a lot of data. Something like 20 or 30% of the exons in the mm10 gene annotation are shorter than 100 bp which is a pretty typical read length.
                        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                        Salk Institute for Biological Studies, La Jolla, CA, USA */

                        Comment


                        • #13
                          samtools merge

                          but which is the exact sintaxis when you have eight files bam and you want to have one file?

                          Comment


                          • #14
                            am creating a sam file, with 2 fastq files, using bwa
                            I apply the following command

                            bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz
                            V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz
                            > V350019555_L03_B5GHUMqcnrRAABA-556.sam



                            The alignment begins to be generated, but after 15 hours, the following appears, and it generates a 0-bit sam file

                            V350019555L3C006R0051051033 16 chr21 9651911 29 150M * TGTTTTCTCTTTTTACAGCAAGGACAGTTTATTAAACACACTGTTTTAAACCTTATTTTTTCTCTAAATAATATTTCATATAAGTCAATATGTAGACTTTATGTATTTTTAAAATGAAGACATAGCATTCCATTGAACTTAGACAAAAAT EFFFFFFEGFFFFGEFF9FEFFFFGGFCFFFFGFFFFFFDFFFFFFFFFFFFFGAFFFCGBFDF@G@FGFFGFGFFFFGFFFFGFGFCFFFFFFFDGFGFGFFGFFFFFAFFFFFFEFFFFFFFFFFGDFFFF@FG8FF?FFFFFFGFFF NM:i:1 MD:Z:23G126 AS:i:145 XS:i:131 XA:Z:chr4,+189704487,150M,4;
                            [main] Version: 0.7.17-r1188
                            [main] CMD: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz
                            [main] Real time: 26995.284 sec; CPU: 33394.341 sec
                            zsh: command not found: V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz



                            I have previously analyzed the files with fasqc, and they seem correct


                            What is happening, what should I do?

                            Comment


                            • #15
                              bwa 2 files fasq to 1 fil sam

                              I am creating a sam file, with 2 fastq files, using bwa
                              I apply the following command

                              bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz
                              V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz
                              > V350019555_L03_B5GHUMqcnrRAABA-556.sam



                              The alignment begins to be generated, but after 15 hours, the following appears, and it generates a 0-bit sam file

                              V350019555L3C006R0051051033 16 chr21 9651911 29 150M * TGTTTTCTCTTTTTACAGCAAGGACAGTTTATTAAACACACTGTTTTAAACCTTATTTTTTCTCTAAATAATATTTCATATAAGTCAATATGTAGACTTTATGTATTTTTAAAATGAAGACATAGCATTCCATTGAACTTAGACAAAAAT EFFFFFFEGFFFFGEFF9FEFFFFGGFCFFFFGFFFFFFDFFFFFFFFFFFFFGAFFFCGBFDF@G@FGFFGFGFFFFGFFFFGFGFCFFFFFFFDGFGFGFFGFFFFFAFFFFFFEFFFFFFFFFFGDFFFF@FG8FF?FFFFFFGFFF NM:i:1 MD:Z:23G126 AS:i:145 XS:i:131 XA:Z:chr4,+189704487,150M,4;
                              [main] Version: 0.7.17-r1188
                              [main] CMD: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz
                              [main] Real time: 26995.284 sec; CPU: 33394.341 sec
                              zsh: command not found: V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz


                              I have previously analyzed the files with fasqc, and they seem correct


                              What is happening, what should I do?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              27 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              30 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              26 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              52 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X