SEQanswers

Old 07-25-2010, 07:50 AM   #1
genelab
Member
 
Location: honston

Join Date: Nov 2009
Posts: 27
Merge sam/bam file

Hello all,
I have paired-end reads (1.fq and 2.fq) and mapped them with 'bwa aln'. I don't want to generate paired-end alignments in SAM format with 'bwa sampe'; instead I want to generate a SAM file for each end separately with 'bwa samse' and then merge the two SAM files:

bwa aln reference.fa 1.fq > 1.sai
bwa aln reference.fa 2.fq > 2.sai
bwa samse reference.fa 1.sai 1.fq > 1.sam
bwa samse reference.fa 2.sai 2.fq > 2.sam

How do I merge 1.sam and 2.sam?

Can someone give me suggestions? Thanks!
Old 07-25-2010, 10:11 AM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by genelab View Post
Hello all,
I have paired-end reads (1.fq and 2.fq) and mapped them with 'bwa aln'. I don't want to generate paired-end alignments in SAM format with 'bwa sampe'; instead I want to generate a SAM file for each end separately with 'bwa samse' and then merge the two SAM files:

bwa aln reference.fa 1.fq > 1.sai
bwa aln reference.fa 2.fq > 2.sai
bwa samse reference.fa 1.sai 1.fq > 1.sam
bwa samse reference.fa 2.sai 2.fq > 2.sam

How do I merge 1.sam and 2.sam?

Can someone give me suggestions? Thanks!
How do you want them merged (what is your desired result)? What is your motivation for not using "bwa sampe"?
Old 07-25-2010, 05:43 PM   #3
genelab
Member
 
Location: honston

Join Date: Nov 2009
Posts: 27

Quote:
Originally Posted by nilshomer View Post
How do you want them merged (what is your desired result)? What is your motivation for not using "bwa sampe"?

I want them merged into one SAM file that contains the mapping results of both ends.
Or should I convert the SAM files to BAM and then merge the two?
Old 07-25-2010, 06:22 PM   #4
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by genelab View Post
I want them merged into one SAM file that contains the mapping results of both ends.
Or should I convert the SAM files to BAM and then merge the two?
Again, why not use "sampe"? Alternatively, if the two reads of each pair have the same name, you could sort by read name and run "samtools fixmate".
Old 07-26-2010, 04:03 AM   #5
genelab
Member
 
Location: honston

Join Date: Nov 2009
Posts: 27

Quote:
Originally Posted by nilshomer View Post
Again, why not use "sampe"? Alternatively, if the two reads of each pair have the same name, you could sort by read name and run "samtools fixmate".
I also used "sampe" to get a SAM result containing the mapping information for both reads. However, I found that many mapped records contain soft-clipped alignments, for example records with CIGAR strings "59M16S", "32M43S" or "3S45M27S".
These reads are considered mapped in the "sampe" result, but they have no mapped records if I use "samse".

In addition, I don't need the paired-end information, just the mapping information.

Thanks!
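
One way to put a number on that difference is to count the records whose CIGAR string contains a soft clip in each output file. The following is only a rough sketch: `count_soft_clipped` is a made-up helper name, and it simply looks for an `S` operation in the sixth SAM column.

```shell
# count_soft_clipped is a hypothetical helper, not part of bwa or samtools.
count_soft_clipped() {
    # $1: a SAM file. Header lines start with '@' and are skipped.
    # Field 6 of an alignment record is the CIGAR string.
    awk -F '\t' '!/^@/ && $6 ~ /S/ { n++ } END { print n + 0 }' "$1"
}
```

Running `count_soft_clipped sampe.sam` and `count_soft_clipped 1.sam` (file names assumed here) would then show how many soft-clipped records each approach produced.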
Old 07-26-2010, 07:26 AM   #6
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by genelab View Post
I also used "sampe" to get a SAM result containing the mapping information for both reads. However, I found that many mapped records contain soft-clipped alignments, for example records with CIGAR strings "59M16S", "32M43S" or "3S45M27S".
These reads are considered mapped in the "sampe" result, but they have no mapped records if I use "samse".

In addition, I don't need the paired-end information, just the mapping information.

Thanks!
Did you try "samtools merge"?
Old 07-28-2010, 06:12 PM   #7
genelab
Member
 
Location: honston

Join Date: Nov 2009
Posts: 27

Quote:
Originally Posted by nilshomer View Post
Did you try "samtools merge"?
"samtools merge" can merge the BAM files, thanks!
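
For anyone finding this later, the steps could look roughly like the sketch below. It assumes a samtools 1.x install (the 2010-era 0.1.x releases used a different `sort` syntax) and that both SAM files were produced against the same reference, so their @SQ header lines match. `merge_single_end_sams` is just an illustrative name.

```shell
# Sketch only: assumes samtools 1.x is on PATH.
# merge_single_end_sams is a hypothetical helper, not a samtools command.
merge_single_end_sams() {
    # $1, $2: input SAM files (with headers); $3: merged output BAM.
    # 'samtools sort' coordinate-sorts each input and writes BAM.
    samtools sort -o "$1.sorted.bam" "$1" &&
    samtools sort -o "$2.sorted.bam" "$2" &&
    # 'samtools merge' expects inputs that are already sorted.
    samtools merge "$3" "$1.sorted.bam" "$2.sorted.bam"
}
```

With the files from the first post this would be called as `merge_single_end_sams 1.sam 2.sam merged.bam`.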
Old 07-28-2010, 08:16 PM   #8
hl450
Junior Member
 
Location: IN, USA

Join Date: Jun 2010
Posts: 8

Quote:
Originally Posted by genelab View Post
I also used "sampe" to get a SAM result containing the mapping information for both reads. However, I found that many mapped records contain soft-clipped alignments, for example records with CIGAR strings "59M16S", "32M43S" or "3S45M27S".
These reads are considered mapped in the "sampe" result, but they have no mapped records if I use "samse".

In addition, I don't need the paired-end information, just the mapping information.

Thanks!

You can use sampe with the -s option to disable Smith-Waterman alignment of unmapped mates. I think that's where you are getting those clipped sequences.
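
As a sketch, using the file names from the first post, the call could be wrapped like this. `sampe_without_sw` is a made-up name, and the reference is assumed to be indexed already with `bwa index`.

```shell
# sampe_without_sw is a hypothetical wrapper, not a bwa subcommand.
sampe_without_sw() {
    # $1: reference FASTA (already indexed with 'bwa index')
    # $2, $3: .sai files from 'bwa aln'; $4, $5: the matching FASTQ files
    # -s turns off the Smith-Waterman rescue of the unmapped mate.
    bwa sampe -s "$1" "$2" "$3" "$4" "$5"
}
```

For example: `sampe_without_sw reference.fa 1.sai 2.sai 1.fq 2.fq > paired.sam`.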
Old 07-30-2010, 07:11 AM   #9
genelab
Member
 
Location: honston

Join Date: Nov 2009
Posts: 27

Quote:
Originally Posted by hl450 View Post
You can use sampe with the -s option to disable Smith-Waterman alignment of unmapped mates. I think that's where you are getting those clipped sequences.

This is a great suggestion. sampe with the -s option produced the same BAM records as the "samtools merge" result built from the two separate single-end SAM/BAM files.

Can I ask another question: what is the reason for the soft clipping? Our data are RNA-Seq FASTQ reads; does the soft clipping come from exon junctions?


Thanks for your great help!
Old 06-04-2013, 03:50 PM   #10
camelbbs
Member
 
Location: United States

Join Date: Jun 2011
Posts: 49

I think SAM files can be merged with:
cat sam1 sam2 > sam

and then sorted.

Am I right?
Old 06-04-2013, 04:29 PM   #11
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

You'll end up with both SAM file headers if you just use cat. You could do this (the <( ) process substitution needs bash):

Code:
cat sam1 <(grep -v '^@' sam2) > merged_sam.sam
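
If the merged file also needs sorting, as asked above, a plain-text sketch along these lines keeps the header on top and sorts only the alignment records. `merge_sam_text` is a made-up name, and it assumes both files were aligned to the same reference so the first header can be reused. The sort is lexicographic on reference name, which is not necessarily the @SQ order samtools uses, and the @HD sort-order tag is not updated, so `samtools sort` is likely the safer route for downstream tools.

```shell
# merge_sam_text is a hypothetical helper built from standard Unix tools.
merge_sam_text() {
    # $1, $2: SAM files aligned to the same reference.
    # Emit header lines (starting with '@') from the first file only.
    grep '^@' "$1"
    # Emit alignment records from both files, sorted by reference name
    # (field 3) and then numerically by leftmost position (field 4).
    cat "$1" "$2" | grep -v '^@' |
        LC_ALL=C sort -t "$(printf '\t')" -k3,3 -k4,4n
}
```

Usage would be something like `merge_sam_text sam1 sam2 > merged_sorted.sam`.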
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Old 06-04-2013, 04:33 PM   #12
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Quote:
Originally Posted by genelab View Post
Can I ask another question: what is the reason for the soft clipping? Our data are RNA-Seq FASTQ reads; does the soft clipping come from exon junctions?
I'm not sure if it's the only case, but yes. BWA is not an RNA-seq mapper in a strict sense: it doesn't attempt to discover reads that align across splice junctions. It's also not a "local" aligner; otherwise it wouldn't fail to align reads across exons that are shorter than the read length (which I've seen myself).

Not sure if you're aware of it, but BWA now ships with a newer aligner called 'bwa mem'. This aligner is more powerful for aligning RNA-seq data to a genome, and it will soft-clip reads as well. If these aligners didn't soft-clip reads, you'd end up losing a lot of data: something like 20 or 30% of the exons in the mm10 gene annotation are shorter than 100 bp, which is a pretty typical read length.