![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
How to extract mapped and unmapped raw reads from bwa's sam file ? | vaibhavvsk | Bioinformatics | 11 | 02-07-2013 10:01 AM |
BWA rescue of multi-mapping or unmapped reads | kjlee | Bioinformatics | 6 | 07-13-2012 11:29 PM |
BWA output bitwise flag for mapped/unmapped reads | wenhuang | Bioinformatics | 1 | 08-29-2011 04:54 PM |
BWA, mostly unmapped reads | Michael.James.Clark | Bioinformatics | 4 | 03-03-2011 03:20 PM |
What are the unmapped reads | beelu | Illumina/Solexa | 1 | 09-09-2010 06:18 AM |
![]() |
|
Thread Tools |
![]() |
#1 | ||
Member
Location: Paris Join Date: Jun 2010
Posts: 28
|
![]()
Hello everyone,
I'm trying to combine the use of bwa and blat but I have a few difficulties to extract the unmapped reads from bwa. It appears they are in the first .sam file created using bwa bwasw. I used the command below to filter the SAM file and extract unaligned reads into the file unalntest.bam: Quote:
By the way, I have this message using "samtools view..." (above): Quote:
Thanks. |
||
![]() |
![]() |
![]() |
#2 | ||
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
|
![]() Quote:
Code:
$ samtools view -b -S -f 4 input.sam > output.bam Quote:
http://seqanswers.com/forums/showpos...72&postcount=2
__________________
-drd |
||
![]() |
![]() |
![]() |
#3 |
Member
Location: Paris Join Date: Jun 2010
Posts: 28
|
![]()
I have resolved the problem concerning the error message, thanks.
But for the other one, I'm still try to figure out where the problem is. I've tried your command (btw, I didn't paste perfectly mine ("-f 4" is missing...) in my first post), but the problem remains... Any idea? Last edited by Adamo; 06-21-2010 at 07:49 AM. |
![]() |
![]() |
![]() |
#4 | ||
Member
Location: Edinburgh Join Date: Dec 2008
Posts: 13
|
![]()
if you type
Code:
samtools view -? Code:
2. SAM->BAM conversion: `samtools view -bT ref.fa in.sam.gz'. Quote:
Quote:
|
||
![]() |
![]() |
![]() |
#5 |
Member
Location: Paris Join Date: Jun 2010
Posts: 28
|
![]()
I think the command's ok but still, I have no read in the output file. I don't get it!
I have only about 2% of mapped reads in my .sam file according to flagstat... I've tried both commands you suggest. Is there something I have forgotten? |
![]() |
![]() |
![]() |
#6 |
Member
Location: Edinburgh Join Date: Dec 2008
Posts: 13
|
![]()
It should work ...
Can you paste your commands and the first few lines of your sam file |
![]() |
![]() |
![]() |
#7 | ||
Member
Location: Paris Join Date: Jun 2010
Posts: 28
|
![]()
The first lines of my sam file are:
Quote:
Quote:
|
||
![]() |
![]() |
![]() |
#8 |
Member
Location: Edinburgh Join Date: Dec 2008
Posts: 13
|
![]()
I'm wondering if you have unmapped read in your sam file to beggin with.
try Code:
samtools view -bT ref.fa sam_file | samtools sort - bam_file (with no .bam extension) then samtools flagstat bam_file Then you can extract what you want from the bam file with Code:
samtools view bam_file -f 4 | less or which ever flag you want but if your original file does not contain any unmapped read then this is not going to help |
![]() |
![]() |
![]() |
#9 | |
Member
Location: Paris Join Date: Jun 2010
Posts: 28
|
![]()
Well, I'm a little confused.
My sam file contains only the reads mapped. The thing is that I read on a topic a post from lh3 (who know what he speaks about, I think!) telling someone that all the informations concerning unmapped reads where in the sam file. I assumed he talked about the one created using: Quote:
Otherwise, which one was lh3 speaking about? Or which one are you speaking about? |
|
![]() |
![]() |
![]() |
#10 |
Member
Location: Edinburgh Join Date: Dec 2008
Posts: 13
|
![]()
I'm sure Heng li knows what he's talking about so you should check your commands and log messages, parameters used, maybe rerun the alignment command (and keep the command you're running).
If it really does not output the unmapped read then send a email to the bwa mailing list. and try to be a bit more details on what you've done and what you have otherwise you won't get any help. |
![]() |
![]() |
![]() |
#11 |
Member
Location: Paris Join Date: Jun 2010
Posts: 28
|
![]()
I'll try the mailing list, thanks for the tip.
I've already run bwa several times, with default parameters and followed the steps suggested by the readme. Nothing more, nothing less... I'm quite a newbie here, excuse me for not being precise enough. Last edited by Adamo; 06-22-2010 at 02:59 AM. |
![]() |
![]() |
![]() |
#12 |
Member
Location: Paris Join Date: Jun 2010
Posts: 28
|
![]()
After further researches, it seems like bwasw doesn't include any unmapped read in the output file, unlike bwa aln.
I think I might have to write a script which could handle such a job. Thanks for your help anyway! |
![]() |
![]() |
![]() |
#13 |
Junior Member
Location: Cambridge MA Join Date: Feb 2010
Posts: 8
|
![]()
I confirm that I find that 'bwa samse/sampe' does output the unaligned reads in the sam file while 'bwa bwasw' does not. This difference in behavior is quite odd to me. Is there a reason for it? This is not even mentioned in the bwa manual. But then, bwa is still at version 0.5.8a, so I should not expect too much. I would be really interested in seeing what is being left after aligning all the reads with samse and bwasw, using blat or blast.
|
![]() |
![]() |
![]() |
#14 |
Junior Member
Location: Cambridge MA Join Date: Feb 2010
Posts: 8
|
![]()
This simple patch:
Code:
--- bwtsw2_aux.c +++ bwtsw2_aux.c @@ -520,8 +520,10 @@ bsw2_resolve_query_overlaps(b[0], opt.mask_level); } else b[1] = 0; // generate CIGAR and print SAM - gen_cigar(&opt, l, seq, pac, b[0]); - print_hits(bns, &opt, p, b[0]); + if(b[0]->n > 0) { + gen_cigar(&opt, l, seq, pac, b[0]); + print_hits(bns, &opt, p, b[0]); + } // free free(seq[0]); bsw2_destroy(b[0]); @@ -583,6 +585,7 @@ for (i = 0; i < _seq->n; ++i) { bsw2seq1_t *p = _seq->seq + i; if (p->sam) printf("%s", p->sam); + else printf("%s\t4\t*\t0\t0\t*\t*\t0\t0\t%s\t%s\n", p->name, p->seq, p->qual); free(p->name); free(p->seq); free(p->qual); free(p->sam); p->tid = -1; p->l = 0; p->name = p->seq = p->qual = p->sam = 0; Last edited by freeseek; 10-07-2010 at 04:09 PM. |
![]() |
![]() |
![]() |
#15 |
Member
Location: Germany Join Date: Mar 2009
Posts: 28
|
![]()
This thread contains a perl script to get reads that are in a .fq file but not in a sam file. It was written with TopHat in mind but it may work for you too.
I'm not sure if it helps. http://seqanswers.com/forums/showthr...eferrerid=2547 |
![]() |
![]() |
![]() |
#16 |
Junior Member
Location: Cambridge MA Join Date: Feb 2010
Posts: 8
|
![]()
@Pepe I am sure you can make a script to perform the same task, though it is conceptually easier to avoid dropping the unmapped reads while you are processing them with 'bwa bwasw' one by one rather than figuring out later what you dropped. I also wanted to mention that occasionally 'bwa bwasw' reports multiple alignments in the sam file for the same read. The only way to modify that behavior at the moment is to get into the source code.
|
![]() |
![]() |
![]() |
#17 | |
Member
Location: Sydney, Australia Join Date: Nov 2011
Posts: 24
|
![]() Quote:
I'm trying to align my sample illumina bacterial genome to an already published reference bacterial genome. The aim is to find out what genes are present in my sample genome, but missing in the reference. The plan of action was to run bwasw, and then pull out the sequence in the sample genome that do not align to the reference, then look at what these sequences are. I aligned my query genome to the ref using bwasw. The output was in sam format. I converted that to bam format using samtools view -bS aln.sam > aln.bam Then I ran the command lines quoted above. When I headed my mapped.sam file, it appears to have worked, however, my unmapped.sam file is empty,the size is zero. This is consistent for two of my sample genomes. Reading this thread, it seems that bwasw does not give out unmapped reads, thus I was trying your patch. This is the message I'm getting: bgansw:~/bwa$ patch < bwtsw2_aux.diff can't find file to patch at input line 3 Perhaps you should have used the -p or --strip option? The text leading up to this was: -------------------------- |--- bwtsw2_aux.c |+++ bwtsw2_aux.c -------------------------- File to patch: I'm probably going to ask a very silly question. I'm a newbie, so am not good with scripts. What should I enter in file to patch? Do I have to make changes or input a file name in the script? I've got paired end illumina hiseq data. Any help would be greatly appreciated. bgansw |
|
![]() |
![]() |
![]() |
#18 |
Member
Location: Sydney, Australia Join Date: Nov 2011
Posts: 24
|
![]()
Hi Freeseek,
I inputted my faq file in the file to patch. That gave me .fastq.orig file and .fastq.rej Copy pasting the .orig file bgansw:~/bwa$ head 6_combined_paired.faq.orig @GAPC_0027_FC:7:1:4477:1032#TGACCA/1 AGTATGGGGGAAATGCAAGCAAGCATTNATGCTAGTGGTTGTCAAATAGTTACCGTAGCTGTGCGCA + hhhhhhhhhhdhhghhhhhhhheddeeBededaad^dfdfffhhhhhhfhhfhhfceffdfbhhhad @GAPC_0027_FC:7:1:4477:1032#TGACCA/2 TGCTTCTAAGGTACCAATGGNNNCAGGTAACAGGTATTTAGGATCGGGTATGACCTNNNNTTTAACA + hhhhhhhhhhhefhfhf_]_BBB`]aa^acdcff`fffffffhghhgfafccY][ZBBBBW\\[[[[ @GAPC_0027_FC:7:1:4986:1032#TGACCA/1 GAAAATCCGAAAGAACCGGCAATGATTNTTTCTACGTTCCGTAAAGAAAGCAATGGTGATAAGCCTG Am copy pasting the .fastq.rej file: bgansw:~/bwa$ cat 6_combined_paired.faq.rej *************** *** 520,527 **** bsw2_resolve_query_overlaps(b[0], opt.mask_level); } else b[1] = 0; // generate CIGAR and print SAM - gen_cigar(&opt, l, seq, pac, b[0]); - print_hits(bns, &opt, p, b[0]); // free free(seq[0]); bsw2_destroy(b[0]); --- 520,529 ---- bsw2_resolve_query_overlaps(b[0], opt.mask_level); } else b[1] = 0; // generate CIGAR and print SAM + if(b[0]->n > 0) { + gen_cigar(&opt, l, seq, pac, b[0]); + print_hits(bns, &opt, p, b[0]); + } // free free(seq[0]); bsw2_destroy(b[0]); *************** *** 583,588 **** for (i = 0; i < _seq->n; ++i) { bsw2seq1_t *p = _seq->seq + i; if (p->sam) printf("%s", p->sam); free(p->name); free(p->seq); free(p->qual); free(p->sam); p->tid = -1; p->l = 0; p->name = p->seq = p->qual = p->sam = 0; --- 585,591 ---- for (i = 0; i < _seq->n; ++i) { bsw2seq1_t *p = _seq->seq + i; if (p->sam) printf("%s", p->sam); + else printf("%s\t4\t*\t0\t0\t*\t*\t0\t0\t%s\t%s\n", p->name, p->seq, p->qual); free(p->name); free(p->seq); free(p->qual); free(p->sam); p->tid = -1; p->l = 0; p->name = p->seq = p->qual = p->sam = 0; My apologies, since I dont know if I'm on the right track or not. Any help would be greatly appreciated. Regards bgansw |
![]() |
![]() |
![]() |
#19 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
If you have Illumina reads, just use bwa aln. It doesn't much matter how long your reads are, you are only aligning to a bacterial genome, how long can it take?
The post you are citing is a year old, do you know that it's suppsoed to work for whatever version of bwa you have? |
![]() |
![]() |
![]() |
#20 |
Member
Location: Sydney, Australia Join Date: Nov 2011
Posts: 24
|
![]()
Hi swbarnes2,
I ran bwa using the aln parameter, instead of the bwasw, but am not sure that its worked. The output .sam file is only 4K, while the bwasw output file was 796K. Additionally, the aln.sam file looks nothing like the bwasw.sam file. It doesnt contain reads, looks more like a .bam file. The aln.sam file also doesnt contain @SQ headers, while the bwasw file contained them, so I'm unable to convert the sam file to .bam. I tried samtools faidx ref.fa (but I get an error - [fai_build_core] different line length in sequence 'NZ_A01000034'. Segmentation fault So am stuck... I would really appreciate any help. |
![]() |
![]() |
![]() |
Thread Tools | |
|
|