SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Extracting (Slicing) a BAM file ashkot Bioinformatics 4 12-13-2011 11:02 AM
Extracting reads from BAM files alpesh Bioinformatics 1 10-12-2011 02:59 PM
unmapped reads in Bowtie causing problems in SAMtools? wimufi SOLiD 9 09-29-2011 06:30 PM
Finding unmapped reads using samtools Ash Bioinformatics 2 10-28-2010 07:20 AM
Extracting unique reads from a .ma or .bam file? JohnK SOLiD 14 06-04-2010 12:32 AM

Reply
 
Thread Tools
Old 06-24-2011, 06:30 AM   #1
Lspoor
Member
 
Location: Fife, Scotland

Join Date: Mar 2010
Posts: 18
Unhappy extracting unmapped reads from BAM using Samtools?

Hi, I am trying to extract unmapped reads from my BWA sorted BAM file (paired end reads);

First I filter for the following things using these command lines which seems to work fine:
1) An unmapped read whose mate is mapped.
samtools view -u -f 4 -F264 alignments.bam > temp1.bam
2) A mapped read who's mate is unmapped
samtools view -u -f 8 -F 260 alignments.bam > temp2.bam
3) Both reads of the pair are unmapped
samtools view -u -f 12 -F 256 alignments.bam > temp3.bam

Then I try to merge the files and sort it so it's ordered by read name using the following command
samtools merge -u - temp[123].bam | samtools sort -n - unmapped

BUT I get this error message:
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.


My next step would be to convert the BAM to fastq but I can't seem to get past this error despite trying out various other options. I get the same message after just trying to samtools merge comman on its own?

I also tried the bam2fastq script from Hudson alpha but it only seemed to output reads where both in the pair had not mapped, but I want to extract all unmapped reads?

Any help MUCH appreciated!
Lspoor is offline   Reply With Quote
Old 06-24-2011, 06:51 AM   #2
volks
Member
 
Location: hd.de

Join Date: Jun 2010
Posts: 81
Default

Quote:
Originally Posted by Lspoor View Post
[bam_header_read] EOF marker is absent. The input is probably truncated.
you get this warning when you read an uncompressed BAM from hard drive. i dont think this is the error.

you should sort before merge.

also you might filter the same read into different files. im not sure if this doesnt cause problems if you merge them later on.
volks is offline   Reply With Quote
Old 06-24-2011, 06:55 AM   #3
Lspoor
Member
 
Location: Fife, Scotland

Join Date: Mar 2010
Posts: 18
Default

OK Thanks, maybe I have duplicated reads somewhere? I'll try sorting the files before merging them and see if I get anywhere.
Lspoor is offline   Reply With Quote
Old 06-24-2011, 07:42 AM   #4
Lspoor
Member
 
Location: Fife, Scotland

Join Date: Mar 2010
Posts: 18
Default

Thanks I got that to work, now I have a BAM file containing an alignment made from single (mixed) reads (whose corresponding pair was not of good enough quality to include in the analysis), so they are a mixture of reads sequenced in each direction ie Some have headers ending in 1: @EBRI093151_0019_FC:1:2:2016:3554#CGATGN/1 and some have headers ending in 2:@EBRI093151_0019_FC:1:2:2042:1391#CGATCT/2.

These reads were mapped using BWA to produce a BAM fie.

I am trying to filter the unmapped reads out of this file using the following command line:
samtools view -uf 4 alignments.bam | bamToFastq -bam stdin -fq1 unmapped.fastq -fq2 empty.fastq

(This is using Hydra-sv bamtoFastq command). But it only outputs the reads ending in 2.

I'm guessing this command can't handle a mixed input of reads (which are not paired, but are similarly not single end!). Do you know if there is any way I can adapt this, or to filter the input BAM file to separate the reads perhaps?
Lspoor is offline   Reply With Quote
Old 02-16-2012, 10:09 PM   #5
Delphine Song
Junior Member
 
Location: China

Join Date: Aug 2011
Posts: 6
Default

Hi Lspoor,

I'm wondering whether the command below can extract all reads unmapped from a bam file.

samtools view -u -f 4 alignments.bam>tmp.bam
Delphine Song is offline   Reply With Quote
Old 02-17-2012, 12:05 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by volks View Post
you get this warning when you read an uncompressed BAM from hard drive. i dont think this is the error.
Agreed. This warning is a false positive due to a bug in samtools -u output, there is a one line fix but it hasn't been applied yet:
http://seqanswers.com/forums/showthr...5067#post65067
http://sourceforge.net/mailarchive/m...sg_id=28843382
maubp is offline   Reply With Quote
Old 02-17-2012, 12:25 AM   #7
volks
Member
 
Location: hd.de

Join Date: Jun 2010
Posts: 81
Default

Quote:
Originally Posted by Delphine Song View Post
Hi Lspoor,

I'm wondering whether the command below can extract all reads unmapped from a bam file.

samtools view -u -f 4 alignments.bam>tmp.bam
in most cases you should go for:
samtools view -b -f 4 alignments.bam > tmp.bam

Options: -b output BAM
-u uncompressed BAM output (force -b)
volks is offline   Reply With Quote
Old 10-03-2012, 11:08 AM   #8
cwisch88
Member
 
Location: St. Louis

Join Date: Jan 2012
Posts: 15
Default Why make 3 tmp files

I've recently been asked to do the same thing that Lspoor has posted about. Why do we need 3 tmp files? Wouldn't using -f 4 retrieve all unmapped lines?

My guess would be that this keeps all the reads in some kind of order, but if you are going to merge and sort them in the end what is the point?
cwisch88 is offline   Reply With Quote
Old 10-03-2012, 01:33 PM   #9
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Quote:
Originally Posted by cwisch88 View Post
I've recently been asked to do the same thing that Lspoor has posted about. Why do we need 3 tmp files? Wouldn't using -f 4 retrieve all unmapped lines?
Yes. The OP also wanted mapped reads whose mates did not map, that's what step 2 does.
swbarnes2 is offline   Reply With Quote
Old 10-04-2012, 07:38 AM   #10
cwisch88
Member
 
Location: St. Louis

Join Date: Jan 2012
Posts: 15
Default Just to clarify...

Thanks swbarnes!

Using these bits allows for lots of specificity on what you want to retrieve from your alignments, it's just a little cumbersome to someone who hasn't strayed into this portion of the samtools.

Thanks for confirming that -f 4 just retrieves the unmapped reads.
cwisch88 is offline   Reply With Quote
Old 10-04-2012, 08:41 AM   #11
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

-f 4 will get every read with the 4 flagged. That usually means unmapped. There is one exception that I know of. If your read hangs off the end of one of your reference sequences, bwa will map it at the end of that reference, and give it the 4 flag, as a sign that something is off. (This confuses Picard Tools, and you have to either fix the .bam, or tell Picard tools to be lenient about that problem)
swbarnes2 is offline   Reply With Quote
Old 10-04-2012, 09:14 AM   #12
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by swbarnes2 View Post
-f 4 will get every read with the 4 flagged. That usually means unmapped. There is one exception that I know of. If your read hangs off the end of one of your reference sequences, bwa will map it at the end of that reference, and give it the 4 flag, as a sign that something is off. (This confuses Picard Tools, and you have to either fix the .bam, or tell Picard tools to be lenient about that problem)
That isn't an exception. FLAG bit 0x4 isn't a sign that something is off, it is the authoritative indicator that a read is unmapped.

What you're describing in BWA is the fact that the implementation actually searches against all your references concatenated together - and then checks if you've got an unlucky read at the boundary. In this case it isn't a real mapping, so FLAG bit 0x4 is set (unmapped). Confusingly (but allowed within the specification) BWA leaves the potential mapping position in place.
maubp is offline   Reply With Quote
Old 05-21-2013, 01:51 PM   #13
wangli
Member
 
Location: Texas

Join Date: Apr 2012
Posts: 48
Default

Quote:
Originally Posted by swbarnes2 View Post
-f 4 will get every read with the 4 flagged. That usually means unmapped. There is one exception that I know of. If your read hangs off the end of one of your reference sequences, bwa will map it at the end of that reference, and give it the 4 flag, as a sign that something is off. (This confuses Picard Tools, and you have to either fix the .bam, or tell Picard tools to be lenient about that problem)
Could you please further specify how to fix the .bam file? I donot want to use the lenient stringency in Picard, because it causes my further problem when calling SNPs in GATK.
wangli is offline   Reply With Quote
Old 08-22-2013, 01:55 AM   #14
lhlbio
Junior Member
 
Location: CHINA

Join Date: Aug 2013
Posts: 4
Default

hi,
very thanks for that ,now I have BAM files and how can I extracting exon reads from BAM using Samtools?
any ideas will be awesome!


Thanks,
huili
lhlbio is offline   Reply With Quote
Old 08-22-2013, 06:20 AM   #15
cwisch88
Member
 
Location: St. Louis

Join Date: Jan 2012
Posts: 15
Default

lhlbio,

If you have the annotations for your reference you can collect them all in the: 'chr1:start-end' format and then run this command:

samtools view aln.sorted.bam chr2:20,100,000-20,200,000 [region2 [...]]

If that doesn't work because there are too many arguments; I'd do each one separately in a shell script and then 'samtools merge' the results.
cwisch88 is offline   Reply With Quote
Old 08-22-2013, 01:24 PM   #16
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Quote:
Originally Posted by lhlbio View Post
hi,
very thanks for that ,now I have BAM files and how can I extracting exon reads from BAM using Samtools?
any ideas will be awesome!


Thanks,
huili

I assume that you have aligned reads to the whole genome, and now want to filter for only those reads that hit exons?

Get a .bed file of exon boundaries, and use BEDTools intersectBed to filter the .bam;
swbarnes2 is offline   Reply With Quote
Old 08-25-2013, 12:02 AM   #17
lhlbio
Junior Member
 
Location: CHINA

Join Date: Aug 2013
Posts: 4
Default

Quote:
Originally Posted by swbarnes2 View Post
I assume that you have aligned reads to the whole genome, and now want to filter for only those reads that hit exons?

Get a .bed file of exon boundaries, and use BEDTools intersectBed to filter the .bam;
Yes, some exom sequencing are what I have done ,now I can try to filter my exom reads by applying your command ,but I havn't use BEDTools before ! now I must learn someting !
thanks very much!
lhlbio is offline   Reply With Quote
Old 08-25-2013, 12:22 AM   #18
lhlbio
Junior Member
 
Location: CHINA

Join Date: Aug 2013
Posts: 4
Question

Quote:
Originally Posted by cwisch88 View Post
lhlbio,

If you have the annotations for your reference you can collect them all in the: 'chr1:start-end' format and then run this command:

samtools view aln.sorted.bam chr2:20,100,000-20,200,000 [region2 [...]]

If that doesn't work because there are too many arguments; I'd do each one separately in a shell script and then 'samtools merge' the results.
very thanks, I use the command "samtools view aln.sorted.bam chr2:20,100,000-20,200,000 [region2 [...]]",and it does work,but I can't understand it on the screen of linux operation interface.
if I can write it into some files as using "R" do? or other methods ? any help will be appreciated!
lhlbio is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:16 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO