SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
pysam: a python wrapper of samtools TGBelgard Bioinformatics 36 01-28-2016 03:13 AM
Using Blast in Python susanklein Bioinformatics 2 02-24-2014 02:19 AM
Looking for a Python Ninja tdeboer Industry Jobs! 0 12-06-2013 12:12 PM
Python marqudiego General 4 06-30-2013 11:14 PM
python dror Bioinformatics 0 11-29-2010 04:22 AM

Reply
 
Thread Tools
Old 05-05-2014, 12:46 PM   #1
IonTom
Member
 
Location: Germany

Join Date: Apr 2014
Posts: 32
Default Python samtools BytesIO

I currently have a a problem with samtools and python.
Iḿ trying to write a sam or bam file into a file like object
which works well using the following code

import sh
import io

#define samtools as a python functions
samtools = sh.Command("samtools")

#where is the input sam
sam_input = "test.sam"

#generate a file like object
sam_IO2 = io.BytesIO()

#let samtools view write in the file like object
samtools.view(sam_output,S=True,h=True,_out = sam_IO2)

#show the sam file
sam_IO2.getvalue()

or

bam_IO2 = io.BytesIO()
samtools.view(bam_input,_out = bam_IO2)

#show the bam file
bam_IO2.getvalue()


but now i try to read from the file like object for both cases:

samtools.view("/dev/stdin",S=True,_in = sam_IO2)

samtools.view("/dev/stdin",_in = bam_IO2)

it fails in both tcases.


Does anyone have an idea how to do this correctly ?
Would be really helpful.

Last edited by IonTom; 05-05-2014 at 01:28 PM.
IonTom is offline   Reply With Quote
Old 05-05-2014, 01:57 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Without actually going through your code, is there a reason you're not just using pysam? That would seem simpler.

Also, it's generally better to give the actual error message produced instead of just writing "it fails".
dpryan is offline   Reply With Quote
Old 05-05-2014, 02:03 PM   #3
IonTom
Member
 
Location: Germany

Join Date: Apr 2014
Posts: 32
Default

Sorry forgot tom post this.
Here is the error message:

samtools.view("/dev/stdin",_in = bam_IO2)

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/sh.py", line 769, in __call__
return RunningCommand(cmd, call_args, stdin, stdout, stderr)
File "/usr/local/lib/python2.7/dist-packages/sh.py", line 330, in __init__
self.wait()
File "/usr/local/lib/python2.7/dist-packages/sh.py", line 334, in wait
self._handle_exit_code(self.process.wait())
File "/usr/local/lib/python2.7/dist-packages/sh.py", line 348, in _handle_exit_code
self.process.stderr
sh.ErrorReturnCode_1:

RAN: '/usr/bin/samtools view /dev/stdin'

STDOUT:

STDERR:
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "/dev/stdin".

Isn't pysam more or less doing the same a the sh solution ?
Basically calling the c implementation ?
So would it help to interact with the file like objects ?
IonTom is offline   Reply With Quote
Old 05-05-2014, 02:17 PM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Pysam actually wraps the C API, so it's a bit more useful. Why bother reinventing the wheel.
dpryan is offline   Reply With Quote
Old 05-06-2014, 08:49 AM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

I would also recommend trying pysam instead, but it could be your code is breaking due to a simple user error:

Where is the BytesIO handle position within the data stream? Does a
Code:
.seek(0)
to ensure it is at the start help? My guess is the handle position is at the end of the data, and thus when samtools tries to read from it there is no (more) data.
maubp is offline   Reply With Quote
Old 05-13-2014, 04:06 PM   #6
IonTom
Member
 
Location: Germany

Join Date: Apr 2014
Posts: 32
Default

Here is a solution to the problem above using pipes in python

bam_input = "input.bam"
bam_output = "output.bam"

outputter = open(bam_output,mode = "w")

p1 = subprocess.Popen(['samtools',"view","-u",bam_output],stdout=subprocess.PIPE,bufsize=1) #Set up the samtools view command to output to pipe
p2 = subprocess.Popen(["samtools","view","-b","-"], stdin=p1.stdout,stdout=outputter,bufsize = 1) #send p1's output to p2 and p2 to bam file
p1.stdout.close() #make sure we close the output so p2 doesn't hang waiting for more input
output = p2.communicate()[0] #run our commands

outputter.close()

This example is just a toy example but the principle can also be used for real tasks.

Last edited by IonTom; 05-13-2014 at 04:19 PM.
IonTom is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:04 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO