SEQanswers

Old 08-12-2014, 02:15 AM   #181
superpyrin
Junior Member
 
Location: US

Join Date: Aug 2014
Posts: 7
Default Running HTSeq in parallel

Hello,

I am trying to process mapped reads in parallel. However, when using a pool of workers (from the multiprocessing package), I get the following error:

Traceback (most recent call last):
  File "testmp.py", line 15, in <module>
    out = pool.map(repr, iter(sa), chunksize=1)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 228, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 531, in get
    raise self._value
AttributeError: 'NoneType' object has no attribute 'name'


Running serially (using just the built-in 'map' function) works fine.

Do you know what could be wrong here?

Thank you.

---

Here is a simple script to reproduce the error. I am using HTSeq 0.6.1 and Python 2.7.3 on 64-bit Ubuntu 12.04.

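import HTSeq
from multiprocessing import Pool

# open the SAM file before it is used below
sa = HTSeq.SAM_Reader('test.sam')

# this works for me (serial, built-in map)
map(repr, sa)

# this does not work (each alignment has to be sent to a worker process)
pool = Pool(processes=1)
out = pool.map(repr, sa, chunksize=1)
print list(out)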
Old 08-12-2014, 02:44 AM   #182
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 992
Default

You asked me this before, but I didn't reply, right? Sorry about that, I was a bit overwhelmed with mails.

The bad news is: I have no clue why it does not work; I have never worked with the multiprocessing package. But I agree that it would be nice if this worked.

Maybe somebody else here has some idea?
Old 08-12-2014, 03:13 AM   #183
superpyrin
Junior Member
 
Location: US

Join Date: Aug 2014
Posts: 7
Default

Quote:
Originally Posted by Simon Anders View Post
You asked me this before, but I didn't reply, right? Sorry about that, I was a bit overwhelmed with mails.

The bad news is: I have no clue why it does not work; I have never worked with the multiprocessing package. But I agree that it would be nice if this worked.

Maybe somebody else here has some idea?
My guess is that it is something trivial, perhaps a missing implementation of some part of the dictionary/list interface in one of the HTSeq data structures. I tried to look into the HTSeq source, but it seems to be machine-generated code, so I quickly gave up.
Old 08-12-2014, 03:58 AM   #184
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 992
Default

Yes, the C code is machine generated, but if you look at the pyx files from which it is generated, it should be clearer. Have a look at:
http://www-huber.embl.de/users/ander...c/contrib.html
Old 08-12-2014, 06:40 AM   #185
superpyrin
Junior Member
 
Location: US

Join Date: Aug 2014
Posts: 7
Default

Quote:
Originally Posted by Simon Anders View Post
Yes, the C code is machine generated, but if you look at the pyx files which it is generated from, it should be clearer. Have a look at:
http://www-huber.embl.de/users/ander...c/contrib.html
Multiprocessing can work only with objects that can be pickled. SAM_Alignment cannot be pickled. I suspect this may be the reason it does not work.

Objects must implement __getstate__ and __setstate__ functions in order to be pickled/unpickled. Would it be difficult to implement these functions?
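
Until the alignment objects themselves can be pickled, one possible workaround is to hand the workers only plain, picklable data (raw SAM lines) and do the parsing inside each worker. The sketch below is a hedged illustration of that idea and does not use HTSeq at all; the names `summarize_sam_line`, `read_alignment_lines` and `summarize_in_parallel` are hypothetical, and the parsing covers only a few mandatory SAM columns.

```python
from multiprocessing import Pool


def summarize_sam_line(line):
    """Parse one SAM alignment line into a plain tuple, which pickles fine."""
    fields = line.rstrip("\n").split("\t")
    name, flag, chrom = fields[0], int(fields[1]), fields[2]
    pos, cigar = int(fields[3]), fields[5]
    strand = "-" if flag & 0x10 else "+"  # bit 0x10: read is reverse-complemented
    return (name, chrom, pos - 1, strand, cigar)  # convert POS to 0-based


def read_alignment_lines(path):
    """Yield alignment lines as strings, skipping '@' header lines."""
    with open(path) as handle:
        for line in handle:
            if not line.startswith("@"):
                yield line


def summarize_in_parallel(path, processes=2, chunksize=1000):
    """Only strings travel to the workers and only tuples come back."""
    pool = Pool(processes=processes)
    try:
        return pool.map(summarize_sam_line, read_alignment_lines(path), chunksize)
    finally:
        pool.close()
        pool.join()
```

Calling `summarize_in_parallel('test.sam')` from a script's main section would then return one tuple per alignment. The trade-off is that each worker has to redo the parsing, so this only pays off when the per-read work is heavier than the parsing itself.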
Old 08-12-2014, 06:51 AM   #186
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 992
Default

Quote:
Originally Posted by superpyrin View Post
Multiprocessing can work only with objects that can be pickled. SAM_Alignment cannot be pickled. I suspect this may be the reason it does not work.
I suppose you are right. Makes perfect sense.

Quote:
Objects must implement __getstate__ and __setstate__ functions in order to be pickled/unpickled. Would it be difficult to implement these functions?
I don't think so. I would just need to find the time to do it.

All one needs to do is take all the slots defined for the class in _HTSeq.SAM_Alignment, pack them into a tuple for __getstate__ and write them back for __setstate__.
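
The approach described above can be sketched in pure Python as follows. This is only an illustration under stated assumptions: `Alignment` is a hypothetical slotted class with made-up attribute names, not HTSeq's `_HTSeq.SAM_Alignment`, and whether the same two methods are sufficient for the Cython extension type has not been verified here.

```python
import pickle


class Alignment(object):
    """Hypothetical stand-in for a slotted class (not HTSeq's SAM_Alignment)."""

    __slots__ = ("read_name", "chrom", "start", "end", "strand")

    def __init__(self, read_name, chrom, start, end, strand):
        self.read_name = read_name
        self.chrom = chrom
        self.start = start
        self.end = end
        self.strand = strand

    def __getstate__(self):
        # Pack every slot into a tuple, in the order given by __slots__.
        return tuple(getattr(self, slot) for slot in self.__slots__)

    def __setstate__(self, state):
        # Write the values back into the slots in the same order.
        for slot, value in zip(self.__slots__, state):
            setattr(self, slot, value)


original = Alignment("read1", "chr", 0, 7, "+")
restored = pickle.loads(pickle.dumps(original, protocol=2))
```

After the round trip, `restored` is a new object whose slots hold the same values as `original`, which is what multiprocessing needs in order to ship objects between processes.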
Old 02-11-2015, 09:45 AM   #187
cdias
Junior Member
 
Location: Canada

Join Date: Jun 2012
Posts: 8
Default

Hi,
I'm having trouble installing HTSeq.
I pretty much followed the instructions, but when I try to run it, I get the following error:

.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/_HTSeq.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8

Any help is appreciated!
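
An undefined `PyUnicodeUCS2_...` symbol usually indicates that the extension module was compiled against a Python 2 built with 2-byte (UCS2) unicode strings, while the interpreter loading it was built with 4-byte (UCS4) strings. A quick, hedged way to check which kind of interpreter is running is to look at `sys.maxunicode`; the helper name `unicode_build` is hypothetical.

```python
import sys


def unicode_build(maxunicode=None):
    """Return 'UCS2' for a narrow unicode build and 'UCS4' for a wide one."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow Python 2 builds report 0xFFFF; wide builds report 0x10FFFF.
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


print(unicode_build())
```

If this prints `UCS4` while the error mentions `UCS2`, rebuilding HTSeq with the same interpreter that will later import it is the usual remedy.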
Old 02-27-2015, 11:40 AM   #188
zjrouc
Member
 
Location: USA

Join Date: Sep 2010
Posts: 25
Default

I just got a result from HTSeq. It showed that my gene of interest has 7 counts in the alignment. However, in IGV I can easily identify many more than 7 reads on this gene. My alignment is from STAR, and I used stringent parameters to control multiple alignments, which means there should not be any multiply aligned reads in the output. I am really confused about this.

Any suggestions?
Attached Images
File Type: jpg alignment.jpg (94.2 KB, 7 views)
Old 06-10-2015, 09:05 PM   #189
patchper
Junior Member
 
Location: hubei, china

Join Date: Jun 2015
Posts: 1
Default

I don't know where to report bugs, so I am posting here.
I think the start_d and end_d attributes of GenomicInterval have bugs.
With the SAM file below saved as sample.sam:
read1 0 chr 1 40 7M * 0 0 ATGGCGT AAAAAAA
read2 16 chr 1 40 7M * 0 0 ATGGCGT AAAAAAA

and:
>>> read1,read2 = list(itertools.islice(HTSeq.SAM_Reader('sample.sam'),2))

>>> read1
<SAM_Alignment object: Read 'read1' aligned to chr:[0,7)/+>

>>> read2
<SAM_Alignment object: Read 'read2' aligned to chr:[0,7)/->

>>> read1.iv.start,read1.iv.end,read1.iv.start_d,read1.iv.end_d
(0, 7, 0, 7)

>>> read2.iv.start,read2.iv.end,read2.iv.start_d,read2.iv.end_d
(0, 7, 6, -1)

The end_d of read2 is a negative coordinate! This behavior is mentioned in the documentation, but I think it is a bug rather than a feature.
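
The numbers reported above are consistent with the following reading of the directional attributes, sketched here as a hypothetical helper (this is not HTSeq code): on the minus strand the interval is traversed from its last base backwards, so the directional start is `end - 1` and the directional end is `start - 1`, one position past the last base in reading direction.

```python
def directional_coords(start, end, strand):
    """Return (start_d, end_d) for a zero-based, half-open interval [start, end)."""
    if strand == "-":
        # Read runs right to left: first base is end - 1, and the position
        # just past the last base (in reading direction) is start - 1.
        return end - 1, start - 1
    # '+' or unstranded: directional coordinates equal the plain ones.
    return start, end


plus_coords = directional_coords(0, 7, "+")
minus_coords = directional_coords(0, 7, "-")
```

Under this reading, an interval that starts at position 0 on the minus strand necessarily gets a directional end of -1, just as a plus-strand interval ending at the last base of a chromosome gets an `end` one past that base.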
Old 06-10-2015, 09:27 PM   #190
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823
Default

From the looks of it, the read locations are zero-based and open-ended on the right, so the "end" location is not included in the list of base locations. An end location of -2 would be a bit more concerning; otherwise it's just business as usual for how these things are done.
Old 10-22-2015, 11:26 AM   #191
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800
Default

Hi Simon,

We have been encountering an error with htseq-count (v. 0.6.1p1) on alignment files that have SAM v.1.4 tags.

The specific error is

Code:
Unknown CIGAR code 'X' encountered
I found an old request about this error, which does not appear to have been implemented in htseq-count yet: http://sourceforge.net/p/htseq/support-requests/22/

Are there plans to add support for SAM v.1.4 tags to htseq-count? For now we have been working around this by generating SAM v.1.3 tags.

Thanks.
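
For files that already contain the SAM v1.4 operators, the workaround mentioned above can also be approximated after the fact by rewriting the CIGAR strings. The sketch below is a hedged illustration, not part of htseq-count: the function name `cigar_to_v13` is hypothetical, it assumes a well-formed CIGAR string, and it simply turns `=` (sequence match) and `X` (sequence mismatch) into `M`, merging neighbouring runs of the same operation.

```python
import re

# One CIGAR element: a length followed by an operation code.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_v13(cigar):
    """Rewrite '=' and 'X' as 'M' and merge adjacent runs of the same op."""
    if cigar == "*":  # CIGAR unavailable, nothing to convert
        return cigar
    merged = []
    for length, op in _CIGAR_ELEMENT.findall(cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join("%d%s" % (length, op) for length, op in merged)
```

For example, `cigar_to_v13("3=1X3=")` collapses the three elements into a single 7-base `M` run. Note that this discards the match/mismatch distinction, which is the information the v1.4 operators were introduced to carry.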
Old 10-29-2015, 07:33 AM   #192
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 992
Default

Plans, yes -- but I'm so overwhelmed with other things that it might take a while till I get to that. Sorry.

Tags
htseq, python
