SEQanswers

Old 04-09-2013, 11:15 AM   #1
kakseq
Junior Member
 
Location: Ithaca NY

Join Date: Apr 2013
Posts: 5
Default import HTSeq & random sampling

Hi, warning that this is a noob question.

I am running into problems importing HTSeq. The context is that I am trying to sample randomly from a fastq file (not paired reads). After trying a few methods and always running out of memory I think I need to use the script Simon Anders posted here that relies on HTSeq: http://seqanswers.com/forums/showthread.php?t=12070

I navigate to the directory where my script and the fastq file of interest are saved, then call the script as follows in order to randomly subsample a tenth of the reads:

python subsamplewithHTSeq.py 0.10 SRR12345.fastq out_tenth_SRR12345.fastq

However, I always get a message that HTSeq can't be imported. I have installed HTSeq (version HTSeq-0.5.4p1.win32-py2.7.exe), along with Numpy and I am using 32 bit python 2.7 on Windows 7. I have read the tour through HTSeq, but I still can't figure this out. I am a newbie to computational work and any help would be greatly appreciated. The error is below:

C:\Users\kak\Desktop\ShortReads\fastqformatted>python subsamplewithHTSeq.py 0.1 fastq_SRR12345.fastq tenth_SRR12345.fastq
Traceback (most recent call last):
  File "subsamplewithHTSeq.py", line 15, in <module>
    import HTSeq
  File "C:\Python27\lib\site-packages\HTSeq\__init__.py", line 9, in <module>
    from _HTSeq import *
  File "_HTSeq.pyx", line 14, in init HTSeq._HTSeq (src/_HTSeq.c:31058)
  File "C:\Python27\lib\site-packages\HTSeq\StepVector.py", line 26, in <module>
    _StepVector = swig_import_helper()
  File "C:\Python27\lib\site-packages\HTSeq\StepVector.py", line 22, in swig_import_helper
    _mod = imp.load_module('_StepVector', fp, pathname, description)
ImportError: DLL load failed: The specified module could not be found.
HTSeq-0.5.4p1.win32-py2.7.exe
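
For readers hitting the same memory limits: a fraction-based subsample does not need the whole file in memory. The sketch below is a plain-Python illustration, not Simon Anders's HTSeq script. The function name `subsample_fastq` and its `seed` parameter are made up for this example, and it assumes every record occupies exactly four lines (no wrapped sequence lines).

```python
import random


def subsample_fastq(in_path, out_path, fraction, seed=None):
    """Stream a FASTQ file, keeping each 4-line record with probability `fraction`.

    Returns the number of records written. Only one record is held in
    memory at a time.
    """
    rng = random.Random(seed)
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            # Assumption: header, sequence, '+', quality, one line each.
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            if rng.random() < fraction:
                fout.writelines(record)
                kept += 1
    return kept
```

A call such as `subsample_fastq("SRR12345.fastq", "out_tenth_SRR12345.fastq", 0.10)` would keep roughly a tenth of the reads. The exact count varies from run to run because each read is an independent draw.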
Old 04-09-2013, 11:48 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

Quote:
Originally Posted by kakseq View Post
However, I always get a message that HTSeq can't be imported.
Is that referring to this test from the HTSeq installation page?

Code:
To test your installation, start Python and then try whether typing import HTSeq causes an error message.
Old 04-09-2013, 01:50 PM   #3
kakseq
Junior Member
 
Location: Ithaca NY

Join Date: Apr 2013
Posts: 5
Default

hi,
Thank you so much for taking a few minutes to help me.
When I type import HTSeq after typing python it gives me nearly the same error message as above. But, if I type import HTSeq again it doesn't give me an error message.
I tried reinstalling HTSeq to no avail.
Old 04-10-2013, 04:35 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

I have sent an email to Simon Anders (not sure if he is on this forum). It appears that the current version of HTSeq for Windows is not working (for me either).
Old 04-10-2013, 05:33 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,173
Default

An alternative to HTSeq for randomly sampling a FASTQ file is Heng Li's seqtk. It can subsample a specific number of reads from a file or a fraction of the input as with Simon's HTSeq script.
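
Drawing a fixed number of reads, rather than a fraction, can be done in one pass with reservoir sampling. The following is a generic sketch of that technique and makes no claim about how seqtk works internally. The name `reservoir_sample_fastq` is hypothetical, and four-line FASTQ records are assumed.

```python
import random


def reservoir_sample_fastq(in_path, n_reads, seed=None):
    """Return up to `n_reads` uniformly sampled 4-line records from a FASTQ file.

    Memory use is bounded by the sample size, not by the file size.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(in_path) as fin:
        index = 0  # 0-based position of the current record
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            if index < n_reads:
                # Fill the reservoir with the first n_reads records.
                reservoir.append(record)
            else:
                # Replace a stored record with probability n_reads / (index + 1).
                j = rng.randint(0, index)  # both ends inclusive
                if j < n_reads:
                    reservoir[j] = record
            index += 1
    return reservoir
```

The records come back in reservoir order, not file order. If the file holds fewer than `n_reads` records, all of them are returned.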
Old 04-10-2013, 06:49 AM   #6
kakseq
Junior Member
 
Location: Ithaca NY

Join Date: Apr 2013
Posts: 5
Default

hi,

Thanks for your help. For now I guess I'll run HTSeq on a mac instead and look into seqtk for future use.
Old 04-17-2013, 09:29 AM   #7
mlkerber
Junior Member
 
Location: North Carolina

Join Date: Sep 2012
Posts: 3
Default

I'm having the same issue with HTSeq on my PC. Made sure I was using Python 2.7, reinstalled numpy, reinstalled HTSeq, also without success.

Is there an alternative for getting read counts from a SAM file produced by BWA?
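
If counts per reference sequence are enough, a SAM file can be tallied without any extra package. This is only a rough sketch under that assumption: it counts primary mapped alignments per RNAME column and does not assign reads to annotated genes or features, which needs an annotation file. The function name `count_reads_per_reference` is made up for this example.

```python
from collections import Counter


def count_reads_per_reference(sam_path):
    """Count primary, mapped alignments per reference name in a SAM file."""
    counts = Counter()
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):
                continue  # header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue  # not a complete alignment line
            flag = int(fields[1])
            rname = fields[2]
            if flag & 0x4 or rname == "*":
                continue  # read is unmapped
            if flag & 0x900:
                continue  # secondary (0x100) or supplementary (0x800) alignment
            counts[rname] += 1
    return counts
```

Paired-end data would be counted once per mate here, so the numbers are alignments rather than fragments.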
Old 04-17-2013, 02:53 PM   #8
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

It's hard to say what's wrong if you don't post complete session logs (i.e., all commands typed plus all output and error messages)
Old 04-18-2013, 05:28 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

Quote:
Originally Posted by Simon Anders View Post
It's hard to say what's wrong if you don't post complete session logs (i.e., all commands typed plus all output and error messages)
Simon,

On a Windows 7 (64-bit) machine: I installed 32-bit Python and NumPy (BTW: the www.scipy.org link you have in the instructions does not work any more, so I downloaded NumPy from: https://pypi.python.org/pypi/numpy).

Trying "import HTSeq" as recommended in your instructions is generating the following error. Googling around seems to indicate that a VS2010 DLL may be missing but I would like to get your take on what is going on.

Code:
Python 2.7.4 (default, Apr  6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import HTSeq
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\HTSeq\__init__.py", line 9, in <module>
    from _HTSeq import *
  File "_HTSeq.pyx", line 14, in init HTSeq._HTSeq (src/_HTSeq.c:31058)
  File "C:\Python27\lib\site-packages\HTSeq\StepVector.py", line 26, in <module>
    _StepVector = swig_import_helper()
  File "C:\Python27\lib\site-packages\HTSeq\StepVector.py", line 22, in swig_import_helper
    _mod = imp.load_module('_StepVector', fp, pathname, description)
ImportError: DLL load failed: The specified module could not be found.
>>>
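
The interpreter banner above already reports "32 bit", which matches the win32 installer. The same fact can be checked from inside Python, which may help when posting a session log. This small sketch uses only the standard library; the helper name `interpreter_bits` is made up here, and the check does not by itself explain the "specified module could not be found" error.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


# One line that is easy to paste into a bug report.
print("Python %s, %d-bit, platform %s"
      % (platform.python_version(), interpreter_bits(), sys.platform))
```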
Old 04-18-2013, 05:51 AM   #10
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Seems that "_StepVector.dll" is really missing in this binary package. I'll try to fix it.
Old 04-18-2013, 05:57 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

Quote:
Originally Posted by Simon Anders View Post
Seems that "_StepVector.dll" is really missing in this binary package. I'll try to fix it.
Thanks Simon.

If you can post an update to this thread when you have a chance to fix that it would be great.
Old 04-18-2013, 07:26 AM   #12
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Okay, after an hour of fighting with Windows (even the easiest things are hard on an OS that one uses less than once a year), I found the mistake and fixed it.

Please try HTSeq-0.5.4p2.win32-py2.7.exe and let me know if it still fails.
Old 04-18-2013, 08:31 AM   #13
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

A lot of people want to use this package on Windows, so "fighting Windows" on your part is worthwhile.

We are not in the clear yet. This is the latest result.
Code:
Python 2.7.4 (default, Apr  6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import HTSeq
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\HTSeq\__init__.py", line 9, in <module>
    from _HTSeq import *
ImportError: DLL load failed: The specified module could not be found.
>>>
Old 04-18-2013, 09:04 AM   #14
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Could you post a list of all files in C:\Python27\lib\site-packages\HTSeq\, please? With file extensions, if possible. Or maybe compare with my system, where the directory looks like this:

Code:
C:\Python27\Lib\site-packages\HTSeq>dir
 Volume in drive C has no label.
 Volume Serial Number is BC62-98E8

 Directory of C:\Python27\Lib\site-packages\HTSeq

18/04/2013  16:22    <DIR>          .
18/04/2013  16:22    <DIR>          ..
18/04/2013  16:22    <DIR>          scripts
18/02/2013  17:06            25,079 StepVector.py
18/04/2013  16:42            32,148 StepVector.pyc
18/04/2013  16:42            32,097 StepVector.pyo
18/04/2013  18:21           248,832 _HTSeq.pyd
20/02/2013  17:19             1,407 _HTSeq_internal.py
18/04/2013  16:42             2,011 _HTSeq_internal.pyc
18/04/2013  16:42             2,011 _HTSeq_internal.pyo
18/04/2013  18:21            83,968 _StepVector.pyd
18/04/2013  18:17                24 _version.py
18/04/2013  16:42               168 _version.pyc
18/04/2013  16:42               168 _version.pyo
18/04/2013  18:16            32,996 __init__.py
18/04/2013  16:42            36,726 __init__.pyc
18/04/2013  16:42            36,495 __init__.pyo
              14 File(s)        534,130 bytes
               3 Dir(s)   5,331,779,584 bytes free
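
A listing like the one above can also be produced and compared from Python, which avoids differences in how `dir` formats dates. The sketch below is only an illustration: the helper names `list_package_files` and `missing_files` are hypothetical, and the expected names in the usage note are taken from the listing above.

```python
import os


def list_package_files(package_dir):
    """Return sorted (file name, size in bytes) pairs for regular files in package_dir."""
    entries = []
    for name in sorted(os.listdir(package_dir)):
        path = os.path.join(package_dir, name)
        if os.path.isfile(path):  # skip subdirectories such as "scripts"
            entries.append((name, os.path.getsize(path)))
    return entries


def missing_files(entries, expected_names):
    """Return the expected file names that do not appear in `entries`."""
    present = {name for name, _ in entries}
    return sorted(set(expected_names) - present)
```

For example, `missing_files(list_package_files(r"C:\Python27\Lib\site-packages\HTSeq"), ["_HTSeq.pyd", "_StepVector.pyd"])` would return an empty list if both compiled modules are installed.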
Old 04-18-2013, 11:33 AM   #15
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

Here it is. Looks the same at first glance to me.

Code:
Directory of c:\Python27\Lib\site-packages\HTSeq

04/18/2013  11:29 AM    <DIR>          .
04/18/2013  11:29 AM    <DIR>          ..
04/18/2013  11:29 AM    <DIR>          scripts
02/18/2013  11:06 AM            25,079 StepVector.py
04/18/2013  11:29 AM            32,148 StepVector.pyc
04/18/2013  11:29 AM            32,097 StepVector.pyo
04/18/2013  12:21 PM           248,832 _HTSeq.pyd
02/20/2013  11:19 AM             1,407 _HTSeq_internal.py
04/18/2013  11:29 AM             2,011 _HTSeq_internal.pyc
04/18/2013  11:29 AM             2,011 _HTSeq_internal.pyo
04/18/2013  12:21 PM            83,968 _StepVector.pyd
04/18/2013  12:17 PM                24 _version.py
04/18/2013  11:29 AM               168 _version.pyc
04/18/2013  11:29 AM               168 _version.pyo
04/18/2013  12:16 PM            32,996 __init__.py
04/18/2013  11:29 AM            36,726 __init__.pyc
04/18/2013  11:29 AM            36,495 __init__.pyo
              14 File(s)        534,130 bytes
Old 04-18-2013, 12:51 PM   #16
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

When I checked _HTSeq.pyd with Dependency Walker, it complained that "IESHIMS.DLL" can't be found. I will try to see if I can diagnose this further.
Old 04-18-2013, 01:39 PM   #17
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Interesting. Just ran "dependency walker" on my system, and I don't have IESHIMS.DLL either, although everything is working for me. Also, it's a "delayed dependency", so it wouldn't cause an issue right at loading.
Old 04-18-2013, 01:43 PM   #18
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

1. Could somebody else who has a Windows system maybe try? Just to know if it's a general problem or something specific to GenoMax's system.

2. GenoMax, could you try to load "_HTSeq" (with the underscore) directly, as follows?

Code:
C:\>cd \Python27\Lib\site-packages\HTSeq

C:\Python27\Lib\site-packages\HTSeq>c:\Python27\python.exe
Python 2.7.4 (default, Apr  6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import _HTSeq

Last edited by Simon Anders; 04-18-2013 at 01:45 PM. Reason: added full path
Old 04-18-2013, 01:54 PM   #19
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,015
Default

Here is what I get. May try to uninstall and reinstall everything tomorrow.

Code:
Python 2.7.4 (default, Apr  6 2013, 19:54:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import _HTSeq
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named _HTSeq
>>>
Old 04-18-2013, 01:57 PM   #20
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

So, you went right to the directory that contained _HTSeq.pyd (you did notice the "cd" command I put on top? I should have highlighted it), and even then, it doesn't see it? That's strange.
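
In an interactive session Python normally searches the current directory first (the empty string at the start of `sys.path`), so "No module named _HTSeq" would usually mean either that the session was not started in that directory or that the file is not there. A small check along these lines could separate the two cases. It is a sketch only, and the function name `explain_import_visibility` is invented for this example.

```python
import os
import sys


def explain_import_visibility(module_name, directory,
                              extensions=(".pyd", ".so", ".py")):
    """Report whether a module file exists in `directory` and whether
    that directory is on sys.path (where '' stands for the current directory)."""
    directory = os.path.normcase(os.path.abspath(directory))
    files_found = [module_name + ext for ext in extensions
                   if os.path.isfile(os.path.join(directory, module_name + ext))]
    on_path = any(
        os.path.normcase(os.path.abspath(entry or os.getcwd())) == directory
        for entry in sys.path
    )
    return {"files_found": files_found, "directory_on_sys_path": on_path}
```

Running `explain_import_visibility("_HTSeq", os.getcwd())` right after the failed import would show whether `_HTSeq.pyd` is present in the working directory and whether that directory is being searched.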