SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
DEXSeq Using Counts File From htseq-count FuzzyCoder Bioinformatics 20 01-03-2016 11:18 PM
Error with GTF file when using htseq-count MDonlin Bioinformatics 13 01-13-2015 08:29 AM
Is this a valid GFF3 file for HTSeq? bernardo_bello RNA Sequencing 1 10-18-2013 01:46 AM
HTSeq.BAM_Reader error: .bam-file has no index ... JIrish Bioinformatics 4 10-09-2013 09:35 AM
How to run HTSeq with paired end file xuguorong Bioinformatics 3 11-09-2012 08:17 AM

Reply
 
Thread Tools
Old 06-04-2014, 08:01 AM   #1
thh32
Member
 
Location: UK

Join Date: Feb 2014
Posts: 60
Default How to iterate over reads in file with HTSeq?

I am currently attempting to write a program in Python (v2.7.6) and am using HTSeq to read in fastq files and then run my code on it. However my issue is that my code runs on the first read from each input file (pair end data so 2 input files) and then stops.

Does anyone know how to run code on read1 and iterate to read 2 etc?

Code is :

HTML Code:
# Import functions
import HTSeq
import itertools
import numpy
from matplotlib import pyplot
from itertools import groupby
import operator

#Open files
output_file= open('output.fq', "w")
Unmatched1= open ('Unmatched1.fq', "w")
Unmatched2= open ('Unmatched2.fq', "w")

# Counts lines in input file as 'count' for future use, if needed
print "%d lines in your choosen file" % len(open("Real_test_1").readlines())
count = '%d'  % len(open("Real_test_1").readlines())
print "Reads below"
print count

# Reads in files for use by HTSeq
fastq_file1 = HTSeq.FastqReader( "Real_test_1", "phred")
fastq_file2 = HTSeq.FastqReader( "Real_test_2", "phred")


#Get rev_comp and rename seq1
for read in fastq_file2:
	rc2o = read
	for read in fastq_file2:                                          
		rc2=read.get_reverse_complement()                             
		print rc2                                                     

		for read in fastq_file1:
			rc1o = read
		for read in fastq_file1:
		    rc1 = read
		    print rc1

		#Reverse seq2 again so can be matched
		rc2w = rc2[::-1]

		rc1u = rc1



		while len(rc1u) > 20:
		    slide_merge(rc1u, rc2w)
		    rc1u = rc1u[1:]

		merging


		max(merging.iteritems(), key=operator.itemgetter(1))[0]

		highest = max(merging.iteritems(), key=operator.itemgetter(1))[0]

		highest

		len(highest)

		remove = len(highest)

		if remove > 8:
			rc1r = rc1[:-remove]
			rc3 = rc1r+rc2w
			rc3.write_to_fastq_file(output_file)
		else:
			rc1o.write_to_fastq_file(Unmatched1)
			rc2o.write_to_fastq_file(Unmatched2)

I tried putting aload of the code in a for loop however all that happened was it ran 1000 times due to that being how many lines were in each input file but ran on the first line each time and iterate through the reads.

Code:
HTML Code:
for read in fastq_file2:
	rc2o = read
	for read in fastq_file2:                                          
		rc2=read.get_reverse_complement()                             
		print rc2                                                     

		for read in fastq_file1:
			rc1o = read
		for read in fastq_file1:
		    rc1 = read
		    print rc1

		#Reverse seq2 again so can be matched
		rc2w = rc2[::-1]

		rc1u = rc1



		while len(rc1u) > 20:
		    slide_merge(rc1u, rc2w)
		    rc1u = rc1u[1:]

		merging


		max(merging.iteritems(), key=operator.itemgetter(1))[0]

		highest = max(merging.iteritems(), key=operator.itemgetter(1))[0]

		highest

		len(highest)

		remove = len(highest)

		if remove > 8:
			rc1r = rc1[:-remove]
			rc3 = rc1r+rc2w
			rc3.write_to_fastq_file(output_file)
		else:
			rc1o.write_to_fastq_file(Unmatched1)
			rc2o.write_to_fastq_file(Unmatched2)
Any help would be greatly appreciated.

Thanks,
Tom
thh32 is offline   Reply With Quote
Old 06-04-2014, 11:10 AM   #2
wolma
Member
 
Location: Germany

Join Date: May 2014
Posts: 23
Default

ouff,
as a start, replace your way too many for loops with:

for read1, read2 in itertools.izip(fastq_file1, fastq_file2):

then do all your processing inside this one loop.
Good luck,
Wolfgang
wolma is offline   Reply With Quote
Old 06-07-2014, 04:29 AM   #3
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Can you give a rough idea of what your code is supposed to do? I'm used to using HTSeq to iterate over mapped BAM files or GTF files, not FASTQ files, so can't quite work out how HTSeq would be useful for you here. My guess based on a rough look at the code is a paired-end read merger, but the lines with only "empty" and "merging" (not commented) are very confusing -- I'm guessing they're for debugging purposes.

I've got a few comments about the code syntax, which may or may not affect the outcome:
  • reading entire files into memory just to count lines [len(open("Real_test_1").readlines())] is a little inefficient, and will make your code much slower than it needs to be.
  • when using loops, never use the same variable for an inner loop as used for an outer loop [you have "read" used in a loop and its child, and then again in a couple more loops].

Last edited by gringer; 06-07-2014 at 04:32 AM.
gringer is offline   Reply With Quote
Reply

Tags
htseq, python

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:53 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO