![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
counting tags in bam file | james4545 | Bioinformatics | 2 | 01-10-2013 08:03 AM |
VarScan shift in counting of bases! | AlbertZANG | Bioinformatics | 0 | 06-18-2012 10:29 AM |
Counting coverage bases | batsal | Bioinformatics | 0 | 02-17-2012 10:46 AM |
Counting mapped nucleotides in a bam file | moriah | Bioinformatics | 7 | 08-22-2011 01:30 AM |
Unique bases from .gff file | Khanjan | Bioinformatics | 2 | 12-01-2010 12:21 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: US Join Date: Jul 2013
Posts: 55
|
![]()
New to python but trying to learn, here I wrote a program to count the nucleotides in a fasta file. The approach is that when the line does not contain a ">" symbol then the program counts nucleotides one by one and adds it to the variable sum. The program does not work and I don't know why. Please help!
#!/usr/bin/env python import sys # Takes fasta file and counts total bases infile = open(sys.argv[0], "rU") sum = 0 line = infile.readlines() while ">" not in lline: for i in line: print (sum)sum += 1 usage: $ python countbases.py infile.fasta |
![]() |
![]() |
![]() |
#2 |
Junior Member
Location: US Join Date: Aug 2013
Posts: 1
|
![]()
It would be helpful if you gave more information about how the program fails. However, one issue I think is this line:
infile = open(sys.argv[0], "rU") Should be: infile = open(sys.argv[1], "rU") In this case sys.argv[0] = your script name, whereas sys.argv[1] = infile.fasta. See here for more information: http://stackoverflow.com/questions/5...-documentation |
![]() |
![]() |
![]() |
#3 | |
Member
Location: US Join Date: Jul 2013
Posts: 55
|
![]() Quote:
The program does not return any error, it just does not return anything. If I run it in Idle, it crashes and if I run it in terminal, it jumps to the next line as if it was running but it stays there forever. I am trying with very small test files and if it worked it should go fairly fast instead of staying hanged for a long time. |
|
![]() |
![]() |
![]() |
#4 | |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#5 |
Member
Location: US Join Date: Jul 2013
Posts: 55
|
![]() |
![]() |
![]() |
![]() |
#6 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
Well, you're also trying to iterate through a list in an odd way that I'm not sure would work. You might take a rather more efficient approach:
Code:
#!/usr/bin/env python import sys f = open(sys.argv[1], "rU") total=0 for line in f : if(line[0] == ">") : continue total += len(line)-1 print("%s has %i nucleotides" % (sys.argv[1], total)) Last edited by dpryan; 08-20-2013 at 06:49 AM. Reason: typo |
![]() |
![]() |
![]() |
#7 |
Member
Location: US Join Date: Jul 2013
Posts: 55
|
![]()
Impressed! it works like a charm
Thanks a million |
![]() |
![]() |
![]() |
#8 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
No problem. In a real version, you'd probably want to use argparse (so you can easily give help and usage information) and check to see that the fasta file exists before running further. You also might want to add the number of bases per chromosome, likely only printed if the user specifies.
|
![]() |
![]() |
![]() |
#9 |
Member
Location: salt lake city, UT Join Date: Apr 2010
Posts: 72
|
![]()
import sys
print sum(len(l.rstrip()) for l in open(sys.argv[1]) if l[0] != ">") |
![]() |
![]() |
![]() |
#10 |
Junior Member
Location: Greater Boston Area Join Date: May 2013
Posts: 8
|
![]()
Yet another way (and more general) is to use the BioPython module. You will want to get familiar with handling sequences like this anyway, it is much more convenient to have a data structure. It would be this:
Code:
import sys from Bio import SeqIO for seq_record in SeqIO.parse(sys.argv[1], "fasta"): print seq_record.id print len(seq_record) |
![]() |
![]() |
![]() |
Tags |
bases, count, python |
Thread Tools | |
|
|