![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
How to obtain full length RNA transcript sequence | Annibal | RNA Sequencing | 9 | 03-14-2012 07:51 AM |
length of transcript in cufflinks output | papori | Bioinformatics | 0 | 08-02-2011 06:06 AM |
transcript length bias in enrichment analysis and RPKM | PFS | RNA Sequencing | 1 | 12-12-2010 06:32 PM |
Nonuniformity of reads across transcript length | Daniel | RNA Sequencing | 7 | 10-07-2010 06:06 AM |
normalizing RNA-seq data to "unique transcript length" instead of "transcript length" | lmc | Bioinformatics | 2 | 06-23-2010 11:45 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: ND, USA Join Date: Oct 2011
Posts: 28
|
![]()
I've created a few different assemblies, and I'd like to make a dotplot of the number of transcripts at a certain length. I'm using R or python. Right now I've got to the point where my data looks like this (each individual transcript is listed):
Line number, Data set, Length "42","b",1258 "43","b",517 "44","b",529 "45","b",593 "46","b",1075 "47","b",772 I want to count the number of times a certain length transcript occurs so I can plot length vs. frequency and show which assemblies are generating longer transcripts. I can do this with R once I have the frequencies, but I can't figure out how to do that. Ultimately, I'd like to turn that file into something like: Length, Quantity 1258, 38 517, 79 Etc...any recommendations on how to transform this? |
![]() |
![]() |
![]() |
#2 |
Member
Location: San Francisco Bay Area Join Date: Dec 2008
Posts: 35
|
![]()
Here's some Python code to do it for you, assuming your file is called data.txt:
Code:
inputFile = open("data.txt") inputFile.next() # skip that first header line frequencyDict = {} for line in inputFile: curLength = int(line.strip().split(',')[2]) if curLength in frequencyDict: frequencyDict[curLength] += 1 else: frequencyDict[curLength] = 1 input.close() outputFile = open("data_transformed.txt",'w') outputFile.write("Length, Quantity\n") minLength, maxLength = min(frequencyDict.keys()), max(frequencyDict.keys()) for i in range(minLength,maxLength+1): if i in frequencyDict: outputFile.write("%s, %s\n" % (str(i),str(frequencyDict[i]))) outputFile.close() |
![]() |
![]() |
![]() |
#3 |
Member
Location: ND, USA Join Date: Oct 2011
Posts: 28
|
![]()
Thanks for the response- I'm getting this error when I run it on my data:
Traceback (most recent call last): File "frequency.py", line 5, in <module> curLength = int(line.strip().split(',')[2]) IndexError: list index out of range Any idea what's causing that? I'm not very good at interpereting this kind of code Last edited by dacotahm; 12-10-2012 at 04:20 PM. |
![]() |
![]() |
![]() |
#4 |
Member
Location: San Francisco Bay Area Join Date: Dec 2008
Posts: 35
|
![]()
It means that it can't find the third element in the list after splitting the line on commas. Do all of your lines have 2 commas or does your data file look different?
|
![]() |
![]() |
![]() |
#5 |
Member
Location: ND, USA Join Date: Oct 2011
Posts: 28
|
![]()
I saw the seperator was a comma, ...I wasn't sure what it was choking on so I ran it on just this fragment and got the same error:
5,b,787 6,b,493 7,b,1344 8,b,906 9,b,960 10,b,544 11,b,4109 12,b,513 13,b,578 14,b,687 15,b,551 16,b,563 17,b,1075 18,b,932 19,b,543 20,b,429 21,b,761 22,b,834 23,b,572 24,b,300 25,b,4790 26,b,667 27,b,656 28,b,1086 29,b,544 30,b,1293 |
![]() |
![]() |
![]() |
#6 |
Member
Location: San Francisco Bay Area Join Date: Dec 2008
Posts: 35
|
![]()
I just ran the same code on the same fragment of data and got the following output:
300, 1 429, 1 493, 1 513, 1 543, 1 544, 2 551, 1 563, 1 572, 1 578, 1 656, 1 667, 1 687, 1 761, 1 834, 1 906, 1 932, 1 960, 1 1075, 1 1086, 1 1293, 1 1344, 1 4109, 1 4790, 1 Note that in the script that should read inputFile.close() and not input.close(). Are you running this on a plain text file? Or are you running this based on something that has been in Microsoft Word or Excel? MS products add strange characters to files, but when I copy your test data to a plain text file and save it as 'data.txt', I get no errors... |
![]() |
![]() |
![]() |
#7 |
Member
Location: San Francisco Bay Area Join Date: Dec 2008
Posts: 35
|
![]()
I should also add that the above output skipped the first line of your file, assuming it had a header line in it-- that's why 787 is missing.
|
![]() |
![]() |
![]() |
#8 |
Member
Location: ND, USA Join Date: Oct 2011
Posts: 28
|
![]()
I copied it right out of my terminal into Kate and saved it as plain text. I did change the input.close() to inputFile.close(). It still returns the same error.
|
![]() |
![]() |
![]() |
#9 |
Member
Location: San Francisco Bay Area Join Date: Dec 2008
Posts: 35
|
![]()
Do you have any empty lines in the file? Otherwise, I don't know what to tell you-- it totally worked as is for me. Assuming you don't have some underlying problem with your input file, this will catch that error and proceed anyway:
Code:
inputFile = open("data.txt") inputFile.next() # skip that first header line frequencyDict = {} for line in inputFile: try: curLength = int(line.strip().split(',')[2]) except: continue if curLength in frequencyDict: frequencyDict[curLength] += 1 else: frequencyDict[curLength] = 1 inputFile.close() outputFile = open("data_transformed.txt",'w') outputFile.write("Length, Quantity\n") minLength, maxLength = min(frequencyDict.keys()), max(frequencyDict.keys()) for i in range(minLength,maxLength+1): if i in frequencyDict: outputFile.write("%s, %s\n" % (str(i),str(frequencyDict[i]))) outputFile.close() |
![]() |
![]() |
![]() |
#10 |
Member
Location: ND, USA Join Date: Oct 2011
Posts: 28
|
![]()
Hm....it worked when I copied a fragment into a new text file using nano but not when I saved the same fragment into a plain text with gedit or kate...
But I think I may have it- thank you for your help and patience. |
![]() |
![]() |
![]() |
#11 |
Member
Location: San Francisco Bay Area Join Date: Dec 2008
Posts: 35
|
![]()
Ah cool, good luck with the rest of your analysis! I used emacs FWIW.
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|