SEQanswers

Old 02-25-2014, 04:06 AM   #1
bambus
Member
 
Location: Germany

Join Date: Nov 2013
Posts: 20
A program to extract the reads and modify the seq ID by adding weight

Hi everyone,

I have a problem running the Perl script attached below (found online), which compares two files:

1) a file with seq IDs and their weights
2) a file with seq IDs and the sequences.

I modified the original script a bit and tried it on my data, but it neither prints any output nor reports any errors. In addition, after comparing the files and extracting the matching reads, I want to append the weights from file 1 to the sequence IDs.

Input files and the script are attached.

Expected output:

>comp10003_c0_seq1 len=166 path=[748:0-22 1004:23-46 2527:47-165]_weight=41
AAGTAGCCTATGCGCTACAGTAAGAAAGACAGGTGAAAAAATGGAAGTAAAACAATTAGA
TGACTACTTTGGATATACAGAAAAGGGCAGTTCCTTAGAGGGGGAATTACGAGCAGGACT
AACGACATTCTTGACAATGGCGTACATTCTGTTTGTGAACCCAGAC


Could anyone please help me out?

Thank you in advance.
Attached Files
File Type: txt sample_IDs.txt (426 Bytes, 8 views)
File Type: txt sample_reads.txt (3.6 KB, 3 views)
File Type: pl test_to_extract_reads.pl (535 Bytes, 5 views)
Old 02-25-2014, 05:40 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Your script is pulling in the sample_IDs with the '>' attached as well as the count. It then pulls in the sample_reads without the '>' attached. The program thus cannot match up sample_IDs with sample_reads. So there are two problems here: (1) you are not saving the counts and (2) you cannot match up IDs.
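
To see the mismatch concretely, here is a minimal sketch. It assumes a line of sample_IDs.txt looks like ">name count"; the attachment is not reproduced in this thread, so that layout and the variable names $id_line and $read_name are illustrative assumptions only.

```perl
use strict;
use warnings;

# Hypothetical values, assumed to resemble the attached files.
my $id_line   = ">comp10003_c0_seq1 41";   # assumed sample_IDs.txt line: ID plus weight
my $read_name = "comp10003_c0_seq1";       # read name as seen without the leading '>'

my %ids;
$ids{$id_line} += 1;   # the original approach: the whole line becomes the hash key

# The lookup fails because the stored key still carries '>' and the weight.
print exists $ids{$read_name} ? "match\n" : "no match\n";
```

Because no key ever matches, a script built this way prints nothing and raises no error, which fits the behaviour described in the first post.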

The solution is to re-write the part where you have

$ids{$_} += 1;

Let us know if you want more of a hint than that.
Old 02-25-2014, 05:54 AM   #3
bambus

Does that mean I have to create a hash of IDs?
Old 02-25-2014, 05:59 AM   #4
westerman

Yes, create the hash of IDs. You need to do two things:

1) Remove the '>'
2) Split out the counts from the read name and save the counts as the values in your hash.
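
The two steps above can be sketched as follows. This is only an illustration: it assumes each ID line has the form ">name count" separated by whitespace, and the second entry (example_seq2) is a made-up record, not taken from the attached file.

```perl
use strict;
use warnings;

# Hypothetical ID lines; the real sample_IDs.txt layout is assumed, not confirmed.
my @id_lines = (">comp10003_c0_seq1 41", ">example_seq2 7");

my %ids;
for my $line (@id_lines) {
    chomp $line;
    $line =~ s/^>//;                        # 1) remove the leading '>'
    my ($id, $count) = split ' ', $line;    # 2) split the read name from the count
    next unless defined $count;             # skip lines that carry no count
    $ids{$id} = $count;                     # the count becomes the hash value
}

print "$_ => $ids{$_}\n" for sort keys %ids;
```

With the counts stored as values, the weight for a read can later be looked up directly by its name.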
Old 02-25-2014, 06:03 AM   #5
bambus

Could you please show me how to carry out the steps you mentioned? I am not a very good programmer.
Old 02-25-2014, 06:36 AM   #6
westerman

The best way to become a better programmer is to experiment with your programs. :-)

That said, I would change the line:

$ids{$_} += 1;

To

my ($id, $count) = $_ =~ /^>*(\S+)\s+(\d+)/;
$ids{$id} = $count;

Note: I did not test the above. Basically, you are taking the input line and looking for:
1) '>' (optional)
2) Non-whitespace characters (the ID)
3) Whitespace
4) Digits (the count)
and then putting the ID and count into your %ids hash.
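
Once %ids holds name => count pairs, the remaining step from the first post (extracting the matching reads and appending the weight to the header) might look like the sketch below. It is not the attached script: the in-memory FASTA text, the example_seq2 record, and the variables $keep and $out are assumptions made for illustration, and the sequence is shortened to one line.

```perl
use strict;
use warnings;

# %ids as it would look after the regex above: read name => weight (illustrative).
my %ids = (comp10003_c0_seq1 => 41);

# Stand-in for sample_reads.txt; the second record is hypothetical.
my $fasta = <<'END';
>comp10003_c0_seq1 len=166 path=[748:0-22 1004:23-46 2527:47-165]
AAGTAGCCTATGCGCTACAGTAAGAAAGACAGGTGAAAAAATGGAAGTAAAACAATTAGA
>example_seq2 len=12
ACGTACGTACGT
END

open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";

my $keep = 0;    # true while inside a record whose ID is in %ids
my $out  = '';
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^>(\S+)/) {                 # header: first token is the read name
        $keep = exists $ids{$1};
        $line .= "_weight=$ids{$1}" if $keep; # append the weight to the header
    }
    $out .= "$line\n" if $keep;               # keep header and sequence lines of matches
}
close $fh;
print $out;
```

For real data, the in-memory string would be replaced by opening sample_reads.txt, and %ids would be filled from sample_IDs.txt as described above.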