  • A program to extract the reads and modify the seq ID by adding weight

    Hi everyone,

    I am having trouble running a Perl script (found online, attached below) that compares two files:

    1) a file with seq IDs and its weight
    2) a file with seq IDs and the sequences.

    I modified the original script slightly and ran it on my data, but it neither prints any output nor reports any errors. In addition, after comparing the files and extracting the matching reads, I want to append the weight from file 1 to each sequence ID.

    Input files and the script are attached.

    Expected output:

    >comp10003_c0_seq1 len=166 path=[748:0-22 1004:23-46 2527:47-165]_weight=41
    AAGTAGCCTATGCGCTACAGTAAGAAAGACAGGTGAAAAAATGGAAGTAAAACAATTAGA
    TGACTACTTTGGATATACAGAAAAGGGCAGTTCCTTAGAGGGGGAATTACGAGCAGGACT
    AACGACATTCTTGACAATGGCGTACATTCTGTTTGTGAACCCAGAC


    Could anyone please help me out?

    Thank you in advance.

  • #2
    Your script reads the sample_IDs with the '>' and the count still attached, but it reads the sample_reads without the '>'. The program therefore cannot match sample_IDs to sample_reads. So there are two problems here: (1) you are not saving the counts, and (2) you cannot match up the IDs.

    The solution is to re-write the part where you have

    $ids{$_} += 1;

    Let us know if you want more of a hint than that.


    • #3
      Does that mean I have to create a hash of IDs?


      • #4
        Yes, create the hash of IDs. You need to do two things:

        1) Remove the '>'
        2) Split out the counts from the read name and save the counts as the values in your hash.


        • #5
          Could you please show me how to carry out the steps you mentioned? I am not a very good programmer.


          • #6
            The best way to become a better programmer is to experiment with your programs. :-)

            That said, I would change the line:

            $ids{$_} += 1;

            To

            my ($id, $count) = $_ =~ /^>*(\S+)\s+(\d+)/;
            $ids{$id} = $count;

            Note: I did not test the above. Basically you are taking the input line and looking for:
            1) '>' (optional)
            2) Characters (the id)
            3) Whitespace
            4) Digits (the count)
            And then putting the id and count into your %ids hash
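Putting the pieces of this thread together, the whole task might be sketched as below. This is only an illustration, not the original poster's script: the attachments are not visible here, so the layout of the weights file (`>ID`, whitespace, then the count) is an assumption, and the sub names `read_weights` and `annotate_fasta` as well as the second FASTA record are hypothetical. In-memory strings stand in for the two input files so the sketch runs on its own.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build a hash of ID => weight from lines such as ">comp10003_c0_seq1 41".
# ASSUMPTION: this line layout is a guess; the real file was an attachment.
sub read_weights {
    my ($fh) = @_;
    my %weight_for;
    while (my $line = <$fh>) {
        # Optional '>', the ID (non-whitespace), whitespace, then digits.
        if (my ($id, $count) = $line =~ /^>*(\S+)\s+(\d+)/) {
            $weight_for{$id} = $count;
        }
    }
    return \%weight_for;
}

# Return the FASTA records whose ID is in the hash, with "_weight=N"
# appended to each header line. Records with unknown IDs are skipped.
sub annotate_fasta {
    my ($fh, $weight_for) = @_;
    my $out  = '';
    my $keep = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # The ID is the first word of the header, without the '>'.
            my $id = $1;
            $keep = exists $weight_for->{$id};
            $line .= "_weight=$weight_for->{$id}" if $keep;
        }
        $out .= "$line\n" if $keep;
    }
    return $out;
}

# Stand-in for file 1 (IDs and weights).
my $weights_text = ">comp10003_c0_seq1 41\n";

# Stand-in for file 2 (IDs and sequences). The second record is made up
# to show that reads without a weight are left out.
my $fasta_text = join '', map { "$_\n" } (
    '>comp10003_c0_seq1 len=166 path=[748:0-22 1004:23-46 2527:47-165]',
    'AAGTAGCCTATGCGCTACAGTAAGAAAGACAGGTGAAAAAATGGAAGTAAAACAATTAGA',
    'TGACTACTTTGGATATACAGAAAAGGGCAGTTCCTTAGAGGGGGAATTACGAGCAGGACT',
    'AACGACATTCTTGACAATGGCGTACATTCTGTTTGTGAACCCAGAC',
    '>comp99999_c0_seq1 len=8',
    'ACGTACGT',
);

open my $weights_fh, '<', \$weights_text or die "Cannot read weights: $!";
open my $fasta_fh,   '<', \$fasta_text   or die "Cannot read FASTA: $!";

my $weight_for = read_weights($weights_fh);
print annotate_fasta($fasta_fh, $weight_for);
```

With real data, the two in-memory strings would be replaced by `open` calls on the actual file names. Storing the ID without the '>' on both sides is what lets the hash lookup succeed, which is the mismatch described in post #2.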
