  • A program to extract the reads and modify the seq ID by adding weight

    Hi everyone,

    I am having trouble running a Perl script (found online) that compares two files:

    1) a file with seq IDs and its weight
    2) a file with seq IDs and the sequences.

    I modified the original script slightly and ran it on my data, but it neither prints any output nor reports any errors. In addition, after comparing the files and extracting the matching reads, I want to append the weight from file 1 to each sequence ID.

    Input files and the script are attached.

    Expected output:

    >comp10003_c0_seq1 len=166 path=[748:0-22 1004:23-46 2527:47-165]_weight=41
    AAGTAGCCTATGCGCTACAGTAAGAAAGACAGGTGAAAAAATGGAAGTAAAACAATTAGA
    TGACTACTTTGGATATACAGAAAAGGGCAGTTCCTTAGAGGGGGAATTACGAGCAGGACT
    AACGACATTCTTGACAATGGCGTACATTCTGTTTGTGAACCCAGAC


    Could anyone please help me out?

    Thank you in advance.

  • #2
    Your script reads in the sample_IDs with the '>' still attached, along with the count. It then reads in the sample_reads without the '>'. The program therefore cannot match sample_IDs to sample_reads. So there are two problems here: (1) you are not saving the counts, and (2) you cannot match up the IDs.

    The solution is to re-write the part where you have

    $ids{$_} += 1;

    Let us know if you want more of a hint than that.

    • #3
      Does that mean I have to create a hash of the IDs?

      • #4
        Yes, create the hash of IDs. You need to do two things:

        1) Remove the '>'
        2) Split out the counts from the read name and save the counts as the values in your hash.

        • #5
          Could you please show me how to carry out the steps you mentioned? I am not a very good programmer.

          • #6
            The best way to become a better programmer is to experiment with your programs. :-)

            That said, I would change the line:

            $ids{$_} += 1;

            To

            my ($id, $count) = $_ =~ /^>*(\S+)\s+(\d+)/;
            $ids{$id} = $count;

            Note: I did not test the above. Basically you are taking the input line and looking for:
            1) '>' (optional)
            2) Non-whitespace characters (the ID)
            3) Whitespace
            4) Digits (the count)
            and then putting the ID and count into your %ids hash.
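
For anyone following along, the pieces of this thread can be combined into a rough sketch like the one below. It is not the original poster's script (the attachments are not shown here), and it rests on some assumptions: each line of the ID file looks like ">ID weight", and the first whitespace-delimited word of each FASTA header is the ID. The subroutine names read_weights and extract_weighted are made up for illustration, and the demo data is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of ID => weight from lines such as
# ">comp1_seq1 41" (assumed layout of the ID file).
sub read_weights {
    my ($fh) = @_;
    my %weight;
    while (my $line = <$fh>) {
        chomp $line;
        # optional '>', the ID (non-whitespace), whitespace, then digits
        if (my ($id, $count) = $line =~ /^>*(\S+)\s+(\d+)/) {
            $weight{$id} = $count;
        }
    }
    return \%weight;
}

# Hypothetical helper: return the FASTA records whose ID is in the hash,
# with "_weight=N" appended to the header line.
sub extract_weighted {
    my ($fh, $weight) = @_;
    my ($out, $keep) = ('', 0);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # Header line: decide whether to keep this record.
            $keep = exists $weight->{$1};
            $line .= "_weight=$weight->{$1}" if $keep;
        }
        $out .= "$line\n" if $keep;
    }
    return $out;
}

# Small demo on in-memory data (invented for this example).
my $ids_text   = ">comp1_seq1 41\n>comp3_seq1 7\n";
my $fasta_text = ">comp1_seq1 len=6\nAAGTAG\n>comp2_seq1 len=4\nCCTA\n";

open my $ids_fh,   '<', \$ids_text   or die "Cannot open ID data: $!";
open my $fasta_fh, '<', \$fasta_text or die "Cannot open FASTA data: $!";

print extract_weighted($fasta_fh, read_weights($ids_fh));
# prints:
# >comp1_seq1 len=6_weight=41
# AAGTAG
```

To run it on real files, the two in-memory opens would be replaced with opens on the actual file names. If the weights in the ID file are not the second whitespace-separated field, the regular expression in read_weights would need adjusting.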
