  • PRINSEQ -derep option

    Hi,
    I would like to remove sequences that are identical to the 5' or 3' end of a longer sequence.
    Here is an example of what I would like to do:
    INPUT:
    Code:
    >pi1
    AAAAAAAAAATTAAGGGCCAGCTGA
    >pi12
    AAAAAAAAAATTAAGGGCCAGCTGAA
    >pi13
    AAAAAAAAACTTGAACTCTACTGC
    >pi14
    AAAAAAAAATTAAGGGCCAGCTGAA
    >pi15
    AAAAAAAAATTTTGGATGATCTTAAT
    >pi16
    AAAAAAAAATTTTGGATGATCTTAATT
    >pi17
    AAAAAAAACAAGGTCGGCATAAAG
    >pi18
    AAAAAAAACGAACATGAGAGGATGGA
    OUTPUT:
    Code:
    >pi12
    AAAAAAAAAATTAAGGGCCAGCTGAA
    >pi13
    AAAAAAAAACTTGAACTCTACTGC
    >pi16
    AAAAAAAAATTTTGGATGATCTTAATT
    >pi17
    AAAAAAAACAAGGTCGGCATAAAG
    >pi18
    AAAAAAAACGAACATGAGAGGATGGA
    I tried to solve this with PRINSEQ using the following command line, but it didn't work: it only removes reads that have exactly the same sequence.
    Code:
    perl prinseq-lite.pl -verbose -fasta tmp1.fa -derep 123 -out_format 1
    Can someone familiar with this tool help me?
    Thanks in advance

  • #2
    Hi StephaniePi83, the "1" in -derep 123 means to remove only exact duplicate sequences. I would try removing the "1" and going with only 2 (2 means 5' duplicate, 3 means 3' duplicate). I'm not completely certain this will do what you want, but it is easy to test with a small data set (like the example you pasted above). If that doesn't do it, then I would email the author. I've emailed him before and he responded.
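
For testing on a small data set, the filtering asked for in the first post can be sketched independently in Python and compared with what PRINSEQ returns. This is only an illustration under assumptions, not PRINSEQ's implementation: the function name `remove_contained_ends` is hypothetical, records are plain `(name, sequence)` pairs, exact duplicates are dropped keeping the first occurrence, and the pairwise comparison is quadratic, so it is only meant for small inputs.

```python
def remove_contained_ends(records):
    """Drop sequences that duplicate an earlier record or that are identical
    to the 5' (prefix) or 3' (suffix) end of a longer sequence.

    records: list of (name, sequence) tuples. Hypothetical helper, not part
    of PRINSEQ.
    """
    all_seqs = [seq for _, seq in records]
    seen = set()
    kept = []
    for name, seq in records:
        # Exact duplicate of a record seen earlier: keep only the first copy.
        if seq in seen:
            continue
        seen.add(seq)
        # Is this sequence the 5' or 3' end of any strictly longer sequence?
        is_end_of_longer = any(
            len(other) > len(seq)
            and (other.startswith(seq) or other.endswith(seq))
            for other in all_seqs
        )
        if not is_end_of_longer:
            kept.append((name, seq))
    return kept


# The example input from the first post.
example = [
    ("pi1", "AAAAAAAAAATTAAGGGCCAGCTGA"),
    ("pi12", "AAAAAAAAAATTAAGGGCCAGCTGAA"),
    ("pi13", "AAAAAAAAACTTGAACTCTACTGC"),
    ("pi14", "AAAAAAAAATTAAGGGCCAGCTGAA"),
    ("pi15", "AAAAAAAAATTTTGGATGATCTTAAT"),
    ("pi16", "AAAAAAAAATTTTGGATGATCTTAATT"),
    ("pi17", "AAAAAAAACAAGGTCGGCATAAAG"),
    ("pi18", "AAAAAAAACGAACATGAGAGGATGGA"),
]

for name, seq in remove_contained_ends(example):
    print(f">{name}\n{seq}")
```

The printed records can be compared against the OUTPUT listing in the first post to see whether a given PRINSEQ version behaves as expected.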



    • #3
      Hi SES,
      I already tried this; I think I'll email the author as you suggested.
      Last edited by StephaniePi83; 09-05-2012, 07:35 AM.



      • #4
        I emailed the author. There was an issue in the program that prevented the 5'/3' duplicate removal from being triggered. He gave me the latest version with the fix!
