  • Split long fasta seq into smaller seqs by removing 'n'

    Hello,

    My fasta file has a long consensus sequence (gigabases long) that is padded with 'n' between the actual sequences. Like this:

    actgggacnnnnnnnnnnnnnnnnnnnnnnnnnnactgacgtggattgc
    aatnnnnnnnnnnnnnnnaccaattggatagagaccnnn

    I've searched quite intensively for an existing solution but only found near misses. For example, people describe splitting a long sequence into separate smaller files, but that is not what I want. I want the runs of 'n' removed and all the resulting sequences kept in the same file, ideally with the start and end position of each sequence in its header.

    Desired output (>seqname_start_end) :
    >seq1_1_8
    actgggac

    >seq2_35_52
    actgacgtggattgcaat

    >seq3_68_85
    accaattggatagagacc

    If anyone could point me towards the right tool (BioPerl, etc.) or give me some pseudocode in Perl, I would appreciate it.

    Thanks.

  • #2
    IDBA (GitHub repository: loneknightpy/idba) contains a script called split_scaffold.


    Here's what it does:

    IN =>

    >test
    acgacgacgacgacgacgacgacagcannnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnacgacgacgacagcagtagatgatgatagtag


    OUT =>

    >teste_0
    ACGACGACGACGACGACGACGACAGCA
    >teste_1
    ACGACGACGACAGCAGTAGATGATGATAGTAG

    • #3
      Originally posted by azneto:
      IDBA contains a script called split_scaffold

      Great! Thanks a lot! Worked flawlessly.

      It doesn't give me the start/end base positions, but I can live with that.
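
For readers who do need the coordinates, the splitting can be sketched in a few lines of Python. This is only a minimal sketch, not the IDBA script: the function names `split_on_n` and `read_single_fasta` are hypothetical, and the `>seqN_start_end` header scheme (1-based, inclusive, counted over the concatenated sequence lines) is an assumption taken from the desired output in the first post. It also assumes a single-record FASTA whose sequence fits in memory. The padding lengths in the demo string (26 and 15) are chosen to match the coordinates requested there.

```python
import re


def split_on_n(sequence, prefix="seq"):
    """Yield (header, subsequence) for every maximal run of non-N bases.

    Coordinates in the header are 1-based and inclusive, relative to the
    full input sequence.
    """
    # [^nN]+ matches each stretch of bases between runs of N padding.
    for index, match in enumerate(re.finditer(r"[^nN]+", sequence), start=1):
        start = match.start() + 1  # 0-based offset -> 1-based position
        end = match.end()          # exclusive 0-based end == inclusive 1-based end
        yield f"{prefix}{index}_{start}_{end}", match.group()


def read_single_fasta(lines):
    """Join the sequence lines of a single-record FASTA, skipping '>' headers."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


# Demo string modelled on the example in the first post
# (gap lengths are an assumption derived from the requested coordinates).
example = (
    "actgggac" + "n" * 26
    + "actgacgtggattgcaat" + "n" * 15
    + "accaattggatagagacc" + "nnn"
)

for header, subsequence in split_on_n(example):
    print(f">{header}")
    print(subsequence)
```

On the demo string this prints the three records `seq1_1_8`, `seq2_35_52` and `seq3_68_85`. For a real file, something like `read_single_fasta(open("input.fa"))` could supply the sequence; for a gigabase-scale input, a streaming approach that tracks the running offset line by line would use less memory than joining everything into one string.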
