
  • NCBI WGS submission: Need to trim sequences of various length from scaffolds

    The submission that will never die...

    I have a number of contigs that did not pass NCBI's contamination/adapter screen. I need to trim these, but the problem is that the regions to trim are all of varying lengths. Some are internal, but most are at either the 5' or 3' end of the scaffold.

    The only info provided by NCBI is the scaffold/contig name, length and the start and stop base # of what needs to be trimmed. If I had the sequence I could just manually ctrl-f and delete/substitute Ns.

    Most of these are too large to load into VectorNTI or similar program.

    Any help would be appreciated. Thanks.

  • #2
    I presume that you have the original contigs in fasta or some other text format, yes? If so, you'll find biopython very useful (it won't complain about contig length, unless your computer is from the 80s). You can parse fasta files and subset sequences based on coordinates relatively easily with it. The general idea would be to store the coordinates to be trimmed in a text file and then write a little script to (1) read that into a hash (a dict in Python), (2) open the file containing the contigs, and (3) iterate through the records, checking for the presence of each in the hash and then subsetting accordingly.

    I would be hesitant to hard mask internal sequences that are actually adapter contamination. It would seem more reasonable in those cases to simply break apart the contigs containing them (you really should remove all adapter sequence prior to assembly).
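
A minimal sketch of the approach outlined in #2, under some loudly stated assumptions: the coordinate table is a hypothetical whitespace-separated file with columns name, length, start, stop (1-based, inclusive), and the helper names (`load_spans`, `read_fasta`, `remove_spans`, `clean`) are made up for illustration. It uses only the standard library so it runs without Biopython installed; `Bio.SeqIO.parse(handle, "fasta")` could stand in for the `read_fasta` helper. Spans at an end are simply trimmed off, while internal spans break the contig into parts rather than being hard masked with Ns.

```python
import io
from collections import defaultdict


def load_spans(handle):
    """Read 'name length start stop' rows into {name: [(start, stop), ...]}.

    The column layout is an assumption, not NCBI's exact report format.
    """
    spans = defaultdict(list)
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        name, _length, start, stop = line.split()[:4]
        spans[name].append((int(start), int(stop)))
    return spans


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file, one record at a time."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, i.e. the text up to the first space.
            name, chunks = line[1:].split(None, 1)[0], []
        elif name is not None and line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def remove_spans(seq, spans):
    """Cut 1-based inclusive spans out of seq and return the pieces left over."""
    pieces, pos = [], 0
    for start, stop in sorted(spans):
        if start - 1 > pos:
            pieces.append(seq[pos:start - 1])
        pos = max(pos, stop)
    if pos < len(seq):
        pieces.append(seq[pos:])
    return pieces


def clean(fasta_handle, spans):
    """Yield cleaned records; contigs split by an internal span get _partN names."""
    for name, seq in read_fasta(fasta_handle):
        if name not in spans:
            yield name, seq
            continue
        pieces = remove_spans(seq, spans[name])
        for i, piece in enumerate(pieces, 1):
            yield (name if len(pieces) == 1 else f"{name}_part{i}"), piece


# Toy data: c1 has a 5' span, c2 an internal span, c3 is not flagged.
fasta = io.StringIO(">c1\nAAAACCCCGG\n>c2\nTTTTGGGGTTTT\n>c3\nACGT\n")
table = io.StringIO("c1\t10\t1\t4\nc2\t12\t5\t8\n")
result = list(clean(fasta, load_spans(table)))
for name, seq in result:
    print(f">{name}\n{seq}")
```

On real data you would open the contig file and the coordinate file instead of the `io.StringIO` stand-ins and write the records to a new FASTA file. Because records are processed one at a time, only a single contig is held in memory at once.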


    • #3
      Thanks for the quick reply. I will look into that.

      The adapter sequences were removed from the short reads for the initial contig assembly. I'm assuming that the jump libraries are the culprit here.

      In the end it's only 47 contigs/scaffolds out of 70k for a large eukaryotic genome.
