**TiborNagy** · 02-17-2014, 02:45 AM

perl -ne 'if($_ =~ /([^N]+)N+([^N]+)/){print $1;print stderr $1}' input.seq >contig1.txt 2>contig2.txt

It will split the input file (input.seq) into contig1.txt and contig2.txt.

**mastal** · 02-17-2014, 02:58 AM

Should that be:

Code:

print stderr $2

**Dagga** · 02-17-2014, 03:15 AM

Thanks for that!!

Will this rename the contigs?

Will the contig that is split be called the same thing in contig1.txt and contig2.txt?

Is it possible to rename the contigs when they are split? For example, if contig 84 is split into two contigs, can the halves be renamed contig 84.1 and contig 84.2, respectively?

**TiborNagy** · 02-17-2014, 04:49 AM

mastal: you are right!
Dagga: This script does not handle the contig names, only the sequences, because you did not tell us what input format you have.

**Dagga** · 02-17-2014, 01:16 PM

TiborNagy: Sorry, the file will be in FASTA format, straight out of de novo assembly.

Would you be able to alter the script to handle contig names, please?

Thanks!

**mastal** · 02-17-2014, 01:30 PM

If you are doing your assemblies with Velvet, setting '-scaffolding no' (a velvetg option) will stop Velvet from joining contigs together with stretches of Ns.

**Dagga** · 02-17-2014, 01:35 PM

Excellent!

Whilst this does help with some genomes that I am assembling right now, we have some older genomes that were sequenced by BGI and these contain N's that we still need to have removed...

**TiborNagy** · 02-18-2014, 05:49 AM

Just for you :-)

Code:

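#!/usr/bin/perl
# Split each FASTA contig at its first run of Ns.
# The part before the Ns is printed to STDOUT as "<header>.1",
# the part after them to STDERR as "<header>.2".
# Contigs without Ns are skipped, and anything after a second
# run of Ns is dropped.
use strict;
use warnings;

my $seq = "";
my $id  = "";

while(<>){
   chomp;

   if(/^>/){
      # New header: write out the previous contig first
      if($seq ne ""){
         if($seq =~ /([^N]+)N+([^N]+)/){
            print  "$id.1\n$1\n";
            print STDERR "$id.2\n$2\n";
         }
      }
      $seq = "";
      $id = $_;
   }
   else{
      # Sequence line: append to the current contig
      $seq .= $_;
   }
}

# Write out the last contig in the file
if($seq =~ /([^N]+)N+([^N]+)/){
  print "$id.1\n$1\n";
  print STDERR "$id.2\n$2\n";
}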
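
The script above splits at the first run of Ns only and skips contigs that contain no Ns. For scaffolds with several gaps, one possible variation is sketched below. It is an illustration under stated assumptions, not a tested replacement. It assumes that gaps may be upper- or lower-case N, that every piece should go to a single output stream numbered .1, .2, .3 and so on, and that contigs without Ns should be kept under their original name. The sub names `split_contig` and `split_fasta` and the file name `split_gaps.pl` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one sequence at every run of N/n and return FASTA records.
# $id is the full header line, including the leading ">".
sub split_contig {
    my ($id, $seq) = @_;

    # grep drops the empty piece that split produces for leading Ns
    my @parts = grep { length } split /[Nn]+/, $seq;

    return () unless @parts;                      # sequence was empty or all Ns
    return ("$id\n$parts[0]\n") if @parts == 1;   # no internal gap: keep the name

    my @records;
    my $n = 0;
    for my $part (@parts) {
        $n++;
        push @records, "$id.$n\n$part\n";         # e.g. >contig84.1, >contig84.2
    }
    return @records;
}

# Read FASTA from a filehandle and return all (possibly split) records.
sub split_fasta {
    my ($fh) = @_;
    my ($id, $seq, @out) = (undef, "");

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            # New header: process the previous contig first
            push @out, split_contig($id, $seq) if defined $id;
            ($id, $seq) = ($line, "");
        }
        else {
            $seq .= $line;
        }
    }
    push @out, split_contig($id, $seq) if defined $id;   # last contig
    return @out;
}

# Runs only when file names are given on the command line.
if (@ARGV) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        print split_fasta($fh);
        close $fh;
    }
}
```

Saved as split_gaps.pl, it could be run as `perl split_gaps.pl contigs.fasta > split.fasta`. As in the original script, the suffix is appended to the end of the whole header line, so a header that carries a description after the contig name gets the suffix after the description.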

**Dagga** · 02-18-2014, 03:38 PM

Thanks!! I appreciate it!

Remove N's and split contigs
