SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
Beginner Question: Generate WIG file el_Davido Bioinformatics 11 09-12-2012 09:08 AM
samtools sort bam file error: generate non-existent file mediator Bioinformatics 0 03-05-2012 09:42 PM
how to generate ini file cjose Bioinformatics 4 07-22-2011 11:18 AM
generate data from ewbt file zhaowei Bioinformatics 1 01-28-2011 01:58 PM
Celera Assembler (WGS) - splice site file? dan Bioinformatics 4 09-28-2009 03:56 AM

Reply
 
Thread Tools
Old 09-10-2012, 06:16 AM   #1
Tuinhof
Member
 
Location: Netherlands

Join Date: Jul 2012
Posts: 10
Lightbulb Generate .AGP file for WGS submission

Hi,

I have been trying for a while to submit our WGS genome data to NCBI.
I had to overcome some obstetricals, and I'm almost there, but this last obstetrical seems to be to big for me.

I have to make a AGP file witch describes the assembly of a larger sequence object (scaffolds) from smaller objects (contigs).
We have some sort of script to generate the AGP file. But it is't working correctly. And my expertise isn't not that great to make it work.

So my question to the community is: Who has a working script and is willing to help to improve mine or share theirs?

Thank you in advance!
Tuinhof is offline   Reply With Quote
Old 09-10-2012, 09:15 AM   #2
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 164
Default

Can you post an example of what you current script is producing? May be it can be modified if there's just a few simple formatting issues.

Also some information on which assembler produced the contigs would be useful as well.
vivek_ is offline   Reply With Quote
Old 09-26-2012, 10:00 AM   #3
herstein
Junior Member
 
Location: California

Join Date: Oct 2011
Posts: 4
Default

I also have a similar question. I used SOAPdenovo to assemble a genome but I don't know how to generate an AGP file from the output. Has anyone done this before?

Thanks!
herstein is offline   Reply With Quote
Old 01-31-2013, 01:27 AM   #4
Tuinhof
Member
 
Location: Netherlands

Join Date: Jul 2012
Posts: 10
Question

Quote:
Originally Posted by vivek_ View Post
Can you post an example of what you current script is producing? May be it can be modified if there's just a few simple formatting issues.

Also some information on which assembler produced the contigs would be useful as well.
Sorry for the late response, I was busy with other projects.
I created 7 nucleotide sequences different with N-regions to invoke error messages. Here is the outcome of the Perl script to create an AGP file:

Code:
scaffold_1	1	265	1	W	contig_1	1	265	+
scaffold_1	266	530	2	W	contig_2	1	265	+
scaffold_2	1	407	1	W	contig_3	1	407	+
scaffold_3	1	309	1	W	contig_4	1	309	+
scaffold_3	310	674	2	W	contig_5	1	365	+
scaffold_4	1	340	1	N	340	scaffold	yes	paired-ends
scaffold_5	1	249	1	N	249	scaffold	yes	paired-ends
scaffold_5	250	509	2	W	contig_6	1	260	+
scaffold_5	510	928	3	N	419	scaffold	yes	paired-ends
scaffold_6	1	504	1	N	504	scaffold	yes	paired-ends
scaffold_7	1	402	1	W	contig_7	1	402	+
scaffold_7	403	702	2	N	300	scaffold	yes	paired-ends
scaffold_7	703	1156	3	W	contig_8	1	454	+

	
scaffold_4 1 340 1 N 340 scaffold yes paired-ends
 	[Warning (w32)] gap at the beginning of object scaffold_4
 	[Warning (w31)] gap at the end of object scaffold_4
 	[Warning (w34)] no components in object scaffold_4
	
scaffold_5 1 249 1 N 249 scaffold yes paired-ends
 	[Warning (w32)] gap at the beginning of object scaffold_5
	
scaffold_5 510 928 3 N 419 scaffold yes paired-ends
 	[Warning (w31)] gap at the end of object scaffold_5
	
scaffold_6 1 504 1 N 504 scaffold yes paired-ends
 	[Warning (w32)] gap at the beginning of object scaffold_6
 	[Warning (w31)] gap at the end of object scaffold_6
 	[Warning (w34)] no components in object scaffold_6
Scaffold_4: contains only an N-region

Scaffold_5: contains nucleotide region (260), adjacent on both sides by a N-region. This is not allowed, because of the 'gaps' on both sides of the 'contig'.

Scaffold_6: contains in the middle nucleotide region <200 nt, (so only the N-region is noticed in the AGP file. But only a N-region is also not allowed.

My question:
How can I get around these error messages?
How can I modify the AGP script where it knows to skip the first or last N-region, or when there is only a N-region, to skip this scaffold, in the AGP file? (in the fasta file it is not printed, of course).

Here is my input:
Code:
>scaffold1
TACAATATATGTATACAAAACTGATAGATGCTACGTGGTGAATTGGGCATATTGTAAATATGTATATGTATGTATGTAAACCTAGGGTTTTAGCATTTTTGCATTGGTCAGTGGGGGCCATCTTCTCAGTTTTCTAATAATATTGTTTCAAGACTGATGTGTGTGGGATTTTCGAATGAATTATTCTGGTTCAACGTTATATAAGGTATATGAATTTAAAATAAAATATGCTGTTTCATTCTGTAGGGTGTGTGCCTCTGACTGANNNNNNNNAGTCTCTTTTTGTCGGAGTGGTAAGTCTAGGTTGTCAGATTAAATGACATTATTTATTTGCTGGCTTGTTCTGTGACACACACTGTAATTACTAGTAGCCTGTATTCTGGGGTTATGACGTCATTCATCGCAGTTACTGTAAGANNNNTTTCATCCAGTGGAGAGCTGTAATCTCTGCATCTGGGGATCAGAGGTATTGTGTCTGTAATCTCTCCATCTCAGTTTGCATGTCCCGTTCTTACCGCCTTCTCTGCTATTGAACCGGTTACAGATATATCATTACAATTAAACATTATTGATACACTATTACATATGTACACTACAAGTAGTGACTGACATATCTGAAAAGACACACAGGTGACCCCGTACAGAGAATTTTGTAATTGCCAATTTTGTGAATCCTTATACACAAC
>scaffold2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGATCTGGAACATTTGACTTCCTTTAAAACCGGATTGACGGGTTTTATAGGACGATCAGCCTCCAAACGCACCCTAGCAATATAGTATAAGACATTACTTAGGTGCTTTTGTGTTAGCGTTTCTCAGACAACAATTTATTTGATATTGGTTTTGGGATTTCTCTGACAACAATTTATTTTATACATTGGTAATGCTGCAGGTTTGTGTGCACTTGAACACAACGTTACTTTACAAAGGTTATTGCTTCAAGAGATATGACCCATGAACCATTATTTCTAGGAAGCACCAAGTACCCTTAGTTCATCAACTTCCTGCAGCCCAAATGCCTTTCATATAGAGCAACTTCCCACAGTATACAATTGAATCAAACTAAAGAGAATATGCTTTTCTATATTTCTGCTCCCTAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>scaffold3
GATCTGGAACATTTGACTTCCTTTAAAACCGGATTGACGGGTTTTATAGGACGATCAGCCTCCAAACGCACCCTAGCAATATAGTATAAGACATTACTTAGGTGCTTTTGTGTTAGCGTTTCTCAGACAACAATTTATTTGATATTGGTTTTGGGATTTCTCTGACAACAATTTATTTTATACATTGGTAATGCTGCAGGTTTGTGTGCACTTGAACACAACGTTACTTTACAAAGGTTATTGCTTCAAGAGATATGACCCATGAACCATTATTTCTAGGAAGCACCAAGTACCCTTAGTTCATCAACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTAATTTTATTTGGTACATGTACTTACACATACAATGATGACATACAATATATGTATACAAAACTGATAGATGCTACGTGGTGAATTGGGCATATTGTAAATATGTATATGTATGTATGTAAACCTAGGGTTTTAGCATTTTTGCATTGGTATGGCCTGTATAAGGTCAAAAAACGTAGTCTCTTTTTGTCGGAGTGGTAAGTCTAGGTAGACAGTCAGGCTGACATGCAGACAAACAGTATGCCCCGGCTGGAAAATTCAGTGCTGTCCACTGGTTGGGGTAGAGGCAGACTTTTCAGGAGGGGGCTGTTAATACATTTGCCCATTTTGTAATAAAACTGTCACTTGCAGTCTCCGTAAAGGG
>scaffold4
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>scaffold5
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGATCTGGAACATTTGACTTCCTTTAAAACCGGATTGACGGGTTTTATAGGACGATCAGCCTCCAAACGCACCCTAGCAATATAGTATAAGACATTACTTAGGTGCTTTTGTGTTAGCGTTTCTCAGACAACAATTTATTTGATATTGGTTTTGGGATTTCTCTGACAACAATTTATTTTATACATTGGTAATGCTGCAGGTTTGTGTGCACTTGAACACAACGTTACTTTACAAAGGTTATTGCTTCAAGAGATATGACCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>scaffold6
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGATCTGGAACATTTGACTTCCTTTAAAACCGGATTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>scaffold7
TTTATTTTATACATTGGTAATGCTGCAGGTTTGTGTGCACTTGAACACAACGTTACTTTACAAAGGTTATTGCTTCAAGAGATATGACCCATGAACCATTATTTCTAGGAAGCACCAAGTACCCTTAGTTCATCAACTTCCTGCAGCCCAAATGCCTTTCATATAGAGCAACTTCCCACAGTATACAATTGAATCAAACTAAAGAGAATATGCTTTTCTATATTTCTGCTCCCTAAAGGGATGCTGGCTGGTGGCATATTTTGCAGCTCCCCATACTTTGAAAAACGCTGATAAATGGCTACTAAAGGCAAGTATTCATTATCTTTATAACTTTGAATGTATAGTTGCAAATGGGCCCTTGGCCAAATAATACATAATAGAATAAGCTGGGTAATGCATAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTAATTTTATTTGGTACATGTACTTACACATACAATGATGACATACAATATATGTATACAAAACTGATAGATGCTACGTGGTGAATTGGGCATATTGTAAATATGTATATGTATGTATGTAAACCTAGGGTTTTAGCATTTTTGCATTGGTATGGCCTGTATAAGGTCAAAAAACGTAGTCTCTTTTTGTCGGAGTGGTAAGTCTAGGTAGACAGTCAGGCTGACATGCAGACAAACAGTATGCCCCGGCTGGAAAATTCAGTGCTGTCCACTGGTTGGGGTAGAGGCAGACTTAAAATATTGTTTTACCACAGCCCATGGGGCAGTGTATGGCCTGTATAAGGTCAAAAAACGTCCAGAATTCTTTGGCATTTTCAATTGTTAATTTAAGTTGAACTGAAGAGGAATAGAGGTGCTTGTGTTTGTGTGTTTGTGTGTGTGTGTGCATATGTG
Tuinhof is offline   Reply With Quote
Old 05-30-2013, 08:29 PM   #5
wwq413
Junior Member
 
Location: usa

Join Date: Dec 2010
Posts: 6
Default

i am not sure if this code can help. It generates contig .agp file by using scaffolds.


www.hmpdacc.org/doc/fasta2apg.pl‎
__________________
wwq
wwq413 is offline   Reply With Quote
Old 05-31-2013, 05:06 AM   #6
Tuinhof
Member
 
Location: Netherlands

Join Date: Jul 2012
Posts: 10
Default

Quote:
Originally Posted by wwq413 View Post
i am not sure if this code can help. It generates contig .agp file by using scaffolds.


www.hmpdacc.org/doc/fasta2apg.pl‎

Dear WWQ413, thank you for trying. I will try this on my scaffolds.
Did you already try to submit your file generated by your script to NCBI?
Tuinhof is offline   Reply With Quote
Old 05-31-2013, 05:07 AM   #7
Tuinhof
Member
 
Location: Netherlands

Join Date: Jul 2012
Posts: 10
Default

Quote:
Originally Posted by wwq413 View Post
i am not sure if this code can help. It generates contig .agp file by using scaffolds.


www.hmpdacc.org/doc/fasta2apg.pl‎
The link does not work...
Tuinhof is offline   Reply With Quote
Old 05-31-2013, 10:54 AM   #8
herstein
Junior Member
 
Location: California

Join Date: Oct 2011
Posts: 4
Default

Try copying this link to your browser, that seemed to work for me:

http://www.hmpdacc.org/doc/fasta2apg.pl
herstein is offline   Reply With Quote
Old 06-03-2013, 02:31 AM   #9
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

There is a script in the Velvet package called fasta2agp.pl, it is located in the 'contrib' folder.
boetsie is offline   Reply With Quote
Old 03-18-2014, 07:41 AM   #10
lac302
Member
 
Location: DE

Join Date: Dec 2012
Posts: 65
Default

fasta2agp.pl requires Bio::SeqIO...can this module be found seperately or do I need to install BioPerl?

I tried pip install with no luck.

Bear with me I'm new at this. Hopefully this is the one and only time I need to submit a WGS assembly to NCBI.
lac302 is offline   Reply With Quote
Old 11-25-2014, 07:41 AM   #11
stevebaeyen
Member
 
Location: Belgium

Join Date: Aug 2011
Posts: 18
Default

Hi,
I tried to use the fasta2agp.pl script but I get errors when submitting to NCBI WGS. The script is from 2010 and there is a new AGPv2,0 version since 2012. The script worked but there is an error counting the fragments (column 4) and it gives errors when submitting to WGS.
Can anyone help with this issue?
See first lines:
# Generated from SOAPdenovo assembly file B12041.fa using script fasta2agp.pl
B12041_scaffold_1 1 8630 0 W B12041_1 1 8630 +
B12041_scaffold_1 8631 8665 1 N 35 fragment yes
B12041_scaffold_1 8666 30966 1 W B12041_2 1 22301 +
B12041_scaffold_1 30967 30984 2 N 18 fragment yes
B12041_scaffold_1 30985 89730 2 W B12041_3 1 58746 +
B12041_scaffold_1 89731 89731 3 N 1 fragment yes
B12041_scaffold_1 89732 105108 3 W B12041_4 1 15377 +
stevebaeyen is offline   Reply With Quote
Old 11-27-2014, 05:59 AM   #12
stevebaeyen
Member
 
Location: Belgium

Join Date: Aug 2011
Posts: 18
Arrow

Quote:
Originally Posted by stevebaeyen View Post
Hi,
I tried to use the fasta2agp.pl script but I get errors when submitting to NCBI WGS. The script is from 2010 and there is a new AGPv2,0 version since 2012. The script worked but there is an error counting the fragments (column 4) and it gives errors when submitting to WGS.
Can anyone help with this issue?
See first lines:
# Generated from SOAPdenovo assembly file B12041.fa using script fasta2agp.pl
B12041_scaffold_1 1 8630 0 W B12041_1 1 8630 +
B12041_scaffold_1 8631 8665 1 N 35 fragment yes
B12041_scaffold_1 8666 30966 1 W B12041_2 1 22301 +
B12041_scaffold_1 30967 30984 2 N 18 fragment yes
B12041_scaffold_1 30985 89730 2 W B12041_3 1 58746 +
B12041_scaffold_1 89731 89731 3 N 1 fragment yes
B12041_scaffold_1 89732 105108 3 W B12041_4 1 15377 +
Found a solution!! We adapted the script because there was an error counting the fragments. Here is the script (ran on Ubuntu 12,04 LTS):
#!/usr/bin/perl

### david.studholme@tsl.ac.uk

### Generates contigs (in FastA) and scaffolding information (in AGP) from Velvet 'contigs.fa' supercontigs file

### Use entirely at you own risk!! There may be bugs!

### modified by chienchi@lanl.gov 2010/09/13
### add flags -i -size -o
### change naming convention for HMP assembly purpose

### modified by mpop@umd.edu 2010/9/15
### added flag -name
### allowing to change names of sequences

### modified by mpop@umd.edu 2010/9/27
### removed 10bp limit for inter-contig gaps

### modified by annelies.haegeman@ilvo.vlaanderen.be 2014/11/27
### fixed counter $i to restart from 1 at each scaffold + unique fragment number generation


use strict;
use warnings ;
use Bio::SeqIO ;
use Getopt::Long;
use File::Basename;

my $file;
my $outDir;
my $scaf_size_cutoff=0;
my $name_prefix="AGP";

GetOptions(
'i=s' => \$file, # scaf seq in fasta
'size=i' => \$scaf_size_cutoff,
'o=s' => \$outDir,
'name=s' => \$name_prefix);

die "Usage: $0 -i <sequence file> -o <out_dir> [-size <scaf_size_cutoff>] [-name <name>]\n" unless $file;
my ($file_name, $path, $suffix)=fileparse("$file", qr/\.[^.]*/);
my ($sample_name,$center)=split /_/,$file_name;

### Output file for contigs in Fasta format
### The various output file names can be tuned here.
my $contig_outfile = "$outDir/$name_prefix.contigs.fa";
my $scaffolds_outfile = "$outDir/$name_prefix.scaffolds.fa";
my $agp_outfile = "$outDir/$name_prefix.agp";


#open (FILE, ">$outdir/$contig_outfile") and
# warn "Will write contigs to file '$contig_outfile' to $outDir Directory\n" or
# die "Failed to write to file '$fasta_outfile'\n";

my $contig_out = Bio::SeqIO->new('-file' => ">$contig_outfile",
'-format' => 'Fasta') and
warn "Will write contigs to file '$contig_outfile' to $outDir Directory\n" or
die "Failed to write to file '$contig_outfile'\n";

open (AGP_FILE, ">$agp_outfile") and
warn "Will write contigs to file $agp_outfile to $outDir Directory\n" or
die "Failed to write to file $agp_outfile\n";

#open (SCAFF_FILE, ">$outDir/$scaffolds_outfile") and
my $scaff_out = Bio::SeqIO->new('-file' => ">$scaffolds_outfile",
'-format' => 'Fasta') and
warn "Will write scaffolds to file $scaffolds_outfile to $outDir Directory\n" or
die "Failed to write to file $scaffolds_outfile\n";

print AGP_FILE "# Generated from SOAPdenovo assembly file $file using script $0\n";

warn "Scaffold Sequence cutoff = $scaf_size_cutoff nt\n" if ($scaf_size_cutoff);


my $i = 0;# a counter, used for generating unique contig names

my $inseq = Bio::SeqIO->new('-file' => "<$file",
'-format' => 'Fasta' ) ;


while (my $seq_obj = $inseq->next_seq ) {

my $supercontig_id = $seq_obj->id ;
my $supercontig_seq = $seq_obj->seq ;
my $supercontig_desc = $seq_obj->description ;
my $supercontig_length = length($supercontig_seq);

$i=0; #reset each time you have a new scaffold

# only process the long supercontigs
next if ($supercontig_length < $scaf_size_cutoff);


### NCBI do not allow coverage and length information in the FastA identifier
### e.g. NODE_1160_length_397673_cov_14.469489 is an illegal FastA ID
### So we will replace these with simple numbers
if ($supercontig_id =~ m/NODE_(\d+)_length_\d+_cov_\d+/ or
$supercontig_id =~ m/^(\d+)$/ or
$supercontig_id =~ m/^scaffold(\d+)$/) {
$supercontig_id = "${name_prefix}_scaffold_$1";
}


# print SCAFF_FILE ">$supercontig_id\n$supercontig_seq\n";
my $scf = Bio::PrimarySeq->new(-seq => "$supercontig_seq",
-id => "$supercontig_id");

$scaff_out->write_seq($scf);

my $start_pos = 1; # keep track of whereabouts in this supercontig we are
my %substring_sequences;
foreach my $substring_sequence ( split /(N+)/i, $supercontig_seq ) {
#warn "\n$substring_sequence\n" if $supercontig_id eq '1160'; for #debugging only
$i++; # a counter, used for generating unique contig names
### Define the AGP column contents
my $object1 = $supercontig_id;
my $object_beg2 = $start_pos;
my $object_end3 = $start_pos + length($substring_sequence) - 1;
my $part_number4 = $i;
my $component_type5;
my $component_id6a;
my $gap_length6b;
my $component_beg7a;
my $gap_type7b;
my $component_end8a;
my $linkage8b;
my $orientation9a;
my $filler9b;
if ( $substring_sequence =~ m/^N+$/i ) {
### This is poly-N gap between contigs
$component_type5 = 'N';
$gap_length6b = length($substring_sequence);
$gap_type7b = 'fragment';
$linkage8b = 'yes';
$filler9b = '';
} elsif ( $substring_sequence =~ m/^[ACGT]+$/i ) {
### This is a contig

$component_type5 = 'W';
$component_id6a = "$supercontig_id"."_"."$i";
$component_beg7a = 1;
$component_end8a = length($substring_sequence);
$orientation9a = '+';
### Print FastA formatted contig
# print FILE ">$component_id6a\n$substring_sequence\n";
my $ctg = Bio::PrimarySeq->new(-seq => "$substring_sequence",
-id => "$component_id6a");

$contig_out->write_seq($ctg);


} else {
die "Illegal characters in sequence\n$substring_sequence\n";
}
$start_pos += length ($substring_sequence);
if ($component_type5 eq 'N') {
### print AGP line for gap
print AGP_FILE "$object1\t$object_beg2\t$object_end3\t$part_number4\t$component_type5\t$gap_length6b\t$gap_type7b\t$linkage8b\t$filler9b\n";
} else {
### print AGP line for contig
print AGP_FILE "$object1\t$object_beg2\t$object_end3\t$part_number4\t$component_type5\t$component_id6a\t$component_beg7a\t$component_end8a\t$orientation9a\n";
}
}
}

$contig_out->close();
close AGP_FILE;
$scaff_out->close();
stevebaeyen is offline   Reply With Quote
Reply

Tags
ncbi, wgs

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:17 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO