Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate .AGP file for WGS submission

    Hi,

    I have been trying for a while to submit our WGS genome data to NCBI.
    I had to overcome some obstetricals, and I'm almost there, but this last obstetrical seems to be to big for me.

    I have to make a AGP file witch describes the assembly of a larger sequence object (scaffolds) from smaller objects (contigs).
    We have some sort of script to generate the AGP file. But it is't working correctly. And my expertise isn't not that great to make it work.

    So my question to the community is: Who has a working script and is willing to help to improve mine or share theirs?

    Thank you in advance!

  • #2
    Can you post an example of what you current script is producing? May be it can be modified if there's just a few simple formatting issues.

    Also some information on which assembler produced the contigs would be useful as well.

    Comment


    • #3
      I also have a similar question. I used SOAPdenovo to assemble a genome but I don't know how to generate an AGP file from the output. Has anyone done this before?

      Thanks!

      Comment


      • #4
        Originally posted by vivek_ View Post
        Can you post an example of what you current script is producing? May be it can be modified if there's just a few simple formatting issues.

        Also some information on which assembler produced the contigs would be useful as well.
        Sorry for the late response, I was busy with other projects.
        I created 7 nucleotide sequences different with N-regions to invoke error messages. Here is the outcome of the Perl script to create an AGP file:

        Code:
        scaffold_1	1	265	1	W	contig_1	1	265	+
        scaffold_1	266	530	2	W	contig_2	1	265	+
        scaffold_2	1	407	1	W	contig_3	1	407	+
        scaffold_3	1	309	1	W	contig_4	1	309	+
        scaffold_3	310	674	2	W	contig_5	1	365	+
        scaffold_4	1	340	1	N	340	scaffold	yes	paired-ends
        scaffold_5	1	249	1	N	249	scaffold	yes	paired-ends
        scaffold_5	250	509	2	W	contig_6	1	260	+
        scaffold_5	510	928	3	N	419	scaffold	yes	paired-ends
        scaffold_6	1	504	1	N	504	scaffold	yes	paired-ends
        scaffold_7	1	402	1	W	contig_7	1	402	+
        scaffold_7	403	702	2	N	300	scaffold	yes	paired-ends
        scaffold_7	703	1156	3	W	contig_8	1	454	+
        
        	
        scaffold_4 1 340 1 N 340 scaffold yes paired-ends
         	[Warning (w32)] gap at the beginning of object scaffold_4
         	[Warning (w31)] gap at the end of object scaffold_4
         	[Warning (w34)] no components in object scaffold_4
        	
        scaffold_5 1 249 1 N 249 scaffold yes paired-ends
         	[Warning (w32)] gap at the beginning of object scaffold_5
        	
        scaffold_5 510 928 3 N 419 scaffold yes paired-ends
         	[Warning (w31)] gap at the end of object scaffold_5
        	
        scaffold_6 1 504 1 N 504 scaffold yes paired-ends
         	[Warning (w32)] gap at the beginning of object scaffold_6
         	[Warning (w31)] gap at the end of object scaffold_6
         	[Warning (w34)] no components in object scaffold_6
        Scaffold_4: contains only an N-region

        Scaffold_5: contains nucleotide region (260), adjacent on both sides by a N-region. This is not allowed, because of the 'gaps' on both sides of the 'contig'.

        Scaffold_6: contains in the middle nucleotide region <200 nt, (so only the N-region is noticed in the AGP file. But only a N-region is also not allowed.

        My question:
        How can I get around these error messages?
        How can I modify the AGP script where it knows to skip the first or last N-region, or when there is only a N-region, to skip this scaffold, in the AGP file? (in the fasta file it is not printed, of course).

        Here is my input:
        Code:
        >scaffold1
        TACAATATATGTATACAAAACTGATAGATGCTACGTGGTGAATTGGGCATATTGTAAATATGTATATGTATGTATGTAAACCTAGGGTTTTAGCATTTTTGCATTGGTCAGTGGGGGCCATCTTCTCAGTTTTCTAATAATATTGTTTCAAGACTGATGTGTGTGGGATTTTCGAATGAATTATTCTGGTTCAACGTTATATAAGGTATATGAATTTAAAATAAAATATGCTGTTTCATTCTGTAGGGTGTGTGCCTCTGACTGANNNNNNNNAGTCTCTTTTTGTCGGAGTGGTAAGTCTAGGTTGTCAGATTAAATGACATTATTTATTTGCTGGCTTGTTCTGTGACACACACTGTAATTACTAGTAGCCTGTATTCTGGGGTTATGACGTCATTCATCGCAGTTACTGTAAGANNNNTTTCATCCAGTGGAGAGCTGTAATCTCTGCATCTGGGGATCAGAGGTATTGTGTCTGTAATCTCTCCATCTCAGTTTGCATGTCCCGTTCTTACCGCCTTCTCTGCTATTGAACCGGTTACAGATATATCATTACAATTAAACATTATTGATACACTATTACATATGTACACTACAAGTAGTGACTGACATATCTGAAAAGACACACAGGTGACCCCGTACAGAGAATTTTGTAATTGCCAATTTTGTGAATCCTTATACACAAC
        >scaffold2
        NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGATCTGGAACATTTGACTTCCTTTAAAACCGGATTGACGGGTTTTATAGGACGATCAGCCTCCAAACGCACCCTAGCAATATAGTATAAGACATTACTTAGGTGCTTTTGTGTTAGCGTTTCTCAGACAACAATTTATTTGATATTGGTTTTGGGATTTCTCTGACAACAATTTATTTTATACATTGGTAATGCTGCAGGTTTGTGTGCACTTGAACACAACGTTACTTTACAAAGGTTATTGCTTCAAGAGATATGACCCATGAACCATTATTTCTAGGAAGCACCAAGTACCCTTAGTTCATCAACTTCCTGCAGCCCAAATGCCTTTCATATAGAGCAACTTCCCACAGTATACAATTGAATCAAACTAAAGAGAATATGCTTTTCTATATTTCTGCTCCCTAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
        >scaffold3
        GATCTGGAACATTTGACTTCCTTTAAAACCGGATTGACGGGTTTTATAGGACGATCAGCCTCCAAACGCACCCTAGCAATATAGTATAAGACATTACTTAGGTGCTTTTGTGTTAGCGTTTCTCAGACAACAATTTATTTGATATTGGTTTTGGGATTTCTCTGACAACAATTTATTTTATACATTGGTAATGCTGCAGGTTTGTGTGCACTTGAACACAACGTTACTTTACAAAGGTTATTGCTTCAAGAGATATGACCCATGAACCATTATTTCTAGGAAGCACCAAGTACCCTTAGTTCATCAACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTAATTTTATTTGGTACATGTACTTACACATACAATGATGACATACAATATATGTATACAAAACTGATAGATGCTACGTGGTGAATTGGGCATATTGTAAATATGTATATGTATGTATGTAAACCTAGGGTTTTAGCATTTTTGCATTGGTATGGCCTGTATAAGGTCAAAAAACGTAGTCTCTTTTTGTCGGAGTGGTAAGTCTAGGTAGACAGTCAGGCTGACATGCAGACAAACAGTATGCCCCGGCTGGAAAATTCAGTGCTGTCCACTGGTTGGGGTAGAGGCAGACTTTTCAGGAGGGGGCTGTTAATACATTTGCCCATTTTGTAATAAAACTGTCACTTGCAGTCTCCGTAAAGGG
        >scaffold4
        NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
        >scaffold5
        NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGATCTGGAACATTTGACTTCCTTTAAAACCGGATTGACGGGTTTTATAGGACGATCAGCCTCCAAACGCACCCTAGCAATATAGTATAAGACATTACTTAGGTGCTTTTGTGTTAGCGTTTCTCAGACAACAATTTATTTGATATTGGTTTTGGGATTTCTCTGACAACAATTTATTTTATACATTGGTAATGCTGCAGGTTTGTGTGCACTTGAACACAACGTTACTTTACAAAGGTTATTGCTTCAAGAGATATGACCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
        >scaffold6
        NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGATCTGGAACATTTGACTTCCTTTAAAACCGGATTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
        >scaffold7
        TTTATTTTATACATTGGTAATGCTGCAGGTTTGTGTGCACTTGAACACAACGTTACTTTACAAAGGTTATTGCTTCAAGAGATATGACCCATGAACCATTATTTCTAGGAAGCACCAAGTACCCTTAGTTCATCAACTTCCTGCAGCCCAAATGCCTTTCATATAGAGCAACTTCCCACAGTATACAATTGAATCAAACTAAAGAGAATATGCTTTTCTATATTTCTGCTCCCTAAAGGGATGCTGGCTGGTGGCATATTTTGCAGCTCCCCATACTTTGAAAAACGCTGATAAATGGCTACTAAAGGCAAGTATTCATTATCTTTATAACTTTGAATGTATAGTTGCAAATGGGCCCTTGGCCAAATAATACATAATAGAATAAGCTGGGTAATGCATAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTAATTTTATTTGGTACATGTACTTACACATACAATGATGACATACAATATATGTATACAAAACTGATAGATGCTACGTGGTGAATTGGGCATATTGTAAATATGTATATGTATGTATGTAAACCTAGGGTTTTAGCATTTTTGCATTGGTATGGCCTGTATAAGGTCAAAAAACGTAGTCTCTTTTTGTCGGAGTGGTAAGTCTAGGTAGACAGTCAGGCTGACATGCAGACAAACAGTATGCCCCGGCTGGAAAATTCAGTGCTGTCCACTGGTTGGGGTAGAGGCAGACTTAAAATATTGTTTTACCACAGCCCATGGGGCAGTGTATGGCCTGTATAAGGTCAAAAAACGTCCAGAATTCTTTGGCATTTTCAATTGTTAATTTAAGTTGAACTGAAGAGGAATAGAGGTGCTTGTGTTTGTGTGTTTGTGTGTGTGTGTGCATATGTG

        Comment


        • #5
          i am not sure if this code can help. It generates contig .agp file by using scaffolds.


          wwq

          Comment


          • #6
            Originally posted by wwq413 View Post
            i am not sure if this code can help. It generates contig .agp file by using scaffolds.


            www.hmpdacc.org/doc/fasta2apg.pl‎

            Dear WWQ413, thank you for trying. I will try this on my scaffolds.
            Did you already try to submit your file generated by your script to NCBI?

            Comment


            • #7
              Originally posted by wwq413 View Post
              i am not sure if this code can help. It generates contig .agp file by using scaffolds.


              www.hmpdacc.org/doc/fasta2apg.pl‎
              The link does not work...

              Comment


              • #8
                Try copying this link to your browser, that seemed to work for me:

                Comment


                • #9
                  There is a script in the Velvet package called fasta2agp.pl, it is located in the 'contrib' folder.

                  Comment


                  • #10
                    fasta2agp.pl requires Bio::SeqIO...can this module be found seperately or do I need to install BioPerl?

                    I tried pip install with no luck.

                    Bear with me I'm new at this. Hopefully this is the one and only time I need to submit a WGS assembly to NCBI.

                    Comment


                    • #11
                      Hi,
                      I tried to use the fasta2agp.pl script but I get errors when submitting to NCBI WGS. The script is from 2010 and there is a new AGPv2,0 version since 2012. The script worked but there is an error counting the fragments (column 4) and it gives errors when submitting to WGS.
                      Can anyone help with this issue?
                      See first lines:
                      # Generated from SOAPdenovo assembly file B12041.fa using script fasta2agp.pl
                      B12041_scaffold_1 1 8630 0 W B12041_1 1 8630 +
                      B12041_scaffold_1 8631 8665 1 N 35 fragment yes
                      B12041_scaffold_1 8666 30966 1 W B12041_2 1 22301 +
                      B12041_scaffold_1 30967 30984 2 N 18 fragment yes
                      B12041_scaffold_1 30985 89730 2 W B12041_3 1 58746 +
                      B12041_scaffold_1 89731 89731 3 N 1 fragment yes
                      B12041_scaffold_1 89732 105108 3 W B12041_4 1 15377 +

                      Comment


                      • #12
                        Originally posted by stevebaeyen View Post
                        Hi,
                        I tried to use the fasta2agp.pl script but I get errors when submitting to NCBI WGS. The script is from 2010 and there is a new AGPv2,0 version since 2012. The script worked but there is an error counting the fragments (column 4) and it gives errors when submitting to WGS.
                        Can anyone help with this issue?
                        See first lines:
                        # Generated from SOAPdenovo assembly file B12041.fa using script fasta2agp.pl
                        B12041_scaffold_1 1 8630 0 W B12041_1 1 8630 +
                        B12041_scaffold_1 8631 8665 1 N 35 fragment yes
                        B12041_scaffold_1 8666 30966 1 W B12041_2 1 22301 +
                        B12041_scaffold_1 30967 30984 2 N 18 fragment yes
                        B12041_scaffold_1 30985 89730 2 W B12041_3 1 58746 +
                        B12041_scaffold_1 89731 89731 3 N 1 fragment yes
                        B12041_scaffold_1 89732 105108 3 W B12041_4 1 15377 +
                        Found a solution!! We adapted the script because there was an error counting the fragments. Here is the script (ran on Ubuntu 12,04 LTS):
                        #!/usr/bin/perl

                        ### [email protected]

                        ### Generates contigs (in FastA) and scaffolding information (in AGP) from Velvet 'contigs.fa' supercontigs file

                        ### Use entirely at you own risk!! There may be bugs!

                        ### modified by [email protected] 2010/09/13
                        ### add flags -i -size -o
                        ### change naming convention for HMP assembly purpose

                        ### modified by [email protected] 2010/9/15
                        ### added flag -name
                        ### allowing to change names of sequences

                        ### modified by [email protected] 2010/9/27
                        ### removed 10bp limit for inter-contig gaps

                        ### modified by [email protected] 2014/11/27
                        ### fixed counter $i to restart from 1 at each scaffold + unique fragment number generation


                        use strict;
                        use warnings ;
                        use Bio::SeqIO ;
                        use Getopt::Long;
                        use File::Basename;

                        my $file;
                        my $outDir;
                        my $scaf_size_cutoff=0;
                        my $name_prefix="AGP";

                        GetOptions(
                        'i=s' => \$file, # scaf seq in fasta
                        'size=i' => \$scaf_size_cutoff,
                        'o=s' => \$outDir,
                        'name=s' => \$name_prefix);

                        die "Usage: $0 -i <sequence file> -o <out_dir> [-size <scaf_size_cutoff>] [-name <name>]\n" unless $file;
                        my ($file_name, $path, $suffix)=fileparse("$file", qr/\.[^.]*/);
                        my ($sample_name,$center)=split /_/,$file_name;

                        ### Output file for contigs in Fasta format
                        ### The various output file names can be tuned here.
                        my $contig_outfile = "$outDir/$name_prefix.contigs.fa";
                        my $scaffolds_outfile = "$outDir/$name_prefix.scaffolds.fa";
                        my $agp_outfile = "$outDir/$name_prefix.agp";


                        #open (FILE, ">$outdir/$contig_outfile") and
                        # warn "Will write contigs to file '$contig_outfile' to $outDir Directory\n" or
                        # die "Failed to write to file '$fasta_outfile'\n";

                        my $contig_out = Bio::SeqIO->new('-file' => ">$contig_outfile",
                        '-format' => 'Fasta') and
                        warn "Will write contigs to file '$contig_outfile' to $outDir Directory\n" or
                        die "Failed to write to file '$contig_outfile'\n";

                        open (AGP_FILE, ">$agp_outfile") and
                        warn "Will write contigs to file $agp_outfile to $outDir Directory\n" or
                        die "Failed to write to file $agp_outfile\n";

                        #open (SCAFF_FILE, ">$outDir/$scaffolds_outfile") and
                        my $scaff_out = Bio::SeqIO->new('-file' => ">$scaffolds_outfile",
                        '-format' => 'Fasta') and
                        warn "Will write scaffolds to file $scaffolds_outfile to $outDir Directory\n" or
                        die "Failed to write to file $scaffolds_outfile\n";

                        print AGP_FILE "# Generated from SOAPdenovo assembly file $file using script $0\n";

                        warn "Scaffold Sequence cutoff = $scaf_size_cutoff nt\n" if ($scaf_size_cutoff);


                        my $i = 0;# a counter, used for generating unique contig names

                        my $inseq = Bio::SeqIO->new('-file' => "<$file",
                        '-format' => 'Fasta' ) ;


                        while (my $seq_obj = $inseq->next_seq ) {

                        my $supercontig_id = $seq_obj->id ;
                        my $supercontig_seq = $seq_obj->seq ;
                        my $supercontig_desc = $seq_obj->description ;
                        my $supercontig_length = length($supercontig_seq);

                        $i=0; #reset each time you have a new scaffold

                        # only process the long supercontigs
                        next if ($supercontig_length < $scaf_size_cutoff);


                        ### NCBI do not allow coverage and length information in the FastA identifier
                        ### e.g. NODE_1160_length_397673_cov_14.469489 is an illegal FastA ID
                        ### So we will replace these with simple numbers
                        if ($supercontig_id =~ m/NODE_(\d+)_length_\d+_cov_\d+/ or
                        $supercontig_id =~ m/^(\d+)$/ or
                        $supercontig_id =~ m/^scaffold(\d+)$/) {
                        $supercontig_id = "${name_prefix}_scaffold_$1";
                        }


                        # print SCAFF_FILE ">$supercontig_id\n$supercontig_seq\n";
                        my $scf = Bio::PrimarySeq->new(-seq => "$supercontig_seq",
                        -id => "$supercontig_id");

                        $scaff_out->write_seq($scf);

                        my $start_pos = 1; # keep track of whereabouts in this supercontig we are
                        my %substring_sequences;
                        foreach my $substring_sequence ( split /(N+)/i, $supercontig_seq ) {
                        #warn "\n$substring_sequence\n" if $supercontig_id eq '1160'; for #debugging only
                        $i++; # a counter, used for generating unique contig names
                        ### Define the AGP column contents
                        my $object1 = $supercontig_id;
                        my $object_beg2 = $start_pos;
                        my $object_end3 = $start_pos + length($substring_sequence) - 1;
                        my $part_number4 = $i;
                        my $component_type5;
                        my $component_id6a;
                        my $gap_length6b;
                        my $component_beg7a;
                        my $gap_type7b;
                        my $component_end8a;
                        my $linkage8b;
                        my $orientation9a;
                        my $filler9b;
                        if ( $substring_sequence =~ m/^N+$/i ) {
                        ### This is poly-N gap between contigs
                        $component_type5 = 'N';
                        $gap_length6b = length($substring_sequence);
                        $gap_type7b = 'fragment';
                        $linkage8b = 'yes';
                        $filler9b = '';
                        } elsif ( $substring_sequence =~ m/^[ACGT]+$/i ) {
                        ### This is a contig

                        $component_type5 = 'W';
                        $component_id6a = "$supercontig_id"."_"."$i";
                        $component_beg7a = 1;
                        $component_end8a = length($substring_sequence);
                        $orientation9a = '+';
                        ### Print FastA formatted contig
                        # print FILE ">$component_id6a\n$substring_sequence\n";
                        my $ctg = Bio::PrimarySeq->new(-seq => "$substring_sequence",
                        -id => "$component_id6a");

                        $contig_out->write_seq($ctg);


                        } else {
                        die "Illegal characters in sequence\n$substring_sequence\n";
                        }
                        $start_pos += length ($substring_sequence);
                        if ($component_type5 eq 'N') {
                        ### print AGP line for gap
                        print AGP_FILE "$object1\t$object_beg2\t$object_end3\t$part_number4\t$component_type5\t$gap_length6b\t$gap_type7b\t$linkage8b\t$filler9b\n";
                        } else {
                        ### print AGP line for contig
                        print AGP_FILE "$object1\t$object_beg2\t$object_end3\t$part_number4\t$component_type5\t$component_id6a\t$component_beg7a\t$component_end8a\t$orientation9a\n";
                        }
                        }
                        }

                        $contig_out->close();
                        close AGP_FILE;
                        $scaff_out->close();

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM
                        • seqadmin
                          Techniques and Challenges in Conservation Genomics
                          by seqadmin



                          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                          Avian Conservation
                          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                          03-08-2024, 10:41 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, Yesterday, 06:37 PM
                        0 responses
                        10 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, Yesterday, 06:07 PM
                        0 responses
                        9 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-22-2024, 10:03 AM
                        0 responses
                        50 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-21-2024, 07:32 AM
                        0 responses
                        67 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X