
  • #31
    Yes, the alignment looks very good in IGV. I think there is an issue with the GFF file, because the genes do not appear annotated in IGV (only the sequence appears, without gene annotation). How can I fix the GFF file?



    • #32
      Can you post an example of the GFF file (first few lines would be fine)?



      • #33
        >complete genome
        NC_022544.1 RefSeq region 1 4814801 . + . ID=id0;Dbxref=taxon:568709;Is_circular=true;gbkey=Src;genome=genomic;mol_type=genomic DNA;serovar=Typhimurium;strain=DT2;sub-species=enterica
        NC_022544.1 RefSeq gene 169 255 . + . ID=gene0;Name=thrL;Dbxref=GeneID:17155329;gbkey=Gene;gene=thrL;locus_tag=STMDT2_00011
        NC_022544.1 RefSeq CDS 169 255 . + 0 ID=cds0;Name=YP_008642919.1;Parent=gene0;Dbxref=Genbank:YP_008642919.1,GeneID:17155329;gbkey=CDS;product=thr operon leader peptide;protein_id=YP_008642919.1;transl_table=11
        NC_022544.1 RefSeq gene 337 2799 . + . ID=gene1;Name=thrA;Dbxref=GeneID:17159252;gbkey=Gene;gene=thrA;locus_tag=STMDT2_00021
        NC_022544.1 RefSeq CDS 337 2799 . + 0 ID=cds1;Name=YP_008642920.1;Parent=gene1;Dbxref=Genbank:YP_008642920.1,GeneID:17159252;gbkey=CDS;product=aspartokinase I%2Fhomoserine dehydrogenase I;protein_id=YP_008642920.1;transl_table=11
        NC_022544.1 RefSeq gene 2801 3730 . + . ID=gene2;Name=thrB;Dbxref=GeneID:17159249;gbkey=Gene;gene=thrB;locus_tag=STMDT2_00031
        NC_022544.1 RefSeq CDS 2801 3730 . + 0 ID=cds2;Name=YP_008642921.1;Parent=gene2;Dbxref=Genbank:YP_008642921.1,GeneID:17159249;gbkey=CDS;product=Homoserine kinase;protein_id=YP_008642921.1;transl_table=11
        NC_022544.1 RefSeq gene 3734 5020 . + . ID=gene3;Name=thrC;Dbxref=GeneID:17159250;gbkey=Gene;gene=thrC;locus_tag=STMDT2_00041
        NC_022544.1 RefSeq CDS 3734 5020 . + 0 ID=cds3;Name=YP_008642922.1;Parent=gene3;Dbxref=Genbank:YP_008642922.1,GeneID:17159250;gbkey=CDS;product=threonine synthase;protein_id=YP_008642922.1;transl_table=11
        NC_022544.1 RefSeq gene 5114 5887 . - . ID=gene4;Name=yaaA;Dbxref=GeneID:17159251;gbkey=Gene;gene=yaaA;locus_tag=STMDT2_00051
        NC_022544.1 RefSeq CDS 5114 5887 . - 0 ID=cds4;Name=YP_008642923.1;Parent=gene4;Dbxref=Genbank:YP_008642923.1,GeneID:17159251;gbkey=CDS;product=hypothetical protein;protein_id=YP_008642923.1;transl_table=11
        NC_022544.1 RefSeq gene 5966 7396 . - . ID=gene5;Name=yaaJ;Dbxref=GeneID:17159391;gbkey=Gene;gene=yaaJ;locus_tag=STMDT2_00061
        NC_022544.1 RefSeq CDS 5966 7396 . - 0 ID=cds5;Name=YP_008642924.1;Parent=gene5;Dbxref=Genbank:YP_008642924.1,GeneID:17159391;gbkey=CDS;product=putative amino-acid transport protein;protein_id=YP_008642924.1;transl_table=11
        NC_022544.1 RefSeq gene 7665 8618 . + . ID=gene6;Name=talB;Dbxref=GeneID:17159395;gbkey=Gene;gene=talB;locus_tag=STMDT2_00071
        NC_022544.1 RefSeq CDS 7665 8618 . + 0 ID=cds6;Name=YP_008642925.1;Parent=gene6;Dbxref=Genbank:YP_008642925.1,GeneID:17159395;gbkey=CDS;product=transaldolase B;protein_id=YP_008642925.1;transl_table=11
        NC_022544.1 RefSeq gene 8729 9319 . + . ID=gene7;Name=mog;Dbxref=GeneID:17159215;gbkey=Gene;gene=mog;locus_tag=STMDT2_00081
        NC_022544.1 RefSeq CDS 8729 9319 . + 0 ID=cds7;Name=YP_008642926.1;Parent=gene7;Dbxref=Genbank:YP_008642926.1,GeneID:17159215;gbkey=CDS;product=molybdopterin biosynthesis Mog protein;protein_id=YP_008642926.1;transl_table=11
        NC_022544.1 RefSeq gene 9376 9942 . - . ID=gene8;Name=yaaH;Dbxref=GeneID:17158379;gbkey=Gene;gene=yaaH;locus_tag=STMDT2_00091
        NC_022544.1 RefSeq CDS 9376 9942 . - 0



        • #34
          Can you remove the first line from the file (please make a backup copy of the original file in case something goes wrong)

          Code:
          >complete genome
          and then try? That would make it a gff format file (http://www.sanger.ac.uk/resources/so.../gff/spec.html).

          To make it a GFF3 format file, you will have to replace that first line with the following two lines (http://www.sequenceontology.org/gff3.shtml):

          Code:
          ##gff-version 3
          ##sequence-region NC_022544.1 1 4814801
          Last edited by GenoMax; 12-04-2013, 06:08 AM.
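For anyone who wants to script the header fix, here is a minimal sketch. `fix_gff_header` is a hypothetical helper name, not part of any tool. It assumes the only stray line is the FASTA-style `>` line and that the file has no GFF3 directives yet; the sequence ID and length are passed as arguments (the values in the usage line come from the `region` line posted in #33).

```shell
# Hypothetical helper: write the two GFF3 directives, then the input
# with any FASTA-style ">" header lines dropped.
# $1 = sequence ID, $2 = sequence length; GFF text is read from stdin.
fix_gff_header() {
    printf '##gff-version 3\n'
    printf '##sequence-region %s 1 %s\n' "$1" "$2"
    # Pass through every line that does not start with ">"
    awk '!/^>/'
}
```

It could be used as `fix_gff_header NC_022544.1 4814801 < original.gff > fixed.gff3` (both file names are placeholders). Writing to a new file keeps the original as the backup.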



          • #35
            After editing the first two lines of the GFF file as you kindly suggested, the gene annotations still cannot be seen in IGV (I can see only the sequence, not the genes). Your advice is much appreciated.



            • #36
              Have you renamed the GFF file as "your_file_name.gff3"?

              IGV expects the GFF3 files to have that extension (http://www.broadinstitute.org/software/igv/GFF) and they also need to be tab-delimited (which your example above does not appear to be).
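A quick way to check the tab-delimited requirement is to count feature lines that do not split into exactly nine tab-separated columns. This is only a sketch; `count_bad_gff_lines` is a made-up name, and it treats every non-blank line that does not start with `#` as a feature line.

```shell
# Hypothetical helper: print how many feature lines do NOT have exactly
# 9 tab-separated columns. Reads GFF text from stdin.
count_bad_gff_lines() {
    awk -F '\t' '!/^#/ && NF > 0 && NF != 9 { bad++ } END { print bad + 0 }'
}
```

Run it as `count_bad_gff_lines < your_file_name.gff3`. A result of 0 suggests the columns are tab-delimited; a space-delimited file will report every feature line as bad.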



              • #37
                After taking out the first two lines from your example, adding the two lines for GFF3 meta-data and then converting to tab-delimited text the file appears to work with IGV.

                Take these two lines out:

                Code:
                >complete genome
                NC_022544.1 RefSeq region 1 4814801 . + . ID=id0;Dbxref=taxon:568709;Is_circular=true;gbkey=Src;genome=genomic;mol_type=genomic DNA;serovar=Typhimurium;strain=DT2;sub-species=enterica
                Replace with:

                Code:
                ##gff-version 3
                ##sequence-region NC_022544.1 1 4814801


                • #38
                  How do you convert a text (GFF3) file to tab-delimited, please?
                  I used:

                  awk '{ for(i=1;i<=NF;i++){if(i==NF){printf("%s\n",$NF);}else {printf("%s\t",$i)}}}' file.gff3

                  but it did not convert the file to tab-delimited:

                  ##gff-version 3
                  ##sequence-region NC_022544.1 1 4814801

                  NC_022544.1 RefSeq gene 169 255 . + . ID=gene0;Name=thrL;Dbxref=GeneID:17155329;gbkey=Gene;gene=thrL;locus_tag=STMDT2_00011
                  NC_022544.1 RefSeq CDS 169 255 . + 0 ID=cds0;Name=YP_008642919.1;Parent=gene0;Dbxref=Genbank:YP_008642919.1,GeneID:17155329;gbkey=CDS;gene=thrL;product=thr operon leader peptide;protein_id=YP_008642919.1;transl_table=11
                  NC_022544.1 RefSeq gene 337 2799 . + . ID=gene1;Name=thrA;Dbxref=GeneID:17159252;gbkey=Gene;gene=thrA;locus_tag=STMDT2_00021
                  NC_022544.1 RefSeq CDS 337 2799 . + 0 ID=cds1;Name=YP_008642920.1;Parent=gene1;Dbxref=Genbank:YP_008642920.1,GeneID:17159252;gbkey=CDS;gene=thrA;product=aspartokinase I%2Fhomoserine dehydrogenase I;protein_id=YP_008642920.1;transl_table=11
                  NC_022544.1 RefSeq gene 2801 3730 . + . ID=gene2;Name=thrB;Dbxref=GeneID:17159249;gbkey=Gene;gene=thrB;locus_tag=STMDT2_00031
                  NC_022544.1 RefSeq CDS 2801 3730 . + 0 ID=cds2;Name=YP_008642921.1;Parent=gene2;Dbxref=Genbank:YP_008642921.1,GeneID:17159249;gbkey=CDS;gene=thrB;product=Homoserine kinase;protein_id=YP_008642921.1;transl_table=11
                  NC_022544.1 RefSeq gene 3734 5020 . + . ID=gene3;Name=thrC;Dbxref=GeneID:17159250;gbkey=Gene;gene=thrC;locus_tag=STMDT2_00041
                  NC_022544.1 RefSeq CDS 3734 5020 . + 0 ID=cds3;Name=YP_008642922.1;Parent=gene3;



                  • #39
                    Try this unix command. Adjust the file names accordingly.
                    Code:
                    $ tr ' ' \\t < original.gff3 > tab_converted.gff3
                    There is a "space" between the two single quotes in the command above.

                    Best to put the two top metadata lines in after the conversion.
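One caveat with converting every space: the attributes column in the example contains spaces (for instance `product=thr operon leader peptide`), so those would also become tabs and split column 9. A hedged alternative is sketched below. `spaces_to_gff_tabs` is a hypothetical name, and the sketch assumes the first eight columns never contain spaces themselves.

```shell
# Hypothetical helper: turn only the first 8 whitespace separators into
# tabs, so spaces inside column 9 (attributes) are left alone.
# Reads space-delimited GFF from stdin, writes tab-delimited GFF to stdout.
spaces_to_gff_tabs() {
    awk '
        # Directives, comments and short or blank lines pass through unchanged
        /^#/ || NF < 9 { print; next }
        {
            rest = $0
            out = ""
            for (i = 1; i <= 8; i++) {
                sub(/^[ \t]+/, "", rest)           # drop leading whitespace
                match(rest, /^[^ \t]+/)            # next whitespace-free field
                out = out substr(rest, 1, RLENGTH) "\t"
                rest = substr(rest, RLENGTH + 1)
            }
            sub(/^[ \t]+/, "", rest)               # what remains is column 9
            print out rest
        }'
}
```

Usage would look like `spaces_to_gff_tabs < original.gff3 > tab_converted.gff3`. Because lines starting with `#` pass through untouched, the metadata lines can already be in place before the conversion.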



                    • #40
                      Thank you so much for your advice and time. I managed to view the genes in IGV and now have a coverage.txt file showing which genes are covered or missing relative to the reference.

                      But I have a very simple technical issue: when I open coverage.txt in Excel or LibreOffice, I cannot sort the coverage values in ascending order, because the values are written on a separate line:

                      NC_022544.1 RefSeq gene 2096621 2097676 . - . ID=gene2037;Name=cbiG;Dbxref=GeneID:17157414;gbkey=Gene;gene=cbiG;locus_tag=STMDT2_20011
                      2505 1056 1056 1
                      NC_022544.1 RefSeq CDS 2096621 2097676 . - 0 "ID=cds1963;Name=YP_008644883.1;Parent=gene2037;Dbxref=Genbank:YP_008644883.1,GeneID:17157414;gbkey=CDS;gene=cbiG;product=cobalamin biosynthesis protein;protein_id=YP_008644883.1;transl_table=11"
                      2505 1056 1056 0.9
                      NC_022544.1 RefSeq gene 3145124 3146200 . - . ID=gene2977;Name=STMDT2_29251;Dbxref=GeneID:17156485;gbkey=Gene;locus_tag=STMDT2_29251
                      1716 1077 1077 0.6



                      • #41
                        Something like the following will put each record on a single line.
                        Code:
                        awk 'BEGIN{first=1; ORS=""}{if(first==1) {print $0; first=0} else {print "\t" $0 "\n"; first=1}}' coverage.txt > coverage.single_line.txt



                        • #42
                          I am afraid that did not work.



                          • #43
                            Devon's solution works for me. What is happening in your case?



                            • #44
                              NC_022544.1 RefSeq gene 2096621 2097676 0 - 0 ID=gene2037;Name=cbiG;Dbxref=GeneID:17157414;gbkey=Gene;gene=cbiG;locus_tag=STMDT2_20011
                              2505 1056 1056 1 NC_022544.1 RefSeq CDS 2096621
                              2505 1056 1056 1
                              NC_022544.1 RefSeq gene 3145124 3146200 0 - 0 ID=gene2977;Name=STMDT2_29251;Dbxref=GeneID:17156485;gbkey=Gene;locus_tag=STMDT2_29251
                              1716 1077 1077 1 NC_022544.1 RefSeq CDS 3145124
                              1716 1077 1077 1



                              • #45
                                Wonder if this is a unix vs PC/Mac file format issue. Are you moving the file among machines before running the script? I pasted your original sample into a new file on unix and did not have any problem.
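If Windows-style line endings are the cause, stripping the carriage returns before joining the lines may help. The following is a sketch under two assumptions: the file uses CRLF endings (not old Mac CR-only endings), and records strictly alternate between one annotation line and one values line. `join_coverage_pairs` is a hypothetical name.

```shell
# Hypothetical helper: remove carriage returns, then join each pair of
# lines (annotation line + values line) into one tab-separated line.
# Reads from stdin, writes to stdout.
join_coverage_pairs() {
    tr -d '\r' | awk 'NR % 2 { printf "%s\t", $0; next } { print }'
}
```

For example: `join_coverage_pairs < coverage.txt > coverage.single_line.txt`. If the output still looks scrambled, the input probably does not alternate cleanly between the two kinds of lines.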
