  • #31
    Yes, the alignment looks very good in IGV. I think there is an issue with the GFF file, because the genes do not appear annotated in IGV (only the sequence appears, without gene annotation). I do not know how to fix the GFF file.



    • #32
      Can you post an example of the GFF file (first few lines would be fine)?



      • #33
        >complete genome
        NC_022544.1 RefSeq region 1 4814801 . + . ID=id0;Dbxref=taxon:568709;Is_circular=true;gbkey=Src;genome=genomic;mol_type=genomic DNA;serovar=Typhimurium;strain=DT2;sub-species=enterica
        NC_022544.1 RefSeq gene 169 255 . + . ID=gene0;Name=thrL;Dbxref=GeneID:17155329;gbkey=Gene;gene=thrL;locus_tag=STMDT2_00011
        NC_022544.1 RefSeq CDS 169 255 . + 0 ID=cds0;Name=YP_008642919.1;Parent=gene0;Dbxref=Genbank:YP_008642919.1,GeneID:17155329;gbkey=CDS;product=thr operon leader peptide;protein_id=YP_008642919.1;transl_table=11
        NC_022544.1 RefSeq gene 337 2799 . + . ID=gene1;Name=thrA;Dbxref=GeneID:17159252;gbkey=Gene;gene=thrA;locus_tag=STMDT2_00021
        NC_022544.1 RefSeq CDS 337 2799 . + 0 ID=cds1;Name=YP_008642920.1;Parent=gene1;Dbxref=Genbank:YP_008642920.1,GeneID:17159252;gbkey=CDS;product=aspartokinase I%2Fhomoserine dehydrogenase I;protein_id=YP_008642920.1;transl_table=11
        NC_022544.1 RefSeq gene 2801 3730 . + . ID=gene2;Name=thrB;Dbxref=GeneID:17159249;gbkey=Gene;gene=thrB;locus_tag=STMDT2_00031
        NC_022544.1 RefSeq CDS 2801 3730 . + 0 ID=cds2;Name=YP_008642921.1;Parent=gene2;Dbxref=Genbank:YP_008642921.1,GeneID:17159249;gbkey=CDS;product=Homoserine kinase;protein_id=YP_008642921.1;transl_table=11
        NC_022544.1 RefSeq gene 3734 5020 . + . ID=gene3;Name=thrC;Dbxref=GeneID:17159250;gbkey=Gene;gene=thrC;locus_tag=STMDT2_00041
        NC_022544.1 RefSeq CDS 3734 5020 . + 0 ID=cds3;Name=YP_008642922.1;Parent=gene3;Dbxref=Genbank:YP_008642922.1,GeneID:17159250;gbkey=CDS;product=threonine synthase;protein_id=YP_008642922.1;transl_table=11
        NC_022544.1 RefSeq gene 5114 5887 . - . ID=gene4;Name=yaaA;Dbxref=GeneID:17159251;gbkey=Gene;gene=yaaA;locus_tag=STMDT2_00051
        NC_022544.1 RefSeq CDS 5114 5887 . - 0 ID=cds4;Name=YP_008642923.1;Parent=gene4;Dbxref=Genbank:YP_008642923.1,GeneID:17159251;gbkey=CDS;product=hypothetical protein;protein_id=YP_008642923.1;transl_table=11
        NC_022544.1 RefSeq gene 5966 7396 . - . ID=gene5;Name=yaaJ;Dbxref=GeneID:17159391;gbkey=Gene;gene=yaaJ;locus_tag=STMDT2_00061
        NC_022544.1 RefSeq CDS 5966 7396 . - 0 ID=cds5;Name=YP_008642924.1;Parent=gene5;Dbxref=Genbank:YP_008642924.1,GeneID:17159391;gbkey=CDS;product=putative amino-acid transport protein;protein_id=YP_008642924.1;transl_table=11
        NC_022544.1 RefSeq gene 7665 8618 . + . ID=gene6;Name=talB;Dbxref=GeneID:17159395;gbkey=Gene;gene=talB;locus_tag=STMDT2_00071
        NC_022544.1 RefSeq CDS 7665 8618 . + 0 ID=cds6;Name=YP_008642925.1;Parent=gene6;Dbxref=Genbank:YP_008642925.1,GeneID:17159395;gbkey=CDS;product=transaldolase B;protein_id=YP_008642925.1;transl_table=11
        NC_022544.1 RefSeq gene 8729 9319 . + . ID=gene7;Name=mog;Dbxref=GeneID:17159215;gbkey=Gene;gene=mog;locus_tag=STMDT2_00081
        NC_022544.1 RefSeq CDS 8729 9319 . + 0 ID=cds7;Name=YP_008642926.1;Parent=gene7;Dbxref=Genbank:YP_008642926.1,GeneID:17159215;gbkey=CDS;product=molybdopterin biosynthesis Mog protein;protein_id=YP_008642926.1;transl_table=11
        NC_022544.1 RefSeq gene 9376 9942 . - . ID=gene8;Name=yaaH;Dbxref=GeneID:17158379;gbkey=Gene;gene=yaaH;locus_tag=STMDT2_00091
        NC_022544.1 RefSeq CDS 9376 9942 . - 0



        • #34
          Can you remove the first line from the file (please make a backup copy of the original file in case something goes wrong)

          Code:
          >complete genome
          and then try? That would make it a gff format file (http://www.sanger.ac.uk/resources/so.../gff/spec.html).

          To make it a GFF3 format file, you will have to replace that first line with the following two lines (http://www.sequenceontology.org/gff3.shtml):

          Code:
          ##gff-version 3 
          ##sequence-region NC_022544.1 1 4814801
          Last edited by GenoMax; 12-04-2013, 06:08 AM.



          • #35
            After editing the first 2 lines of the GFF file as you kindly suggested, the gene annotations still cannot be seen in IGV (I can see only the sequence, not the genes). Your advice is very much appreciated.



            • #36
              Have you renamed the GFF file as "your_file_name.gff3"?

              IGV expects the GFF3 files to have that extension (http://www.broadinstitute.org/software/igv/GFF) and they also need to be tab-delimited (which your example above does not appear to be).
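One quick way to check the tab-delimited requirement is to count the tab-separated columns on each feature line. This is only a minimal sketch: the function name `check_gff_tabs` is made up for illustration, and it assumes a standard `awk` is available. A feature line that is space-delimited shows up as having 1 column instead of 9.

```shell
# Print the line number and column count of every feature line that
# does not have exactly 9 tab-separated columns.
# Header/comment lines (starting with "#") and empty lines are skipped.
check_gff_tabs() {
    awk -F '\t' '!/^#/ && NF > 0 && NF != 9 {
        printf "line %d: %d column(s)\n", NR, NF
    }' "$1"
}
```

Usage would be something like `check_gff_tabs your_file_name.gff3`; no output means every feature line has 9 tab-separated columns.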



              • #37
                After taking out the first two lines from your example, adding the two lines of GFF3 metadata, and then converting to tab-delimited text, the file appears to work with IGV.

                Take these two lines out:

                Code:
                >complete genome
                NC_022544.1 RefSeq region 1 4814801 . + . ID=id0;Dbxref=taxon:568709;Is_circular=true;gbkey=Src;genome=genomic;mol_type=genomic DNA;serovar=Typhimurium;strain=DT2;sub-species=enterica
                Replace with:

                Code:
                ##gff-version 3 
                ##sequence-region NC_022544.1 1 4814801
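If you would rather script that edit than do it by hand, something along these lines might work. It is a sketch under assumptions: the function name `add_gff3_header` is hypothetical, the sequence ID and length are passed in by you, and the input is assumed not to contain `##` header lines already (otherwise they would end up duplicated). It writes to standard output, so the original file is left untouched.

```shell
# $1 = input GFF file, $2 = sequence ID, $3 = sequence length
add_gff3_header() {
    # Emit the two GFF3 metadata lines first
    printf '##gff-version 3\n##sequence-region %s 1 %s\n' "$2" "$3"
    # Then copy the file, skipping the FASTA-style ">" line and the
    # whole-genome "region" feature (third column equals "region")
    awk '!/^>/ && $3 != "region"' "$1"
}
```

For the example in this thread the call would look like `add_gff3_header original.gff NC_022544.1 4814801 > with_header.gff3` (file names are placeholders).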


                • #38
                  How do you convert the text (GFF3) file to tab-delimited, please?
                  I used:

                  awk '{ for(i=1;i<=NF;i++){if(i==NF){printf("%s\n",$NF);}else {printf("%s\t",$i)}}}' file.gff3

                  but it did not convert the file to tab-delimited:

                  ##gff-version 3
                  ##sequence-region NC_022544.1 1 4814801

                  NC_022544.1 RefSeq gene 169 255 . + . ID=gene0;Name=thrL;Dbxref=GeneID:17155329;gbkey=Gene;gene=thrL;locus_tag=STMDT2_00011
                  NC_022544.1 RefSeq CDS 169 255 . + 0 ID=cds0;Name=YP_008642919.1;Parent=gene0;Dbxref=Genbank:YP_008642919.1,GeneID:17155329;gbkey=CDS;gene=thrL;product=thr operon leader peptide;protein_id=YP_008642919.1;transl_table=11
                  NC_022544.1 RefSeq gene 337 2799 . + . ID=gene1;Name=thrA;Dbxref=GeneID:17159252;gbkey=Gene;gene=thrA;locus_tag=STMDT2_00021
                  NC_022544.1 RefSeq CDS 337 2799 . + 0 ID=cds1;Name=YP_008642920.1;Parent=gene1;Dbxref=Genbank:YP_008642920.1,GeneID:17159252;gbkey=CDS;gene=thrA;product=aspartokinase I%2Fhomoserine dehydrogenase I;protein_id=YP_008642920.1;transl_table=11
                  NC_022544.1 RefSeq gene 2801 3730 . + . ID=gene2;Name=thrB;Dbxref=GeneID:17159249;gbkey=Gene;gene=thrB;locus_tag=STMDT2_00031
                  NC_022544.1 RefSeq CDS 2801 3730 . + 0 ID=cds2;Name=YP_008642921.1;Parent=gene2;Dbxref=Genbank:YP_008642921.1,GeneID:17159249;gbkey=CDS;gene=thrB;product=Homoserine kinase;protein_id=YP_008642921.1;transl_table=11
                  NC_022544.1 RefSeq gene 3734 5020 . + . ID=gene3;Name=thrC;Dbxref=GeneID:17159250;gbkey=Gene;gene=thrC;locus_tag=STMDT2_00041
                  NC_022544.1 RefSeq CDS 3734 5020 . + 0 ID=cds3;Name=YP_008642922.1;Parent=gene3;



                  • #39
                    Try this unix command. Adjust the file names accordingly.
                    Code:
                    $ tr ' ' \\t < original.gff3 > tab_converted.gff3
                    There is a "space" between the two single quotes in the command above.

                    Best to put the two top metadata lines in after the conversion.
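One thing to keep in mind: `tr` turns every space into a tab, including the space in `##gff-version 3` (hence adding the metadata afterwards) and any spaces inside the ninth (attributes) column, such as `product=thr operon leader peptide`. Some parsers may read those as extra columns. A possible alternative, offered only as a sketch, converts just the first eight separators and leaves header lines alone. The function name `spaces_to_tabs_gff` is made up, and it assumes none of the first eight columns contains a space.

```shell
# Convert the first 8 whitespace separators of each feature line to tabs,
# leaving spaces inside the 9th (attributes) column untouched.
spaces_to_tabs_gff() {
    awk '
        /^#/ || NF == 0 { print; next }    # keep header and blank lines unchanged
        {
            line = $0
            out = ""
            for (i = 1; i <= 8; i++) {
                sub(/^[ \t]+/, "", line)   # drop leading whitespace
                n = match(line, /[ \t]/)   # position where this column ends
                if (n == 0) break          # fewer than 9 columns: stop early
                out = out substr(line, 1, n - 1) "\t"
                line = substr(line, n + 1)
            }
            sub(/^[ \t]+/, "", line)
            print out line
        }
    ' "$1"
}
```

It would be used the same way as the `tr` command, e.g. `spaces_to_tabs_gff original.gff3 > tab_converted.gff3`.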



                    • #40
                      Thank you so much for your advice and time. I managed to view the genes in IGV and now have a coverage.txt file showing which genes are covered or missed relative to the reference.

                      But I have a very simple technical issue: when I open coverage.txt in Excel or LibreOffice, I cannot sort the coverage values in ascending order, because the values are written on a separate line:

                      NC_022544.1 RefSeq gene 2096621 2097676 . - . ID=gene2037;Name=cbiG;Dbxref=GeneID:17157414;gbkey=Gene;gene=cbiG;locus_tag=STMDT2_20011
                      2505 1056 1056 1
                      NC_022544.1 RefSeq CDS 2096621 2097676 . - 0 "ID=cds1963;Name=YP_008644883.1;Parent=gene2037;Dbxref=Genbank:YP_008644883.1,GeneID:17157414;gbkey=CDS;gene=cbiG;product=cobalamin biosynthesis protein;protein_id=YP_008644883.1;transl_table=11"
                      2505 1056 1056 0.9
                      NC_022544.1 RefSeq gene 3145124 3146200 . - . ID=gene2977;Name=STMDT2_29251;Dbxref=GeneID:17156485;gbkey=Gene;locus_tag=STMDT2_29251
                      1716 1077 1077 0.6



                      • #41
                        Something like the following will put each record on a single line.
                        Code:
                        awk 'BEGIN{first=1; OFS=""; ORS=""}{if(first==1) {print $0; first=0} else {print "\t",$0,"\n"; first=1}}' coverage.txt > coverage.single_line.txt
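Once each record is on a single line, the sort can also be done on the command line instead of in a spreadsheet. The following is a rough sketch, assuming the value you want to sort on is the last whitespace-separated field of each line (the function name `sort_by_last_column` is hypothetical). It temporarily prefixes each line with that value, sorts numerically, and then removes the prefix again.

```shell
# Sort lines in ascending numeric order of their last field.
sort_by_last_column() {
    # Decorate: put the last field in front, separated by a tab
    awk 'NF > 0 { print $NF "\t" $0 }' "$1" |
        # Sort numerically on the decorated first column only
        LC_ALL=C sort -t "$(printf '\t')" -k1,1n |
        # Undecorate: drop the helper column again
        cut -f2-
}
```

For example: `sort_by_last_column coverage.single_line.txt > coverage.sorted.txt`.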



                        • #42
                          I am afraid that did not work.



                          • #43
                            Devon's solution works for me. What is happening in your case?



                            • #44
                              NC_022544.1 RefSeq gene 2096621 2097676 0 - 0 ID=gene2037;Name=cbiG;Dbxref=GeneID:17157414;gbkey=Gene;gene=cbiG;locus_tag=STMDT2_20011
                              2505 1056 1056 1 NC_022544.1 RefSeq CDS 2096621
                              2505 1056 1056 1
                              NC_022544.1 RefSeq gene 3145124 3146200 0 - 0 ID=gene2977;Name=STMDT2_29251;Dbxref=GeneID:17156485;gbkey=Gene;locus_tag=STMDT2_29251
                              1716 1077 1077 1 NC_022544.1 RefSeq CDS 3145124
                              1716 1077 1077 1



                              • #45
                                I wonder if this is a Unix vs. PC/Mac file format (line-ending) issue. Are you moving the file between machines before running the script? I pasted your original sample into a new file on Unix and did not have any problem.
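If line endings do turn out to be the cause, stripping the carriage returns before joining the lines may help. This is a sketch only: it assumes the file has Windows-style (CRLF) endings and that every record is exactly two lines, and the function name `join_pairs` is made up. Running `file coverage.txt` usually reports "with CRLF line terminators" for such files.

```shell
# Remove carriage returns, then append each even-numbered line to the
# odd-numbered line before it, separated by a tab.
join_pairs() {
    tr -d '\r' < "$1" |
        awk 'NR % 2 == 1 { printf "%s\t", $0; next }   # first line of a record
             { print }                                 # second line ends the record
             END { if (NR % 2 == 1) print "" }         # close an unpaired last line'
}
```

Example: `join_pairs coverage.txt > coverage.single_line.txt`.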
