  • moldach
    Member
    • Jan 2016
    • 10

    Assigning reads to genes in the absence of genomic annotation

    I'm looking for a method to calculate read counts from SAM/BAM files that does not require an annotation file (GTF or BED files).

    I've done a de novo transcriptome assembly of an organism which does not have an annotated genome. While there is an available genome for a closely related species, it has not been annotated.

    Which program(s) can be used to output count data to a text file suitable for differential expression analysis with DESeq?

    If I blastx my transcriptome against UniRef50 (or uniprot_sprot; I'm not sure which database is ideal, but that is another post in itself), how can this output be integrated with the count data set (either in DESeq or prior to that)?
  • SylvainL
    Senior Member
    • Feb 2012
    • 180

    #2
    Hi,

    did you map your reads to the closely related genome or directly to your transcriptome assembly? If you mapped to the transcriptome, simply use bedtools to get the number of reads on each transcript...
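
One way to read this suggestion, as a rough sketch only: when reads are mapped to the transcriptome itself, every transcript is its own reference sequence, so a BED file with one full-length interval per transcript can be derived from the assembly FASTA. The function name `fasta_to_bed` and the example sequences are invented for illustration and are not part of bedtools.

```python
import io

def fasta_to_bed(fasta_lines):
    """Yield one BED line (name, 0, length) per FASTA record."""
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield f"{name}\t0\t{length}"
            # Assumption: the aligner used the header text up to the
            # first whitespace as the reference name.
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)
    if name is not None:
        yield f"{name}\t0\t{length}"

# Invented toy assembly with two transcripts.
example = io.StringIO(">tx1 len=8\nACGTACGT\n>tx2\nACG\nTT\n")
bed_lines = list(fasta_to_bed(example))
print("\n".join(bed_lines))
```

Written to a file (say, a hypothetical transcripts.bed), the result could then be passed to bedtools multicov through its -bed option, together with sorted and indexed BAM files given to -bams.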


    • colindaven
      Senior Member
      • Oct 2008
      • 417

      #3
      Suggestions

      #1 Remap reads to closely related genome. Satisfied with mapping rate?

      #2 Use gmap (easy) or Maker to map your de novo assembled transcripts to the related genome. Again, satisfied? View both sets in a genome browser.

      #3 If unsatisfied with #2, perhaps use Trinity genome guided or cufflinks to recreate transcripts.

      #4 Quantify (i.e. using featureCounts) transcripts from #2 or #3.

      Forget blast for this kind of approach.


      • moldach
        Member
        • Jan 2016
        • 10

        #4
        Originally posted by SylvainL View Post
        Hi,

        did you map your reads to the closely related genome or directly to your transcriptome assembly? If you mapped to the transcriptome, simply use bedtools to get the number of reads on each transcript...
        I mapped directly to my transcriptome assembly.

        I couldn't find any reference to getting the number of reads on each transcript (maybe it's just worded differently?) from the documentation of bedtools. However, I found a Biostars link that suggested using the multicov sub-command in the bedtools suite.

        According to the documentation, though, bedtools multicov requires an annotation file. For example:

        >bedtools multicov -bams run.bam -bed genes.bed

        Are you talking about another sub-command or can multicov be run without the bed file?


        • moldach
          Member
          • Jan 2016
          • 10

          #5
          Originally posted by colindaven View Post
          Suggestions

          #1 Remap reads to closely related genome. Satisfied with mapping rate?
          I tried mapping to the closely related unannotated genome some time ago. Unfortunately, I used Bowtie2. I now know better: mapping RNA-seq reads to a genome needs a splice-junction-aware aligner.

          Originally posted by colindaven View Post
          Suggestions
          #2 Use gmap (easy) or Maker to map your de novo assembled transcripts to the related genome. Again, satisfied? View both sets in a genome browser.
          So GMAP maps and aligns with this command:

          >gmap -d <genome> -A <cdna_file>

          And it would output SAM files.

          What I don't understand is how (or if) GMAP annotates this genome?
          The documentation for Maker, on the other hand, clearly states that it annotates, but I can't find anything about annotation in the GMAP documentation.

          Will gmap and Maker output an annotation file that includes the chromosomal coordinates of features (GTF)? The featureCounts documentation says this file is required.


          • moldach
            Member
            • Jan 2016
            • 10

            #6
            Can anyone help?


            • SylvainL
              Senior Member
              • Feb 2012
              • 180

              #7
              Hi,

              since you aligned directly on your transcriptome, I guess your reference contains all the transcripts, so you can get the counts for each by using samtools idxstats
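
As a minimal sketch of how the idxstats route could lead to a DESeq-style count table: samtools idxstats prints one tab-separated line per reference (name, sequence length, mapped read-segments, unmapped read-segments), with a final `*` line for unplaced reads. The helper names and the numbers below are invented for illustration; each input string stands in for the saved output of one sample.

```python
def parse_idxstats(text):
    """Map reference name -> mapped read-segment count for one sample."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # summary line for reads without a reference
            continue
        counts[name] = int(mapped)
    return counts

def merge_counts(per_sample):
    """Build tab-separated rows: one header row, then one row per transcript."""
    samples = list(per_sample)
    transcripts = sorted({t for c in per_sample.values() for t in c})
    rows = ["transcript\t" + "\t".join(samples)]
    for t in transcripts:
        values = (str(per_sample[s].get(t, 0)) for s in samples)
        rows.append(t + "\t" + "\t".join(values))
    return rows

# Invented idxstats output for two hypothetical samples.
ctrl = "tx1\t800\t120\t0\ntx2\t500\t7\t0\n*\t0\t0\t42\n"
treat = "tx1\t800\t95\t0\ntx2\t500\t30\t0\n*\t0\t0\t17\n"
table = merge_counts({"ctrl": parse_idxstats(ctrl),
                      "treat": parse_idxstats(treat)})
print("\n".join(table))
```

Note that idxstats needs a sorted, indexed BAM, and that it counts read-segments, so with paired-end data both mates of a pair contribute to the total.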


              • colindaven
                Senior Member
                • Oct 2008
                • 417

                #8
                A GMAP command which produces GFF3 output might look like this:

                ~/gmap-2015-07-23/bin/gmap -f gff3_gene -D gmap/ -d mygenome.fasta.gmap -B 5 -t 12 --intronlength=50000 --totallength=1000000 -p 3 --npaths=1 transcripts.fa > transcripts.gff3

                This is a nice GFF3 which can be used directly by "bedtools multicov"

                If you want to use featureCounts for read counting try using ngsutils to convert from gff3 to gtf.


                • shi
                  Wei Shi
                  • Feb 2010
                  • 236

                  #9
                  featureCounts works with both GTF and GFF formats. I think it should be fine if you directly provide your GFF3 annotation to featureCounts program.
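
For the last step, a hedged sketch of turning featureCounts output into a plain count matrix: the output table starts with a `#` comment line, followed by a header row with Geneid, Chr, Start, End, Strand and Length, then one count column per BAM file. The function name and the example table are invented for illustration.

```python
def featurecounts_to_matrix(text):
    """Keep the feature ID and the per-sample count columns only."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the leading comment line
        fields = line.split("\t")
        # Columns 2-6 are Chr, Start, End, Strand, Length; counts follow.
        rows.append([fields[0]] + fields[6:])
    return rows

# Invented featureCounts-style output for two hypothetical BAM files.
example = (
    "# Program:featureCounts; Command:(omitted)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl.bam\ttreat.bam\n"
    "tx1\tchr1\t100\t900\t+\t801\t120\t95\n"
    "tx2\tchr2\t50\t549\t-\t500\t7\t30\n"
)
matrix = featurecounts_to_matrix(example)
for row in matrix:
    print("\t".join(row))
```

By default featureCounts counts exon features and groups them by the gene_id attribute, so with a GFF3 file it may be necessary to set its -t and -g options to a feature type and attribute name that actually occur in the file.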
