  • Creating an mRNA GTF file from FASTA for HTSeq?

    Hi,

    I have a SAM file (bwa) of paired-end RNA-Seq short reads that were aligned to a CDS FASTA file. I need to run htseq-count on this SAM file, so I need the corresponding GTF file.

    I thought it would not be too hard to generate a basic GTF file from the original RNA FASTA file, but HTSeq does not recognize the ID of any RNA sequence in the SAM file:

    Warning: Skipping read 'XXX0654:58235#TGACCA', because chromosome 'gi|155030243|ref|NM_017599.3|', to which it has been aligned, did not appear in the GFF file.

    However, this sequence ID is present in the GTF file:
    gi|155030243|ref|NM_017599.3| ref CDS 1 4580 . + . gene_id "VEZT"; transcript_id "NM_017599.3";
    and in the SAM header too:
    @SQ SN:gi|155030243|ref|NM_017599.3| LN:4580

    Where is the mismatch?

    Many thanks for your help.

    Emmanuel.

  • #2
    Sorry, I am answering my own question: wrong reference version. My GTF is OK, but the rna.fasta is from another release, so the NM_XXX accession versions (the number after the dot) were different, hence the mismatch.

    Emmanuel.
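A quick way to catch this kind of release mismatch is to compare the reference names in the SAM header with the first column of the GTF. The sketch below is only an illustration: the function names are made up for this example, it assumes tab-delimited SAM and GTF lines, and it assumes NCBI-style names in which the accession is the last non-empty `|`-separated field.

```python
def sam_header_names(sam_lines):
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def accession_base(name):
    """Return the accession without its version, e.g.
    'gi|155030243|ref|NM_017599.3|' -> 'NM_017599'.
    Assumption: the accession is the last non-empty '|'-separated field."""
    fields = [f for f in name.split("|") if f]
    acc = fields[-1] if fields else name
    base, dot, version = acc.rpartition(".")
    return base if dot and version.isdigit() else acc


def version_only_mismatches(sam_names, gtf_names):
    """Pair SAM names missing from the GTF with GTF names that share
    the same accession once the version suffix is ignored."""
    gtf_by_base = {accession_base(n): n for n in gtf_names}
    pairs = []
    for name in sorted(sam_names - gtf_names):
        other = gtf_by_base.get(accession_base(name))
        if other is not None:
            pairs.append((name, other))
    return pairs
```

Any pair returned by `version_only_mismatches` points at two releases of the same transcript, which is the situation described above.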

    • #3
      Hi,
      Can you explain how you created the GTF file from the FASTA sequence file?

      Thank you
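The original poster did not share a script. As a minimal sketch of one way to do it, each FASTA record can be turned into a single GTF feature that spans the whole sequence, like the line quoted in the first post. Everything here is an assumption made for illustration: the function names, the `fasta` source label, and the choice of `exon` as the feature type (the default feature type htseq-count looks for; the line in the first post uses `CDS` instead, which would need htseq-count's `-t` option).

```python
def fasta_lengths(fasta_lines):
    """Yield (sequence_name, length) for each FASTA record.

    The name is the header text up to the first whitespace, which is
    what bwa and bowtie2 write into the SAM @SQ lines.
    """
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def gtf_line(name, length, feature="exon", source="fasta"):
    """Build one tab-delimited GTF line covering positions 1..length.

    The sequence name doubles as gene_id and transcript_id, because a
    transcript-only FASTA carries no gene-level grouping.
    """
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    return "\t".join(
        [name, source, feature, "1", str(length), ".", "+", ".", attributes]
    )


def fasta_to_gtf(fasta_lines):
    """Return a list of GTF lines, one per FASTA record."""
    return [gtf_line(name, length) for name, length in fasta_lengths(fasta_lines)]
```

The names in the GTF must match the SAM reference names exactly, so the FASTA used here has to be the same file the reads were aligned to.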

      • #4
        I want to repeat the same question. I am working with transcriptome data from a non-model organism, and downstream analysis often requires a GTF file for the FASTA file I work with. Since no genome is available for this insect, I cannot get transcript coordinate data. I am wondering whether I can create a GTF file that would let me count reads with htseq-count and run a DEG analysis with edgeR.
        If it can be done, how?

        I do not know whether my question is ridiculous, but I would like your suggestions.
        Last edited by kurban910; 06-13-2015, 05:09 AM.

        • #5
          When you align to the transcriptome like this you don't use htseq-count. Instead, filter out secondary alignments and anything else you want (e.g., remove alignments with very low mapping quality), index the results, and use "samtools idxstats" to get the counts. Alternatively, you can use RSEM or eXpress (or one of the many equivalents) to get estimated counts.
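As a rough sketch of the last step, the text that `samtools idxstats` prints can be parsed into per-transcript counts. The function name is hypothetical, and the input is assumed to be the idxstats output of a BAM file that was already filtered, sorted, and indexed as described above. Note that idxstats counts read segments, so each mate of a mapped pair is counted separately.

```python
def parse_idxstats(lines):
    """Return {reference_name: mapped_count} from `samtools idxstats` output.

    Each line has four tab-separated columns: reference name, sequence
    length, number of mapped read segments, number of unmapped read
    segments. The final '*' line holds reads with no reference and is
    skipped here.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] == "*":
            continue
        counts[fields[0]] = int(fields[2])
    return counts
```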

          • #6
            Hi @dpryan,
            Thanks for the reply. What do you mean by "index the results"? Do you mean indexing the SAM file I got after aligning the reads to trinity.fasta with bowtie2?

            Does that mean I cannot analyze my data with edgeR, DESeq, or baySeq?

            • #7
              You'll have to convert it to BAM and sort it, but yes.

              You'll get unique integer counts with the idxstats method, so edgeR/DESeq2/etc. will work fine. For the RSEM/eXpress/etc. route, edgeR will work but DESeq2 will not (no clue about baySeq).
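To feed the integer counts to edgeR or DESeq2, the per-sample counts need to end up in one transcripts-by-samples table. A minimal sketch of that merge follows; the function names, the sample labels, and the `transcript` column header are assumptions for illustration. The resulting tab-delimited file could then be loaded in R, for example with `read.delim(file, row.names=1)`.

```python
import csv
import io


def merge_counts(per_sample):
    """Combine {sample: {transcript: count}} into a count matrix.

    Returns (header, rows). Transcripts missing from a sample get 0.
    """
    samples = sorted(per_sample)
    transcripts = sorted({t for counts in per_sample.values() for t in counts})
    header = ["transcript"] + samples
    rows = [[t] + [per_sample[s].get(t, 0) for s in samples] for t in transcripts]
    return header, rows


def write_matrix(header, rows, handle):
    """Write the matrix as tab-delimited text to an open file handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def matrix_as_text(per_sample):
    """Convenience wrapper returning the whole table as one string."""
    buffer = io.StringIO()
    write_matrix(*merge_counts(per_sample), buffer)
    return buffer.getvalue()
```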

              • #8
                thank you @Devon Ryan, really

                • #9
                  Hi @Devon,
                  I have finished the differential expression analysis with edgeR with the help of its user guide, then searched for how to export data from R, but I still do not know how to export the edgeR results. I am using Ubuntu. Could you give me some basic tips?

                  • #10
                    The "write.table()" function is probably the most convenient method. I assume you have some sort of data frame that you'd like written to a text file so you/others can easily use it (e.g., in Excel, as supplemental data, or again in R).

                    • #11
                      Either read it into R yourself and then deal with the columns however you'd like or use awk to create a new text file of each. You probably want to either just count read #1 or filter out singleton alignments, which can be done with samtools.
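For reference, "read #1" and "singleton" are both encoded in the SAM FLAG field, which is what samtools filters on. The sketch below shows the same logic in plain Python on SAM text lines. The function names and the `skip_singletons` parameter are made up for this example; the bit values are the standard SAM flag bits.

```python
FLAG_PAIRED = 0x1          # read is part of a pair
FLAG_UNMAPPED = 0x4        # this read is unmapped
FLAG_MATE_UNMAPPED = 0x8   # the mate is unmapped
FLAG_READ1 = 0x40          # first read of the pair
FLAG_SECONDARY = 0x100     # secondary alignment


def is_singleton(flag):
    """True for a mapped, paired read whose mate did not map."""
    return (
        bool(flag & FLAG_PAIRED)
        and not flag & FLAG_UNMAPPED
        and bool(flag & FLAG_MATE_UNMAPPED)
    )


def count_read1(sam_lines, skip_singletons=False):
    """Count primary, mapped read #1 alignments per reference name.

    Counting only read #1 means each fragment contributes at most one
    count. With skip_singletons=True, reads whose mate is unmapped are
    dropped as well.
    """
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY):
            continue
        if not flag & FLAG_READ1:
            continue
        if skip_singletons and is_singleton(flag):
            continue
        counts[rname] = counts.get(rname, 0) + 1
    return counts
```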

                      • #12
                        Hey @dpryan,
                        It seems I did not make myself clear, sorry about that.
                        I used edgeR and got this:

                        Code:
                        > et <- exactTest(y, dispersion=bcv^2)
                        > summary(de <- decideTestsDGE(et))
                           [,1]
                        -1   273
                        0  27700
                        1    877
                        Is there any function I could use to get the 273 down-regulated and 877 up-regulated transcripts?
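The thread ends without a reply to this question. One possible route, sketched here under several assumptions, is to export the full results table from R (for instance the table returned by edgeR's `topTags(et, n=Inf)`, written out with `write.table()`) and split it by the sign of `logFC` at a chosen FDR cutoff. The function name, the `transcript` column header, and the 0.05 cutoff are assumptions for illustration; `logFC` and `FDR` are the column names edgeR uses in that table.

```python
import csv


def split_by_direction(table_lines, fdr_cutoff=0.05):
    """Split an exported results table into up- and down-regulated IDs.

    Expects tab-delimited lines with a header containing the columns
    'transcript' (hypothetical ID column), 'logFC' and 'FDR'.
    Returns (up, down) as two lists of transcript IDs.
    """
    reader = csv.DictReader(table_lines, delimiter="\t")
    up, down = [], []
    for row in reader:
        if float(row["FDR"]) >= fdr_cutoff:
            continue  # not significant at the chosen cutoff
        log_fc = float(row["logFC"])
        if log_fc > 0:
            up.append(row["transcript"])
        elif log_fc < 0:
            down.append(row["transcript"])
    return up, down
```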
