  • Creating an mRNA GTF file from FASTA for HTSeq?

    Hi,

    I have a SAM file (from BWA) of paired-end RNA-Seq short reads aligned to a CDS FASTA file. I need to run htseq-count on this SAM file, so I need the corresponding GTF file.

    I thought it would not be too hard to generate a basic GTF file from the original RNA FASTA file, but HTSeq does not recognize the ID of any RNA sequence in the SAM file:

    Warning: Skipping read 'XXX0654:58235#TGACCA', because chromosome 'gi|155030243|ref|NM_017599.3|', to which it has been aligned, did not appear in the GFF file.

    However, this sequence ID is present in the GTF file:
    gi|155030243|ref|NM_017599.3| ref CDS 1 4580 . + . gene_id "VEZT"; transcript_id "NM_017599.3";
    and in the SAM header too:
    @SQ SN:gi|155030243|ref|NM_017599.3| LN:4580

    Where is the mismatch?

    Many thanks for your help.

    Emmanuel.

  • #2
    Sorry, answering my own question: it was a bad reference version. My GTF is OK, but the rna.fasta file comes from a different release, so the NM_XXX accession versions differed in the number after the dot, hence the mismatch.

    Emmanuel.
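Because the cause was a version mismatch between releases, one quick check is to compare the reference names in the SAM header with the seqnames (column 1) of the GTF. The sketch below uses hypothetical helpers (`sam_reference_names`, `gtf_seqnames`, `compare_names`), not part of HTSeq. It assumes RefSeq-style accessions (two capital letters, underscore, digits, dot, version) and matches on the accession without its version, since the gi number in such headers typically changes along with the version.

```python
import re
from typing import Iterable, List, Set, Tuple

# RefSeq-style accession with version, e.g. "NM_017599.3" -> group(1) == "NM_017599"
ACCESSION_RE = re.compile(r"([A-Z]{2}_\d+)\.\d+")


def sam_reference_names(sam_lines: Iterable[str]) -> Set[str]:
    """Return the SN: values of the @SQ lines in a SAM header."""
    names: Set[str] = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment line
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines: Iterable[str]) -> Set[str]:
    """Return the distinct values of GTF column 1, skipping comments and blank lines."""
    return {line.split("\t", 1)[0] for line in gtf_lines
            if line.strip() and not line.startswith("#")}


def accession_key(name: str) -> str:
    """Accession without its version, or the whole name if no accession is found."""
    match = ACCESSION_RE.search(name)
    return match.group(1) if match else name


def compare_names(sam_names: Set[str],
                  gtf_names: Set[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split SAM references absent from the GTF into version-only mismatches and true misses."""
    gtf_by_key = {accession_key(n): n for n in gtf_names}
    version_only: List[Tuple[str, str]] = []
    missing: List[str] = []
    for name in sorted(sam_names - gtf_names):
        other = gtf_by_key.get(accession_key(name))
        if other is not None:
            version_only.append((name, other))  # same accession, different version
        else:
            missing.append(name)  # accession not in the GTF at all
    return version_only, missing
```

Usage, with hypothetical file names: open `aln.sam` and `mrna.gtf`, then call `compare_names(sam_reference_names(sam), gtf_seqnames(gtf))`. A non-empty first list points to the release mismatch described above.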

    • #3
      Hi,
      Can you share how you created the GTF file from the FASTA sequence file?

      Thank you
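The thread never shows how the GTF was built. A minimal sketch of one way to do it, assuming each FASTA record becomes a single feature spanning the whole sequence on the + strand, as in the GTF line quoted in the first post. The seqname is the header up to the first whitespace, which is typically what aligners such as BWA write to the SAM @SQ lines. `read_fasta_lengths`, `fasta_to_gtf`, and the optional `gene_map` are hypothetical; without a mapping, the gene ID falls back to the transcript ID.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# RefSeq-style versioned accession inside a FASTA ID, e.g. "NM_017599.3"
ACCESSION_RE = re.compile(r"[A-Z]{2}_\d+\.\d+")


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (id, sequence length); the ID is the header text up to the first whitespace."""
    name: Optional[str] = None
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name = (line[1:].split() or [""])[0]  # empty string for a bare ">" header
            length = 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def fasta_to_gtf(lines: Iterable[str], feature: str = "exon", source: str = "ref",
                 gene_map: Optional[Dict[str, str]] = None) -> List[str]:
    """One GTF line per FASTA record: a single feature from 1 to its length on the + strand."""
    gene_map = gene_map or {}
    rows = []
    for name, length in read_fasta_lengths(lines):
        match = ACCESSION_RE.search(name)
        transcript_id = match.group(0) if match else name
        gene_id = gene_map.get(transcript_id, transcript_id)  # fall back to the transcript ID
        attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        rows.append("\t".join([name, source, feature, "1", str(length),
                               ".", "+", ".", attributes]))
    return rows
```

Write the rows joined by newlines to a `.gtf` file. htseq-count counts `exon` features by default, so if you choose `feature="CDS"` (as in the first post) you would also pass `-t CDS`.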

      • #4
        I'd like to ask the same question. I am working with transcriptome data from a non-model organism, and downstream analyses often require a GTF file for the FASTA file I work with. Since this insect has no genome information available, I cannot get transcript coordinates. I am wondering whether I can create a GTF file that would let me count reads with htseq-count and run a DEG analysis with edgeR.
        If it can be done, how?

        I don't know whether my question is ridiculous, but I would like your suggestions.
        Last edited by kurban910; 06-13-2015, 05:09 AM.

        • #5
          When you align to the transcriptome like this you don't use htseq-count. Instead, filter out secondary alignments and anything else you want (e.g., remove alignments with very low mapping quality), index the results, and use "samtools idxstats" to get the counts. Alternatively, you can use RSEM or eXpress (or one of the many equivalents) to get estimated counts.
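A sketch of the idxstats route. The samtools commands in the comments are one assumed way to do the filtering (`-F 0x904` drops unmapped, secondary and supplementary records; the MAPQ cutoff of 10 is an arbitrary example), and `parse_idxstats` is a hypothetical helper. `samtools idxstats` prints one tab-separated line per reference (name, length, mapped read segments, unmapped read segments) plus a final `*` line for unplaced unmapped reads.

```python
from typing import Dict, Iterable

# Assumed upstream commands (samtools 1.x); file names are placeholders:
#   samtools view -b -F 0x904 -q 10 aln.sam | samtools sort -o aln.sorted.bam -
#   samtools index aln.sorted.bam
#   samtools idxstats aln.sorted.bam > aln.idxstats.txt


def parse_idxstats(lines: Iterable[str]) -> Dict[str, int]:
    """Map each reference (transcript) name to its mapped read-segment count."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4 or fields[0] == "*":
            continue  # skip blank lines and the '*' line of unplaced unmapped reads
        name, _length, mapped, _unmapped = fields
        counts[name] = int(mapped)
    return counts
```

For example, open `aln.idxstats.txt` and pass the file handle to `parse_idxstats` to get integer counts per transcript.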

          • #6
            Hi @dpryan,
            Thanks for the reply, but what do you mean by "index the results"? Do you mean indexing the SAM file I got after aligning the reads to trinity.fasta with Bowtie2?

            Does that mean I cannot analyze my data with edgeR, DESeq, and baySeq?

            • #7
              You'll have to convert it to BAM and sort it, but yes.

              You'll get unique integer counts with the idxstats method, so edgeR/DESeq2/etc. will work fine. For the RSEM/eXpress/etc. route, edgeR will work but DESeq2 will not (no clue about baySeq).

              • #8
                Thank you, @Devon Ryan, really.

                • #9
                  Hi @Devon,
                  I have finished the differential expression analysis with edgeR with the help of its user guide, and then searched for how to export data from R, but I still do not know how to export the edgeR results. I am using Ubuntu. Could you give me some basic tips?

                  • #10
                    The "write.table()" function is probably the most convenient method. I assume you have some sort of data frame that you'd like written to a text file so you/others can easily use it (e.g., in Excel, as supplemental data, or again in R).

                    • #11
                      Either read it into R yourself and deal with the columns however you like, or use awk to create a new text file for each. You probably want to either count only read #1 or filter out singleton alignments, both of which can be done with samtools.
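If you would rather do the merging outside R and awk, here is a minimal Python sketch that combines per-sample counts into one tab-separated matrix that can be read back into R (e.g. `read.delim("counts.tsv", row.names = 1)`) and passed to edgeR. `count_matrix` is a hypothetical helper; it assumes each sample is already a `{transcript: count}` dict and writes 0 where a transcript is missing from a sample. For paired-end data, one assumed way to count each fragment once is to keep only read 1 with `samtools view -f 0x40` before running idxstats.

```python
from typing import Dict, List


def count_matrix(samples: Dict[str, Dict[str, int]]) -> List[str]:
    """Rows of a tab-separated count table: header, then one row per transcript."""
    sample_names = sorted(samples)
    transcripts = sorted({t for counts in samples.values() for t in counts})
    rows = ["\t".join(["transcript"] + sample_names)]
    for t in transcripts:
        # a transcript absent from a sample's counts is written as 0
        rows.append("\t".join([t] + [str(samples[s].get(t, 0)) for s in sample_names]))
    return rows
```

Join the rows with newlines and write them to a file such as `counts.tsv`; sorting the sample and transcript names keeps the column and row order reproducible between runs.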

                      • #12
                        Hey @dpryan,
                        It seems I did not make myself clear, sorry about that.
                        I used edgeR and got this:

                        Code:
                        > et <- exactTest(y, dispersion=bcv^2)
                        > summary(de <- decideTestsDGE(et))
                           [,1]
                        -1   273
                        0  27700
                        1    877
                        Is there a function I could use to get the 273 down-regulated and 877 up-regulated transcripts?
