  • Visualizing an RNA-seq SAM file against the original FASTA genome assembly

    Hi all,

    I have been struggling with this.
    I used CLC Workbench for RNA-seq analysis with single reads and an annotated reference genome assembly containing “gene” and “mRNA” feature tags. The resulting SAM file contains the alignments of the reads against the “gene” features, and the results look quite promising. However, I would like to visualize my SAM (BAM) file against the original FASTA genome assembly in an editing program such as Geneious, importing it either as an alignment file or as a separate track on the genome assembly. I would be grateful for any help or suggestions.

    Thanks in advance

  • #2
    Have you tried the Integrative Genomics Viewer (IGV): http://www.broadinstitute.org/igv/? You won't be able to edit, only view.



    • #3
      Thank you for your quick reply.
      I have successfully loaded and visualized, in both IGV and Geneious, the SAM file produced by TopHat2 when mapping a read dataset against the annotated reference genome assembly together with its GFF3 annotation file, which contains all the annotated features (please see the attached file A for the format of that mapping file).
      However, when I ran the mapping in CLC Workbench using the same annotated reference genome and a cleaned, simplified annotation file (containing only the “gene” and “mRNA” feature tags rather than all the available ones, as in the TopHat2 mapping above), the resulting SAM file looks like file B.
      The IDs in my annotated reference genome assembly look like this:
      Velvet_120397 (third column in file A)
      As you can see in the attached file, the problem with CLC is that the exported SAM file (file B) does not keep any record of the original reference genome assembly IDs; all alignments refer to the “gene” features that were used for the mapping. So I cannot find a way to properly visualize, in IGV or Geneious, all the mappings for a whole contig at once (for example, Velvet_120397) using the FASTA reference genome assembly as the input database.
      Please help if you have any idea how I could overcome this.
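
A quick way to confirm the mismatch described above is to compare the reference names declared in the SAM header with the sequence IDs in the genome FASTA. The following is only a minimal Python sketch: it assumes the exported SAM file has a header with @SQ lines, and the feature name "gene_0001" and the FASTA description text are invented for illustration.

```python
def sam_reference_names(sam_lines):
    """Return the reference names declared in the @SQ lines of a SAM header."""
    names = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_ids(fasta_lines):
    """Return the sequence IDs (first word after '>') found in FASTA lines."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def names_missing_from_fasta(sam_lines, fasta_lines):
    """List SAM reference names that have no matching FASTA sequence ID."""
    known = set(fasta_ids(fasta_lines))
    return [name for name in sam_reference_names(sam_lines) if name not in known]


# Hypothetical inputs: "gene_0001" is an invented feature name.
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "@SQ\tSN:gene_0001\tLN:900\n",
    "read1\t0\tgene_0001\t5\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]
example_fasta = [">Velvet_120397 hypothetical description\n", "ACGT\n"]
print(names_missing_from_fasta(example_sam, example_fasta))
```

With real data, open file handles can be passed in place of the example lists. Any name this reports is a reference that a viewer loaded with the genome FASTA will not be able to place.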


      • #4
        Can you tell me exactly which workflow in CLC you used for the analysis? RNA-seq analysis under "transcriptomics"?

        Since you provided an annotation file to CLC, did it map reads only to those features? Doing the mapping without the annotation file should allow CLC to use the reference genome with its original IDs. Have you tried that?



        • #5
          Exactly, I used the RNA-seq analysis under "transcriptomics", and CLC made the mappings only against these features. You are right that if I map against the reference genome without the annotation file, I will get results against the original IDs. In my case, however, I need the annotation file in the mapping to reduce mapping bias, and also because I later need expression values based on these features (genes or transcripts). So I am looking for a tool or script that uses the reference annotation file to convert the resulting mapping file into some kind of alignment file that I can later import into a viewer.
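
The kind of conversion script asked for here could look roughly like the Python sketch below, which shifts each alignment from gene coordinates to contig coordinates using the gene start taken from the GFF3 file. It rests on several assumptions that would need checking against the actual CLC export: each SAM reference name equals the ID attribute of a "gene" row in the GFF3; the gene sequences used as references are plain sub-intervals of the contig in contig orientation (if minus-strand genes were reverse-complemented, flags, sequences and CIGAR strings would also need reversing, which is not done here); reads are single-end; and only unspliced "gene" features are handled, not "mRNA" features. The function names and the contig_lengths dictionary are hypothetical.

```python
def parse_gff3_genes(gff_lines):
    """Map gene ID -> (contig, 1-based start) using GFF3 rows of type 'gene'."""
    genes = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "ID" in attrs:
            genes[attrs["ID"]] = (cols[0], int(cols[3]))
    return genes


def lift_sam(sam_lines, genes, contig_lengths):
    """Yield SAM lines with gene-based RNAME/POS rewritten to contig coordinates.

    contig_lengths maps contig name -> length and replaces the old @SQ lines.
    Assumption: single-end reads, so RNEXT/PNEXT are left untouched.
    """
    sq_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@HD"):
            yield line  # @HD must stay the first header line
            continue
        if not sq_written:
            for name, length in contig_lengths.items():
                yield f"@SQ\tSN:{name}\tLN:{length}"
            sq_written = True
        if line.startswith("@SQ"):
            continue  # drop the gene-based reference declarations
        if line.startswith("@"):
            yield line  # keep other header lines such as @RG or @PG
            continue
        fields = line.split("\t")
        if fields[2] in genes:
            contig, gene_start = genes[fields[2]]
            fields[2] = contig
            # Gene position 1 corresponds to contig position gene_start.
            fields[3] = str(int(fields[3]) + gene_start - 1)
        yield "\t".join(fields)
```

The output is no longer sorted by contig position, so it would need sorting and indexing (for example with samtools sort and samtools index) before loading it in IGV alongside the genome FASTA.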



          • #6
            One option could be to do the mapping against the annotated genome for viewing with IGV, then take the alignments and use either HTSeq (http://www-huber.embl.de/users/ander.../overview.html) or featureCounts (http://bioinf.wehi.edu.au/featureCounts/) to do the counting.

            Since you are doing multiple analyses in CLC, you can keep both versions of the alignments around.
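
To make the counting step concrete, here is a deliberately simplified Python sketch of per-gene read counting from alignments in contig coordinates. It is not how HTSeq or featureCounts are implemented and is no substitute for them: it ignores strandedness and multi-mapping, treats a spliced read as one span from its first to its last aligned base, and scans all genes for every read. The gene tuples and function names are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_end(pos, cigar):
    """Return the last reference position (1-based) covered by an alignment."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def count_reads_per_gene(sam_lines, genes):
    """Count reads overlapping exactly one gene.

    genes is a list of (gene_id, contig, start, end) with 1-based inclusive
    coordinates. Reads overlapping no gene or several genes are not counted.
    """
    counts = {gene_id: 0 for gene_id, _, _, _ in genes}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        flag, contig, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
            continue
        end = reference_end(pos, cigar)
        hits = [g for g, c, s, e in genes if c == contig and s <= end and e >= pos]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts
```

For real data, the dedicated tools remain the better choice; the sketch only shows that counting needs nothing more than contig-based alignments plus the feature coordinates from the annotation file.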



            • #7
              Thanks for your suggestions. As far as I understand, you suggest running the mapping against the genome without the annotations for viewing in IGV, and then getting the read counts per feature with the two tools you mentioned. If that is what you mean, my concern is that the CLC mapping will then try to predict new splice sites and exon features de novo, which introduces a lot of bias.

              If instead you suggest running the mapping against the annotated genome, how would I then view the annotated reference genome assembly in IGV? The CLC output would contain the mapping results against each gene and mRNA (around 12,000 features) separately, in their own BAM files, rather than against the original FASTA contigs of the annotated genome assembly (around 600 contigs).
