SEQanswers

Old 02-03-2014, 08:50 AM   #1
antoza
Member
 
Location: France

Join Date: Aug 2013
Posts: 18
rnaseq sam file visualize against the original fasta genome assembly

Hi all,

I am struggling with this.
I have used CLC Workbench for RNA-seq analysis with single reads and an annotated reference genome assembly containing “gene” and “mRNA” feature tags. The resulting SAM mapping file contains the alignments of the reads against the “gene” features, and the results are quite promising. However, I would like to find a way to visualize my SAM (BAM) file against the original FASTA genome assembly in editing software such as Geneious, importing it either as an alignment file or as a separate track against the genome assembly. I would be grateful for any help or suggestions.

Thanks in advance
Old 02-03-2014, 09:26 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,088

Have you tried the Integrative Genomics Viewer (IGV): http://www.broadinstitute.org/igv/? You won't be able to edit, only view.
Old 02-03-2014, 10:39 AM   #3
antoza
Member
 
Location: France

Join Date: Aug 2013
Posts: 18

Thanks for your quick reply.
I have successfully loaded and visualized, in both IGV and Geneious, the SAM file produced by TopHat2 when mapping a read dataset against an annotated reference genome assembly, with the corresponding GFF3 annotation file (containing all the annotated features) supplied as well (please see the format of the resulting mapping file, file A, in the attachment).
However, when I performed the mapping in CLC Workbench using the same annotated reference genome and a cleaned, simplified annotation file (containing only the “gene” and “mRNA” feature tags rather than all the available ones, as in the TopHat2 mapping above), the resulting SAM file looks like file B.
The IDs in my annotated reference genome assembly look like this:
Velvet_120397 (third column in file A)
As you can see in the attached file, the problem with CLC is that the exported SAM file (file B) does not keep any record of the original reference genome assembly IDs; all alignments refer to the “gene” tags that were used for the mapping. So I cannot find a way to properly visualize in IGV or Geneious all the mappings for a whole contig at once (for example, velvet_120397) using the FASTA reference genome assembly as the input database.
Please help if you have any idea how I could get around this.
Attached Files
File Type: txt sam_files.txt (2.2 KB, 4 views)
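
The mismatch described above (alignments in file B naming “gene” features instead of contigs) can be checked before loading anything into a viewer. The following is only a minimal sketch in plain Python, assuming a standard tab-separated SAM file and a FASTA file whose IDs are the first word of each header line. The function names and the feature ID gene_0001 are hypothetical and used purely for illustration; only Velvet_120397 comes from the post.

```python
import io

def fasta_ids(handle):
    """Collect sequence IDs: the first word after '>' on each header line."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids

def sam_reference_names(handle):
    """Collect the RNAME values (third column) of all aligned SAM records."""
    names = set()
    for line in handle:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 11 and fields[2] != "*":
            names.add(fields[2])
    return names

def missing_references(fasta_handle, sam_handle):
    """Reference names used in the SAM file that are absent from the FASTA."""
    return sorted(sam_reference_names(sam_handle) - fasta_ids(fasta_handle))

# Toy data: one record names a contig, the other a (hypothetical) gene feature.
demo_fasta = io.StringIO(">Velvet_120397 length=20\nACGTACGTACGTACGTACGT\n")
demo_sam = io.StringIO(
    "@HD\tVN:1.0\n"
    "read1\t0\tVelvet_120397\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "read2\t0\tgene_0001\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
)
demo_missing = missing_references(demo_fasta, demo_sam)
print(demo_missing)
```

Any name this reports is a reference that a viewer loaded with the genome FASTA would not be able to place.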
Old 02-03-2014, 11:01 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,088

Can you tell me exactly which workflow in CLC you used for the analysis? RNA-seq analysis under "transcriptomics"?

Since you provided an annotation file to CLC, did it map reads only to those features? Doing the mapping without the annotation file should allow CLC to use the reference genome with the original IDs. Have you tried that?
Old 02-04-2014, 09:08 AM   #5
antoza
Member
 
Location: France

Join Date: Aug 2013
Posts: 18

Exactly, I used the RNA-seq analysis under "transcriptomics", and CLC made the mappings only against these features. You are right that if I perform the mapping against the reference genome without the annotation file I will get the mapping results against the original IDs, but in my case I need the annotation file in the mapping in order to reduce mapping bias, and also because I later need expression values based on these features (genes or transcripts). So I am looking for a tool or script that uses the reference annotation file to convert the resulting mapping file into some kind of alignment file (?) that I can later import into a viewer.
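
A conversion of the kind asked for here could, in principle, shift each alignment from feature coordinates back to contig coordinates using the feature's start position in the GFF3 file. The sketch below is not an existing tool and makes several loud assumptions: the reference name in the SAM file equals the ID attribute of a GFF3 “gene” feature, the feature reference is the unspliced gene region (so offsets are linear), the gene is on the + strand, and the reads are single-end. Minus-strand genes and spliced mRNA references would need extra handling (reverse-complementing, CIGAR changes) that is deliberately left out. The function names and gene_0001 are hypothetical.

```python
import io

def load_plus_strand_features(gff3_handle, feature_type="gene"):
    """Map feature ID -> (contig, 1-based start) for '+' strand features."""
    features = {}
    for line in gff3_handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Only '+' strand features of the requested type are handled here.
        if len(cols) != 9 or cols[2] != feature_type or cols[6] != "+":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "ID" in attrs:
            features[attrs["ID"]] = (cols[0], int(cols[3]))
    return features

def lift_sam_line(line, features):
    """Rewrite RNAME and POS into contig coordinates; None if not liftable."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11 or fields[2] not in features:
        return None
    contig, feature_start = features[fields[2]]
    fields[2] = contig
    # Both coordinate systems are 1-based, hence the "- 1".
    fields[3] = str(feature_start + int(fields[3]) - 1)
    # RNEXT/PNEXT are left untouched (assumed '*' and 0 for single reads).
    return "\t".join(fields)

# Toy data: a hypothetical gene at positions 101-500 of contig Velvet_120397.
demo_gff3 = io.StringIO(
    "Velvet_120397\tCLC\tgene\t101\t500\t.\t+\t.\tID=gene_0001\n"
)
demo_features = load_plus_strand_features(demo_gff3)
demo_record = "read1\t0\tgene_0001\t10\t60\t4M\t*\t0\t0\tACGT\tIIII"
demo_lifted = lift_sam_line(demo_record, demo_features)
print(demo_lifted)
```

Header lines are not converted by this sketch; the lifted records would still need a header whose @SQ lines describe the contigs of the genome FASTA before a viewer would accept the file.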
Old 02-04-2014, 09:23 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,088

One option could be to do the mapping against the annotated genome for viewing with IGV, then take the alignments and use either HTSeq (http://www-huber.embl.de/users/ander.../overview.html) or featureCounts (http://bioinf.wehi.edu.au/featureCounts/) to do the counting.

Since you are doing multiple analyses in CLC, you can keep both versions of the alignments around.
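
To make the counting step concrete, here is a rough plain-Python illustration of what counting reads per feature from genome-coordinate alignments means. It is not the algorithm used by HTSeq or featureCounts: it simply counts every aligned read whose reference span overlaps a gene interval, with no strand check and no handling of ambiguous or multi-mapped reads. The function names and the toy gene gene_0001 are hypothetical.

```python
import io
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) covered on the reference."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1

def count_reads_per_gene(sam_handle, genes):
    """genes: iterable of (gene_id, contig, start, end), 1-based inclusive."""
    counts = Counter()
    for line in sam_handle:
        if line.startswith("@"):  # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        # Skip malformed records, unmapped reads (flag 0x4) and missing CIGARs.
        if len(f) < 11 or int(f[1]) & 0x4 or f[5] == "*":
            continue
        start, end = reference_span(int(f[3]), f[5])
        for gene_id, contig, g_start, g_end in genes:
            if contig == f[2] and start <= g_end and end >= g_start:
                counts[gene_id] += 1
    return counts

# Toy data: read1 and read3 overlap the gene, read2 lies outside it.
demo_genes = [("gene_0001", "Velvet_120397", 101, 500)]
demo_sam = io.StringIO(
    "read1\t0\tVelvet_120397\t110\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "read2\t0\tVelvet_120397\t600\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "read3\t0\tVelvet_120397\t98\t60\t2M3N2M\t*\t0\t0\tACGT\tIIII\n"
)
demo_counts = count_reads_per_gene(demo_sam, demo_genes)
print(dict(demo_counts))
```

For real data the dedicated tools are the better choice, since they handle strandedness, overlapping features and spliced reads far more carefully than this sketch does.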
Old 02-06-2014, 05:51 AM   #7
antoza
Member
 
Location: France

Join Date: Aug 2013
Posts: 18

Thanks for your suggestions. As far as I understand, you suggest running the mapping against the genome without the annotations for viewing in IGV, and then getting the read counts per feature using the two tools you mentioned. If that is what you suggest, I am frustrated, because the CLC mapping will try to predict new splice sites and exon features de novo, which introduces a lot of bias.

If you instead suggest running the mapping against the annotated genome, then how will I see the annotated reference genome assembly in IGV? The CLC output would contain the mapping results against each gene and mRNA (around 12000 features) separately, in their respective BAM files, rather than organized by the contigs of the original FASTA genome assembly (around 600 contigs).
All times are GMT -8.