SEQanswers

04-05-2016, 09:49 AM   #1
KamilSJaron
Junior Member
 
Location: Switzerland

Join Date: Apr 2016
Posts: 7
Visualisation of RNAseq (kallisto to IGV)

Hello everyone,

Like others, I am quite excited that kallisto produces a pseudoalignment in minutes, where a real alignment takes hours to compute. Now it would be useful to visualise the result in IGV.

From the .gdb file we extracted the CDS of our bacteria using Python scripts. Each sequence in the CDS file is named by its gene_id (which is the same as the transcript_id), exactly as we would expect.

I ran kallisto index on this CDS file and then produced a pseudobam file, following the kallisto manual (https://pachterlab.github.io/kallisto/manual.html):

kallisto quant -i cds.idx -o output -b 100 --single -l 100 -s 1 --pseudobam <all_RNAseq_reads.fq.gz> | samtools view -Sb - > pseudomap.bam

The .bam file was then sorted, indexed and loaded into IGV together with the .fasta and .gtf files, which gave the following error:

File does not contain any sequence names which match the current genome.
File: *****S5_genome_87, S5_genome_88, S5_genome_89, S5_genome_90, ...
Genome: S5_genome,

S5_genome_XX are the gene_ids of our genome, and S5 is our genome. So I concluded that IGV treats every transcript as a chromosome (based on a few related posts like http://seqanswers.com/forums/archive...p/t-16407.html), and I created an alias file like this:

S5_genome_87 S5_genome
S5_genome_88 S5_genome
... ...

Now the file loads, but no reads are displayed at all. I guess I am missing something somewhere. IMHO the easiest way would be to edit the .bam file (or the .sam file before it is converted to .bam) so that it refers to the single chromosome of the genome.

If you are still reading, thank you for it. Any help appreciated.
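
A likely reason the alias file is not enough: it only renames the reference, while every POS in the pseudobam is still counted from the start of its own transcript rather than from the start of S5_genome. Below is a minimal sketch of the coordinate shift for genes on the forward strand. It assumes single-end reads and a dictionary built from the .gtf that maps each transcript id to (chromosome, start, end, strand) with 1-based inclusive coordinates. The names `shift_forward` and `genes`, and the coordinates in the example, are made up for illustration. The @SQ header lines would also have to be replaced by a single line for the genome, which is not shown here.

```python
def shift_forward(sam_line, genes):
    """Rewrite one SAM record from transcript to genome coordinates.

    Only handles transcripts on the '+' strand; returns None for
    unmapped reads, unknown transcripts and '-' strand transcripts.
    """
    fields = sam_line.rstrip("\n").split("\t")
    rname, pos = fields[2], int(fields[3])
    if rname == "*" or rname not in genes:
        return None
    chrom, start, end, strand = genes[rname]
    if strand != "+":
        return None
    fields[2] = chrom                  # RNAME: the single chromosome
    fields[3] = str(start + pos - 1)   # POS: 1-based offset into the gene
    return "\t".join(fields)


# Hypothetical annotation entry and read, for illustration only.
genes = {"S5_genome_87": ("S5_genome", 1000, 1899, "+")}
record = "\t".join(
    ["read1", "0", "S5_genome_87", "11", "255", "50M", "*", "0", "0",
     "A" * 50, "I" * 50]
)
print(shift_forward(record, genes))
```

In this example the read starts at base 11 of the transcript, so it ends up at position 1010 on S5_genome.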
07-12-2016, 07:52 AM   #2
KamilSJaron
Junior Member
 
Location: Switzerland

Join Date: Apr 2016
Posts: 7

I wrote a small Python script that uses the .gtf file to convert the .sam produced by kallisto into a .sam readable by IGV. It is not perfect (I was in a bit of a rush when writing it): reads on transcripts from the reverse strand are displayed as if the transcript were in the forward direction (so the opposite of what they should be), but they are in the correct place, i.e. if you only want to check coverage / transcripts, it is good enough.

So, if you are interested:

https://github.com/KamilSJaron/Seque...m_convertor.py

Usage:

python3 kallisto_sam_convertor.py <pseudoalignment.sam> <annotation.gtf> | samtools view -bS - | samtools sort - -o <output.bam>

The resulting .bam should load in IGV.

---edit---
I think that, to correct the script, one needs to change the bit flag of reads mapping to transcripts on the reverse strand (forward reads to backward reads and vice versa) and recompute the position of each read (it should be mirrored around the middle of the transcript).

Last edited by KamilSJaron; 12-01-2016 at 10:35 AM. Reason: corrected the description of the problem that the script in this post has.
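
The correction described in the edit note could look roughly like the sketch below. This is not the linked script, only an illustration under stated assumptions: single-end reads, 1-based inclusive gene coordinates from the .gtf, and a record already split into its tab-separated SAM fields. The function name `flip_to_reverse_gene` and the example coordinates are hypothetical. The strand bit (0x10) is toggled, POS is mirrored within the gene using the reference length taken from the CIGAR, and SEQ, QUAL and the CIGAR order are reversed so that they still match the genome strand. Optional tags after the eleventh field are copied unchanged, which may not be right for all of them.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def flip_to_reverse_gene(fields, chrom, gene_end):
    """Move a read from a '-' strand transcript onto the genome.

    fields   : list of SAM columns for one record
    chrom    : name of the chromosome the gene lies on
    gene_end : last genome position of the gene (1-based, inclusive)
    """
    flag = int(fields[1])
    if flag & 0x4:                     # unmapped read: nothing to convert
        return list(fields)
    pos = int(fields[3])
    ops = CIGAR_OP.findall(fields[5])
    ref_len = sum(int(n) for n, op in ops if op in REF_CONSUMING)

    out = list(fields)
    out[1] = str(flag ^ 0x10)          # toggle the reverse-strand bit
    out[2] = chrom
    # Transcript base t sits at genome base gene_end - t + 1, so the last
    # transcript base covered by the read becomes the new leftmost POS.
    out[3] = str(gene_end - pos - ref_len + 2)
    out[5] = "".join(n + op for n, op in reversed(ops))
    if fields[9] != "*":
        out[9] = fields[9].translate(COMPLEMENT)[::-1]   # reverse complement
    if fields[10] != "*":
        out[10] = fields[10][::-1]                       # reverse qualities
    return out


# Hypothetical gene at S5_genome:1000-1899 on the '-' strand.
read = ["read1", "0", "S5_genome_88", "1", "255", "50M", "*", "0", "0",
        "ACGT" + "A" * 46, "I" * 50]
print("\t".join(flip_to_reverse_gene(read, "S5_genome", 1899)))
```

Here a forward read covering the first 50 bases of the transcript ends up at genome positions 1850 to 1899 with FLAG 16, i.e. at the far end of the gene and on the reverse strand.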

Tags
bam, igv, kallisto, rnaseq, samtools
