SEQanswers

Old 12-02-2012, 05:11 AM   #1
mathew
Member
 
Location: australia

Join Date: Jan 2011
Posts: 81
Default viral genome alignment

I have a question. I am working on ChIP-seq data from tumors that carry a viral infection. We IP with a human-specific antibody against our gene of interest. The single-end (S.E.) reads from the HiSeq were aligned to the human genome using BWA, which worked fine and gave me some probable binding sites after peak calling. Now I am trying to find out what happened to the viral factors, so I aligned the reads to the viral genome (around 10K) using Bowtie. Here is an excerpt of the Bowtie SAM file. Only 0.30% of the reads are uniquely mapped.
QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL OPT
@HD VN:1.0 SO:unsorted
@SQ SN:AF148805 LN:137969
@PG ID:Bowtie VN:0.12.7 CL:"bowtie -q -p 8 -S -n 2 -e 70 -l 28 --maxbts 800 -y -k 1 -a --best --phred33-quals /tmp/3006527.cyberstar.psu.edu/tmp5nKzJC/tmpb50_zP /galaxy/main_pool/pool3/files/005/338/dataset_5338393.dat"
HWI-ST550_0201:3:1101:1671:2197#ACAGTG/1 4 * 0 0 * * 0 0 AAAATTCAGGCTCTCTATTTCACAGTTCATTAGTTCATTCGTTTACTGTG CCCFFFFFHHHHHJGIJJJJHIIJJJIGIHIIIJJGIJJJJJJJIJIJII XM:i:0
HWI-ST550_0201:3:1101:1678:2241#ACAGTG/1 4 * 0 0 * * 0 0 AGTGGTGTTTAATATAGTTTTGGGTATTTTTAACTAAAAATCATTGTTAT ?@@B?2AD?D<<CAE4AGHIF9CEG+AFDHID3C?9?CDFC**:?9*B9D XM:i:0
HWI-ST550_0201:3:1101:1626:2216#ACAGTG/1 4 * 0 0 * * 0 0 GTTGCGGGAGAAGCCAAACGCGGCGAGTCTTGCTAAAGCCGTCGCCGTAG BBCFFFFFFHHHF>GGGHCGEHIGGAE=CDFACEEEEDDDBDD;BB57<? XM:i:0
HWI-ST550_0201:3:1101:1580:2218#ACAGTG/1 4 * 0 0 * * 0 0 ACAGAAATGGCATCAAGAGACCTTGATTACAAGGATATGAATCTCTTAAG CCCFFFFFHHGHHIIJJIJJJJJJJDIJJJIIIJIJJJJIJJIJIJJIJI XM:i:0
HWI-ST550_0201:3:1101:1779:2214#ACAGTG/1 4 * 0 0 * * 0 0 CCAATCTCTGCTACAGTTTGTTTCCCTCAATTTCTAATTACTTTAAAAAG CC@FFFFFHHDHDFGHEGIJIIJJJJGIGJJJJJIIJJEIIEHGJIGJJI XM:i:0
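
A reading aid for the excerpt above: the second column of each SAM record is the FLAG, and the value 4 (bit 0x4) marks a read as unmapped, so all five records shown did not align to AF148805. The Python sketch below counts mapped versus total records in SAM text. It assumes tab-delimited input, as in a real SAM file (the excerpt above is shown with the tabs flattened to spaces). The function name and the demo records are hypothetical.

```python
def mapped_fraction(sam_lines):
    """Count (mapped, total) alignment records in tab-delimited SAM lines.

    Counts records, not distinct reads: a read reported with several
    alignments would be counted once per record.
    """
    mapped = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines such as @HD, @SQ, @PG
        flag = int(line.split("\t")[1])  # column 2 is the bitwise FLAG
        total += 1
        if not flag & 0x4:  # bit 0x4 set means "read unmapped"
            mapped += 1
    return mapped, total


# Demo: one unmapped record (FLAG 4, as in the excerpt) and one made-up mapped record.
demo = [
    "@SQ\tSN:AF148805\tLN:137969",
    "read_a\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read_b\t0\tAF148805\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
mapped, total = mapped_fraction(demo)
print(f"{mapped} of {total} records mapped")
```

Run over the full SAM file (for example by passing an open file handle), this gives the overall mapping rate without depending on the aligner's summary line.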

_________________________________________________________

How should I select only the reads that map uniquely to the viral genome?
Why do I have such a low number of uniquely mapped reads? Is there any way to increase this unique mapping? What would be the best strategy for aligning to the viral genome in this case: should I align all reads to the viral genome, or first align to human and then align the unmapped reads to the viral genome? I also tried BWA, which gave around 0.29% unique alignment.
Old 12-02-2012, 05:55 AM   #2
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
Default

I don't know your programs and acronyms, but what I would do is
check subsequences of length, say, 12. What percentage of them are
found? This is easy to check even for big databases (all viruses
from GenBank or similar).
Old 12-02-2012, 06:33 AM   #3
mathew
Member
 
Location: australia

Join Date: Jan 2011
Posts: 81
Default viral genome seq

Consider me new to viral genomes. Could you please explain how I can calculate the subsequence ratio? Any pointer to a URL or other guidance would be great. I am using the Bowtie aligner.

Thanks
Old 12-02-2012, 07:02 AM   #4
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
Default

Well, I'm the new one, since I don't know about the usual software, databases, companies, etc.
I write my own software, mainly just for influenza.

After some years (!) I noticed that for most comparisons we don't need
alignment; we can just count the number of matching subsequences of a certain length, no matter at what position they appear.

I think this is also basically what "blast" uses, which is why it's so fast for big databases.

So I wrote a program for that (Windows 32-bit, cmd.exe command line, DOS),
but presumably there are other programs available for UNIX, Win64, etc.

I can send my program with source code, or I can run your data through it
(against all GenBank viruses). It finds matching subsequences of length 15-28.

(I speculate this is what you want, but am not sure)
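
The subsequence counting described above can be sketched in a few lines of Python. This is not gsgs's program, only a minimal illustration under stated assumptions: sequences are plain A/C/G/T/N strings held in memory, both strands of the reference are indexed, and all function names and the toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T, N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_index(reference, k):
    """Index the k-mers of both strands, since reads can come from either."""
    return kmers(reference, k) | kmers(reverse_complement(reference), k)


def shared_kmer_fraction(read, reference_kmers, k):
    """Fraction of the read's distinct k-mers that occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)


# Toy example with k = 5; for real reads a larger k such as 12 would be used.
reference = "ACGTTGCAAGGCTTAACCGGTATCG"
index = build_index(reference, 5)
for read in ("TTGCAAGGCT", "GGGGGGGGGG"):
    print(read, shared_kmer_fraction(read, index, 5))
```

A read with a high fraction probably derives from the reference, regardless of where it would align. For a whole database, the index would be built once over all reference sequences and every read checked against it.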
Old 12-02-2012, 07:37 AM   #5
mathew
Member
 
Location: australia

Join Date: Jan 2011
Posts: 81
Default Viral genome

gsgs, thanks. I have sent you a private message with my email. If you can share your script, that will be a good start for me.
Old 12-02-2012, 11:55 AM   #6
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Ideally, if you know your sample has both human and viral sequence in it, you should align to a reference that contains both. That will give you the most accurate alignment.

Would you expect the antibody to be binding to viral sequence?

If you are getting very few reads aligning to virus, the simplest explanation is that you have very little virus in your library. Why have you dismissed that possibility?
Old 12-03-2012, 06:31 AM   #7
mathew
Member
 
Location: australia

Join Date: Jan 2011
Posts: 81
Default Viral genome

Thanks swbarnes2,

I agree. What I have done is align to just the viral genome, which did not work well (around 0.3% aligned reads). Then, when I aligned to a combined human and viral genome, I got around 70% aligned reads. However, I am not sure how to separate out the reads aligned to the viral genome only. It may be that most of the reads map to human. As far as the number of viral particles in the library is concerned, the experiment has worked, and the induction is mostly by viral infection, which has translated well into the human genome analysis. Any thoughts on how I can get the viral genome reads only? I have read papers stating that they simply aligned to the viral genome and then called peaks, but that is not working in my case, unless I am missing something.

Thanks
Old 12-03-2012, 07:54 AM   #8
Gators
Member
 
Location: North Carolina

Join Date: Feb 2011
Posts: 22
Default

There are Bowtie parameters to output only uniquely aligning reads.

I think it is -m 1, so that the only alignments output are unique.
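
For reference, in Bowtie 1 the -m <int> option suppresses all alignments for a read that has more than <int> reportable alignments, so -m 1 keeps only reads with a single reportable alignment. A minimal Python sketch of assembling such a command line follows; it only builds and prints the command, it does not run Bowtie. The function name, index prefix, and file names are hypothetical placeholders.

```python
import shlex


def bowtie_unique_cmd(index_prefix, fastq, out_sam, threads=8):
    """Build a Bowtie 1 command that reports only uniquely aligning reads."""
    return [
        "bowtie",
        "-q",                # input reads are FASTQ
        "-S",                # write SAM output
        "-p", str(threads),  # number of alignment threads
        "-m", "1",           # drop reads with more than one reportable alignment
        index_prefix,        # basename of the Bowtie index (hypothetical)
        fastq,               # input reads (hypothetical path)
        out_sam,             # output SAM file (hypothetical path)
    ]


cmd = bowtie_unique_cmd("viral_index", "chip_reads.fastq", "viral_unique.sam")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to subprocess.run on a machine where Bowtie 1 is installed.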
Old 12-03-2012, 08:33 AM   #9
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

Quote:
Originally Posted by mathew
Thanks swbarnes2,

I agree. What I have done is align to just the viral genome, which did not work well (around 0.3% aligned reads). Then, when I aligned to a combined human and viral genome, I got around 70% aligned reads. However, I am not sure how to separate out the reads aligned to the viral genome only.
Samtools view is how you do that.
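
To spell that out: with a coordinate-sorted, indexed BAM, naming the reference after the file (for example `samtools view aln.bam AF148805`) restricts the output to alignments on that sequence. The Python sketch below applies the same idea directly to SAM text by filtering on the RNAME column. It assumes the viral sequence in the combined reference is named AF148805, as in the @SQ header of the first post. The file name aln.bam, the function name, and the demo records are hypothetical.

```python
def reads_on_reference(sam_lines, ref_name, min_mapq=0):
    """Yield SAM records aligned to ref_name with MAPQ of at least min_mapq."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4:  # unmapped read, no reference to compare
            continue
        if rname == ref_name and mapq >= min_mapq:
            yield line


# Demo with made-up records: one unmapped, one on a human chromosome, one viral.
demo = [
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:AF148805\tLN:137969",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t50\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tAF148805\t200\t37\t4M\t*\t0\t0\tACGT\tIIII",
]
viral = list(reads_on_reference(demo, "AF148805", min_mapq=20))
print(len(viral), "viral record(s)")
```

Because the reads were aligned against human and virus together, a read that fits the human genome better will not appear in the viral set, which is the advantage of the combined reference suggested earlier in the thread.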
All times are GMT -8.