SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
bwa:how to align color space reads SOLiDance Bioinformatics 13 01-08-2013 05:27 AM
BWA align SOLiD PE data gives poor mapping & 0.5% properly paired alig Bioinformatics 3 07-08-2011 08:44 AM
Can BWA align reads to references that are even shorter? asiangg Bioinformatics 4 04-06-2011 11:06 AM
sam output from bwa for SOLiD reads in colorspace? nisha SOLiD 19 01-07-2010 04:05 AM
using BWA to align SOLiD fastq files from 1000 Genomes tgenahmet Bioinformatics 1 10-15-2009 05:19 PM

Reply
 
Thread Tools
Old 06-24-2010, 09:45 PM   #1
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 199
Default how do I output the CS tag for BWA align of SOLID reads?

Hi,
I am trying to use GATK to run some analysis on BWA aligned colorspace reads but I encountered this problem described here
http://getsatisfaction.com/gsa/topic...y_notification

basically "org.broadinstitute.sting.utils.StingException: Unable to find color space information in SOLiD read."

I am aware that BFAST generates this info. How do I make BWA generate this tag as well?
KevinLam is offline   Reply With Quote
Old 06-25-2010, 07:58 AM   #2
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by KevinLam View Post
Hi,
I am trying to use GATK to run some analysis on BWA aligned colorspace reads but I encountered this problem described here
http://getsatisfaction.com/gsa/topic...y_notification

basically "org.broadinstitute.sting.utils.StingException: Unable to find color space information in SOLiD read."

I am aware that BFAST generates this info. How do I make BWA generate this tag as well?
It doesn't. Contact the developer if you want this feature in the future.
nilshomer is offline   Reply With Quote
Old 07-01-2010, 12:25 AM   #3
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 199
Default

Quote:
Originally Posted by nilshomer View Post
It doesn't. Contact the developer if you want this feature in the future.
Hi Nils, can you share how bfast generates the CS tag?

I think I probably will have to write a post alignment script to add this tag
KevinLam is offline   Reply With Quote
Old 07-01-2010, 07:44 AM   #4
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by KevinLam View Post
Hi Nils, can you share how bfast generates the CS tag?

I think I probably will have to write a post alignment script to add this tag
The CSFASTQs I use as input are not "double-encoded" but instead contain the origin color sequence (unadulterated) and color qualities. This allows the CS/CQ tags to filled out easily. BWA "double-encodes" the color sequence and does some trimming too, thus making it impossible to recover the CS/CQ without going back to the original data (CSFASTA and QUAL).
nilshomer is offline   Reply With Quote
Old 07-05-2010, 03:21 AM   #5
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 199
Default

I see...
So far, I only have these info. so presumably I have to reverse the order of the original CS and CQ if the read is mapped in another direction?

the trimming bit might indeed be a problem though even if trying to construct a query db of the original csfasta and qual files isn't computationally intensive.


Color read sequence on the same strand as the reference 4
CS Z
Color read quality on the same strand as the reference; encoded in the same way as <QUAL> 4
CQ Z

On a raw SOLiD read, the first nucleotide is the primer base and the first color is the one between the primer base
and the first nucleotide from the sample being sequenced. The primer base and the first color must be present in CS.
KevinLam is offline   Reply With Quote
Old 07-05-2010, 07:16 AM   #6
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by KevinLam View Post
I see...
So far, I only have these info. so presumably I have to reverse the order of the original CS and CQ if the read is mapped in another direction?

the trimming bit might indeed be a problem though even if trying to construct a query db of the original csfasta and qual files isn't computationally intensive.


Color read sequence on the same strand as the reference 4
CS Z
Color read quality on the same strand as the reference; encoded in the same way as <QUAL> 4
CQ Z

On a raw SOLiD read, the first nucleotide is the primer base and the first color is the one between the primer base
and the first nucleotide from the sample being sequenced. The primer base and the first color must be present in CS.
I don't ever match the direction of the CS/CQ tags to the reference, since the reverse (not compliment) is not symmetric. The adapter would then be the last base.
nilshomer is offline   Reply With Quote
Old 07-05-2010, 09:19 PM   #7
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 199
Default

Thanks Nils,
I have an example of CS CQ tags

VAB_S1332068_1358_1351 131 1 227 255 25M = 1373 1171 CTAACCCCTAACCCTAACCCTAAAC !A@B?@@@CAC?@?AAC?
??B@AA! RG:Z:TG133 CS:Z:G3230100023010023010023001 CQ:Z:<<<<<<<<<<;:<;* MD:Z:25 OQ:Z:!@@@@@@@@@@@@@@@@@@@@@@@!

So am I correct in saying that CS and CQ are essentially the original csfasta and qual line?
or at least bfast outputs it in this way?

additionally
but if I were to do the same I might have problems as bwa does trimming?
KevinLam is offline   Reply With Quote
Old 07-06-2010, 06:26 PM   #8
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by KevinLam View Post
Thanks Nils,
I have an example of CS CQ tags

VAB_S1332068_1358_1351 131 1 227 255 25M = 1373 1171 CTAACCCCTAACCCTAACCCTAAAC !A@B?@@@CAC?@?AAC?
??B@AA! RG:Z:TG133 CS:Z:G3230100023010023010023001 CQ:Z:<<<<<<<<<<;:<;* MD:Z:25 OQ:Z:!@@@@@@@@@@@@@@@@@@@@@@@!

So am I correct in saying that CS and CQ are essentially the original csfasta and qual line?
or at least bfast outputs it in this way?

additionally
but if I were to do the same I might have problems as bwa does trimming?
They should be the qualities from the csfasta/qual lines. I don't think it has to be matched to the trimming. When it means "original", it means unaltered "original" values IMHO.
nilshomer is offline   Reply With Quote
Old 12-12-2010, 12:55 PM   #9
Todd Scheetz
Junior Member
 
Location: USA

Join Date: May 2010
Posts: 4
Default Solution?

Hi Kevin,

Did you work out a program to integrate the color space data into the SAM files for BWA alignments? If so, could you share? I am running into the same issue, and would like avoid re-implementation if possible.

Thanks,
Todd
Todd Scheetz is offline   Reply With Quote
Old 12-12-2010, 05:21 PM   #10
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 199
Default

Hi Todd,
Unfortunately, I decided to switch mapper for SOLID reads in the end.
I am trying to find the fastq indexer program that I intended to use for this with scripts to post process the bam
but I can only find this http://ivory.idyll.org/blog/mar-10/s...ving-sequences

Hope it helps!
KevinLam is offline   Reply With Quote
Old 12-12-2010, 05:23 PM   #11
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 199
Default

Ah found it!
http://kevin-gattaca.blogspot.com/20...ads-g-sqz.html
KevinLam is offline   Reply With Quote
Old 12-13-2010, 01:57 PM   #12
Todd Scheetz
Junior Member
 
Location: USA

Join Date: May 2010
Posts: 4
Default

Thanks Kevin! I will take a look.

BTW, did you ever try the BWA alignment from within BFAST (bfast+bwa)? That would seem to solve the problem -- assuming it includes the CS and CQ tags. I will be trying that soon.

Todd
Todd Scheetz is offline   Reply With Quote
Old 12-14-2010, 04:58 AM   #13
drio
Senior Member
 
Location: 4117'49"N / 24'42"E

Join Date: Oct 2008
Posts: 323
Default

bfast+bwa only replaces the match step. Postprocess is the same as in traditional bfast. You will have those tags.
__________________
-drd
drio is offline   Reply With Quote
Old 02-21-2011, 02:36 AM   #14
ulz_peter
Senior Member
 
Location: Graz, Austria

Join Date: Feb 2010
Posts: 219
Default

Just in case anyone's still interested:
I wrote a small (rather dirty) python script to integrate those tags. It is slow and relies on the fact that bwa solid2fastq, bwa aln and bwa samse step doesn't change the order of the reads. Anyway it might be of interest to someone...


Code:
#! /usr/bin/python

#ADD CS and CQ tags from original CSfasta and csqual file

import sys

#try getting file names from comand line
try:
  SAMfile = sys.argv[1]
  csfastafile = sys.argv[2]
  qualfile = sys.argv[3]
  outputfile=sys.argv[4]
except: 
  print ("Usage: ./add_CSCQ.py <input SAM> <input csfasta> <input csqual> <output SAM>")
  sys.exit()

#try open files specified in command line
try:
   SAM = open(SAMfile)
   csfasta = open (csfastafile)
   qual = open (qualfile)
   output = open (outputfile, "w")
except:
  print ("Couldn't open SAMfile")
  sys.exit()

#reading the first lines of the three files
SAMline = SAM.readline()
cs = csfasta.readline()

#iterate till no comment
startcs=cs[0:1]
while startcs=='#':
  cs=csfasta.readline()
  startcs=cs[0:1]
  
cq = qual.readline()
startcq=cq[0:1]
while startcq=='#':
  cq=qual.readline()
  startcq=cq[0:1]

count = 0

#iterate through all the files and add CS / CQ tags in the reads
#assuming solid2fastq didn't change the order of the reads
while SAMline:
  info=SAMline.split()
  start=SAMline[0:1]
  alignment=SAMline[:-1]
  
  if (((count % 100000) == 0) and (count != 0)):
    print count, "alignments processed"

#print out header section
  if start == '@':    
    output.write(alignment+'\n')

#print alignment section and add CS and CQ tags
  else:
    #read csfasta file until no comments
    cs=csfasta.readline()
    cq=qual.readline()
    cs="\tCS:Z:"+cs
    cs=cs[:-1]
    
    #encode quals to sanger quals
    cq=cq[:-1]
    intquals=cq.split()
    asciiquals=""
    for quality in intquals:
       quality = int (quality)
       ascii=chr(quality+33)
       asciiquals=asciiquals+ascii
    cq="\tCQ:Z:"+asciiquals
    
    #write alignment to file
    output.write(alignment+cs+cq+'\n')
    cs=csfasta.readline()
    cq=qual.readline()
  SAMline = SAM.readline()
  count = count + 1
  
SAM.close()
csfasta.close()
qual.close()
ulz_peter is offline   Reply With Quote
Old 02-21-2011, 06:06 AM   #15
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 199
Default

Nice! Have u tried it with gatk?
KevinLam is offline   Reply With Quote
Old 02-21-2011, 09:19 PM   #16
ulz_peter
Senior Member
 
Location: Graz, Austria

Join Date: Feb 2010
Posts: 219
Default

Yes. Works nicely with GATK (at least in my case).
ulz_peter is offline   Reply With Quote
Old 07-23-2011, 10:06 PM   #17
aldino
Junior Member
 
Location: California

Join Date: Nov 2009
Posts: 3
Default

Yes, still interested!

Was just about to start doing this myself but I thought I'd check this forum again...actually, would really like to use BFAST for solid but just don't have the computing resources available to me...

anyway, thanks again...
aldino is offline   Reply With Quote
Reply

Tags
bwa, colorspace, samtools, solid, tags

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:24 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO