![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
bwa:how to align color space reads | SOLiDance | Bioinformatics | 13 | 01-08-2013 06:27 AM |
BWA align SOLiD PE data gives poor mapping & 0.5% properly paired | alig | Bioinformatics | 3 | 07-08-2011 09:44 AM |
Can BWA align reads to references that are even shorter? | asiangg | Bioinformatics | 4 | 04-06-2011 12:06 PM |
sam output from bwa for SOLiD reads in colorspace? | nisha | SOLiD | 19 | 01-07-2010 05:05 AM |
using BWA to align SOLiD fastq files from 1000 Genomes | tgenahmet | Bioinformatics | 1 | 10-15-2009 06:19 PM |
![]() |
|
Thread Tools |
![]() |
#1 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
|
![]()
Hi,
I am trying to use GATK to run some analysis on BWA aligned colorspace reads but I encountered this problem described here http://getsatisfaction.com/gsa/topic...y_notification basically "org.broadinstitute.sting.utils.StingException: Unable to find color space information in SOLiD read." I am aware that BFAST generates this info. How do I make BWA generate this tag as well?
__________________
http://kevin-gattaca.blogspot.com/ |
![]() |
![]() |
![]() |
#2 | |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#3 | |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
|
![]() Quote:
I think I probably will have to write a post alignment script to add this tag
__________________
http://kevin-gattaca.blogspot.com/ |
|
![]() |
![]() |
![]() |
#4 |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]()
The CSFASTQs I use as input are not "double-encoded" but instead contain the origin color sequence (unadulterated) and color qualities. This allows the CS/CQ tags to filled out easily. BWA "double-encodes" the color sequence and does some trimming too, thus making it impossible to recover the CS/CQ without going back to the original data (CSFASTA and QUAL).
|
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
|
![]()
I see...
So far, I only have these info. so presumably I have to reverse the order of the original CS and CQ if the read is mapped in another direction? the trimming bit might indeed be a problem though even if trying to construct a query db of the original csfasta and qual files isn't computationally intensive. Color read sequence on the same strand as the reference 4 CS Z Color read quality on the same strand as the reference; encoded in the same way as <QUAL> 4 CQ Z On a raw SOLiD read, the first nucleotide is the primer base and the first color is the one between the primer base and the first nucleotide from the sample being sequenced. The primer base and the first color must be present in CS.
__________________
http://kevin-gattaca.blogspot.com/ |
![]() |
![]() |
![]() |
#6 | |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
|
![]()
Thanks Nils,
I have an example of CS CQ tags VAB_S1332068_1358_1351 131 1 227 255 25M = 1373 1171 CTAACCCCTAACCCTAACCCTAAAC !A@B?@@@CAC?@?AAC? ??B@AA! RG:Z:TG133 CS:Z:G3230100023010023010023001 CQ:Z:<<<<<<<<<<;:<;* MD:Z:25 OQ:Z:!@@@@@@@@@@@@@@@@@@@@@@@! So am I correct in saying that CS and CQ are essentially the original csfasta and qual line? or at least bfast outputs it in this way? additionally but if I were to do the same I might have problems as bwa does trimming?
__________________
http://kevin-gattaca.blogspot.com/ |
![]() |
![]() |
![]() |
#8 | |
Nils Homer
Location: Boston, MA, USA Join Date: Nov 2008
Posts: 1,285
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#9 |
Junior Member
Location: USA Join Date: May 2010
Posts: 4
|
![]()
Hi Kevin,
Did you work out a program to integrate the color space data into the SAM files for BWA alignments? If so, could you share? I am running into the same issue, and would like avoid re-implementation if possible. Thanks, Todd |
![]() |
![]() |
![]() |
#10 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
|
![]()
Hi Todd,
Unfortunately, I decided to switch mapper for SOLID reads in the end. I am trying to find the fastq indexer program that I intended to use for this with scripts to post process the bam but I can only find this http://ivory.idyll.org/blog/mar-10/s...ving-sequences Hope it helps!
__________________
http://kevin-gattaca.blogspot.com/ |
![]() |
![]() |
![]() |
#11 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
|
![]()
__________________
http://kevin-gattaca.blogspot.com/ |
![]() |
![]() |
![]() |
#12 |
Junior Member
Location: USA Join Date: May 2010
Posts: 4
|
![]()
Thanks Kevin! I will take a look.
BTW, did you ever try the BWA alignment from within BFAST (bfast+bwa)? That would seem to solve the problem -- assuming it includes the CS and CQ tags. I will be trying that soon. Todd |
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
|
![]()
bfast+bwa only replaces the match step. Postprocess is the same as in traditional bfast. You will have those tags.
__________________
-drd |
![]() |
![]() |
![]() |
#14 |
Senior Member
Location: Graz, Austria Join Date: Feb 2010
Posts: 219
|
![]()
Just in case anyone's still interested:
I wrote a small (rather dirty) python script to integrate those tags. It is slow and relies on the fact that bwa solid2fastq, bwa aln and bwa samse step doesn't change the order of the reads. Anyway it might be of interest to someone... Code:
#! /usr/bin/python #ADD CS and CQ tags from original CSfasta and csqual file import sys #try getting file names from comand line try: SAMfile = sys.argv[1] csfastafile = sys.argv[2] qualfile = sys.argv[3] outputfile=sys.argv[4] except: print ("Usage: ./add_CSCQ.py <input SAM> <input csfasta> <input csqual> <output SAM>") sys.exit() #try open files specified in command line try: SAM = open(SAMfile) csfasta = open (csfastafile) qual = open (qualfile) output = open (outputfile, "w") except: print ("Couldn't open SAMfile") sys.exit() #reading the first lines of the three files SAMline = SAM.readline() cs = csfasta.readline() #iterate till no comment startcs=cs[0:1] while startcs=='#': cs=csfasta.readline() startcs=cs[0:1] cq = qual.readline() startcq=cq[0:1] while startcq=='#': cq=qual.readline() startcq=cq[0:1] count = 0 #iterate through all the files and add CS / CQ tags in the reads #assuming solid2fastq didn't change the order of the reads while SAMline: info=SAMline.split() start=SAMline[0:1] alignment=SAMline[:-1] if (((count % 100000) == 0) and (count != 0)): print count, "alignments processed" #print out header section if start == '@': output.write(alignment+'\n') #print alignment section and add CS and CQ tags else: #read csfasta file until no comments cs=csfasta.readline() cq=qual.readline() cs="\tCS:Z:"+cs cs=cs[:-1] #encode quals to sanger quals cq=cq[:-1] intquals=cq.split() asciiquals="" for quality in intquals: quality = int (quality) ascii=chr(quality+33) asciiquals=asciiquals+ascii cq="\tCQ:Z:"+asciiquals #write alignment to file output.write(alignment+cs+cq+'\n') cs=csfasta.readline() cq=qual.readline() SAMline = SAM.readline() count = count + 1 SAM.close() csfasta.close() qual.close() |
![]() |
![]() |
![]() |
#15 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
|
![]()
Nice! Have u tried it with gatk?
__________________
http://kevin-gattaca.blogspot.com/ |
![]() |
![]() |
![]() |
#16 |
Senior Member
Location: Graz, Austria Join Date: Feb 2010
Posts: 219
|
![]()
Yes. Works nicely with GATK (at least in my case).
|
![]() |
![]() |
![]() |
#17 |
Junior Member
Location: California Join Date: Nov 2009
Posts: 3
|
![]()
Yes, still interested!
Was just about to start doing this myself but I thought I'd check this forum again...actually, would really like to use BFAST for solid but just don't have the computing resources available to me... anyway, thanks again... |
![]() |
![]() |
![]() |
Tags |
bwa, colorspace, samtools, solid, tags |
Thread Tools | |
|
|