SEQanswers





05-06-2010, 10:37 PM   #1
kasutubh
Member
 
Location: US

Join Date: Mar 2010
Posts: 25
Conversion of colourspace into basespace format.

Hello Everyone,

Sorry if this is a re-post, but is there any way to convert the data in SOLiD .bam files into basespace format? We are trying to use the IMAGE algorithm (http://genomebiology.com/2010/11/4/R41), which needs the files to be in fastq format.

Any help is hugely appreciated!

Thanks in advance,

Kaustubh Gokhale.

Last edited by kasutubh; 05-07-2010 at 12:37 AM.
05-07-2010, 01:33 AM   #2
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

What program did you use to generate the BAM file? The SEQ/QUAL fields should be in basespace, with the original colors/color-qualities optionally in the CS/CQ tags.
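To illustrate the point above, here is a minimal Python sketch (the function name `parse_sam_record` and the example record are made up for illustration) that splits one line of `samtools view` output into the basespace SEQ/QUAL columns and the optional tags, where CS/CQ hold the colour-space read and colour qualities if the aligner kept them. It assumes a well-formed alignment line with at least 11 tab-separated columns.

```python
from typing import Dict


def parse_sam_record(line: str) -> Dict[str, object]:
    """Split one tab-separated SAM alignment line into the fields of interest.

    SEQ and QUAL are columns 10 and 11 (basespace). Optional tags such as
    CS:Z:... and CQ:Z:... follow from column 12 onwards, in any order.
    """
    fields = line.rstrip("\n").split("\t")
    tags: Dict[str, str] = {}
    for field in fields[11:]:
        # Each optional field is TAG:TYPE:VALUE; split at most twice so that
        # values that themselves contain ':' (e.g. quality strings) stay intact.
        name, _type, value = field.split(":", 2)
        tags[name] = value
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "seq": fields[9],    # basespace read; reverse-complemented if flag 0x10 is set
        "qual": fields[10],  # basespace qualities as Phred+33 ASCII
        "tags": tags,        # e.g. CS (primer base + colours) and CQ (colour QVs)
    }


line = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tCS:Z:T3131\tCQ:Z:5555"
rec = parse_sam_record(line)
print(rec["seq"], rec["qual"])               # ACGT IIII
print(rec["tags"]["CS"], rec["tags"]["CQ"])  # T3131 5555
```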
05-07-2010, 01:39 AM   #3
kasutubh
Member
 
Location: US

Join Date: Mar 2010
Posts: 25

These files were sent to me by the ABI guys. We had asked them to align the sequences to a reference, and they sent these files as the output. What is the raw data format of SOLiD? I need files in fastq format.
05-07-2010, 09:40 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

The 'raw data' format from SOLiD is color-space reads in files that look like FastA files. But often the core center people will do more processing in order to map the reads to the reference, call SNPs, analyse transcriptomes, etc. All of these subsequent steps generate different types of files (FastA-like, GFF, SAM, etc.). No FastQ, though.
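If only the raw colour-space files are available, one option is to merge each read with its qualities into a colour-space FastQ. The sketch below assumes the usual SOLiD pair of files: a `.csfasta` file (`#` comment lines, then `>name` followed by the primer base and colours) and a `.qual` file with space-separated integer QVs in the same read order. Treating negative QVs as missing colours and clamping them to 0 is an assumption, as are the function names. The quality string has one character per colour, so it is one shorter than the sequence line, which still begins with the primer base.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta_like(lines: Iterable[str], sep: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, body) pairs from FASTA-like lines, skipping '#' comment lines."""
    name: Optional[str] = None
    body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, sep.join(body)
            name, body = line[1:], []
        else:
            body.append(line)
    if name is not None:
        yield name, sep.join(body)


def csfasta_qual_to_fastq(csfasta: Iterable[str], qual: Iterable[str]) -> Iterator[str]:
    """Merge .csfasta and .qual lines into colour-space FastQ records."""
    # zip() pairs reads by position and stops at the end of the shorter file.
    pairs = zip(read_fasta_like(csfasta, ""), read_fasta_like(qual, " "))
    for (name, colours), (qname, qvs) in pairs:
        if name != qname:
            raise ValueError(f"read order differs: {name} vs {qname}")
        # Clamp negative QVs (assumed to mark missing colours) to 0, and cap at 93,
        # the highest value that still maps to a printable Phred+33 character.
        scores = [min(max(int(q), 0), 93) for q in qvs.split()]
        qstring = "".join(chr(q + 33) for q in scores)
        yield f"@{name}\n{colours}\n+\n{qstring}\n"


csfasta = ["# Title: example\n", ">1_2_3_F3\n", "T3131\n"]
qual = [">1_2_3_F3\n", "20 25 -1 30\n"]
print("".join(csfasta_qual_to_fastq(csfasta, qual)), end="")
```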
07-01-2010, 06:16 AM   #5
ambarrio
Junior Member
 
Location: Uppsala

Join Date: Oct 2009
Posts: 1

I think you could use this software (if you don't have the time to develop your own), because it seems to do the task you are looking for. However, I haven't found where to download it (and I am interested as well); maybe you have to email them personally.

http://genome.sph.umich.edu/wiki/Bam2FastQ
07-02-2010, 05:11 AM   #6
Brugger
Member
 
Location: Cambridge, UK

Join Date: Mar 2010
Posts: 21

First, you have to make sure that the unmapped reads exist in your BAM file; if they do, they will be in colour space, as written above.

Second, a raw colourspace to basespace transformation will push the colour-space technology considerably. I recently did such a raw transformation on 10,000 reads that I know map to the reference genome in colourspace. When I redid the alignment using BLAT, only about 30% gave substantial hits.
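A raw colourspace to basespace transformation of this kind can be sketched as follows, assuming the standard SOLiD di-base code: with A=0, C=1, G=2, T=3, the colour for two adjacent bases is the XOR of their indices, so each base is the previous base XOR the next colour, starting from the primer base. The example also shows one plausible contributor to poor BLAT results: a single wrong colour changes every base after it. The function name is hypothetical.

```python
BASES = "ACGT"  # index of each base in the di-base code: A=0, C=1, G=2, T=3


def decode_colourspace(cs: str) -> str:
    """Naively decode a colour-space read (primer base + colours) into bases.

    The primer base itself is not part of the read, so it is not returned.
    Decoding stops at the first missing or invalid colour (e.g. '.').
    """
    prev = BASES.index(cs[0].upper())
    bases = []
    for colour in cs[1:]:
        if colour not in "0123":
            break  # every later base depends on this colour, so stop here
        prev ^= int(colour)  # next base = previous base XOR colour
        bases.append(BASES[prev])
    return "".join(bases)


print(decode_colourspace("T3131"))  # ACGT
# One colour error (second colour 1 read as 2) corrupts all downstream bases:
print(decode_colourspace("T3231"))  # AGCA
```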

Third, as the qualities are per colour and not per base, you have to transform the colour qualities into base qualities; this is especially important if you clip off low-quality bases. As I understand it, to get a base QV you should add the two colour QVs surrounding a base.
The data was from a SOLiD 3 run, so if your run was done on SOLiD 4 you should get better results.
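The rule of thumb from the third point (add the two colour QVs surrounding a base) could be sketched like this. Colour i spans bases i-1 and i (base -1 being the primer base), so base i is surrounded by colours i and i+1; the last base has only one neighbouring colour, and this sketch simply uses that QV. Because adding two Phred values corresponds to multiplying their error probabilities, the result is a rough heuristic rather than a calibrated base quality. The cap of 93 and the function name are assumptions.

```python
from typing import List


def colour_to_base_qvs(colour_qvs: List[int], cap: int = 93) -> List[int]:
    """Heuristic base QVs: add the QVs of the two colours surrounding each base.

    colour_qvs[i] belongs to the colour between base i-1 and base i, so base i
    is covered by colours i and i+1. The last base only has colour i.
    Results are capped at `cap` so they stay printable as Phred+33.
    """
    n = len(colour_qvs)
    base_qvs = []
    for i in range(n):
        total = colour_qvs[i] + (colour_qvs[i + 1] if i + 1 < n else 0)
        base_qvs.append(min(total, cap))
    return base_qvs


print(colour_to_base_qvs([20, 25, 5, 30]))  # [45, 30, 35, 30]
```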

I am sure that there are other things you should consider before spending considerable time on this.

I know that Curtain uses a similar approach for gap closing and, as I understand it, that works in colourspace without any great hacks.
07-02-2010, 05:48 AM   #7
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

The abstract of IMAGE describes it as "a practical approach that uses *Illumina* sequences". So no, it does not work with SOLiD, unless they have updated the software since publication. The base sequence is derived after the alignment, but for unmapped reads you do not have base sequences.
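Since unmapped reads have no derived base sequence, it is worth checking how many records in the BAM file are unmapped before going further; on the command line, `samtools view -f 4 file.bam` prints only records with the unmapped flag (0x4) set. The Python sketch below does the same split on `samtools view` text output and also reports whether each unmapped read carries a CS colour-space tag. The function name and example records are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_by_mapping(sam_lines: Iterable[str]) -> Tuple[List[str], List[Tuple[str, bool]]]:
    """Return (mapped read names, [(unmapped read name, has CS tag), ...])."""
    mapped: List[str] = []
    unmapped: List[Tuple[str, bool]] = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines if `samtools view -h` was used
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:  # FLAG bit 0x4: segment unmapped
            has_cs = any(f.startswith("CS:Z:") for f in fields[11:])
            unmapped.append((fields[0], has_cs))
        else:
            mapped.append(fields[0])
    return mapped, unmapped


sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tCS:Z:T3131",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\tCS:Z:T0123",
]
print(split_by_mapping(sam))  # (['r1'], [('r2', True)])
```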
07-09-2010, 01:50 PM   #8
snetmcom
Senior Member
 
Location: USA

Join Date: Oct 2008
Posts: 157

I am 95% sure this person's BAM file is in basespace already. None of the AB tools output BAM until after mapping.

I have never converted BAM to fastq, but I imagine there is something in samtools.
07-10-2010, 04:47 AM   #9
Brugger
Member
 
Location: Cambridge, UK

Join Date: Mar 2010
Posts: 21

The whole idea of IMAGE is that it uses read pairs where only one read is mapped, facing towards a gap. The other ends are then assembled and, if possible, joined to the end of the contig that their mates map to. As these reads are unmapped, they will only exist in colourspace and *not* in basespace.
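Picking out the reads described above (unmapped reads whose mate is mapped) can be sketched with the SAM FLAG bits 0x1 (paired), 0x4 (read unmapped) and 0x8 (mate unmapped). Since such reads only have their colour-space sequence, the sketch returns the CS tag together with the mate's reference and position (RNEXT/PNEXT). This is a hypothetical helper for illustration, not part of IMAGE or Curtain.

```python
from typing import Iterable, Iterator, Optional, Tuple


def anchored_unmapped_mates(
    sam_lines: Iterable[str],
) -> Iterator[Tuple[str, str, int, Optional[str]]]:
    """Yield (qname, mate_ref, mate_pos, CS colours) for unmapped reads with a mapped mate."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        paired, unmapped, mate_unmapped = flag & 0x1, flag & 0x4, flag & 0x8
        if not (paired and unmapped and not mate_unmapped):
            continue
        tags = {}
        for field in f[11:]:
            name, _type, value = field.split(":", 2)  # TAG:TYPE:VALUE
            tags[name] = value
        mate_ref = f[2] if f[6] == "=" else f[6]  # RNEXT '=' means same as RNAME
        yield f[0], mate_ref, int(f[7]), tags.get("CS")


sam = [
    "p1\t73\tchr1\t500\t60\t4M\t=\t500\t0\tACGT\tIIII\tCS:Z:T3131",  # mapped, mate unmapped
    "p1\t133\tchr1\t500\t0\t*\t=\t500\t0\t*\t*\tCS:Z:G0120",        # unmapped, mate mapped
    "p2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*\tCS:Z:T0000",                 # both unmapped
]
print(list(anchored_unmapped_mates(sam)))  # [('p1', 'chr1', 500, 'G0120')]
```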

As Heng correctly points out, the abstract states that IMAGE is for *Illumina* reads, which means that it will not work with unmapped colourspace reads. Spending time getting IMAGE to do this task is like using pliers to remove a screw; what you really want is a tool (a screwdriver) developed for the task at hand.

As I wrote above, Curtain should support colourspace assembly, as it can use Velvet, which I know for certain supports colourspace assemblies.
11-03-2010, 05:07 AM   #10
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101
BAM to fastq color space

Quote:
Originally Posted by ambarrio
I think you could use this software (if you don't have the time to develop your own), because it seems to do the task you are looking for. However, I haven't found where to download it (and I am interested as well); maybe you have to email them personally.

http://genome.sph.umich.edu/wiki/Bam2FastQ
Have you found it by now? I need a tool that makes a fastq file (BFAST style) of the original color-space sequences and quality scores from a BAM file (CS and CQ tags). It's not clear whether the Bam2FastQ in the link does that or just uses the nucleotide-space sequences and their qualities.
11-03-2010, 07:01 AM   #11
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323

What about this.

Code:
$ samtools view my.bam | ./bam2fastq.rb > my.fastq
__________________
-drd
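The `bam2fastq.rb` script itself is not shown here. As a rough stand-in, a minimal Python version that reads `samtools view` output and yields basespace FastQ records from SEQ/QUAL might look like this. It assumes QUAL is present (not `*`), skips secondary (0x100) and supplementary (0x800) records, and undoes the reverse complement SAM applies to reads aligned to the reverse strand (flag 0x10). The function name is hypothetical.

```python
from typing import Iterable, Iterator

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield basespace FastQ records built from the SEQ and QUAL columns."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        flag, seq, qual = int(f[1]), f[9], f[10]
        if flag & 0x900:
            continue  # secondary/supplementary: the primary record carries the read
        if flag & 0x10:
            # Reverse-strand alignment: SEQ is stored reverse-complemented and
            # QUAL reversed, so flip both back to the original read orientation.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{f[0]}\n{seq}\n+\n{qual}\n"


print("".join(sam_to_fastq(["r1\t16\tchr1\t1\t60\t4M\t*\t0\t0\tAACG\tABCD"])), end="")
```

To use it as a filter, one could save it as `sam_to_fastq.py` (hypothetical name), append `import sys` and `sys.stdout.writelines(sam_to_fastq(sys.stdin))`, and run `samtools view my.bam | python sam_to_fastq.py > my.fastq`.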
11-03-2010, 08:06 AM   #12
Brugger
Member
 
Location: Cambridge, UK

Join Date: Mar 2010
Posts: 21

Code:
samtools view bfast.bam  | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\Z)/>$1\n$2\n$3/'
Should do it I think...
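For comparison, here is a minimal Python sketch of the same idea. It builds each record from the CS and CQ tags regardless of tag order, writes an `@` header and the `+` separator, and skips records without both tags. According to the SAM specification, CS and CQ are stored on the original strand of the read, so no reverse complementing is needed; skipping secondary/supplementary records (0x900) is an added assumption to avoid duplicate reads. The function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator


def sam_to_colourspace_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield colour-space FastQ records (CS as sequence, CQ as qualities)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x900:
            continue  # secondary/supplementary alignments would duplicate reads
        tags: Dict[str, str] = {}
        for field in fields[11:]:
            name, _type, value = field.split(":", 2)  # TAG:TYPE:VALUE
            tags[name] = value
        if "CS" in tags and "CQ" in tags:
            yield f"@{fields[0]}\n{tags['CS']}\n+\n{tags['CQ']}\n"


line = "r1\t16\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tCQ:Z:5:!?\tCS:Z:T3131"
print("".join(sam_to_colourspace_fastq([line])), end="")
```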
11-04-2010, 08:19 AM   #13
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

Almost perfect. It's just missing the "+" line, and a FastQ header should start with "@" rather than ">"; here is a corrected version in case someone is interested:
Code:
samtools view bfast.bam | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\Z)/\@$1\n$2\n+\n$3/'
However, I was looking for a tool that can reconstruct the paired-end fastq files. I guess I'll write one myself that operates on name-sorted BAM files. Thanks for the neat Perl trick; this should help do the job.
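A tool along those lines might look like the sketch below. It assumes the BAM file was name-sorted (for example with `samtools sort -n`) and converted to text with `samtools view`, that both mates share the same QNAME, and that FLAG bits 0x40/0x80 mark the first and second read of each pair. Consecutive records are grouped with `itertools.groupby`, each mate becomes a colour-space FastQ record built from its CS/CQ tags, and pairs with a missing mate are skipped. The function name and example records are hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple


def paired_colourspace_fastq(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (first-read, second-read) colour-space FastQ records per read pair.

    Expects name-sorted `samtools view` output, so both mates of a pair are adjacent.
    """
    records = (line.rstrip("\n").split("\t") for line in sam_lines if not line.startswith("@"))
    # Keep primary records only (no 0x100 secondary / 0x800 supplementary), one per mate.
    primary = (f for f in records if not (int(f[1]) & 0x900))
    for qname, group in groupby(primary, key=lambda f: f[0]):
        mates: Dict[int, str] = {}
        for f in group:
            flag = int(f[1])
            tags = {}
            for field in f[11:]:
                name, _type, value = field.split(":", 2)  # TAG:TYPE:VALUE
                tags[name] = value
            # 0x40 = first read of the pair, 0x80 = second read (0 = neither).
            mate = 1 if flag & 0x40 else (2 if flag & 0x80 else 0)
            if mate and "CS" in tags and "CQ" in tags:
                mates[mate] = f"@{qname}\n{tags['CS']}\n+\n{tags['CQ']}\n"
        if 1 in mates and 2 in mates:  # skip pairs with a missing mate
            yield mates[1], mates[2]


sam = [
    "q1\t65\tchr1\t1\t60\t4M\t=\t50\t0\tACGT\tIIII\tCS:Z:T3131\tCQ:Z:5555",
    "q1\t129\tchr1\t50\t60\t4M\t=\t1\t0\tACGT\tIIII\tCS:Z:G0000\tCQ:Z:????",
    "q2\t65\tchr1\t9\t60\t4M\t=\t0\t0\tACGT\tIIII\tCS:Z:T0000\tCQ:Z:5555",
]
for first, second in paired_colourspace_fastq(sam):
    print(first + second, end="")
```

Writing `first` and `second` to two separate files (e.g. hypothetical `reads_1.fastq` and `reads_2.fastq`) would then give the paired-end FastQ pair.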
All times are GMT -8.