SEQanswers

Old 04-15-2012, 02:36 AM   #1
fuad193
Member
 
Location: N/A

Join Date: Feb 2012
Posts: 17
Quality of the reads

Hi

I have recently received my first whole-genome data from Complete Genomics. The file format version is 1.5. I have read the documentation and realized that the read qualities are encoded in ASCII-33. How can I convert this quality format to standard FASTQ quality so that it can be used with BWA or Bowtie2?

This is an example of how the quality looks in the TSV file:

93::499'2888521408):;%;*:7*81+3090.577774.6259;'82*=<;%48,7435%77;;&%-

Thanks
Old 04-15-2012, 09:32 AM   #2
thedamian
Member
 
Location: Barcelona

Join Date: Feb 2012
Posts: 49

I think BWA supports both format conventions. The manual at http://bio-bwa.sourceforge.net/bwa.shtml says: "-I The input is in the Illumina 1.3+ read format (quality equals ASCII-64)." If your reads are not in that format, you don't pass "-I".
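
As a rough illustration of the two conventions mentioned above: ASCII-33 and ASCII-64 differ only in the constant added to each Phred score before it is written as a character. The sketch below is not part of BWA or cgatools; `decode_quality` and `reencode_quality` are hypothetical helper names introduced here for illustration. It decodes the first few characters of the quality string from post #1 and shows what the same scores would look like with an offset of 64.

```python
from typing import List


def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """Convert a quality string into integer Phred scores.

    `offset` is 33 for Sanger/standard FASTQ and 64 for Illumina 1.3+.
    """
    scores = [ord(ch) - offset for ch in qual]
    if any(score < 0 for score in scores):
        # A negative score means the assumed offset is too large for this string.
        raise ValueError(f"quality string is not valid for offset {offset}")
    return scores


def reencode_quality(qual: str, from_offset: int = 33, to_offset: int = 64) -> str:
    """Re-encode a quality string from one ASCII offset to another."""
    return "".join(chr(score + to_offset) for score in decode_quality(qual, from_offset))


if __name__ == "__main__":
    sample = "93::499'"  # first eight characters of the example in post #1
    print(decode_quality(sample))    # scores under the ASCII-33 assumption
    print(reencode_quality(sample))  # same scores written with offset 64
```

Standard (Sanger) FASTQ also uses an offset of 33, so a string that is already ASCII-33 should not need re-encoding for tools that expect standard FASTQ; the conversion is only relevant when a tool specifically requires ASCII-64.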
Old 04-15-2012, 09:53 AM   #3
fuad193
Member
 
Location: N/A

Join Date: Feb 2012
Posts: 17

Quote:
Originally Posted by thedamian View Post
I think BWA supports both format conventions. The manual at http://bio-bwa.sourceforge.net/bwa.shtml says: "-I The input is in the Illumina 1.3+ read format (quality equals ASCII-64)." If your reads are not in that format, you don't pass "-I".
OK, but I already have ASCII-33 qualities, so why use -I? I don't know exactly which quality encoding I have. I have worked with FASTQ qualities before, and they look completely different from what I have here.

This is how the quality should look (just an example):

SXXX<NDUETSUBTMW]#\Z
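
When the encoding is unknown, a common heuristic is to look at the lowest character code in the quality strings: characters below ASCII 59 (`;`) cannot occur in the 64-based encodings, so their presence points to an offset of 33. The sketch below is only that heuristic, with a hypothetical function name `guess_phred_offset`; it can be wrong on small samples made up entirely of high-quality bases, so it is best run over many reads.

```python
from typing import Iterable


def guess_phred_offset(qualities: Iterable[str]) -> int:
    """Guess the ASCII offset (33 or 64) from a sample of quality strings.

    Heuristic only: any character below ASCII 59 implies offset 33;
    otherwise offset 64 is assumed.
    """
    codes = [ord(ch) for qual in qualities for ch in qual]
    if not codes:
        raise ValueError("no quality characters to inspect")
    return 33 if min(codes) < 59 else 64


if __name__ == "__main__":
    cg_example = "93::499'2888521408):;%;*:7*81+3090.577774.6259;'82*=<;%48,7435%77;;&%-"
    fastq_example = r"SXXX<NDUETSUBTMW]#\Z"
    print(guess_phred_offset([cg_example]))
    print(guess_phred_offset([fastq_example]))
```

By this heuristic, both strings quoted in this thread come out as offset 33, since each contains characters below `;` (for instance `%` in the first and `#` in the second).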
Old 04-15-2012, 10:01 AM   #4
thedamian
Member
 
Location: Barcelona

Join Date: Feb 2012
Posts: 49

Try http://www.bioinformatics.babraham.a...ojects/fastqc/
Old 04-16-2012, 03:21 AM   #5
gtyrelle
Member
 
Location: the Netherlands

Join Date: Feb 2011
Posts: 16

Could you explain to us why you want to re-align reads that have already been aligned? If you re-align CG reads with BWA or Bowtie2, you will likely produce worse alignments, as these aligners are not aware of the CG read structure (sub-reads).
__________________
Bioinformatics Applications, Europe
Lifetech Inc. http://www.lifetech.com/
Old 04-16-2012, 03:26 AM   #6
fuad193
Member
 
Location: N/A

Join Date: Feb 2012
Posts: 17

Quote:
Originally Posted by gtyrelle View Post
Could you explain to us why you want to re-align reads that have already been aligned? If you re-align CG reads with BWA or Bowtie2, you will likely produce worse alignments, as these aligners are not aware of the CG read structure (sub-reads).
OK, I realized later that I can simply convert the CG reads/mapping data to SAM without going through FASTQ, so you are right.

Basically, I need to call SNPs and indels from the SAM file and annotate the variants using the latest dbSNP and 1000 Genomes (1000g) releases.
Old 04-16-2012, 03:37 AM   #7
gtyrelle
Member
 
Location: the Netherlands

Join Date: Feb 2011
Posts: 16

If you have CG data, then SNPs and indels have already been called. Why do you need to do it again? Put simply, going down this path will result in poor SNP and indel calls. In fact, it is unlikely that you will get past the alignment stage with non-CG-aware tools.

Basically, the CG read structure makes the data incompatible with most third-party tools for alignment and SNP calling.
Old 04-16-2012, 03:43 AM   #8
fuad193
Member
 
Location: N/A

Join Date: Feb 2012
Posts: 17

Quote:
Originally Posted by gtyrelle View Post
If you have CG data, then SNPs and indels have already been called. Why do you need to do it again? Put simply, going down this path will result in poor SNP and indel calls. In fact, it is unlikely that you will get past the alignment stage with non-CG-aware tools.

Basically, the CG read structure makes the data incompatible with most third-party tools for alignment and SNP calling.
So why do they have the map2sam tool in the cgatools package? According to cgatools-methods.pdf, you can convert your CG data to SAM and sort the results using samtools.

If that isn't going to work, what do you suggest for annotating our CG variants with the latest dbSNP and 1000 Genomes (1000g) releases?
Old 04-16-2012, 04:07 AM   #9
gtyrelle
Member
 
Location: the Netherlands

Join Date: Feb 2011
Posts: 16

Quote:
Originally Posted by fuad193 View Post
So why do they have the map2sam tool in the cgatools package?
Why indeed.

So, for clarity, I work for CG, in the App Sci team. Also, if you are already a CG customer, you can get direct help by contacting customer support.

If you want to re-annotate the provided small variant calls, you could convert your masterVar file to VCF and use that as input to snpEff. The VCF conversion tool is on the CG community website. There are numerous other options for annotation, such as ANNOVAR and SeattleSeq.