SEQanswers

Old 02-23-2011, 08:02 AM   #41
skruglyak
Member
 
Location: San Diego

Join Date: Sep 2010
Posts: 44
Default

Quote:
Originally Posted by aparna View Post
Hi Seymon,

Are you guys changing anything in the demultiplex process? I'm pretty confused dealing with the directories created during the demultiplex process.
Thx.

Yes, we are changing the demultiplex process. Demultiplexing will happen along with the bcl conversion. The directories will simply be organized by sample name. Within each sample directory, you will have the zipped fastq files for that sample only. The sample names will come from the sample sheet that you provide. An example of the directory structure is on page 6 of the pdf that I attached to the original post.

Thanks,
Semyon
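
For anyone scripting around the layout described above (one directory per sample, each holding only that sample's zipped fastq files), here is a minimal Python sketch for collecting the files per sample. The function name, the sample names and the file names are all made up for illustration; the real names are in the pdf, not in this post, so adjust the glob pattern to whatever your run actually produces.

```python
import gzip
import tempfile
from collections import defaultdict
from pathlib import Path


def fastq_files_by_sample(root):
    """Map each sample directory name under `root` to its sorted *.fastq.gz file names."""
    by_sample = defaultdict(list)
    # Assumption: exactly one directory level per sample, files end in .fastq.gz.
    for path in sorted(Path(root).glob("*/*.fastq.gz")):
        by_sample[path.parent.name].append(path.name)
    return dict(by_sample)


def demo():
    """Build a throwaway tree with hypothetical sample/file names and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        for sample, name in [("sampleA", "reads_R1.fastq.gz"),
                             ("sampleA", "reads_R2.fastq.gz"),
                             ("sampleB", "reads_R1.fastq.gz")]:
            sample_dir = Path(tmp) / sample
            sample_dir.mkdir(exist_ok=True)
            with gzip.open(sample_dir / name, "wt") as handle:
                handle.write("")  # empty placeholder file
        return fastq_files_by_sample(tmp)
```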
Old 02-23-2011, 08:34 AM   #42
aparna
Member
 
Location: USA

Join Date: Feb 2009
Posts: 15
Default

Thx. That's refreshing.
I have another question, about the qseq-mask. Right now I need to specify qseq-mask to get all 7 barcode base pairs reported in the fastq files. Yet if I want to do such an analysis for only one or a few lanes (some lanes use Illumina 6 nt barcodes and other lanes are 7 nt barcode indexed runs), I have to break the pipeline and start the Gerald process manually for those lanes.
So my question is:

Can we configure the qseq-mask for specific lanes during the demultiplex process itself and continue through Gerald without manually starting the process?


Thx
Old 02-24-2011, 09:01 AM   #43
skruglyak
Member
 
Location: San Diego

Join Date: Sep 2010
Posts: 44
Default

Quote:
Originally Posted by aparna View Post
Thx. That's refreshing.
I have another question, about the qseq-mask. Right now I need to specify qseq-mask to get all 7 barcode base pairs reported in the fastq files. Yet if I want to do such an analysis for only one or a few lanes (some lanes use Illumina 6 nt barcodes and other lanes are 7 nt barcode indexed runs), I have to break the pipeline and start the Gerald process manually for those lanes.
So my question is:

Can we configure the qseq-mask for specific lanes during the demultiplex process itself and continue through Gerald without manually starting the process?


Thx

If I am understanding your question correctly, I think that your use case should already work. If you qseq mask 7 bases in all lanes, but specify the index sequences in the sample sheet as appropriate - with 6 bases in some lanes and 7 in others, then the program should do what you want. Have you already tried this?

thanks,
Semyon
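
To illustrate the idea of masking 7 index bases everywhere while listing 6-base indexes for some lanes and 7-base indexes for others, here is a small Python sketch. It is not the demultiplexer's actual logic, only one plausible reading of it: each index read is compared against the per-lane indexes by prefix. The `sample_sheet` dictionary, the sample names and the index sequences are hypothetical.

```python
def assign_sample(index_read, lane, sample_sheet):
    """Return the sample whose index is a prefix of `index_read` in this lane, else None.

    `sample_sheet` is a hypothetical in-memory form: {lane: {sample_name: index}}.
    """
    for sample, index in sample_sheet.get(lane, {}).items():
        # A 6-base index matches the first 6 of the 7 masked bases;
        # a 7-base index must match all 7.
        if index_read.startswith(index):
            return sample
    return None


# Lane 1 uses 6-base indexes, lane 2 uses a 7-base index (made-up sequences).
sheet = {
    1: {"s1": "ACAGTG", "s2": "GCCAAT"},
    2: {"s3": "ACAGTGA"},
}
```

With this sheet, the 7-base index read `ACAGTGA` is assigned to `s1` in lane 1 and to `s3` in lane 2, while `ACAGTGC` in lane 2 is left unassigned.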
Old 02-24-2011, 10:18 AM   #44
aparna
Member
 
Location: USA

Join Date: Feb 2009
Posts: 15
Default

Sorry, I forgot to mention that I am trying to perform alignment of indexed qseq files without actually demultiplexing, by supplying an empty formatted sample sheet (demultiplexing with the pipeline software is utterly confusing), and I depend on the barcode sequences reported in the fastq files for my downstream analysis.

What do you advise for performing demultiplexing and Gerald (making use of make align=YES) for all lanes, no matter what indexes I use?

Right now, without qseq-mask, I get 6 nt barcodes reported in the fastq read description lines.
But with the qseq-mask option (Y# I7 or 8 y#), I need to perform the analysis in different batches to kick off demultiplex+Gerald at once.

BTW, is Illumina planning to release more than 12 barcodes in the near future?

Thx.
Old 02-24-2011, 12:50 PM   #45
skruglyak
Member
 
Location: San Diego

Join Date: Sep 2010
Posts: 44
Default

Quote:
Originally Posted by aparna View Post
Sorry, I forgot to mention that I am trying to perform alignment of indexed qseq files without actually demultiplexing, by supplying an empty formatted sample sheet (demultiplexing with the pipeline software is utterly confusing), and I depend on the barcode sequences reported in the fastq files for my downstream analysis.

What do you advise for performing demultiplexing and Gerald (making use of make align=YES) for all lanes, no matter what indexes I use?

Right now, without qseq-mask, I get 6 nt barcodes reported in the fastq read description lines.
But with the qseq-mask option (Y# I7 or 8 y#), I need to perform the analysis in different batches to kick off demultiplex+Gerald at once.

BTW, is Illumina planning to release more than 12 barcodes in the near future?

Thx.

Well, the 1.8 version should clear up the confusing directory structure you are currently facing. I don't know how best to process the data with the existing version if you don't want to use the demultiplexing script. I would be happy to put you in touch with tech support if you would like.
As for more barcodes, this is under development, but I am not the right person to comment on release timelines.

thanks,
Semyon
Old 03-27-2011, 09:10 AM   #46
MrRight
Junior Member
 
Location: Italy-Milan

Join Date: Mar 2011
Posts: 2
Default

Would it be possible to specify the error/mismatch value for the index during the bcl demultiplexing?
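
For reference, this is what a mismatch threshold on the index means in practice. The Python sketch below is not how CASAVA implements it; it only shows index matching by Hamming distance with a user-set limit. The function names, the `max_mismatches` parameter and the barcode sequences are assumptions made for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def match_index(index_read, barcodes, max_mismatches=1):
    """Return the only barcode name within `max_mismatches` of the read, else None.

    `barcodes` maps a sample name to its index sequence. Reads that match
    no barcode, or more than one, are left unassigned.
    """
    hits = [
        name
        for name, seq in barcodes.items()
        if hamming(index_read[:len(seq)], seq) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


barcodes = {"s1": "ATCACG", "s2": "CGATGT"}  # made-up example indexes
```

Requiring a unique hit matters: if two barcodes differ at only a few positions, a generous threshold would otherwise assign the same read to either sample.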
Old 03-28-2011, 03:27 AM   #47
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

As this already worked in 1.7, why should Illumina remove this "feature"?
Old 04-21-2011, 07:52 AM   #48
visivas
Junior Member
 
Location: Boston, USA

Join Date: May 2010
Posts: 7
Default

It seems that a lot of things have changed with v1.8. I wish Illumina would release at least a user guide or an early version of the software. We can discuss all year long and still not get the complete picture of the new version from the release notes alone. Many centers like ours have wrappers around this software for automation.
Old 05-11-2011, 12:24 AM   #49
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Hi,
V1.8 has some extra fields:
<is filtered> is Y if the read is filtered, N otherwise.
<control number> is 0 when none of the control bits are on, otherwise it is an even number.
Does anyone know what these are for?
Is is_filtered reminiscent of the QSEQ quality flag, and if so, does 'Y' mean high or low quality?

Colin
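
For anyone who wants to pull these fields out programmatically, here is a Python sketch of a header parser. It assumes the 1.8 read header has the layout `@instrument:run:flowcell:lane:tile:x:y read:is_filtered:control_number:index`, with a single space before the read number; the class and function names are made up for the example. A header that does not fit this layout returns `None`.

```python
import re
from typing import NamedTuple, Optional


class Casava18Header(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    is_filtered: bool     # True when the flag is 'Y'
    control_number: int
    index: str            # may be empty


# Seven colon-separated fields, one space, then four more fields.
_HEADER_RE = re.compile(
    r"^@([^:\s]+):(\d+):([^:\s]+):(\d+):(\d+):(\d+):(\d+)"
    r" ([12]):([YN]):(\d+):(\S*)$"
)


def parse_header(line: str) -> Optional[Casava18Header]:
    """Parse one FASTQ header line; return None if it is not in the assumed layout."""
    match = _HEADER_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    g = match.groups()
    return Casava18Header(
        g[0], int(g[1]), g[2], int(g[3]), int(g[4]), int(g[5]), int(g[6]),
        int(g[7]), g[8] == "Y", int(g[9]), g[10],
    )
```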
Old 05-11-2011, 11:48 AM   #50
caddymob
Member
 
Location: USA

Join Date: Apr 2009
Posts: 36
Default

@sparks -- this is just like the 0/1 (fail/pass) in the last field of the old qseq files. The problem, to my knowledge, is that all reads are output to the fastq.gz. Y means failed QC. Seems backwards, I know...

Illumina should have a flag in the configureBclToFastq.pl script to either a) exclude non-passing filter reads or b) write them into a different fastq.gz. Otherwise you have to unzip and do this filtering via your own scripting, and this is just a waste of time...

One other thing I'll say to Illumina about the format is that the pass/fail and barcode string in the read header are delimited by a space. Spaces are bad! Shame! Lots of aligners will discard everything after the space.
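
Until such a flag exists, the filtering can at least be done without unzipping to disk. Below is a Python sketch that streams a fastq.gz and keeps only reads whose filter flag is not 'Y'. It assumes four-line records and that the flag is the second colon-separated field after the first space in the header; the function names and file paths are made up.

```python
import gzip


def is_filtered(header):
    """True when the description after the first space has 'Y' as its second field."""
    parts = header.rstrip().split(" ", 1)
    if len(parts) < 2:
        return False  # no description part, so no flag to read
    fields = parts[1].split(":")
    return len(fields) >= 2 and fields[1] == "Y"


def read_records(handle):
    """Yield FASTQ records as lists of four lines (assumes no wrapped lines)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return
        yield record


def drop_filtered(records):
    """Yield only the records whose header is not flagged 'Y'."""
    for record in records:
        if not is_filtered(record[0]):
            yield record


def filter_fastq_gz(src, dst):
    """Copy passing reads from one gzipped FASTQ to another; return how many were kept."""
    kept = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for record in drop_filtered(read_records(fin)):
            fout.writelines(record)
            kept += 1
    return kept
```

Usage would be something like `filter_fastq_gz("in.fastq.gz", "passed.fastq.gz")`. For paired-end data, filter read 1 and read 2 with the same rule so the two files stay in sync.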
Old 05-11-2011, 02:03 PM   #51
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542
Default

Quote:
Originally Posted by caddymob View Post
One other thing I'll say to Illumina about the format is that the pass/fail and barcode string in the read header are delimited by a space. Spaces are bad! Shame! Lots of aligners will discard everything after the space.

On a similar point, I'd already posted earlier on this thread that I thought removing the forward/reverse suffix (i.e. /1 or /2 at the end of the read name) and sticking this in the read description (after the space) was a bad idea.
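
As a workaround for tools that drop everything after the space, the read number can be moved back into the name before alignment. A Python sketch follows, assuming the description after the space starts with the read number (1 or 2) followed by a colon; the function name is made up. Note that it discards the filter flag and barcode, so filter on those first if you need them.

```python
def to_suffix_style(header):
    """Rewrite '@name 1:N:0:ACGT' as '@name/1'.

    Headers without a space, or whose description does not start with
    read number 1 or 2, are returned unchanged (minus the trailing newline).
    """
    stripped = header.rstrip("\n")
    name, sep, description = stripped.partition(" ")
    if not sep:
        return stripped
    read_number = description.split(":", 1)[0]
    if read_number not in ("1", "2"):
        return stripped
    return f"{name}/{read_number}"
```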
Old 05-11-2011, 02:04 PM   #52
caddymob
Member
 
Location: USA

Join Date: Apr 2009
Posts: 36
Default

Quote:
Originally Posted by maubp View Post
On a similar point, I'd already posted earlier on this thread that I thought removing the forward/reverse suffix (i.e. /1 or /2 at the end of the read name) and sticking this in the read description (after the space) was a bad idea.

I missed that, but yes, very good point!
Old 05-11-2011, 04:42 PM   #53
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Quote:
Originally Posted by caddymob View Post
@sparks -- this is just like the 0/1 (fail/pass) in the last field of the old qseq files. The problem, to my knowledge, is that all reads are output to the fastq.gz. Y means failed QC. Seems backwards, I know...

Illumina should have a flag in the configureBclToFastq.pl script to either a) exclude non-passing filter reads or b) write them into a different fastq.gz. Otherwise you have to unzip and do this filtering via your own scripting, and this is just a waste of time...

One other thing I'll say to Illumina about the format is that the pass/fail and barcode string in the read header are delimited by a space. Spaces are bad! Shame! Lots of aligners will discard everything after the space.

Thanks for the update, I'll add a function in novoalign to filter the failed reads.

With regard to the barcode sequence, it appears Illumina will have already demuxed the reads, so all reads in a file should have the same barcode. Is this correct, or could we get a file with mixed index tags?

Colin
Old 05-11-2011, 07:31 PM   #54
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Does anyone have a few V1.8 fastq records they could share for testing? I'd like to identify a file as V1.8 from its header and parse the is_filtered field. I can fake some records for testing, but real records would be better.

Thanks, Colin
Old 05-12-2011, 10:30 AM   #55
caddymob
Member
 
Location: USA

Join Date: Apr 2009
Posts: 36
Default

A couple of test CASAVA 1.8 fastqs with 400 reads each for read 1 and read 2 are attached, with no QC filtering applied. Hope this helps!
Attached Files
File Type: gz test.R1.fastq.gz (21.4 KB, 29 views)
File Type: gz test.R2.fastq.gz (19.0 KB, 22 views)
Old 05-12-2011, 11:32 PM   #56
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Quote:
Originally Posted by caddymob View Post
A couple of test CASAVA 1.8 fastqs with 400 reads each for read 1 and read 2 are attached, with no QC filtering applied. Hope this helps!

Hi Caddymob,

Thanks for that. The reads went through perfectly, though not many aligned against hg36; I guess they are not human.

Novoalign now recognises the 1.8 format and has options to skip, use or QC the is_filtered='Y' reads.

Cheers, Colin
Old 05-13-2011, 06:50 PM   #57
caddymob
Member
 
Location: USA

Join Date: Apr 2009
Posts: 36
Default

Quote:
Originally Posted by sparks View Post

Thanks for that. The reads went through perfectly, though not many aligned against hg36; I guess they are not human.

Correct, they're rat RNA-seq. Glad they worked anyway.
Old 06-28-2011, 07:48 AM   #58
SeqAnswerSeeker
Junior Member
 
Location: Heidelberg, Germany

Join Date: Apr 2010
Posts: 3
Default FASTQ quality score above 40

With the new CASAVA version, base quality scores now include 41 (=J in ASCII)?

@HWI-ST750:72:B0812ABXX:5:1101:5504:2021 1:N:0:
TTGCAGGGTAGGTATAAGAGTTCTTAAAGAAAAGGAAATAGGACAACAATAAGAAGATAAGAAAAATCATTTGGACTTAAATTAGTTACATTGCTAAAGTTTCTC
+
BCCFFFFFCFHHCGHJJJIJHHIJJGJJJIJJJJJJDCGIIJJJJJJJJJJJJGHIJJJJJIJJJJIIJJIHHHHHHFFFFFFFEEEEEEEEDDDDDDDDDEEDD

Just wondering, since so far in our raw read data Phred scores ranged from 0 to 40 only.
Or is there an additional meaning behind the "J" base qual, like it was used for the stretch of "B"s at end of reads?

Thanks,
Natalie
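
A quick way to check the score range in your own files is to decode the quality line yourself. The Python sketch below assumes the Phred+33 (Sanger) offset, which is consistent with "J" being 41 as stated above; the function name is made up.

```python
def phred33_scores(quality):
    """Decode a FASTQ quality string, assuming an ASCII offset of 33."""
    return [ord(char) - 33 for char in quality]


# 'B' -> 33, 'I' -> 40, 'J' -> 41 under the assumed offset.
example = phred33_scores("BIJ")
```

Running `max(phred33_scores(line))` over the quality lines of a file shows whether Q41 is present.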
Old 06-28-2011, 07:51 AM   #59
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by SeqAnswerSeeker View Post
With the new CASAVA version, base quality scores now include 41 (=J in ASCII)?

@HWI-ST750:72:B0812ABXX:5:1101:5504:2021 1:N:0:
TTGCAGGGTAGGTATAAGAGTTCTTAAAGAAAAGGAAATAGGACAACAATAAGAAGATAAGAAAAATCATTTGGACTTAAATTAGTTACATTGCTAAAGTTTCTC
+
BCCFFFFFCFHHCGHJJJIJHHIJJGJJJIJJJJJJDCGIIJJJJJJJJJJJJGHIJJJJJIJJJJIIJJIHHHHHHFFFFFFFEEEEEEEEDDDDDDDDDEEDD

Just wondering, since so far in our raw read data Phred scores ranged from 0 to 40 only.
Or is there an additional meaning behind the "J" base qual, like it was used for the stretch of "B"s at end of reads?

Thanks,
Natalie

See this: http://seqanswers.com/forums/showthread.php?t=12339
Old 06-28-2011, 08:45 AM   #60
skruglyak
Member
 
Location: San Diego

Join Date: Sep 2010
Posts: 44
Default

Quote:
Originally Posted by SeqAnswerSeeker View Post
With the new CASAVA version, base quality scores now include 41 (=J in ASCII)?

@HWI-ST750:72:B0812ABXX:5:1101:5504:2021 1:N:0:
TTGCAGGGTAGGTATAAGAGTTCTTAAAGAAAAGGAAATAGGACAACAATAAGAAGATAAGAAAAATCATTTGGACTTAAATTAGTTACATTGCTAAAGTTTCTC
+
BCCFFFFFCFHHCGHJJJIJHHIJJGJJJIJJJJJJDCGIIJJJJJJJJJJJJGHIJJJJJIJJJJIIJJIHHHHHHFFFFFFFEEEEEEEEDDDDDDDDDEEDD

Just wondering, since so far in our raw read data Phred scores ranged from 0 to 40 only.
Or is there an additional meaning behind the "J" base qual, like it was used for the stretch of "B"s at end of reads?

Thanks,
Natalie

Hi Natalie,

There have been some improvements to the chemistry and a refinement of the quality model. As a result, we are now starting to see Q41. There is no additional meaning behind the "J".

Thanks,

Semyon

Tags
casava, illumina, secondary analysis
