SEQanswers

08-06-2014, 06:25 AM   #1
Malfet
Junior Member
Location: Berlin, Germany
Join Date: Jan 2013
Posts: 7
How to get barcode fastq with CASAVA

Hi, all,

With the previous version of CASAVA, one first had to convert the bcl files to qseq files and then convert qseq to fastq, which produced separate fastq files: one with reads (two for paired-end runs) and another with the barcodes. The next step was to demultiplex the data using all of these fastq files.

The new version of CASAVA converts bcl files directly into fastq and performs demultiplexing at the same time. Is it possible to extract barcode fastq files? I have not found how to do it in the manual provided for the bcl-to-fastq conversion script.

The thing is that the results of my last sequencing run look quite weird, and most reads cannot be assigned to a specific library, so I would like to look at the barcodes separately and check the sequencing quality.

I will be very grateful for any suggestions.
08-06-2014, 06:35 AM   #2
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,978

You can check on the barcodes (they would not have the Q-score) in the fastq headers of the "undetermined" pool files (I assume your sequences ended up mostly in there). You would want to look in the "Undetermined_indices/Sample_lane*" directories under the "Unaligned" directory. Depending on SE/PE run, there should be one or two files in there.
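
A minimal sketch of how the barcodes in those headers could be tallied, assuming the CASAVA 1.8 header layout in which the index sequence is the last colon-separated field of the header line (for example `@INSTR:1:FLOWCELL:1:1101:1234:2000 1:N:0:ACGTGA`). The function names are hypothetical and the example header is made up for illustration.

```python
import gzip
from collections import Counter


def barcode_from_header(header):
    """Return the index sequence from a CASAVA 1.8 style FASTQ header.

    Assumes the header ends with '<read>:<filtered>:<control>:<index>'.
    """
    return header.rstrip().rsplit(":", 1)[-1]


def count_barcodes(lines):
    """Tally index sequences from an iterable of FASTQ lines."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each 4-line record
            counts[barcode_from_header(line)] += 1
    return counts


def count_barcodes_in_file(path):
    """Tally index sequences in a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_barcodes(handle)
```

Calling `count_barcodes_in_file(path).most_common(20)` on one of the undetermined files lists the most frequent index sequences, which can then be compared by eye with the barcodes in the sample sheet.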
08-06-2014, 06:44 AM   #3
Malfet

Quote:
Originally Posted by GenoMax
You can check on the barcodes (they would not have the Q-score) in the fastq headers of the "undetermined" pool files (I assume your sequences ended up mostly in there)
I see! That's so simple... but could you suggest any way to analyze the sequencing quality?

I can see multiple mismatches by eye, so one option would be to repeat the demultiplexing while allowing 1-3 mismatches, but I would also like to better understand what happened: whether it was a sequencing problem, a problem with the libraries, or something else.
08-06-2014, 06:50 AM   #4
GenoMax

The number of mismatches you can allow probably depends on your barcodes. You can experiment with the "--mismatches n" parameter (or "--mismatches n,n" for 2D barcodes) of the configureBclToFastq.pl command. You can generally recover additional data by allowing 1 mismatch.
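
Before re-running the conversion, one could estimate how many undetermined reads a given mismatch tolerance might recover. The sketch below is an illustration only and is not CASAVA's own matching logic: it assumes all barcodes have the same length, counts mismatches as Hamming distance, and leaves a read unassigned when it matches more than one sample within the limit. The function names and the `expected` mapping (sample name to barcode) are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign(observed, expected, max_mismatches):
    """Return the sample whose barcode is within max_mismatches, else None.

    `expected` maps sample name -> barcode. Reads matching more than one
    barcode within the limit are treated as ambiguous and left unassigned.
    """
    hits = [name for name, bc in expected.items()
            if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def recovered_reads(counts, expected, max_mismatches):
    """Sum read counts of observed barcodes that can be assigned."""
    return sum(n for bc, n in counts.items()
               if assign(bc, expected, max_mismatches) is not None)
```

Here `counts` is a mapping from observed index sequence to read count. Comparing `recovered_reads(counts, expected, n)` for n = 0, 1, 2 shows how much each extra allowed mismatch would add, and whether the barcode set is distinct enough to tolerate it without ambiguity.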

If your sequence looks otherwise good, then one surefire way of running into problems with tag reads is overloading the samples. Beyond what you can recover with the method above, your only option would be to re-run with a lower loading concentration (if you are seeing a lot of N's in the tag reads).

Last edited by GenoMax; 08-06-2014 at 06:58 AM.
08-06-2014, 07:21 AM   #5
Malfet

Quote:
Originally Posted by GenoMax
If your sequence looks otherwise good
Not really. This is a paired-end run. Normally the forward and reverse reads look quite similar in quality. This time, however, the per-base quality of the forward read looks quite different from that of the reverse read, which makes me suspicious. I have attached the graphs for both reads...
Within the barcode sequences I don't see any N's, just wrong letters: G instead of C, C instead of A, and so on, with one or two mismatches in each barcode. Let's see how much I recover with the --mismatches 2 option...
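
To see whether such substitutions cluster at particular index positions, the observed barcodes could be compared with the nearest expected barcode and the mismatches tallied per position. This is a rough sketch under stated assumptions: barcodes of equal length, positions counted from 0, ties between equally close barcodes resolved in favour of the first one listed, and hypothetical function names.

```python
from collections import Counter


def nearest(observed, barcodes):
    """Return the expected barcode with the fewest mismatches to `observed`."""
    return min(barcodes,
               key=lambda bc: sum(a != b for a, b in zip(observed, bc)))


def substitution_profile(counts, barcodes):
    """Tally (position, expected_base, observed_base) over observed barcodes.

    `counts` maps observed barcode -> number of reads; each mismatch is
    weighted by that read count. Positions are 0-based.
    """
    profile = Counter()
    for observed, n in counts.items():
        expected = nearest(observed, barcodes)
        for pos, (exp, obs) in enumerate(zip(expected, observed)):
            if exp != obs:
                profile[(pos, exp, obs)] += n
    return profile
```

If most of the weight falls on one or two positions, that may point to trouble in specific index cycles; mismatches spread evenly across positions would be harder to attribute to a single cycle.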
Attached Images
R1.per_base_quality.png (12.0 KB)
R2.per_base_quality.png (11.6 KB)

Last edited by Malfet; 08-06-2014 at 07:24 AM.
08-06-2014, 07:35 AM   #6
GenoMax

Looks like something was going on with this run/lane for multiple cycles. I am surprised this data was released (unless you had asked for it to be released). Have you inquired to see if this was a lane specific problem or a more general run wide one?
08-06-2014, 07:37 AM   #7
Malfet

Quote:
Originally Posted by GenoMax
Looks like something was going on with this run/lane for multiple cycles. I am surprised this data was released (unless you had asked for it to be released). Have you inquired to see if this was a lane specific problem or a more general run wide one?
Not yet, I will tomorrow. Something definitely went wrong there...

Tags
bcl2fastq, bclconveter, casava 1.8.2
