SEQanswers

04-11-2012, 05:37 AM   #1
biofreak
Member
Location: USA
Join Date: Jun 2011
Posts: 44
noIndex in SampleSheet casava1.8

I simply want to convert .bcl files to FASTQ.
I am having trouble with SampleSheet.csv for CASAVA 1.8. My samples do not have an index, so I left the "Index" field of the SampleSheet empty (as suggested in the manual), but it keeps throwing an error:
"ERROR: Conflicting sample sheet definitions for lane lane 1. Sample sheet line: 5. Existing:NoIndex New: NoIndex."

Can someone please help?
04-11-2012, 06:03 AM   #2
kmcarr
Senior Member
Location: USA, Midwest
Join Date: May 2008
Posts: 1,153

Well, it looks like you have two entries for lane #1; if you aren't using indexes you can't do this. But if you really want help, you need to upload or paste a copy of the SampleSheet.csv file so we can look at it.
04-11-2012, 06:06 AM   #3
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,800

Is this a MiSeq or a HiSeq dataset?

It is perfectly OK to have mixed samples (multiplexed and non-multiplexed) on a flowcell. Attach a copy of the SampleSheet file (or PM it to me); redact the sample names if they mean something.
04-11-2012, 06:26 AM   #4
biofreak

Thanks all. Here is the SampleSheet:
FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
HHPDCAZX,1,H-1,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-2,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-3,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-4,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-5,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
04-11-2012, 06:30 AM   #5
biofreak

Actually, my samples do have indexes, but they are not Illumina indexes. I want to convert the files to FASTQ and then use a barcode splitter or something similar.
04-11-2012, 06:37 AM   #6
kmcarr

Quote:
Originally Posted by biofreak View Post
Thanks all. Here is the SampleSheet:
FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
HHPDCAZX,1,H-1,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-2,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-3,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-4,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-5,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
biofreak,

If you do not list indexes to distinguish the various samples in lane 1 then CASAVA will throw an error; after all, how could CASAVA know which sample (H-1, H-2, etc.) a read came from without an index?

Since you don't want to perform demultiplexing with CASAVA what you really need it to do is output a single FASTQ file containing all of your samples mixed together. To do this your SampleSheet.csv file should include a SINGLE line for lane #1, e.g.:

Code:
HHPDCAZX,1,H-x,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
Feed the resulting FASTQ file to whatever demultiplexer you like and configure that software as needed to identify your samples.
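The rule described above can be checked before running CASAVA. The following is a minimal Python sketch, not CASAVA's own logic: it assumes the ten-column layout posted in this thread (Lane in column 2, Index in column 5) and reports every lane/index combination that appears on more than one row. The function name and the `SHEET` constant are illustrative.

```python
import csv
import io
from collections import Counter


def conflicting_lane_index_pairs(sheet_text):
    """Return sorted (lane, index) pairs that appear on more than one row."""
    counts = Counter()
    for row in csv.reader(io.StringIO(sheet_text)):
        # Skip blank or short lines and the header row.
        if len(row) < 5 or row[0] == "FCID":
            continue
        lane, index = row[1].strip(), row[4].strip()
        # An empty Index field is labelled "NoIndex", as in the error message.
        counts[(lane, index or "NoIndex")] += 1
    return sorted(pair for pair, n in counts.items() if n > 1)


# The sample sheet posted earlier in the thread: five rows for lane 1, no index.
SHEET = """FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
HHPDCAZX,1,H-1,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-2,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-3,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-4,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
HHPDCAZX,1,H-5,Rat,,Hu,Control,HiSeq_R_90,Rob,Hu
"""

print(conflicting_lane_index_pairs(SHEET))  # [('1', 'NoIndex')]
```

With a single row for lane 1, as in the line suggested above, the function returns an empty list.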
04-11-2012, 06:42 AM   #7
GenoMax

Quote:
Originally Posted by biofreak View Post
Actually, my samples do have indexes, but they are not Illumina indexes. I want to convert the files to FASTQ and then use a barcode splitter or something similar.
Use a single entry for that lane (with some name in sampleID field) in the SampleSheet file as kmcarr suggested to avoid the error you are getting. You can then parse the resulting sequence files to sort the reads based on your tags.
04-11-2012, 06:44 AM   #8
biofreak

Oh, thank you so much. I'll make the changes to my CSV file for all the lanes.
I tried using CASAVA to demultiplex the barcodes that I have (NuGen) using the --use-bases-mask option. It does not complain and creates FASTQ files. However, after alignment, the number of reads in the SAM files is very low (in the range of 1,000 or so), so something is wrong. Do you know if CASAVA 1.8 can handle non-Illumina indexes?
Once again, thanks a lot.
04-11-2012, 06:54 AM   #9
kmcarr

Quote:
Originally Posted by biofreak View Post
Oh, thank you so much. I'll make the changes to my CSV file for all the lanes.
I tried using CASAVA to demultiplex the barcodes that I have (NuGen) using the --use-bases-mask option. It does not complain and creates FASTQ files. However, after alignment, the number of reads in the SAM files is very low (in the range of 1,000 or so), so something is wrong. Do you know if CASAVA 1.8 can handle non-Illumina indexes?
Once again, thanks a lot.
Yes, CASAVA can handle non-Illumina barcodes just fine. If you are using a dual indexing protocol then you need to use the most recent version of CASAVA (I don't recall if dual index support was added in 1.8.1 or 1.8.2 and I'm too lazy to check). You just have to make sure that the format provided with "--use-bases-mask" properly reflects the library construction. I'm not familiar with the NuGen library prep protocol so I can't give you any specifics.

I'm curious as to why you think the demultiplexing led to problems with alignment. What fraction of your reads were demultiplexed by CASAVA, and was the distribution into pools as expected? Is there some adapter remnant left over from the NuGen prep that needs to be removed? (It may be possible to deal with that using "--use-bases-mask" as well.)

[ETA: Dual index support was added in v1.8.2 of CASAVA.]

Last edited by kmcarr; 04-11-2012 at 07:38 AM. Reason: Update version info for dual indexing.
04-11-2012, 07:02 AM   #10
biofreak

I am sorry for my lack of knowledge, but I changed the CSV file and CASAVA now runs without outputting any FASTQ files.
The sample sheet now looks like this:

HHPDCAZX,1,H-X,Rat,,NoIndex1,Control,HiSeq_R_90,Rob,NoIndex1
HHPDCAZX,2,N-X,Rat,,NoIndex2,Control,HiSeq_R_90,Rob,NoIndex2
HHPDCAZX,3,Y-X,Rat,,NoIndex3,Control,HiSeq_R_90,Rob,NoIndex3
HHPDCAZX,4,M-X,Rat,,NoIndex4,Control,HiSeq_R_90,Rob,NoIndex4
HHPDCAZX,5,P-X,Rat,,NoIndex5,Control,HiSeq_R_90,Rob,NoIndex5
HHPDCAZX,6,P-X,Rat,,NoIndex6,Control,HiSeq_R_90,Rob,NoIndex6

thanks a lot for helping
04-11-2012, 07:14 AM   #11
kmcarr

Quote:
Originally Posted by biofreak View Post
I am sorry for my lack of knowledge, but I changed the CSV file and CASAVA now runs without outputting any FASTQ files.
The sample sheet now looks like this:

HHPDCAZX,1,H-X,Rat,,NoIndex1,Control,HiSeq_R_90,Rob,NoIndex1
HHPDCAZX,2,N-X,Rat,,NoIndex2,Control,HiSeq_R_90,Rob,NoIndex2
HHPDCAZX,3,Y-X,Rat,,NoIndex3,Control,HiSeq_R_90,Rob,NoIndex3
HHPDCAZX,4,M-X,Rat,,NoIndex4,Control,HiSeq_R_90,Rob,NoIndex4
HHPDCAZX,5,P-X,Rat,,NoIndex5,Control,HiSeq_R_90,Rob,NoIndex5
HHPDCAZX,6,P-X,Rat,,NoIndex6,Control,HiSeq_R_90,Rob,NoIndex6

thanks a lot for helping
Could you please provide some more information on where the procedure is failing? Does configureBclToFastq.pl run successfully with your new SampleSheet.csv? Does it create a working directory (the default is a directory named Unaligned in the main run directory)? Does it fail when you try to run make in that directory?
04-11-2012, 07:24 AM   #12
biofreak

I specified a separate output directory, which now contains 6 folders, one for each of the 6 lanes. Each folder contains just the CSV file for the respective sample but no zipped FASTQ files. My command is:

nohup ~/home/CASAVA_v1.8.0/bin/configureBclToFastq.pl --force --input-dir \
~/home/HU_02_HPDCAZX/Data/Intensities/BaseCalls \
--output-dir /home/HU_02_HPDCAZX/Data/NoIndexUnaligned --sample-sheet ~/home/HU_02_HPDCAZX/Data/Intensities/BaseCalls/SampleSheet_NoIndex.csv > nohup_first.out
04-11-2012, 07:35 AM   #13
kmcarr

Quote:
Originally Posted by biofreak View Post
I specified a separate output directory, which now contains 6 folders, one for each of the 6 lanes. Each folder contains just the CSV file for the respective sample but no zipped FASTQ files. My command is:

nohup ~/home/CASAVA_v1.8.0/bin/configureBclToFastq.pl --force --input-dir \
~/home/HU_02_HPDCAZX/Data/Intensities/BaseCalls \
--output-dir /home/HU_02_HPDCAZX/Data/NoIndexUnaligned --sample-sheet ~/home/HU_02_HPDCAZX/Data/Intensities/BaseCalls/SampleSheet_NoIndex.csv > nohup_first.out
You have only completed the first step of the BCL->FASTQ conversion, namely running configureBclToFastq.pl. This script merely checks that all of the input files are present, creates the necessary output locations and configures demultiplexing (if indexes are being used, not in your case obviously). [As an aside you do not need to run nohup for this command as it runs very quickly and you want to see the output in your terminal to see if any errors are reported.]

Once you have successfully run configureBclToFastq.pl you 'cd' to the output directory it created (/home/HU_02_HPDCAZX/Data/NoIndexUnaligned in your example) and run 'make'. This is where the BCL to FASTQ conversion (and demultiplexing if configured) happens. [This is also where you would use nohup.] So, assuming you are now in the /home/HU_02_HPDCAZX/Data/NoIndexUnaligned directory:

Code:
#> /usr/bin/nohup make -j n &
where 'n' is the number of jobs you would like make to run in parallel (hint: don't use more than the number of CPU cores in your computer).
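The two steps described above can also be driven from a script. This is a rough Python sketch under stated assumptions: it uses only the options already shown in this thread (`--input-dir`, `--output-dir`, `--sample-sheet`, and `make -j`), the paths in the example are hypothetical placeholders, and `build_commands` and `run_conversion` are illustrative names, not part of CASAVA.

```python
import os
import shlex
import subprocess


def build_commands(configure_script, input_dir, output_dir, sample_sheet, jobs):
    """Return (configure_argv, make_argv) for the two-step BCL to FASTQ conversion."""
    configure_argv = [
        configure_script,
        "--input-dir", input_dir,
        "--output-dir", output_dir,
        "--sample-sheet", sample_sheet,
    ]
    # Follow the hint above: do not ask make for more jobs than CPU cores.
    jobs = max(1, min(jobs, os.cpu_count() or 1))
    make_argv = ["make", "-j", str(jobs)]
    return configure_argv, make_argv


def run_conversion(configure_argv, make_argv, output_dir):
    """Step 1 configures the run; step 2 runs make inside the output directory."""
    subprocess.run(configure_argv, check=True)
    subprocess.run(make_argv, cwd=output_dir, check=True)


# Hypothetical paths, for illustration only.
output_dir = "/path/to/run/Data/NoIndexUnaligned"
configure_argv, make_argv = build_commands(
    "/path/to/CASAVA/bin/configureBclToFastq.pl",
    "/path/to/run/Data/Intensities/BaseCalls",
    output_dir,
    "/path/to/run/Data/Intensities/BaseCalls/SampleSheet_NoIndex.csv",
    jobs=8,
)
print(shlex.join(configure_argv))
print(f"cd {shlex.quote(output_dir)} && {shlex.join(make_argv)}")
```

The sketch only prints the commands; calling `run_conversion` would execute them, and `check=True` stops the script if the configuration step reports an error, so make is never started on a half-configured directory.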
04-11-2012, 08:33 AM   #14
biofreak

oops! My bad
thanks
04-16-2012, 06:30 AM   #15
sklages
Senior Member
Location: Berlin, DE
Join Date: May 2008
Posts: 621

Quote:
Originally Posted by biofreak View Post
oops! My bad
thanks
And, did it work?

Btw, 'Control' in your samplesheet should be either 'Y' or 'N'.
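A small Python sketch of that check, assuming the column order posted in this thread (Control is the seventh field) and taking the 'Y'/'N' rule from the note above rather than from the CASAVA documentation. The function name is illustrative, and the second data row of `SHEET` is a hypothetical corrected line added for contrast.

```python
import csv
import io

CONTROL_COLUMN = 6  # FCID,Lane,SampleID,SampleRef,Index,Description,Control,...


def invalid_control_rows(sheet_text):
    """Return (line_number, value) for rows whose Control field is not 'Y' or 'N'."""
    bad = []
    for line_no, row in enumerate(csv.reader(io.StringIO(sheet_text)), start=1):
        # Skip blank or short lines and the header row.
        if len(row) <= CONTROL_COLUMN or row[0] == "FCID":
            continue
        value = row[CONTROL_COLUMN].strip()
        if value not in ("Y", "N"):
            bad.append((line_no, value))
    return bad


SHEET = """FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
HHPDCAZX,1,H-X,Rat,,NoIndex1,Control,HiSeq_R_90,Rob,NoIndex1
HHPDCAZX,2,N-X,Rat,,NoIndex2,N,HiSeq_R_90,Rob,NoIndex2
"""

print(invalid_control_rows(SHEET))  # [(2, 'Control')]
```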
04-17-2012, 07:45 AM   #16
biofreak

Yes, it did work. I then split the FASTQ files per barcode using the fastx barcode splitter. However, it still did not solve my problem of too few reads being aligned after running TopHat. Also, the FASTQ files I obtained from fastx and CASAVA were totally different!
04-17-2012, 07:50 AM   #17
sklages

Quote:
Originally Posted by biofreak View Post
.. Also, the FASTQ files I obtained from fastx and CASAVA were totally different!
How did you run casava? What is your input for fastx barcode splitter and how did you start it? And, what is "different"? What did you expect?

Sven
04-17-2012, 08:02 AM   #18
biofreak

Well, my barcodes are not Illumina barcodes but NuGen ones. I ran CASAVA normally with the added --use-bases-mask parameter. It did not complain and generated FASTQ files. When I ran TopHat on these files, it could not align most of the reads. The final read count of the SAM files was in the thousands, or even less in some cases.
I then generated one FASTQ file per lane through CASAVA, ignoring the barcodes, and used the barcode splitter to split each FASTQ file by barcode.
For any given sample, the FASTQ file generated this way did not match the one generated by CASAVA (in terms of both the number of lines and the contents).
Also, the TopHat alignment does a better job than with the previous files, but the line counts of the SAM files are still not in the millions. I am not sure of my results at this point.
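One way to put numbers on "did not match" is to compare read counts and read identifiers between the two FASTQ files for a sample. The sketch below is a minimal Python example that assumes standard four-line FASTQ records and treats the first whitespace-separated token of each header line as the read identifier; the function names and the demo records are illustrative.

```python
import gzip
import os
import tempfile


def _open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def count_fastq_reads(path):
    """Number of reads, assuming exactly four lines per record."""
    with _open_text(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def read_ids(path):
    """Set of read identifiers (first token of every header line)."""
    with _open_text(path) as handle:
        return {line.split()[0] for i, line in enumerate(handle) if i % 4 == 0}


# Tiny demo file with two made-up records.
records = "@r1 1:N:0:\nACGT\n+\nIIII\n@r2 1:N:0:\nTTAG\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.fastq")
    with open(path, "w") as out:
        out.write(records)
    print(count_fastq_reads(path))   # 2
    print(sorted(read_ids(path)))    # ['@r1', '@r2']
```

For two real files, `read_ids(a) - read_ids(b)` lists the reads assigned to the sample by one method but not the other, which is usually more informative than comparing raw line counts.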
04-17-2012, 08:11 AM   #19
biofreak

I ran the barcode splitter as follows:
cat combined.fastq | fastx_barcode_splitter.pl --bcfile ../barcode1.txt --bol --mismatches 1 --prefix "lane1" --suffix ".fastq"

It creates separate FASTQ files, but the barcodes are retained in the reads. So I first removed them (the first 4 bases) using:
fastx_trimmer -i fastqfile -o trim_fastqfile -f 5 -l 50 -Q 33

Then I ran TopHat on the trimmed FASTQ files.
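To make the two steps above concrete, here is a simplified Python sketch of matching a barcode at the beginning of a read with at most one mismatch, then keeping bases 5 through 50. It is not the FASTX-Toolkit implementation: partial matches are not handled, ties are simply left unassigned, and the barcodes and sample names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(seq, barcodes, max_mismatches=1):
    """Return the sample whose barcode best matches the start of seq, or None.

    None means no barcode is within max_mismatches, or the best match is tied.
    """
    best_name, best_dist, tie = None, max_mismatches + 1, False
    for name, barcode in barcodes.items():
        if len(seq) < len(barcode):
            continue
        dist = hamming(seq[:len(barcode)], barcode)
        if dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist and dist <= max_mismatches:
            tie = True
    return None if tie else best_name


def trim(seq, qual, first=5, last=50):
    """Keep bases first..last (1-based, inclusive), like -f 5 -l 50 above."""
    return seq[first - 1:last], qual[first - 1:last]


BARCODES = {"sample_A": "ACGT", "sample_B": "TTAG"}  # hypothetical 4-base barcodes
read_seq = "ACGAGGTTCCAA"   # starts with ACGA: one mismatch from ACGT
read_qual = "IIIIHHHHGGGG"

print(assign_barcode(read_seq, BARCODES))  # sample_A
print(trim(read_seq, read_qual))           # ('GGTTCCAA', 'HHHHGGGG')
```

With barcodes only 4 bases long, allowing one mismatch is unambiguous only if every pair of barcodes differs in at least 3 positions, which may be worth checking against the barcode file before trusting the split.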
All times are GMT -8.