
**kmcarr** · 11-08-2012, 07:26 AM

The first thing I would check would be the SampleSheet.csv file. Are there any samples with barcodes defined for lane 3?

**Jluis** · 11-08-2012, 07:31 AM

Yes, the samplesheet is correct; samples and barcodes are specified exactly the same way as they are for the other lanes. We've tried multiple "solutions" that Illumina tech support offered us, but none of them worked. We are afraid we may lose the sequence data from that lane, which would be a tremendous disaster for us since our whole study depends on those unique samples...

**C.R.** · 11-09-2012, 02:47 AM

I would check the overall performance of this lane in the Illumina Sequencing Analysis Viewer (SAV). My guess is that the barcode read has some errors, which cause the demultiplexing to fail. You can look directly at the thumbnail images in the SAV software. We had similar problems. Maybe you can try to demultiplex without the barcode, just to see how many sequences would pass the filter. Next, you could also extract to qseq format and check the barcode sequencing data by eye, just to get an idea of what is going on.
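As a rough aid for the "check by eye" step, here is a minimal Python sketch that tallies the barcode sequences in qseq lines from the index read. It assumes the usual 11 tab-separated qseq columns (sequence in column 9 with '.' for a no-call, filter flag in column 11 with '1' meaning passed); the function name `summarize_qseq_index_read` is made up for this illustration.

```python
from collections import Counter


def summarize_qseq_index_read(lines):
    """Tally barcode sequences from qseq lines of the index read.

    Assumes 11 tab-separated columns per line, with the sequence in
    column 9 ('.' marks a no-call) and the filter flag in column 11
    ('1' = passed filter).
    """
    passed = Counter()
    total = failed = with_no_call = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 11:
            continue  # skip blank or malformed lines
        total += 1
        sequence, filter_flag = fields[8], fields[10]
        if "." in sequence:
            with_no_call += 1
        if filter_flag == "1":
            passed[sequence] += 1
        else:
            failed += 1
    return {
        "total": total,
        "failed_filter": failed,
        "with_no_call": with_no_call,
        "passed_tags": passed,
    }
```

Passing an open file handle for one of the lane 3 index-read qseq files and printing `result["passed_tags"].most_common(20)` should show whether the expected barcodes dominate or whether the read is mostly no-calls.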

**Jluis** · 11-09-2012, 03:11 AM

Thank you for your answers C.R. and kmcarr

I'll try C.R.'s strategy and see what happens. I'll keep my fingers crossed and see if any data can be rescued for further processing...

**GenoMax** · 11-09-2012, 04:38 AM

You can also take a look at the "undetermined" indices file and try to parse the tags there. Since the rest of the lanes have demultiplexed fine, there is probably no issue with the tag read per se.

It is quite possible that you have wrong tags (life happens) on the sample(s) (which can be easily determined by the tags that occur most frequently in the "undetermined" pile of reads).

If there are no reads in the "undetermined" file then you could have a sample failure.
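To make the "wrong tags" check concrete, here is a small Python sketch that takes tag counts from the "undetermined" reads and reports, for the most frequent tags, the closest barcode from the sample sheet and the number of mismatches. The function names and the sample names and barcodes in the usage note are hypothetical; only the comparison logic is the point.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def explain_top_tags(tag_counts, expected, top=10):
    """For the most frequent observed tags, find the closest expected barcode.

    tag_counts: mapping of observed tag -> read count.
    expected:   mapping of sample name -> barcode from the sample sheet.
    Returns a list of (tag, count, closest_sample, mismatches) tuples;
    closest_sample is None when no expected barcode has the tag's length.
    """
    report = []
    for tag, count in Counter(tag_counts).most_common(top):
        candidates = {s: bc for s, bc in expected.items() if len(bc) == len(tag)}
        if not candidates:
            report.append((tag, count, None, None))
            continue
        sample = min(candidates, key=lambda s: hamming(tag, candidates[s]))
        report.append((tag, count, sample, hamming(tag, candidates[sample])))
    return report
```

For example, `explain_top_tags({"CGATGT": 987, "TGATGT": 6}, {"sampleA": "CGATGT"})` pairs both tags with `sampleA`, at 0 and 1 mismatches. A very frequent tag that is several mismatches away from every expected barcode would point to a wrong tag in the sample sheet rather than to sequencing errors.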

**Jluis** · 11-09-2012, 04:57 AM

After demultiplexing without the barcode (following C.R. advice), we have retrieved a Fastq file. In this file we expect to have the reads of the 2 samples and also the reads of the Phi-X we used as spike in.
Now we need to sort these sequences and separate the two samples based on the index (taking into account possible index errors).
Any suggestion on how to do this sorting?

**Jluis** · 11-09-2012, 05:11 AM

Thank you GenoMax, but after running the samplesheet with no index the info from the Undetermined doesn't give me any clue, although we have the spike in fastq there (as it should)...

**GenoMax** · 11-09-2012, 05:16 AM

Originally posted by Jluis:

After demultiplexing without the barcode (following C.R. advice), we have retrieved a Fastq file. In this file we expect to have the reads of the 2 samples and also the reads of the Phi-X we used as spike in.
Now we need to sort these sequences and separate the two samples based on the index (taking into account possible index errors).
Any suggestion on how to do this sorting?

If you have access to a unix machine then use the code example in post #2 (kmcarr) in this thread (http://seqanswers.com/forums/showthread.php?t=21598) to first check the indexes.

In fact you may want to modify that code slightly to look at the entire file like so

Code:

zgrep ^@HWI your_file_name.fastq.gz | cut -d":" -f10 | sort | uniq -c

This example assumes your read names start with @HWI. You will need to replace @HWI with the right string from your machine name.
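If a Unix shell is not at hand, roughly the same tally can be sketched in Python. This version assumes CASAVA 1.8-style headers where the tag is the text after the last colon, and it reads only every fourth line instead of matching on the instrument name, so quality strings that start with "@" are not mistaken for headers. The function names are made up for this illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def count_index_tags(lines):
    """Tally the index tag found at the end of each FASTQ header line.

    `lines` is any iterable of FASTQ text lines. Only every fourth line
    (the header) is inspected.
    """
    counts = Counter()
    for header in islice(lines, 0, None, 4):
        # Header layout assumed: "@name 1:N:0:TAG". The tag is the text
        # after the last colon and is empty if no index was recorded.
        tag = header.strip().rsplit(":", 1)[-1]
        counts[tag] += 1
    return counts


def count_index_tags_in_file(path):
    """Open a plain or gzip-compressed FASTQ file and tally its tags."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_index_tags(handle)
```

Calling `count_index_tags_in_file("your_file_name.fastq.gz").most_common(20)` lists the twenty most frequent tags with their counts.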

**GenoMax** · 11-09-2012, 05:19 AM

Originally posted by Jluis:

Thank you GenoMax, but after running the samplesheet with no index the info from the Undetermined doesn't give me any clue, although we have the spike in fastq there (as it should)...

If you run an analysis with no index info then there will be no "undetermined" file since no demultiplexing occurs for that lane.

If you still have the files from the old analysis (where you had provided indexes for lane 3) you can look in there.

**Jluis** · 11-13-2012, 01:34 AM

Dear GenoMax,

I have no files from the old analysis, and after trying your code on the "undemultiplexed" Lane3 fastq file, it seems no index info is present in the reads: the script's output does not contain any indexes like those shown in the example from the post you recommended.

I tried it on the first 29 million reads...and it yielded no index result:

Code:

grep ^@HSCAN Lane3.fastq | head -29000000 | cut -d":" -f10 | sort | uniq -c

29000000

Instead of giving back the index counts as it did in the example from the mentioned post, e.g.:

1 CGATAT
2 CGATGA
1 CGATGG
987 CGATGT
2 CGCTGT
1 CGGTGT
6 TGATGT

Now I've been asked to separate the Lane3 fastq into samples, but since I don't have any index information, I believe there's not much I can do to achieve that goal...am I right or is there some alternative I didn't figure out to do this?

Thank you again, and please forgive me for bothering you with such weird questions

**GenoMax** · 11-13-2012, 05:24 AM

Something is fishy here. Let me re-cap (correct me if I am wrong):

The only set of files you currently have were analyzed by providing no index info for lane 3?

As I said in post #10 if no index info was provided for lane 3 then there should be no file produced for "undetermined" reads for lane 3 at all. All sequences should end up in "sampleID_no_index.fastq.gz" file (or "lane3_no_index_fastq.gz", if you provided no sample ID info for lane 3 in samplesheet. I am probably getting the name wrong since I have not run any generic files of late but you get the gist).

Can you post a few reads (3-4 would be enough) from the files (do you have more than one for lane 3?) so we can check if your IDs are very different from what the code example from kmcarr expects? Perhaps HiScan files are different from the regular sequencer files.

How about posting the sizes of the files for lane 3?

**Jluis** · 11-13-2012, 05:43 AM

Hello again,

@GenoMax, I'll answer your questions as accurately as I can

-The only set of files you currently have were analyzed by providing no index info for lane 3?

Yes, the only set of files for lane 3 was retrieved providing no index.

-As I said in post #10 if no index info was provided for lane 3 then there should be no file produced for "undetermined" reads for lane 3 at all. All sequences should end up in "sampleID_no_index.fastq.gz" file (or "lane3_no_index_fastq.gz", if you provided no sample ID info for lane 3 in samplesheet).

Right again, undetermined reads should go to "lane3_Undetermined_L003_R1_001.fastq.gz"

-Can you post a few reads (3 -4 would be enough) from the files (do you have more than one for lane 3) so we can check if your ID's are very different than what the code example from kmcarr expects. Perhaps HiScan files are different than the regular sequencer files.

I only got 1 file (P60_L003_R1.fastq.gz), which contains the PhiX spike-in and the 2 samples.

Its size is ~10Gb instead of the typical ~3Gb per sample obtained from the other lanes.

Here are the first 5 reads from the file:

Code:

@HSCAN:308:D1F2YACXX:3:1101:1170:2048 1:N:0:
CGNAAAGTGTATTTGAGCGTGTTTTTGGTGGTGGGTATGTTTTTTTTTTC
+
BB#4ADFBFDHHHJJJJJJGHIIJJJJJ?GH?FHIDHHIHIIIJJJHFDC
@HSCAN:308:D1F2YACXX:3:1101:1239:2069 1:N:0:
GTTATTACAGGTTGTTAAGGAGAGCGAGTGCGAGCGCGAGATCGCGTAAG
+
CBCFFFFFHHHFHIHIIJJJJIIJIIJJEGHHGGIIIIJJIGIHHF@CCE
@HSCAN:308:D1F2YACXX:3:1101:1118:2075 1:N:0:
TAGTTATATTATTTTTGGGTATATATTTAAAATATATTTTATTATGTTAT
+
CCCFFFFFHHHHHJJJJJJFHHIJJJJJIJJJHIIJJJJJJJJJJJIIIJ
@HSCAN:308:D1F2YACXX:3:1101:1118:2113 1:N:0:
GGCACAAGGAGAGCCTGCGCAGGAATCTGTGCGTCTCAGTCGGGCGGGCC
+
@?@DFFFF?FFHHJHGIJJJGGGIJHHHGGHIGFGHIGDDDAHGIHDBDD
@HSCAN:308:D1F2YACXX:3:1101:1144:2118 1:N:0:
TAGGAAGTAAAGGTTAGTGTGATTTCGTATTTAGAAGTTGGTGATTTTTT
+
BCCFFFFDHHHHHFHIJGIHHIIIJGHFHJJJIIJIJGHIJCGHIIIIJJ

I think these are the answers you were asking for. If I forgot something, please tell me and I'll try to answer as swiftly as possible.

Thanks again

**GenoMax** · 11-13-2012, 06:19 AM

By providing no index info this data is treated as "non-multiplex". In this instance you are not going to be able to de-multiplex the data since there is no "tag" info (e.g. 1:N:0:TAGINFO).
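To check this on a whole file rather than on a handful of reads, a short Python sketch can split the comment part of each header and report what fraction of reads carries a non-empty tag. It assumes the CASAVA 1.8-style layout seen in the posted reads ("@name read:is_filtered:control:index"); the function names are hypothetical.

```python
def parse_casava_comment(header):
    """Split the comment part of a CASAVA 1.8-style FASTQ header.

    Expects "@<read name> <read>:<is_filtered>:<control>:<index>".
    Returns a dict; 'index' is an empty string when no tag was written.
    """
    name, _, comment = header.strip().partition(" ")
    fields = comment.split(":")
    if len(fields) != 4:
        raise ValueError("unexpected header layout: %r" % header)
    read, is_filtered, control, index = fields
    return {
        "name": name[1:] if name.startswith("@") else name,
        "read": read,
        "is_filtered": is_filtered,
        "control": control,
        "index": index,
    }


def fraction_with_index(headers):
    """Fraction of header lines whose index field is not empty."""
    total = tagged = 0
    for header in headers:
        total += 1
        if parse_casava_comment(header)["index"]:
            tagged += 1
    return tagged / total if total else 0.0
```

For a header such as `@HSCAN:308:D1F2YACXX:3:1101:1170:2048 1:N:0:` the `index` value is an empty string. That is also why `cut -d":" -f10` followed by `sort | uniq -c` prints a single count with nothing after it.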

You should try to rerun CASAVA (sounds like you have access to a standalone install?) for lane 3 with index info and see if the "undetermined" tags file gets populated. What was the exact problem when you could not "demultiplex" this data the first time around?

Disclaimer: I am not familiar with HiScan instrument so the following speculation could be simply incorrect.

If HiScan allows multiplexing (or non-multiplexing) to be specified on a per lane basis (and if this lane was run as a non-multiplexed sample by mistake) then you are out of luck. This sample will need to be re-run as multiplex.

**Jluis** · 11-13-2012, 06:53 AM

Hello again,

I'll try to answer your questions again:

-What was the exact problem when you could not "demultiplex" this data the first time around?

We don't know; it just didn't work. Maybe it is an issue with reading the index tags. We modified the samplesheet in every single way Illumina tech support asked us to, but nothing worked (although the thumbnail images do not seem to be worse than the images from the same cycles on the other lanes...)

-If HiScan allows multiplexing (or non-multiplexing) to be specified on a per lane basis (and if this lane was run as a non-multiplexed sample by mistake) then you are out of luck. This sample will need to be re-run as multiplex.

We ran all the lanes as multiplexed, including this one, but for some reason no demultiplexing was achieved.

-You should try and rerun casava (sounds like you have access to a standalone install?) for lane 3 with index info and see if the "undetermined" tags file gets populated.

That's our last hope: that this file sheds some light on the index failure, so we can try to sort the sequences based on some kind of TAGINFO.

I'll keep you updated on how this whole thing ends.

Thank you again

Illumina HiScanSQ not decoding 1 Lane