  • Illumina HiScanSQ not decoding 1 Lane

    Dear all,

    We have run into a strange issue with our last sequencing run on an Illumina HiScanSQ, using a flowcell on which samples were multiplexed.
    Every lane of the flowcell except lane 3 has been decoded and converted to FASTQ. The BCL files and thumbnail images are present on our system, but, surprisingly, CASAVA 1.8.2 (configureBclToFastq.pl) cannot perform the demultiplexing step for that particular lane, so we cannot obtain the corresponding FASTQ files for the two libraries sequenced in that lane.

    Has anybody experienced this issue? If you managed to solve it, could you please give us some advice on how to do so?

  • #2
    The first thing I would check is the SampleSheet.csv file. Are there any samples with barcodes defined for lane 3?
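A quick way to answer that from the command line is sketched below. It assumes the CASAVA 1.8 sample sheet column order (FCID, Lane, SampleID, SampleRef, Index, ...) and a plain comma-separated file without quoted fields; `lane_rows` is a hypothetical helper name.

```shell
# Hypothetical helper: print "SampleID Index" for every sample-sheet row of one lane.
# Assumed CASAVA 1.8 column order:
# FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
lane_rows() {
  local sheet="$1" lane="$2"
  # Skip the header row; keep rows whose Lane column (2) matches,
  # and print SampleID (column 3) and Index (column 5).
  awk -F',' -v lane="$lane" 'NR > 1 && $2 == lane { print $3, $5 }' "$sheet"
}
```

Running `lane_rows SampleSheet.csv 3` should print one line per library in lane 3; empty output means no rows were defined for that lane.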


    • #3
      Yes, the sample sheet is correct: samples and barcodes are specified exactly the same way as for the other lanes. We've tried multiple "solutions" that Illumina tech support offered us, but none of them seemed to work... We are afraid we may lose the data from that lane, which would be a tremendous disaster for us, since our whole study depends on those unique samples...


      • #4
        I would check the overall performance of this lane in the Illumina Sequencing Analysis Viewer (SAV). I suspect the barcode read has errors, which would cause the demultiplexing to fail. You can look directly at the thumbnail images in SAV. We also had similar problems. You could try demultiplexing without the barcode, just to see how many sequences pass the filter. Next, you could also extract to qseq format and inspect the barcode read data by eye to get an idea of what is going on.
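If the data are extracted to qseq, the index read can be tallied rather than read purely by eye. A minimal sketch, assuming the usual tab-separated qseq layout (called sequence in column 9, pass-filter flag in column 11) and that the index read sits in the read-2 files (e.g. `s_3_2_*_qseq.txt`, file names assumed); `top_qseq_indexes` is a hypothetical helper.

```shell
# Hypothetical helper: list the N most frequent index calls in qseq files.
# Assumes tab-separated qseq columns: 9 = sequence, 11 = pass-filter flag (1 = passed).
top_qseq_indexes() {
  local n="$1"; shift
  # Keep pass-filter reads only, print the called sequence,
  # count identical sequences and show the most frequent first.
  awk -F'\t' '$11 == 1 { print $9 }' "$@" | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, `top_qseq_indexes 20 s_3_2_*_qseq.txt` would list the 20 most common index calls; if the expected barcodes do not appear near the top, the index read itself is the likely culprit.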


        • #5
          Thank you for your answers, C.R. and kmcarr.

          I'll try C.R.'s strategy and see what happens. Fingers crossed that some data can be rescued for further processing...


          • #6
            You can also take a look at the "Undetermined" indices file and try to parse the tags there. Since the rest of the lanes demultiplexed fine, there is probably no issue with the tag read per se.

            It is quite possible that the sample(s) carry the wrong tags (life happens), which can easily be determined from the tags that occur most frequently in the "Undetermined" pile of reads.

            If there are no reads in the "Undetermined" file, then you could have a sample failure.
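To check whether the "Undetermined" file is empty, or unexpectedly large compared with the per-sample files, reads can be counted directly. A minimal sketch, assuming gzipped FASTQ with unwrapped 4-line records; `count_reads` is a hypothetical helper.

```shell
# Hypothetical helper: print "<file><TAB><read count>" for each gzipped FASTQ.
# Assumes every record is exactly 4 lines (no wrapped sequence/quality lines).
count_reads() {
  local fq lines
  for fq in "$@"; do
    lines=$(gzip -dc "$fq" | wc -l)
    printf '%s\t%d\n' "$fq" "$(( lines / 4 ))"
  done
}
```

For example, `count_reads lane3_Undetermined_L003_R1_001.fastq.gz` (the exact path depends on your CASAVA output folder).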


            • #7
              After demultiplexing without the barcode (following C.R.'s advice), we have retrieved a FASTQ file. We expect this file to contain the reads of the 2 samples as well as the reads of the PhiX we used as a spike-in.
              Now we need to sort these sequences, separating the two samples based on the index (taking possible index errors into account).
              Any suggestions on how to do this sorting?
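Assuming the index sequence is present at the end of each read header (CASAVA 1.8 style, e.g. `1:N:0:ACAGTG`), the reads could be split with a small mismatch-tolerant matcher. This is a sketch, not CASAVA's own algorithm: each read goes to the sample whose index is within a given Hamming distance, unmatched or ambiguous reads go to `Undetermined.fastq`, and the sample table is a hypothetical tab-separated `barcodes.tsv` (sample name, index). `split_by_index` is likewise a hypothetical helper.

```shell
# Hypothetical helper: split a gzipped CASAVA 1.8-style FASTQ into one file per
# sample by Hamming distance between the header index and each sample index.
# Usage: split_by_index IN.fastq.gz barcodes.tsv [MAXMM=1] [OUTDIR=.]
# barcodes.tsv (hypothetical format): SAMPLE<TAB>INDEX, one per line.
split_by_index() {
  local fq="$1" barcodes="$2" maxmm="${3:-1}" outdir="${4:-.}"
  mkdir -p "$outdir"
  gzip -dc "$fq" | awk -F'\t' -v maxmm="$maxmm" -v outdir="$outdir" '
    # Hamming distance; indexes of different length never match.
    function hamming(a, b,    i, d) {
      if (length(a) != length(b)) return 999
      d = 0
      for (i = 1; i <= length(a); i++)
        if (substr(a, i, 1) != substr(b, i, 1)) d++
      return d
    }
    # First input: the barcode table.
    NR == FNR { code[$1] = $2; next }
    # Header line of each 4-line record: pick the closest sample.
    FNR % 4 == 1 {
      split($0, h, " ")                      # h[2] looks like "1:N:0:INDEX"
      k = split(h[2], f, ":"); idx = (k > 0 ? f[k] : "")
      best = maxmm + 1; hit = ""; ties = 0
      for (s in code) {
        d = hamming(idx, code[s])
        if (d < best) { best = d; hit = s; ties = 0 }
        else if (d == best) ties++
      }
      # No sample close enough, or two samples equally close: undetermined.
      if (hit == "" || ties > 0) hit = "Undetermined"
      out = outdir "/" hit ".fastq"
    }
    # Write all 4 lines of the record to the chosen file.
    { print > out }
  ' "$barcodes" -
}
```

For example, `split_by_index your_file_name.fastq.gz barcodes.tsv 1 lane3_split` would write one FASTQ per sample into `lane3_split/`. Unindexed PhiX reads would normally land in `Undetermined.fastq`, and ties between samples are left undetermined rather than guessed.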


              • #8
                Thank you, GenoMax, but after running the sample sheet with no index, the Undetermined output doesn't give me any clue, although the spike-in FASTQ is there (as it should be)...


                • #9
                  Originally posted by Jluis:
                  After demultiplexing without the barcode (following C.R.'s advice), we have retrieved a FASTQ file. We expect this file to contain the reads of the 2 samples as well as the reads of the PhiX we used as a spike-in.
                  Now we need to sort these sequences, separating the two samples based on the index (taking possible index errors into account).
                  Any suggestions on how to do this sorting?
                  If you have access to a Unix machine, use the code example in post #2 (by kmcarr) in this thread (http://seqanswers.com/forums/showthread.php?t=21598) to check the indexes first.

                  In fact, you may want to modify that code slightly to look at the entire file, like so:

                  Code:
                  zgrep '^@HWI' your_file_name.fastq.gz | cut -d":" -f10 | sort | uniq -c
                  This example assumes your read names start with @HWI. You will need to replace @HWI with the string that matches your instrument's name.
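A variant that does not depend on the instrument prefix is sketched below: instead of grepping for `@HWI`, it treats the first line of every 4-line record as the header and prints the last `:`-separated field of its comment, which in CASAVA 1.8 headers is the index. Working by record position also avoids matching a quality line that happens to start with `@`. It assumes unwrapped 4-line records; `count_fastq_indexes` is a hypothetical helper. Reads with an empty tag are counted on a line that shows no sequence.

```shell
# Hypothetical helper: count index tags in a gzipped CASAVA 1.8-style FASTQ.
# Assumed header layout: @INSTR:RUN:FC:LANE:TILE:X:Y READ:FILTERED:CONTROL:INDEX
count_fastq_indexes() {
  local fq="$1" n="${2:-20}"
  # Header = line 1 of each record; its 2nd whitespace field is the comment.
  # Print the comment's last ":" field, then count tags, most frequent first.
  gzip -dc "$fq" \
    | awk 'NR % 4 == 1 { k = split($2, f, ":"); print (k > 0 ? f[k] : "") }' \
    | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, `count_fastq_indexes your_file_name.fastq.gz 20` would show the 20 most frequent tags.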


                  • #10
                    Originally posted by Jluis:
                    Thank you, GenoMax, but after running the sample sheet with no index, the Undetermined output doesn't give me any clue, although the spike-in FASTQ is there (as it should be)...
                    If you run an analysis with no index info, then there will be no "undetermined" file, since no demultiplexing occurs for that lane.

                    If you still have the files from the old analysis (where you had provided indexes for lane 3), you can look in there.


                    • #11
                      Dear GenoMax,

                      I have no files from the old analysis, and after trying your code on the "undemultiplexed" lane 3 FASTQ file, it seems no index info is present in the reads: the script does not return any indexes like those shown in the example from the post you recommended.

                      I tried it on the first 29 million reads, and it returned no index at all:

                      Code:
                      grep ^@HSCAN Lane3.fastq | head -29000000 | cut -d":" -f10 | sort | uniq -c
                      29000000
                      Instead of giving back index sequences, as it did in the example from the post mentioned above, e.g.:

                      1 CGATAT
                      2 CGATGA
                      1 CGATGG
                      987 CGATGT
                      2 CGCTGT
                      1 CGGTGT
                      6 TGATGT
                      Now I've been asked to separate the lane 3 FASTQ into samples, but since I don't have any index information, I believe there's not much I can do to achieve that goal... Am I right, or is there some alternative I haven't thought of?

                      Thank you again, and please forgive me for bothering you with such odd questions.


                      • #12
                        Something is fishy here. Let me recap (correct me if I am wrong):

                        The only set of files you currently have was generated without providing index info for lane 3?

                        As I said in post #10, if no index info was provided for lane 3, then no file of "undetermined" reads should be produced for lane 3 at all. All sequences should end up in a "sampleID_no_index.fastq.gz" file (or "lane3_no_index_fastq.gz" if you provided no sample ID info for lane 3 in the sample sheet). I am probably getting the name wrong, since I have not run any such analyses lately, but you get the gist.

                        Can you post a few reads (3-4 would be enough) from the file(s) (do you have more than one for lane 3?) so we can check whether your read IDs are very different from what the code example from kmcarr expects? Perhaps HiScan files differ from those of the regular sequencers.

                        How about posting the sizes of the files for lane 3?


                        • #13
                          Hello again,

                          @GenoMax, I'll answer your questions as accurately as I can:

                          -The only set of files you currently have was generated without providing index info for lane 3?

                          Yes, the only set of files for lane 3 was generated without providing any index.

                          -As I said in post #10, if no index info was provided for lane 3, then no file of "undetermined" reads should be produced for lane 3 at all. All sequences should end up in a "sampleID_no_index.fastq.gz" file (or "lane3_no_index_fastq.gz" if you provided no sample ID info for lane 3 in the sample sheet).

                          Right again; undetermined reads would go to "lane3_Undetermined_L003_R1_001.fastq.gz".

                          -Can you post a few reads (3-4 would be enough) from the file(s) (do you have more than one for lane 3?) so we can check whether your read IDs are very different from what the code example from kmcarr expects? Perhaps HiScan files differ from those of the regular sequencers.

                          I only have one file (P60_L003_R1.fastq.gz), which contains the PhiX spike-in and the 2 samples.

                          Its size is ~10 GB, instead of the typical ~3 GB per sample obtained from the other lanes.

                          Here are the first 5 reads from the file:
                          Code:
                          @HSCAN:308:D1F2YACXX:3:1101:1170:2048 1:N:0:
                          CGNAAAGTGTATTTGAGCGTGTTTTTGGTGGTGGGTATGTTTTTTTTTTC
                          +
                          BB#4ADFBFDHHHJJJJJJGHIIJJJJJ?GH?FHIDHHIHIIIJJJHFDC
                          @HSCAN:308:D1F2YACXX:3:1101:1239:2069 1:N:0:
                          GTTATTACAGGTTGTTAAGGAGAGCGAGTGCGAGCGCGAGATCGCGTAAG
                          +
                          CBCFFFFFHHHFHIHIIJJJJIIJIIJJEGHHGGIIIIJJIGIHHF@CCE
                          @HSCAN:308:D1F2YACXX:3:1101:1118:2075 1:N:0:
                          TAGTTATATTATTTTTGGGTATATATTTAAAATATATTTTATTATGTTAT
                          +
                          CCCFFFFFHHHHHJJJJJJFHHIJJJJJIJJJHIIJJJJJJJJJJJIIIJ
                          @HSCAN:308:D1F2YACXX:3:1101:1118:2113 1:N:0:
                          GGCACAAGGAGAGCCTGCGCAGGAATCTGTGCGTCTCAGTCGGGCGGGCC
                          +
                          @?@DFFFF?FFHHJHGIJJJGGGIJHHHGGHIGFGHIGDDDAHGIHDBDD
                          @HSCAN:308:D1F2YACXX:3:1101:1144:2118 1:N:0:
                          TAGGAAGTAAAGGTTAGTGTGATTTCGTATTTAGAAGTTGGTGATTTTTT
                          +
                          BCCFFFFDHHHHHFHIJGIHHIIIJGHFHJJJIIJIJGHIJCGHIIIIJJ
                          I think these are the answers you were asking for. If I forgot something, please tell me and I'll try to answer as quickly as possible.

                          Thanks again


                          • #14
                            Because no index info was provided, this data is treated as non-multiplexed. In this case you will not be able to demultiplex the data, since there is no "tag" info in the read headers (e.g. 1:N:0:TAGINFO).
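To make the point concrete, the fields of a CASAVA 1.8 read header can be split out as below. The field names follow the usual CASAVA 1.8 layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`, where `filtered` is Y for reads that failed the filter); `describe_header` is a hypothetical helper.

```shell
# Hypothetical helper: print the named fields of one CASAVA 1.8-style header.
describe_header() {
  printf '%s\n' "$1" | awk '{
    split(substr($1, 2), a, ":")   # read ID without the leading @
    split($2, b, ":")              # comment: read:filtered:control:index
    printf "instrument=%s run=%s flowcell=%s lane=%s tile=%s x=%s y=%s\n", a[1], a[2], a[3], a[4], a[5], a[6], a[7]
    printf "read=%s filtered=%s control=%s index=%s\n", b[1], b[2], b[3], (b[4] == "" ? "(none)" : b[4])
  }'
}
```

For the first read posted in post #13, `describe_header '@HSCAN:308:D1F2YACXX:3:1101:1170:2048 1:N:0:'` ends with `read=1 filtered=N control=0 index=(none)`: the tag field is empty, so there is nothing to demultiplex on.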

                            You should try rerunning CASAVA (it sounds like you have access to a standalone install?) for lane 3 with the index info and see whether the "undetermined" tags file gets populated. What exactly was the problem when you could not demultiplex this data the first time around?
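One way to rerun only lane 3 is to give CASAVA a sample sheet restricted to that lane. A minimal sketch, assuming the CASAVA 1.8 column order (Lane in column 2) and no quoted commas; `lane_sheet` is a hypothetical helper.

```shell
# Hypothetical helper: keep the header row plus the rows for one lane
# (Lane assumed to be column 2, as in CASAVA 1.8 sample sheets).
lane_sheet() {
  local sheet="$1" lane="$2"
  awk -F',' -v lane="$lane" 'NR == 1 || $2 == lane' "$sheet"
}
```

For example, `lane_sheet SampleSheet.csv 3 > SampleSheet_lane3.csv`, then pass that file to `configureBclToFastq.pl` via `--sample-sheet` (with `--input-dir` and `--output-dir` pointing at your BaseCalls folder and a fresh output folder) and run `make` in the output folder. Check the CASAVA 1.8.2 User Guide for how lanes missing from the sample sheet are handled.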

                            Disclaimer: I am not familiar with the HiScan instrument, so the following speculation could simply be incorrect.

                            If HiScan allows multiplexing (or non-multiplexing) to be specified on a per-lane basis (and this lane was run as non-multiplexed by mistake), then you are out of luck: this sample will need to be re-run as multiplexed.
                            Last edited by GenoMax; 11-13-2012, 06:35 AM.


                            • #15
                              Hello again,

                              I'll try to answer your questions again:

                              -What exactly was the problem when you could not demultiplex this data the first time around?

                              We don't know; it just didn't work. Maybe it is an issue with reading the index tags. We modified the sample sheet in every way Illumina tech support asked us to, but nothing worked (although the thumbnail images do not look worse than the images from the same cycles in the other lanes...).

                              -If HiScan allows multiplexing (or non-multiplexing) to be specified on a per-lane basis (and this lane was run as non-multiplexed by mistake), then you are out of luck: this sample will need to be re-run as multiplexed.

                              We ran all the lanes as multiplexed, including this one... but for some reason no demultiplexing was achieved.

                              -You should try rerunning CASAVA (it sounds like you have access to a standalone install?) for lane 3 with the index info and see whether the "undetermined" tags file gets populated.

                              That's our last hope: that this file sheds some light on the index failure, so we can try to sort the sequences based on some kind of TAGINFO.

                              I'll keep you updated on how this whole thing ends.

                              Thank you again
