  • #16
    Assuming you have the images saved...

    You probably should have just removed the first cycle from image analysis and basecalling by using the command-line arguments. There isn't really a need to change file locations and folder names.



    • #17
      Originally posted by sramshey
      Hello-

      I have a question regarding the use of the script illumina2srf. We recently had a HiSeq run in which the first cycle did not contain any data (clogged fluidics?). Illumina technical support advised us that we could improve the overall quality of our data for the lane in question by removing the first cycle. This involved removing the data folder in <run folder>/Data/Intensities/<lane>/C1.1, renaming the folders for all of the subsequent cycles, editing the config.xml in the Intensities folder to reflect the changes, and then repeating the entire procedure for the control lane as well. Following these steps we were able to generate fastq files, but when we attempt to run illumina2srf to generate our srf files we encounter an error indicating that cycle 1 is missing from our renumbered tiles:

      /house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/../../../Config/FlowCellId.xml: No such file or directory
      Processing sequence files
      /house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/s_3_1_0001_qseq.txt
      /house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/s_3_2_0001_qseq.txt
      Error: Missing cycle 1 for lane 3 tile 1 from CIF files.

      I don't know how illumina2srf knows about cycles - perhaps they are encoded in the cif files? Is there a way that we can (easily) fool illumina2srf and force it to process the lane in a similar way to how we generated our fastqs?

      Thanks in advance!

      Yes, illumina2srf reads the cycle number from the .cif files, so it can't be fooled simply by changing the directory structure. You could try the following Perl script to fix them. Give it the list of .cif files to mangle on the command line.

      Code:
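      #!/usr/bin/perl
      
      use strict;
      use warnings;
      
      foreach my $file (@ARGV) {
          # Open .cif file read-write, in binary mode
          open(my $f, '+<', $file) || die "Couldn't open $file for update: $!\n";
          binmode($f);
          my $data;
          # Read the 13-byte header and check that it really is a CIF file
          my $n = read($f, $data, 13);
          die "Couldn't read $file: $!\n" unless defined $n;
          die "$file: not a CIF file\n" unless $n == 13 && substr($data, 0, 3) eq 'CIF';
          # Cycle number is a little-endian 16-bit value at offset 5
          my $cycle = unpack('v', substr($data, 5, 2));
          # Cycles are numbered from 1, so a cycle-1 file cannot be shifted down
          die "$file: header already says cycle $cycle\n" if $cycle < 2;
          # Subtract 1 from cycle number
          substr($data, 5, 2) = pack('v', $cycle - 1);
          # Write header back out
          seek($f, 0, 0) || die "Couldn't rewind $file: $!\n";
          # Use 'or' rather than '||' so that a failed print is actually caught
          print {$f} $data or die "Couldn't write to $file: $!\n";
          close($f) || die "Error writing to $file: $!\n";
      }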
      An example of what it does:

      Code:
      $ hexdump -C -n 16 s_1_43.cif
      00000000  43 49 46 01 02 19 00 01  00 81 3d 05 00 2a 00 dc  |CIF.......=..*..|
      $ ./cif_fix.pl s_1_43.cif
      $ hexdump -C -n 16 s_1_43.cif
      00000000  43 49 46 01 02 18 00 01  00 81 3d 05 00 2a 00 dc  |CIF.......=..*..|
      Note that this updates the .cif files in place, so I would strongly recommend backing them up before attempting to run it. Also, there's no guarantee that illumina2srf will work even after doing this. It would depend on whether it finds any other inconsistencies in the data.

      If you can live without the intensity data, an easier solution would be to use neither the -b nor the -r option. Illumina2srf will then ignore the .cif files and generate a considerably smaller .srf file.
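Because illumina2srf complains when the cycle it expects is missing, it may help to inspect the headers before and after running the fix. Below is a minimal read-only sketch (the name cif_check.pl and the read_cif_header helper are hypothetical) that decodes the 13-byte header of each .cif file and, when the parent folder is named like C<n>.1, warns if the folder's cycle and the header's cycle disagree. Only the cycle field at offset 5 is confirmed by the script above; the other fields (format version, bytes per value, number of cycles, cluster count) are an assumed layout, although they are consistent with the hexdump example.

```perl
#!/usr/bin/perl
# Hypothetical read-only checker: decode .cif headers and compare the
# header's first-cycle field with the cycle implied by a C<n>.1 folder name.
# ASSUMPTION: header = 'CIF', version (1 byte), bytes per value (1 byte),
# first cycle (uint16 LE), number of cycles (uint16 LE), clusters (uint32 LE).
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Returns a hashref of header fields, or undef if the file is not a CIF file.
sub read_cif_header {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Couldn't open $file: $!\n";
    binmode($fh);
    my $n = read($fh, my $hdr, 13);
    die "Couldn't read $file: $!\n" unless defined $n;
    close($fh);
    return unless $n == 13 && substr($hdr, 0, 3) eq 'CIF';
    my (undef, $version, $size, $cycle, $ncycles, $nclusters) =
        unpack('A3 C C v v V', $hdr);
    return +{ version => $version, size => $size, cycle => $cycle,
              ncycles => $ncycles, nclusters => $nclusters };
}

my $bad = 0;
foreach my $file (@ARGV) {
    my $h = read_cif_header($file);
    unless ($h) { warn "$file: not a CIF file\n"; $bad++; next; }
    printf "%s: version %d, %d bytes/value, first cycle %d, %d cycle(s), %d clusters\n",
        $file, @$h{qw(version size cycle ncycles nclusters)};
    # Compare with the folder name, e.g. <lane>/C1.1/<file>.cif -> cycle 1
    if (basename(dirname($file)) =~ /^C(\d+)\.\d+$/ && $1 != $h->{cycle}) {
        warn "$file: folder says cycle $1 but header says cycle $h->{cycle}\n";
        $bad++;
    }
}
# Non-zero exit status if any file was flagged
exit 1 if $bad;
```

Run it on the renumbered lane, for example `./cif_check.pl <run folder>/Data/Intensities/<lane>/C*.1/*.cif`. Keeping it read-only means it can be run as often as you like, unlike the in-place fix.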
