  • Converting Solexa new format to FASTQ

    Hi,
    1. I received sequences from Illumina in the following single-line format:
    HWI-EAS306:1:1:16:678#0/1:GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN:a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

    Any idea how to convert this to FASTQ for use with MAQ (without losing the quality scores)? As far as I can see, the fq_all2std.pl script cannot handle this format.

    2. Many of the sequences are in the format <seq tag><3' adaptor><AAA...>. I therefore think MAQ fails to remove the 3' adaptor (because it is not at the 3' end of the sequence). Any idea how to overcome this in MAQ or other programs?

    Thanks
    Asaf

  • #2
    You have several options to convert Illumina 1.3+ FASTQ to Sanger FASTQ. All you really need to do is shift the ASCII values of the quality string as they both use PHRED scores.

    Option One - Use an updated MAQ fq_all2std.pl script, there is a patch for Illumina to Sanger, but it isn't included in MAQ yet, see e.g.



    Option Two - Use Biopython 1.51b (or later)

    Option Three - Use the latest BioPerl (not sure if this code is in a public release yet)

    Option Four - Use the latest EMBOSS seqret (but there are a couple of minor issues in version 6.1.0 to watch out for).
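
The ASCII shift mentioned above can be sketched in plain Python, independent of any of the four options. This is a minimal sketch, assuming the input really is Illumina 1.3+ encoded (PHRED + 64) and the target is Sanger (PHRED + 33); the function name `illumina_to_sanger_quality` is hypothetical and not part of MAQ, Biopython, BioPerl or EMBOSS.

```python
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ stores PHRED + 64
SANGER_OFFSET = 33    # Sanger FASTQ stores PHRED + 33


def illumina_to_sanger_quality(quality: str) -> str:
    """Re-encode an Illumina 1.3+ quality string as a Sanger quality string.

    Both encodings hold PHRED scores, so only the ASCII offset changes
    (each character moves down by 64 - 33 = 31).
    """
    converted = []
    for char in quality:
        phred = ord(char) - ILLUMINA_OFFSET
        if phred < 0:
            # A character below '@' cannot be a PHRED score in this encoding.
            raise ValueError(f"{char!r} is not a valid Illumina 1.3+ quality character")
        converted.append(chr(phred + SANGER_OFFSET))
    return "".join(converted)


# First 16 quality characters of the read from the original question.
print(illumina_to_sanger_quality(r"a_\XNW`NKQ]X]UBB"))  # B@=9/8A/,2>9>6##
```

Only the quality line of each FASTQ record should be passed through this function; the title, sequence and "+" lines stay unchanged.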



    • #3
      Originally posted by asafle
      Hi,
      1. I received sequences from Illumina in the following single-line format:
      HWI-EAS306:1:1:16:678#0/1:GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN:a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

      Any idea how to convert this to FASTQ for use with MAQ (without losing the quality scores)? As far as I can see, the fq_all2std.pl script cannot handle this format.
      I didn't read your message quite carefully enough. That looks like a 50bp read, a kind of FASTQ entry forced onto one line. Are there any tabs in there? What was the filename? The extension might be of interest.

      I would guess converted to an Illumina 1.3+ FASTQ file it probably looks like this:

      Code:
      @HWI-EAS306:1:1:16:678#0/1
      GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN
      +
      a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
      Or, as a Sanger standard FASTQ file,

      Code:
      @HWI-EAS306:1:1:16:678#0/1
      GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN
      +
      B@=9/8A/,2>9>6####################################
      Converted to a PHRED QUAL file,

      Code:
      >HWI-EAS306:1:1:16:678#0/1
      33 31 28 24 14 23 32 14 11 17 29 24 29 21 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      If you have some other files with this one, you can probably confirm if this interpretation is correct or not.
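
The interpretation above can be sketched as a small Python script. It assumes each record is a single line of the form name:sequence:quality with no tabs, and that the qualities are Illumina 1.3+ encoded (PHRED + 64), in which case the quality field can never contain a colon. Neither assumption has been confirmed for this file, and the function names are hypothetical.

```python
from typing import List, Tuple

ILLUMINA_OFFSET = 64  # assumption: Illumina 1.3+ encoding (PHRED + 64)


def split_single_line_record(line: str) -> Tuple[str, str, str]:
    """Split 'name:sequence:quality' into its three fields.

    The read name itself contains colons, so the split is done from the
    right and limited to the last two separators.
    """
    record = line.rstrip("\r\n")
    name, sequence, quality = record.rsplit(":", 2)
    if len(sequence) != len(quality):
        raise ValueError(f"sequence/quality length mismatch in record {name!r}")
    return name, sequence, quality


def to_fastq(name: str, sequence: str, quality: str) -> str:
    """Format one record as a four-line FASTQ entry (quality left as-is)."""
    return f"@{name}\n{sequence}\n+\n{quality}\n"


def to_phred_scores(quality: str) -> List[int]:
    """Decode an Illumina 1.3+ quality string into integer PHRED scores."""
    return [ord(char) - ILLUMINA_OFFSET for char in quality]


# The read from the original question, rebuilt as a single line.
line = (
    "HWI-EAS306:1:1:16:678#0/1:"
    + "GGGGCTGTAGCTCAGNTGGTCGTATG" + "N" * 24 + ":"
    + r"a_\XNW`NKQ]X]U" + "B" * 36
)
name, sequence, quality = split_single_line_record(line)
print(to_fastq(name, sequence, quality), end="")
print(to_phred_scores(quality)[:15])
# [33, 31, 28, 24, 14, 23, 32, 14, 11, 17, 29, 24, 29, 21, 2]
```

The length check is a cheap way to test the interpretation on other records: if the sequence and quality fields regularly differ in length, the file is probably not laid out this way.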

      Peter



      • #4
        Originally posted by maubp
        You have several options to convert Illumina 1.3+ FASTQ to Sanger FASTQ. All you really need to do is shift the ASCII values of the quality string as they both use PHRED scores.

        Option One - Use an updated MAQ fq_all2std.pl script, there is a patch for Illumina to Sanger, but it isn't included in MAQ yet, see e.g.



        Option Two - Use Biopython 1.51b (or later)

        Option Three - Use the latest BioPerl (not sure if this code is in a public release yet)

        Option Four - Use the latest EMBOSS seqret (but there are a couple of minor issues in version 6.1.0 to watch out for).
        To add an option, I'd recommend patching MAQ with this patch.
        Hope it's helpful.
