  • Convert illumina v1.5 fastq to sanger fastq

    Hi everybody!

    I am very new to next-generation sequencing. I downloaded BWA and SAMtools to analyse data from an Illumina GA II. I saw that BWA needs its input reads in .fastq format, but my data are in qseq.txt format.
    I saw that .txt and .fastq can be the same thing, but that there are several .fastq variants. I read that BWA needs sanger-fastq, and I think I have illumina v1.5-fastq.
    Do you know of any software that converts illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this?
    Thanks!

  • #2
    See https://www.seqanswers.com/node/4344 for a short Perl script that converts .qseq.txt to a sanger-fastq file. The quality value conversion is actually done by this line:
    Code:
    $q_line =~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;
    It's fairly easy to convert this into a quick-and-dirty perl script that will do the same thing for a fastq file:

    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
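    my $count = 0;
    while (<>) {
        chomp;
        # Every fourth line of a record is the quality string: shift
        # Phred+64 characters (0x40 and up) down to Phred+33 (0x21 and up);
        # anything below 0x40 is mapped to '!' (0x21).
        if ($count++ % 4 == 3) { tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/; }
        print "$_\n";
    }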
    N.B.: The script above assumes that the sequence and quality values in the fastq file are on single lines. This is not necessarily true, but you can usually get away with it for short read data. You should check the output carefully, to make sure that it is doing what you want. It should be fairly obvious if it gets out of synchronization, or if you run it on a sanger-fastq file by mistake.
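The caveats in the N.B. can also be turned into explicit checks. The following is a minimal Python sketch, not the script linked above: it assumes strict 4-line records and Phred+64 input, and the function names are made up for illustration. Unlike the Perl one-liner, which silently maps out-of-range characters to '!', it stops with an error when a record looks out of sync or when a quality character could not have come from a Phred+64 file.

```python
ILLUMINA_OFFSET = 64  # Illumina 1.3+/1.5 FASTQ: Phred + 64
SANGER_OFFSET = 33    # Sanger FASTQ: Phred + 33


def illumina_to_sanger(qual: str) -> str:
    """Re-encode one quality string from Phred+64 to Phred+33."""
    shifted = []
    for ch in qual:
        phred = ord(ch) - ILLUMINA_OFFSET
        if phred < 0:
            # Characters below '@' cannot occur in Phred+64 data, so the
            # input is probably Sanger-encoded already (or out of sync).
            raise ValueError(f"{ch!r} is not a valid Phred+64 quality character")
        shifted.append(chr(phred + SANGER_OFFSET))
    return "".join(shifted)


def convert_records(lines):
    """Yield converted FASTQ lines, assuming strict 4-line records."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"record out of sync near {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield from (header, seq, plus, illumina_to_sanger(qual))
            record = []
    if record:
        raise ValueError("input ended in the middle of a record")
```

To use it on a file, pass an open file handle (or `sys.stdin`) to `convert_records` and print each yielded line.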



    • #3
      Originally posted by zouzou
      Do you know of any software that converts illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this?
      Thanks!
      You can use several existing tools to do the conversion from Illumina FASTQ to Sanger FASTQ, including EMBOSS seqret, Biopython, BioPerl, BioJava, BioRuby etc.


      Note that in FASTQ files from recent Illumina pipelines, some of the low quality scores have special meaning:
      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc
      Last edited by maubp; 05-31-2010, 06:55 AM. Reason: adding missing last two words of my sentence.
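One such special value, as far as I know, is quality 2 in pipeline 1.5 and later: it is written as 'B' in Phred+64 encoding and, when it appears as a run at the end of a read, marks a segment that should not be trusted rather than a real error estimate. Below is a minimal Python sketch of how one might trim that tail before converting. The function name is hypothetical, and whether to trim, mask or keep these bases is a choice this thread does not settle.

```python
def trim_b_tail(seq: str, qual: str, marker: str = "B"):
    """Drop the trailing run of `marker` qualities and the matching bases.

    Assumes Phred+64 (Illumina 1.5+) qualities, where 'B' encodes Q2.
    Only the run at the 3' end is removed; a 'B' inside the read is kept.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    keep = len(qual.rstrip(marker))
    return seq[:keep], qual[:keep]
```

For example, `trim_b_tail("ACGTACGT", "hhhgfBBB")` keeps the first five bases and their qualities.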



      • #4
        Originally posted by zouzou
        Hi everybody!

        I am very new to next-generation sequencing. I downloaded BWA and SAMtools to analyse data from an Illumina GA II. I saw that BWA needs its input reads in .fastq format, but my data are in qseq.txt format.
        I saw that .txt and .fastq can be the same thing, but that there are several .fastq variants. I read that BWA needs sanger-fastq, and I think I have illumina v1.5-fastq.
        Do you know of any software that converts illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this?
        Thanks!
        Also, bfast comes with a perl script to perform the conversion. It's under scripts (ill2fastq.pl).
        -drd



        • #5
          Originally posted by zouzou
          Hi everybody!

          Do you know of any software that converts illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this?
          Thanks!
          You could try patching the latest bwa version with the appropriate patch listed
          here (it is the first one). It adds a '-I' option to the 'bwa aln' command so that Illumina (pipeline 1.3+ or 1.5+) fastq files can be used, and trimmed, as if they were in Sanger scale. Output in the SAM file is in Sanger scale as well.

          d



          • #6
            Hey, Galaxy has a tool called FASTQ Groomer under the NGS: QC and manipulation menu.
            You can convert between various quality formats (sanger, solexa, Illumina 1.3 and above, colorspace sanger).

            I think you can also download the script directly from the website ...
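To show what separates the base-space variants listed above, here is a small Python sketch. It is not Galaxy's code, and the variant names in the table are labels chosen for this example. Sanger and Illumina 1.3+ both store Phred scores and differ only in the ASCII offset (33 versus 64), while Solexa files use offset 64 with a different score definition that has to be converted to Phred. Colorspace is left out.

```python
import math

# ASCII offsets per variant (variant names are labels for this sketch only).
OFFSETS = {"sanger": 33, "solexa": 64, "illumina-1.3+": 64}


def solexa_to_phred(q_solexa: int) -> int:
    """Convert a Solexa score to the nearest Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def to_sanger(qual: str, variant: str) -> str:
    """Re-encode a quality string of the given variant as Sanger (Phred+33)."""
    offset = OFFSETS[variant]
    out = []
    for ch in qual:
        score = ord(ch) - offset
        if variant == "solexa":
            score = solexa_to_phred(score)
        out.append(chr(score + OFFSETS["sanger"]))
    return "".join(out)
```

The two scales agree closely for high scores and diverge only at the low end, which is why mixing them up is easy to miss.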

            NT
            Nicolas Tremblay
            Graduate Student

            Cardiovascular Genetics - Andelfinger Lab
            CHU Ste-Justine Research Center



            • #7
              Questions on '-I' option

              Originally posted by dawe
              You could try patching the latest bwa version with the appropriate patch listed
              here (it is the first one). It adds a '-I' option to the 'bwa aln' command so that Illumina (pipeline 1.3+ or 1.5+) fastq files can be used, and trimmed, as if they were in Sanger scale. Output in the SAM file is in Sanger scale as well.

              d
              I used patch to update my bwa, following your directions, but I don't know how to use "-I". I browsed your patch file and saw " -I Input files are in Illumina quality scale." However, when I type bwa aln after applying your patch file, I expected to see the "-I" option, but I didn't.
              So, can you give me some explanation? Suppose I want to use Sanger quality 15: how should I set -q INT after applying your patch? Should I set 15 or not?
              I really appreciate your posts, and sorry for bothering you.

              [bioinformatics@localhost bwa-0.5.8a]$ bwa aln

              Usage: bwa aln [options] <prefix> <in.fq>

              Options: -n NUM  max #diff (int) or missing prob under 0.02 err rate (float) [0.04]
                       -o INT  maximum number or fraction of gap opens [1]
                       -e INT  maximum number of gap extensions, -1 for disabling long gaps [-1]
                       -i INT  do not put an indel within INT bp towards the ends [5]
                       -d INT  maximum occurrences for extending a long deletion [10]
                       -l INT  seed length [32]
                       -k INT  maximum differences in the seed [2]
                       -m INT  maximum entries in the queue [2000000]
                       -t INT  number of threads [1]
                       -M INT  mismatch penalty [3]
                       -O INT  gap open penalty [11]
                       -E INT  gap extension penalty [4]
                       -R INT  stop searching when there are >INT equally best hits [30]
                       -q INT  quality threshold for read trimming down to 35bp [0]
                       -c      input sequences are in the color space
                       -L      log-scaled gap penalty for long deletions
                       -N      non-iterative mode: search for all n-difference hits (slooow)
                       -f FILE file to write output to instead of stdout
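On the `-q INT` question: the threshold is a quality score, not a character, so the same score is simply written with a different character in each encoding. The Python sketch below works through the arithmetic for Q15. The helper names are made up for illustration, and it assumes, based only on the patch description quoted above ("trim as they were in sanger scale"), that the patched `-I` rescales the input before trimming, in which case a Sanger-scale threshold of 15 would be given as 15.

```python
def phred_char(score: int, offset: int) -> str:
    """ASCII character that encodes `score` at the given offset."""
    return chr(score + offset)


def error_probability(score: int) -> float:
    """Error probability implied by a Phred score: 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


q = 15
sanger_char = phred_char(q, 33)    # character for Q15 in a Sanger FASTQ
illumina_char = phred_char(q, 64)  # character for Q15 in an Illumina 1.3+ FASTQ
p_error = error_probability(q)     # roughly a 3% chance the base call is wrong
```

Here Q15 is the character '0' in a Sanger file and 'O' in an Illumina 1.3+ file.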



              • #8
                It appears you haven't applied the patch (or you haven't installed the patched binary).

                d



                • #9
                  Questions on BWA patch

                  Originally posted by dawe
                  It appears you haven't applied the patch (or you haven't installed the patched binary).

                  d
                  I'm sorry, I don't understand your reply. Could you give me some explicit directions? Thanks very much!

                  I followed the directions:
                  cd bwa-source-directory
                  patch -p1 < patch.file
                  make



                  • #10
                    Originally posted by zeam
                    I'm sorry, I don't understand your reply. Could you give me some explicit directions? Thanks very much!

                    I followed the directions:
                    cd bwa-source-directory
                    patch -p1 < patch.file
                    make
                    Were you able to apply the patch successfully? If so, try running
                    Code:
                    ./bwa aln
                    and see if the -I option appears. If it does, replace the installed binary with this one, i.e.

                    Code:
                    sudo install bwa `which bwa`
                    d
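The check described above (does `-I` appear in the usage text?) can also be done mechanically, which helps when comparing the freshly built `./bwa` against whichever `bwa` is first on the PATH. This is a rough Python sketch with a hypothetical helper name; it assumes each option is listed at the start of a line, as in the usage text posted earlier in the thread.

```python
import re


def listed_options(usage_text: str) -> set:
    """Return the single-letter flags that start a line of the usage text."""
    pattern = r"^\s*(?:Options:\s*)?(-[A-Za-z])\b"
    return set(re.findall(pattern, usage_text, flags=re.MULTILINE))
```

Feed it the text printed by each binary; if `"-I"` is in the result for `./bwa` but not for the installed one, the patched build has not been installed yet.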



                    • #11
                      BWA Illumina Quality Patch

                      Hi dawe,

                      I just tried to apply your SVN v50 patch to the current svn download, which lists version 50, and the patch fails.

                      Code:
                      $ patch -p1 < bwa-svn-r50_illumina-qual.patch 
                      missing header for unified diff at line 5 of patch
                      can't find file to patch at input line 5
                      Perhaps you used the wrong -p or --strip option?
                      The text leading up to this was:
                      --------------------------
                      |Index: bwape.c
                      |===================================================================
                      |--- bwape.c	(revision 50)
                      |+++ bwape.c	(working copy)
                      --------------------------
                      File to patch:
                      Steps:
                      1) svn download of current bio-bwa subversion (version 50)

                      Code:
                      svn co https://bio-bwa.svn.sourceforge.net/svnroot/bio-bwa bio-bwa
                      ....
                      bunch of stuff
                      ....
                      Checked out revision 50.
                      2) cd bio-bwa/trunk/bwa
                      3) make
                      4) copied patch to current directory
                      5) attempted to patch as noted above

                      I tried the archived bwa-0.5.8 patch and that applied perfectly.

                      Any suggestions?

                      PS - thanks for this patch and the previous maq ill2sanger patch; they are life savers.



                      • #12
                        My bad, sorry. Anyway, as the 'patch' error suggests, you should use a different strip level:

                        Code:
                        $ patch -p0 < /path/to/patch
                        That should work.
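Why the strip level matters: `patch -pNUM` removes the smallest prefix containing NUM slashes from each file name in the patch, counting a run of adjacent slashes as one. The SVN patch names the file as plain `bwape.c`, so with `-p1` there is no slash to strip and, as far as I understand GNU patch's behaviour, the name is discarded, which matches the "can't find file to patch" message. The Python sketch below models only that rule; it is not patch's own code, and `bwa-0.5.8/bwape.c` is a hypothetical path used for illustration.

```python
from typing import Optional


def strip_components(name: str, num: int) -> Optional[str]:
    """Model `patch -pNUM`: strip the smallest prefix holding NUM slashes.

    Returns None when the name has fewer than NUM slashes, i.e. when
    there is no usable file name left for patch to look up.
    """
    rest = name
    for _ in range(num):
        _, slash, rest = rest.partition("/")
        if not slash:
            return None
        rest = rest.lstrip("/")  # adjacent slashes count as a single slash
    return rest or None
```

With `-p0` the name `bwape.c` is used unchanged, whereas a patch that names files with one leading directory, such as `bwa-0.5.8/bwape.c`, needs `-p1` when run from inside the source directory.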

                        HTH
                        D



                        • #13
                          Thanks, that worked perfectly



                          • #14
                            I am new to NGS and bioinformatics. I just got my data and am trying out Galaxy. I am trying to use FASTQ Groomer to convert it to fastq-sanger. I have 8 GB of data; does anyone have an estimate of how long this process should take? It has been running for about 3.5 hours, and I don't know whether to quit and run it again. Am I being impatient?

                            Sorry for the novice/inexperienced question

                            Thanks
                            nsl



                            • #15
                              It will depend on which Galaxy installation you are using (e.g. the main http://usegalaxy.org Penn State one), and how busy it is with other people's work. If you asked on the Galaxy mailing list you'd probably get a better answer.
