
  • Bowtie & FASTQ Sanger/Illumina 1.9

    Hi,

    Our lab has obtained 80 FASTQ files that we want to analyze using Bowtie version 0.12.7. The Bowtie command we will be using is this:

    Code:
    ./bowtie -m 1 -v 2 -p 8 /bowtie-0.12.7/indexes/saccer2 -1 path/to/file_1.fastq -2 path/to/file_2.fastq --al path/to/file.out --un path/to/file.un
    We used FastQC to determine the quality encoding of the FASTQ files. FastQC reported the encoding as "Sanger / Illumina 1.9".

    Our question: do the FASTQ files need to be prepared or converted before we pass them to the Bowtie command above?

    Thank you!

  • #2
    Would you be willing to use bowtie2? It lets you specify --phred33 or --phred64. I haven't used the version of bowtie you're asking about, so I'm not sure. Try running ./bowtie on its own to list the available options (you may be able to specify the quality encoding, as you can with bowtie2), or check the manual for its default. Of course, someone else may know that specific version.



    • #3
      That's Phred+33, which is the default anyway, so no changes are needed.
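For reference, here is a rough way to make the same call yourself. Phred+33 (Sanger / Illumina 1.8+) data can contain characters below ';' (ASCII 59), whereas Solexa+64 never goes below ASCII 59 and Phred+64 never below ASCII 64 ('@'). The helper below is a simple heuristic sketch of that idea, not FastQC's actual detection code; the function name is hypothetical. The sample qualities in the first post contain '#' (ASCII 35, Q2 in Phred+33), which rules out the +64 encodings.

```python
def guess_phred_offset(quality_strings):
    """Heuristically guess the quality offset (33 or 64) of FASTQ quality lines.

    Returns 33, 64, or None when the characters seen are ambiguous.
    """
    chars = [c for q in quality_strings for c in q]
    if not chars:
        raise ValueError("no quality characters to inspect")
    lowest = min(ord(c) for c in chars)
    if lowest < 59:   # below ';' cannot occur in Phred+64 or Solexa+64 data
        return 33
    if lowest >= 64:  # nothing below '@': consistent with (likely) Phred+64
        return 64
    return None       # ASCII 59-63: could be Solexa+64 or high-quality Phred+33
```

Run it on a few thousand quality lines rather than one read, since a single high-quality read can look like Phred+64.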



      • #4
        Thank you for your help. I was also able to confirm in the manual that Bowtie 0.12.7 uses Phred+33 by default:

        Code:
        --phred33-quals : input quals are Phred+33 (default)
        We are using Bowtie 0.12.7 because we are collaborating with another lab that used this version of Bowtie to analyze the data.

        After running the Bowtie command above, our lab performed some additional analysis on the FASTQ files; the results of those steps are below. Our question concerns Step 2: Bowtie does not appear to be producing tab-delimited SAM files. Is this the expected behavior?


        Step 1

        Process the FASTQ files (80 files / 40 pairs total) with the Bowtie command above.
        No pre-processing was performed on the files since, as confirmed above, none is needed.

        Here is a sample of one of our FASTQ files (first 10 lines):
        @HWI-ST382:154:C26DBACXX:1:1101:2868:1990 1:N:0:GATCAG
        CANATCTTTAGGATCTGGCAAAGAATCACCGGTTAACTCTACAACTTTATCTCCGGTCACTGGTGTTATTTTCTTTCTCCAGTCATCAACCCTTTCACGT
        +
        @C#4AADDFHFHHJJJIJIIGIEGJJGHGIIJIGGIEGGHJIGHGIIIIJJJJJJFAFGHIJGEEEFFFFFFFEECECEEDDDDDEDDDDDDDDDDCDDD
        @HWI-ST382:154:C26DBACXX:1:1101:3654:1991 1:N:0:GATCAG
        GGNGTTTTGATAGGAACAATATTGTGCGATGAATTATTTTCCGGTGGAGAAGCATCGATTGAAGGTGAACGGTGTATAATCCTTTTCTCAGTCTCTTGGT
        +
        CC#4ADDFHHHHHJJJJJJJJJJJHIJJJJJJJJJIJJJJJJJIGIIIHGJJJJJJJJJJJIJHHAHHFFFFACBCBFEDDDDDDDDDDEDDDDDDDCCC
        @HWI-ST382:154:C26DBACXX:1:1101:4978:1999 1:N:0:GATCAG
        GGGGTCAATAACGACATTTAGTTTACCAGTTTTGAATTCAATGTTCAAGTCTTTCAATTTGAAATCTTGATTGTCCTTATCCCAAGAGATGGTGGAATTC

        Results:

        80 aligned files were produced, with the extension .out.
        Here is a sample of one of them (first 10 lines):
        @HWI-ST382:154:C26DBACXX:1:1101:2618:2220 1:N:0:GATCAG
        GTGCAAAGCCTTGTAGACGTTGTAAACTTGTTCTGGCTTGGTGTACAAGTCTTCCTTGTCAGCGTTTTCGTTGTTAACACCATCTTCTTCACCACCGGTA
        +
        C@CFFFFFHHHHHIIIJJJIJJHIJJIJJJHIJIJJJIJGIDEFFHIJF@FIIIJJJIHIJGIGFHIJJHHHFFFEEFEEDDDDDDEDDDEDDDDDDB9B
        @HWI-ST382:154:C26DBACXX:1:1101:3357:2232 1:N:0:GATCAG
        GCCTTCCATTGCCTCCTTTTTTTCTCTTCCAGAACTCTCTCAGCAGTATTCTCACAAACACATCTCCTCCCCTTTAGCAAACCTCCATTTATATCCTGCG
        +
        @@@DDDDDHHDHHGDHBGFGIAE;F@HGEHHI>BGEHBGEHIEIHF?BBGCAAHEGIIIDHIGIGGEHCHC<B9BCCCA>CABCCACC;CBDDDEEECC<
        @HWI-ST382:154:C26DBACXX:1:1101:4793:2223 1:N:0:GATCAG
        CTGCTTATTCAGATAAAAATTTTATTATTTCATCGACAGTTCCTGTTTTATCCCATGCTGATTTACTTTTCCATATTATTTTTGCAACGGACGAATGAAC

        80 unaligned files were produced, with the extension .un.
        Here is a sample of one of them (first 10 lines):
        @HWI-ST382:154:C26DBACXX:1:1101:7079:1989 1:Y:0:GATCAG
        TCNACCGTGAGTAGAGTCGTACTTGAACATGTAAGCGGAGTAGTCGTTAGAGATGAAAGGATCGTTCAAAGCAACAACTTCGCCGTTCTTTCTTTGCAAA
        +
        ;<#2<552@@;@<4)..<5(..@@@89;3<3@####################################################################
        @HWI-ST382:154:C26DBACXX:1:1101:4978:1999 1:N:0:GATCAG
        GGGGTCAATAACGACATTTAGTTTACCAGTTTTGAATTCAATGTTCAAGTCTTTCAATTTGAAATCTTGATTGTCCTTATCCCAAGAGATGGTGGAATTC
        +
        @@CF=DDFHAHHGEHGGIJGGIGIIGGGGHIIJI7@FIIEGIIFGIGAHIHHIIJJIIIJJIIEHGIIHGIGGJGGHIHHEEFHFFDE@CEDCCBBCCCD
        @HWI-ST382:154:C26DBACXX:1:1101:3654:1991 1:N:0:GATCAG
        GGNGTTTTGATAGGAACAATATTGTGCGATGAATTATTTTCCGGTGGAGAAGCATCGATTGAAGGTGAACGGTGTATAATCCTTTTCTCAGTCTCTTGGT
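A quick sanity check on the .out and .un files is to count their records. The sketch below (hypothetical helpers, not part of Bowtie) reads FASTQ strictly four lines at a time. That matters here because a quality line can itself begin with '@', as in the first input record shown above, so scanning for lines starting with '@' would miscount reads.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records.

    `lines` can be an open file or any iterable of text lines.
    """
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk or not chunk[0]:
            return  # end of input (or trailing blank line)
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = chunk
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header[1:], seq, qual


def count_reads(lines):
    """Count FASTQ records, validating each one along the way."""
    return sum(1 for _ in read_fastq(lines))
```

For example, `with open("file_1.un") as f: print(count_reads(f))` counts the unaligned mate-1 reads of one sample.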



        Step 2

        Align the unaligned reads to the spike-in library:

        Code:
        ./bowtie -m 1 -v 2 -p 8 -S /path/to/bowtie-0.12.7/indexes/NIST_Spikeins -1 /path/to/file_1.un -2 /path/to/file_2.un --al /path/to/file_spike.sam
        Results:

        80 files were produced, ranging in size from 2.5 MB to 105 MB.
        Here is a sample (first 10 lines) from one of them, file_spike_1.sam:
        @HWI-ST382:154:C26DBACXX:1:1101:17367:2210 1:N:0:GATCAG
        CTGAGAAATACCAAATTGCCCACAGCCCCCATGCAGTAAGCGCCTAGGCCGAGCGCACCGAGGCTGACCAGTCCGGTAAAAACATCCCCTAGGATAACCC
        +
        CCCFFFFFHHHHHIJJJJJJJJJJJJJJJJJJJJJJIJJIJJJJJJJJJJJJIHFFDDDDDDDDDDDDDDDDCDDDBBDDDDDDDDDDDCDDDDDDDDDD
        @HWI-ST382:154:C26DBACXX:1:1101:5976:2730 1:N:0:GATCAG
        GACCCGCAGGACAGGTGAATCTGCTGGGACATGTAGACCGCTGATGGGCTGTGGATAGCCTTCCGCGATGATTACGCCTGAGTAGAGTGGACAGGGCGTT
        +
        C@@FFFFFHHGHHJJCGIIJIJJJJJJJJJJJJGHJJJJIHEGJJJIIJJJHEEHHHFFFFFDEDDDDDDDEFED@DDDDDD>ACDDACDCDDDDDDDBB
        @HWI-ST382:154:C26DBACXX:1:1101:14266:2987 1:N:0:GATCAG
        CACAAAACTTAACTACATCTTCAACAGTTTTTGGATTTAATGCCAGTCCAAGCTCTCTTCCACATTCGTAAATAACTCCATGAGCCCCTCTTCCTAAATA

        Here is a sample (first 10 lines) from file_spike_2.sam:
        @HWI-ST382:154:C26DBACXX:1:1101:17367:2210 2:N:0:GATCAG
        CTAAAGACTATGNNAACCAGGTGTCCCAGTCGATCAGACGACGAAGTCGGGAAGGAAGCATGGATACCAAAAAGGCTTTATATACTGGGTTATCCTAGGG
        +
        CCCFFFFFHHHH##2>EHIJJEGHJJJJJJJJIJJJJIJJJJJIJJGIIJHEDFFCEDCDDDDCDEDDDDDDDDDDDDDDDCDDEEEDDCDDDDDECDDD
        @HWI-ST382:154:C26DBACXX:1:1101:5976:2730 2:N:0:GATCAG
        GTCTCATCGAACTCCTTTCCCGTTCATGCAGATACTTCAACTGTGACTAGTGGGGTTCGGGAGCACCCGCACTACTTCATTCTTGGCGGTGGGCCACTTT
        +
        CCCFFFFFHHHHHJJJJEIJIJHGIJJJIIIIJIJJJIJJJIJGIGIJJIGGHIIAGIIGJIHHFFFFDDDDDDDDDDDEEEEACDDDB9BDD@?BBCDC
        @HWI-ST382:154:C26DBACXX:1:1101:14266:2987 2:N:0:GATCAG
        TTATGGCATTAAAATTCACCATTGAAGAGTTATCAAATCAAAAAAGAGATACATTAGGAAGAAATATTGACGTAACTGTTTTTAGATTAATAAGATTTAT


        Question

        Step 2 appears to have produced SAM files that are not tab-delimited. Is this the expected behavior?



        • #5
          Reread the manual: bowtie is doing what you're telling it to do. The --al option writes reads that have at least one alignment to a file. That file is in FASTQ format, since that's the format of a read. If you want the alignments themselves, you would normally pass -S and then either give an output file name as the last parameter or pipe everything to samtools to produce a BAM file (which saves space).

          BTW, you should really quality trim your reads.
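A simple way to tell which format a file is really in: SAM files may start with header lines such as @HD or @SQ (an '@', a two-letter record type, then a tab), and every alignment line has at least 11 tab-separated mandatory fields, whereas a FASTQ record starts with '@' followed by the read name. The helper below is a hypothetical heuristic sketch, not a full SAM validator. On the file_spike_1.sam sample from Step 2 it would report "not SAM", which fits the explanation above.

```python
SAM_HEADER_TAGS = {"HD", "SQ", "RG", "PG", "CO"}


def looks_like_sam(lines):
    """Return True if the first non-blank line looks like SAM, else False."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("@"):
            # SAM header line: '@' + two-letter record type + tab.
            # Anything else starting with '@' is treated as a FASTQ header.
            return line[1:3] in SAM_HEADER_TAGS and line[3:4] == "\t"
        # Alignment line: SAM requires 11 mandatory tab-separated fields.
        return len(line.split("\t")) >= 11
    return False
```

For example, `with open("file_spike_1.sam") as f: print(looks_like_sam(f))` checks the first record of a file.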



          • #6
            Thank you. So I should use this Bowtie command instead (with "--al" removed), correct?

            Code:
            ./bowtie -m 1 -v 2 -p 8 -S /path/to/bowtie-0.12.7/indexes/NIST_Spikeins -1 /path/to/file_1.un -2 /path/to/file_2.un /path/to/file_spike.sam
            If so, we should expect 40 SAM files to be created, since we are starting with 80 files / 40 pairs, correct?
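With 40 pairs, it may be easier to generate the commands than to type them. The sketch below assumes the .un files are named <sample>_1.un and <sample>_2.un (a hypothetical naming scheme; adjust it to your own) and builds one bowtie argument list per pair. It reuses only the options from the command above and writes a distinct <sample>_spike.sam for each pair, so the outputs don't overwrite one another and you end up with one SAM file per pair.

```python
from pathlib import Path


def build_spikein_commands(un_files, index, out_dir):
    """Pair <sample>_1.un / <sample>_2.un files; build one bowtie -S command per pair.

    Files that don't end in _1.un or _2.un are ignored.
    """
    pairs = {}
    for f in map(Path, un_files):
        for mate in ("1", "2"):
            suffix = f"_{mate}.un"
            if f.name.endswith(suffix):
                sample = f.name[: -len(suffix)]
                pairs.setdefault(sample, {})[mate] = f
                break

    commands = []
    for sample in sorted(pairs):
        mates = pairs[sample]
        if set(mates) != {"1", "2"}:
            raise ValueError(f"missing mate file for sample {sample!r}")
        out = Path(out_dir) / f"{sample}_spike.sam"  # one SAM per pair
        commands.append([
            "./bowtie", "-m", "1", "-v", "2", "-p", "8", "-S", index,
            "-1", str(mates["1"]), "-2", str(mates["2"]), str(out),
        ])
    return commands
```

Each list can then be run in turn with `subprocess.run(cmd, check=True)`; running them one after another avoids oversubscribing the CPUs, since each already uses `-p 8`.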

            For quality trimming, do you recommend tools such as LUCY2 or ngs_backbone? Should quality trimming happen before running Bowtie?



            • #7
              That should work. One thing to check on one file, purely because I haven't used the original bowtie in forever, is that the reads in file_1.un and file_2.un are in sync. With bowtie2 they won't always be (depending on the original settings), but I honestly don't remember whether that's the case for bowtie1 (the documentation vaguely hints that the issue doesn't exist there).
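One way to run that check is to compare the read IDs of corresponding records in the two files. In the Illumina 1.8-style headers shown in this thread, both mates share the first whitespace-separated token (the part before " 1:N:..." or " 2:N:..."), and older-style names end in /1 and /2. The functions below are a hypothetical sketch that returns the 0-based index of the first mismatched record, or None if the files are in sync. The first three records of file_spike_1.sam and file_spike_2.sam shown above do pair up by ID (17367:2210, 5976:2730, 14266:2987), but only a full pass over the files confirms it.

```python
from itertools import zip_longest


def read_id(header):
    """ID shared by both mates: first whitespace token, minus any /1 or /2."""
    token = header.split()[0]
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def fastq_headers(lines):
    """Yield the header line (every 4th line) of each FASTQ record."""
    for lineno, line in enumerate(lines):
        if lineno % 4 == 0 and line.strip():
            yield line.rstrip("\r\n")


def first_out_of_sync(mate1_lines, mate2_lines):
    """Index of the first record pair whose IDs differ (or where one file ends early)."""
    pairs = zip_longest(fastq_headers(mate1_lines), fastq_headers(mate2_lines))
    for index, (h1, h2) in enumerate(pairs):
        if h1 is None or h2 is None or read_id(h1) != read_id(h2):
            return index
    return None
```

For example: `with open("file_1.un") as f1, open("file_2.un") as f2: print(first_out_of_sync(f1, f2))`.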

              So yes, as long as you use 40 different output names in place of "file_spike.sam" (one per pair), you should get the results you want.

              For quality trimming, I mostly recommend trim_galore or Trimmomatic, both of which are quite flexible. Yes, you would run them on each pair of files before alignment (processing the files as pairs is important). The main reason is to remove low-quality bases (and, as a by-product, overly short sequences) that might increase false-positive alignments.
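To make the idea concrete, here is a sketch of 3' quality trimming modeled on the BWA-style algorithm that cutadapt documents (Trim Galore is a wrapper around cutadapt); Trimmomatic's SLIDINGWINDOW step works differently. This is an illustration, not either tool's code, and cutoff=20 / min_len=20 are illustrative values. trim_pair also shows why processing the files as pairs matters: when one mate becomes too short, both are dropped so the two output files stay in sync.

```python
def quality_trim_3prime(qual, cutoff=20, offset=33):
    """Return how many bases to keep after BWA-style 3' quality trimming.

    Walks from the 3' end adding (cutoff - quality) to a running sum, stops
    once the sum goes negative, and cuts where the sum was largest.
    """
    running = 0
    best = 0
    keep = len(qual)
    for i in range(len(qual) - 1, -1, -1):
        running += cutoff - (ord(qual[i]) - offset)
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def trim_pair(read1, read2, cutoff=20, min_len=20, offset=33):
    """Trim both mates, each a (sequence, quality) tuple.

    Returns the trimmed pair, or None if either mate ends up shorter than
    min_len (both are dropped so mate files stay in sync).
    """
    trimmed = []
    for seq, qual in (read1, read2):
        keep = quality_trim_3prime(qual, cutoff, offset)
        trimmed.append((seq[:keep], qual[:keep]))
    if any(len(seq) < min_len for seq, _ in trimmed):
        return None
    return trimmed[0], trimmed[1]
```

Applied to the first record of the .un sample in Step 1, whose quality string ends in a long run of '#' (Q2), this would remove at least that trailing run.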
