
  • Use of Bowtie to remove PhiX from Illumina data

    I have Illumina data that was spiked with PhiX sequences as an internal control, and I'd like to make sure those sequences are removed from my data before I try assembling it. I thought of using Bowtie and saving the unmapped reads, but I'm not sure Bowtie would produce the right output or that I have correctly formatted input files.

    My input is trimmed fastq files, which I am treating as unpaired reads because the pairs are not interleaved and were disrupted by trimming and discarding too-short sequences. They aren't in tab-delimited format; is there a handy tool to convert them? I am not much good at coding.

    I have the PhiX genome in Fasta format and plan to use Bowtie-build to make an index that can be used with Bowtie.

    If I use the --un option, I can save the reads that did not align, which are the ones I want to keep. Will they be written in tab-delimited format, so that I need to convert them back to fastq? (I plan to use them with Trinity, which uses fastq as its input format.)

    Thanks for any help you can provide,

    Liz

  • #2
    Hi Liz,

    We have used Bowtie for this very purpose too; it works very well and normally finishes for a lane of GAIIx in around 5 mins.

    By default, Bowtie takes FastQ files as input and writes unmapped reads (requested with --un) in the same format they were read in. So if your input is FastQ and you need FastQ further downstream, you don't need to convert any intermediate files.
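For the index step mentioned in the first post, a minimal sketch is given here. The file name phiX174.fasta and the index prefix PhiX are placeholders, not names taken from this thread, and the wrapper function is only for illustration. bowtie-build takes the reference FASTA followed by an output prefix, and Bowtie later locates the index files by that prefix.

```shell
# Build a Bowtie (version 1) index from a reference FASTA.
# $1 = reference FASTA (placeholder name), $2 = index prefix (placeholder name)
build_phix_index() {
    ref_fasta=$1
    index_prefix=$2
    if [ ! -s "$ref_fasta" ]; then
        echo "reference FASTA not found or empty: $ref_fasta" >&2
        return 1
    fi
    # Usage pattern: bowtie-build <reference_in> <index_prefix>
    bowtie-build "$ref_fasta" "$index_prefix"
}

# Example call (uncomment once the FASTA is in place):
# build_phix_index phiX174.fasta PhiX
```

The prefix given here is what you then pass to bowtie as the index argument.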



    • #3
      Hi fkrueger,

      Can you give me an example of the command line you would use if you had unpaired reads (like I do) in fastq format? There are a few examples in the manual, but I'm still not certain what the command line should look like.

      Thanks for your help,

      Liz



      • #4
        A straightforward option would be:

        Code:
        bowtie -t -S --un not_phiX_file_1.fastq /data/public/Genomes/Coliphage_phiX174/phiX174/PhiX input_1.fastq PhiX_alignments_1.sam
        bowtie -t -S --un not_phiX_file_2.fastq /data/public/Genomes/Coliphage_phiX174/phiX174/PhiX input_2.fastq PhiX_alignments_2.sam
        If you have more than one core available, you can add -p [THREADS] to run Bowtie multi-threaded. You also need to make sure Bowtie is using the correct quality encoding; by default it assumes the Phred33 (Sanger) scale.
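If you are unsure which quality encoding your files use, one rough check is to look at the lowest quality character in the file. The sketch below is a heuristic, not part of Bowtie: the function name guess_phred_offset is made up for illustration, and it assumes plain 4-line FASTQ records. Characters below ASCII 59 can only come from a Phred33 file; if none are found, the file is probably 64-based, but a heavily quality-trimmed Phred33 file could look the same, so treat that answer as a guess.

```shell
# Guess the quality offset of a FASTQ file from its lowest quality character.
# Assumes 4-line FASTQ records (no wrapped sequence or quality lines).
guess_phred_offset() {
    awk '
        BEGIN {
            # Lookup table: printable ASCII character -> code
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            if (min == 127)    print "unknown"   # no quality characters seen
            else if (min < 59) print "phred33"   # impossible in 64-based data
            else               print "phred64"   # likely, but not certain
        }
    ' "$1"
}

# Example call:
# guess_phred_offset input_1.fastq
```

If the result is phred64, the bowtie commands above would need --phred64-quals added; for phred33 nothing changes, since that is the default.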



        • #5
          One more question: does it matter if some reads are unpaired (that is, I have removed poor quality or too-short sequences, so not all reads are paired)? I think if this is a problem I need to specify that the input file contains unpaired reads, and I am not sure how to do this.



          • #6
            If you want to align the trimmed files as paired-end reads then, yes, it will be a problem if you didn't preserve the read order.

            One way to do it would be to run your trimming on both paired-end files individually while keeping all sequences (even if they are 0 bp long). In the next step you could go through both files at the same time and remove the entire read pair if either partner read has become too short. We have recently put a tool (trim_galore) that does this on our website, which you are welcome to try out (it is currently contained in the RRBS Guide but will soon be a download of its own).

            You could then do the PhiX alignment step as paired-end alignments, which should also leave the order of the sequences untouched.
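The "go through both files at the same time" step can be sketched with awk as below. This is an illustration of the idea only, not how trim_galore is implemented. The function name filter_pairs, the file names, and the default minimum length of 20 are all assumptions. It expects 4-line FASTQ records, both files in the same read order, and trimmed-away reads still present as records (possibly with empty sequence lines).

```shell
# Keep only read pairs in which BOTH mates are at least min_len bases long.
# $1/$2 = trimmed mate files, $3/$4 = output files, $5 = minimum length
filter_pairs() {
    r1=$1; r2=$2; out1=$3; out2=$4; min_len=${5:-20}
    # Create (or empty) the outputs so they exist even if no pair passes
    : > "$out1"
    : > "$out2"
    awk -v r2="$r2" -v min="$min_len" -v o1="$out1" -v o2="$out2" '
        {
            # Current line is the mate-1 header; read the rest of its record
            h1 = $0
            if ((getline s1) <= 0 || (getline p1) <= 0 || (getline q1) <= 0) exit
            # Read the matching mate-2 record from the second file
            if ((getline h2 < r2) <= 0 || (getline s2 < r2) <= 0 ||
                (getline p2 < r2) <= 0 || (getline q2 < r2) <= 0) exit
            if (length(s1) >= min && length(s2) >= min) {
                print h1 "\n" s1 "\n" p1 "\n" q1 > o1
                print h2 "\n" s2 "\n" p2 "\n" q2 > o2
            }
        }' "$r1"
}

# Example call:
# filter_pairs trimmed_1.fastq trimmed_2.fastq paired_1.fastq paired_2.fastq 20
```

The two output files stay in step with each other, so they can be passed to bowtie with -1 and -2 in place of the single input file, for example `bowtie -t -S --un not_phiX.fastq PhiX -1 paired_1.fastq -2 paired_2.fastq PhiX_alignments.sam`. As I understand the Bowtie manual, with paired input the --un reads are split into two files with _1 and _2 inserted into the name given.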
