Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bowtie command line for Illumina Hiseq 2000 with Illumina 1.5+ quality encoding files

    I am trying to align some paired end exome reads with bowti, but despite using the bow tie -q" switch to recognize fast file, I keep getting the error:

    ./bowtie -q -n -r -S -t -p 2 -I 0 -X 100 -ff --solexa1.3-quals indexes/hg19 -1 ~/Otogenetics_final_exome-original-110707_I327_FCB0AMFABXX_L3_index11_1.fastq.gz -2 ~/Otogenetics_final_exome-original-110707_I327_FCB0AMFABXX_L3_index11_2.fastq.gz bowtie.exome.sam
    Time loading reference: 00:00:10
    Time loading forward index: 00:00:18
    Time loading mirror index: 00:00:14
    Error: reads file does not look like a FASTA file
    Seeded quality full-index search: 00:00:00
    Time searching: 00:00:42
    Overall time: 00:00:42

    Same error when the files are unzipped. The fastq data look like:


    @FCB0AMFABXX:3:1101:1497:2086#GGCTAAAT/1
    TTACTTGCAAAGGAAAGACAATTTTGCATTACATGGGAACCAGCACATTTTCTGAATACACAGTTGTGGCTGGTATCTCTGTTGCTAAAA
    +
    gggggggdggdggdggggggggggggdggggggggggeggggggggegggegggeggggggeggggdefga`BBBBBBBBBBBBBBBBBB
    @FCB0AMFABXX:3:1101:1378:2107#GGCTACAT/1
    AATGCAGGAAGTGCTTCAAGGGAAATTTCTCACATTAAAAGCATATATTAGAACATTCAGGAGATCTAAAAGTCAATGCTCTAAGTTTCC
    +
    gggggggggggdggggggggggdggggggggdggggggggggggegdggdggfgegggefgcgeeegdacegbggfggfagggdgbfceT

    Any ideas why bowtie will not accept the fastq format from Illumina Hiseq 2000 output?
    Last edited by rworthi; 09-28-2011, 07:49 AM.
    Professor of Pharmaceutical Sciences at Southern Illinois University. Subject of the Personal Genomes Project with CGI genome sequencing, funded by a university seed grant. Annotated human and mouse genomes at the Washington University (St. Louis) GSC during Bob Waterston's tenure. Worked in nematode gene discovery at Divergence, Inc. Recent NSF grant in yeast recombinant genetics. Teach pharmacogenomics, human genomics and pharmaceutical biotechnology.

  • #2
    There seem to be quite a few things going wrong here. From the error message "Error: reads file does not look like a FASTA file" you can see that Bowtie expects Fasta sequences, and not FastQ. My guess is that the switch "-ff" does that. Did you mean to use "--ff"?

    A few other things:
    -q is default already, no need to specify
    -n expects a value between 0 and 3, default is 2
    -r and --solexa1.3-quals are conflicting. Your reads are not in Raw format, so simply use --solexa1.3-quals
    -I 0 is default so not needed
    -X 100 seems to be a very low value, what was the read length in your experiment?
    - Did you really mean to use the --ff option?
    From the Bowtie manual:
    --ff The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.

    -Bowtie does not read input in .gz format (at least not yet)
    And as a final remark, if you want to align mRNA reads you might want to use an aligner which can cope with exon-spanning reads, such as TopHat.

    Hope this helps.

    Comment


    • #3
      thanks very much for all that great advice! The read length is 90 bp. I made your modifications and shortened the read file names (and unzipped them) and now get this error, which I am sure is progress:


      pgx:bowtie-0.12.7 rworthi$ ./bowtie -n 2 -S -t -p 2 -X 100 --solexa1.3-quals indexes/hg19 -1 ~/oto-L3_1.fq -2 ~/oto-L3_2.fq bowtie.exome.sam
      Time loading reference: 00:00:09
      Time loading forward index: 00:00:14
      Time loading mirror index: 00:00:14
      Warning: Exhausted best-first chunk memory for read FCB0AMFABXX:3:1101:16474:2240#GGCTACAT/1 (patid 400); skipping read
      ... continuing indefinitely for each line

      Do I need a memory switch in the line?
      Professor of Pharmaceutical Sciences at Southern Illinois University. Subject of the Personal Genomes Project with CGI genome sequencing, funded by a university seed grant. Annotated human and mouse genomes at the Washington University (St. Louis) GSC during Bob Waterston's tenure. Worked in nematode gene discovery at Divergence, Inc. Recent NSF grant in yeast recombinant genetics. Teach pharmacogenomics, human genomics and pharmaceutical biotechnology.

      Comment


      • #4
        This is much better aleady, this time Bowtie doesn't die but merely produces warnings. You can avoid these by giving it more memory for each alignment with the --chunkmbs option. Try --chunkmbs 512 to start with, this normally fixes everything (consult the Bowtie manual for more info on this parameter).

        Other than that:
        -n 2 is not needed as it is the default
        -X 100: I still think this is likely not what you meant.

        From the Bowtie Manual:
        -X/--maxins <int>
        The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates. Default: 250.

        If you specify -X 100 at a read length of 90bp then you will only get reads which overlap over nearly the entire length. Remember length(read1)+lenght(insert)+length(read2) need to be smaller than X in order to yield a valid alignment. So in your case something like 400-600 would seem more realistic.

        Good luck (and still think about using a splice mapper instead)!

        Comment


        • #5
          excellent! bow tie is purring away building the sam file. Thanks so much!

          What splice mapper program(s) would you recommend, and is there anything I can use to recalibrate quality scores on these Illumina reads (these are exome capture reads from gDNA, not RNA-seq)? Stampy does not seem to work so far.
          Last edited by rworthi; 09-28-2011, 11:28 AM.
          Professor of Pharmaceutical Sciences at Southern Illinois University. Subject of the Personal Genomes Project with CGI genome sequencing, funded by a university seed grant. Annotated human and mouse genomes at the Washington University (St. Louis) GSC during Bob Waterston's tenure. Worked in nematode gene discovery at Divergence, Inc. Recent NSF grant in yeast recombinant genetics. Teach pharmacogenomics, human genomics and pharmaceutical biotechnology.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM
          • seqadmin
            Techniques and Challenges in Conservation Genomics
            by seqadmin



            The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

            Avian Conservation
            Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
            03-08-2024, 10:41 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, Yesterday, 06:37 PM
          0 responses
          10 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, Yesterday, 06:07 PM
          0 responses
          10 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-22-2024, 10:03 AM
          0 responses
          51 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-21-2024, 07:32 AM
          0 responses
          67 views
          0 likes
          Last Post seqadmin  
          Working...
          X