  • Using bwa to align 454 reads

    Hello All,

    First of all I am very new at this NGS sequence analysis.
    I have a sff file generated by the 454 machine.
    In it I have 500000 reads
    200000 of them are <200bp
    300000 of them are >200bp (but less than 600bp)

    I want to use bwa to align the reads to a reference:

    1) Do I have to separate the short reads from the longer ones? If yes, what is the fastest way to do so?
    2) What input files can I use with bwa: fasta? fastq?
    The 454 machine did not give any fastq files. I made one using the perl script that came with bwa. (How reliable is that perl script?)


    At the moment I am issuing the following commands:

    a) index the reference first
    bwa index -p ref_index -a is my_ref.fa (note: my reference is 1.5MB that is why I am using '-a is')

    b) align the reads to reference

    bwa aln -t 4 ref_index my_reads.fasta > my_reads.bwa

    c) convert to SAM

    bwa samse ref_index my_reads.bwa my_reads.fasta > my_reads.sam



    Are steps a to c right?
    I read that I should use 'bwa bwasw' to align long reads.


    Any guidance you are able to provide is greatly appreciated!




    Thanks
    Noah

  • #2
    Hi haonmada,

    Have you read the FAQ on http://bio-bwa.sourceforge.net/ ?

    According to the bwa manual, for 454 reads you should use bwa bwasw, and for reads shorter than 200 bp the sensitivity is lower.

    I am attaching:

    1) A BioPerl script for extracting reads greater than 200 bp. Just change the ">" sign to "<" in the condition and it will extract reads < 200 bp.

    -- usage: change file.fasta to yourfile.fasta

    2) A Perl script for converting a fasta file and a qual file into a fastq file.
    Attached Files
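The attached scripts are not shown here. As a rough sketch of what two such helpers might do, the shell functions below split a FASTA file by read length and combine a FASTA file with a matching qual file into FASTQ. The function names and argument order are hypothetical, and the sketch assumes the qual file holds space-separated Phred scores under the same read IDs as the FASTA file; it is not the poster's code.

```shell
#!/bin/sh
# Hypothetical helpers; names and argument order are illustrative only.

# split_fasta_by_length IN.fasta MIN_LEN LONG_OUT SHORT_OUT
# Reads longer than MIN_LEN go to LONG_OUT, all others to SHORT_OUT.
split_fasta_by_length() {
    : > "$3"; : > "$4"    # create/truncate both outputs
    awk -v min="$2" -v long_out="$3" -v short_out="$4" '
        function emit(   out) {
            if (header == "") return
            out = (length(seq) > min) ? long_out : short_out
            print header > out
            print seq > out
        }
        /^>/ { emit(); header = $0; seq = ""; next }
             { seq = seq $0 }          # join wrapped sequence lines
        END  { emit() }
    ' "$1"
}

# fasta_qual_to_fastq IN.fasta IN.qual > OUT.fastq
# Assumption: .qual holds space-separated Phred scores and uses the
# same read IDs as the FASTA file. Output is Phred+33 encoded.
fasta_qual_to_fastq() {
    awk '
        function emit() {
            if (id == "") return
            if (pass == 1) { qual[id] = q; return }   # first file: store qualities
            printf "@%s\n%s\n+\n%s\n", id, seq, qual[id]
        }
        FNR == 1  { emit(); id = ""; pass++ }         # a new input file starts
        /^>/      { emit(); id = substr($1, 2); seq = ""; q = ""; next }
        pass == 1 { for (i = 1; i <= NF; i++) q = q sprintf("%c", $i + 33); next }
                  { seq = seq $0 }
        END       { emit() }
    ' "$2" "$1"
}
```

For example, `split_fasta_by_length my_reads.fasta 200 long.fasta short.fasta` would put reads longer than 200 bp in long.fasta and everything else (including reads of exactly 200 bp) in short.fasta.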



    • #3
      Thanks for the Info ketan_bnf!

      Yes, I know about the "bwa bwasw" component of bwa. That is what is confusing me at the moment.

      As I mentioned in the post, I have a mixture of short reads and long reads. Should I separate them and run them separately, or just use 'bwa bwasw' on all of them?

      If I run them separately, how can I merge the two data sets?

      And thanks for the scripts. FYI, there is a perl script in bwa that does the same.


      Thanks
      Noah



      • #4
        In my experience the 454 package gsMapper will give you better results - more alignments - than aligners like bwasw or Stampy.

        You could try that first if you have access to the Roche Newbler software.

        Apparently LastZ is also good, but I haven't tried it.



        • #5
          bwasw works very well with 454 reads. If the short reads correspond to junk (e.g. primer dimers in an amplicon experiment), you won't care about the sensitivity. In any case, bwasw will align a bunch of them.

          One way to check yourself is to first run bwasw, then take all the reads that didn't align and try running them through regular bwa. I know what the FAQ says, but I'd be surprised if many more align.
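The check described above needs the reads that bwasw left unaligned. A minimal sketch of pulling them out of a SAM file follows, assuming that FLAG bit 0x4 marks an unmapped read (as in the SAM specification) and that the QUAL column is populated. The function name `unmapped_to_fastq` is hypothetical.

```shell
#!/bin/sh
# unmapped_to_fastq IN.sam > unmapped.fastq
# Hypothetical helper: writes every unmapped read as a FASTQ record.
unmapped_to_fastq() {
    awk -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        int($2 / 4) % 2 == 1 {             # FLAG bit 0x4 set: read unmapped
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }
    ' "$1"
}
```

The resulting FASTQ file could then be run through the short-read commands from the first post (`bwa aln` followed by `bwa samse`) to see whether many more reads align.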

          Merging two datasets is easy with SAMtools ("samtools merge"), but I think you need to have both datasets sorted beforehand (if not, you'll want to sort afterwards). Alternatively, if you just concatenate one SAM header with the body of each of the files, that will generate a new legal SAM file (SAM does not track how many reads are in the file).
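The concatenation approach can be sketched in a few lines of shell. This is only an illustration under two assumptions: both SAM files were aligned against the same reference (so one header is valid for both), and the function name `concat_sam` is made up for the example.

```shell
#!/bin/sh
# concat_sam FIRST.sam [MORE.sam ...] > merged.sam
# Hypothetical helper: keeps the header of the first file only, then
# appends the alignment records of every file in the order given.
concat_sam() {
    awk '/^@/' "$1"        # header lines (start with @) from the first file
    awk '!/^@/' "$@"       # alignment records from all files
}
```

For example, `concat_sam long_reads.sam short_reads.sam > all_reads.sam`. The output is not coordinate-sorted, so it would still need sorting before any tool that expects sorted input.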



          • #6
            Thanks for the replies everyone!

            So far I have separated my sff file into short reads (<200bp) and long reads (>200bp).

            I have run the bwa short read aligner on the short reads and the long read aligner on the long reads.

            I have used samtools to sort the individual files.

            All I have to do now is create a new SAM file with one header. Is this correct?



            • #7
              For 454, just use bwa-sw. You do not need to separate them.



              • #8
                I used bwa bwasw on 454 reads and got the SAM file, but something was wrong: I could not convert it to BAM, either with samtools view -bS -t ref.fa.fai file.sam > file.bam or with Picard SortSam.jar.

                My SAM file from bwasw is as follows.
                GPYR5PO02HT2RB.f 0 Scaffold34 978186 203 1S65M1D90M * 0 0 TCACGAAAATACAAATGTAGCGACGTTCTATTCGTTTTATTGGATGCTTGGGCAATGAAATTTCCGTTAAGTTTTAATACCTGCTACTTTGCAGGGCCGAAGAAGCTTTGATGTGTAGTGACAAGGATATTGCCTATCTATTCAAAAGCGAAATAC GHG?;333370??@@GGIIIIIIIIIIHHIGBAA==>=I22==IIIIIIBB?GEEG?111111A=:--/1B9=77776FIIIIHHHIHGGGGIHEHIA177ADFFG??;;;CGEE????677BE77476???;;;>>;:;:>C8===EEE>>>?7C AS:i:148 XS:i:0 XF:i:3 XE:i:3 XN:i:0
                GPYR5PO02ICVBA.r 0 Scaffold139 669741 0 108M1S * 0 0 TTGAAAGCGTTTGCCACCCCCTTTCAACATTGTTGAAACGTGTTGAAAGGATGTTGAATCGATGTTGAAAGAGTTTAAAAGCCTTTAAACTTTGCTTCAACATCCATTA :<B===B<6/11>45<33335BBBFGGIIIIIHFBBBFFHIIHIIHHHIHHHIIICBBEEEIIHGD;555@7F76576<9E;;111555BBBBGGIIIIIIIIIIFIII AS:i:108 XS:i:108 XF:i:2 XE:i:0 XN:i:0

                Any suggestions will be much appreciated.

                Best,
                Sutada
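The thread does not say what caused this conversion failure. One way to narrow down a problem like it is to run a quick structural check over the SAM file before handing it to a converter. The sketch below is an assumption about what might be worth checking, not a known diagnosis: it counts header lines and flags records that have fewer than the 11 mandatory tab-separated fields, or whose CIGAR string implies a different read length than the SEQ column. The function name `check_sam` is hypothetical.

```shell
#!/bin/sh
# check_sam IN.sam
# Hypothetical helper: prints one line per suspicious record, then a summary.
check_sam() {
    awk -F '\t' '
        /^@/ { headers++; next }
        NF < 11 {
            printf "line %d: only %d tab-separated fields\n", NR, NF
            bad++; next
        }
        $6 != "*" && $10 != "*" {
            cigar = $6; qlen = 0
            # Consume the CIGAR one operation at a time from the left.
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                op = substr(cigar, RLENGTH, 1)
                n  = substr(cigar, 1, RLENGTH - 1) + 0
                if (op ~ /[MIS=X]/) qlen += n   # operations that consume read bases
                cigar = substr(cigar, RLENGTH + 1)
            }
            if (cigar != "" || qlen != length($10)) {
                printf "line %d: CIGAR %s does not match SEQ length %d\n", NR, $6, length($10)
                bad++
            }
        }
        END { printf "%d header lines, %d suspicious records\n", headers, bad }
    ' "$1"
}
```

A report of zero header lines would be consistent with needing the `-t ref.fa.fai` option shown in the post, and any flagged records would be worth inspecting by hand.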
