  • PE SOLiD reads alignment by bwa

    Dear users,
    I have PE reads from SOLiD to align to the human genome.
    I have these files:

    - solid_data_F3.csfasta
    - solid_data_F3_QV.qual
    - solid_data_F5-P2.csfasta
    - solid_data_F5-P2_QV.qual

    I want to convert these files to FASTQ using bwa0.5.7/solid2fastq.pl.
    The script runs only for F3; with F5-P2 it fails (it says "Fail to open solid_data_F5-P2_F3.csfasta").

    So, if I use:
    > solid2fastq.pl solid_data_ solid_data_total
    I get only one FASTQ file for F3 and F5-P2. Does it include all the paired-end reads?

    This FASTQ is in color space, but the colors are represented as ACGT.
    So, to index the genome and run the bwa alignment, do I have to use the -c option?

    Thanks a lot,
    ME
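A minimal sketch of what "colors represented as ACGT" (often called double encoding) means, assuming the mapping 0→A, 1→C, 2→G, 3→T and "."→N. The function name `double_encode` and the `drop_first_color` switch are hypothetical and not part of bwa; check the converter you actually use for how it treats the primer base and the first color.

```python
# Assumed mapping from color digits to placeholder letters.
# The letters are labels for colors, NOT real bases.
COLOR_TO_LETTER = str.maketrans("0123.", "ACGTN")


def double_encode(csfasta_read: str, drop_first_color: bool = False) -> str:
    """Translate one csfasta read (primer base + color digits) to letters.

    The leading primer base is always removed. Some converters also drop
    the first color, because it depends on the primer base; that behaviour
    is optional here.
    """
    primer, colors = csfasta_read[0], csfasta_read[1:]
    if primer not in "ACGT":
        raise ValueError(f"expected a primer base first, got {primer!r}")
    if drop_first_color:
        colors = colors[1:]
    return colors.translate(COLOR_TO_LETTER)


print(double_encode("T0123.210"))
print(double_encode("T0123.210", drop_first_color=True))
```

Because the letters stand for colors rather than bases, such a file only makes sense to an aligner that is told the reads are in color space.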

  • #2
    It looks like the script doesn't support the paired-end protocol. Bug the BWA mailing list ([email protected]) or the author (username: lh3).

    • #3
      If you want to use the script with the PE data, change the header-matching line in the script (around line 98) so that it also accepts F5-P2 reads:

      #if (/^>(\d+)_(\d+)_(\d+)_[FR]3/) {
      if (/^>(\d+)_(\d+)_(\d+)_(?:F3|R3|F5-P2)/) {
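A quick way to see why an alternation group is safer than a bracketed form such as `[F3|R3|F5-P2]`: square brackets define a character class, so that pattern matches any single character from the set rather than one of the three tags. The sketch below uses Python's `re` module, whose character-class behaviour matches Perl's for this pattern; the header strings and the `classify` helper are made up for illustration.

```python
import re

ORIGINAL = re.compile(r"^>(\d+)_(\d+)_(\d+)_[FR]3")
CHAR_CLASS = re.compile(r"^>(\d+)_(\d+)_(\d+)_[F3|R3|F5-P2]")
ALTERNATION = re.compile(r"^>(\d+)_(\d+)_(\d+)_(?:F3|R3|F5-P2)")


def classify(header: str) -> tuple:
    """Return (original, char_class, alternation) match results."""
    return (
        ORIGINAL.match(header) is not None,
        CHAR_CLASS.match(header) is not None,
        ALTERNATION.match(header) is not None,
    )


# Hypothetical read headers; the last one carries a bogus tag.
for header in (">1_10_20_F3", ">1_10_20_F5-P2", ">1_10_20_B7"):
    print(header, classify(header))
```

The character class accepts the bogus `B7` tag because `5-P` inside brackets is a range that includes `B`. It works on real F3, R3 and F5-P2 headers only because they happen to start with `F` or `R`.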

      Also rename the F5-P2 files to R3:

      solid_data_F5-P2.csfasta -> solid_data_R3.csfasta
      solid_data_F5-P2_QV.qual -> solid_data_R3_QV.qual
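The renaming can be scripted. This is only a sketch: `r3_name` is a hypothetical helper, and it assumes the tag appears in the file name exactly as `_F5-P2`. It renames nothing unless the files exist in the current directory.

```python
from pathlib import Path


def r3_name(filename: str) -> str:
    """Return the file name with the _F5-P2 tag replaced by _R3."""
    return filename.replace("_F5-P2", "_R3", 1)


for name in ("solid_data_F5-P2.csfasta", "solid_data_F5-P2_QV.qual"):
    source = Path(name)
    target = source.with_name(r3_name(source.name))
    print(f"{source} -> {target}")
    if source.exists() and not target.exists():
        source.rename(target)  # only touches files that are really there
```

Note that renaming changes only the file names. The read headers inside the files still carry the F5-P2 tag, which is why the script's pattern has to accept it as well.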

      Also, bfast has a solid2fastq (in the git repo) that now supports bwa output and
      handles PE data. You can use that too.
      -drd

      • #4
        Thanks very much for your help, Drio!
        I'll try it and let you know if the program runs!

        • #5
          You will lose a lot of information by converting the color space files to FASTA. You would be better off aligning the SOLiD reads to a color space reference.

          John

          • #6
            There is information lost because of the dinucleotide 'color' encoding, but the alignments are performed in CS (http://seqanswers.com/forums/showthread.php?t=5245). BWA will do a good job aligning those reads.
            -drd
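To illustrate the dinucleotide 'color' encoding discussed here, the sketch below uses the standard two-bit model (A=0, C=1, G=2, T=3, color = XOR of adjacent bases). The helper names are hypothetical. It shows why decoding reads to bases before alignment is fragile: one wrong color changes every base after it.

```python
BASES = "ACGT"  # index of each base is its two-bit code


def encode_colors(primer: str, bases: str) -> str:
    """Encode a base sequence as colors, starting from the primer base."""
    prev = BASES.index(primer)
    colors = []
    for base in bases:
        code = BASES.index(base)
        colors.append(str(prev ^ code))  # color = XOR of neighbouring bases
        prev = code
    return "".join(colors)


def decode_colors(primer: str, colors: str) -> str:
    """Decode colors back to bases; each base depends on the previous one."""
    prev = BASES.index(primer)
    bases = []
    for color in colors:
        prev ^= int(color)
        bases.append(BASES[prev])
    return "".join(bases)


print(decode_colors("T", "0123"))  # error-free read
print(decode_colors("T", "0023"))  # same read with one color miscalled
```

Aligning in color space avoids this propagation, since a single miscalled color stays a single mismatch against a color space reference.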

            • #7
              We use a modified BWA in our NextGENe software, which adds a couple of additional steps to the BWA alignment and makes the alignment much more robust. Additionally, we use a fully annotated color space reference, so no information is lost. If you would like to try it, we can supply a trial.
              John

              • #8
                Cool, any plans to integrate that into the main bwa repo?
                -drd

                • #9
                  Thanks, Elena and drio!

                  This was useful. I am trying to run the SOLiD PE barcoded analysis.
                  I have just submitted the job.
                  I hope this works.
