
  • BFAST mapping paired end reads.

    Hi, I have a set of paired-end data from Illumina:
    s_7_1_sequence.txt
    s_7_2_sequence.txt

    How can I convert them into a FASTQ (.fq) file for BFAST?

    Here is the usage line for ill2fastq.pl:

    ill2fastq.pl [[ -b <bar code length> | -B ] -n <number of reads> -o <output prefix> -q -s] <input prefix>

    But I don't know what -q and -s stand for.



    For single-end mapping, do I still need to convert the format with this script?


    Thanks,

  • #2
    I have added a description to the latest Git commit. It is as follows:
    Code:
    The -q option specifies that qseq.txt files are expected, while
    the -s option specifies that sequence.txt files are expected.
    Thank you for finding these undocumented options.
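To make the difference between the two input types concrete, here is a minimal Python sketch of what a qseq.txt-to-FASTQ step could look like. It assumes the usual 11-column, tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter) and Phred+64 input qualities shifted to Phred+33. The function name and the read-name format are hypothetical; this is not a transcription of what ill2fastq.pl -q actually does.

```python
def qseq_line_to_fastq(line: str, offset_in: int = 64) -> str:
    """Convert one tab-separated qseq.txt line into a 4-line FASTQ record.

    Assumes the 11-column qseq layout and Phred+<offset_in> input qualities;
    output qualities are Phred+33. The read-name format is hypothetical.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated columns, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, _passed_filter = fields
    # Hypothetical name loosely modeled on Illumina sequence.txt headers.
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    # qseq marks uncalled bases with '.', FASTQ conventionally uses 'N'.
    seq = seq.replace(".", "N")
    # Shift every quality character from the input offset to Phred+33.
    qual33 = "".join(chr(ord(c) - offset_in + 33) for c in qual)
    return f"@{name}\n{seq}\n+\n{qual33}\n"
```

If the last column is the usual pass-filter flag (1 = passed), a caller could also skip lines where it is 0; whether ill2fastq.pl does so is not stated in this thread.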



    • #3
      Thank you for the clarification; I have done it.

      Could you also clarify whether I need to convert the sequence.txt files into FASTQ with your script?
      Or can I use the sequence files directly?

      Thanks
      Last edited by tanghz; 09-15-2010, 01:07 PM.



      • #4
        You will have to convert your input files to the FASTQ format if they are not in that format already.
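For the paired files from the first post (s_7_1_sequence.txt and s_7_2_sequence.txt), the conversion could be sketched roughly as below. This assumes that Illumina 1.3+ sequence.txt files are already 4-line FASTQ-style records with Phred+64 qualities, in the same read order in both files, and that BFAST wants the two ends of a pair as consecutive records sharing one name; treat the last point as an assumption to check against the BFAST manual. All function names are hypothetical.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Tuple


def to_phred33(qual: str, offset_in: int = 64) -> str:
    """Shift quality characters from Phred+offset_in to Phred+33."""
    return "".join(chr(ord(c) - offset_in + 33) for c in qual)


def strip_mate_suffix(name: str) -> str:
    """Drop a trailing /1 or /2 so both ends get the same name."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality), reading 4 lines at a time."""
    it = iter(lines)
    while True:
        chunk = [ln.rstrip("\r\n") for ln in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated record at end of input")
        header, seq, plus, qual = chunk
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record: {header!r}")
        yield header[1:], seq, qual


def interleave(end1: Iterable[str], end2: Iterable[str], offset_in: int = 64) -> Iterator[str]:
    """Yield FASTQ text with each pair's two ends adjacent and sharing one name."""
    for r1, r2 in zip_longest(records(end1), records(end2)):
        if r1 is None or r2 is None:
            raise ValueError("the two files contain different numbers of reads")
        name1, name2 = strip_mate_suffix(r1[0]), strip_mate_suffix(r2[0])
        if name1 != name2:
            raise ValueError(f"mates out of sync: {r1[0]!r} vs {r2[0]!r}")
        for _, seq, qual in (r1, r2):
            yield f"@{name1}\n{seq}\n+\n{to_phred33(qual, offset_in)}\n"
```

Because records() pulls only four lines at a time, memory use stays flat even for very large inputs, and zip_longest turns files of unequal length into an error rather than a silent truncation. A hypothetical invocation would pass open handles for the two sequence.txt files to interleave() and write the result with out.writelines(...).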



        • #5
          Hi, I am using your readgenerate scripts; they are very handy. However, I notice that the ID of the second read in a pair is the same as that of the first one, e.g.:
          @readNum=1_strand=+_contig=17_pos=30714265_numends=2_pel=0_rl=36_wrv=1_si=-1_il=0_r1=000000000000000000000000000000000000_r2=000000000000000000000000000020020000
          GCTCTGAGTATCAGACACACCGTGGCCTCCCCAAGG
          +
          ::::::::::::::::::::::::::::::::::::
          @readNum=1_strand=+_contig=17_pos=30714265_numends=2_pel=0_rl=36_wrv=1_si=-1_il=0_r1=000000000000000000000000000000000000_r2=000000000000000000000000000020020000
          GGCCAAAGGGACACCGGTTTGACAACCAACAGCGTG
          +
          ::::::::::::::::::::::::::::::::::::




          There is no information about the spacing between the two reads. Did I do something wrong? How do I parse the second read's coordinates for later verification?
          Thanks.
          Last edited by tanghz; 09-20-2010, 08:36 AM.
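One way to pull the coordinates back out of these simulated read names is to split them into key=value fields. Below is a minimal Python sketch, assuming fields are joined by "_" and every key is alphanumeric; splitting only before "_key=" keeps values that themselves contain underscores (such as a contig named chr1_random) intact. The meaning of fields such as pel, wrv, si, il, r1 and r2 is not documented in this thread, so the sketch returns everything as raw strings. Since both ends carry the same name, only one pos value is available per pair. The function name is hypothetical.

```python
import re
from typing import Dict

# Split only at an "_" that begins a new "key=" token (assumed key syntax),
# so a value like "chr1_random" is not broken apart.
_FIELD_SPLIT = re.compile(r"_(?=[A-Za-z][A-Za-z0-9]*=)")


def parse_sim_read_name(header: str) -> Dict[str, str]:
    """Turn '@readNum=1_strand=+_...' into {'readNum': '1', 'strand': '+', ...}."""
    name = header.strip().lstrip("@")
    fields: Dict[str, str] = {}
    for token in _FIELD_SPLIT.split(name):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {token!r}")
        fields[key] = value
    return fields
```

Values come back as strings, so a caller would convert pos with int() before comparing it with alignment coordinates.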



          • #6
            Feel free to dig into the code on this one, as I am not supporting that read simulator very heavily; I would be happy to incorporate a patch, though. Otherwise, I would recommend the "dwgsim" tool within http://dnaa.sf.net, which I am supporting and actively maintaining.



            • #7
              Dear nilshomer,
              Thanks for your easy-to-use ill2fastq.pl script. Since I'm working on a huge dataset and need to convert from the Illumina 1.3+ format to FASTQ, I used this script. It worked well for the first 20 GB, then I got the following error:

              C:\path-to-file>perl ill2fastq.pl -s my_sequences > C:\path-to-file\file.fastq
              ON 0
              ON 1
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_one> line 4.
              Unicode character 0xfffffffffffffffe is illegal at ill2fastq.pl line 383, <FH_one> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_one> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_one> line 4.
              Unicode character 0xfffffffffffffffe is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xfffffffffffffffe is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xfffffffffffffffe is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Malformed UTF-8 character (byte 0xff) in reverse at ill2fastq.pl line 397, <FH_two> line 4.
              [several lines of unprintable binary output, in which the file name "mysequences_2_sequence.txt" appears]
              Died at ill2fastq.pl line 229.
              I tried to figure out what happened here, but can only suggest that the problem lies in Perl's encoding of strings (see http://jeremy.zawodny.com/blog/archives/010546.html and http://perldoc.perl.org/perldiag.htm...ter-%28%25s%29).
              Perhaps someone has an idea, or can provide a script that does the conversion quickly and correctly. Thanks a lot!
              Yours, Jenzo
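One defensive measure, sketched here in Python rather than Perl, is to validate each record before shifting its qualities. With Phred+64 input, any quality character below '@' (ASCII 64) would yield a negative score after the shift, and characters above '~' (ASCII 126) are not printable ASCII at all. This is a minimal sketch under those assumptions, not a diagnosis of the error above; the function name is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple


def find_bad_records(lines: Iterable[str], offset_in: int = 64) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, problem) for records that would not convert cleanly.

    Assumes 4-line FASTQ-style records; line numbers are 1-based.
    """
    it = iter(lines)
    start = 1  # line number of the current record's header
    while True:
        chunk = [ln.rstrip("\r\n") for ln in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            yield start, "truncated record at end of input"
            return
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            yield start, "header line does not start with '@'"
        if not plus.startswith("+"):
            yield start + 2, "separator line does not start with '+'"
        if len(seq) != len(qual):
            yield start + 3, f"sequence length {len(seq)} != quality length {len(qual)}"
        # Below the offset -> negative score after the shift; above 126 -> not printable ASCII.
        bad = sorted({c for c in qual if not offset_in <= ord(c) <= 126})
        if bad:
            yield start + 3, f"quality characters outside {offset_in}..126: {bad}"
        start += 4
```

To run it on a file that may contain arbitrary bytes, open the file with encoding="latin-1", which decodes every byte value without raising, and print each (line number, problem) pair the generator yields.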



              • #8
                ill2fastq.pl failed

                Hi,

                I am having difficulty using ill2fastq.pl. I have successfully used BFAST to align all of my SOLiD data, but cannot get step 1 to work for my Illumina data. I am using bfast-0.6.4e.

                This is what happens when I try to run the Perl script (my two files are named 100247_1_sequence.txt and 100247_2_sequence.txt):

                Code:
                $ perl ill2fastq.pl -s 100247
                ON 0
                Malformed UTF-8 character (byte 0xff) in reverse at ill2fastq.pl line 395, <FH_two> line 4.
                @HWUSI-E@HWUSI-EAS570R_0028:6:1:1311:1079#0/2
                Died at ill2fastq.pl line 227.

                If you can help me out, that would be great! Thanks in advance,

                Kelly



                • #9
                  From googling "Malformed UTF-8 character", it seems that something is wrong with your encoding. What is your platform/OS?
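To see where unexpected bytes such as the 0xff in the error above actually occur, a small byte-level scan can help, since sequence.txt files are expected to be plain ASCII text. Below is a minimal Python sketch; the function names are hypothetical, and the gzip check covers just one possible cause (a compressed file passed in without being decompressed), not a confirmed one.

```python
import sys
from typing import Iterable, List, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_gzipped(head: bytes) -> bool:
    """True if the first bytes of a file match the gzip magic number."""
    return head[:2] == GZIP_MAGIC


def find_unexpected_bytes(lines: Iterable[bytes], limit: int = 10) -> List[Tuple[int, int, int]]:
    """Return up to `limit` (line, column, byte) triples for non-printable or non-ASCII bytes.

    Tab, LF and CR are allowed; line and column numbers are 1-based.
    """
    hits: List[Tuple[int, int, int]] = []
    for line_no, raw in enumerate(lines, start=1):
        for col, b in enumerate(raw, start=1):
            if b > 0x7E or (b < 0x20 and b not in (0x09, 0x0A, 0x0D)):
                hits.append((line_no, col, b))
                if len(hits) >= limit:
                    return hits
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan.py 100247_2_sequence.txt
    with open(sys.argv[1], "rb") as fh:
        if looks_gzipped(fh.read(2)):
            print("file starts with the gzip magic number; decompress it first")
        fh.seek(0)
        for line_no, col, b in find_unexpected_bytes(fh):
            print(f"line {line_no}, column {col}: byte 0x{b:02x}")
```

Reading in binary mode sidesteps any locale or UTF-8 decoding, so the scan reports the raw bytes exactly as they sit on disk.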



                  • #10
                    I have a 64-bit Linux machine running Red Hat. I just tried again using bfast-0.6.5a and the same thing happened.



                    • #11
                      Can you try on a different machine?
