  • BFAST mapping paired-end reads.

    Hi, I have a set of Illumina data:
    s_7_1_sequence.txt
    s_7_2_sequence.txt

    How can I convert them into a BFAST .fq (FASTQ) file?

    Here is the ill2fastq.pl usage message:

    ill2fastq.pl [[ -b <bar code length> | -B ] -n <number of reads> -o <output prefix> -q -s] <input prefix>

    But I don't know what -q and -s stand for.



    For single-end mapping, do I still need to convert the format with this script?


    Thanks,

  • #2
    I have added a description to the latest GIT commit. It is as follows:
    Code:
    The -q option specifies that qseq.txt files are expected, while 
    the -s option specifies that sequence.txt files are expected.
    Thank you for finding these undocumented options.
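For readers wondering what a qseq.txt conversion involves, here is a minimal Python sketch that turns one qseq.txt line into a FASTQ record. It is not the code in ill2fastq.pl. It assumes the usual 11-column, tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, Phred+64 quality, filter flag) and that Sanger (Phred+33) qualities are wanted; the function name `qseq_to_fastq` and the read-name format (modelled on Illumina headers like `HWUSI-EAS570R_0028:6:1:1311:1079#0/2`) are hypothetical choices.

```python
# Sketch only (not ill2fastq.pl): convert one Illumina qseq.txt line into a
# four-line FASTQ record with Sanger (Phred+33) qualities.
# Assumed qseq columns: machine, run, lane, tile, x, y, index,
# read number, sequence, quality (Phred+64), filter flag.


def qseq_to_fastq(line: str) -> str:
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, _filter = fields
    if any(ord(c) < 64 for c in qual):
        raise ValueError("quality string is not Phred+64")
    # qseq writes '.' for no-call bases; FASTQ normally uses 'N'
    seq = seq.replace(".", "N")
    # Phred+64 -> Phred+33: shift every quality character down by 31
    qual = "".join(chr(ord(c) - 31) for c in qual)
    # Hypothetical read-name format: machine_run:lane:tile:x:y#index/read
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    print(qseq_to_fastq("M1\t28\t6\t1\t1311\t1079\t0\t1\tAC.T\thhhB\t1"), end="")
```

The quality shift is the only numeric step: Phred+64 and Phred+33 encode the same score, so subtracting 31 from each character code re-encodes it without changing the value.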



    • #3
      Thank you for the clarification; I have done it.

      Could you also clarify whether I need to convert the sequence.txt file into FASTQ with your script, or whether I can use the sequence file directly?

      thanks
      Last edited by tanghz; 09-15-2010, 01:07 PM.



      • #4
        You will have to convert your input files to the FASTQ format if they are not in that format already.
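If you are not sure whether a file is already FASTQ, a quick heuristic check is possible. The Python sketch below (hypothetical helper `sniff_fastq`) reads the first records, verifies the four-line layout, and guesses the quality offset from the range of quality characters: anything below `;` implies Phred+33, while anything above `J` is typical of Phred+64 (Illumina 1.3+). It is a heuristic rather than a validator, and it returns `None` for the offset when the observed range is ambiguous.

```python
# Heuristic sketch: does this look like FASTQ, and which quality offset?
from itertools import islice
from typing import Iterable, Optional, Tuple


def sniff_fastq(lines: Iterable[str], max_records: int = 1000) -> Tuple[bool, Optional[int]]:
    """Return (looks_like_fastq, offset); offset is 33, 64 or None if unclear."""
    sample = list(islice(lines, 4 * max_records))
    if not sample or len(sample) % 4 != 0:
        return False, None
    lo, hi = 255, 0
    for i in range(0, len(sample), 4):
        head, seq, plus, qual = (s.rstrip("\r\n") for s in sample[i:i + 4])
        if not head.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            return False, None
        if qual:
            lo = min(lo, min(map(ord, qual)))
            hi = max(hi, max(map(ord, qual)))
    if lo < 59:   # below ';' only occurs with Phred+33 (Sanger)
        return True, 33
    if hi > 74:   # above 'J' is typical of Phred+64 (Illumina 1.3+)
        return True, 64
    return True, None


if __name__ == "__main__":
    import sys
    # latin-1 decodes any byte, so binary junk cannot crash the check
    with open(sys.argv[1], encoding="latin-1") as fh:
        print(sniff_fastq(fh))
```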



        • #5
          Hi, I am using your read-generation scripts; they are very handy. However, I notice that the ID of the second read in each pair is the same as that of the first one, e.g.:
          @readNum=1_strand=+_contig=17_pos=30714265_numends=2_pel=0_rl=36_wrv=1_si=-1_il=0_r1=000000000000000000000000000000000000_r2=000000000000000000000000000000000020020000
          GCTCTGAGTATCAGACACACCGTGGCCTCCCCAAGG
          +
          ::::::::::::::::::::::::::::::::::::
          @readNum=1_strand=+_contig=17_pos=30714265_numends=2_pel=0_rl=36_wrv=1_si=-1_il=0_r1=000000000000000000000000000000000000_r2=000000000000000000000000000000000020020000
          GGCCAAAGGGACACCGGTTTGACAACCAACAGCGTG
          +
          ::::::::::::::::::::::::::::::::::::




          There is no information about the spacing between the two reads. Did I do something wrong? How do I parse the coordinates of the second read for later verification?
          Thanks.
          Last edited by tanghz; 09-20-2010, 08:36 AM.
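Since the simulated reads carry their metadata in the read name, one way to get at the coordinates is to split the name into its key=value pairs. This is a minimal sketch (hypothetical helper `parse_sim_read_name`) that assumes no value contains `_`; a contig name with an underscore would break it. It only parses what is in the name: the meaning of fields such as `pel`, `wrv`, `si`, `il`, `r1` and `r2` is not documented in this thread, so interpreting them is left to the simulator's source.

```python
# Sketch: split a simulated read name such as
# "@readNum=1_strand=+_contig=17_pos=30714265_..." into a dict.
# Assumes no value contains '_' (an underscore in a contig name would break this).
from typing import Dict


def parse_sim_read_name(header: str) -> Dict[str, str]:
    fields: Dict[str, str] = {}
    for token in header.strip().lstrip("@").split("_"):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"token without '=': {token!r}")
        fields[key] = value
    return fields


if __name__ == "__main__":
    name = "@readNum=1_strand=+_contig=17_pos=30714265_numends=2_rl=36"
    info = parse_sim_read_name(name)
    print(info["contig"], int(info["pos"]), info["strand"])
```

In the example reads, both ends share `readNum=1` and appear consecutively, so one option is to pair records by consecutive entries with the same `readNum`.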



          • #6
            Feel free to dig into the code on this one, as I am not supporting that read simulator very heavily; I would be happy to incorporate a patch, though. Otherwise, I would recommend the "dwgsim" tool within http://dnaa.sf.net, which I am supporting and actively maintaining.



            • #7
              Dear nilshomer,
              Thanks for your easy-to-use ill2fastq.pl script. Since I'm working on a huge dataset and need to convert from Illumina 1.3+ format to FASTQ, I used this script. It worked well for the first 20 GB, but then I got the following error:

              Code:
              C:\path-to-file>perl ill2fastq.pl -s my_sequences > C:\path-to-file\file.fastq
              ON 0
              ON 1
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_one> line 4.
              Unicode character 0xfffffffffffffffe is illegal at ill2fastq.pl line 383, <FH_one> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_one> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_one> line 4.
              Unicode character 0xfffffffffffffffe is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xfffffffffffffffe is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xfffffffffffffffe is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Unicode character 0xffffffffffffffff is illegal at ill2fastq.pl line 383, <FH_two> line 4.
              Malformed UTF-8 character (byte 0xff) in reverse at ill2fastq.pl line 397, <FH_two> line 4.
              [several lines of binary output; the only readable part is the string "mysequences_2_sequence.txt"]
              Died at ill2fastq.pl line 229.
              I tried to figure out what happened here, but could only guess that the problem lies in Perl's encoding of strings (http://jeremy.zawodny.com/blog/archives/010546.html and http://perldoc.perl.org/perldiag.htm...ter-%28%25s%29).
              Perhaps someone has an idea, or can provide a script that does the conversion quickly and correctly. Thanks a lot! Yours, Jenzo
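As one possible answer to the request for a fast and correct conversion, here is a minimal Python sketch. It is not ill2fastq.pl and has not been checked against BFAST itself. It assumes that the two `<prefix>_1_sequence.txt` / `<prefix>_2_sequence.txt` files are Illumina 1.3+ FASTQ with Phred+64 qualities, listing the same reads in the same order, and that the desired output puts both ends of a pair next to each other under a shared name (with the `/1` or `/2` suffix stripped). It reads and writes raw bytes, so no Unicode decoding ever happens, and it stops with an error at the first malformed record instead of writing garbage.

```python
# Sketch (not ill2fastq.pl): interleave <prefix>_1_sequence.txt and
# <prefix>_2_sequence.txt into one FASTQ on stdout, converting qualities
# from Phred+64 (Illumina 1.3+) to Phred+33. Works on raw bytes, so no
# Unicode decoding takes place. Assumes both files list the same reads
# in the same order and that names differ only by a trailing /1 or /2.
import sys
from itertools import zip_longest

# Map bytes 64..126 to 33..95; any other byte is rejected before translation
PHRED64_TO_33 = bytes.maketrans(bytes(range(64, 127)), bytes(range(33, 96)))


def read_records(fh):
    """Yield (header, sequence, quality) from a binary FASTQ stream."""
    while True:
        head = fh.readline()
        if not head:
            return
        seq, plus, qual = fh.readline(), fh.readline(), fh.readline()
        head, seq, plus, qual = (x.rstrip(b"\r\n") for x in (head, seq, plus, qual))
        if not head.startswith(b"@") or not plus.startswith(b"+") or len(seq) != len(qual):
            raise ValueError(f"malformed record at header {head[:80]!r}")
        if qual and (min(qual) < 64 or max(qual) > 126):
            raise ValueError(f"quality outside Phred+64 range at header {head[:80]!r}")
        yield head, seq, qual


def pair_name(head):
    """Drop a trailing /1 or /2 so both ends share one name (an assumption)."""
    return head[:-2] if head[-2:] in (b"/1", b"/2") else head


def interleave(fh1, fh2, out):
    """Write end 1 then end 2 of every pair to out; return the number of pairs."""
    pairs = 0
    for r1, r2 in zip_longest(read_records(fh1), read_records(fh2)):
        if r1 is None or r2 is None:
            raise ValueError("the two files contain different numbers of reads")
        name = pair_name(r1[0])
        if name != pair_name(r2[0]):
            raise ValueError(f"ends out of order: {r1[0]!r} vs {r2[0]!r}")
        for _, seq, qual in (r1, r2):
            out.write(name + b"\n" + seq + b"\n+\n" + qual.translate(PHRED64_TO_33) + b"\n")
        pairs += 1
    return pairs


if __name__ == "__main__":
    prefix = sys.argv[1]
    with open(f"{prefix}_1_sequence.txt", "rb") as f1, open(f"{prefix}_2_sequence.txt", "rb") as f2:
        n = interleave(f1, f2, sys.stdout.buffer)
    print(f"wrote {n} pairs", file=sys.stderr)
```

Saved under a hypothetical name such as `interleave_fastq.py`, it would be run as `python interleave_fastq.py s_7 > s_7.fastq`. Whether BFAST wants the `/1`/`/2` suffixes stripped and Phred+33 qualities should be confirmed against the BFAST manual before relying on this.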



              • #8
                ill2fastq.pl failed

                Hi,

                I am having difficulty using ill2fastq.pl. I have successfully used BFAST to align all of my SOLiD data, but cannot get step 1 to work for my Illumina data. I am using bfast-0.6.4e.

                This is what happens when I try to run the Perl script (my two files are named 100247_1_sequence.txt and 100247_2_sequence.txt):

                Code:
                $ perl ill2fastq.pl -s 100247
                ON 0
                Malformed UTF-8 character (byte 0xff) in reverse at ill2fastq.pl line 395, <FH_two> line 4.
                @HWUSI-E@HWUSI-EAS570R_0028:6:1:1311:1079#0/2
                Died at ill2fastq.pl line 227.

                If you can help me out that would be great! Thanks in advance,

                Kelly
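When a converter dies partway through a file, it helps to know exactly which record is malformed; the doubled-looking header in the log above hints, though it does not prove, that the input itself may be damaged. The Python sketch below (hypothetical helper `find_first_bad_record`) walks a FASTQ-style file four lines at a time in binary mode and reports the first line number where the expected structure breaks. It checks only basic structure and byte range, not biological validity.

```python
# Sketch: report the first malformed record in a FASTQ-style file,
# reading four lines at a time in binary mode.
from typing import BinaryIO, Optional, Tuple


def find_first_bad_record(fh: BinaryIO) -> Optional[Tuple[int, str]]:
    """Return (first line number of the bad record, reason), or None if all is well."""
    line_no = 0
    while True:
        rec = [fh.readline() for _ in range(4)]
        if not rec[0]:
            return None
        start = line_no + 1
        line_no += 4
        if not all(rec):
            return start, "file ends in the middle of a record"
        if any(max(line) > 0x7E for line in rec):
            return start, "non-ASCII byte (binary or compressed data?)"
        head, seq, plus, qual = (line.rstrip(b"\r\n") for line in rec)
        if not head.startswith(b"@"):
            return start, "header line does not start with '@'"
        if not plus.startswith(b"+"):
            return start, "third line does not start with '+'"
        if len(seq) != len(qual):
            return start, "sequence and quality lengths differ"


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        print(find_first_bad_record(fh) or "no malformed records found")
```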



                • #9
                  Googling "Malformed UTF-8 character", it seems that something is wrong with your encoding. What is your platform/OS?
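To check whether the input really contains non-text data, one can look for the first byte that should never appear in a plain-text sequence file (anything other than tab, newline, carriage return or printable ASCII) and print the bytes around it. This sketch (hypothetical helper `find_unexpected_byte`) reads the file in 1 MiB chunks with a bytes regular expression, so it stays usable on files of tens of gigabytes. It only locates the problem; recognising what the surrounding bytes are (for example, a gzip stream starts with the bytes `1f 8b`) is left to the reader.

```python
# Sketch: find the first byte that should not occur in a plain-text
# sequence file and show the bytes around it. Reads 1 MiB at a time.
import re
from typing import BinaryIO, Optional, Tuple

# Anything except tab, LF, CR and printable ASCII (0x20-0x7E)
UNEXPECTED = re.compile(rb"[^\t\n\r\x20-\x7e]")


def find_unexpected_byte(fh: BinaryIO, chunk_size: int = 1 << 20) -> Optional[Tuple[int, bytes]]:
    """Return (byte offset, up to 32 surrounding bytes) or None if the file is clean."""
    offset = 0
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            return None
        m = UNEXPECTED.search(chunk)
        if m:
            pos = offset + m.start()
            start = max(0, pos - 16)
            fh.seek(start)
            return pos, fh.read(pos + 16 - start)
        offset += len(chunk)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        hit = find_unexpected_byte(fh)
    if hit is None:
        print("only plain ASCII text found")
    else:
        print(f"unexpected byte at offset {hit[0]}: {hit[1]!r}")
```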



                  • #10
                    I have a 64-bit Linux machine running Red Hat. I just tried it again using bfast-0.6.5a and the same thing happened.



                    • #11
                      Can you try on a different machine?
