  • kylle345
    Member
    • Apr 2009
    • 10

    File format

    Hi,

    I have a sequencing file that looks like this:

    4_1_932_784 GGACAGTTTTTTCCAATTATGGAACGCCTGTTCCTG
    4_1_829_103 GTCACTATCTCAGTCAAAATTTAAGAAAATTGACAT
    4_1_450_206 GTGCTATATCCCTATATAACCTACCCATCCACCTTT
    4_1_495_275 GTTGTGGGAAATTGGAGCGATAAGCGTGCTTCTTCC

    It is different from the standard fastq format. Does anyone know what format this is called?

  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    #2
    Check this thread:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    • kylle345
      Member
      • Apr 2009
      • 10

      #3
      Hey thanks for the reply but..

      But I only have the .seq file and not the .prb file.

      Does anyone know how to only handle the .seq file?

      thanks


      • ECO
        --Site Admin--
        • Oct 2007
        • 1360

        #4
        What do you want to do with the .seq file? Convert to fasta? fastq?


        • kylle345
          Member
          • Apr 2009
          • 10

          #5
          Hi sorry,

          I want to convert it to a fastq file.


          • nilshomer
            Nils Homer
            • Nov 2008
            • 1283

            #6
            Originally posted by kylle345
            Hi sorry,

            I want to convert it to a fastq file.
            From what I have seen, the *seq.txt files do not contain qualities, so you will have to make dummy quality values. For single-end data, you could do something like:

            Code:
            awk '{printf("@%d:%d:%d:%d\n%s\n+\n", $1, $2, $3, $4, $5);
            for(i=0;i<length($5);i++) { printf("I"); };
            printf("\n")}' s_1_0001_seq.txt
            For paired-end data, the two reads are concatenated, so it is a little more complicated using awk, but the above should get you started.
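
A possible starting point for the paired-end case, as a minimal sketch only. It assumes the same five-column layout the command above expects (lane, tile, x, y, sequence), that both reads have the same length so the concatenated sequence can be cut at its midpoint, and that a /1 and /2 suffix is an acceptable way to name the two mates. The function name paired_seq_to_fastq is hypothetical. The "I" qualities are dummies (Phred 40 under the Sanger offset-33 encoding), not measured values.

```shell
# Hypothetical helper: split each concatenated read pair at its midpoint
# and print two FASTQ records with dummy qualities.
paired_seq_to_fastq() {
    awk '
    # Build a string of n dummy quality characters.
    function dummy(n,    s, i) {
        s = ""
        for (i = 0; i < n; i++) s = s "I"
        return s
    }
    NF >= 5 {
        half = int(length($5) / 2)      # assumes both mates have the same length
        r1 = substr($5, 1, half)
        r2 = substr($5, half + 1)
        printf("@%s:%s:%s:%s/1\n%s\n+\n%s\n", $1, $2, $3, $4, r1, dummy(length(r1)))
        printf("@%s:%s:%s:%s/2\n%s\n+\n%s\n", $1, $2, $3, $4, r2, dummy(length(r2)))
    }' "$@"
}
```

It could be run as paired_seq_to_fastq s_1_0001_seq.txt > reads.fastq, where reads.fastq is a placeholder output name. If the two reads differ in length, the midpoint cut is wrong and the known read-1 length would have to be used instead.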


            • kylle345
              Member
              • Apr 2009
              • 10

              #7
              so that will help me create a .prb file?

              Hey, thanks for the quick replies. So having a .seq file is not enough to make a fastq file, and that awk line helps me create a .prb file from the .seq?

              then the combination of .seq and .prb can create a fastq?

              thanks


              • nilshomer
                Nils Homer
                • Nov 2008
                • 1283

                #8
                Originally posted by kylle345
                Hey, thanks for the quick replies. So having a .seq file is not enough to make a fastq file, and that awk line helps me create a .prb file from the .seq?

                then the combination of .seq and .prb can create a fastq?

                thanks
                The .seq file does not store qualities, so the qualities will not have any meaning. The above awk command outputs FASTQ directly, so you do not need to worry about .seq and .prb files.

                If you have .qseq files (or .seq plus the .prb files, which you seem to be missing), then you can make a meaningful fastq file.


                • kylle345
                  Member
                  • Apr 2009
                  • 10

                  #9
                  Hi I tried the awk line but it does not place the sequences in the new file.

                  I tried awk '{printf("@%d:%d:%d:%d\n%s\n+\n", $1, $2, $3, $4, $5); for(i=0 file1.txt > file2.txt

                  the output file only contains

                  @1:0:0:0

                  +

                  @1:0:0:0

                  +

                  @1:0:0:0


                  It's missing the sequence in between the lines... is there something missing?


                  • nilshomer
                    Nils Homer
                    • Nov 2008
                    • 1283

                    #10
                    Originally posted by kylle345
                    I tried awk '{printf("@%d:%d:%d:%d\n%s\n+\n", $1, $2, $3, $4, $5); for(i=0 file1.txt > file2.txt

                    the output file only contains

                    @1:0:0:0

                    +

                    @1:0:0:0

                    +

                    @1:0:0:0


                    It's missing the sequence in between the lines... is there something missing?
                    I must admit I am an author of the alignment program BFAST (free for academic use), which does have a "qseq2fastq.pl" perl script. It may be easier to rely on such a script.
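
One likely reason the sequence line came out empty: the file shown in the first post has two whitespace-separated columns (an identifier such as 4_1_932_784, then the read), whereas the awk command earlier in the thread expects five columns. With only two fields, $5 is empty and $2 through $4 print as 0. Below is a minimal sketch that splits the identifier on underscores instead. It assumes the four parts of the identifier are lane, tile, x and y, which the thread does not confirm. The function name seq_to_fastq is hypothetical, and the qualities are still dummy values.

```shell
# Hypothetical helper: convert "lane_tile_x_y<whitespace>SEQUENCE" lines
# to FASTQ records with dummy "I" qualities.
seq_to_fastq() {
    awk '{
        n = split($1, id, "_")            # id[1..4]: assumed lane, tile, x, y
        if (n != 4 || NF < 2) next        # skip lines that do not fit the assumed layout
        qual = ""
        for (i = 0; i < length($2); i++) qual = qual "I"
        printf("@%s:%s:%s:%s\n%s\n+\n%s\n", id[1], id[2], id[3], id[4], $2, qual)
    }' "$@"
}
```

With the file names used earlier in the thread, this could be run as seq_to_fastq file1.txt > file2.txt. Each input line should then yield four output lines: the @ header, the read, a "+" line, and a run of "I" characters as long as the read.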


                    • kylle345
                      Member
                      • Apr 2009
                      • 10

                      #11
                      thanks

                      Hey,

                      I will check it out

                      Kyle
