  • m_elena_bioinfo
    Member
    • Oct 2009
    • 99

    strange Illumina txt format

    Dear NGS-users,
    I have a problem analysing reads from a single-read 36 bp SureSelect run (Illumina).

    My reads file is in .txt format.

    I usually use BWA for the alignment and then SAMtools for the pileup.
    In the alignment step, BWA gives me a 64 KB sai file and a 27 KB sam file. These files are probably incorrect and incomplete. The next step, converting sam to bam, crashes with this message:

    [samopen] SAM header is present: 657 sequences.
    [sam_read1] reference 'SN:hg18_knownGene_uc002qho.2 LN:16765

    ' is recognized as '*'.
    [main_samview] truncated file.


    I think that the problem is the strange format of the initial txt file (here is an example):
    HWUSI-EAS68R 1 3 1 995 11343 0 1 .CTTG........T.....GGGG............................ BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    BBBBBBBBBBBBBBBBBBBBBB 0
    HWUSI-EAS68R 1 3 1 995 18576 0 1 .ACAG........C.....GTTG............................ BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    BBBBBBBBBBBBBBBBBBBBBB 0

    Why is my txt file so strange?
    What could this depend on?
    Thank you very much
    M.Elena
  • m_elena_bioinfo
    Member
    • Oct 2009
    • 99

    #2
    Here is another example of my txt file:

    HWUSI-EAS68R 1 3 1 1097 20058 0 1 AGACCTCATTATTATCTGTGTGTCTGCATTTTCTAATCCTTTTTGCCCCAG ]^aaa]][]]E^^^`]]]^]aaaaaaa_a
    \aa_``_a_aa[a]]`YZ[WSY 1
    HWUSI-EAS68R 1 3 1 1097 17901 0 1 TGCTGATGAGATTTATGACTGCAAGGTGGAGCACTGGGGCCTGGACCAGCC bbbbb]^`Y`Kbbbbaaa_`^^b^b]\]_
    `bbbbbbb^b_b^bbb_b\bbb 1
    HWUSI-EAS68R 1 3 1 1097 17710 0 1 TGGCGCACCCTAAGGCTCAGTCAGTAACCCGTACACAAACTCGTCCCTGCA BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    BBBBBBBBBBBBBBBBBBBBBB 0

    • rmdavies
      Member
      • Dec 2009
      • 13

      #3
      This looks like an Illumina .qseq.txt file, although I'm a bit puzzled as to why the last base of the sequence is in its own column. Possibly this is an artifact from when you copied the data into the forum?

      Assuming that this is a .qseq.txt file, you can convert it to fastq format with the following perl script (note that this script converts the quality values to phred+33 format):

      Code:
      #!/usr/bin/perl
      
      use strict;
      use warnings;
      
      # Read tab-separated qseq records from the named files (or STDIN)
      while (<>) {
          chomp;
          my ($instr, $run_id, $lane, $tile, $x, $y, $index, $read,
              $bases, $q_line, $filter) = split /\t/, $_;
      
          # Turn dots into Ns in the base calls
          $bases =~ tr/./N/;
      
          # Shift qualities from Illumina's phred+64 encoding to phred+33;
          # characters below '@' (0x40) are clamped to '!' (quality 0)
          $q_line =~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;
      
          # Include the index in the read name only when it is non-zero
          if ($index) {
              print "\@${instr}_$run_id:$lane:$tile:$x:$y\#$index/$read\n";
          } else {
              print "\@${instr}_$run_id:$lane:$tile:$x:$y/$read\n";
          }
          print "$bases\n+\n$q_line\n";
      }
      Once you have your fastq file, you should be able to use it as input to bwa in order to get your alignment.
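
      For readers who would rather not use Perl, the same steps can be sketched in Python. This is only an illustrative port of the logic above, not part of the original script: the function name qseq_to_fastq is made up for this example, and it assumes one record per line with exactly 11 tab-separated fields and phred+64 qualities.

```python
def qseq_to_fastq(line):
    """Convert one tab-separated qseq line into a four-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    (instr, run_id, lane, tile, x, y, index, read,
     bases, quals, _filter) = fields

    # Dots mark uncalled bases in qseq; FASTQ tools expect N.
    bases = bases.replace(".", "N")

    # Shift phred+64 to phred+33 (subtract 31); anything below '@'
    # is clamped to '!' (quality 0), as in the Perl tr/// above.
    quals = "".join(chr(max(ord(c) - 31, 33)) for c in quals)

    # Build the read name; the index is added only when it is non-zero.
    name = f"@{instr}_{run_id}:{lane}:{tile}:{x}:{y}"
    if index not in ("", "0"):
        name += f"#{index}"
    return f"{name}/{read}\n{bases}\n+\n{quals}\n"
```

      Calling it on every line of the qseq file and writing the returned records to a new file should give a fastq file of the kind bwa expects.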

      • m_elena_bioinfo
        Member
        • Oct 2009
        • 99

        #4
        Thank you very much, rmdavies! You are right, my file is a .qseq one! After converting the file, I can now get the alignment with bwa. I'm a beginner with Illumina data, and I did not know what this format was.
        Thanks again for the precious help!
        M.Elena

        • Asifullah
          Junior Member
          • Aug 2010
          • 5

          #5
          Originally posted by m_elena_bioinfo:
          Thank you very much, rmdavies! You are right, my file is a .qseq one! After converting the file, I can now get the alignment with bwa. I'm a beginner with Illumina data, and I did not know what this format was.
          Thanks again for the precious help!
          M.Elena

          I am a new user of Illumina data and I am facing the same read format that you pointed out. I want to convert it into plain FASTQ format. Could you please give the command line for running this Perl script? I can't work out how to run it successfully. I would be much obliged to hear from you.
          Regards,
          Asifullah

          • epigen
            Senior Member
            • May 2010
            • 101

            #6
            convert Illumina scores into Phred scores in a BAM file

            Originally posted by rmdavies:
            Assuming that this is a .qseq.txt file, you can convert it to fastq format with the following perl script (note that this script converts the quality values to phred+33 format):
            Although this thread is quite old, I found it extremely useful. Thanks for providing the efficient way to convert Illumina scores into Phred scores, rmdavies!
            I used it to transform the scores in a BAM file:

            samtools view -h Illumina_score.bam | perl -F'\t' -lane '$"="\t"; if (/^@/) {print;} else {$F[10]=~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;print "@F"}' | samtools view -Sbh - > Phred_score.bam

            Saved us a lot of fastq file transformations and we did not have to run all the BWA alignments again.
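
            The per-line logic of the one-liner above can also be written out in Python, which may be easier to adapt. This is a sketch under assumptions, not a tested replacement: shift_sam_line is a hypothetical name, the input is assumed to be SAM text (for example from samtools view -h) with phred+64 qualities, and unlike the one-liner it leaves a QUAL of '*' (quality not stored) untouched.

```python
def shift_sam_line(line):
    """Return a SAM line with QUAL (column 11) re-encoded from phred+64 to phred+33."""
    if line.startswith("@"):
        return line  # header lines pass through unchanged

    fields = line.rstrip("\n").split("\t")

    # QUAL is the 11th mandatory column; '*' means no qualities are stored.
    if len(fields) >= 11 and fields[10] != "*":
        # Subtract 31 from each character, clamping anything below '@' to '!'.
        fields[10] = "".join(chr(max(ord(c) - 31, 33)) for c in fields[10])

    return "\t".join(fields) + "\n"
```

            Applied to each line read from standard input, with the result piped back into samtools view -b, this should behave much like the pipeline above.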

            • jamminbeh
              Member
              • Aug 2009
              • 11

              #7
              Do you know how to run the perl script in unix and make it utilize the fastq file? thanks.

              • sklages
                Senior Member
                • May 2008
                • 628

                #8
                Originally posted by jamminbeh:
                Do you know how to run the perl script in unix and make it utilize the fastq file? thanks.

                To run it: perl myScript.pl INPUT_file > newfile.fq

                Note, however, that the script is not intended to work with fastq files; it expects qseq files as input.

                • Kaas
                  Member
                  • Dec 2012
                  • 20

                  #9
                  I saved the above-mentioned script as Sanger.pl and tested it on the first 12 lines of an Illumina 1.5 fastq file:

                  perl Sanger.pl SRR329951_2.fastq.fa > newfile.fq

                  but I get the following errors for every line:
                  Use of uninitialized value in transliteration (tr///) at Sanger.pl line 23, <> line 1.
                  Use of uninitialized value in transliteration (tr///) at Sanger.pl line 29, <> line 1.
                  Use of uninitialized value $run_id in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $lane in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $tile in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $x in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $y in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $read in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $bases in concatenation (.) or string at Sanger.pl line 43, <> line 1.
                  Use of uninitialized value $q_line in concatenation (.) or string at Sanger.pl line 43, <> line 1.

                  Any idea what I am doing wrong?

                  cat SRR823966_2.fastq
                  @SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90/2
                  GGAGACTGTAGTTGGGTAGAGGGTCAGGTGTCGGGGTACTCGTGAGTTGTGTTGGCGGTTGTGTAGTTTAGTATATGTGTGATTGTTTGT
                  +SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90
                  BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                  @SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90/2
                  AAACATGTAACTTATTTATTTTTACCATTGTTGGGCTGGCGTGGTGGTTTGTGAGTGGGCCTTTGAGTTTGATGTCAGTCTGGTCTGTGT
                  +SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90
                  PPJSSS\aQQKR]JQbfQ]biiiJJRJR[RSbfhiHYbgHHO^eaeheGW[bfHW]bgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

                  head -12 newfile.fq
                  @@SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90/2_::::/

                  +

                  @GGAGACTGTAGTTGGGTAGAGGGTCAGGTGTCGGGGTACTCGTGAGTTGTGTTGGCGGTTGTGTAGTTTAGTATATGTGTGATTGTTTGT_::::/

                  +

                  @+SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90_::::/

                  +

                  @BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

                  +

                  @@SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90/2_::::/

                  +

                  @AAACATGTAACTTATTTATTTTTACCATTGTTGGGCTGGCGTGGTGGTTTGTGAGTGGGCCTTTGAGTTTGATGTCAGTCTGGTCTGTGT_::::/

                  +

                  @+SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90_::::/

                  +

                  @PPJSSS\aQQKR]JQbfQ]biiiJJRJR[RSbfhiHYbgHHO^eaeheGW[bfHW]bgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

                  +

                  @@SRR823966.19590460 FCC076MACXX:3:2308:15143:200791 length=90/2_::::/

                  +

                  @CTCGAGCAGGAGAGGGGCCCTGGCTGCTGAGGGGTCCCTGTCCAATAACCCCCACACCGATCATGTCCCTCACAGTTTCCATCTCAGACG_::::/

                  +

                  @+SRR823966.19590460 FCC076MACXX:3:2308:15143:200791 length=90_::::/

                  +

                  @BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

                  +

                  • mastal
                    Senior Member
                    • Mar 2009
                    • 666

                    #10
                    Hi Kaas,

                    What are you trying to achieve by using this script?
                    Because your starting file, SRR823966_2.fastq, looks like it's already in fastq format.

                    The script converts Illumina's old qseq.txt format to fastq.
                    Since your file is not in qseq format, the script is not extracting the right information into the variables,
                    which is why you are getting all those 'Use of uninitialized value' errors.
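
                    To make the difference concrete: the script splits each line on tabs and expects 11 fields, while a fastq line contains no tabs, so the whole line lands in the first variable and the rest stay undefined. A quick check along these lines can tell the two formats apart before converting. It is a rough Python sketch; guess_read_format is a hypothetical helper, and it assumes unwrapped four-line fastq records.

```python
def guess_read_format(lines):
    """Guess 'qseq', 'fastq', or 'unknown' from the first lines of a reads file."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not lines:
        return "unknown"

    # qseq: each record is a single line with exactly 11 tab-separated fields.
    if all(len(ln.split("\t")) == 11 for ln in lines):
        return "qseq"

    # fastq: '@' header, bases, '+' separator, qualities as long as the bases.
    if (len(lines) >= 4
            and lines[0].startswith("@")
            and lines[2].startswith("+")
            and len(lines[1]) == len(lines[3])):
        return "fastq"

    return "unknown"
```

                    Passing it the first four lines of a file is enough; only a result of 'qseq' suggests that the conversion script in this thread is the right tool.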

                    • Kaas
                      Member
                      • Dec 2012
                      • 20

                      #11
                      Thanks, mastal.
                      I arrived at this thread from http://seqanswers.com/forums/showthread.php?t=5210 because I needed to convert from Illumina 1.5 (phred64) to Illumina 1.9 (phred33). I misread and used the Perl script from this thread.
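
                      For that conversion (phred+64 fastq to phred+33 fastq), only the quality line of each record needs to change: every character is shifted down by 31. A minimal Python sketch, assuming unwrapped four-line records; the function name fastq_phred64_to_phred33 is invented for this example and is not taken from any existing tool:

```python
def fastq_phred64_to_phred33(lines):
    """Yield FASTQ lines with each quality line shifted from phred+64 to phred+33."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 3:  # the fourth line of every record holds the qualities
            # A character below '@' cannot occur in phred+64 data, so refuse
            # to convert rather than silently damage a file that is already phred+33.
            if any(ord(c) < 64 for c in line):
                raise ValueError(f"line {i + 1}: character below '@', not phred+64?")
            line = "".join(chr(ord(c) - 31) for c in line)
        yield line + "\n"
```

                      Header, sequence and '+' lines are passed through unchanged, so the output can be written straight to a new fastq file.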

                      • HESmith
                        Senior Member
                        • Oct 2009
                        • 512

                        #12
                        BFAST contains a Perl script (ill2fastq) for this conversion.
