  • strange Illumina txt format

    Dear NGS users,
    I have a problem analysing reads from a single-read 36bp SureSelect run (Illumina).

    My reads file is in .txt format.

    I usually use BWA for the alignment and then SAMtools for the pileup.
    In the alignment step, BWA returns a 64 KB .sai file and a 27 KB SAM file. These files are probably incorrect and incomplete. The next step, converting SAM to BAM, crashes with this message:

    [samopen] SAM header is present: 657 sequences.
    [sam_read1] reference 'SN:hg18_knownGene_uc002qho.2 LN:16765

    ' is recognized as '*'.
    [main_samview] truncated file.


    I think that the problem is the strange format of the initial txt file (here is an example):
    HWUSI-EAS68R 1 3 1 995 11343 0 1 .CTTG........T.....GGGG............................ BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    BBBBBBBBBBBBBBBBBBBBBB 0
    HWUSI-EAS68R 1 3 1 995 18576 0 1 .ACAG........C.....GTTG............................ BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    BBBBBBBBBBBBBBBBBBBBBB 0

    Why is my txt file so strange?
    What could be causing this?
    Thank you very much,
    M.Elena

  • #2
    Here is another example of my txt file:

    HWUSI-EAS68R 1 3 1 1097 20058 0 1 AGACCTCATTATTATCTGTGTGTCTGCATTTTCTAATCCTTTTTGCCCCAG ]^aaa]][]]E^^^`]]]^]aaaaaaa_a
    \aa_``_a_aa[a]]`YZ[WSY 1
    HWUSI-EAS68R 1 3 1 1097 17901 0 1 TGCTGATGAGATTTATGACTGCAAGGTGGAGCACTGGGGCCTGGACCAGCC bbbbb]^`Y`Kbbbbaaa_`^^b^b]\]_
    `bbbbbbb^b_b^bbb_b\bbb 1
    HWUSI-EAS68R 1 3 1 1097 17710 0 1 TGGCGCACCCTAAGGCTCAGTCAGTAACCCGTACACAAACTCGTCCCTGCA BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    BBBBBBBBBBBBBBBBBBBBBB 0



    • #3
      This looks like an Illumina .qseq.txt file, although I'm a bit puzzled as to why the last base of the sequence is in its own column. Possibly this is an artifact from when you copied the data into the forum?

      Assuming that this is a .qseq.txt file, you can convert it to fastq format with the following perl script (note that this script converts the quality values to phred+33 format):

      Code:
      #!/usr/bin/perl
      
      use strict;
      use warnings;
      
      # Read qseq lines from the files named on the command line (or STDIN)
      while (<>) {
          chomp;
          # A qseq line has 11 tab-separated fields
          my ($instr, $run_id, $lane, $tile, $x, $y, $index, $read,
              $bases, $q_line, $filter) = split /\t/, $_;
      
          # Turn dots into Ns in the base calls
          $bases =~ tr/./N/;
      
          # Convert Illumina's phred+64 quality characters to phred+33
          # (shift down by 31; anything below '@' becomes '!')
          $q_line =~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;
      
          # Include the index in the read name only when it is non-zero
          if ($index) {
              print "\@${instr}_$run_id:$lane:$tile:$x:$y\#$index/$read\n";
          } else {
              print "\@${instr}_$run_id:$lane:$tile:$x:$y/$read\n";
          }
          print "$bases\n+\n$q_line\n";
      }
      Once you have your fastq file, you should be able to use it as input to bwa in order to get your alignment.
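For readers who would rather follow the logic in Python than Perl, the per-line conversion can be sketched roughly as below. This is only a sketch, not the script from this post: the function name `qseq_to_fastq` is made up for illustration, and it assumes each input line has exactly 11 tab-separated fields and phred+64 qualities.

```python
def qseq_to_fastq(line: str) -> str:
    """Convert one tab-separated qseq line into a four-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    instr, run_id, lane, tile, x, y, index, read, bases, quals, _filter = fields

    # Uncalled bases are written as dots in qseq; FASTQ uses N
    bases = bases.replace(".", "N")

    # phred+64 -> phred+33: shift each character down by 31,
    # clamping anything that would fall below '!' (ASCII 33)
    quals = "".join(chr(max(ord(c) - 31, 33)) for c in quals)

    # Build the read name, adding the index only when it is non-zero
    name = f"@{instr}_{run_id}:{lane}:{tile}:{x}:{y}"
    if index not in ("", "0"):
        name += f"#{index}"
    name += f"/{read}"
    return f"{name}\n{bases}\n+\n{quals}\n"
```

Called on a shortened, made-up line in the same layout as the examples earlier in the thread, it returns the read name, the bases with dots turned into Ns, a `+` line, and the shifted quality string.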



      • #4
        Thank you very much rmdavies! You are right, my file is a .qseq file! After converting it, I can now get the alignment with bwa. I'm a beginner with Illumina data, and I did not know what this format was.
        Thanks again for the valuable help!!!
        M.Elena



        • #5
          Originally posted by m_elena_bioinfo
          Thank you very much rmdavies! You are right, my file is a .qseq file! After converting it, I can now get the alignment with bwa. I'm a beginner with Illumina data, and I did not know what this format was.
          Thanks again for the valuable help!!!
          M.Elena
          I am a new user of Illumina data and I am facing the same read format that you described. I want to convert it into plain FASTQ format. Could you please give the command line for running this Perl script? I can't work out how to run it successfully. My email address is ([email protected]). I would be much obliged to hear from you.
          Regards,
          asifullah



          • #6
            convert Illumina scores into Phred scores in a BAM file

            Originally posted by rmdavies
            Assuming that this is a .qseq.txt file, you can convert it to fastq format with the following perl script (note that this script converts the quality values to phred+33 format):
            Although this thread is quite old, I found it extremely useful. Thanks for providing the efficient way to convert Illumina scores into Phred scores, rmdavies!
            I used it to transform the scores in a BAM file:

            samtools view -h Illumina_score.bam | perl -lane '$"="\t"; if (/^@/) {print;} else {$F[10]=~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;print "@F"}' | samtools view -Sbh - > Phred_score.bam

            Saved us a lot of fastq file transformations and we did not have to run all the BWA alignments again.
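The same idea, applied to SAM text one line at a time, can be sketched in Python as follows. This is a sketch rather than a drop-in replacement for the one-liner above: the function names are hypothetical, the quality string is assumed to sit in the 11th tab-separated column, and unlike the Perl version it leaves a missing quality (`*`) untouched.

```python
def shift_sam_qualities(line: str) -> str:
    """Shift the QUAL column of one SAM line from phred+64 to phred+33."""
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    # Column 11 (index 10) holds the quality string; '*' means "not stored"
    if len(fields) >= 11 and fields[10] != "*":
        fields[10] = "".join(chr(max(ord(c) - 31, 33)) for c in fields[10])
    return "\t".join(fields) + "\n"


def convert_stream(src, dst) -> None:
    """Copy SAM lines from src to dst, converting qualities on the way."""
    for sam_line in src:
        dst.write(shift_sam_qualities(sam_line))
```

If this were saved in a script that calls `convert_stream(sys.stdin, sys.stdout)`, it could sit between the two `samtools view` commands in the same way as the Perl one-liner.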



            • #7
              Do you know how to run the perl script in unix and make it utilize the fastq file? thanks.



              • #8
                Originally posted by jamminbeh
                Do you know how to run the perl script in unix and make it utilize the fastq file? thanks.
                To run: perl myScript.pl qseq_file.txt > newfile.fq

                Note that the script is not intended to work with fastq files, only with qseq files.



                • #9
                  I saved the above-mentioned script as Sanger.pl and tested it on the first 12 lines of an Illumina 1.5 fastq file:

                  perl Sanger.pl SRR329951_2.fastq.fa > newfile.fq

                  but I get the following errors for all lines:
                  Use of uninitialized value in transliteration (tr///) at Sanger.pl line 23, <> line 1.
                  Use of uninitialized value in transliteration (tr///) at Sanger.pl line 29, <> line 1.
                  Use of uninitialized value $run_id in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $lane in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $tile in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $x in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $y in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $read in concatenation (.) or string at Sanger.pl line 39, <> line 1.
                  Use of uninitialized value $bases in concatenation (.) or string at Sanger.pl line 43, <> line 1.
                  Use of uninitialized value $q_line in concatenation (.) or string at Sanger.pl line 43, <> line 1.

                  Any idea what I am doing wrong?

                  cat SRR823966_2.fastq
                  @SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90/2
                  GGAGACTGTAGTTGGGTAGAGGGTCAGGTGTCGGGGTACTCGTGAGTTGTGTTGGCGGTTGTGTAGTTTAGTATATGTGTGATTGTTTGT
                  +SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90
                  BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
                  @SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90/2
                  AAACATGTAACTTATTTATTTTTACCATTGTTGGGCTGGCGTGGTGGTTTGTGAGTGGGCCTTTGAGTTTGATGTCAGTCTGGTCTGTGT
                  +SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90
                  PPJSSS\aQQKR]JQbfQ]biiiJJRJR[RSbfhiHYbgHHO^eaeheGW[bfHW]bgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

                  head -12 newfile.fq
                  @@SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90/2_::::/

                  +

                  @GGAGACTGTAGTTGGGTAGAGGGTCAGGTGTCGGGGTACTCGTGAGTTGTGTTGGCGGTTGTGTAGTTTAGTATATGTGTGATTGTTTGT_::::/

                  +

                  @+SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90_::::/

                  +

                  @BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

                  +

                  @@SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90/2_::::/

                  +

                  @AAACATGTAACTTATTTATTTTTACCATTGTTGGGCTGGCGTGGTGGTTTGTGAGTGGGCCTTTGAGTTTGATGTCAGTCTGGTCTGTGT_::::/

                  +

                  @+SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90_::::/

                  +

                  @PPJSSS\aQQKR]JQbfQ]biiiJJRJR[RSbfhiHYbgHHO^eaeheGW[bfHW]bgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

                  +

                  @@SRR823966.19590460 FCC076MACXX:3:2308:15143:200791 length=90/2_::::/

                  +

                  @CTCGAGCAGGAGAGGGGCCCTGGCTGCTGAGGGGTCCCTGTCCAATAACCCCCACACCGATCATGTCCCTCACAGTTTCCATCTCAGACG_::::/

                  +

                  @+SRR823966.19590460 FCC076MACXX:3:2308:15143:200791 length=90_::::/

                  +

                  @BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

                  +



                  • #10
                    Hi Kaas,

                    What are you trying to achieve by using this script?
                    Because your starting file, SRR823966_2.fastq, looks like it's already in fastq format.

                    The script converts Illumina's old qseq.txt format to fastq.
                    Since your file is not in qseq format, the script is not extracting the right information into the variables,
                    which is why you are getting all those 'Use of uninitialized value' errors.
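A quick way to check which format a file is in before picking a converter is to look at its first line. The Python sketch below is a rough heuristic only, with a made-up function name: it treats a line with 11 tab-separated fields as qseq and a line starting with `@` as FASTQ, and it does not validate the rest of the file.

```python
def guess_read_format(first_line: str) -> str:
    """Guess 'qseq', 'fastq' or 'unknown' from the first line of a reads file."""
    line = first_line.rstrip("\n")
    # qseq: 11 tab-separated columns, i.e. exactly 10 tab characters
    if line.count("\t") == 10:
        return "qseq"
    # FASTQ: every record starts with a header line beginning with '@'
    if line.startswith("@"):
        return "fastq"
    return "unknown"
```

On the first line of the SRR823966_2.fastq example above it returns `fastq`, which is the signal that the qseq script is the wrong tool for that file.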



                    • #11
                      Thanks mastal
                      I arrived at this thread from http://seqanswers.com/forums/showthread.php?t=5210 because I needed to convert from Illumina 1.5 (phred64) to Illumina 1.9 (Phred33). I misread and used the Perl script from this thread.
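For that case (a FASTQ file whose qualities are phred+64 and need to become phred+33), a minimal Python sketch might look like the following. It is not taken from either thread; the function name is hypothetical, and it assumes strict four-line FASTQ records with no wrapped sequence or quality lines.

```python
def fastq_phred64_to_phred33(lines):
    """Yield FASTQ lines with every quality line shifted from phred+64 to phred+33."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # In a four-line record the quality string is the fourth line
        if i % 4 == 3:
            line = "".join(chr(max(ord(c) - 31, 33)) for c in line)
        yield line
```

Header, sequence and `+` lines are passed through unchanged; only every fourth line is shifted down by 31, so `B` becomes `#`.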



                      • #12
                        BFAST contains a Perl script (ill2fastq) for this conversion.
