Hi,
the last FASTQ files I got for an RNA-seq experiment had a glitch for some reason: there
was at least one separator field that did not start with '+' the way it should, e.g.:
+HWI-ST667_0144:3:1101:1140:2047#GTGGCC/1
So I thought I could easily get around this and write a short script to filter out all weird entries:
So, while the code is probably not the smartest way to do it, it runs fine for small files, but a 12 GB FASTQ file aborts after processing about 130 MB, even though I explicitly state it should run until the end with eof(). I tried reading everything into an array first and then iterating through the array, but no luck: it still aborts after 130 MB (and uses an awful amount of memory).
Code:
#!/usr/bin/perl
use warnings;
use strict;
use Carp;

my $a = {};
my $n = 0;
my $k = 0;
my $filenames = $ARGV[0];

unless ( open( FILEIOFH, $filenames ) ) {
    croak( "Cannot open file" . $filenames );
}

while ( defined( my $line = <FILEIOFH> ) ) {
    unless ( eof(FILEIOFH) ) {
        chomp $line;
        if ( $n < 2 ) {
            $a->{$n} = $line;
            $n++;
            next;
        }
        if ( $n == 2 && $line =~ /^\+HWI/ ) {
            print $a->{0}, "\n";
            print $a->{1}, "\n";
            print $line, "\n";
            $n++;
            $k = 1;
            next;
        }
        if ( $n == 2 ) {
            $n++;
            next;
        }
        if ($k) {
            print $line, "\n";
            $n = 0;
            $k = 0;
        }
        else {
            $n = 0;
        }
    }
}
close(FILEIOFH);
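For reference, the filtering idea behind the script (walk the file in 4-line records of header, sequence, separator, quality, and keep only records whose separator line starts with '+') can be sketched as follows. This is a minimal illustration in Python, not the script in question; the function name and the sample data are made up, and the '+HWI' prefix from the example above could be passed as sep_prefix to mirror the original regex:

```python
def filter_fastq_records(lines, sep_prefix="+"):
    """Yield 4-line FASTQ records whose third (separator) line starts with sep_prefix.

    lines: iterable of already-chomped lines, grouped as
           header, sequence, separator, quality.
    """
    it = iter(lines)
    for header in it:
        try:
            seq = next(it)
            sep = next(it)
            qual = next(it)
        except StopIteration:
            break  # incomplete trailing record: drop it
        if sep.startswith(sep_prefix):
            yield header, seq, sep, qual

# Hypothetical sample: one well-formed record, one with a corrupted separator.
sample = [
    "@HWI-ST667_0144:3:1101:1140:2047#GTGGCC/1", "ACGT",
    "+HWI-ST667_0144:3:1101:1140:2047#GTGGCC/1", "IIII",
    "@HWI-ST667_0144:3:1101:1143:2113#GTGGCC/1", "TTAA",
    "corrupted-separator", "IIII",
]
good = list(filter_fastq_records(sample))
# good now holds only the first record
```

Processing record by record like this keeps memory flat regardless of file size, since only four lines are held at a time.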
So, I am missing something here, could someone help me out?
Thanks,
Marc