
  • #16
    Originally posted by yifangt
    My question is related, but not quite the same: I need to get a sub-string of the sequence and the corresponding quality scores for each entry, leaving the file format untouched. My Illumina reads are 101 bp long. I want to remove the first 10 bp and the last 25 bp so that only the middle 66 bp are left.

    The reason I need to do this is that my DNA is methylated, and the first 10 and last 25 bp seem to have poor quality in general, so I want to get rid of them. My challenge is to trim both the quality scores and the sequence correspondingly.
    Code:
    use Bio::SeqIO; use Bio::Seq::Quality;

    my $seqio = Bio::SeqIO->new(-format => 'fastq', -file => 'some.fastq');

    # the leading '>' opens the output file for writing
    my $out_fastq = Bio::SeqIO->new(-format => 'fastq', -file => '>subset.fastq');

    while (my $seq = $seqio->next_seq()) {
        # keep the id
        # substring the sequence (bases 11 to 76, i.e. 66 bp)
        # substring the quality score (bases 11 to 76, i.e. 66 bp)
        $out_fastq->write_seq($seq);
    }
    Or are there any tools out there that do this job?

    Thanks a lot!

    Yifang
    Yifang,

    First rule of coding: never write your own when a tool already exists to do what you want. The FASTX-Toolkit has a utility that does exactly this, namely fastx_trimmer. You give it an input file (fasta or fastq) and the positions of the first and last base you wish to keep (in your case 11 and 76), and it will produce a trimmed file (both bases and qualities).


    • #17
      Thanks a lot kmcarr! I was searching for it, just missed the inside. I will give it a try.

      Best!
      Yifang


      • #18
        Code:
        awk '{print ; getline } {print substr($0, 11, 66) ; getline; print ; getline ; print substr($0, 11, 66) }' input.fastq
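
For anyone who would rather do this in Python, here is a rough sketch of the same trimming. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and 101 bp reads as described above. The function name `trim_fastq_records` and its list-based interface are made up for illustration and are not part of FASTX-Toolkit or BioPerl. Note the coordinate conversion: keeping bases 11 to 76 (1-based, inclusive) means 66 characters, which is the slice `[10:76]` in Python and `substr($0, 11, 66)` in awk, since awk's third argument is a length rather than an end position.

```python
from typing import Iterable, Iterator


def trim_fastq_records(lines: Iterable[str], first: int = 11, last: int = 76) -> Iterator[str]:
    """Yield FASTQ lines, trimming sequence and quality to bases first..last.

    `first` and `last` are 1-based and inclusive (the convention used in the
    thread), so they are converted to the 0-based, end-exclusive slice
    [first - 1 : last].
    """
    start = first - 1
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # In a 4-line record, line 1 is the sequence and line 3 the quality string.
        if index % 4 in (1, 3):
            yield line[start:last]
        else:
            # Header ("@...") and separator ("+") lines pass through unchanged.
            yield line


# Synthetic 101 bp read: 10 bases to drop, 66 to keep, 25 to drop.
record = [
    "@read1",
    "A" * 10 + "C" * 66 + "G" * 25,
    "+",
    "#" * 10 + "I" * 66 + "!" * 25,
]
trimmed = list(trim_fastq_records(record))
```

For a real file, the same generator could be fed an open file handle and its output written line by line. For routine work, a dedicated tool is still the safer choice.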


        • #19
          This is a cool script. Thanks gpcr! Yifang
