  • #16
    OK, fine, I get it. The report says HiSeq 2000, base calling pipeline HiSeq Control Software v1.4.5.
    But by default bowtie identifies the quality scale. However, I am now piping the fastq file through the perl script from the link you suggested; I will see when the run is finished.
    I am new, maybe that's why it is hard like this.
    Thanks a lot.


    • #17
      Software to filter errors in fastq files?

      Bowtie doesn't identify the scale by default; phred33 is the default
      scale bowtie will use unless you specify that your files use a different scale.

      But still, I don't think there are any quality encodings that would give you a value of -93.

      And yes, we've all been there, things always seem more complicated at the beginning.
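To check which scale a file actually uses, one option is to look at the lowest ASCII character in the quality lines. The following is only a rough heuristic sketch, not what bowtie does internally; the function name `guess_quality_offset` is made up for illustration, and a small or unusually high-quality sample can fool it.

```python
def guess_quality_offset(quality_lines):
    """Guess the Phred ASCII offset (33 or 64) from FASTQ quality strings.

    Heuristic only: returns 33, 64, or None when the range is ambiguous.
    """
    lowest = min(
        (ord(ch) for line in quality_lines for ch in line.strip()),
        default=None,
    )
    if lowest is None:
        return None  # no quality characters seen
    if lowest < 59:
        return 33    # characters below ';' cannot occur with a +64 offset
    if lowest >= 64:
        return 64    # nothing below '@' seen; most likely a +64 offset
    return None      # 59..63: could be Solexa+64 or high-quality Phred+33


if __name__ == "__main__":
    print(guess_quality_offset(["!''*((((***+", "IIIIIIIIIIII"]))
```

If the guess comes back as 64, the aligner needs to be told explicitly, since phred33 is what it assumes otherwise.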


      • #18
        Hello guys,
        I saw this summary file in the tophat output folder.
        What does it mean, and why is it only 64%? That seems very low. I googled a bit but became more confused.
        What can I do to improve the mapping? I used the default settings of tophat.

        Left reads:
        Input: 63588486
        Mapped: 41120473 (64.7% of input)
        of these: 5143253 (12.5%) have multiple alignments (2 have >20)
        Right reads:
        Input: 63588486
        Mapped: 38423206 (60.4% of input)
        of these: 4773086 (12.4%) have multiple alignments (0 have >20)
        62.5% overall read alignment rate.

        Aligned pairs: 31409898
        of these: 3418180 (10.9%) have multiple alignments
        and: 24649 ( 0.1%) are discordant alignments
        49.4% concordant pair alignment rate.
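
For reference, the percentages in this summary can be reproduced from the raw counts. The sketch below uses only the numbers shown above. The formula for the concordant pair rate (aligned pairs minus discordant pairs, divided by input pairs) is an assumption that happens to match the reported value, not something taken from the tophat documentation.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts copied from the summary above
left_input, left_mapped = 63588486, 41120473
right_input, right_mapped = 63588486, 38423206
aligned_pairs, discordant = 31409898, 24649

left_rate = percent(left_mapped, left_input)
right_rate = percent(right_mapped, right_input)
# Overall rate: all mapped reads over all input reads (both mates)
overall_rate = percent(left_mapped + right_mapped, left_input + right_input)
# Assumed formula: concordant pairs = aligned pairs - discordant pairs
concordant_rate = percent(aligned_pairs - discordant, left_input)

print(left_rate, right_rate, overall_rate, concordant_rate)
```

So the 64.7% and 60.4% figures are per-mate mapping rates, 62.5% is their combined rate, and 49.4% is the share of read pairs where both mates aligned in a consistent way.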

        thanks


        • #19
          There are a few questions you'll need to answer before anyone can help you:
          1) How long are the reads?
          2) Have you quality trimmed yet?
          3) What organism is this?
          4) What reference did you use?
          5) What version of tophat/bowtie was this?
          6) What was the exact command line argument used to start alignment?
          7) What sort of experiment was this from?


          • #20
            Originally posted by dpryan View Post
            There are a few questions you'll need to answer before anyone can help you:
            1) How long are the reads?
            2) Have you quality trimmed yet?
            3) What organism is this?
            4) What reference did you use?
            5) What version of tophat/bowtie was this?
            6) What was the exact command line argument used to start alignment?
            7) What sort of experiment was this from?

            OK, I see.
            1) Reads are 100 bp. I did some cleaning of the fastq file.
            2) No, I did not, and honestly I don't know how.
            3) The organism is a plant.
            4) NCBI mRNA.
            5) Bowtie 2.
            6) tophat2, path to ref.fa, path to fastq files A1 and A2.
            7) RNA-seq.


            • #21
              You might use trim_galore/trimmomatic/etc. to quality trim the reads and align again. Also, since you're aligning directly to the transcriptome, your alignment rate will be lower if whichever plant you're using doesn't have a particularly complete reference transcriptome.
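
To give an idea of what "quality trimming" means, here is a minimal sketch that cuts low-quality bases off the 3' end of a read. This is a simplified stand-in, not the algorithm trim_galore or trimmomatic actually use; the function name `trim_3prime`, the threshold of 20, and the phred33 offset are all assumptions for illustration.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred score is below min_q.

    seq and qual are the sequence and quality strings of one FASTQ record
    (without newlines); offset is the ASCII offset of the quality scale.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is too low
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 on the phred33 scale
    print(trim_3prime("ACGTACGT", "IIIII###"))
```

The real tools also handle adapters, paired files and minimum read lengths, so they are the better choice for actual data.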


              • #22
                Originally posted by dpryan View Post
                You might use trim_galore/trimmomatic/etc. to quality trim the reads and align again. Also, since you're aligning directly to the transcriptome, your alignment rate will be lower if whichever plant you're using doesn't have a particularly complete reference transcriptome.
                Please, what is this command-line tool? I have never used it before. How can I get it?
                thanks


                • #23
                  Originally posted by aforntacc View Post
                  Please, what is this command-line tool? I have never used it before. How can I get it?
                  thanks
                  Have you googled for "trim_galore" or "trimmomatic"? They come with some documentation.


                  • #24
                    This was great thanks!!


                    • #25
                      filtering bad reads

                      Originally posted by simonandrews View Post
                      If it's useful to anyone this is a small script I knocked up when we had to process some fastq files which were corrupted during an FTP transfer. You can pipe data through it and it does some basic sanity checks to ensure that the file looks like valid fastq data. It will remove any entries which look broken and leave you just the good stuff.

                      Code:
                      #!/usr/bin/perl
                      use warnings;
                      use strict;
                      
                      while (<>) {
                      
                        unless (/^\@/) {
                          warn "$_ should have had an \@ at the start and it didn't\n";
                          next;
                        }
                        my $id1 = $_;
                        my $seq = <>;
                        my $id2 = <>;
                        my $qual = <>;
                      
                        if ($seq =~/^[@+]/) {
                          warn "Sequence '$seq' looked like an id";
                          next;
                        }
                        if ($qual =~/^[@+]/) {
                          warn "Quality '$qual' looked like an id";
                          next;
                        }
                        if ($id2 !~ /^\+/) {
                          warn "Midline '$id2' didn't start with a +";
                          next;
                        }
                      
                        if ($qual =~ /[GATCN]{20,}/) {
                          warn "Quality '$qual' looked like sequence";
                          next;
                        }
                      
                        if (length($seq) != length($qual)) {
                          warn "Seq $seq and Qual $qual weren't the same length";
                          next;
                        }
                      
                        print $id1,$seq,$id2,$qual;
                      
                      
                      }

                      Thank you so much for the script, I used it and it worked with my reads and left only good quality reads (which I managed to map using tophat). I have one question though, will filtering these reads affect any downstream analysis (e.g. cuffdiff step) where differential gene expression is dependent on read quantity between my conditions? I'm a biologist by training and have recently started working with RNA-Seq data. Any response will be highly appreciated.


                      • #26
                        Quality-filtering will always introduce bias on a platform where quality is affected by sequence composition; I don't recommend it for quantitative analysis like differential expression. It's better to quality-trim or simply use an aligner that is capable of mapping the low-quality reads, like BBMap.


                        • #27
                          Thank you so much for your response, Brian. I will explore BBMap in the meantime. Does anyone know what this error means? It occurred while filtering the reads using the perl script above.

                          Can't locate object method "With" via package "Quote" (perhaps you forgot to load "Quote"?) at ./perlscript.pl line 44, <> line 847195456

                          Thanks.


                          • #28
                            Sorry, I don't use Perl.
                            Last edited by Brian Bushnell; 12-04-2015, 10:00 AM.


                            • #29
                              Thanks so much, Brian Bushnell. The reason we are trying to clean the reads is that our tophat jobs were running out of time on the server before completion. Our raw read files are very big (~13GB per .gz file), and we were thinking that some of the reads might not be of good quality. If that is the case, why not remove them and map only good quality reads? Again, this might not be a good approach, and that is why I'm seeking help.


                              • #30
                                @Ntobe: That perl script referenced above is only for checking if a file has corrupt fastq records. Is that what you are using it for?

                                Have you scanned and trimmed (if needed) your raw data files to remove adapter contamination? You can speed up Tophat jobs by using multiple threads. Have you tried using that option?
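
As a rough illustration of the threading option, tophat accepts `-p` for the number of threads and `-o` for the output directory. The sketch below only builds and prints such a command line; the index name `ref_index`, the file names and the thread count are placeholders, so adjust them to your own setup.

```python
import shlex


def tophat_command(index_prefix, reads_1, reads_2, threads=8, out_dir="tophat_out"):
    """Build a paired-end tophat2 command line using multiple threads.

    All argument values are placeholders; nothing is executed here.
    """
    return [
        "tophat2",
        "-p", str(threads),  # number of threads
        "-o", out_dir,       # output directory
        index_prefix,        # bowtie index base name (hypothetical)
        reads_1,
        reads_2,
    ]


cmd = tophat_command("ref_index", "A1.fastq", "A2.fastq", threads=8)
print(shlex.join(cmd))
```

The number of threads you can usefully request depends on how many cores the server job is allowed to use.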
