  • Trouble with box-plot of quality data in SOLiD

    Hi everyone,
    I used SOLiD2std.pl to convert the csfasta and qual files to standard FASTQ files.
    I then ran FastQC to view boxplots of the quality data and got the attached results.
    Quality appears to drop at every fifth nucleotide, as if primer 5 of the procedure had failed.
    Has anyone seen this type of quality plot before?
    I am new to SOLiD data; I have worked with Illumina and 454 before.
    Thanks in advance for any guidance.
    Cheers,
    maximo
    Attached Files
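
One way to check the every-fifth-position pattern directly on the original qual file, before any conversion, is to average the quality values per position and per position modulo 5. The sketch below is only an illustration: it assumes the usual SOLiD qual layout (`#` comment lines, `>` read-name lines, then space-separated integer qualities, with `-1` for a missing call), and the function name is hypothetical. It also assumes that positions sharing the same remainder modulo 5 come from the same primer round.

```python
from collections import defaultdict


def mean_quality_by_position(qual_lines, period=5):
    """Return (per-position means, per-phase means) for SOLiD qual lines.

    A phase is position % period; with period=5, each phase is assumed
    to correspond to one primer round.
    """
    pos_sum = defaultdict(int)
    pos_n = defaultdict(int)
    for line in qual_lines:
        line = line.strip()
        # Skip blank lines, '#' comment headers and '>' read-name lines.
        if not line or line[0] in "#>":
            continue
        for pos, q in enumerate(int(x) for x in line.split()):
            if q < 0:  # assumed marker for a missing color call
                continue
            pos_sum[pos] += q
            pos_n[pos] += 1

    per_pos = {p: pos_sum[p] / pos_n[p] for p in sorted(pos_n)}

    phase_sum = defaultdict(int)
    phase_n = defaultdict(int)
    for p in pos_n:
        phase_sum[p % period] += pos_sum[p]
        phase_n[p % period] += pos_n[p]
    per_phase = {ph: phase_sum[ph] / phase_n[ph] for ph in sorted(phase_n)}
    return per_pos, per_phase
```

If one phase has a clearly lower mean than the other four, the drop is already present in the raw color-space qualities and is not an artifact of the conversion script.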

  • #2
    You cannot convert colorspace to basespace directly. See here why:
    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc
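
To illustrate the point, here is a minimal sketch of naive color-to-base decoding. It uses the standard two-base encoding, in which each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3), and assumes a csfasta-style read that starts with the primer base. The function name is hypothetical and this is not the logic of any particular conversion script.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_colorspace(csread):
    """Naively decode a csfasta read such as 'T0123' into base space."""
    # The leading letter is the last primer base; it is not part of the read.
    prev = BASES.index(csread[0])
    decoded = []
    for color in csread[1:]:
        if color not in "0123":
            # A missing call ('.') makes every later base undecodable.
            break
        prev ^= int(color)  # next base = previous base XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)


def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))
```

For example, `decode_colorspace("T0123")` gives `TGAT`, while the same read with a single wrong color, `T0023`, gives `TTCG`: every base from the error onward changes. Because each base depends on all preceding colors, one color error corrupts the rest of the read, which is why alignment is normally done in color space.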

    • #3
      An alternative way to use FastQC, which is very nice, is to align first and use the resulting BAM file as input.

      Another program I have looked at for duplicate analysis is prinseq, which also offers graphical output.

      • #4
        Did you remove adapter sequences or trim your reads?
        What about the other FastQC results: did they pass or fail?
        Krishna

        • #5
          Originally posted by Krish_143:
          Did you remove adapter sequences or trim your reads?
          What about the other FastQC results: did they pass or fail?
          Some modules passed and others failed.
          I did NOT remove adapters. I am not sure whether that was done before I received the csfasta and qual files.
          Given that this is a direct conversion from color space to base space, it's understandable that the results may be confusing. (In all, about 45% of reads mapped to a reference with color-space novoAlign, and only 15% using color-space TopHat/Bowtie.)
          Any thoughts?
          PASS Basic Statistics Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          FAIL Per base sequence quality Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          WARN Per sequence quality scores Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          WARN Per base sequence content Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          FAIL Per base GC content Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          FAIL Per sequence GC content Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          WARN Per base N content Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          PASS Sequence Length Distribution Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          PASS Sequence Duplication Levels Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          PASS Overrepresented sequences Corrida_4_FC_1_01_01CVATFR001_F3.fastq
          FAIL Kmer Content Corrida_4_FC_1_01_01CVATFR001_F3.fastq
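
A listing like the one above can be tallied with a few lines of Python. The sketch below assumes each line has the form status, module name, file name, separated by tabs as in FastQC's `summary.txt`, and falls back to whitespace splitting for pasted text where the tabs were lost (this fallback assumes the file name contains no spaces). The function names are hypothetical.

```python
STATUSES = {"PASS", "WARN", "FAIL"}


def parse_fastqc_summary(lines):
    """Map each FastQC module name to its PASS/WARN/FAIL status."""
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if "\t" in line:
            status, module, _filename = line.split("\t", 2)
        else:
            # Tabs lost in pasting: first token is the status,
            # last token is the file name, the rest is the module name.
            status, rest = line.split(None, 1)
            module, _filename = rest.rsplit(None, 1)
        if status not in STATUSES:
            raise ValueError(f"unexpected status in line: {line!r}")
        results[module] = status
    return results


def modules_with_status(results, status):
    """List the modules that reported the given status."""
    return [module for module, s in results.items() if s == status]
```

Calling `modules_with_status(parse_fastqc_summary(lines), "FAIL")` on the lines above would list the failing modules, which makes it easier to compare several runs or files at a glance.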

          • #6
            I would recommend using SAET to clear up any colour errors before alignment. Normally I get 1-5% more aligned reads using this (apparently a lot more with ECC data).

            I don't think your fastqc boxplot results are too bad, but I've seen better ones.
