  • apredeus
    Senior Member
    • Jul 2012
    • 151

    MiSeq producing various length reads

    Hello all

    I'm processing a micro-RNA-seq experiment for a collaborator of ours, and I see a very unusual thing. They sequenced three samples on a MiSeq, with an expected read length of 51. Instead, I see lots of reads of length 20-21 that are all N (NNNNNNNNNNNNNN), and quite a few reads of intermediate length too.

    This is very unusual - do you have any idea about why it might have happened?
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Are you saying that there are actual NNNN or just short(er) than 51 bp reads?

    If there are N's then that may indicate a failure of basecalling. It could be due to overloading. Generally sequencing facilities will not release this kind of data.

    If that is a result of some sort of post-run data processing (where they replaced the adapter sequences with N's for example, don't know if BaseSpace does something like that) then you would need to ask. If you ignore/strip the N's is the rest of the data good quality?
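A quick way to answer the question above (all-N reads versus reads that are merely shorter than expected) is to tabulate read lengths and N content directly from the fastq file. The following is a minimal sketch, assuming a plain 4-line-per-record fastq; the function names `read_fastq` and `summarize` are made up for this illustration and the toy records are not from the thread's data.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        name = handle.readline().strip()
        if not name:
            return  # end of file (a blank line is also treated as the end here)
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield name, seq, qual


def summarize(handle):
    """Count reads per length, fully N-masked reads, and partially N-masked reads."""
    lengths = Counter()
    total = all_n = partial_n = 0
    for _, seq, _ in read_fastq(handle):
        total += 1
        lengths[len(seq)] += 1
        n_count = seq.upper().count("N")
        if seq and n_count == len(seq):
            all_n += 1      # the whole read is N
        elif n_count:
            partial_n += 1  # some N bases, but not all
    return {"total": total, "all_n": all_n,
            "partial_n": partial_n, "lengths": dict(lengths)}


# Toy input: one normal read, one all-N read, one read with trailing Ns.
toy = io.StringIO(
    "@r1\nTGAGGTAGTAGGTTGTATAGTT\n+\n" + "I" * 22 + "\n"
    "@r2\n" + "N" * 20 + "\n+\n" + "!" * 20 + "\n"
    "@r3\nACGTNNNN\n+\nIIII!!!!\n"
)
print(summarize(toy))
```

For a real file, the same `summarize` call should work on a handle from `open(path)` or `gzip.open(path, "rt")`.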


    • apredeus
      Senior Member
      • Jul 2012
      • 151

      #3
      There are a bunch of all-N reads that are 20 bp long, and there are a bunch of other reads that are not all N but have variable length. I'll try to align them to see whether they at least look like micro-RNA, but the thing is, you need to clip the adapters, and that is hard to do on variable-length reads.

      It does not look like the cell is overloaded from FastQC report though. It looks like there's a small bubble there but that's all.

      It was not a sequencing facility that did it - just a small institute that ran it on their MiSeq. So they totally might have done something wrong there; they don't often run this sort of library - mostly they sequence strains of viruses.


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Reads don't come off the machine with variable length unless you set the Illumina software to trim the adapters during base-calling or demultiplexing or something (not sure exactly when it happens), or they've been postprocessed in some way. You should ask how the data was generated, or better yet, see if you can get the raw fastq data.


        • apredeus
          Senior Member
          • Jul 2012
          • 151

          #5
          Those were supposed to be raw fastq. But you are right, I was thinking along the same lines. I'll just come over and get the data from the device myself.


          • jdk787
            josh kinman
            • Apr 2014
            • 72

            #6
            I've seen this with short small RNA libraries when using MiSeq Reporter to demultiplex with automatic adapter trimming.

            To fix this, you can redemultiplex the run with bcl2fastq, or remove the adapter sequences from your sample sheet and redemultiplex with MiSeq Reporter. Then just trim the adapters yourself.
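For the "trim the adapters yourself" step, here is a minimal sketch of 3' adapter clipping on fixed-length reads. It assumes exact matching only (no mismatches or indels, which real trimmers handle), and it assumes the library was made with the Illumina TruSeq Small RNA 3' adapter; neither the kit nor the adapter sequence is stated in this thread, so substitute your own. The function name `clip_3prime_adapter` and the `min_overlap` parameter are hypothetical.

```python
# Assumption: TruSeq Small RNA 3' adapter; replace with the adapter actually used.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def clip_3prime_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Remove a 3' adapter that is either fully present or cut off by the read end.

    Exact matches only. At each position the rest of the read is compared with
    the adapter prefix of the same length (or the full adapter if it fits).
    """
    for start in range(len(read)):
        tail = read[start:]
        probe = adapter[:len(tail)]
        if len(probe) >= min_overlap and tail.startswith(probe):
            return read[:start]  # keep only the insert
    return read  # no adapter detected


# A 22-nt insert followed by the full adapter and 8 more cycles (51 total).
insert = "TGAGGTAGTAGGTTGTATAGTT"
read = insert + ADAPTER + "AACTCCAG"
print(len(read), clip_3prime_adapter(read))
```

Because the comparison also accepts a truncated adapter at the read end, longer inserts whose adapter is only partly sequenced are clipped as well.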
            Josh Kinman


            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Originally posted by apredeus View Post
              Those were supposed to be raw fastq. But you are right, I was thinking along the same lines. I'll just come over and get the data from the device myself.
              If you can't get the raw data or can't get the facility to re-run the analysis then just trim the N's off. One can safely assume that Illumina would know how to identify their own adapter sequences. It sounds like they are masked by the default demux process.

              @Brian: What is an easy way to trim those N's using BBMap? I should add this to my BBMap tricks thread.


              • jdk787
                josh kinman
                • Apr 2014
                • 72

                #8
                Originally posted by GenoMax View Post
                If you can't get the raw data or can't get the facility to re-run the analysis then just trim the N's off. One can safely assume that Illumina would know how to identify their own adapter sequences. It sounds like they are masked by the default demux process.
                I couldn't find this info for MiSeq Reporter, but I did see this in the bcl2fastq guide:

                --mask-short-adapter-reads arg (=22) smallest number of remaining bases (after masking bases below the minimum trimmed read length) below which whole read is masked

                So it looks possible that the adapters are being identified correctly, but the read that remains after trimming is shorter than 22 bp and is therefore being masked with N's.

                Since this is micro RNA, I think it is worth trying to redemux without adapter trimming or changing this variable in order to unmask these reads instead of removing them. Doing this has worked for me when sequencing Small RNA libraries on the MiSeq.
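The behaviour described above can be sketched as a toy model. This is not Illumina's implementation: it uses exact adapter matching, and the length of the masked output (here, the length of the trimmed remainder) is an assumption made for illustration. The function name `trim_and_mask` is hypothetical; only the default of 22 comes from the quoted `--mask-short-adapter-reads` help text.

```python
def trim_and_mask(read, adapter, mask_short_adapter_reads=22):
    """Toy model: trim the adapter, then N-mask reads left shorter than the threshold."""
    pos = read.find(adapter)
    if pos == -1:
        return read  # no adapter found: read is left untouched
    remaining = read[:pos]  # bases left after adapter trimming
    if len(remaining) < mask_short_adapter_reads:
        # Assumption: the masked read keeps the length of the trimmed remainder.
        return "N" * len(remaining)
    return remaining


adapter = "TGGAATTCTCGGGTGCCAAGG"  # assumed small RNA 3' adapter, not from the thread
mirna_22 = "TGAGGTAGTAGGTTGTATAGTT"
mirna_21 = mirna_22[:21]

print(trim_and_mask(mirna_22 + adapter, adapter))      # 22 nt remain, kept
print(trim_and_mask(mirna_21 + adapter, adapter))      # 21 nt remain, all N
print(trim_and_mask(mirna_21 + adapter, adapter, 0))   # threshold 0, kept
```

Under this model, a typical 20-21 nt micro-RNA insert falls just below the default threshold, which would be consistent with the all-N reads of length 20-21 reported at the top of the thread, and lowering the threshold leaves those reads unmasked.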
                Josh Kinman


                • jdk787
                  josh kinman
                  • Apr 2014
                  • 72

                  #9
                  From MiSeq Reporter User Guide

                  Masking Short Reads
                  MiSeq Reporter includes a setting that prevents reads that have been almost entirely trimmed or masked from confounding downstream analysis, which is based on the following criteria:
                  - If the adapter is encountered within the first 32 bases of the read, the adapter sequence is N-masked.
                  - If the adapter is identified in the first 32 bases and the read includes ten or more bases from the start of the adapter, the entire read is N-masked. This ten-base limit is controlled by the configuration setting NMaskShortAdapterReads.
                  Josh Kinman


                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    #10
                    Originally posted by GenoMax View Post
                    One can safely assume that Illumina would know how to identify their own adapter sequences.
                    I'd like to think so...

                    What is an easy way to trim those N's using BBMap? I should add this to my BBMap tricks thread.
                    You can use BBDuk or Reformat with "qtrim=rl trimq=1". That will only trim trailing and leading bases with a Q-score below 1, which means Q0, which means N (in either fasta or fastq format). The BBMap package automatically changes the Q-scores of Ns that are above 0 to 0, and of called bases with Q-scores below 2 to 2, since occasionally some Illumina software versions produce odd things like a handful of Q0 called bases or Ns with Q>0, neither of which makes any sense on the Phred scale.
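To illustrate the effect of that trimming (not how BBDuk itself works, which trims on quality rather than on the base letter), here is a minimal sketch that strips leading and trailing N bases while keeping the quality string in sync. It relies on the assumption, stated above, that N and Q0 coincide at the read ends; the function name `trim_terminal_n` is made up for this example.

```python
def trim_terminal_n(seq, qual):
    """Strip leading and trailing N bases; internal Ns are left in place."""
    left = 0
    right = len(seq)
    while left < right and seq[left] in "Nn":
        left += 1
    while right > left and seq[right - 1] in "Nn":
        right -= 1
    return seq[left:right], qual[left:right]


print(trim_terminal_n("NNACGTNNN", "!!IIII!!!"))
print(trim_terminal_n("NNNNNNNN", "!!!!!!!!"))  # fully masked read becomes empty
```

Fully masked reads come out empty, so a downstream length filter is still needed to drop them.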

                    @jdk787, thanks for posting the specific details of what's going on. Looks like defaults that make sense in many cases but not for small RNAs.


                    • agent_pilin
                      Junior Member
                      • Apr 2019
                      • 4

                      #11
                      Originally posted by apredeus View Post
                      Hello all

                      I'm processing a micro-RNA-seq experiment for a collaborator of ours, and I see a very unusual thing. They sequenced three samples on a MiSeq, with an expected read length of 51. Instead, I see lots of reads of length 20-21 that are all N (NNNNNNNNNNNNNN), and quite a few reads of intermediate length too.

                      This is very unusual - do you have any idea about why it might have happened?
                      Hello Alexander, have you already found the reason for this problem?
                      I have the same problem with my latest sequencing data: read 1 is supposed to be 41 bp long, but the actual length varies from 35 bp to 41 bp, and some of the reads are poly-N!


                      • GenoMax
                        Senior Member
                        • Feb 2008
                        • 7142

                        #12
                        Are your sequences adapter masked or are there genuine N's (no calls)?


                        • agent_pilin
                          Junior Member
                          • Apr 2019
                          • 4

                          #13
                          Originally posted by GenoMax View Post
                          Are your sequences adapter masked or are there genuine N's (no calls)?
                          I think these are masked adapter sequences, but I did not perform the sequencing experiment myself; I only process the raw fastq data.


                          • apredeus
                            Senior Member
                            • Jul 2012
                            • 151

                            #14
                            Originally posted by agent_pilin View Post
                            Hello Alexander, have you already found the reason for this problem?
                            I have the same problem with my latest sequencing data: read 1 is supposed to be 41 bp long, but the actual length varies from 35 bp to 41 bp, and some of the reads are poly-N!
                            Hello,

                            I don't quite remember, since it was a long time ago, but I'm pretty sure this happened because the Illumina software was confused by the adapter and the short read sequence. So you would need to get the untrimmed sequences. If these are not available, get the BCL files and convert them to fastq yourself.


                            • agent_pilin
                              Junior Member
                              • Apr 2019
                              • 4

                              #15
                              Originally posted by apredeus View Post
                              Hello,

                              I don't quite remember, since it was a long time ago, but I'm pretty sure this happened because the Illumina software was confused by the adapter and the short read sequence. So you would need to get the untrimmed sequences. If these are not available, get the BCL files and convert them to fastq yourself.
                              Thank you for your answer, it's a good idea!
                              Stanislav
