  • Barcodes and Primers

    Hello everyone! I realize this question has been asked and answered, but even after reading quite a bit I can't decide what I have or don't have...dang!

    My data are in fastq format and were downloaded from BaseSpace for third-party analysis. I ran "make.contigs" in Mothur 1.31.2 on my first sample, using both the R1 and R2 files, which I assume are my forward and reverse paired reads. It ran fine with no errors, but as I read the sequences it really looks like they must still have barcodes and/or primers attached. I keep seeing that Illumina fastq files that have been de-multiplexed should already be trimmed of barcodes and primers. Is this correct, or do I need to somehow come up with an oligos file?

    Below is a sample of the first few reads. Thank you very much for any help you can provide. This is my first run-through with Illumina data, and my mentor is no help as he is apparently swamped right now.

    >M02146_10_000000000-A51MH_1_1101_13422_1525
    TACGGAGGGAGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGTACGTAGGCGGCTATTCAAGTCAGAGGTGAAAGCCCGGGGCTCAACCCCGGAACTGCCTTTGAAACTAGGTAGCTAGAGTCTTGGAGAGGTTAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAACACCAGTGGCGAAGGCGACTAACTGGACAAGTACTGACGCTGAGGTACGAAAGCGTGGGGAGCAAACAGG
    >M02146_10_000000000-A51MH_1_1101_13396_1529
    TACGGAGGGAGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGTACGTAGGCGGCTATTCAAGTCAGAGGTGAAAGCCCGGGGCTCAACCCCGGAACTGCCTTTGAAACTAGGTAGCTAGAGTCTTGGAGAGGTTAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAACACCAGTGGCGAAGGCGACTAACTGGACAAGTACTGACGCTGAGGTACGAAAGCGTGGGGAGCAAACAGG
    >M02146_10_000000000-A51MH_1_1101_13412_1540
    TACGGAGGGAGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGTACGTAGGCGGCTATTCAAGTCAGAGGTGAAAGCCCGGGGCTCAACCCCGGAACTGCCTTTGAAACTAGGTAGCTAGAGTCTTGGAGAGGTTAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAACACCAGTGGCGAAGGCGACTAACTGGACAAGTACTGACGCTGAGGTACGAAAGCGTGGGGAGCAAACAGG

  • #2
    In reply to MercuryMan:
    MM,

    Why do you believe that your reads still have barcodes? Because you see sequences in common in all your reads? The fact that you are processing this data set with Mothur tells me that this is a 16S data set. BLASTing these three reads confirms that they are (nearly) perfect matches to the V4 region of the 16S rRNA gene, no Illumina barcodes or adapters in sight.
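
    A rough way to check for leftover primers yourself is to compare the ends of each read against the primer sequences. The sketch below assumes the common 515F/806R V4 primer pair, which this thread does not confirm; swap in whatever primers your sequencing facility actually used. The helper names (`matches_at`, `primer_report`) and the one-mismatch tolerance are illustrative choices, not part of Mothur or any Illumina tool.

```python
# IUPAC degenerate base codes mapped to the concrete bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# ASSUMPTION: the widely used 515F/806R V4 primers. The thread never
# states which primers were used, so replace these with your own.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"

# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_at(read, primer, offset=0, max_mismatches=1):
    """True if `primer` matches `read` at `offset` within the mismatch limit."""
    if offset < 0:
        return False  # read is shorter than the primer
    window = read[offset:offset + len(primer)]
    if len(window) < len(primer):
        return False
    mismatches = sum(base not in IUPAC[p] for base, p in zip(window, primer))
    return mismatches <= max_mismatches


def primer_report(read):
    """Check the 5' end for the forward primer and the 3' end for the
    reverse complement of the reverse primer (as seen in a merged contig)."""
    read = read.upper()
    rc_reverse = reverse_complement(PRIMER_806R)
    return {
        "starts_with_forward": matches_at(read, PRIMER_515F),
        "ends_with_reverse_rc": matches_at(
            read, rc_reverse, offset=len(read) - len(rc_reverse)
        ),
    }
```

    Under these assumptions, calling `primer_report` on the first read posted above reports no primer at either end, which is consistent with the primers already being gone. A check like this says nothing about barcodes, which are normally read separately as index reads on a MiSeq.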

    If you're worried because they all look identical to each other, don't be; that is the expected outcome of an amplicon sequencing experiment.
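
    To see how many distinct sequences a sample really contains, you can collapse identical reads and count them. This is a minimal Python sketch, not a replacement for the dereplication step in a real pipeline; `read_fasta` and `count_unique` are hypothetical helper names, and the input is assumed to be FASTA text like the reads posted above.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_unique(records):
    """Count how many reads share each exact (case-insensitive) sequence."""
    return Counter(seq.upper() for _, seq in records)


# Tiny in-memory example; a real file object can be passed to read_fasta too.
example = [">read1", "ACGT", ">read2", "ACGT", ">read3", "AC", "GG"]
counts = count_unique(read_fasta(example))
print(counts.most_common())  # [('ACGT', 2), ('ACGG', 1)]
```

    In an amplicon data set a few sequences typically account for a large share of the reads, so seeing the same sequence many times near the top of this list is not a sign of leftover barcodes.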

    • #3
      Thanks, kmcarr! You confirmed this for me. I had BLASTed the reads as well and was comforted by the V4 report, but the MiSeq administrator on my campus is notoriously vague and hard to get an answer from. I think he intentionally sabotaged some 454 data he gave me to see whether I could sort out the problem on my own (which I did, thank god!).

      Just two days ago my major professor decided that I should use USEARCH to analyse my data, since he has been using it to assess my data as well, without telling me. This would put me back at square one after I've spent the last 2 months getting familiar with Mothur.

      So my question is: what software would you use to analyse 16S microbial metagenomic data? I have 16 separate samples from 8 sites (2 extractions per site), and I hope to do a full analysis including alpha and beta diversity, differential abundance, and probably a few other statistical tests.

      I welcome any comments/advice from anyone who has used more than one pipeline and conducted such an analysis.
