
  • Anomalies in multiplexed whole transcriptome analysis

    Hi, long time reader, first time poster. Our lab has been banging our collective heads against some bizarre results for the past several weeks, and we're hoping that the good folks here will be interested in helping us figure out what's going on.

    Some background: we have four cell lines, prepped for sequencing with the SOLiD Whole Transcriptome Analysis Kit (SWTAK). Each cell line was made into one library with Adapter Mix A and one with Adapter Mix B, and every library got its own barcode. Reads were mapped with Bowtie and are being analyzed with SAMTools and IGV.

    Sequencing was done by a facility using SOLiD for the first time, and the samples were prepped by someone using that kit for the first time, so there's a chance that errors could have been introduced at just about any point in the workflow. We're trying to figure out exactly what happened.

    I've attached a picture that encapsulates more or less all the problems we've discovered with the data. Specifically:

    1. In almost all areas, reads from the A and B libraries of the same cell line map to the same strand, either + or -. In areas of low coverage there's occasional mixing of strand assignment, but in high-coverage areas (>200-fold, often thousands-fold) it's always the same strand. Shouldn't they map to opposite strands?

    2. We have evidence that some kind of mis-barcoding might have happened: we observe SNPs in, for example, the B library of one cell line and again in the B library of another cell line, but not in their A libraries. It would be straightforward if it were the same cell line each time, but it's not - sometimes 654RMB will match JHPB, and other times 701054A will match JHPA. We've observed this in a number of locations, always with high coverage.

    3. Though we have eight libraries, areas of high coverage only show two patterns of coverage. The 654RM and 701054 cell lines appear similar for both their A and B libraries, and the FD123 and JHP libraries do the same with each other. Similarly, we find SNPs common to both the A and B libraries of both 654RM and 701054 cell lines quite often, and similarly with the FD123 and JHP libraries.
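A first pass at points 1 and 3 could be made region by region: for each library, compute the fraction of reads on the + strand, and correlate coverage profiles between every pair of libraries. The Python sketch below assumes you already have per-region read counts (for example from `samtools view -c` on an indexed BAM, with `-F 16` roughly giving forward-strand and `-f 16` reverse-strand reads); the function names and the dictionary layout are hypothetical. If the A and B libraries of a cell line should map to opposite strands, their plus-strand fractions in a high-coverage region should sit near opposite ends of the 0 to 1 range, and libraries that are really the same sample should correlate far better with each other than with the rest.

```python
from math import sqrt


def plus_strand_fraction(plus_reads, minus_reads):
    """Fraction of a region's reads aligned to the + strand (None if the region has no reads)."""
    total = plus_reads + minus_reads
    return plus_reads / total if total else None


def pearson(xs, ys):
    """Pearson correlation of two equal-length per-region coverage vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length vectors with at least 2 regions")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # a flat profile has no defined correlation
    return sxy / sqrt(sxx * syy)


def coverage_correlations(coverage):
    """coverage: dict library name -> list of read counts, one per region, same region order.

    Returns a dict mapping each (lib_a, lib_b) pair to its Pearson r.
    """
    libs = sorted(coverage)
    return {
        (a, b): pearson(coverage[a], coverage[b])
        for i, a in enumerate(libs)
        for b in libs[i + 1:]
    }
```

Because coverage here spans orders of magnitude, applying `math.log1p` to the counts before correlating keeps a handful of thousand-fold regions from dominating the result; that transform is a judgment call, not part of the analysis described in this thread.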

    As you might guess, this has been driving us crazy. The possibilities for the errors are endless, but also self-contradictory. If the samples had been mis-barcoded, why do the SNPs mismatch in some places but not others? If the samples were mixed or otherwise contaminated, why do they appear to have such common coverage patterns?

    I found a conversation in this thread (link) pointing out that the B mix can introduce errors in the first hexamer sequenced, so I'm currently thinking that a valid way of detecting whether an A or B mix had been used would be to do a frequency analysis of errors in the reads. Does this sound like a reasonable place to start?
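The frequency analysis proposed above amounts to tabulating mismatches by sequencing cycle and checking whether the first six cycles stand out. A minimal Python sketch, assuming the alignments are in SAM format with a standard `MD:Z` tag, are ungapped (Bowtie 1 does not do gapped alignment), and that all reads have the same length: mismatch offsets are read from the MD tag, reverse-strand offsets are flipped back into sequencing order, and the output is a per-cycle mismatch rate. For color-space data these are mismatches in the decoded bases, which only approximates raw color-call errors. The function names are illustrative, not from any existing tool.

```python
import re
from collections import Counter

# MD tag tokens: a run of matching bases, a deletion (^ followed by bases), or one mismatched base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def mismatch_offsets(md):
    """0-based offsets (from the alignment's leftmost base) of mismatches in an MD tag.

    Assumes an ungapped alignment, so reference offset equals read offset.
    """
    offsets, pos = [], 0
    for match_run, deleted, _base in MD_TOKEN.findall(md):
        if match_run:
            pos += int(match_run)
        elif deleted:
            raise ValueError(f"gapped alignment not supported: {md}")
        else:
            offsets.append(pos)
            pos += 1
    return offsets


def alignments_from_sam(lines):
    """Yield (is_reverse, md_tag) for mapped SAM records that carry an MD:Z tag."""
    for line in lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if flag & 4:  # 0x4: read unmapped
            continue
        md = next((f[5:] for f in fields[11:] if f.startswith("MD:Z:")), None)
        if md is not None:
            yield bool(flag & 16), md  # 0x10: read aligned to the reverse strand


def mismatch_profile(alignments, read_length):
    """Per-cycle mismatch rate, assuming every read is read_length bases long."""
    counts, n_reads = Counter(), 0
    for is_reverse, md in alignments:
        n_reads += 1
        for off in mismatch_offsets(md):
            # Reverse-strand alignments are reported left to right on the reference,
            # so the first sequenced base is at the right-hand end.
            cycle = read_length - 1 - off if is_reverse else off
            counts[cycle] += 1
    return [counts[c] / n_reads if n_reads else 0.0 for c in range(read_length)]
```

Usage would look like `with open("library.sam") as fh: profile = mismatch_profile(alignments_from_sam(fh), read_length=50)`, where the file name and the 50-base read length are placeholders for your own data.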

    Thanks for your time and consideration!
    Bob

  • #2
    Long time reader, first time poster.

    If you don't contribute, why should we?

    Anyway, I'll think about this but certainly not a high priority.



    • #3
      I don't contribute because, quite honestly, this is all very new to me and I really don't have any experience to share. I said "long time reader, first time poster" as a nod to "long time listener, first time caller", an attempt at what we call "humor" on my planet. Go ahead and check my user profile: you'll notice that I've only been here since the end of February, and my lurking has been largely devoted to figuring this stuff out. If throwing my uninformed and likely unhelpful opinions around on this forum will somehow raise my esteem in your eyes, I'm happy to do that, but I somehow doubt anyone else would approve.



      • #4
        This is new to all of us. With a few exceptions, I haven't seen arrogant know-it-alls (who truly know the least) scoffing at anyone's lack of experience.

        I hate it when people don't share information. I've made a fool of myself endless times and have learned from it. Hopefully others have too.



        • #5
          Tell you what, as soon as I have some information to share, I'll do so. Until then, I'm going to stop responding to personal criticisms and hopefully have a more productive discussion.



          • #6
            Bob,

            Please don't take NGS's attitude as reflective of this community. He's established himself as one of our members who enjoys making snide comments at pretty much everyone (myself included). In contrast, I welcome anyone who chooses to lurk or chooses to post, even if it's their first question. The site is a resource; feel free to use it however it helps you.

            NGS,

            I'm happy to have you (complete with crankiness) here...I would ask you to refrain from posting argumentative responses that will discourage new users from participating. Thanks.

            Let's return to the fun stuff...sequencing anomalies!

            edit: Now that I've taken the time to read Bob's post, it's a pretty darn good example of a first post. It even includes proof that he's been searching the forums for an answer.



            • #7
              The picture is not legible, but the fact that SNPs from the same cell line are not consistent between its two libraries indicates a sample mix-up or mis-barcoding, IMHO.

              However, regions of high coverage are often highly repetitive regions and as such subject to errors in assembly.

              If it were me, I would make my own libraries, since it's difficult to troubleshoot someone else's probable lab errors.



              • #8
                FYI
                My complaint was about people not contributing but lurking. (The only other complaint I've ever made which is referenced above was that people with commercial interests should disclose them clearly).



                • #9
                  I went ahead and did the same base pair mismatch analysis done by the user hingamp, and came up with some results that explain at least the first point of oddness. The first two cell lines all show an error pattern consistent with errors from preparation with the B adapter mix, even though they're labeled as both A and B mixes.

                  Hopefully these pictures will be legible...
                  Graph 1
                  Graph 2

                  So as far as I can tell, either the barcoding was done incorrectly, or the prep was done incorrectly.
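Once each library has a per-cycle mismatch profile, the A-versus-B judgment from the graphs can be reduced to one number per library: the mean mismatch rate over the first hexamer divided by the mean over the remaining cycles. This Python sketch assumes each profile is a list of per-cycle rates in sequencing order; the threshold of 2.0 is an arbitrary placeholder rather than a published cutoff, and the function names are hypothetical.

```python
def hexamer_excess(profile, k=6):
    """Mean mismatch rate in the first k cycles divided by the mean over the remaining cycles."""
    if len(profile) <= k:
        raise ValueError("profile must be longer than k cycles")
    head = sum(profile[:k]) / k
    tail = sum(profile[k:]) / (len(profile) - k)
    if tail == 0:
        # No mismatches after the hexamer: infinite excess if the head has any, undefined otherwise.
        return float("inf") if head > 0 else float("nan")
    return head / tail


def flag_b_like(profiles, threshold=2.0, k=6):
    """profiles: dict library name -> per-cycle mismatch rates.

    Returns dict library name -> (ratio, looks_like_mix_b); threshold is an arbitrary cutoff.
    """
    result = {}
    for lib, profile in profiles.items():
        ratio = hexamer_excess(profile, k)
        result[lib] = (ratio, ratio >= threshold)
    return result
```

Libraries labeled "A" that come back flagged as B-like would match the pattern described in this post; in practice it is safer to eyeball the ratios across all eight libraries than to trust any single cutoff.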



                  • #10
                    I don't know anything about these cell lines or their genome sizes, but could you skip barcoding your libraries and run them in separate octets or quads?
                    Obviously you will need to make a new library to do this but it looks like you probably will anyway.



                    • #11
                      Originally posted by martian_bob
                      I went ahead and did the same base pair mismatch analysis done by the user hingamp, and came up with some results that explain at least the first point of oddness. The first two cell lines all show an error pattern consistent with errors from preparation with the B adapter mix, even though they're labeled as both A and B mixes.

                      Hopefully these pictures will be legible...
                      Graph 1
                      Graph 2

                      So as far as I can tell, either the barcoding was done incorrectly, or the prep was done incorrectly.
                      Yes, the samples were mixed up somewhere along the line.

                      Various possibilities for how to handle this. But I would encourage you to discuss the issue with your core. They may be unaware they have issues with sample tracking and may be able to implement procedures that will prevent them from occurring again.

                      If you want to delve more into the data you might be able to find SNPs or other characteristic features that allow you to unambiguously identify your cell lines. Now this does begin to encroach into the territory of what I call "the Bennetzen Dictum": "Don't waste clean thoughts on dirty data." But if you were able to identify characteristic features, then you have an embedded positive control in all of your future experiments with these cell lines.
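A minimal Python sketch of that idea: given SNP calls from libraries whose identity you trust (for example, a resequenced set), keep the calls private to each cell line as its fingerprint, then score any library by the fraction of each fingerprint it reproduces. The `(chrom, pos, alt)` tuple format and the function names are assumptions for illustration. A missing call may only mean low coverage at that site, so in practice you would score only fingerprint sites the library actually covers.

```python
def fingerprints(calls_by_line):
    """calls_by_line: dict cell line -> set of (chrom, pos, alt) SNP calls from trusted data.

    Returns dict cell line -> the calls seen in that line and in no other line.
    """
    result = {}
    for line, calls in calls_by_line.items():
        others = set().union(*(c for other, c in calls_by_line.items() if other != line))
        result[line] = calls - others
    return result


def best_match(library_calls, prints):
    """Score a library's SNP calls against each fingerprint.

    Returns (best-matching cell line, dict cell line -> fraction of its fingerprint found).
    """
    scores = {}
    for line, fp in prints.items():
        scores[line] = len(library_calls & fp) / len(fp) if fp else 0.0
    best = max(scores, key=scores.get)
    return best, scores
```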

                      --
                      Phillip



                      • #12
                        Originally posted by NextGenSeq
                        FYI
                        My complaint was about people not contributing but lurking. (The only other complaint I've ever made which is referenced above was that people with commercial interests should disclose them clearly).
                        Dude. This isn't an intervention or anything. But c'mon, you come off as cranky. Not that I'm chastising you for laying down a negative vibe, or anything. It isn't an empty crankiness you display. But just on the level of "know thyself", I think you should click on your handle in any of your posts and then on "Find More Posts by NextGenSeq". Just on the first page I see:

                        454 was the first to market and thus the oldest technology. It's also the most expensive.
                        Illumina is the current market leader. I personally wouldn't even consider purchasing a 454 system.
                        What a terrible name. It sounds like a faith based bible chapter than a logical scientific based tool. Did LiCor develop it?
                        Thank god I never took a job there. After talking with them it was pretty obvious they wouldn't make it. I doubt PacBio will do much better but they can at least learn from Helicos's mistakes.
                        I would use Ubuntu. I hate Red Hat.
                        and, well:

                        This is new to all of us. With a few exceptions, I haven't seen arrogant know-it-alls (who truly know the least) scoffing at anyone's lack of experience.

                        I hate it when people don't share...

                        Again, fine by me. Lash out. Let your crank-flag fly. Embrace that inner crank, don't deny it...

                        --
                        Phillip



                        • #13
                          Originally posted by pmiguel
                          Now this does begin to encroach into the territory of what I call "the Bennetzen Dictum": "Don't waste clean thoughts on dirty data."
                          That's a terrific turn of phrase, I'm going to get that framed and put up in the lab.

                          In the end, we contracted with an outside company (EdgeBio) to redo the sequencing, and they did a terrific job. Excellent sequencing quality and quantity. We're still not sure exactly where the error was introduced, but we're all a lot more careful these days.

