Speaker 1: The broadcast is now starting. All attendees are in listen-only mode.
Ted: Hello, everyone. Welcome to our webinar, “The Four Most Common Problems in Multiplex Panel Design.” My name is Ted Dacko and I’m gonna serve as your host today, and the webinar will be presented by John SantaLucia.
John: Hi, everyone.
Ted: Okay. To begin with, let’s take care of some housekeeping. All of you are in listen-only mode to cut down on background noise. Please type your questions into the dialog box on the right-hand side. We will compile them as the webinar goes on and do our best to answer them at the end of the webinar; if we don’t get to yours, we’ll get back to you individually. If your question is about a particular slide, please include the slide number in your question. The slides from this presentation will be sent to all participants, and a recorded version of the webinar should be available no later than Monday of this week. We actually hope to have it by Friday. If you’re interested in anything that you hear today, we would be happy to follow up with you separately.
Here is our agenda. I’d like to introduce the speaker in just a second. Then I would like to tell you a little about our sponsor today, DNA Software. Then we will cover the four problems of multiplex PCR, provide a solution for those four problems, and actually give you a demonstration of what a solution might look like. Then we’ll talk about the benefits and do a quick summary for you, and then we will answer your questions and talk about next steps.
I’d like to introduce our speaker, John SantaLucia. John has been a professor at Wayne State University for over 23 years. He is a world-renowned biophysical chemist with over 50 papers and one book. He has served on numerous government panels and has been a consultant to numerous companies as well, many of which are on the phone today. And I do need to point out that we have well over 100 people registered for this webinar, so this is a very interesting topic. He measured the DNA nearest-neighbor thermodynamic parameters and he is the co-inventor of qPCR CopyCount. He founded DNA Software in 2000. For the first ten years, he was the Chief Science Officer, and for the past seven years he has been the President and CEO. John, welcome.
John: Thank you, Ted.
Ted: John, my first question to you is, can you tell the audience a little bit about DNA Software and why you founded the company?
John: Sure, Ted. Back when I was a professor, we were studying the fundamental aspects of DNA folding, DNA hybridization, and mismatch thermodynamics, and we started getting a lot of phone calls from different companies and academic labs asking if I could consult with them to help solve their problems. It became obvious to me early on that there was a commercial need to solve the problems that people had, and I couldn’t do it in my academic lab, so we started DNA Software to try to meet those needs. All right, that’s all I wanted to say about DNA Software for now. We’ll share more as we go through the seminar today.
Ted: All right. Can you tell us what these four problems of Multiplex PCR are?
John: Sure. First of all, here are the applications of multiplex PCR, and then I’ll share with you how they dovetail [inaudible 00:04:02] into the problems. The first application of multiplex PCR is the detection of infectious disease. A lot of folks want to design panels of tests that will detect more than one pathogen at a time, for example, upper respiratory pathogens or a group of messenger RNAs found in human blood. The key there is to be able to account for target variation, and that’s called consensus design. You also want to be able to distinguish the target of interest from near-neighbors and background sequences.
Another application of multiplex PCR is target enrichment. Here you might have something like 100 genes that are cancer-causing genes or are involved in cancer pathways. You want to enrich those targets for Next-Generation Sequencing. That type of application involves very high levels of multiplexing, often 100-plex or higher. There the trick is to get even amplification of all of the targets.
And lastly, there is the need to do genotyping, particularly of SNPs, or to determine the strain of viruses and bacteria, for example. There, you need not only to detect the genomes of different viruses and bacteria, but also to distinguish them. To summarize, and I think we’ll convince you of this today, so I don’t need to make a hard sell on it: designing multiplex PCR is hard, and we want to show you that we appreciate the nuances of that field.
Here are the four most common problems, to answer your question, Ted. The first is the problem of false negatives. That is when a person has a particular disease, for example, and the test is negative despite the fact that they have the disease. False negatives have a variety of causes, which we’ll be going over. False positives occur when a person does not have a disease, or the target is not present in the test tube, but the test says that it is. We’ll be showing why that arises and what the solutions to that problem are.
The third problem is the problem of coverage. This comes up if you have a lot of different variants of a target of interest, or if you have many different targets that you’re trying to multiplex. How do you know that you can detect all of those different targets at the same time? The first three topics really are technical in nature. The last topic, company resource constraints, is more practical. It has to do with … Well, we’ll cover that when we get to it.
Ted: Which of these four, in your mind, is the biggest challenge for organizations?
John: Well actually, the last one. Company resource constraints are the largest problem, because oftentimes people don’t realize what’s possible and don’t realize that they have a problem that they need to solve.
Ted: Okay, well let’s get into it.
John: All right. Okay, the first topic is false negatives, which has to do with the sensitivity of assays. What are the major causes of false negatives? The first cause, A, is widely known: target secondary structure can inhibit primer binding. That’s particularly true for RNA targets, but often true for DNA targets as well. We’ll cover that topic in detail. In addition, there are a number of other causes of false negatives, everything from false amplification due to primer dimers and false amplicons, to primer-amplicon interactions, to unimolecular extension. These last three causes, B, C, and D, are all due to polymerase extension and the resulting depletion of primers and dNTPs. There’s one other cause of false negatives as well, and that’s sequence variation. We’ll cover that topic under the third problem, when we talk about coverage.
Let’s first talk about target secondary structure. Most users, when they consider primer hybridization, use a two-state model, and they may even use some of the work that we published many years ago, like the nearest-neighbor model, to predict the difference in free energy between the random coil state and the duplex state. It turns out that that model just is not sufficient to really predict what’s going on. The reason is that real DNA is not straight like this when it’s in a random coil; in fact, real DNA molecules are folded. Shown on the right-hand side here is a more sophisticated model, an N-state model, in which many states are included.
Now, what we see is … Here’s a hybridization region in green, and a two-state model would say, well, here’s the hybridization region and here’s a primer, and they hybridize as shown on the right-hand side here. But in fact, there are competing equilibria. The target itself can be folded, and in fact the very region to which you want to direct your primer can be folded, and that folding inhibits hybridization. You need to pay the energy to break that folding before a primer can bind. The primer itself can also form hairpins, for example. Such hairpins can sometimes be innocuous, and other times they can cause other problems, if the hairpin can serve as a template for a polymerase.
As I mentioned, the problem with the two-state approach is that it ignores the energetic cost of breaking secondary structure. That folding is one of the major causes of low sensitivity in singleplex PCR, and definitely in multiplex PCR, where things are much more complex. In particular, in multiplex PCR such folding can make the amplification uneven; that is, one amplicon amplifies faster than the others. One of the major causes of that is that some of the targets are more folded than others, and if you haven’t predicted the folding of your target DNA, there would be no way for you to know that.
The solution to this problem is to solve the coupled equilibria for the amount of primer bound. At DNA Software we have ways to predict all of these equilibria and solve for the amount that is in the duplex state, and that amount is what is ultimately related to your assay sensitivity.
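To make the competing-equilibria point concrete, here is a minimal three-state sketch in Python. It is not DNA Software’s N-state solver: it assumes the target is either unfolded or folded, that only the unfolded form can bind the primer, and it ignores primer hairpins. Under those assumptions the folded and unfolded target can be lumped into one free pool with an effective association constant k_dup / (1 + k_fold), and mass action reduces to a quadratic in the duplex concentration. The function names and the free energies in the usage lines are illustrative, not measured values.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def k_from_dg(dg_kcal, temp_c):
    """Equilibrium constant for a reaction with standard free energy dg_kcal."""
    return math.exp(-dg_kcal / (R_KCAL * (temp_c + 273.15)))


def duplex_conc(p_tot, t_tot, k_dup, k_fold=0.0):
    """Primer-target duplex concentration (M) in a three-state model.

    Free target is split between unfolded (U) and folded (F) forms with
    [F]/[U] = k_fold, and only U hybridizes with the primer (constant k_dup).
    With the effective constant k = k_dup / (1 + k_fold), mass action gives
    k * (p_tot - D) * (t_tot - D) = D, a quadratic in D.
    """
    k = k_dup / (1.0 + k_fold)
    b = k * (p_tot + t_tot) + 1.0
    disc = b * b - 4.0 * k * k * p_tot * t_tot
    # Smaller (physical) root, written to avoid cancellation when k is large.
    return 2.0 * k * p_tot * t_tot / (b + math.sqrt(disc))


if __name__ == "__main__":
    temp = 60.0
    k_dup = k_from_dg(-12.0, temp)   # illustrative primer/target duplex
    k_fold = k_from_dg(-3.0, temp)   # illustrative target hairpin over the site
    primer, target = 5e-7, 1e-12     # primer in large excess over target
    for label, kf in (("unfolded target", 0.0), ("folded target", k_fold)):
        frac = duplex_conc(primer, target, k_dup, kf) / target
        print(f"{label}: fraction of target bound = {frac:.2f}")
```

With the duplex ΔG held fixed, adding a modest target hairpin lowers the bound fraction several-fold in this sketch, which is exactly the effect a two-state calculation cannot see.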
All right, let’s now cover the other causes of false negatives. One of them, as I mentioned, is the formation of primer dimers. Just by accident, two primers can pair at their 3′ ends, and a polymerase would extend them. That is problematic for PCR, and for multiplex PCR in particular, because it depletes the PCR primers. They get chemically used up, the dNTPs get depleted as well, and that can ultimately cause the multiplex reaction to fail.
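The simplest version of this check can be sketched as an exact-match screen: if the last few bases at one primer’s 3′ end are perfectly complementary to any stretch of another primer (or of itself), a polymerase could extend it. This is a crude heuristic for illustration; the 5-base window k and the function names are hypothetical, and a real tool would score duplex ΔG with mismatches rather than require a perfect match.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_dimer(primer_a, primer_b, k=5):
    """True if the 3'-terminal k bases of primer_a pair perfectly with primer_b.

    primer_a's tail pairs antiparallel with primer_b wherever the reverse
    complement of that tail appears in primer_b; primer_b then acts as a
    template and the polymerase can extend primer_a's 3' end.
    """
    return revcomp(primer_a[-k:]) in primer_b


def dimer_report(primers, k=5):
    """Check every ordered pair of primers, including each primer with itself."""
    return [(a, b)
            for a, seq_a in primers.items()
            for b, seq_b in primers.items()
            if three_prime_dimer(seq_a, seq_b, k)]


if __name__ == "__main__":
    panel = {"A": "TTTTTTCAGGT", "B": "GGGGACCTGGGG"}
    print(dimer_report(panel))
```

Here primer A’s 3′ end (CAGGT) finds the complementary stretch ACCTG inside primer B, so the pair (A, B) is flagged.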
Another type of interaction, which is particularly important in multiplexing, is the primer-amplicon interaction. Shown in this picture on the upper right is a target from the Zika virus, but it contains a primer binding site for one of the other targets in your multiplex; let’s say Influenza A was also being detected. You have this Influenza A primer binding to the Zika target. That cross-hybridization can completely ruin the amplicon by making it shorter so that the probe doesn’t bind, and therefore you get a false negative.
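A primer-amplicon screen follows the same idea, but compares each primer’s 3′ end with every amplicon in the panel except its own. The sketch below, with hypothetical names and an assumed 8-base window, reports which strand of a foreign amplicon a primer could prime on; exact matching again stands in for a thermodynamic score.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def off_target_priming(primers, amplicons, k=8):
    """Find primers whose 3' end can land on another target's amplicon.

    primers: dict primer name -> (its own target name, sequence 5'->3').
    amplicons: dict target name -> amplicon top-strand sequence 5'->3'.
    The primer's last k bases pair with the top strand where their reverse
    complement occurs, and with the bottom strand where they occur verbatim.
    Returns (primer name, foreign target, strand) tuples.
    """
    hits = []
    for name, (own_target, seq) in primers.items():
        tail = seq[-k:]
        for target, amp in amplicons.items():
            if target == own_target:
                continue  # priming on its own amplicon is intended
            if revcomp(tail) in amp:
                hits.append((name, target, "top"))
            if tail in amp:
                hits.append((name, target, "bottom"))
    return hits


if __name__ == "__main__":
    primers = {"FluA_F": ("FluA", "GGGGGGACGTTC")}
    amplicons = {"Zika": "TTTTGAACGTTTTT", "FluA": "GGGGGGACGTTCAAAAAA"}
    print(off_target_priming(primers, amplicons, k=6))
```

The number of comparisons is primers times amplicons, which is one reason this check is rarely done by hand for large panels.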
All right, so those are two really important things to account for. I think most primer design software accounts for the first one. The second one is more subtle and harder to do.
This last one is unimolecular extension and I have a question for you to think about. I’ve shown here five different primers. The sequences are similar to each other but they’re all a little bit different, and I have this question, which of these structures do you think is extensible? Now, it’s not obvious at first blush, and the truth is it depends. In fact, all of them are extensible in one way or another and I’ll show you what I mean by that.
Here are some considerations. First of all, some of the primers end with a mismatch, and some mismatches are extensible while others are not. For example, in this first case on the left-hand side, there’s an AA mismatch. AA mismatches are generally not very extensible, so even though this is a thermodynamically stable hairpin, it is unlikely to be problematic for PCR. On the other hand, consider this primer in the middle, which differs by one nucleotide: just change the terminal base to a C, and now it’s an AC mismatch. Many polymerases tolerate an AC mismatch and will extend it. Once the polymerase extends that primer to the end, the primer has changed its chemical composition and will no longer be viable in the PCR reaction.
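The structures discussed above can be detected with a simple string check. This sketch, with hypothetical names and assumed minimums of a 4-bp stem and a 3-nt loop, looks for a hairpin that folds the primer’s 3′ end back onto its own sequence, either fully paired or with only the 3′-terminal base mismatched, and reports that terminal pair. It deliberately does not decide whether a given mismatch gets extended: as the talk notes, that depends on the mismatch and on the polymerase.

```python
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    return "".join(COMP[b] for b in reversed(seq))


def three_prime_hairpin(primer, stem=4, min_loop=3):
    """Look for a hairpin that folds the primer's 3' end onto its own sequence.

    Returns ("paired", start) if the last `stem` bases pair perfectly with an
    upstream region beginning at `start` (a self-primed, extensible hairpin),
    ("terminal_mismatch", (three_prime_base, opposing_base)) if only the
    `stem` bases before the 3'-terminal base pair, else (None, None).
    """
    n = len(primer)
    # Case 1: the 3'-terminal base itself is paired.
    region = primer[: max(0, n - stem - min_loop)]
    start = region.find(revcomp(primer[-stem:]))
    if start >= 0:
        return ("paired", start)
    # Case 2: the stem stops one base short, so the 3' base faces primer[j - 1].
    region = primer[: max(0, n - stem - 1 - min_loop)]
    j = region.find(revcomp(primer[-stem - 1:-1]), 1)
    if j >= 1 and COMP[primer[-1]] != primer[j - 1]:
        return ("terminal_mismatch", (primer[-1], primer[j - 1]))
    return (None, None)


if __name__ == "__main__":
    for p in ("GACTTTTTAGTC", "CAGACTTTTTTAGTCA", "CAGACTTTTTTAGTCC"):
        print(p, three_prime_hairpin(p))
```

The last two toy primers differ only in their 3′ base and mirror the AA versus AC example from the slide; mapping each terminal mismatch to “extended” or “not extended” would need polymerase-specific data that this sketch does not include.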
All right, now I said that almost all of these could in fact be substrates for polymerase extension. That would happen if you happened to choose a polymerase with 3′→5′ exonuclease activity. If your enzyme has exonuclease activity, it will actually chew back and remove some of these dangling-end nucleotides until it creates a species that is in fact extensible. That’s something to think about, because it becomes very problematic, and we generally don’t recommend enzymes that have 3′-exonuclease activity.
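To illustrate the chew-back effect, the sketch below simulates a 3′→5′ exonuclease removing one 3′ base at a time until the remaining 3′ end sits in a perfectly paired hairpin stem. The stem and loop minimums and the function names are assumptions for illustration; real exonuclease kinetics and the competition between trimming and extension are not modeled.

```python
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    return "".join(COMP[b] for b in reversed(seq))


def paired_three_prime_stem(seq, stem=4, min_loop=3):
    """True if the last `stem` bases pair with an upstream part of seq."""
    return revcomp(seq[-stem:]) in seq[: len(seq) - stem - min_loop]


def exo_trim(primer, stem=4, min_loop=3):
    """Simulate a 3'->5' exonuclease removing 3' bases one at a time.

    Returns how many bases were removed before the new 3' end is base-paired
    in a hairpin (and so extensible), or None if the primer becomes too short
    to form a stem and loop first.
    """
    for removed in range(len(primer)):
        trimmed = primer[: len(primer) - removed]
        if len(trimmed) < 2 * stem + min_loop:
            return None
        if paired_three_prime_stem(trimmed, stem, min_loop):
            return removed
    return None


if __name__ == "__main__":
    # A hairpin with three unpaired 3' bases becomes extensible once they are removed.
    print(exo_trim("CAGACTTTTTTAGTCAAA"))
```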
Lastly, perhaps the most surprising of all is the structure on the right-hand side. The rightmost structure has its 5′ end base-paired. We wouldn’t normally think that such a structure would be extensible by a polymerase, but think about what happens during a PCR reaction. During PCR, the complement of that sequence would be made, and once that complement was made, it would be highly likely to fold back on itself and form such a structure at its 3′ end. Such structures at the 3′ end of the antisense strand would be a cause of amplicon dimers or higher-order [inaudible 00:14:34], which are also very problematic. They will shut down the PCR completely.
All right. This brings us now to the second type of problem in PCR that we face, and that is the problem of false positives. False positives are the result of false hybridization occurring in your reaction. For multiplex PCR, designing your primers to be specific is absolutely critical, because in a multiplex PCR there are so many primers and so many amplicons that the possibility of cross-hybridization becomes huge and grows very quickly as you add primers. I will show you in a moment that the tool most people use for such specificity tests is a BLAST search, and it’s really not the best tool for the job.
The second thing about false positives is the idea, in part C here, that the multiplex is a complex interacting system. False amplicons can involve rare hybridization reactions and weak hybridization reactions, not perfect matches, but hybridizations that involve mismatches or even bulges in them. A tool like BLAST just is not up to that task. What we need is a tool that can scan huge numbers of primers against large numbers of genome databases, and that is a challenging problem.
Let’s take a minute and talk about an approach that many scientists take, which is to use the program BLAST to predict their primer specificity. Of course, I don’t need to explain much about BLAST to this audience. It is one of the most widely used bioinformatics tools in the world, and it deservedly gets a lot of credit. But it really was not intended for checking primer specificity. Instead, it was meant to determine sequence similarity and to infer common evolutionary ancestry, and it’s wonderful for that function. But the algorithm has real problems when used for primer specificity.
So what’s wrong with using BLAST to predict cross-hybridization? Well, the fundamental problem is that BLAST searches based on sequence similarity, not sequence complementarity. That deficiency immediately leads to a workaround that all users of BLAST have to adopt. They say, “Well, since BLAST can’t find duplex complementarity, I’ll take the complement of my oligo and use BLAST to find sequences similar to that complement.” Now, this workaround produces five big problems.
The first is that BLAST gives the wrong ranking of hits, because it’s based on an evolutionary scoring model, not on the thermodynamics of hybridization. Second, BLAST misses about 80% of the thermodynamically stable hits, so many of the things that could cause false amplification in your reaction are not caught by the BLAST search. Third, BLAST is often used against a large database like the nucleotide collection, and as a result it gives too many irrelevant hits. So it misses the hits that you need and gives you hits that you don’t care about, a deluge of information. Fourth, BLAST does not distinguish between hits that are extensible by polymerase and those that are not. And fifth, it does not detect amplicons or handle multiplexes.
All right, so my take-home message here is that sequence similarity is not the same thing as thermodynamic stability, and that’s very easy to illustrate. For example, the workaround of taking the complement of your oligo and using BLAST to find similar sequences is tantamount to assuming that GC base pairs are equal in stability to AT base pairs, which we know is not correct. GC base pairs are generally more stable than AT base pairs. In fact, to accurately predict melting temperature, delta G, and other thermodynamic quantities, you need a nearest-neighbor model.
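To see why a similarity score cannot stand in for a thermodynamic one, here is a minimal nearest-neighbor ΔG°37 calculator in Python. It uses the unified Watson-Crick ΔG°37 parameters (1 M NaCl) as we recall them from SantaLucia (1998); check them against the published table before relying on them. It ignores mismatches, dangling ends, and salt correction, so it illustrates the model rather than replacing a full implementation.

```python
# Unified Watson-Crick nearest-neighbor dG37 values (kcal/mol, 1 M NaCl),
# keyed by the 5'->3' dinucleotide of one strand (SantaLucia 1998, as
# recalled here; verify against the published table before use).
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_TERMINAL_GC = 0.98   # initiation term per duplex end closed by G-C
INIT_TERMINAL_AT = 1.03   # initiation term per duplex end closed by A-T
SYMMETRY = 0.43           # penalty for self-complementary duplexes

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stack_dg(dinuc):
    """dG of one stack; the six stacks not listed equal their reverse complement."""
    return NN_DG37[dinuc] if dinuc in NN_DG37 else NN_DG37[revcomp(dinuc)]


def duplex_dg37(seq):
    """dG37 (kcal/mol) for seq paired with its perfect complement."""
    seq = seq.upper()
    dg = sum(stack_dg(seq[i:i + 2]) for i in range(len(seq) - 1))
    for end in (seq[0], seq[-1]):
        dg += INIT_TERMINAL_GC if end in "GC" else INIT_TERMINAL_AT
    if seq == revcomp(seq):
        dg += SYMMETRY
    return dg


if __name__ == "__main__":
    for s in ("GCGCGC", "ATATAT"):
        print(s, round(duplex_dg37(s), 2))
```

GCGCGC and ATATAT each have six perfectly matched positions, which is all a similarity search sees, yet this sketch gives about -8.7 versus -1.3 kcal/mol for their duplexes.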
In addition, BLAST scores all mismatches the same. It just says, “Hey, there was a mutation here.” It doesn’t know that different mismatches are different, so a GT mismatch, which is known to be stabilizing, is scored by BLAST the same as a CC mismatch would be. And of course, that is not true: those differ in equilibrium constant by more than a factor of 4,000, which is a huge effect. BLAST also scores gaps incorrectly from a thermodynamic perspective. It thinks about them as insertion and deletion events, whereas we think about them as unpaired nucleotides and their thermodynamics.
Dangling ends, which are the extra nucleotides at the ends of a base-paired duplex, also contribute significantly, almost as much as a full base pair, and those dangling-end effects are completely neglected by BLAST. BLAST doesn’t include different rules for DNA-RNA hybridization versus DNA-DNA. It doesn’t include effects of salt and temperature, it has the wrong rules for multiple mismatches, and it does not account for the location of mismatches. But the most important negative is that a BLAST search has a minimum word length of seven consecutive perfect matches. If your hybridization does not contain seven consecutive perfect matches, it will not be detected. Here are three examples of extremely stable thermodynamic hits that BLAST would be completely blind to because of that minimum word length.
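The word-length limitation is easy to demonstrate. The sketch below, with hypothetical function names, takes a gapless alignment of a query (for example, the complement of a primer, per the workaround above) against a target site, finds the longest run of consecutive matches, and asks whether an exact seed of word_size bases exists; word_size=7 follows the figure quoted in the talk. It only shows why a seed-based search never reports such hits; judging whether the duplex is actually stable needs a thermodynamic model.

```python
def longest_match_run(query, subject):
    """Longest run of consecutive identical positions in a gapless alignment."""
    best = run = 0
    for a, b in zip(query, subject):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def has_seed(query, subject, word_size=7):
    """A word-seeded search only reports alignments containing an exact word."""
    return longest_match_run(query, subject) >= word_size


if __name__ == "__main__":
    query = "GGCGCCGGCGCCGGCGCCGG"                            # 20 nt
    subject = query[:6] + "T" + query[7:13] + "T" + query[14:]  # 2 mismatches
    matches = sum(a == b for a, b in zip(query, subject))
    print(matches, longest_match_run(query, subject), has_seed(query, subject))
```

Here 18 of 20 positions match, but the longest exact run is only 6, so no seed of 7 exists and the hit is never extended or reported.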
Well, we’ve developed a solution to this problem called ThermoBLAST. ThermoBLAST has the speed and database capabilities of BLAST, and more, but it is based on a completely different set of algorithms. It uses thermodynamics-based scoring, and it has a completely different method for finding seeds and a different extension algorithm. It can incorporate genome playlists, and we’ve now included massive cloud computing so it can run in parallel. We have a nice genome viewer, and it automatically detects amplicons. The take-home message here is that ThermoBLAST ranks hits based on thermodynamic affinity rather than similarity. That’s a key part. The reason I went through all of this is that ThermoBLAST is now integrated into other solutions that we’ll be sharing with you later.
All right. While we’re on the topic of ThermoBLAST, we have this thing called a playlist that I wanted to share with you. There are three different types of playlists that I’ll be showing you. The first is called the Inclusivity Playlist. This is the list of variants of the genomes that you want to detect and that you want your primers to cover. We’re going to use the Zika virus as an example in our demo, and here we have 168 different Zika genomes. By putting in the things that you want to detect, we can use ThermoBLAST to determine the coverage of a set of primers.
Alternatively, ThermoBLAST can be used to determine false positives. If you take a set of primers and scan them against viruses and bacteria that might cause a false positive in your assay, you will find those false positives. ThermoBLAST is wonderful for this. It’s very easy to make collections of near-neighbor viruses; for example, Dengue virus is a flavivirus, just as Zika virus is, and their sequences can be quite related to each other. So we can make a collection of Dengue fever viruses and run ThermoBLAST to make sure that none of our Zika primers bind to any of those Dengue genomes. All right, so this is a way to determine false positives.
Another type of false positive can come from a background genome. You’re doing your genetic test for the presence of a Zika virus infection, but the human genome is present in the sample that you took from the person. You should check your primers to be sure that they don’t bind to the human genome or human RefSeq, or to other, unrelated fever-causing viruses. For example, Chikungunya virus is a togavirus, not that closely related to the flaviviruses, but closely enough that you may get a hit, so you might want to put such viruses into a background playlist. All right, we’ll come back to this slide when we do the demo.
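The playlist logic just described can be sketched as one small function. The playlist keys, the binds predicate, and the exact-substring stand-in naive_binds are all hypothetical; in practice the predicate would be a thermodynamic hybridization and extensibility check like the ones discussed earlier.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def naive_binds(primer, genome):
    """Hypothetical stand-in: exact match to either strand of the genome."""
    return primer in genome or primer.translate(COMPLEMENT)[::-1] in genome


def evaluate_playlists(primers, inclusivity, near_neighbors, background,
                       binds=naive_binds):
    """Sort a primer set's results by playlist role.

    Each playlist maps a genome name to its sequence. Inclusivity genomes that
    no primer binds are coverage gaps; any binding to a near-neighbor or
    background genome is a potential false positive.
    """
    def hit(genome):
        return any(binds(p, genome) for p in primers)

    return {
        "missed": [n for n, g in inclusivity.items() if not hit(g)],
        "near_neighbor_hits": [n for n, g in near_neighbors.items() if hit(g)],
        "background_hits": [n for n, g in background.items() if hit(g)],
    }


if __name__ == "__main__":
    report = evaluate_playlists(
        ["GATTACA"],
        inclusivity={"zika_1": "TTGATTACATT", "zika_2": "CCCCCCCC"},
        near_neighbors={"dengue_1": "AAAAAAA"},
        background={"human_fragment": "TGTAATCGG"},
    )
    print(report)
```

In the toy run, zika_2 is reported as a coverage gap and the background fragment as a potential false positive, because it carries the primer’s reverse complement.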
All right, the third cause of multiplex problems is the issue of coverage. I want to define two different types of coverage. The first is coverage of the different variants of a target, which we call consensus design. We want a single set of primers, or two or three sets of primers, that will bind to all variants of a given target. In our example, I will be doing a consensus design, where I want to design the minimal number of primers needed to amplify all 168 Zika viruses.
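One standard way to choose a small primer set that covers every variant is greedy set cover. The talk does not say which algorithm PanelPlex uses, so treat this as a generic heuristic: given, for each candidate primer, the set of variants it covers (decided by some coverage criterion), it repeatedly takes the candidate that covers the most still-uncovered variants. The function name and toy data are hypothetical. Greedy set cover is not guaranteed to find the true minimum, and a real consensus design would also have to pair forward and reverse primers and check their compatibility.

```python
def greedy_primer_cover(variants, coverage):
    """Greedy set cover over candidate primers.

    variants: iterable of variant IDs that must be covered.
    coverage: dict mapping candidate primer -> set of variant IDs it covers.
    Returns (chosen candidates in pick order, variants no candidate covers).
    """
    uncovered = set(variants)
    chosen = []
    while uncovered and coverage:
        best = max(coverage, key=lambda p: len(coverage[p] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break  # the remaining variants are not covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


if __name__ == "__main__":
    cov = {"p1": {"v1", "v2", "v3"}, "p2": {"v3", "v4"},
           "p3": {"v4", "v5"}, "p4": {"v5"}}
    print(greedy_primer_cover(["v1", "v2", "v3", "v4", "v5", "v6"], cov))
```

Returning the uncovered variants separately makes it explicit when no candidate reaches some genomes, which is the signal to design additional primers for them.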
Another type of coverage is the problem when you have multiple very different targets. For example, suppose you wanted to amplify 100 different genes from the human genome simultaneously. Well, they’re all very different from each other and that would be a high level multiplex. We’ll cover that one as well.
All right. First of all, let’s talk about consensus design. Traditionally, people would run a multiple sequence alignment (MSA) algorithm on the sequence variants and try to use the alignment to identify the conserved regions. We’ll see why that is not a very good approach.
And fundamentally, what they’re trying to do is answer the question, “Well, okay. I want to design a new test for human papillomavirus. Where in that huge virus should I target my oligos?” You would try to use conservation to do that, and we’ll show why that doesn’t work very well. Another question that comes up is what it means to be “covered.” We’ll go through the criteria for being covered. Lastly, on the topic of multiplexing, how do you get all of the primers to work well together? You can see there are a lot of things here that we need to think about.
All right, so what’s wrong with using a multiple sequence alignment, which I would say is most people’s go-to method for consensus design? One problem is computational. The number of sequences in GenBank is growing exponentially every year, and multiple sequence alignment algorithms do not scale well to large databases, either in the length of the sequences or in the number of sequences. Most MSA algorithms just plain can’t align 1,000 different full-length genomes. They can handle regions and things like that, but that is immediately limiting.
Another problem is that the pairwise alignments used to build the multiple sequence alignment are poor, and you can tell immediately. Look at any multiple sequence alignment and look in the protein-coding regions. Every single place where you see a one- or two-nucleotide insertion or deletion, you immediately know that the alignment cannot be correct, because indels in coding regions should come in whole codons: you should see triplet insertions or triplet deletions. You see non-triplet gaps very commonly, and that is just telling you the alignments are junk.
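The codon argument can be turned into a quick sanity check on an alignment. The sketch below, with a hypothetical function name, flags gap runs in one aligned row whose length is not a multiple of three; it assumes the whole aligned region is protein-coding. Genuine frameshifts do occur, so a flag marks a suspicious spot rather than proving the alignment wrong.

```python
import re


def non_codon_gaps(aligned_row):
    """(start, length) of each gap run in an aligned row that is not a whole codon."""
    return [(m.start(), len(m.group()))
            for m in re.finditer(r"-+", aligned_row)
            if len(m.group()) % 3 != 0]


if __name__ == "__main__":
    print(non_codon_gaps("ATG---AAAC-GGT--A"))
```

In the example row, the three-base gap is accepted and the one- and two-base gaps are flagged.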
All right, so sequence similarity is the wrong metric. One of the reasons why MSAs are not very good is that nucleotide sequences carry little information per position: you only have four different letters, A, C, G, and T. On top of that, there is a lot of sequence variation, particularly in viruses and bacteria, so MSAs just don’t work that well, particularly for primer design.
All right, so what does it mean to be covered? The Inclusivity playlist contains the sequence variants, so what are our criteria for whether a primer will bind to all of the members, or to any particular member, of the Inclusivity playlist? To answer that question, we need to know a lot more about polymerase extensibility rules. Which hybridizations lead to extension? Which mismatches are tolerable while retaining extensibility and high amplification efficiency? Those are things that most users don’t know, and they are things that we’ve been investigating experimentally at DNA Software for years now and have incorporated into our software. I’ve already shown you that BLAST is the wrong approach for such problems, and that multiple sequence alignment is the wrong approach, so what is the right approach?
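A coverage criterion of this kind can be written as a predicate. The rule below is hypothetical and is not DNA Software’s experimentally derived extensibility rules: a primer is counted as covering a variant’s binding site if it has at most max_mismatches mismatches and none within the 3′-terminal clamp bases, reflecting the general point that mismatch position matters for extension. The mismatch positions it computes are the kind of information a coverage report would display.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(primer, site):
    """Primer positions (0 = 5' end) that do not pair with the target site.

    site is the bound target strand written 5'->3'; the primer pairs with it
    antiparallel, so comparing the primer with revcomp(site) lines them up.
    """
    return [i for i, (a, b) in enumerate(zip(primer, revcomp(site))) if a != b]


def covers(primer, site, max_mismatches=2, clamp=3):
    """Hypothetical rule: few mismatches and none in the 3'-terminal clamp."""
    if len(primer) != len(site):
        return False
    mm = mismatch_positions(primer, site)
    return len(mm) <= max_mismatches and all(i < len(primer) - clamp for i in mm)


if __name__ == "__main__":
    primer = "ACGTACGTAC"
    for site in ("GTACGTACGT", "GTACGTACGC", "ATACGTACGT"):
        print(site, mismatch_positions(primer, site), covers(primer, site))
```

A single mismatch at the primer’s 5′ end is accepted under this rule, while the same single mismatch at the 3′-terminal base is rejected.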
Again, ThermoBLAST is very good at properly computing inclusivity coverage. It uses proper thermodynamic scoring for duplex complementarity, it analyzes hits for polymerase extensibility, and it automatically detects the amplicons that are created by pairs of primers. Output [inaudible 00:28:56] is shown here, for example. For all the members of an Inclusivity set, we can see how the primers that we designed cover them, and we can see the locations of any mismatches. The primer designs have been optimized to put these mismatches in places that are tolerable by the polymerase.
All right. Lastly is the idea of getting everything to play well together. In multiplex PCR, all these primers can interact in unpredictable ways, unpredictable for a human at least. The optimization is over a multidimensional landscape: you are varying different primers, different concentrations, and all kinds of things that are very hard for a human to keep in mind at once. No one can keep in mind a million different primers hybridizing at different locations and find a combination that works well together. The iterative empirical approach is suboptimal, as I’ll cover in a moment. What we need is a 21st-century approach to this kind of problem.
Let’s take a look for a moment at the empirical approach that most researchers resort to, mainly because they don’t have any alternative. You say, “All right, let’s start by optimizing individual singleplexes. Let’s make primers for each one of our individual targets. Then we’ll try to combine the singleplexes into small multiplexes, successively making the multiplex larger and larger. As we make the multiplex larger, whenever we see a problem, we’ll fix the singleplex that doesn’t play well with the others.” This is a one-dimensional search, one primer set at a time. The problem is that you’re making changes to the system without knowing why the failures occurred for each of those primers.
Okay, well, this approach typically takes a PhD-level scientist with several [inaudible] and asso- [gap in recording]
… all the primers start failing, and it’s just mystifying. It becomes a Whack-A-Mole type of problem: you solve one problem and boom, another pops up; you solve that one, and a different one pops up. This is the consequence of the one-dimensional, linear approach of trying to solve one primer set at a time. The problem is that multiplex PCR is not a linear system; it’s a complex system with many interacting variables. Cross-hybridizations cause artifacts. Individual PCRs are optimized to work under different conditions, not under the one universal condition needed for the multiplex. Furthermore, amplicons are amplified at different rates because of the folding that I mentioned at the beginning of the talk.
All right. Let’s think about the computational scale of multiplexing. It’s difficult for humans to do in a laboratory, but how difficult is it for a computer? Multiplexing involves finding sets of primers that are mutually compatible, where compatible means that they amplify with similar efficiency and do not form false amplicons. That is a really tough thing to achieve. For example, suppose you had a 20-plex, 20 different targets, and you wanted to design 10 primer pairs for each of those 20 targets, in the hope that for each target one of those primer pairs would play well with the other members of the 20-plex.
To do that, you would have 200 forward primer candidates and 200 reverse primer candidates. You might say, “Oh well, 200 primers, that’s not too bad.” But how many possible multiplexes are there? Choosing one of 10 pairs for each of 20 targets gives 10 to the 20th power possible combinations. 10 to the 20th. Think Avogadro’s number, all right? That’s a huge number of possibilities, many more than any group of humans could ever try, and many more than most computers could try nowadays. What we need here is not brute force; we need an elegant algorithm that can handle the exponential explosion that occurs when you go to larger multiplexes.
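The arithmetic behind that number is easy to reproduce. Choosing one of 10 candidate pairs for each of 20 targets gives 10^20 panels, and each panel of 40 primers has 820 primer-primer combinations to check (780 distinct pairs plus 40 self-interactions). The evaluation rate of 10^9 panels per second below is purely an assumption chosen to show the scale.

```python
import math

targets = 20       # plex level in the example
candidates = 10    # candidate primer pairs designed per target

panels = candidates ** targets                       # one pair chosen per target
primers = 2 * targets                                # forward + reverse per target
checks_per_panel = math.comb(primers, 2) + primers   # distinct pairs plus self-dimers

rate = 1e9  # hypothetical: panels evaluated per second
years = panels / rate / (365.25 * 24 * 3600)

print(f"{panels:.1e} panels, {checks_per_panel} primer-primer checks per panel")
print(f"about {years:,.0f} years at {rate:.0e} panels per second")
```

Even at that optimistic rate, exhaustive enumeration would take on the order of three thousand years, which is why a search strategy rather than brute force is needed.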
All right. I might add that each of those 10 to the 20th possibilities would require running a separate ThermoBLAST to check for false amplification, so bringing all of this together is even harder than you might think.
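One way to avoid re-checking every full panel is to compute pairwise interactions among all candidates once (for 400 primers, about 80,000 pairs) and then search over panels using that precomputed table. The sketch below does this with a simple coordinate-descent local search. It is a generic heuristic chosen for illustration, not the algorithm PanelPlex uses, and bad_pairs is assumed to come from a pairwise screen such as a dimer or cross-hybridization check. Local search can stall in a local minimum, so in practice it would be restarted from several random seeds.

```python
import itertools
import random


def conflicts(panel, bad_pairs):
    """Number of flagged (interacting) pairs among the chosen candidates."""
    return sum(1 for a, b in itertools.combinations(panel, 2)
               if frozenset((a, b)) in bad_pairs)


def local_search(candidates, bad_pairs, rounds=50, seed=0):
    """Coordinate-descent search for one candidate per target.

    candidates: list where candidates[t] lists the candidate IDs for target t.
    bad_pairs: set of frozenset({id1, id2}) computed once by a pairwise screen.
    Each round tries every alternative for every target and keeps any change
    that lowers the conflict count; stops at zero conflicts or no improvement.
    """
    rng = random.Random(seed)
    panel = [rng.choice(options) for options in candidates]
    best = conflicts(panel, bad_pairs)
    for _ in range(rounds):
        improved = False
        for t, options in enumerate(candidates):
            for option in options:
                trial = panel[:t] + [option] + panel[t + 1:]
                score = conflicts(trial, bad_pairs)
                if score < best:
                    panel, best, improved = trial, score, True
        if best == 0 or not improved:
            break
    return panel, best


if __name__ == "__main__":
    cands = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
    bad = {frozenset(p) for p in [("a1", "b1"), ("a1", "b2"),
                                  ("b2", "c1"), ("b2", "c2")]}
    print(local_search(cands, bad))
```

Each panel evaluation here is a table lookup over pairs instead of a new database search, which is what makes exploring many panels affordable.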
All right. Now we get to our fourth and last problem of multiplex PCR, and that is company resource constraints. I believe that this is really the number one cause of failure in multiplex PCR. Many users get locked into a paradigm. They use the wrong tools, such as freeware or tools that are inappropriate for the need, or they take the wrong approach. They use empirical optimization. Why? Because that’s what they’ve done in the past and they’re comfortable with it, but it leads to suboptimal results or complete failure.
Another resource constraint is a lack of knowledge. A team may have strengths in some scientific areas but not in all of the areas needed to solve the multiplex problem. Think about just the things I’ve covered today: the thermodynamics of hybridization and folding, setting cutoffs for delta G, and what counts as an extensible hit versus a non-extensible hit. That’s a great deal of knowledge that many groups just don’t have. At DNA Software, we’ve compiled a great body of expertise in biophysics, kinetics, thermodynamics, computer science, simulation, optimization, and engineering. These are skills that are lacking in many organizations, or, even where they’re present, it would take years to develop a solution that makes use of that expertise.
Lastly, the computer infrastructure. We wanted to take a 21st-century approach that didn’t rely on desktop or laptop computing alone, but took advantage of cluster computing to solve problems in ways that were not possible even five years ago. My message here is, don’t do it alone, get help. DNA Software can help you with your problems.
All right. Here is a typical treadmill that many, many research groups get stuck on, and they think that they’re saving money by using freeware. All right? I think everyone knows that nothing in life is free, and in fact, design freeware is very, very expensive for your group. Think about this design cycle. You start off with your design freeware, and immediately you waste time and money trying to get the output from one piece of software into the next software package you’re going to use, trying to shoehorn your problem into the capabilities of that software. Once you get through that step, you get some design results, but they’re crappy results. How do you really know that those are gonna work? Well, you know they’re not. You’re gonna take ’em and maybe do an additional step of running BLAST on them to see if they’re specific. We’ve shown that that is really not the right approach. But you go ahead and order the oligos, and you spend more money doing that, particularly on the labeled probes and primers.
Last, you do the experimental testing, which costs a lot of money in your team’s time, all of that work just to show that those PCRs don’t work. So you go through the classic wash, rinse, repeat cycle over and over again, without making a lot of progress. If you’re wondering how much money your process is wasting, we have a return on investment (ROI) calculator on our web page, but actually, you should call us. If you don’t really know how much you’re spending on your PCR design, we’ll work with you to assess what you’re spending currently, and we can show you how PanelPlex, which I’ll show in a moment, can help solve your problems.
All right. Now we’re done with the four problems of multiplex PCR. I think you can appreciate the level of difficulty, so now let’s try to show you a solution to this problem.
All right. Before I show you PanelPlex, I want to say just a few words about it. PanelPlex solves all four of the multiplex design problems that I’ve talked to you about today. It’s an integrated solution that works without you needing to be an expert in thermodynamics or kinetics, or to know all those rules about polymerase extensibility that I talked about today. Instead, you can use PanelPlex to solve your problems. PanelPlex itself consists of four design modules, all integrated into one simple-to-use user interface. The first engine is called the Designer engine. The Designer engine accounts for all the folding and hybridization reactions; it simulates all of those, solves the false negative problems that we talked about, and finds the sweet spots for design.
ThermoBLAST is integrated as a part of PanelPlex. It’s integrated to determine coverage and also false positives, so that helps with both the false negatives and false positive problems that we talked about today. TargAn is a target analysis program that analyzes upfront the inclusivity and exclusivity playlists to find the regions of the targets that are most likely to be amenable to design. This part of the program I didn’t get to discuss today, but it’s the replacement for that multiple sequence alignment, which really is not appropriate for designing multiplex.
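The talk does not spell out how ThermoBLAST scores hits, so the following is only a minimal Python sketch of the idea that one set of hits feeds both numbers: coverage over the Inclusivity playlist and false positives over the Exclusivity playlist. The `Hit` fields, the -9 kcal/mol cutoff, and the "matched 3′ end means extensible" rule are hypothetical simplifications; real extensibility depends on the polymerase and on the type of mismatch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    target: str              # sequence name from a playlist
    delta_g: float           # predicted duplex free energy, kcal/mol
    three_prime_match: bool  # True if the primer's 3'-terminal base pairs correctly

# Hypothetical cutoff: more negative delta G means a more stable duplex.
DG_CUTOFF = -9.0  # kcal/mol

def is_extensible(hit: Hit) -> bool:
    """Toy rule: stable enough AND a matched 3' end, so a polymerase could extend it."""
    return hit.delta_g <= DG_CUTOFF and hit.three_prime_match

def coverage(hits: list[Hit], inclusivity: set[str]) -> float:
    """Fraction of Inclusivity sequences with at least one extensible hit."""
    covered = {h.target for h in hits if h.target in inclusivity and is_extensible(h)}
    return len(covered) / len(inclusivity)

def false_positives(hits: list[Hit], exclusivity: set[str]) -> list[str]:
    """Exclusivity (or background) sequences that would also be amplified."""
    return sorted({h.target for h in hits if h.target in exclusivity and is_extensible(h)})

hits = [
    Hit("zika_1", -12.0, True),
    Hit("zika_2", -5.0, True),      # too weak to count
    Hit("dengue_1", -11.0, True),   # stable and extensible: a false positive
    Hit("dengue_2", -11.0, False),  # 3' mismatch: treated as non-extensible
]
print(coverage(hits, {"zika_1", "zika_2"}))             # 0.5
print(false_positives(hits, {"dengue_1", "dengue_2"}))  # ['dengue_1']
```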
Lastly, there’s the MultiPick algorithm. MultiPick combines all of the different singleplex candidates into many different permutations, and it is guaranteed to find the top N multiplex solutions. Even out of those 10 to the 20th possibilities, it’s guaranteed to find the best. All right. It’s not approximate, it’s guaranteed. Now, I want to say this: I can brag about this algorithm because I’m not the one who invented it. It was invented by my team, who I’m very proud of. They used a breadth-first pruning algorithm to tame this combinatorial explosion, so it’s computationally tractable. All right. My bottom-line message here is this: we’ve spent more than 15 years and more than 15 million dollars thinking about and experimenting on multiplex PCR so you don’t have to.
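The talk does not describe how MultiPick is implemented, so the following is only a generic Python sketch of an exact breadth-first search with pruning, not DNA Software’s algorithm. Panels are built one target at a time; `compatible` is a hypothetical stand-in for a thermodynamic cross-interaction check (for example, primer-dimer prediction), and scores are simply summed.

```python
from typing import Callable

Candidate = tuple[str, float]  # (candidate name, singleplex score; higher is better)
Panel = tuple[tuple[str, ...], float]

def best_multiplexes(
    candidates: list[list[Candidate]],
    compatible: Callable[[str, str], bool],
    top_n: int = 3,
) -> list[Panel]:
    """Exact breadth-first search over panels with one candidate per target.

    Each level adds one target. A partial panel that already holds an
    incompatible pair can never become valid, so it is dropped at once.
    """
    frontier: list[Panel] = [((), 0.0)]
    for target_candidates in candidates:
        next_frontier: list[Panel] = []
        for panel, score in frontier:
            for name, cand_score in target_candidates:
                if all(compatible(name, member) for member in panel):
                    next_frontier.append((panel + (name,), score + cand_score))
        if not next_frontier:
            return []  # no fully compatible panel exists
        frontier = next_frontier
    frontier.sort(key=lambda item: item[1], reverse=True)
    return frontier[:top_n]

# Toy example: A1 and B1 are predicted to interact, so they cannot share a panel.
incompatible = {frozenset({"A1", "B1"})}

def compatible(a: str, b: str) -> bool:
    return frozenset({a, b}) not in incompatible

cands = [[("A1", 0.9), ("A2", 0.8)], [("B1", 0.95), ("B2", 0.7)]]
print(best_multiplexes(cands, compatible, top_n=1)[0][0])  # ('A2', 'B1')
```

Because the only pruning here removes partial panels that already contain a clash, the top results are exact, but the frontier can still grow exponentially when most pairs are compatible; reaching 10 to the 20th scale would need additional bounding that this sketch does not attempt.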
All right. This brings me to the topic of the demo. I’m gonna show you PanelPlex in a moment, and when I do so, I want to remind you about the Zika virus we talked about earlier. I’m going to be loading in the Zika virus genomes as the Inclusivity playlist and the Dengue fever viruses and Chikungunya viruses as the Exclusivity playlist, and we’re gonna use the human genome as the background in this case.
Now, there’s one last topic that I’m going to be discussing during the demo that I haven’t brought up so far, and that’s the issue of a keystone sequence. I’ll bring that up when we do the demo. All right. This is a cue for me to copy this keystone’s accession number, which I will use in the demo.
All right. We’re done with the four problems of multiplex PCR, so now we’re gonna take you to the software for the demo. All right, here we are. I need to make this full screen. All right, is that good? Yeah, it should be full screen. All right.
This is the login to our DNA Software application portal. It’s easy to register if you haven’t registered already. You come into the portal and you can see our three products: qPCR CopyCount, which is for analysis of PCR to get absolute quantification; ThermoBLAST, which is a standalone product that you can use for checking primers that you may already have at your organization; and PanelPlex, which is what I’m talking about today. Let’s go ahead and design some solutions with PanelPlex by clicking on the blue button.
Let’s go ahead and run PanelPlex. PanelPlex has four different steps. First, we need to choose our playlists (the Inclusivity, Exclusivity, and Background), and we need to choose this thing called the keystone, which I’ll explain in a moment. Once we’ve entered those, we’ll give the details of the target: do we want to design to the whole target or to a region of the target? Then we’ll give the hybridization condition and any probe modifications.
Here are the basic steps for doing the design in PanelPlex. The first is that we need to choose our panels. I’ve already set up Inclusivity and Exclusivity lists for the Zika virus, so they’re already here. We have a separate webinar that we’ve given on ThermoBLAST; this is the same interface that we have in ThermoBLAST for creating and accessing playlists. For today’s talk, I’m not gonna show how to create the playlists (it’s very simple), and instead I’ll just show you the results. Here’s the Inclusivity playlist for Zika virus, for example. It has all 168 Zika viruses, and here they are, listed on several different pages of Zika virus variants. We’re gonna choose that to be our Inclusivity list for today. It’s a simple add, just hitting ‘Select as Inclusivity Playlist’. All right, let’s just wait until that loads. All right.
Now, we have the Exclusivity playlist. I’ve already prepared an Exclusivity playlist for Zika, so let’s go ahead and use that one, just to give you a sense. This Zika playlist has 5,443 near neighbors: other flaviviruses and fever-causing viruses that are somewhat related to Zika virus, and that we want to make sure don’t cause false positives. For example, we can see some examples of Dengue fever virus; Chikungunya, which is actually not so closely related (it’s a togavirus) but is still a fever-causing virus; and several other viruses in this list that are related to Zika virus and that we would not want to cause a false positive. All right, that has been prepared in advance, so I just choose it as the Exclusivity playlist. All right, I gotta wait until that registers. Give it a second. There we go.
All right, so we’ve now chosen our Inclusivity and Exclusivity playlists. The last step is to choose the background sequence. In this case, I want to make sure that any Zika diagnostic I design does not bind to the human genome, so I choose the human genome as my background sequence. I did a quick search of all our playlists for every playlist that involves humans, and here they all are. The first on the list is the human genome, so I selected that. All right, I’ve now selected my three different playlists, and the software is gonna use them to determine the coverage, the background, and all those other features. I’ll say more about that in a moment. All those things we talked about, those four problems, are gonna be addressed by the software.
All right, next is this issue of the keystone. I haven’t told you what a keystone is, but there’s a definition right at the top of this page. The keystone sequence is one of the sequences from the Inclusivity playlist that you want to detect with primers that have no mismatches. Typically, you would choose a reference genome that is biologically or historically considered to be the most important sequence in your list. We have three different ways that you can choose the keystone sequence, but this is the sequence that is gonna be sort of the center of the design, and variants of it will go around that design.
Even if you choose a bad keystone, the software is smart: it will try other keystones as well and compare them to yours. If you hit ‘Select Keystone’, it will try several different keystones, including yours, to see which one works best. In this case, I have a starting keystone; that’s the accession number I cued myself to remember. This is just showing you the complete list of Inclusivity sequences, and now I paste the accession number in. Here is the one that I want to use, so that’s what I’m gonna use as my keystone.
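A perfect match to the keystone is easy to check. The minimal Python sketch below, with hypothetical function names and toy sequences, slides a primer along each sequence and reports the fewest mismatches at any position: the keystone must come back as 0, and the other Inclusivity variants show how far the design drifts from them. It ignores the reverse strand, gaps, and the thermodynamic weighting of individual mismatches, all of which matter in practice.

```python
def min_mismatches(primer: str, sequence: str) -> int:
    """Fewest base mismatches between the primer and any same-length window."""
    k = len(primer)
    if k > len(sequence):
        raise ValueError("primer is longer than the sequence")
    return min(
        sum(p != s for p, s in zip(primer, sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    )

def check_against_keystone(primer: str, keystone: str, variants: dict[str, str]) -> dict[str, int]:
    """Require a perfect keystone match, then report mismatches per variant."""
    if min_mismatches(primer, keystone) != 0:
        raise ValueError("primer is not a perfect match to the keystone")
    return {name: min_mismatches(primer, seq) for name, seq in variants.items()}

keystone = "AAACGTACGTTT"  # toy sequences, not real Zika genomes
variants = {"variant_1": "AAACGTTCGTTT", "variant_2": "GGACGTACGTCC"}
print(check_against_keystone("ACGTACG", keystone, variants))
# {'variant_1': 1, 'variant_2': 0}
```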
All right, next. Here we are on the target details page. We need to give this job a name; I’m gonna call it ‘Zika demo two’. Next, we’re gonna choose what type of detection we want to use. If this were an NGS application, we would choose no probes, ’cause you just want to amplify the target. If we want to detect the amplicons with a fluorescent probe, like a TaqMan probe, we choose probe here.
Now, the next option is whether you want to design to the whole target, in which case the software will check the entire Zika virus genome, which is over 10,000 nucleotides long, and try to find the best places for design. Or we can choose a design range; we allow design ranges of up to 40,000 nucleotides. In the Zika case, we could say, “Please direct our design to where we know there’s a gene,” maybe from nucleotide 1,500 to nucleotide 2,500. If you did that, it would limit the design to that region.
This capability can be used, for example, to apply PanelPlex to a bacterial target. Bacteria are too big to design to the whole target, ’cause their genomes are millions of nucleotides long, but you can narrow your region down to 40,000 nucleotides, which is as big as any gene or gene region that just about anyone would ever want to design to. And certainly, most viruses are shorter than 40,000 nucleotides. You could also use this to design to a target in the human genome if you like. All right, so that’s it for the target details.
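Restricting the design to a region amounts to slicing the target. Here is a minimal Python sketch that enforces the 40,000-nucleotide limit mentioned above. It assumes the range box takes 1-based, inclusive nucleotide positions; that is an assumption, not documented PanelPlex behavior, and `design_region` is a hypothetical helper name.

```python
MAX_DESIGN_RANGE = 40_000  # nucleotides, the limit quoted in the talk

def design_region(sequence: str, start: int, end: int) -> str:
    """Return nucleotides start..end, using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range falls outside the target sequence")
    length = end - start + 1
    if length > MAX_DESIGN_RANGE:
        raise ValueError(f"range is {length} nt; the limit is {MAX_DESIGN_RANGE} nt")
    return sequence[start - 1:end]  # Python slices are 0-based and end-exclusive

genome = "ACGT" * 2_500  # toy 10,000-nt stand-in for a viral genome
region = design_region(genome, 1_500, 2_500)
print(len(region))  # 1001
```

Because both ends are included, the 1,500 to 2,500 example covers 1,001 nucleotides, not 1,000.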
Lastly, we have to choose our hybridization condition, and I have one that I made up here for Zika. This is just a standard PCR condition in terms of the sodium and magnesium present in the solution. Finally, you can assign modifications to your probes, for example, TaqMan probes. I could choose fluorescein as the five prime end fluor and a BHQ quencher as the three prime end quencher. That’s it. You hit ‘Run’ and the job is submitted. At this point, the program is going through millions and millions of computations, and they’re being done on 32 CPU cores.
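The job settings chosen so far can be summarized as a small configuration record. The Python sketch below uses hypothetical class and field names (this is not PanelPlex’s interface), illustrative salt concentrations of 50 mM sodium and 3 mM magnesium, and FAM (a fluorescein derivative) with BHQ1 as example probe labels. Its only logic is a consistency check: probe modifications make sense only when probe detection is selected.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class HybridizationCondition:
    sodium_mM: float     # monovalent cation concentration
    magnesium_mM: float  # divalent cation concentration

@dataclass(frozen=True)
class DesignJob:
    name: str
    detection: str  # "no_probe" (e.g. NGS amplicons) or "probe" (e.g. TaqMan)
    condition: HybridizationCondition
    probe_5prime: Optional[str] = None  # fluorophore label
    probe_3prime: Optional[str] = None  # quencher label

    def __post_init__(self) -> None:
        if self.detection not in ("no_probe", "probe"):
            raise ValueError("detection must be 'no_probe' or 'probe'")
        if self.detection == "no_probe" and (self.probe_5prime or self.probe_3prime):
            raise ValueError("probe modifications require detection='probe'")

job = DesignJob(
    name="Zika demo two",
    detection="probe",
    condition=HybridizationCondition(sodium_mM=50.0, magnesium_mM=3.0),
    probe_5prime="FAM",
    probe_3prime="BHQ1",
)
print(job.name, job.probe_5prime, job.probe_3prime)
```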
All right. You can see the program automatically took us to the results page, and here’s the project I just started, ‘Zika demo two’. It says ‘Not Started’. It takes about 60 seconds or so to spin up a 32-core cloud instance dedicated to solving your problem. It will crunch away on it until it creates your solution. In the process, it’s folding all of those sequence variants of the Zika virus. It’s trying tens of thousands of primer candidates. It’s doing hybridization reactions to those Zika viruses. It’s checking for false positives against all the members of the Exclusivity set: the Dengue fever viruses, West Nile viruses, and all the other things you put in that list. It’s checking to make sure that those primers don’t bind to the human genome. It’s a massive, massive calculation. No human could ever do the calculations that are happening here.
Well, this job takes about eight hours to run and I ran one last night. I started it at 4:00 PM and it was ready this morning when I came in, and I wanted to show you the results. It’s just like those TV commercials where they put it in the oven and then they take out the version that’s already done. I’m gonna show you what the results for Zika virus look like. All right, this takes about a minute ’cause it has to download the results from the cloud for your primer design sets. All right, so this will just take a little bit of time to populate the page with all of the primer designs. Just give it a moment here. I’ve been talking pretty quickly, so people probably need a break anyway. I’m gonna take a sip of water.
All right, while we’re waiting for this: PanelPlex was able to find 10 different multiplex solutions. One set of primers was not sufficient to cover all 168 different targets, so the solutions use multiple primer sets.
All right, here they are. The results are now loaded. We can see that the first solution it found involves two primer sets, and I’ll highlight them here. These are the two primer sets that it used to cover all 168 different sequences. Individually, the first set of primers, which I’ll highlight, was able to cover 96% of the Zika variants; the software does show you what those covered variants are. You can see that the second set of primers took the cumulative coverage from 96% to 100%. By themselves, the second set of primers, shown here, covered 85% of the Zika virus genomes. The two primer sets have been designed not to interfere with each other and to completely solve the problem together, with a cumulative coverage of 100%.
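Why can 96% and 85% combine to only 100%? Cumulative coverage is the union of the targets each primer set covers, not the sum of the percentages, because the sets overlap. Here is a minimal Python sketch with toy data (10 made-up targets rather than the 168 Zika genomes); `coverage_report` is a hypothetical helper name.

```python
def coverage_report(
    primer_sets: dict[str, set[str]], targets: set[str]
) -> list[tuple[str, float, float]]:
    """Per-set and cumulative coverage, in the order the sets are listed."""
    covered: set[str] = set()
    rows = []
    for name, hits in primer_sets.items():
        hits = hits & targets  # ignore anything outside the Inclusivity list
        covered |= hits        # cumulative coverage is a set union
        rows.append((name, len(hits) / len(targets), len(covered) / len(targets)))
    return rows

targets = {f"zika_{i}" for i in range(10)}
sets = {
    "set_1": {f"zika_{i}" for i in range(9)},      # covers zika_0 .. zika_8
    "set_2": {f"zika_{i}" for i in range(2, 10)},  # covers zika_2 .. zika_9
}
for name, alone, cumulative in coverage_report(sets, targets):
    print(f"{name}: {alone:.0%} alone, {cumulative:.0%} cumulative")
# set_1: 90% alone, 90% cumulative
# set_2: 80% alone, 100% cumulative
```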
Now, I mentioned that we have not just one solution to the multiplex problem. Here’s a second solution, where three primer sets were required to get a cumulative coverage of 100%. Here’s one where four different primer sets were required. Here’s one with three primer sets, et cetera. And here’s one more down here with two primer sets that’s actually an excellent solution as well.
All right, now there are some more details here if you’re a user who’s interested in knowing what is special about these sequences. If you hit ‘View Details’, this is where you find all that information I shared with you about folding: folding of the RNA target and folding of the amplicon. We have a set of scoring metrics here that give us the bimolecular folding energies and unimolecular folding energies. All of that is integrated into the scoring equation. There’s no voodoo here; everything is laid out so you can see where these scores came from.
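As a rough illustration of how a bimolecular free energy translates into binding, here is a textbook two-state sketch in Python. It is not DNA Software’s scoring equation: the ΔH and ΔS values and the 200 nM primer concentration are illustrative assumptions, there is no salt correction, and competing unimolecular folding of the primer or target is ignored.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def delta_g(dh_kcal: float, ds_cal: float, temp_c: float) -> float:
    """Free energy of duplex formation, dG = dH - T*dS, in kcal/mol."""
    t_kelvin = temp_c + 273.15
    return dh_kcal - t_kelvin * ds_cal / 1000.0  # dS given in cal/(mol*K)

def fraction_bound(dg_kcal: float, temp_c: float, primer_molar: float) -> float:
    """Two-state binding with primer in large excess over target: K[P] / (1 + K[P])."""
    t_kelvin = temp_c + 273.15
    k = math.exp(-dg_kcal / (R * t_kelvin))  # association constant, 1/M
    kp = k * primer_molar
    return kp / (1.0 + kp)

# Illustrative duplex parameters, roughly the scale of a 20-mer primer.
dH, dS = -150.0, -400.0
for temp in (60.0, 72.0):
    dg = delta_g(dH, dS, temp)
    print(f"{temp:.0f} C: dG = {dg:.2f} kcal/mol, "
          f"fraction bound = {fraction_bound(dg, temp, 200e-9):.3f}")
```

With these numbers, binding is essentially complete at 60 °C but drops noticeably at 72 °C, which shows why these energies need to be evaluated at the actual reaction temperature.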
You can also look at the coverage here. If you want to see how well these primers cover the different targets, you click the ‘View Coverage’ link over here. It takes you to this page, which shows you how the primers cover the Inclusivity set. If you click over here, you can see how they cover the Exclusivity sequences that you don’t want them to cover, et cetera.
All right. You can also download all of the detailed information as a CSV file. For example, for all this information about this solution, you just hit ‘Download Solution as CSV’ and voilà, it downloads that CSV file. Let’s take a look at that file. Here it is in my downloads. I can open up that file, and it has all of that sequence information. You can just cut and paste, and order those oligos and test them out. If you want to order a particular solution, you just order across a row like that. Order those oligos, test them, and you should be good to go.
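Once the CSV is downloaded, turning a row into an order list takes only a few lines of Python. The column names (`set`, `forward`, `reverse`, `probe`), the file name, and the sequences below are hypothetical placeholders; check the header row of the actual file and adjust `columns` to match.

```python
import csv
import io

def oligos_to_order(csv_file, columns=("forward", "reverse", "probe")) -> list[tuple[str, str]]:
    """Read one row per primer set and list (oligo name, sequence) pairs to order."""
    order = []
    for row in csv.DictReader(csv_file):
        for col in columns:
            seq = (row.get(col) or "").strip()
            if seq:  # skip empty cells, e.g. no probe for an NGS design
                order.append((f"{row['set']}_{col}", seq))
    return order

# Hypothetical layout with dummy sequences.
example = io.StringIO(
    "set,forward,reverse,probe\n"
    "set1,ACGTACGTACGTACGTACGT,TTGCAACGTTGCAACGTTGC,CCATGGCCATGGCCATGGCA\n"
    "set2,GGCATTGCAAGGCATTGCAA,AACCGGTTAACCGGTTAACC,\n"
)
for name, seq in oligos_to_order(example):
    print(name, seq)

# For a downloaded file (name is hypothetical):
# with open("solution.csv", newline="") as f:
#     oligos = oligos_to_order(f)
```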
All right. That is all we have for the demo for now, so let’s go back to the PowerPoint presentation. Let me make that full screen again.
Ted: So what are the benefits, John?
John: Okay. The benefit is that this is going to drastically reduce your assay development time, from a year to less than a week. Most of that week is not the design; the design’s finished in a few hours. Most of that time is spent with you doing the validation and the much-reduced number of iterations required to get a final diagnostic-quality design that has minimal false positives and false negatives.
All right. Now let’s summarize what we told you today. We talked about the four problems that cripple multiplex PCR: false negatives, false positives, poor coverage of target mutants, and the lack of organizational resources, and how the solution to these problems requires sophisticated algorithms and appropriate computational resources.
All right. Now, I want to mention that PanelPlex is under ongoing development. We have finished the part of PanelPlex that is used for infectious disease, which is why I showed the Zika virus as an example today, but it also works for human targets and bacteria. We are right now working on the applications for Next-Gen Sequencing, which will involve integrating our MultiPick algorithm, which I talked a little bit about. We expect that to be released for general use in September of this year. MultiPick itself is fully functioning and completed; it just hasn’t been integrated into PanelPlex yet. Currently, we’re offering multiplex design for applications like NGS via a concierge service, which we do as consulting contract work. Lastly, we have a third variant of PanelPlex, which we’re scheduling for release in early 2018, that will allow SNP and strain genotyping.
The important point here is no more Whack-A-Mole. You won’t have to keep banging your head against a wall trying to find and optimize your multiplex empirically.
Ted: Okay, so I’ve been monitoring the questions, John. There are over 27 questions here, so obviously we’re not gonna cover them all in three minutes. You answered-
John: Sorry we went over.
Ted: Two of them right away. One, does DNA Software provide assay designs as a service, as opposed to customers purchasing the software? And you just talked about the concierge service that we offer. Matt asked, “Can PanelPlex be used for NGS?” and I think I just heard you say the answer to that was yes as well. Jessica wants to know … We’re gonna stay on the line a little past the hour for those of you who want to stick around. For those of you who don’t, you will have a recorded version of this whole webinar, so you can skip to the end, pick up the questions, and follow up if you’d like.
Ted: Jessica wants to know, “How do we know these designs actually work?”
John: That’s a great question. There’s really no way to know if a design works other than to go into the lab and try it. However, what I can tell you is that we have done a vast body of experimentation to validate that the results work. For one of our customers, for example, we were able to design 32 different panels for infectious diseases of the upper respiratory tract, both viral and bacterial targets, and make them work together in a multiplex. It worked on the first try: 100% of the oligos worked. Actually, the customer started getting greedy and wanted us to improve about 10% of the primers. They wanted to make it even better, which was crazy, but we were able to meet even that demand.
We have another customer that asked us to design for 32 different messenger RNA targets; it just happened to also be 32. We’ve been able to design a multiplex for them for those 32 targets and their RNA isoforms, targeting specific exon-exon junctions in their RNAs. It’s been extensively validated, and our models have been extensively validated in the lab as well. We are very confident that this is gonna dramatically reduce your effort.
Ted: Okay, Rod has three questions. I don’t know which one I should ask because we don’t have time for all of them, so I’ll go to the third question: “How do you support the design of assays, including internal amplification controls?”
John: In both of the projects I just mentioned, an internal amplification control was one of the members of the multiplex. I didn’t mention this, but you can implement the control as a fixed part of the design. In other words, if you already have a design for your control, you can input that into the software, and it will design all the other members of the multiplex in the presence of that control.
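Here is a minimal Python sketch of designing around a fixed control: the control is placed in the panel first, and every other member must be compatible with it. This is a simple greedy simplification, not PanelPlex’s method; `panel_with_fixed_control`, `compatible`, and the candidate names are hypothetical. Unlike an exhaustive search, greedy selection can miss a valid panel, so it returns `None` when a choice dead-ends rather than guaranteeing a solution.

```python
from typing import Callable, Optional

def panel_with_fixed_control(
    control: str,
    candidates: list[list[tuple[str, float]]],
    compatible: Callable[[str, str], bool],
) -> Optional[list[str]]:
    """Greedy sketch: the control is fixed; for each target, take the best-scoring
    candidate that is compatible with everything already in the panel."""
    panel = [control]
    for target_candidates in candidates:
        usable = [(name, score) for name, score in target_candidates
                  if all(compatible(name, member) for member in panel)]
        if not usable:
            return None  # this target cannot be added around the current panel
        panel.append(max(usable, key=lambda c: c[1])[0])
    return panel

clashes = {frozenset({"IAC", "A1"})}  # hypothetical predicted interaction

def compatible(a: str, b: str) -> bool:
    return frozenset({a, b}) not in clashes

print(panel_with_fixed_control("IAC", [[("A1", 0.9), ("A2", 0.5)], [("B1", 0.8)]], compatible))
# ['IAC', 'A2', 'B1']
```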
What was the other question?
Ted: That was the only … I didn’t ask the other two. Andrew wants to know if the assay designs are experimentally validated.
John: I think I just answered that.
Ted: Oh, okay. Sorry. All right.
Ted: Obviously, John, this is not freeware. This is designed for more complex problems, and the reason I say that is Tom wants to know how licensing is handled and what the cost structure is.
John: All right. Well, we can work with customers in a variety of ways. We’re open to many different business models, but there are two main ways that folks work with us. Some users just have a limited number of designs that they want to do, maybe only one. In such a case, it might be best to work with us in a concierge model, where we write you a quote depending on the scale of the project, how big and how complex the multiplex is, and whatever demands you might have for the design. A typical quote would be on the order of $20,000 for a project like that. These projects can be quite complex and involved, but they can save development teams months and months of effort, and we’re gonna get a beautiful design right off the bat almost every time.
The other way that we work with customers is with an annual license model, which is $5,000 per month.
Ted: For the software.
John: For the software, $60,000 per year. Users can use the PanelPlex engine pretty much as much as they want, they own the designs, and that’s our business model.
Ted: And you train them?
John: Yes, of course. We have extensive documentation, and we’re certainly happy to walk people through their first few uses of PanelPlex with our best practices. We can also consult if it’s necessary to do higher-level things that go beyond the scope of … We’ll handle simple requests as people need them and provide support as we go.
Ted: For those of you who have asked questions, we’ll follow up with you individually. In the interest of time, I think it’s time for us to move on. In terms of next steps (John, I’m not gonna ask you to go to the website), you can visit our website. On the home page, you can find an interesting video on DNA Software, and you’ll see some sections on multiplex design. Under the ThermoBLAST section, you’ll also see another video about the problems involved with multiplex.
There will be a white paper about the problems of multiplex available by the middle of next week, and all of you will get a copy. A recorded version of this webinar will be available, and we will also send that to you. And of course, you can sign up for free calls and consultation on multiplex. Just contact us through the website and we would be more than happy to follow up with you on that. With that, we’d like to thank you for your time and attention in this webinar today, “The Four Most Common Problems in Multiplex Panel Design.” John, I want to thank you for a fascinating presentation. Thank you.
John: Thank you, Ted, and thank you to everyone for attending. I really appreciate it. I always look forward to hearing from you.
Ted: Thank you and have a pleasant afternoon. Goodbye.