“How to Eliminate False Positives from your DNA Diagnostic Assays”
Edited Transcript of the Webinar given December 3, 2014
Presented by: John SantaLucia, President, CEO and co-founder of DNA Software, Inc.
BLAST is one of the most widely used bioinformatics programs in the world. BLAST was developed by David Lipman’s group at the National Institutes of Health. The BLAST algorithm is based on the principle that the more closely two organisms are related evolutionarily, the more similar their sequences are. Biologists widely use BLAST to align a query sequence against a vast database of genome sequences and reveal the sequences most similar to the query, which enables researchers to infer the biological function of an unknown query. For that purpose, BLAST is truly outstanding.
BLAST is also widely used by molecular biologists to predict primer specificity, and it’s this last use that we want to talk about today. What’s wrong with using BLAST to predict cross-hybridization? The main disadvantage of BLAST is that it searches on sequence similarity, not sequence complementarity, and complementarity is what is needed when scanning a primer or probe against a genome database. Because BLAST cannot search for sequence complementarity, people commonly use a workaround: “Since BLAST can’t scan for the complement, let’s just take the complement of the oligo I’m interested in and then use BLAST to find sequences similar to that complement.” That procedure leads to five problems, and we’re going to go through each one of them.
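In code, the workaround amounts to reverse-complementing the oligo before pasting it in as the BLAST query (the complement is reversed so that it still reads 5'->3'). A minimal Python sketch, assuming plain A/C/G/T input with no IUPAC ambiguity codes; the function name is illustrative only:

```python
# Watson-Crick partner for each base; IUPAC ambiguity codes are not handled.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(oligo: str) -> str:
    """Return the reverse complement of a DNA oligo, written 5'->3'."""
    return oligo.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCGTACCTGA"))  # TCAGGTACGCAT
```

This only changes which strand is queried; the scoring that follows is still similarity-based, which is the root of the problems listed next.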
First, BLAST gives the wrong ranking of hits and that’s because it’s scoring based on similarity rather than on thermodynamics.
Second, because of the incorrect ranking of the hits and the wrong scoring method, BLAST misses approximately 80% of the thermodynamically stable hits. We’ll show some examples of that.
Third, if you BLAST an oligonucleotide against a large database like “nt” or “nr”, you get many irrelevant hits. For example, if you were trying to scan an oligonucleotide against the human genome, the primer would be likely to hit other mammals, such as the horse, pig, or cow genomes, because humans are mammals. Each of those genomes is evolutionarily related to the human genome but has no relevance at all to primer design. Thus, BLAST gives many hits that you don’t care about.
Fourth, BLAST does not distinguish between a hit that is extensible by a polymerase and one that is not. We’ll see that this distinction is very important for the design of primers versus probes.
Lastly, BLAST has no mechanism to detect amplicons. Even though a primer might bind somewhere in a genome of interest, what matters is: “Does it form a false amplicon?” ThermoBLAST incorporates an algorithm that detects all possible amplicons for a set of primers, and that is extremely important when you are multiplexing. If you have many primers, the chance that two or more of them bind to the genome in opposite orientations and form an amplicon becomes much higher.
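A rough way to see why multiplexing raises the risk: if any primer can in principle pair with any other primer, or with a second copy of itself, on opposite strands, the number of candidate pairings grows quadratically. The Python sketch below is only a combinatorial count under that assumption, not a prediction of which pairs actually form amplicons:

```python
def candidate_pairings(n_primers: int) -> int:
    """Unordered primer pairings, including a primer paired with a copy of itself: n(n+1)/2."""
    return n_primers * (n_primers + 1) // 2


for n in (2, 6, 20, 60):
    print(f"{n:>3} primers -> {candidate_pairings(n):>5} candidate pairings")
```

A singleplex (2 primers) has 3 candidate pairings, only one of which is the intended amplicon; a 30-plex with 60 primers has 1,830.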
Let’s talk about the ranking of results. As I mentioned earlier, sequence similarity is not the same as thermodynamic stability. BLAST scores all base matches the same, so an A match counts the same as a C, G, or T match. However, we know that G-C base pairs are thermodynamically NOT equivalent to A-T base pairs. G-C pairs are much more stable, and furthermore you really need a nearest-neighbor model to accurately predict hybridization. So, right off the bat, there is a problem with taking the complement and then using BLAST. Another problem is that BLAST scores all mismatches as if they were the same. BLAST does not recognize that some mismatches, such as G-G, G-T, and G-A, are relatively stable, whereas mismatches like A-C, T-C, and C-C are destabilizing. Going from G-G to T-C mismatches, the equilibrium constants range over a factor of more than 1000, so the fact that not all mismatches are the same is a big effect. Furthermore, BLAST uses an incorrect model for bulges: it applies an evolutionary penalty for an insertion or deletion in the sequence alignment, which is NOT the same as the thermodynamic penalty for forming a bulge. In addition, BLAST does not account for dangling ends, for different sequence types (DNA versus RNA hybridization), or for salt and temperature effects, and it scores multiple mismatches incorrectly. All of those considerations compromise BLAST scoring for the application of finding cross-hybridization. Perhaps most important of all, BLAST has a minimum word length, which means that every hit must contain at least 7 consecutive perfect matches. If no window (i.e., word) of seven consecutive nucleotides is free of mismatches, BLAST will completely ignore the hit. That becomes extremely important when trying to find all of the thermodynamically stable hits.
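To make the contrast between similarity scoring and nearest-neighbor thermodynamics concrete, here is a minimal Python sketch that computes the 37 °C free energy of a perfectly matched DNA duplex from the published unified Watson-Crick nearest-neighbor parameters (SantaLucia, 1998; 1 M NaCl). It is an illustration only, not ThermoBLAST’s scoring code: mismatches, dangling ends, bulges, and salt corrections are all omitted. It also converts a free-energy difference into a fold change in equilibrium constant, to show what a 1000-fold range means in kcal/mol.

```python
import math

# Unified Watson-Crick nearest-neighbor dG(37 C) values in kcal/mol (1 M NaCl),
# keyed by the top-strand dinucleotide read 5'->3'. A stack and its reverse
# complement (e.g. CA and TG) describe the same duplex step.
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
INIT_DG = 1.96          # duplex initiation
TERMINAL_AT_DG = 0.05   # added for each terminal A-T pair
SYMMETRY_DG = 0.43      # added only for self-complementary duplexes
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol K)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def nn_dg37(seq: str) -> float:
    """dG(37 C) of seq paired with its perfect complement (perfect matches only)."""
    seq = seq.upper()
    dg = INIT_DG + sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))
    dg += TERMINAL_AT_DG * sum(1 for end in (seq[0], seq[-1]) if end in "AT")
    if seq == seq.translate(COMPLEMENT)[::-1]:
        dg += SYMMETRY_DG
    return dg


def similarity_score(a: str, b: str) -> int:
    """Identity count: every matching position scores the same, whatever the base."""
    return sum(x == y for x, y in zip(a, b))


def ddg_for_fold_change(fold: float, temp_c: float = 37.0) -> float:
    """Free-energy difference (kcal/mol) giving a `fold` ratio of equilibrium constants."""
    return R_KCAL * (temp_c + 273.15) * math.log(fold)


for oligo in ("AAAAAAAA", "GGGGGGGG"):
    # Both are perfect 8-mer matches, so the similarity score is identical.
    print(oligo, similarity_score(oligo, oligo), round(nn_dg37(oligo), 2))
print(round(ddg_for_fold_change(1000), 2))
```

Both 8-mers receive the same similarity score of 8, yet with these parameters their predicted duplex free energies are -4.94 and -10.92 kcal/mol, a difference of about 6 kcal/mol. A 1000-fold difference in equilibrium constant at 37 °C corresponds to only about 4.26 kcal/mol.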
Here’s an illustration of some of the motifs that BLAST neglects. Dangling ends: here is a short oligonucleotide hybridized to a much longer genomic DNA. It’s not just the paired regions that matter; the extra dangling ends also make a sequence-dependent contribution that can be worth as much as an A-T base pair. In addition, mismatch stability depends not only on the mismatch type and the surrounding base pairs but also on where the mismatch is in the sequence. A mismatch in the middle of the sequence is not the same as a mismatch at the very end or at the penultimate position. ThermoBLAST takes all of those effects into account, as we’ll see later.
Here are some examples of hybridization events that BLAST would miss because of the word length limitation. Each has a long complementary stretch of DNA, and the red regions are mismatches. If you look at these sequences, you’ll notice that they are all relatively stable thermodynamically, some of them extremely so. But because none of them contains a perfectly matched stretch of seven or more nucleotides, they would ALL be missed by BLAST due to the minimum word length of 7.
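The word-length effect can be checked directly. This minimal Python sketch does an ungapped, position-by-position comparison (the same similarity-style comparison used in the complement workaround), reports the longest run of consecutive matches, and asks whether a seed of the given word length exists. The 24-mer, the mismatch positions, and the helper names are made up for illustration; word_size=7 follows the figure quoted in the talk, although the seed length BLAST actually uses depends on the program and task settings.

```python
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}  # arbitrary substitution per base


def introduce_mismatches(seq: str, positions: set) -> str:
    """Return seq with the base at each 0-based position replaced by a different base."""
    return "".join(SWAP[b] if i in positions else b for i, b in enumerate(seq))


def longest_perfect_run(query: str, target: str) -> int:
    """Longest stretch of consecutive identical positions in an ungapped comparison."""
    best = run = 0
    for q, t in zip(query, target):
        run = run + 1 if q == t else 0
        best = max(best, run)
    return best


def has_seed(query: str, target: str, word_size: int = 7) -> bool:
    """True if an exact word of length word_size is shared at the same offset."""
    return longest_perfect_run(query, target) >= word_size


query = "GCCGTAGCGGCTAGCCGATCGGCA"                     # made-up 24-mer
target = introduce_mismatches(query, {5, 11, 17, 23})  # a mismatch every sixth base
matches = sum(q == t for q, t in zip(query, target))
print(matches, longest_perfect_run(query, target), has_seed(query, target))  # 20 5 False
```

Twenty of the 24 positions match, but the longest perfect run is only 5 nucleotides, so a 7-nucleotide seed never occurs and the hit is never extended.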
Now let’s talk about ThermoBLAST-Cloud Edition, for which we just performed the full public release. ThermoBLAST-Cloud Edition combines the speed and database capabilities of BLAST with a lot of extra goodies. First, it uses thermodynamics-based scoring instead of similarity-based scoring. We’ve completely rewritten the seeding and extension algorithms used in BLAST; how we find the hits is really completely different. We’ve also added something called “genome playlists”, which I’ll cover on a separate slide. ThermoBLAST uses massive cloud computing to allow quick computation times even against huge genome databases. A new genome viewer application lets you visualize your results alongside the GenBank annotations, so you can see landmarks and verify that your primers bind in the proper location. Lastly, ThermoBLAST automatically detects amplicons.
Let’s talk a little bit about genome playlists and what they’re about. The best analogy we can make is to compare ThermoBLAST to iTunes. In iTunes, the smallest unit you can get is a single song. In ThermoBLAST, the equivalent is a single GenBank accession, which contains a single DNA or RNA sequence. In iTunes, a small collection of songs is called an album, and a collection of songs and albums is called a “playlist”. Similarly, in ThermoBLAST, the concept of a “playlist” applies to small or large collections of GenBank accessions. A genome in ThermoBLAST is a small collection of accessions, similar in flavor to an album in iTunes. For example, the human genome has 22 autosomes plus the X and Y chromosomes. In ThermoBLAST you can also combine multiple accessions and multiple genomes (and even other playlists) into one big playlist. We’ll see why that’s valuable later when we talk about inclusivity and exclusivity panels. One of the nice features of ThermoBLAST is that we’ve already preloaded our portal with many playlists that our users have told us are very useful to them. We already have the latest version of the human genome, the entire human microbiome with all 43,000 GenBank accessions, and the entire human RefSeq, which is about 98,000 GenBank accessions. We’ve assembled a bacterial diversity panel containing about 1100 complete bacterial genomes, which is useful for finding false positives. There is also a playlist collection of all currently known soil microbes (bacteria, molds, fungi, and other eukaryotes). There are extensive playlists for viruses (more than 65,000 entries). We manually curate these databases and update them on a semi-monthly basis. Another nice feature of these preloaded playlists is that you can mix and match them, combine them into larger playlists, and add your own customized playlists to them.
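The album/playlist analogy maps naturally onto a recursive data structure. The Python sketch below is a hypothetical illustration (class name and accession IDs are placeholders, not ThermoBLAST’s internal representation): a playlist holds single accessions or other playlists and can be flattened into one de-duplicated list of accessions to scan.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Playlist:
    name: str
    # Each item is either an accession ID (str) or a nested Playlist.
    items: List[Union[str, "Playlist"]] = field(default_factory=list)

    def accessions(self) -> List[str]:
        """Flatten nested playlists into an order-preserving list without duplicates."""
        seen, flat = set(), []
        for item in self.items:
            for acc in ([item] if isinstance(item, str) else item.accessions()):
                if acc not in seen:
                    seen.add(acc)
                    flat.append(acc)
        return flat


# Placeholder IDs, not real GenBank accessions.
human = Playlist("human genome", [f"HS_CHR_{c}" for c in [*range(1, 23), "X", "Y"]])
microbes = Playlist("microbes", ["MB_0001", "MB_0002"])
exclusivity = Playlist("exclusivity panel", [human, microbes, "HS_CHR_X"])
print(len(exclusivity.accessions()))  # 26: the duplicate HS_CHR_X is counted once
```

A playlist that (directly or indirectly) contains itself would recurse forever here; a real implementation would need to guard against such cycles.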
Even proprietary sequences that are not in GenBank can be included in a custom playlist that combines public and proprietary sequences. That gives the user a lot of flexibility, for example to build inclusivity and exclusivity panels. If you want to design a set of primers that binds to all of the sequence variants of Ebola and yet to nothing else, then you could make an inclusivity panel containing all of the known Ebola sequences and an exclusivity panel containing everything you would not want those primers to bind to. For example, you wouldn’t want Ebola primers to bind to any influenza A variants, because influenza A initially presents in the clinic with symptoms similar to those of Ebola.
This slide shows the concept of an inclusivity versus an exclusivity panel. For example, there are currently 144 known complete Ebola genomes in GenBank, about 2.7 million nucleotides of genomic sequence, which you can use to make your inclusivity panel. For a set of primers, we can determine how many of these 144 genomes form amplicons, that is, the coverage: how many of the genomes our primers will actually amplify. On the other hand, we want to avoid false positives with our primer design. We can come up with quite a long list of non-Ebola background sequences that could give a false positive in an Ebola assay. One example is the Dengue fever viruses, which, like Ebola, are RNA viruses that can cause hemorrhagic fever; you might guess that a poorly designed Ebola assay could hit some of the Dengue viruses. Many patients present in the clinic with symptoms similar to Ebola, such as influenza A. We would want to make sure that whatever primers we design don’t hit influenza A or B, other fever-causing viruses like Marburg or Lassa fever virus, or any of the other viruses, for that matter. The human genome would obviously be present in a sample taken from a human to test for Ebola, so we want to make sure the human genome doesn’t cause a false positive. We would also check for false positives from microbes in the human microbiome or from contaminating soil microbes. All of those can be used to determine whether you have any possible false positives.
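Once amplicons have been predicted for every sequence in both panels, the inclusivity/exclusivity bookkeeping reduces to simple set arithmetic. A minimal Python sketch, assuming the set of sequences with at least one predicted amplicon is already known; the function name and the sequence IDs are hypothetical placeholders:

```python
from typing import Dict, List, Set, Union


def evaluate_panels(amplified: Set[str], inclusivity: Set[str],
                    exclusivity: Set[str]) -> Dict[str, Union[float, List[str]]]:
    """Summarize which panel members a primer set is predicted to amplify."""
    return {
        # Fraction of intended targets that form at least one amplicon.
        "coverage": len(amplified & inclusivity) / len(inclusivity),
        # Intended targets with no predicted amplicon.
        "missed": sorted(inclusivity - amplified),
        # Background sequences that form an amplicon: potential false positives.
        "false_positive_risks": sorted(amplified & exclusivity),
    }


inclusivity = {"EBOV_1", "EBOV_2", "EBOV_3", "EBOV_4"}  # placeholder IDs
exclusivity = {"DENV_1", "FLUA_1", "HUMAN_GENOME"}
amplified = {"EBOV_1", "EBOV_2", "EBOV_3", "FLUA_1"}
print(evaluate_panels(amplified, inclusivity, exclusivity))
```

Here coverage is 3 of 4 intended targets, and the influenza placeholder shows up as a false-positive risk.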
Alright, with that introduction, I’m going to move into a demo in just a moment, but first I want to briefly show you some screenshots from the software. After the user chooses a playlist and enters their primers, they’ll get a results page that looks like this. It lists the individual hits for each primer and the amplicons that were detected.
I also want to mention that the output includes a genome viewer where you’ll see the forward and reverse primers and any probe, and you can click on them to see which genes they are hitting. There’s also a sequence view down here, which you can expand if you want to see more genomic context. There’s also a structure view that lets you check whether your primers are indeed hybridizing the way you intended, and there’s a feature to download amplicons, including extra flanking sequence if you want.
Slide 16: Web Demo of ThermoBLAST-CE
This brings us to the demonstration. If you would like to explore ThermoBLAST for yourself, you are welcome to. Here’s our company website (www.dnasoftware.com); just click on the “Login” button and it will ask you to register. After you have registered, log in. That brings us to this “Let’s get started” page, which is the entry to our portal. Currently we have two products on our portal. One is CopyCount, a product that allows users to take their existing qPCR data and analyze it to get absolute copy numbers. We won’t be talking about that product any further today; we’ve already posted a webinar about it if you are interested. The other product is ThermoBLAST Cloud Edition, which is what we are talking about today. I want to take this opportunity to share that we’re continuing to build out our portfolio of products. In the future, we will be adding products for primer design, multiplexing for genome enrichment, epigenetic detection, antisense therapeutics, siRNAs, CRISPR design, etc. For now we have ThermoBLAST Cloud Edition, so let’s look at that. Click on the button to “Check for mishybridization”.
This brings us to the welcome page for ThermoBLAST. You can either view results from jobs you’ve previously run or run a new ThermoBLAST job. There are three basic steps to running a ThermoBLAST job. First, pick the collection of targets that we want to scan our primers against; these are our playlists. Next, enter our primer and probe sequences. Lastly, choose our salt conditions. Let’s go ahead and do a run.
The first step is to choose our playlist. There are three tabs here. We have a list of the most popular playlists. If I scroll down a little, we can see the preloaded playlists. We have one here for the HIV genome. Here’s the most recent version of the human genome. Here is an Ebola virus inclusivity panel of 144 sequences. Here are some medically important bacteria: a list of 70 or so bacteria. This playlist, called “All Viruses”, has about 65,000 complete virus genomes that have been sequenced to date. Here is the human RefSeq, with all 98,000 accessions for all the known RNA transcripts in human cells. You can see there’s a variety of panels that we’ve already set up. Over here are your custom playlists, which you can either keep private or share with other users. For example, I made my own private playlist for the mouse RefSeq. I made one for zebrafish, and here’s one for Ebola and other viruses, and Dengue fever viruses. I made a variety of other custom playlists, including influenza A viruses, etc. You might be wondering what to do if the playlist you want isn’t among the pre-made playlists, or if you have a proprietary sequence. It’s very simple. You just click on the button to “Create a Custom playlist” and give your playlist a name, like “John’s playlist”. Then hit “Add content to this playlist”. There are a variety of ways to add sequences to your playlist. You can search NCBI to find sequences of interest. If you already have a list of accessions, you can click on “Enter Accessions”, or you can upload a FASTA file. You can also combine it with playlists that you previously saved. Let’s go ahead and do it by entering accessions.
Here is an Excel spreadsheet, just for purposes of illustration, that has whole columns full of GenBank accessions. Let’s say I want to make a playlist for the cancer-causing variants of human papillomavirus (HPV). I select those GenBank accessions and paste them in over here. You can paste in a lot; we’ve pasted in up to 100,000 accessions with no problems. Then you hit “Check Accessions”. What that’s doing right now, in real time, is going to NCBI and checking whether those are valid accessions. If you made a typo or something, it would tell you that something was wrong. There are a couple of buttons you need to click through to process those. Maybe I should have given this a better name, like “HPV Cancer”. It shows that the playlist contains human papillomavirus types 16, 18, 33, and so on: the really important cancer-causing types. You can see the GenBank accession number for each of them, and you can download the actual GenBank file for each one by clicking its web link. If there are any you don’t want, you can hit “remove”. Down here, we need to save the playlist one more time, and lastly, here. At this point the server actually has to contact NCBI and go through the process of building the playlist, which takes a little time: NCBI limits all users to downloading 3 sequences per second. Essentially, we just wanted to show you the flexibility of being able to load in a new playlist. If I come down here, I should be able to see the new playlist, “HPV Cancer”. It’s currently being prepared. Once it’s ready to go, you’ll see an option like “Add this playlist to your current run”. That will take a few minutes. In the meantime, let me give you a sense of how the program actually works; I want to take at least one submission all the way through. Let’s go to the most popular playlists for a moment.
Let’s run some primers against this Ebola background. I add this playlist, which contains 144 complete Ebola genomes, to my current ThermoBLAST run. It takes a moment to communicate with the portal, but then a summary, “Ebola background sequences”, comes up in this window. With that, we’d be ready to run. If you want to know what’s in the playlist, you can click on “View Playlist” down here. That takes us out of this window briefly and shows us the 144 sequences in the playlist. It has a whole bunch of variants: Sudan, Zaire, and lots and lots of others. You can see Côte d’Ivoire, Reston, and all the different variants of Ebola. It’s only 2.7 million nucleotides in total, so this is a relatively small playlist.
If I go back to the Playlist page, I can continue the process of submitting my ThermoBLAST job. I’ve chosen my background, the playlist I want to scan against, and I could have chosen a whole bunch of them here if I wanted to. Let’s move ahead and run ThermoBLAST against this background. Next, we can enter our sequences either manually or in batch mode. I’ll show you batch mode first and then go through the exercise of entering them manually as well. In batch mode, I browse to my CSV file called “Ebola primers.CSV”. These are some primers I designed quickly: a forward primer, a probe, and a reverse primer, in a TaqMan design. The first part of this step is just entering your sequences. If you wanted to enter another sequence, let’s just type in some random sequence here. Then down here I can choose to make it either a primer or a probe. Let me show you what happens: if you check “primer”, it automatically declares it as a DNA strand, since all primers are made of DNA. But if you choose “probe”, it lets you choose among different strand types, in this case DNA, RNA, or peptide nucleic acid (PNA). That depends on the makeup of the genome you’re scanning against. Let’s go ahead and keep this one as a primer. Here’s my made-up primer that I just typed in at random. I mentioned there were three steps. The first step was to choose our playlist; the second was to enter our primers (which you can do manually or in batch mode); and the third and last step is to choose the hybridization conditions.
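A download rate limit like the one described here is usually respected with a client-side throttle. This is a generic Python sketch, not DNA Software’s or NCBI’s code; the 3-per-second rate comes from the talk, the class and function names are hypothetical, and the fake clock in `simulate` exists only so the timing can be checked without actually waiting.

```python
import time
from typing import Callable


class Throttle:
    """Space calls at least 1/rate seconds apart (at most `rate` calls per second)."""

    def __init__(self, rate: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


def simulate(n_calls: int, rate: float = 3.0) -> float:
    """Push n_calls through a Throttle on a fake clock; return simulated seconds elapsed."""
    t = [0.0]

    def fake_sleep(dt: float) -> None:
        t[0] += dt

    throttle = Throttle(rate, clock=lambda: t[0], sleep=fake_sleep)
    for _ in range(n_calls):
        throttle.wait()  # in real use, one download request would follow each wait()
    return t[0]


print(round(simulate(7), 6))  # 2.0: six 1/3-second gaps between seven requests
```

At one sequence per request, n accessions take roughly (n - 1)/3 seconds, which is why building a large custom playlist takes a while.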
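The talk does not describe the layout of the batch-mode CSV file, so the column names below (name, role, strand, sequence), the example sequences, and the function name are all hypothetical. The Python sketch mirrors the rules shown in the demo: a primer is always treated as DNA, while a probe may be DNA, RNA, or PNA.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

PROBE_STRAND_TYPES = {"DNA", "RNA", "PNA"}  # probe options shown in the demo


@dataclass
class Oligo:
    name: str
    role: str      # "primer" or "probe"
    strand: str    # primers are always DNA
    sequence: str


def parse_oligo_csv(text: str) -> List[Oligo]:
    """Parse a hypothetical name,role,strand,sequence CSV and validate each row."""
    oligos = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip()
        role = row["role"].strip().lower()
        seq = row["sequence"].strip().upper()
        if role == "primer":
            strand = "DNA"  # primer strand type is fixed, whatever the file says
        elif role == "probe":
            strand = (row.get("strand") or "DNA").strip().upper()
            if strand not in PROBE_STRAND_TYPES:
                raise ValueError(f"{name}: unsupported probe strand type {strand!r}")
        else:
            raise ValueError(f"{name}: role must be 'primer' or 'probe'")
        allowed = set("ACGU") if strand == "RNA" else set("ACGT")
        if not seq or set(seq) - allowed:
            raise ValueError(f"{name}: unexpected characters in {seq!r}")
        oligos.append(Oligo(name, role, strand, seq))
    return oligos


EXAMPLE_CSV = (
    "name,role,strand,sequence\n"
    "fwd_1,primer,,ACGTTGCAGGTCAAGTCC\n"
    "probe_1,probe,DNA,TTGGCAACCGTAGGCTAC\n"
    "rev_1,primer,,GGACTTGACCTGCAACGT\n"
)

for oligo in parse_oligo_csv(EXAMPLE_CSV):
    print(oligo)
```

Validating up front means a typo in a batch file is reported by row name instead of silently producing a nonsense query.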
We can provide a target concentration, and we provide a list of pre-entered kit types: Typical Endpoint PCR or TaqMan Real-Time PCR. Choosing a kit automatically fills in details such as the annealing temperature and the sodium and magnesium concentrations. You can also make a customized kit. Let’s say you have some exotic buffer conditions that you’re doing your hybridization under; you can define a new kit over here. You give it a name, such as “John’s Kit”. We can give it an annealing temperature such as 58 °C, some exotic sodium concentration like 0.153 M, and a magnesium concentration of 0.0134 M. Then we fill out the remaining values for DMSO, TMAC, betaine, glycerol, etc. Note that it supports all these different denaturants and buffer additives; if you use them and know their concentrations, you can enter them. Now I have this new “John’s Kit”, and it’s permanently saved as one of the choices in our list. Lastly, you just hit run. That’s it; the job is running. It really is as simple as choosing your playlist, entering your primers, and choosing your solution conditions. If you want to check on the status of your job, go to the “Results” page. Here is the job that’s currently running, “Ebola background other organisms”, and you can see the progress bar. While that’s running, let me show you some results from other runs, to show the sort of things this does.
If I click over here, we can look at some results. Down here there is a list for each primer giving the individual hits for that primer. Here’s forward primer 1 and each of the targets it hit, with its melting temperature displayed for each target. You can scroll through the list of hits all day if you want; sometimes there are thousands of hits. Up here are the amplicons it found. It took each of the individual hits, found all of their positions in the genome, and looked for every case where a forward hybridization event and a reverse hybridization event were within some window. In this case, the window was 3,000 nucleotides: if two oppositely pointing primers are within 3,000 nucleotides of each other, we call that an amplicon and display it. In this case, we were running against the 144 different Ebola sequences. A nice feature here lets you download all of those amplicons so you can see all of the hits in an Excel spreadsheet, just by clicking this. That downloaded the amplicons file into my “Downloads” folder.
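The amplicon search described here (pair every forward-pointing hit with every reverse-pointing hit on the same target, and keep the pairs that are close enough) can be sketched in a few lines of Python. This is a hypothetical illustration, not ThermoBLAST’s implementation: the `Hit` fields and the coordinates are assumptions, the 3,000-nucleotide window is taken from the example run, and the nested loop is quadratic where a production tool would sort hits by position.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    primer: str
    target: str   # accession the primer binds to
    start: int    # 0-based, half-open coordinates on the target's top strand
    end: int
    strand: str   # "+": 3' end points to higher coordinates; "-": to lower coordinates


def find_amplicons(hits: List[Hit], max_len: int = 3000) -> List[Tuple[Hit, Hit, int]]:
    """Pair each '+' hit with each downstream '-' hit on the same target within max_len."""
    amplicons = []
    for fwd in hits:
        if fwd.strand != "+":
            continue
        for rev in hits:
            if rev.strand != "-" or rev.target != fwd.target:
                continue
            length = rev.end - fwd.start  # amplicon spans both primer sites
            # The reverse hit must lie downstream so that the two 3' ends face each other.
            if rev.end > fwd.end and length <= max_len:
                amplicons.append((fwd, rev, length))
    return amplicons


hits = [
    Hit("fwd_1", "EBOV_A", 1000, 1020, "+"),
    Hit("rev_1", "EBOV_A", 1188, 1208, "-"),
    Hit("rev_1", "EBOV_A", 5000, 5020, "-"),  # too far away: 4,020 nt
    Hit("fwd_1", "EBOV_B", 400, 420, "+"),    # no partner on EBOV_B
]
for fwd, rev, length in find_amplicons(hits):
    print(fwd.target, fwd.primer, rev.primer, length)  # EBOV_A fwd_1 rev_1 208
```

Because every primer’s hits are considered in both roles, the same loop also reports a single primer that binds in both orientations, and a target with several qualifying pairs yields several amplicons.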
Let’s move forward and look at some of the other views for a moment. Here is the genome view, where you can see the locations of the primers. The two blue arrows are primers; red lines are probes. If you hover over the primers and probes, it tells you what they are. This is the forward primer here; it gives you information about the hybridization of that forward primer, such as the Tm and ΔG. If I hover over here, it shows me information about the probe and about the other primer. The intensity of the icons reflects how strongly they’re binding. This reverse primer has a much weaker hybridization, so its intensity is weaker. If I expand here, I can actually see which direction the primers are pointing: one is a forward primer and the other is a reverse primer. We use this little red icon to indicate a probe sequence. Down here are the GenBank annotations. If you hover over them, you see that this first one is a CDS, a coding sequence. In particular, these primers are binding to the L gene, the last gene in the Ebola genome, whose product is the viral polymerase. So these primers bind in the polymerase gene, which is highly conserved among Ebola viruses. Down here is the sequence view, so you can see where the primers and the probe are binding. You can select sequence from that region if you want to copy and paste it. If you zoom out, you can see more of the genomic context. Here are my primers up here, and here’s the L gene. In that neighborhood are viral protein 24 (VP24), the glycoprotein, etc., so you get some idea of the genomic context of those hits. If I click on the structure view, it shows me the actual structures of the hybridization events. Here’s one of them; you can see the 3’ end here. It’s simple to enlarge it by pressing two mouse buttons and sliding upward.
You can adjust the view to see more or less of the hybridization, and use it to check whether the 3’ end actually hybridized. If you don’t like that it looks a little wavy, you can grab a nucleotide here and it will reoptimize the picture to make it look straighter.
One of these is the forward primer and one is the reverse primer; over here is the forward primer. Again, we can see the 5’ and 3’ ends on those pictures. You can get details about a particular structure, such as the melting temperature and ΔG, up there. Lastly, over here we have the download amplicon feature. If you click on that button, it lets you download the amplicon; in fact, it has already copied the sense strand to your clipboard, as if you had pressed Ctrl+C. If you want more than just the amplicon itself, you can enter a padding value here, such as 100, which adds an extra hundred nucleotides on either side of the amplicon. As soon as I click down here, you’ll see the length go from 208 to 408: the 208 nucleotides of the amplicon plus 100 nucleotides of padding on either side. That’s very useful if you’re trying to design primers for a particular region using some other software.
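The padding arithmetic is simple: a 208-nt amplicon plus 100 nt on each side gives 208 + 2 × 100 = 408 nt. A minimal Python sketch of the extraction, assuming 0-based half-open coordinates and assuming that padding is clipped where it would run off either end of the sequence (the talk doesn’t say how the tool handles that case); the function name is hypothetical:

```python
def extract_amplicon(genome: str, start: int, end: int, padding: int = 0) -> str:
    """Return genome[start:end] plus up to `padding` bases on each side, clipped at the ends."""
    lo = max(0, start - padding)
    hi = min(len(genome), end + padding)
    return genome[lo:hi]


genome = "ACGT" * 500  # stand-in 2,000-nt sequence
print(len(extract_amplicon(genome, 1000, 1208)))       # 208
print(len(extract_amplicon(genome, 1000, 1208, 100)))  # 408
print(len(extract_amplicon(genome, 10, 218, 100)))     # 318: only 10 nt available on the left
```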
There are some other nice features I would like to show you. I need to get to my “Downloads” folder. The “Download amplicons” button saves a CSV file, which is a very nice feature. If I open it, I can see every single one of those hits. In this case there were 144 genomes in the playlist, all different variants of Ebola. We can count how many of them are present in the amplicon list: exactly 125 of the 144 Ebola genomes were hit. That’s a pretty good set of primers, to hit that high a percentage. In fact, the remaining 19 genomes probably had sequencing errors in them. This set contains everything, including sequences from the original 1976 outbreak. Because this is the inclusivity set, you can see how well the primers you designed work: do they hit everything you wanted? On the other hand, with an exclusivity panel, you want to make sure that the primers you designed don’t hit anything that would cause a false positive. Any such hits would appear in the amplicon list up here; if your primer set hits anything in your background, you would see it there. With that, I’ve pretty much gone through all the features of ThermoBLAST. At this time I’m going to turn it over to Joseph Johnson to talk about pricing, and then we’ll take questions from the audience.
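Counting how many panel genomes were hit means counting distinct target accessions in the downloaded amplicon file, since one genome can contribute several amplicon rows. A minimal Python sketch; the column name `accession`, the inline sample rows, and the function name are hypothetical, because the layout of the downloaded CSV isn’t described here. For the run shown, 125 of 144 genomes corresponds to about 86.8% coverage.

```python
import csv
import io


def count_genomes_hit(csv_text: str, column: str = "accession") -> int:
    """Number of distinct target IDs in an amplicon CSV (duplicate rows counted once)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return len({row[column].strip() for row in reader if row.get(column)})


EXAMPLE_AMPLICONS = (
    "accession,amplicon_length\n"
    "EBOV_A,208\n"
    "EBOV_A,208\n"
    "EBOV_B,208\n"
    "EBOV_C,210\n"
)

hit = count_genomes_hit(EXAMPLE_AMPLICONS)
panel_size = 4  # placeholder panel of four genomes
print(f"{hit}/{panel_size} genomes hit ({hit / panel_size:.1%})")  # 3/4 genomes hit (75.0%)
print(f"{125 / 144:.1%}")  # 86.8%
```

To read a real download instead of the inline sample, open the file with `open(path, newline="")` and pass its contents to the same function.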
Slide 17: Joseph Johnson, VP Business Development, DNA Software, Inc.
We offer ThermoBLAST Cloud Edition in three different pricing tiers. The first is the “Basic Seat License”. This includes one user login credential; full access to all the features John just discussed in ThermoBLAST-CE; and a limited number of submissions per month: 30 submissions per month with a playlist size of 4 billion nucleotides. To put that into perspective, the human genome is 3 billion nucleotides, so that would allow you to use a playlist containing the entire bacterial diversity panel (more than 1100 bacterial genomes) plus the human genome. All these submissions are run on the Amazon Web Services (AWS) cloud. One seat license starts at $999 a month, or it can be purchased for $9,995 annually.
The next offering is the “Pro Seat license” model. Again, this is for one user login credential; full access to all these features in ThermoBLAST with 100 submissions per month, a playlist size of 20 billion nucleotides; again these submissions will be run on the AWS cloud. If you need larger playlists and multiple users, contact us to get volume discount pricing.
We heard feedback from some customers regarding privacy concerns. To mitigate these concerns, we offer Enterprise licensing. One method is to make a private Amazon cloud server that is only accessible to the customer so they can control the security. Alternatively, we can install ThermoBLAST on your local computer cluster that is behind a firewall. This allows you to have complete control of your own data. This is the highest level of security we can provide. Again, you’ll control your number of CPU instances, the number of submissions and the playlist size. This puts the control in your hands. These licenses are quoted per customer.
Audience Questions: (answered by John SantaLucia)
At this time I would like to address questions from the audience.
First Question: How does ThermoBLAST Cloud Edition compare to the version in Visual OMP in terms of speed and size?
Answer: The version of ThermoBLAST in Visual OMP requires that you load any genomes onto your desktop computer. Getting the genomic sequences onto your desktop and formatting them has been problematic for many of our customers. It does work, but it’s hard to assemble large playlists of genomes. Also, you’re limited by the single CPU on your desktop, whereas the Cloud Edition has 16 CPUs and we have multiple instances of 16 CPUs. We’re actually planning to increase that to 32 compute cores for each ThermoBLAST job. So the speed is just way, way faster, and you can tackle much, much more difficult problems. For example, we did a biodefense panel for a customer who wanted to design a 30-plex PCR for 30 different bacterial pathogens, so 60 primers, and they wanted to run it against everything and the kitchen sink as background. That included essentially every bacterial genome we could get our hands on: over 2000 bacterial genomes, the human microbiome, a series of medically important bacteria, eukaryotic pathogens, soil microbes, etc. It was a huge, huge playlist. They wanted to make sure that the primers they were designing for a biodefense panel against 30 different bacteria didn’t give any false positives for other bacteria or other pathogens. The Cloud Edition is much, much more flexible [than Visual OMP], and of course the Cloud Edition has the genome viewer and several other features such as genome playlists. We’ve also improved the core ThermoBLAST algorithms, and we’re updating those in Visual OMP as we speak.
Second Question: “The default playlist appears to be DNA; how does one enter an RNA playlist?”
Answer: When you choose the sequences by GenBank accession number, the program automatically assigns the sequence type based on whatever GenBank has set it to. You can also change that within the program: click on the blue button “Create a new playlist”, and once you’ve created a playlist, you can change the sequence type from DNA to RNA. It also lets you indicate whether it’s single-stranded or double-stranded. For playlists, we only support DNA and RNA strand types, since that’s all that’s present in biology. For primers and probes, however, we support a variety of strand types: ThermoBLAST supports probes made of DNA, RNA, morpholino, PNA, 2’-O-methyl, and phosphorothioates. In the future, ThermoBLAST will support a variety of modified nucleotides such as LNA, 5-methyl-C, deoxyuridine, inosine, 7-deaza-guanine, and about 30 others.
Third Question: “Will the search report multiple hits? For example, amplicons in the same sequence?”
Answer: Yes, it does. They would each be reported separately. That’s particularly important when you’re dealing with a large genome like the human genome. For example, you may want your set of primers to target the X chromosome, but in fact they might bind to other chromosomes in the human genome, or even to the wrong sites within the X chromosome. Note that ThermoBLAST automatically shows only primer hits that are extensible by a polymerase, and it also automatically detects all the amplicons that can be formed. Thus, you are not overwhelmed with unnecessary data, yet you still get the complete list of ALL the amplicons that the primers can form, including multiple amplicons from the same target.
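The idea of keeping only extensible hits and then listing every amplicon they can form can be sketched in a few lines of Python. This is a simplified illustration, not ThermoBLAST’s implementation: the PrimerHit record is hypothetical, extensibility is assumed to be precomputed (for example, from whether the 3’-terminal base is paired), and the 2000-nt maximum product size is an arbitrary assumption. Any primer may pair with any other, or with itself, which is how unintended products arise in a multiplex.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PrimerHit:
    """One primer binding site on a target (hypothetical record)."""
    primer: str
    start: int          # leftmost 0-based coordinate of the site on the top strand
    length: int
    faces_right: bool   # True if extension proceeds toward higher coordinates
    extensible: bool    # assumed precomputed, e.g. 3'-terminal base is paired


def find_amplicons(hits, max_len=2000):
    """Pair right-facing and left-facing extensible hits into candidate amplicons.

    Returns (forward_primer, reverse_primer, start, size) tuples sorted by start.
    """
    usable = [h for h in hits if h.extensible]       # drop non-extensible hits
    right = [h for h in usable if h.faces_right]
    left = [h for h in usable if not h.faces_right]
    amplicons = []
    for r, l in product(right, left):
        if l.start <= r.start:
            continue                                  # primers must point toward each other
        size = l.start + l.length - r.start           # product spans both binding sites
        if size <= max_len:
            amplicons.append((r.primer, l.primer, r.start, size))
    return sorted(amplicons, key=lambda a: a[2])
```

Because every compatible pair is checked, two separate products from the same primer pair on one target (say, two copies of a repeated region on the X chromosome) are both reported rather than collapsed into a single best hit.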
Fourth Question: “Can I run ThermoBLAST-CE against the NR database?”
Answer: No. We do not recommend running against NR in general. You have to remember what the goal of ThermoBLAST is versus the goal of a BLAST search. When you are doing a BLAST search, you have a new sequence of unknown function and you just want to BLAST it against everything out there to see what the heck it is. With ThermoBLAST, it’s very different. You have a set of primers that you want to amplify a specific genome, or a probe that you want to bind to it, and you’re worried about the things that might give a false positive in your assay. That’s not everything in GenBank, so NR would be very inappropriate: you’d get lots of hits you don’t care about. As I mentioned during the seminar, if you were trying to design a primer for a human gene but used NR for the search, the results would include hits against organisms you don’t care about, such as horse, chicken, and cow. In contrast, the focused genomes in the ThermoBLAST playlists provide hits that actually could cause a false positive in your assay. This prevents the information overload you get with BLAST, where you see a lot of hits you don’t care about.
Fifth Question: “Can I import my own sequences; and if so what format and type?”
Answer: Yes, you can import your own sequences. We currently support FASTA format, so you can just make a text file. Let’s say you have a set of sequences that have not been deposited into GenBank. Maybe they are proprietary or top-secret sequences because you are in the military or work at a biotech or pharma company. All you have to do is make a FASTA-formatted file: the usual greater-than symbol (>) followed by the annotation, and then your sequence on the following lines. You repeat that for as many sequences as you want. If there’s an NCBI accession anywhere in the annotation line, our software recognizes it and asks the user “Would you like me to go to GenBank and get the latest version of the sequence, or do you want to use the version that’s actually in this text file?” If you don’t give it an NCBI accession, it simply assumes that the sequence given is the one you want. Uploading a FASTA file is easy: you just browse your desktop and find the file. For example, here is a FASTA file with 19 Ebola sequences, and it uploaded them to the cloud just that fast. In this case, I had accessions in that file, so it asks “Do you want me to actually go to GenBank and get fresh sequences, or do you want to use the ones that are actually in there?” If you hit “yes”, it reloads them from GenBank; if you hit “no”, it just uses the sequences as they are in the file.
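The FASTA layout described above, and the check for an accession in each annotation line, can be sketched in Python as follows. This is an illustration under stated assumptions, not DNA Software’s parser: the regular expression is a simplified pattern for common GenBank/RefSeq accessions (such as KJ660346.2 or NC_002549.1), not the full NCBI specification, and the sketch only detects accessions; it does not contact GenBank.

```python
import re

# Simplified pattern for common GenBank/RefSeq accessions such as
# "KJ660346.2" or "NC_002549.1"; an assumption, not the full NCBI specification.
ACCESSION_RE = re.compile(r"\b(?:[A-Z]{2}_\d{6,9}|[A-Z]{1,2}\d{5,8})(?:\.\d+)?\b")


def parse_fasta(text):
    """Parse FASTA text into records with annotation, sequence and accession."""
    records = []
    header, chunks = None, []

    def flush():
        # Close out the current record, if any, and look for an accession.
        if header is not None:
            match = ACCESSION_RE.search(header)
            records.append({
                "annotation": header,
                "sequence": "".join(chunks).upper(),
                "accession": match.group(0) if match else None,
            })

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                          # ignore blank lines
        if line.startswith(">"):
            flush()
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)               # sequence may span several lines
    flush()
    return records
```

Records whose accession field is not None are the ones for which a tool could offer to fetch a fresh copy from GenBank; records without one would be used exactly as written in the file.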
Sixth Question: “Do you have any publications for ThermoBLAST-CE in peer-reviewed journals?”
Answer: We haven’t published on ThermoBLAST Cloud Edition yet; its commercial release came out just a month ago. We have had many previous publications on ThermoBLAST in Visual OMP and the developer edition. One notable publication is our design of primers and probes for the 2009 pandemic H1N1 influenza, which was published in the Journal of Clinical Chemistry. Most recently, just this last year, Norm Watkins from our company collaborated with a group at the CDC to find a set of primers for different tick species, which are carriers of Lyme disease. That work was published in the Journal of Medical Entomology. If anyone has specific questions about publications, we have some on our website, and I can also send you PDF files. We also published a paper in 2007 about the 7 Myths of PCR Design. It has a lot of information about BLAST versus ThermoBLAST, and about the guts of how we do what we do.
Seventh question: “How would you batch upload in Excel for high throughput screening; so batch upload primers into ThermoBLAST-CE for high throughput screening?”
Answer: I apologize for going over that part very quickly, so let’s review how to batch upload primers. Basically, you can prepare an Excel file that contains all the primers and save it as a comma-delimited .CSV file. Alternatively, you can manually enter your sequence information within ThermoBLAST, and there is a feature that saves the primer set as a CSV file for later use. If you’re interested in the format of the .CSV file, we have example files that you can download that show which columns hold the primer sequence, the primer name, and that sort of thing.
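If your primers live in a script or LIMS rather than in Excel, the same comma-delimited file can be generated programmatically. The Python sketch below uses only the standard csv module; the column headers “Name” and “Sequence” are hypothetical placeholders, so check the downloadable example files for the layout ThermoBLAST-CE actually expects. Sequences are upper-cased, stripped of whitespace, and checked against the IUPAC nucleotide codes.

```python
import csv
import io

# IUPAC nucleotide codes, including degenerate bases (an assumption about
# what a primer file may contain; the real importer may be stricter).
IUPAC = set("ACGTURYSWKMBDHVN")


def primers_to_csv(primers, header=("Name", "Sequence")):
    """Write (name, sequence) pairs as comma-delimited text.

    The header names are placeholders, not ThermoBLAST-CE's documented columns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for name, seq in primers:
        clean = "".join(seq.split()).upper()       # drop spaces and line breaks
        bad = set(clean) - IUPAC
        if bad:
            raise ValueError(f"{name}: unexpected characters {sorted(bad)}")
        writer.writerow([name, clean])
    return buf.getvalue()
```

To save the result, write the returned string to a file opened with newline="" (for example, open("primers.csv", "w", newline="")) so that line endings are not translated a second time on Windows.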
Alright, everyone, thank you for joining this webinar. If you have additional questions, please email them to us; I would be happy to answer them, and Joe would be happy to answer them as well. Once again, we thank you for your time and appreciate the opportunity to show you ThermoBLAST.