© 2007-2011 Mathew E. Sowa, J. Wade Harper
Welcome to the Harper Lab's CompPASS tutorial.
In the following sections, I will describe how the scoring metrics are calculated and what each means.
CompPASS is an acronym for Comparative Proteomic Analysis Software Suite and, as the name implies, it relies on an unbiased comparative approach for identifying high-confidence candidate interacting proteins (HCIPs for short) from the hundreds of proteins typically identified in IP-MS/MS experiments. There are several scoring metrics that we typically calculate, each having its own advantageous properties for delineating between background proteins and HCIPs. In the following sections, I will discuss how we calculate each of these metrics and how each may be used individually or in combination to assist in parsing IP-MS/MS datasets.
The "Stats Table"
The first step in our analysis of IP-MS/MS data is the generation of what we call a "stats table" (see below). Essentially, this is a matrix (or a spreadsheet) in which the rows are the unique proteins identified from all IP-MS/MS experiments and the columns represent each bait used for these experiments. Each element of the matrix (or cell of the spreadsheet) is the Total Spectral Count (TSC) for that specific interacting protein from that particular bait's IP-MS/MS experiment. The TSC for a protein provides a good estimate of that protein's abundance in the IP-MS/MS experiment. Once the stats table is created from all of the runs in the project (for example, the runs for all 102 baits in the Dub Project), we can calculate scores for each interactor, for each bait. It is worth noting that in our experiments, we shoot each sample twice in a row (technical duplicates) and, after identifying peptides and proteins, combine these duplicates into a single "merged" run for that bait. For merging, the TSC for an interactor is the average TSC observed across the duplicate runs. These duplicate runs become critical for calculating the D- and the weighted D-scores.
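As a concrete illustration, the merging and stats-table steps can be sketched in a few lines of Python. The bait names, the run layout (one dict of protein-to-TSC per run), and the treatment of a protein missing from one duplicate (counted as 0 TSC before averaging) are all assumptions for illustration, not details of the actual CompPASS implementation.

```python
# Sketch: build a CompPASS-style "stats table" from per-bait duplicate runs.
# Each run maps protein name -> Total Spectral Count (TSC); the two
# technical duplicates for a bait are merged by averaging their TSC.

def merge_duplicates(run_a, run_b):
    """Average the TSC of two duplicate runs for one bait (a protein
    missing from one duplicate is treated as 0 TSC -- an assumption)."""
    proteins = set(run_a) | set(run_b)
    return {p: (run_a.get(p, 0) + run_b.get(p, 0)) / 2 for p in proteins}

def build_stats_table(duplicate_runs):
    """duplicate_runs: {bait: (run_a, run_b)} -> {bait: {protein: TSC}}."""
    return {bait: merge_duplicates(a, b) for bait, (a, b) in duplicate_runs.items()}

# Example with two hypothetical baits:
runs = {
    "baitA": ({"P1": 10, "P2": 4}, {"P1": 12, "P2": 0}),
    "baitB": ({"P1": 6}, {"P1": 8, "P3": 2}),
}
table = build_stats_table(runs)
# table["baitA"]["P1"] == 11.0
```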
CompPASS Scoring Metrics: The Z-, S-, D-, and WD-scores
As the title of this section suggests, there are a number of scores we use to evaluate our data; listed in that order, they represent increasing levels of resolution (and their creation lineage as well!). The table below provides a brief summary of each score, its advantages, and its disadvantages.
Score | Advantages | Disadvantages
Z-score | Identifies interactors whose TSC deviates significantly from the mean across baits | Maximally upweights unique interactors regardless of abundance; cannot separate "one hit wonders" from abundant unique interactors
S-score | Incorporates both frequency and abundance (TSC); with a stringent cut-off, reliably identifies HCIPs | Overweights very rare interactors; stringent cut-offs miss lower-abundance true interactors
D-score | Adds reproducibility across duplicate runs, separating one hit wonders from reproducible low-TSC interactors | Relies heavily on the frequency term, so it downweights frequently observed proteins even when they are bona fide interactors for specific baits
WD-score | Weight factor rescues frequently observed bona fide interactors whose TSC varies across baits | Empirical; thresholds must be determined from simulated data
The Z-Score
The first score is the conventional Z-score, which determines the number of standard deviations away from the mean (Eq. 1) at which a measurement lies (Eq. 2). In Eqs. 1 & 2, X is the TSC, i is the bait number, j is the interactor, n denotes which interactor is being considered, k is the total number of baits, and σ is the standard deviation about the TSC mean.
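The equation images (Eqs. 1 & 2) are not reproduced in this copy of the tutorial; based on the variable definitions above, a plausible reconstruction of the mean TSC and the Z-score is:

```latex
% Mean TSC of interactor j across all k baits (Eq. 1, reconstructed):
\bar{X}_j = \frac{1}{k} \sum_{i=1}^{k} X_{i,j}

% Z-score of interactor j for bait i (Eq. 2, reconstructed):
Z_{i,j} = \frac{X_{i,j} - \bar{X}_j}{\sigma_j}
```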
Each interactor for each bait has a Z-score calculated; therefore, the same interactor will have a different Z-score depending on the bait (assuming the TSC is different when identified for that bait). Although the Z-score can effectively identify interactors whose TSC is significantly different from the mean, if an interactor is unique (found in association with only 1 bait), then it fails to discriminate between an interactor with a single TSC (a "one hit wonder") and another that may have 20 TSC or 50 TSC, etc. In this way, the Z-score will tend to upweight unique proteins, no matter their abundance. This can be dangerous since the stochastic nature of data-dependent acquisition mass spectrometry leads to spurious identification of proteins. These would be assigned the maximal Z-score because they would be unique; however, they likely do not represent bona fide interactors.
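A minimal Z-score sketch follows, using the stats-table layout assumed earlier (bait -> protein -> TSC). Whether CompPASS uses the population or sample standard deviation, and whether absent proteins contribute 0 TSC to the mean, are assumptions here.

```python
import statistics

def z_scores(stats_table):
    """Z-score per (bait, protein): standard deviations of the protein's
    TSC from its mean across all baits.  A protein absent from a bait
    contributes 0 TSC; population stdev is an assumption."""
    baits = list(stats_table)
    k = len(baits)
    proteins = {p for row in stats_table.values() for p in row}
    scores = {}
    for p in proteins:
        tsc = [stats_table[b].get(p, 0) for b in baits]
        mean = sum(tsc) / k
        sd = statistics.pstdev(tsc)
        for b, x in zip(baits, tsc):
            if x:  # only score observed interactions
                scores[(b, p)] = (x - mean) / sd if sd else 0.0
    return scores

table = {"baitA": {"P1": 11, "P2": 2}, "baitB": {"P1": 7}}
z = z_scores(table)
# z[("baitA", "P1")] == 1.0 with these two baits
```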
The S-Score
The next score is the S-score, which incorporates the frequency of the observed interactor and its abundance (TSC). The S-score, D-score, and WD-score were all developed empirically based on their ability to effectively discriminate known interactors from known background proteins, with the S-score being the first metric we developed. Both the D- and WD-scores are based on the S-score, sharing the same fundamental formulation, but have additional terms that add increasing resolving power. The S-score (Eq. 3) is essentially a uniqueness and abundance measurement.
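The Eq. 3 image is likewise missing from this copy; the published CompPASS S-score takes the following square-root form, reconstructed here under that assumption:

```latex
% S-score (Eq. 3, reconstructed): frequency weight times abundance,
% under a square root.  f_{i,j} is 1 if interactor j was found with
% bait i and 0 otherwise; k is the total number of baits.
S_{i,j} = \sqrt{ \left( \frac{k}{\sum_{i=1}^{k} f_{i,j}} \right) X_{i,j} }
```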
In Eq. 3, the variables are the same as for Eqs. 1 & 2. f is a term that is 0 or 1 depending on whether or not the interacting protein was found with a given bait. Placed in the summation across all baits, it is a counting term, and therefore k/Σf is the inverse ratio (or frequency) of this interactor across all baits. The smaller Σf, the larger this value becomes, which upweights interactors that are rare. The term Xi,j is the TSC for interactor j from bait i, and multiplying by this value scales the S-score with increasing interactor TSC - this gives a higher score to interactors with high TSC, which are more abundant and less likely to have been stochastically sampled. Although it improves resolution over the Z-score alone (the S-score can discriminate between unique one hit wonders and unique interactors with high TSC), the S-score gives its highest values to interactors that are very rare and can lead to one hit wonders being scored among the top proteins. However, with a stringent cut-off value, the S-score reliably identifies HCIPs and bona fide interacting proteins, but at that stringency it is prone to miss lower-abundance true interacting proteins. To address this limitation, the S-score was modified to take into account the reproducibility of the interactor for a given bait - a quantity we can determine as a result of performing duplicate mass spec runs. After adding this modification, the S-score becomes the D-score (Eq. 4).
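The frequency and abundance terms described above can be sketched in Python. The stats-table layout is the same assumption as in the earlier sketches, and the square-root form is assumed from the published score.

```python
import math

def s_scores(stats_table):
    """S-score per (bait, protein): sqrt(frequency weight * TSC).
    The square-root form is assumed from the published CompPASS score."""
    baits = list(stats_table)
    k = len(baits)
    proteins = {p for row in stats_table.values() for p in row}
    scores = {}
    for p in proteins:
        # f counts how many baits this interactor was found with
        f = sum(1 for b in baits if stats_table[b].get(p, 0) > 0)
        for b in baits:
            x = stats_table[b].get(p, 0)
            if x:
                scores[(b, p)] = math.sqrt((k / f) * x)
    return scores

table = {"baitA": {"P1": 11, "P2": 2}, "baitB": {"P1": 7}}
s = s_scores(table)
# P2 is unique (f = 1), so its frequency weight is k/1 = 2:
# s[("baitA", "P2")] == sqrt(2 * 2) == 2.0
```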
The D-Score
The D-score is fundamentally the same as the S-score, except that we now add a power term to take into account the reproducibility of the interaction. The term p can be either 1 (if the interactor was found in only 1 of 2 duplicate runs) or 2 (if the interactor was found in both duplicate runs).
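The Eq. 4 image is also absent; a reconstruction consistent with the S-score form above (and with the statement below that p = 1 recovers the S-score) is:

```latex
% D-score (Eq. 4, reconstructed): the S-score with the frequency term
% raised to the reproducibility power p (p = 1 or 2 duplicate runs).
D_{i,j} = \sqrt{ \left( \frac{k}{\sum_{i=1}^{k} f_{i,j}} \right)^{p} X_{i,j} }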
If p is 1 (the interactor was found in 1 of 2 duplicates), then the D-score is the same as the S-score. Adding the reproducibility term now allows for better discrimination between a true one hit wonder (a protein found with 1 peptide in a single run, not in the duplicate), which is likely a false positive, and a true interactor with low (even 1) TSC that is found in both duplicate runs. Although powerful in its ability to delineate HCIPs from background proteins, the D-score still relies heavily on the frequency term, k/Σf, and will thus assign lower scores to more frequently observed proteins. In the vast majority of cases this is a good thing, since these proteins are more than likely background. However, in the event that a canonical background protein is a bona fide interactor for a specific bait (after all, these "background" proteins do have a biological function!), its D-score would likely be too low to pass the D-score threshold (discussed below) and it would not be considered an HCIP. Another example pertains to CompPASS analysis of baits from within the same biological network or pathway. In the case of the Dub Project, most of these proteins do not share interactors, as the analysis was performed across a protein family - in which case the D-score works very well. However, in the Autophagy Project, many baits do share interactors, as these proteins are part of the same biological pathway, and determining these shared interactors (and hence the connections among these proteins) is critical for a reliable assessment of the pathway. In these cases, the D-score works fairly well for most interactors; however, it can downweight very commonly found bona fide interactors (especially when these interactors have low TSC). To address this limitation, we devised a weighting factor to be added into the D-score and thus created the WD-score (or Weighted D-score; Eq. 5).
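The effect of the reproducibility power can be seen in a one-line sketch. The function name and argument layout are illustrative only.

```python
import math

def d_score(x, k, f, p):
    """D-score for one (bait, interactor) pair: x is the TSC, k the total
    number of baits, f the number of baits the interactor was found with,
    and p the reproducibility power (1 or 2 duplicate runs).  A sketch of
    the formula, not the CompPASS implementation."""
    return math.sqrt(((k / f) ** p) * x)

# A unique interactor (f = 1) out of 10 baits with TSC = 4:
seen_once = d_score(4, k=10, f=1, p=1)   # equals the S-score, sqrt(40)
seen_twice = d_score(4, k=10, f=1, p=2)  # reproducibility boosts it to 20.0
```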
The WD-Score
Upon examination of frequently observed proteins (considered background), both those known not to be bona fide interactors for any bait and those known to be true interactors for a subset of baits, we found that the distributions of the TSC for these groups varied in a correlated manner. In the first case, where these "background" proteins were never true interactors, the standard deviation of the TSC (σTSC) was smaller than in the latter case ("background" proteins that were known to be true interactors for specific baits). This occurs because the abundance of a real background protein is mainly determined by the amount of resin used in the IP, whereas when a background protein becomes a true interactor, its TSC rises far above this consistent level (and thus causes σTSC to increase). In fact, when σTSC was systematically examined across all proteins found in >50% of the IP-MS/MS datasets, the proteins known to be real interactors for specific baits were found to have a σTSC that was >100% of the TSC mean for that protein across all IPs. Therefore, a weight factor term was introduced as wj, which is essentially σTSC divided by the TSC mean for interactor j (shown below).
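The Eq. 5 image is missing from this copy; from the description above and the S- and D-score forms, a plausible reconstruction of the weight factor and the WD-score is:

```latex
% Weight factor (reconstructed): spread of interactor j's TSC across
% baits relative to its mean; applied only when it exceeds 1,
% otherwise w_j = 1 and the WD-score reduces to the D-score.
w_j = \frac{\sigma_{TSC,j}}{\bar{X}_j}

% WD-score (Eq. 5, reconstructed):
WD_{i,j} = \sqrt{ \left( \frac{k}{\sum_{i=1}^{k} f_{i,j}} \, w_j \right)^{p} X_{i,j} }
```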
The weight factor, wj, is added as a multiplicative factor to the frequency term in order to offset this term's low value for interactors that are found frequently across baits, but it will only be >1 if the conditions in Eq. 5 are met. If these conditions are not met, then wj is set to 1 and the WD-score is the same as the D-score. In this way, only if a frequent interactor displays the observed characteristics of a true interactor will its score increase due to the weight factor. When the AIN data was analyzed with and without wj, the weight factor was found to apply to only ~10% of the interactors, demonstrating that the vast majority of frequently observed proteins are likely background and not HCIPs for any specific bait.
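Putting the pieces together, a WD-score sketch follows. The stats-table layout, population stdev, and a default of p = 1 when reproducibility is unknown are assumptions for illustration.

```python
import math
import statistics

def wd_scores(stats_table, reproducibility):
    """WD-score sketch: the D-score with a weight w = stdev/mean of each
    interactor's TSC across all baits, applied only when w > 1.  The
    reproducibility dict maps (bait, protein) -> 1 or 2 duplicate runs."""
    baits = list(stats_table)
    k = len(baits)
    proteins = {p for row in stats_table.values() for p in row}
    scores = {}
    for prot in proteins:
        tsc = [stats_table[b].get(prot, 0) for b in baits]
        f = sum(1 for x in tsc if x > 0)
        mean = sum(tsc) / k
        w = statistics.pstdev(tsc) / mean if mean else 1.0
        w = max(w, 1.0)  # the weight only ever boosts, never penalizes
        for b, x in zip(baits, tsc):
            if x:
                p = reproducibility.get((b, prot), 1)  # assume p = 1 if unknown
                scores[(b, prot)] = math.sqrt(((k / f) * w) ** p * x)
    return scores

# P1 is found with every bait but is much more abundant with bait C -
# the signature of a "background" protein that is a true interactor there:
table = {"A": {"P1": 2}, "B": {"P1": 2}, "C": {"P1": 20}}
wd = wd_scores(table, {("C", "P1"): 2})
```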
Determining the score thresholds
Since all of the CompPASS scoring metrics we devised (with the exception of the Z-score) are empirical in nature, no statistical significance can be assigned to the raw scores themselves. Therefore, the question "What is the cutoff value for the S-score, the D-score, and the WD-score?" arises and must be addressed in order to assign meaningful thresholds for determining HCIPs. To answer this question, the S-, D-, and WD-score thresholds (ST, DT, & WDT) were determined using randomly generated simulated run data. To create simulated random runs, the data from actual experiments was first used to build the proteome observed in our experiments. To do this, each protein was represented by its TSC from each run - in other words, if a protein was found with a total of 450 TSC summed across all real runs, then it was represented 450 times. Simulated runs were then created by randomly drawing from this "experimental proteome" until 300 proteins were selected and the total TSC for the simulated run was ~1500 (these were the average values found across our actual experiments). Next, scores were calculated for the random runs to determine the distributions of the scores for random data. Finally, for each score, we find the value above which 5% of the random data lies and take this value to be that score's threshold. Although 5% of the random data is above this threshold value, an examination of the TSC distribution for these random data shows that >99% have TSC < 4. Therefore, although there are false positive HCIPs in real datasets, we can now use this distribution to assign a p-value to proteins passing the score thresholds. In this way, we can argue that a protein passing a score threshold and found to have a high enough TSC (reflected in the p-value) is very likely to be a real interactor.
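The pool-and-draw construction of a simulated run can be sketched as follows. The function names are illustrative, and the sketch omits the tutorial's second stopping condition (total TSC of ~1500) for simplicity.

```python
import random

def build_pool(summed_tsc):
    """The "experimental proteome" pool: each protein appears once per
    spectral count, so a protein with 450 summed TSC appears 450 times."""
    pool = []
    for protein, tsc in summed_tsc.items():
        pool.extend([protein] * tsc)
    return pool

def simulated_run(pool, n_proteins, rng):
    """Draw from the pool until n_proteins distinct proteins are chosen;
    the number of draws per protein is its simulated TSC.  The tutorial
    draws until 300 proteins and ~1500 total TSC; this sketch uses only
    the protein-count condition."""
    counts = {}
    while len(counts) < n_proteins:
        p = rng.choice(pool)
        counts[p] = counts.get(p, 0) + 1
    return counts

# Tiny toy proteome (real runs would use hundreds of proteins):
pool = build_pool({"P1": 5, "P2": 3, "P3": 1, "P4": 1})
run = simulated_run(pool, n_proteins=3, rng=random.Random(0))
```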
A suitable approximation to the method described above is simply to take the minimal value among the top 5% of the scores for each metric and set that value as the threshold for that score.
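This approximation is a one-liner in practice; the handling of very small score lists (always keeping at least one score in the "top" set) is an assumption here.

```python
def score_threshold(scores, top_fraction=0.05):
    """Approximate threshold: the minimal value among the top 5% of the
    scores for a metric, per the simpler method described above."""
    ranked = sorted(scores, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))  # keep at least one
    return ranked[n_top - 1]

# With 40 scores, the top 5% is the two highest; the threshold is the
# smaller of those two:
threshold = score_threshold(list(range(40)))  # -> 38
```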
Tutorial written by Mathew Sowa, to whom questions/comments should be directed