Results of analysis validation exercises

How we validate results as we continually improve our approach to carbon analysis.

by Sam Wolk

November 25, 2024

By comparing our results to energy audit data and post-retrofit performance reports, and by using a comprehensive test suite for data validation, we're able to directly assess the accuracy of our models.

Introduction

Carbon Signal rapidly transforms monthly energy data into detailed retrofit recommendations to support decarbonization goals. Within a few minutes, real estate owners and asset managers can access the same type of information they would find in a physical energy audit report.

The reliability and accuracy of the results from our platform are key in establishing trust with current and future customers. This article describes how we’ve validated the results produced by our platform.

Background on how Carbon Signal works

Understanding a building's energy consumption is essential for identifying opportunities for improvement. Traditional methods often rely on either physical energy audits, which can be time-consuming and costly, or rudimentary statistical estimates (e.g. comparison to industry benchmarks).

In contrast, our solution takes a much more detailed and robust approach by generating multiple iterations of high-fidelity physics-based energy models that align with a given building’s monthly energy consumption data. This "ensemble" of energy models reflects the building's observed performance in its current state.
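As a simplified illustration of this idea (not our production pipeline, which uses full physics-based simulation engines), the ensemble-building step can be sketched as an accept/reject search over candidate model parameters; the toy simulator, parameter names, and numbers below are invented for illustration:

```python
import random

# Observed monthly electricity use (kWh), invented for this sketch.
OBSERVED_KWH = [121, 108, 96, 79, 71, 66, 64, 66, 76, 91, 104, 116]

def simulate_monthly_kwh(params):
    """Toy stand-in for a physics-based engine: a constant baseload
    plus a heating term scaled by the candidate's heating_scale."""
    heating_signal = [55, 45, 30, 15, 5, 0, 0, 0, 10, 25, 40, 50]
    return [params["baseload"] + params["heating_scale"] * h
            for h in heating_signal]

def calibrate_ensemble(observed, n_candidates=10_000, tolerance=0.10):
    """Accept candidates whose every simulated month falls within
    `tolerance` (fractional error) of the observed value."""
    ensemble = []
    for _ in range(n_candidates):
        params = {
            "baseload": random.uniform(40, 100),
            "heating_scale": random.uniform(0.5, 2.0),
        }
        simulated = simulate_monthly_kwh(params)
        if all(abs(s - o) <= tolerance * o
               for s, o in zip(simulated, observed)):
            ensemble.append(params)
    return ensemble

random.seed(0)
ensemble = calibrate_ensemble(OBSERVED_KWH)
print(f"Accepted {len(ensemble)} of 10,000 candidate models")
```

The accepted candidates form the ensemble: parameter combinations the utility data cannot distinguish all survive, so downstream predictions naturally carry uncertainty ranges rather than false precision.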

This process allows Carbon Signal to disaggregate energy use into specific end-use categories, such as heating, cooling, lighting, and appliances. This granular view goes beyond standard utility bills, which typically only provide energy consumption by fuel type. By dissecting energy use in this manner, we empower building owners to pinpoint exact areas where energy efficiency and decarbonization measures will have the most significant impact.

In addition, we use the energy models to test a wide range of potential retrofit scenarios across entire building portfolios. This enables us to identify the areas with the largest savings potential and determine the most effective strategies for each building. By simulating different retrofit interventions with physics-based energy modeling engines, we can comprehensively assess how these strategies perform, helping building owners prioritize the best opportunities for reducing energy consumption and carbon emissions.
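To illustrate how an ensemble turns a retrofit scenario into a range of outcomes rather than a single number (a sketch with invented figures, not our actual models or measure library):

```python
# Each ensemble member: one plausible annual end-use breakdown (kWh)
# consistent with the same utility data. Values are illustrative.
ensemble = [
    {"heating": 42_000, "cooling": 18_000, "lighting": 25_000, "other": 30_000},
    {"heating": 47_000, "cooling": 15_000, "lighting": 24_000, "other": 29_000},
    {"heating": 39_000, "cooling": 20_000, "lighting": 26_000, "other": 31_000},
]

def simulate_measure(member, end_use, fractional_reduction):
    """Annual kWh saved if `end_use` is cut by `fractional_reduction`."""
    return member[end_use] * fractional_reduction

# Assume a lighting retrofit cuts lighting energy by 40% (illustrative).
savings = [simulate_measure(m, "lighting", 0.40) for m in ensemble]
low, high = min(savings), max(savings)
print(f"Projected lighting savings: {low:,.0f}-{high:,.0f} kWh/yr")
```

Because the measure is simulated against every member, the spread of the ensemble's savings estimates reflects how much the underlying uncertainty in the building's characteristics matters for that particular strategy.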

Validation and testing

Establishing confidence in our methodology requires rigorous testing and validation. To ensure the reliability and effectiveness of our approach, we implemented a comprehensive validation framework designed to verify our models under diverse conditions. Our approach is twofold:

  1. Real-World Data Validation: We compare our model predictions against actual data from energy audits and post-retrofit performance reports. This direct comparison helps us assess how accurately our models reflect real-world scenarios and is critical to ensure that Carbon Signal is successfully delivering real-world impact.
  2. Test Suite of Synthetic Models: Given the limited availability of comprehensive real-world data, we developed a test suite comprising hundreds of detailed building energy models across various climate zones and building types, generated to reflect the variety and complexity of buildings found in the real world. By simulating these models, we produce baseline utility data that serves as a stand-in for actual building portfolios in regions or typologies where detailed energy audit reports are not available. This suite of test buildings allows us to evaluate our pipeline extensively and perform detailed error analyses across a wide range of building characteristics and climates, which would not be possible with the limited (but still critical) data available from real-world energy audits.

Real-world data validation

By comparing our results to energy audit reports and post-retrofit performance data, we have been able to directly assess the accuracy of our models. Comparing our predictions with the actual energy savings observed after retrofits lets us measure the effectiveness of our recommendations. This real-world validation is crucial, as it demonstrates our capability to deliver tangible results for our clients.

As former building science consultants and energy auditors, we have access to a plethora of energy audit reports. For this validation study, we used data from a customer with a large portfolio of grocery stores, 40 of which had detailed energy audits from third-party auditors. We compared these energy audit reports to the outputs of the Carbon Signal platform.

The comparison showed that the total potential energy savings were very similar between the energy audits and the Carbon Signal outputs. Where discrepancies existed, they were driven primarily by differences in assumed end-use breakdowns.

For example, the Carbon Signal platform correctly detects when natural gas is still used even when the weather conditions are warm enough to not require heating; this suggests the natural gas used during these periods is for domestic hot water, and in turn, informs Carbon Signal’s savings calculation for the electrification of space heating. In contrast, many of the energy audit reports suggest a much higher natural gas reduction from heating system electrification, but it is unclear how the attribution of natural gas for heating was calculated.
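The warm-month baseload reasoning described above can be sketched in a few lines; the monthly figures and the three-month warm season below are illustrative assumptions, not data from the study:

```python
# Gas burned in the warmest months cannot be for space heating, so it
# is attributed to domestic hot water (DHW). Values are illustrative.
monthly_gas_therms = {
    "Jan": 820, "Feb": 760, "Mar": 560, "Apr": 340, "May": 190,
    "Jun": 120, "Jul": 115, "Aug": 118, "Sep": 170, "Oct": 380,
    "Nov": 610, "Dec": 790,
}
warm_months = ["Jun", "Jul", "Aug"]  # assume no heating demand here

# DHW baseload: average warm-month use, assumed roughly constant year-round.
dhw_per_month = sum(monthly_gas_therms[m] for m in warm_months) / len(warm_months)

annual_gas = sum(monthly_gas_therms.values())
annual_dhw = dhw_per_month * 12
annual_heating = annual_gas - annual_dhw  # gas actually attributable to heating

print(f"Estimated DHW share of gas use: {annual_dhw / annual_gas:.0%}")
```

An audit that attributes all natural gas to heating would overstate the savings from heating electrification by roughly the DHW share; separating the baseload first keeps the electrification estimate honest.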

Test suite of synthetic models

To broaden our testing scope, we created a comprehensive test suite of hundreds of building energy models that vary in size, design, location, and operational characteristics. By creating these energy models, we produce a rich dataset of baseline utility information, along with disaggregated energy use, building characteristics, and detailed energy-savings data for each building.

We then run these buildings’ utility data alone (withholding the known ground truth) through our calibration, disaggregation, and savings prediction pipeline, just as we would with any building in your portfolio. By comparing the Carbon Signal platform’s predictions to the known data for each building in the test suite, we can meticulously analyze our model's performance, including how well we predict energy savings across different building types, climate zones, and energy conservation measures.

What error metrics do we look at?

We use a range of metrics, drawing from best practices in statistics and error analysis, to evaluate the test suite data. These checks include how close the median prediction is to the observed value, and how often the observed value falls within the predicted range. We also evaluate the rates of false positives and false negatives: instances where Carbon Signal recommends an opportunity that yields little benefit, or misses one that would yield significant savings, respectively.

  • False Positives: Recommending a strategy that does not end up leading to significant savings.
  • False Negatives: Failing to identify a strategy that would result in significant savings.
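As a minimal sketch of these checks (the records and significance threshold below are invented for illustration, not our test-suite data):

```python
SIGNIFICANT = 10_000  # kWh/yr threshold for a "significant" saving (assumed)

# Each record: predicted (low, median, high) savings vs. observed savings.
records = [
    {"pred": (8_000, 12_000, 16_000), "observed": 13_000},
    {"pred": (1_000, 3_000, 6_000),   "observed": 2_500},
    {"pred": (9_000, 14_000, 20_000), "observed": 7_000},   # false positive
    {"pred": (2_000, 5_000, 9_000),   "observed": 12_000},  # false negative
]

# Coverage: how often the observed value falls within the predicted range.
coverage = sum(r["pred"][0] <= r["observed"] <= r["pred"][2]
               for r in records) / len(records)

# False positive: predicted significant, observed not; false negative: reverse.
false_pos = sum(r["pred"][1] >= SIGNIFICANT and r["observed"] < SIGNIFICANT
                for r in records)
false_neg = sum(r["pred"][1] < SIGNIFICANT and r["observed"] >= SIGNIFICANT
                for r in records)

print(f"Coverage: {coverage:.0%}, FP: {false_pos}, FN: {false_neg}")
```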

Since one of the key use cases of Carbon Signal is to identify which building retrofit strategies will yield significant benefits for specific buildings, it’s important to minimize the rates of both false positives and false negatives. Additionally, since missing effective strategies (false negatives) can be more costly in the long term than suggesting less effective ones (false positives), we strive to identify all significant opportunities for energy savings while keeping the number of less impactful recommendations at an acceptable level.

Percentages indicate how often, out of a random sample of tested strategies, Carbon Signal makes a correct or incorrect recommendation.

Across our entire test suite, we successfully identify 90% of cases where a specific retrofit strategy has a larger impact on savings than all other strategies. In other words, for every 10 opportunities that would actually yield significant energy savings if implemented, we detect 9 of them. Conversely, because we deliberately err on the side of flagging potential opportunities rather than risk missing high-impact strategies, we occasionally overestimate savings: for every 10 strategies we suggest will have a significant impact, 7 result in meaningful savings, while 3 may end up having little to no impact.
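In standard classification terms (our framing here, applied to the rates quoted above), these two figures correspond to recall and precision; the counts below simply restate them for a hypothetical sample:

```python
recall = 9 / 10     # share of truly significant opportunities we detect
precision = 7 / 10  # share of our recommendations that prove significant

# Out of 1,000 truly significant opportunities (hypothetical sample):
true_positives = 1_000 * recall                      # detected opportunities
missed = 1_000 - true_positives                      # false negatives
recommendations = true_positives / precision         # total strategies flagged
low_impact_recs = recommendations - true_positives   # over-called strategies

print(f"Detected {true_positives:.0f}, missed {missed:.0f}, "
      f"flagged {recommendations:.0f} total")
```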

To account for this, we express our recommendations with varying levels of certainty: some strategies we think will almost certainly be successful if implemented, while others are less likely to be successful but are still worth exploring in more detail, as illustrated in the figure below:

Matrix of recommendations, showing strategies that have varying levels of impact along with associated degrees of confidence in the results.

Conclusion

Our validation process helps affirm the reliability of our technology and reflects our belief in transparency and analytical rigor. By analyzing both real-world assessments and a broad test suite that covers diverse building types, climates, and interventions, we demonstrate that the platform’s methodology is flexible, adaptable, and thoroughly vetted.

Ready to learn more? Contact us to see how Carbon Signal can help with your decarbonization efforts.
