Custom Data Model

As more organizations consider moving to the cloud, the energy consumption and CO2e emissions associated with migrating from on-premise or co-located data centers should be taken into account. Unless the organization runs sophisticated server-monitoring software, it can be difficult to measure energy consumption over a given time period. Through extensive analysis and collaboration with CCF open source contributors, we have created a methodology to measure both the energy consumption and the CO2e emissions associated with on-premise physical server usage.

As on-premise consumption tends to vary significantly across organizations, we have decided to implement a generic data model to be used as an input to CCF. The table below describes the data model for the CSV file consumed by CCF. This is currently the minimum amount of data needed to make our calculations.

CSV Data Model:

| Column | Type | Description |
| ------ | ---- | ----------- |
| machineName | string | Central processing unit description |
| memory | number | Number of gigabytes of memory usage |
| machineType | string | Machine type (i.e. server, laptop, desktop) |
| startTime | Date | Timestamp recording the start day/time of usage |
| endTime | Date | Timestamp recording the end day/time of usage |
| country | string | Country where the server was located |
| region | string | Region or state within the country where the server was located |
| cost? | number | Optional: the amount of cost associated with each row |
| cpuUtilization? | number | Optional: specific server utilization percentage (i.e. 50% = 50) |
| powerUsageEffectiveness? | number | Optional: power usage effectiveness for the data center |
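For example, a single row matching this model might look like the following (all values are illustrative, not real measurements):

```csv
machineName,memory,machineType,startTime,endTime,country,region,cost,cpuUtilization,powerUsageEffectiveness
Intel(R) Xeon(R) Platinum 8175M,16,server,2022-01-17T13:38:18Z,2022-01-24T18:22:29Z,United States,Virginia,93.12,45,1.4
```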

As the machineType column shows, our application supports on-premise data center servers as well as laptop and desktop devices. We have found that when considering a cloud migration, some organizations may see a change in energy/carbon emissions if they begin running laptop or desktop workloads via cloud-hosted workstations instead. Support for additional machine types can easily be added to the application's configuration.

The startTime and endTime data points are used to calculate a usage-hours figure for our calculations. It is worth noting that start and end times do not always yield accurate usage hours once machine sleep or dormant periods are considered, particularly for laptop and desktop machine types. One potential solution is to support an aggregate active-usage-hour count or to incorporate an average 'downtime' figure.
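As a rough sketch of the idea described above (not CCF's actual implementation), usage hours can be derived from the two timestamps like this:

```typescript
// Sketch: derive usage hours from the startTime and endTime columns.
// Assumes ISO 8601 timestamps, as in the sample data model.
function usageHours(startTime: string, endTime: string): number {
  const elapsedMs = new Date(endTime).getTime() - new Date(startTime).getTime();
  return elapsedMs / (1000 * 60 * 60); // milliseconds -> hours
}

// A machine in use from midnight to noon yields 12 usage hours:
// usageHours("2022-01-01T00:00:00Z", "2022-01-01T12:00:00Z") === 12
```

Note that this simple difference counts sleep and dormant time as usage, which is exactly the limitation discussed above.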


Based on our analysis to date, CCF is currently only able to make on-premise estimations for Compute and Memory usage, and only for operational emissions. We welcome any insight and collaboration to determine estimations for other usage types, such as Storage or Networking. In the future, we hope to incorporate embodied emissions for on-premise usage as well.

Our application parses the provided input CSV file and iterates over each row, applying the same Compute and Memory formulas laid out in the main Methodology. It maps the machineName column to the associated microarchitecture and leverages the SpecPower Database to determine average compute and memory coefficients, such as min/max watts and memory (GB), similar to how we obtain these values for cloud providers.
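The mapping step can be sketched roughly as a substring lookup from the machineName description to per-microarchitecture power coefficients. The structure below is illustrative only; the architecture names and wattage values are placeholders, not actual SpecPower medians or CCF's internal tables:

```typescript
// Hypothetical sketch of machineName -> power coefficient mapping.
interface PowerCoefficients {
  minWatts: number;
  maxWatts: number;
}

// Placeholder values for illustration; real coefficients come from the
// SpecPower Database, as described above.
const MICROARCHITECTURE_COEFFICIENTS: Record<string, PowerCoefficients> = {
  "Skylake": { minWatts: 0.64, maxWatts: 4.18 },
  "Cascade Lake": { minWatts: 0.62, maxWatts: 3.94 },
};

// Find coefficients whose microarchitecture name appears in the CPU description.
function coefficientsFor(machineName: string): PowerCoefficients | undefined {
  const match = Object.keys(MICROARCHITECTURE_COEFFICIENTS).find((arch) =>
    machineName.includes(arch)
  );
  return match ? MICROARCHITECTURE_COEFFICIENTS[match] : undefined;
}
```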

The main differences for the on-premise methodology are:

  • Instead of using Virtual CPU Hours in the formula, we use CPU hours as the input, since the data concerns entire physical machines. We also provide configuration options for various machine types, described below.
  • The average PUE defaults to 1.58, based on this Uptime Institute Report.

Note: Unless a cpuUtilization value is provided for each row in the input data, it is difficult to determine an accurate average utilization value that applies broadly across on-premise data centers. For now, we default to the projected 2020 average utilization of servers in hyperscale data centers, 50%, from the 2016 U.S. Data Center Energy Usage Report.
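Putting the defaults together, here is a hedged sketch of the Compute formula from the main Methodology applied to on-premise input. The function and constant names are illustrative, not CCF's actual API:

```typescript
// Defaults noted above: 50% average server utilization, PUE of 1.58.
const DEFAULT_CPU_UTILIZATION = 50; // percent
const DEFAULT_PUE = 1.58;

// Average watts interpolated between SpecPower-style min/max watts.
function averageWatts(
  minWatts: number,
  maxWatts: number,
  cpuUtilization: number = DEFAULT_CPU_UTILIZATION
): number {
  return minWatts + (cpuUtilization / 100) * (maxWatts - minWatts);
}

// Compute energy in kilowatt-hours, scaled by usage hours and PUE.
function computeKilowattHours(
  minWatts: number,
  maxWatts: number,
  usageHours: number,
  cpuUtilization: number = DEFAULT_CPU_UTILIZATION,
  pue: number = DEFAULT_PUE
): number {
  return (averageWatts(minWatts, maxWatts, cpuUtilization) * usageHours * pue) / 1000;
}
```

A row providing its own cpuUtilization or powerUsageEffectiveness would override the two defaults, per the optional columns in the data model.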

Configuration Options

We have implemented custom configuration options to help on-premise consumers calculate more accurate estimations specific to their usage and available data rather than always relying on CCF averages and default values.

The publicly available SpecPower Database data that we rely on for average or median min/max wattage values is only truly reliable when considering a full server. Since these values will likely differ significantly for laptop and desktop computers, we offer two options: configure a custom CPU Utilization value to use with the CCF default average watts (based on SpecPower min/max watts), or configure a custom Average Watts value per machine type to override the defaults.

These custom configurable values are set in the packages/cli/.env file.
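For example, the .env entries might look like the following. The variable names and values here are illustrative; confirm the exact keys supported by your version against the Configurations Glossary:

```shell
# Illustrative only - confirm exact variable names in the Configurations Glossary.
# Custom CPU utilization (percent) per machine type:
ON_PREMISE_CPU_UTILIZATION_SERVER=40
ON_PREMISE_CPU_UTILIZATION_LAPTOP=25
ON_PREMISE_CPU_UTILIZATION_DESKTOP=35

# Or override the defaults with custom average watts per machine type:
ON_PREMISE_AVG_WATTS_SERVER=300
ON_PREMISE_AVG_WATTS_LAPTOP=30
ON_PREMISE_AVG_WATTS_DESKTOP=75
```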


You can also view these configurations in more detail in our Configurations Glossary.

Accessing the Data

Currently, the CCF application only supports reading from and writing to a CSV file for measuring on-premise usage. To use our data model to calculate the energy consumption and CO2e emissions associated with your on-premise usage, add your CSV file directly to the packages/cli directory. The CSV format must match the columns laid out in our data model; otherwise, CCF will not be able to consume the data.

In your terminal, run the following command from the application root directory:

yarn run estimate-on-premise-data --onPremiseInput [<Input File Name>]

You can optionally add the argument --onPremiseOutput [<Output File Name>] to specify the name of the output file, which contains the same data as the input file with usageHours, kilowattHours, and co2e appended as new columns.

Cloud Carbon Footprint is an open-source project, sponsored by Thoughtworks Inc. under the Apache License, Version 2.0