Datasources

In real-world life-cycle studies, most of the time is spent collecting data. You will almost certainly end up with inventory data such as the following:

id	geo	quantity	ram_size	storage_size	amortization_period	power	co2	water
server-small	FR	38	384	61.44	5	400	2855	6.46
server-medium	FR	62	384	11.52	5	600	10155	12.6
server-large	UK	3	768	76.8	5	900	15312	24.3

Assume that this inventory is stored as a CSV file, data/inventory.csv, in the data folder at the root of your project.

To use this data from within your LCA as Code models, you first need to declare a data source.

datasource inventory {
    location = "data/inventory.csv"
    schema {
        id = "an-identifier"
        geo = "FR"
        quantity = 1 p
        ram_size = 16 GB
        storage_size = 1 TB
        amortization_period = 5 year
        power = 400 W

        co2 = 0 kg_CO2_Eq
        water = 0 m3
    }
}

This expression defines a data source named inventory that can then be queried, as described in the rest of this page. The schema block declares which columns are available in the file: the identifiers must match the actual column names in the CSV file. The schema also specifies the type of the values in each column by declaring a default value. For instance, the statement id = "an-identifier" declares that the column id contains values of type string. The statement ram_size = 16 GB declares that the column ram_size contains numeric values that must be interpreted as a number of gigabytes.
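To make the role of the schema concrete, here is a minimal Python sketch of how default values can drive the typing of each column. It is only an illustration and not how LCA as Code is implemented: SCHEMA, the (number, unit) pairs, INVENTORY_CSV and load_rows are hypothetical names introduced for this example.

```python
import csv
import io

# The inventory table from this page, written out as CSV text.
INVENTORY_CSV = """\
id,geo,quantity,ram_size,storage_size,amortization_period,power,co2,water
server-small,FR,38,384,61.44,5,400,2855,6.46
server-medium,FR,62,384,11.52,5,600,10155,12.6
server-large,UK,3,768,76.8,5,900,15312,24.3
"""

# Hypothetical mirror of the schema block: column name -> default value.
# A str default marks a text column; a (number, unit) pair marks a quantity.
SCHEMA = {
    "id": "an-identifier",
    "geo": "FR",
    "quantity": (1.0, "p"),
    "ram_size": (16.0, "GB"),
    "storage_size": (1.0, "TB"),
    "amortization_period": (5.0, "year"),
    "power": (400.0, "W"),
    "co2": (0.0, "kg_CO2_Eq"),
    "water": (0.0, "m3"),
}


def load_rows(text, schema):
    """Parse CSV text, typing each column according to the schema defaults."""
    reader = csv.DictReader(io.StringIO(text))
    # Schema identifiers must match the column names found in the file.
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns missing from file: {sorted(missing)}")
    rows = []
    for raw in reader:
        row = {}
        for column, default in schema.items():
            if isinstance(default, str):
                row[column] = raw[column]                 # text column
            else:
                _, unit = default
                row[column] = (float(raw[column]), unit)  # number with its unit
        rows.append(row)
    return rows


rows = load_rows(INVENTORY_CSV, SCHEMA)
```

In this sketch, rows[0]["ram_size"] is the pair (384.0, "GB"): the number comes from the file and the unit comes from the schema default.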

Now, let's see how you can query this datasource.

Lookup

A common use case for data sources is to associate parameter values with, e.g., a specific piece of equipment. In our example above, you may want to access the quantity of the server identified as server-small. The lookup primitive lets you fetch a specific row from a data source.

process my_lookup {
    products {
        1 p material
    }
    variables {
        row = lookup inventory match id = "server-small"
        quantity = row.quantity
        co2 = row.co2
    }

    impacts {
        quantity * co2 co2
    }
}

The lookup primitive returns a row in the data source that satisfies the matching conditions. More precisely, the returned value is a record with entries indexed by the columns of the data source. To access the entry quantity in the record row, you can use the dot notation, i.e., row.quantity.

Note that lookup raises an error if no row matches. If several rows match, it picks one of them arbitrarily, so make the matching conditions specific enough to identify a single row.
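The behaviour of a lookup can be mimicked in a few lines of Python. This is a sketch of the semantics only, under the assumption that matching means column equality; the lookup function and the INVENTORY list are hypothetical and simply restate the sample table.

```python
# Sample rows from the inventory table (only the columns used here).
INVENTORY = [
    {"id": "server-small", "geo": "FR", "quantity": 38, "co2": 2855},
    {"id": "server-medium", "geo": "FR", "quantity": 62, "co2": 10155},
    {"id": "server-large", "geo": "UK", "quantity": 3, "co2": 15312},
]


def lookup(rows, **conditions):
    """Return one row whose columns equal all the given conditions."""
    matches = [
        row for row in rows
        if all(row[column] == value for column, value in conditions.items())
    ]
    if not matches:
        # Mirrors the documented behaviour: no match is an error.
        raise LookupError(f"no row matches {conditions}")
    # With several matches, this sketch simply keeps the first one it found.
    return matches[0]


row = lookup(INVENTORY, id="server-small")
impact = row["quantity"] * row["co2"]  # 38 * 2855 kg CO2-eq
```

With the sample data, impact evaluates to 108490, which is the value the impacts block of my_lookup expresses as quantity * co2.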

For each block

Quite often, the elements in an inventory are to be included as inputs to a process. For instance, the inventory in our example lists the equipment in a given data center. How could we model this data center? Of course, one can define the model manually, as follows.

process datacenter_manual {
    products {
        1 p dc
    }
    variables {
        small = lookup inventory match id = "server-small"
        medium = lookup inventory match id = "server-medium"
        large = lookup inventory match id = "server-large"
    }
    impacts {
        small.co2 co2
        medium.co2 co2
        large.co2 co2
    }
}

This approach, however, quickly becomes impractical once the data source contains tens or hundreds of rows. Instead, one can use a for_each block.

process datacenter {
    products {
        1 p dc
    }
    impacts {
        for_each row from inventory {
            row.co2 co2
        }
    }
}

You can also restrict the iteration to a subset of the rows by using a matching condition.

process datacenter_fr {
    products {
        1 p dc
    }
    impacts {
        for_each row from inventory match geo = "FR" {
            row.co2 co2
        }
    }
}
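As a sanity check on what these two processes add up to, the following Python sketch iterates over the sample rows, first without and then with a matching condition. The for_each generator and the INVENTORY list are hypothetical stand-ins for the language constructs, assuming that matching means column equality.

```python
# Sample rows from the inventory table (only the columns used here).
INVENTORY = [
    {"id": "server-small", "geo": "FR", "quantity": 38, "co2": 2855},
    {"id": "server-medium", "geo": "FR", "quantity": 62, "co2": 10155},
    {"id": "server-large", "geo": "UK", "quantity": 3, "co2": 15312},
]


def for_each(rows, **conditions):
    """Yield every row whose columns equal all the given conditions."""
    for row in rows:
        if all(row[column] == value for column, value in conditions.items()):
            yield row


# Counterpart of the datacenter process: one co2 entry per row.
datacenter = sum(row["co2"] for row in for_each(INVENTORY))

# Counterpart of datacenter_fr: only the rows with geo = "FR".
datacenter_fr = sum(row["co2"] for row in for_each(INVENTORY, geo="FR"))
```

On the sample data, datacenter is 28322 and datacenter_fr is 13010, both in kg CO2-eq. Note that, like the processes they mirror, these totals do not multiply by the quantity column.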

Passing records as parameters

Each record in a data source contains different parameter values. One may want to pass these values as parameters of another process. You can define a process that accepts a record as a parameter. The primitive default_record returns the default record of a data source, as defined by the default values in the data source schema.

process server {
    // You can define a parameter as a row from inventory.
    // The default value for this parameter is given by the schema.
    params {
        row = default_record from inventory
    }
    products {
        1 p server
    }
    impacts {
        row.co2 co2
    }
}

A record can be passed as a parameter like any other parameter.

process pool_server {
    products {
        1 p pool_server
    }
    inputs {
        for_each row from inventory {
            // The record variable can be passed to the invoked process.
            row.quantity server from server(row = row)
        }
    }
}
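The interplay between a record parameter and its default can be pictured with ordinary functions. The Python sketch below is an analogy under stated assumptions, not the evaluation model of LCA as Code: DEFAULT_RECORD restates the schema defaults, and the functions server and pool_server are hypothetical counterparts of the two processes.

```python
# Sample rows from the inventory table (only the columns used here).
INVENTORY = [
    {"id": "server-small", "geo": "FR", "quantity": 38, "co2": 2855},
    {"id": "server-medium", "geo": "FR", "quantity": 62, "co2": 10155},
    {"id": "server-large", "geo": "UK", "quantity": 3, "co2": 15312},
]

# Hypothetical counterpart of default_record: the defaults from the schema.
DEFAULT_RECORD = {"id": "an-identifier", "geo": "FR", "quantity": 1, "co2": 0}


def server(row=None):
    """co2 impact of one unit of server described by the given record."""
    if row is None:
        row = DEFAULT_RECORD  # fall back to the schema defaults
    return row["co2"]


def pool_server(rows):
    """Each row contributes row.quantity units of server(row = row)."""
    return sum(row["quantity"] * server(row=row) for row in rows)


default_impact = server()             # uses the default record
pool_impact = pool_server(INVENTORY)  # one invocation per inventory row
```

Called without an argument, server falls back to the default record, whose co2 is 0. Called once per inventory row, the pool adds up to 784036 kg CO2-eq on the sample data.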

Sum-product

The sum primitive computes the sum-product of multiple columns. In the example below, the columns quantity and co2 are multiplied point-wise, and the products are then summed over all rows. For now, only the point-wise product of columns is supported.

process sum_prod {
    products {
        1 p sum_prod
    }
    impacts {
        // Multiply quantity by co2 row by row, then sum over all rows.
        sum(inventory, quantity * co2) co2
    }
}
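For readers who want to check the arithmetic, here is a small Python sketch of a sum-product over the sample rows. The helper sum_product and the INVENTORY list are hypothetical; only the point-wise product of columns is modelled, matching the restriction stated above.

```python
from math import prod

# Sample rows from the inventory table (only the columns used here).
INVENTORY = [
    {"id": "server-small", "quantity": 38, "co2": 2855},
    {"id": "server-medium", "quantity": 62, "co2": 10155},
    {"id": "server-large", "quantity": 3, "co2": 15312},
]


def sum_product(rows, *columns):
    """Multiply the named columns row by row, then add up the results."""
    return sum(prod(row[column] for column in columns) for row in rows)


# Counterpart of sum(inventory, quantity * co2):
# 38 * 2855 + 62 * 10155 + 3 * 15312
total_co2 = sum_product(INVENTORY, "quantity", "co2")
```

On the sample data, total_co2 is 784036 kg CO2-eq.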

As an exercise, try to define a process that models the average server in the inventory.

Solution

process average {
    products {
        sum(inventory, quantity) server
    }
    impacts {
        sum(inventory, quantity * co2) co2
    }
}

process average2 {
    products {
        sum(inventory, quantity) server
    }
    impacts {
        for_each row from inventory {
            row.quantity * row.co2 co2
        }
    }
}
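Both solutions declare the same process: a product of sum(inventory, quantity) servers carrying the total impact, so the impact of one server is the ratio of the two. The Python sketch below checks this on the sample data; the variable names are hypothetical, and the normalization to one unit of product is written out explicitly here as an assumption about how the result is read.

```python
# Sample rows from the inventory table (only the columns used here).
INVENTORY = [
    {"id": "server-small", "quantity": 38, "co2": 2855},
    {"id": "server-medium", "quantity": 62, "co2": 10155},
    {"id": "server-large", "quantity": 3, "co2": 15312},
]

# Product amount shared by both solutions: sum(inventory, quantity).
total_servers = sum(row["quantity"] for row in INVENTORY)

# Impact as written in average: a single sum-product.
impact_sum = sum(row["quantity"] * row["co2"] for row in INVENTORY)

# Impact as written in average2: one entry per row, accumulated in a loop.
impact_for_each = 0
for row in INVENTORY:
    impact_for_each += row["quantity"] * row["co2"]

# Impact attributed to one unit of the server product.
average_co2 = impact_sum / total_servers
```

Both formulations give a total of 784036 kg CO2-eq for 103 servers, that is, 7612 kg CO2-eq for the average server.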
