Sandrine Pigat of Creme Global explains how to navigate big data by applying traditional predictive models to help make informed decisions about food product development, consumer health and food safety.
The evolution of novel data processing technologies is fast-paced and the volume of data being generated is growing by the second. The food industry stands to benefit from this and has been testing and adapting various ways of using data science techniques to enhance the production of safe and healthy foods.
Data science requires a multidisciplinary approach and a broad range of skill sets, from mathematics and statistics, computer science and machine learning to artificial intelligence (AI). Data science also needs to have strong ties to the actual domain knowledge[1] in order to ask the right questions and select the right data. Predictive analytics and scientific modelling are interesting areas of data science and the activity in this space is growing. Applications can range from traditional methods using advanced statistics for assessing various future scenarios to machine learning techniques, including artificial intelligence.
The use of data science within food and health has become more prevalent and has been steadily complementing more traditional approaches. Predictive modelling for making informed decisions in new product development, business strategy and consumer health and safety has now demonstrated its value to stakeholders from industry, governments and research organisations on many occasions, some of which are described below.
Collecting, centralising and formatting data from spreadsheets, hard copies, documents, the Internet of Things (IoT) or other sources is the first step in digitising data, followed by structuring, validating, analysing and visualising it. Only then is it possible to develop more advanced models that serve to inform R&D (from product design to launch), safety (including exposure and microbial food safety), consumer health and strategy.
Probabilistic Exposure Modelling
Probabilistic Exposure Modelling has been used and applied for a number of decades (Figure 1). As part of an overall risk assessment of a food contaminant, pesticide residue, additive or a novel ingredient on a population of consumers, an exposure assessment has to be carried out. As an approximation, the exposure can be quantified by the amount of food consumed multiplied by the concentration of the contaminant in this food. However, when looking at exposure within and across consumer populations, this simple calculation can become quite complex. A chemical can be present at varying levels in a large variety of foods, consumed in varying quantities, in different combinations, by different consumers in different countries/regions, and at different life stages.
Figure 1: Probabilistic Exposure Model
Therefore, exposure in a population is intrinsically variable and has a number of sources of uncertainty; this variability and uncertainty should be captured using probabilistic methods in each risk assessment scenario, as required. The results can then be expressed with confidence bounds and the scenarios can be evaluated more rigorously.
As an initial screening exercise, simpler methods are often applied to estimate exposure, using high-level consumption statistics and average or maximum chemical concentration levels. Comparing those crude exposure estimates to health-based thresholds, such as the acceptable daily intake (ADI) or tolerable daily intake (TDI), can become an issue because the exposure is likely to be overestimated and may exceed those limits, especially when exposure is aggregated from multiple sources.
This is where probabilistic dietary exposure modelling comes into its own to refine the exposure results for a population in a far more accurate and realistic manner by applying various mathematical techniques, such as using distributions of intakes, accounting for a range of concentrations rather than using a mean or a maximum point value, occurrence of a chemical within a food, and so on. Probabilistic data can be represented by parametric or empirical distributions, integrated in the analysis using Monte Carlo simulations.
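The contrast between a crude screening estimate and a probabilistic estimate can be illustrated with a minimal Monte Carlo sketch. Everything in it is an assumption made for illustration: the two foods, their intake and concentration values, the occurrence probabilities and the body weights are hypothetical, and the function names are not taken from any particular exposure tool. Empirical distributions are resampled directly; a parametric distribution could be swapped in at the marked line.

```python
import random
import statistics

# Hypothetical inputs for illustration only; a real assessment would use
# food consumption survey data and measured concentration data.
FOODS = [
    {"name": "food A", "p_present": 0.6,
     "intakes_g": [0, 0, 50, 120, 200], "conc_mg_per_kg": [0.1, 0.2, 0.5]},
    {"name": "food B", "p_present": 0.3,
     "intakes_g": [0, 30, 30, 80], "conc_mg_per_kg": [0.05, 0.4, 1.0]},
]
BODY_WEIGHTS_KG = [55, 62, 70, 78, 90]


def simulate_exposure(n_iter, foods, body_weights_kg, seed=0):
    """Return n_iter simulated daily exposures in mg per kg body weight."""
    rng = random.Random(seed)
    exposures = []
    for _ in range(n_iter):
        body_weight = rng.choice(body_weights_kg)
        total_mg = 0.0
        for food in foods:
            # Occurrence: the chemical is present in only a fraction of servings.
            if rng.random() >= food["p_present"]:
                continue
            intake_g = rng.choice(food["intakes_g"])
            # Empirical concentration distribution; a parametric draw such as
            # rng.lognormvariate(mu, sigma) could be used here instead.
            conc = rng.choice(food["conc_mg_per_kg"])
            total_mg += intake_g / 1000.0 * conc  # g -> kg, then mg of chemical
        exposures.append(total_mg / body_weight)
    return exposures


def screening_exposure(foods, body_weights_kg):
    """Crude screening: highest intake times highest concentration, every food."""
    total_mg = sum(max(f["intakes_g"]) / 1000.0 * max(f["conc_mg_per_kg"])
                   for f in foods)
    return total_mg / min(body_weights_kg)


def percentile(values, pct):
    """Percentile (1 to 99) using the 99 cut points from statistics.quantiles."""
    return statistics.quantiles(values, n=100)[pct - 1]
```

Comparing `percentile(simulate_exposure(5000, FOODS, BODY_WEIGHTS_KG), 95)` with `screening_exposure(FOODS, BODY_WEIGHTS_KG)` shows the refinement: by construction no simulated exposure can exceed the screening value, and high percentiles of the simulated distribution usually sit well below it.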
New product development and its impact on nutrition and health
Another example of using data science, and specifically predictive models, in the food industry is to assess the impact of a dietary change on nutritional intakes and subsequently health outcomes. This change can consist of a new product formulation, a new food or ingredient, a reformulated product or a portion size change. The impact of this dietary change on consumers can be assessed by using nutritional intake modelling. As with exposure modelling, data on food consumption is required.
Food consumption surveys assess population dietary behaviour and health at national and local level in various geographies using specific survey methodologies. If available, individual consumption data can be used to model the impact of dietary changes on intakes. These food consumption databases vary in quality, size and detail but usually report information at eating event level for each representative participant within the survey. Food descriptions, consumed amounts and a diary of consumption events are recorded as well as nutrient composition data for each food. The number of consumers representing a given population can range from a couple of hundred to tens of thousands; the number of consumers surveyed is chosen to be large enough to be statistically representative of the population. The individual foods recorded as consumed can range from 500 to up to 10,000 foods, usually categorised into specific food groups. Having access to such granular databases enables very targeted analysis.
One such study[2] investigated the impact of a new milk powder in China, a country where the burden of cardiovascular disease is on the rise. Potassium has been shown to reduce systolic blood pressure in pre-hypertensive consumers. Using a scientific model, a milk powder fortified with potassium was introduced into the Chinese diet.
The underlying data used in this model consisted of individual eating event level data on food consumption and composition of foods, representing the Chinese population (China Health and Nutrition Survey - CHNS) as well as the composition data of the new milk product. The composition data used for the foods consumed in the CHNS was obtained from the Institute of Nutrition and Food Safety, China CDC (2004) and Institute of Nutrition and Food Safety, China CDC (2002). The survey includes information for 21 food groups and 1,599 foods. Anthropometric measurements, blood pressure and biomarker data were also collected in this survey. The target age group was 45 years and older, which resulted in 6,134 subjects, whose dietary intakes were monitored.
Within this age group the new milk powder was either substituted for normal milk or added on top of the normal diet via different scenarios and depending on the consumers’ potassium intakes. Potassium intake distributions were assessed at baseline and after substitution. Individual increases in intake were calculated as well as the overall shift in population intakes.
Based on findings from the literature, the increase of potassium intakes and the resulting decrease in systolic blood pressure were assessed. Individual consumers’ blood pressure within the survey data was then modified to account for the impact on blood pressure resulting from the milk substitution.
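The substitution and add-on scenarios described above can be sketched in a few lines. This is only an illustration of the general approach, not the model used in the study: the potassium contents, the serving size, the low-intake threshold and the linear blood pressure response are all hypothetical placeholders, as are the names `Subject`, `apply_scenario` and `mean_shift`.

```python
from dataclasses import dataclass, replace
import statistics

# All constants are hypothetical placeholders, not values from the study.
K_MILK_MG_PER_100G = 150.0       # potassium in normal milk
K_FORTIFIED_MG_PER_100G = 400.0  # potassium in the fortified product
ADDED_SERVING_G = 200.0          # serving added on top of the diet
LOW_INTAKE_MG = 2000.0           # intake below which a serving is added
MMHG_PER_1000MG_K = -1.0         # assumed linear systolic response


@dataclass(frozen=True)
class Subject:
    potassium_mg: float   # baseline daily potassium intake
    milk_g: float         # baseline daily milk consumption
    systolic_mmhg: float  # measured systolic blood pressure


def apply_scenario(subject, mode):
    """Return a modified copy of the subject under the chosen scenario."""
    if mode == "substitute":
        # Replace the milk already consumed, gram for gram.
        extra_k = subject.milk_g / 100.0 * (
            K_FORTIFIED_MG_PER_100G - K_MILK_MG_PER_100G)
    elif mode == "add":
        # Add one serving, but only for subjects with a low baseline intake.
        low = subject.potassium_mg < LOW_INTAKE_MG
        extra_k = ADDED_SERVING_G / 100.0 * K_FORTIFIED_MG_PER_100G if low else 0.0
    else:
        raise ValueError(f"unknown scenario: {mode}")
    return replace(
        subject,
        potassium_mg=subject.potassium_mg + extra_k,
        systolic_mmhg=subject.systolic_mmhg + MMHG_PER_1000MG_K * extra_k / 1000.0,
    )


def mean_shift(subjects, mode):
    """Shift in mean population potassium intake (mg/day) under a scenario."""
    before = [s.potassium_mg for s in subjects]
    after = [apply_scenario(s, mode).potassium_mg for s in subjects]
    return statistics.mean(after) - statistics.mean(before)
```

Applying `apply_scenario` to every survey participant gives the individual changes, while `mean_shift` summarises the shift in the population intake distribution.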
The benefit to a food company of an analysis such as the above is to assess whether a product will provide a health benefit for a targeted consumer population and to quantify the possible impact.
Product stability and shelf life - predictive modelling
To ensure consumer safety and product quality and to preserve concentration levels of ingredients, such as added nutrients, shelf life testing is performed. Tests are typically carried out via durability or challenge studies that span weeks or months. During these studies, the microbial counts and quality outcomes of interest are documented at defined time points and under defined environmental conditions.
Using experimental data to predict the stability of different products, i.e. when developing new products with new formulations, is an example of an area where predictive modelling can be applied. These models can provide cost and time effective guidance on the projected shelf life and stability of a new product or a product with modified processing.
This work may consist of developing a statistical model from experimental data, for example preservatives, food additives, pH etc., and measuring the microbial stability of different products. The model will predict whether a new product formulation is stable and will estimate the probability associated with this prediction.
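One common way to obtain both a stable/unstable prediction and an associated probability is a logistic (growth/no-growth) model. The article does not state which statistical model is used, so the sketch below is an assumption, and its coefficients are hypothetical stand-ins for values that would be fitted to experimental data.

```python
import math

# Hypothetical coefficients standing in for values fitted to experimental
# stability data; they are not taken from any real product study.
COEFFS = {"intercept": -12.0, "ph": 2.5, "preservative_ppm": -0.004}


def probability_of_growth(ph, preservative_ppm, coeffs=COEFFS):
    """Logistic model: probability that microbial growth is observed."""
    z = (coeffs["intercept"]
         + coeffs["ph"] * ph
         + coeffs["preservative_ppm"] * preservative_ppm)
    return 1.0 / (1.0 + math.exp(-z))


def predict_stability(ph, preservative_ppm, threshold=0.05):
    """Return (is_stable, probability_of_growth) for a candidate formulation.

    A formulation is called stable when the predicted probability of growth
    is below the chosen threshold (0.05 is an arbitrary illustrative choice).
    """
    p = probability_of_growth(ph, preservative_ppm)
    return p < threshold, p
```

For example, `predict_stability(4.0, 500)` returns the stability call together with the probability behind it, so borderline formulations can be flagged for experimental testing instead of being accepted or rejected outright.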
Similarly, predictive mathematical models can be built using product or ingredient parameters, such as colour, texture and sensory characteristics. The aim is to determine the shelf life of the product from the known parameters and experimental data, the collection of which can take months or years of measurements for the selected parameters. Decision and development times can be shortened using the insights from these mathematical models. In addition, the critical parameters for predicting stability and shelf life can be identified and separated from the non-critical parameters.
Tools that can be used to assess product shelf-life and safety from a microbial perspective are called predictive microbiological models. Predictive models have been developed for both spoilage and pathogenic organisms and growth, survival and heat inactivation models are available for use. The models usually include variables, such as temperature, pH, salt or equivalent water activity and initial contamination levels.
Primary models describe changes in microbial numbers or other microbial responses over time. The model may quantify colony forming units (CFUs) per ml, toxin formation, substrate levels (which are direct measures of the response) and absorbance or impedance (which are indirect measures of the response). A mathematical equation or function describes the change in a response over time with a characteristic set of parameter values.
Secondary models describe the responses by the parameters of these primary models to changes in environmental conditions, such as temperature, pH, or water activity. Tertiary models are computer software routines that turn the primary and secondary models into ‘user-friendly’ programmes for model users in the form of software applications and expert systems. These programmes may calculate microbial responses to changing conditions, compare the effects of different conditions, or contrast the behaviour of several different microorganisms.
Once parameters have been entered into the system, a prediction can be produced. The prediction will usually be in the form of a growth curve, but parameters such as the lag time, the time to reach a specified microbial level and the microbial level at a specified time are also predicted.
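The relationship between primary and secondary models can be made concrete with a small sketch. The article does not name specific models, so two widely used forms are assumed here: a three-phase linear primary model (lag, exponential growth, stationary phase) and a Ratkowsky-type square-root secondary model linking growth rate to temperature. All parameter values are hypothetical and the lag time is simply supplied as an input.

```python
def growth_rate(temp_c, b=0.03, t_min=2.0):
    """Secondary model (square-root type): sqrt(rate) = b * (T - T_min).

    Returns a growth rate in log10 CFU/ml per hour; b and t_min are
    hypothetical parameters that would normally be fitted to data.
    """
    if temp_c <= t_min:
        return 0.0
    return (b * (temp_c - t_min)) ** 2


def log_count(t_h, n0, n_max, lag_h, rate):
    """Primary model (three-phase linear): log10 CFU/ml at time t_h hours."""
    if t_h <= lag_h:
        return n0                                 # lag phase: no change
    return min(n_max, n0 + rate * (t_h - lag_h))  # growth, capped at n_max


def time_to_level(target, n0, n_max, lag_h, rate):
    """Hours needed to reach a target log10 count, or None if never reached."""
    if target <= n0:
        return 0.0
    if rate <= 0.0 or target > n_max:
        return None
    return lag_h + (target - n0) / rate


def growth_curve(temp_c, hours, n0=2.0, n_max=9.0, lag_h=10.0):
    """Tertiary-style helper: combine both models into a predicted curve."""
    rate = growth_rate(temp_c)
    return [(t, log_count(t, n0, n_max, lag_h, rate)) for t in hours]
```

Calling `growth_curve(12.0, range(0, 121, 12))` produces the kind of growth curve a tertiary tool would plot, and `time_to_level` gives the time to reach a specified microbial level under the same assumptions.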
Note that the above models cannot always replace the actual experiments, but they can help steer formulations, new product development or process changes and can greatly increase the likelihood of success in experimental testing.
Modelling product reformulation and impacts on the population
Food and drink companies are constantly innovating their product ranges. Product reformulation and new product development play a big role in meeting consumer needs for safe and healthy products. Mandatory or voluntary targets set by public health stakeholders for ensuring healthier nutrient profiles of foods are another important reason for product development.
Within Food Drink Ireland, 15 member companies participated in a research project[3] with the aim of assessing the impact of new food and drink development and reformulation on nutrient (sodium, total fat, saturated fat, total sugar and energy) intakes in Irish consumers using anonymised data and scientific modelling. The shift in the consumption of products sold from 2005 to 2017 was incorporated into this model, taking into account the shift in consumer preference via volume sales data, the change in product composition, the discontinuation of old products and the introduction of new products by the participating companies.
As a database on consumer behaviour, the national Irish nutrition surveys were used, including The National Teens’ Food Survey (2005 – 2006), National Children’s Food Survey (2003 – 2004), National Adult Nutrition Survey (2008 – 2010) and National Preschool Nutrition Survey (2010 – 2011). These were combined with the collected nutritional data on the company products gathered for the years 2005 and 2017. For applicable food and drink products, original composition data from the surveys was replaced by the gathered company data, with given foods being represented by multiple brands. Volume sales data for a given brand and food category were used to create weighted distributions of concentrations to represent the market.
To account for the rest of the market not represented within the data, optimistic and conservative scenarios were created. The optimistic scenario assumes that other brands followed similar reformulation and consumer preference patterns, whereas the conservative scenario assumes that all other products on the market remain unchanged over time. The latter is likely to be an underestimate because other companies and retail own brands are actively reformulating and the reality is likely to be somewhere in the middle.
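The sales weighting and the two market scenarios can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's actual model: the brand volumes, sugar contents, the share of the market covered by participating brands and the baseline value for the rest of the market are hypothetical, and the optimistic scenario is interpreted here as the rest of the market changing by the same relative amount as the participating brands.

```python
import random


def weighted_mean(products, key):
    """Volume-sales-weighted mean concentration across brands."""
    total_volume = sum(p["volume"] for p in products)
    return sum(p["volume"] * p[key] for p in products) / total_volume


def sample_concentrations(products, key, k, seed=0):
    """Draw k concentrations, each brand weighted by its volume sales."""
    rng = random.Random(seed)
    values = [p[key] for p in products]
    weights = [p["volume"] for p in products]
    return rng.choices(values, weights=weights, k=k)


def market_mean(brands_then, brands_now, key, covered_share,
                rest_baseline, scenario):
    """Mean concentration for the whole category in the later year.

    covered_share: fraction of the market represented by participating brands.
    rest_baseline: concentration assumed for the rest of the market at baseline.
    """
    now = weighted_mean(brands_now, key)
    if scenario == "conservative":
        rest = rest_baseline  # rest of the market unchanged over time
    elif scenario == "optimistic":
        # Rest of the market follows the same relative change as the brands.
        rest = rest_baseline * now / weighted_mean(brands_then, key)
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return covered_share * now + (1.0 - covered_share) * rest


# Hypothetical category data (volume in arbitrary units, sugar in g/100 g).
BRANDS_2005 = [{"volume": 60, "sugar": 10.0}, {"volume": 40, "sugar": 12.0}]
BRANDS_2017 = [{"volume": 50, "sugar": 8.0}, {"volume": 30, "sugar": 12.0},
               {"volume": 20, "sugar": 5.0}]  # third brand is a new product
```

With these inputs, `market_mean(BRANDS_2005, BRANDS_2017, "sugar", 0.5, 11.0, "conservative")` and the same call with `"optimistic"` bracket the category mean, mirroring the point that the real value is likely to lie somewhere between the two scenarios.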
Data on approximately 1,800 food products and over 23,000 concentration data points were collected and incorporated into the intake model. The overall findings of the project help to quantify the impact that industry actions have had on consumer intakes over a 12-year period.
The biggest impact could be seen in a reduction in total sugar intakes ranging from 3.2 g/day in children to 0.8 g/day in adults. The second biggest impact was observed in the reduction of saturated fat intakes, with other nutrient intake reductions being smaller.
A future aim of this work is to measure the impact of changes in package sizes of products and to conduct further monitoring of food reformulation and new product development. The launch of the report involved the industry stakeholders as well as the Food Safety Authority of Ireland, demonstrating the importance of collaboration in improving public health.
Conclusions
The use of data science is unlocking information from existing data sets that was previously not available. Data science is having a very positive influence on food product development by supporting more efficient and scientific decision making and replacing or complementing traditional food science methodologies. However, some challenges still remain, such as access to the right expertise to ask the right questions, understanding and applying the correct methodologies, the availability and quality of the data used and handling the uncertainties that will inevitably arise.
The application of scientific modelling, data science and new technologies is quickly maturing, bringing the knowledge and expertise required to continue to grow this important toolkit, which has become an integral part of many organisations in the food sector.
Sandrine Pigat, Head of Food and Nutrition, Creme Global
The Tower, Trinity Technology & Enterprise Campus, Grand Canal Quay, Dublin 2, Ireland, D02 P956
Email sandrine.pigat@cremeglobal.com
Web cremeglobal.com
References
1. Data Science Field/Term Diagram: Ryan Urbanowicz, PhD, University of Pennsylvania, Philadelphia, PA 19104
2. Dainelli L, Xu T, Li M, Zimmermann D, Fang H, Wu Y, Detzel P. 2017. Cost-effectiveness of milk powder fortified with potassium to decrease blood pressure and prevent cardiovascular events among the adult population in China: a Markov model. BMJ Open [Internet]. 7:e017136. Available from: http://bmjopen.bmj.com/lookup/doi/10.1136/bmjopen-2017-017136
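3. Food Drink Ireland. 2019. The evolution of food and drink in Ireland 2005 - 2017: Reformulation and Innovation - Supporting Irish diets. Available from: https://www.fooddrinkireland.ie/IBEC/Press/PressPublicationsdoclib3.nsf/wvFDINewsByTitle/new-report-details-progress-of-food-and-drink-reformulation-20-02-2019/$file/The+evolution+of+food+and+drink+in+Ireland+2005+-+2017+-+Reformulation+and+Innovation+-+Supporting+Irish+diets.pdf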