A model for airflow quantity using multiple measurement tools and diagnostic procedures
This is a post about working with engineering data. GitHub here.
The more focused we are on a task, the more likely we are to think that our sample data is all there is. We may vaguely define our population and its parameters. But after a moment of pondering, we usually get back to work.
There are different ways of thinking about data. Usually, we think of nuts-and-bolts machine data as sample data.
This is pragmatic, but it leaves accuracy on the table. This example is about HVAC machines, which move air. We can’t see air, but we can measure it.
Thinking like a Bayesian, I’m going to write up my model starting with the parameter I want to know, total_system_airflow. I’ll assume that it’s normally distributed, since it is the combined result of many physical processes and laws.
The environment matters, whether you’re in a lab or an uncontrolled setting. The diagnostic tools also have measurement error.
OK, I’ll back up. Anyway, here’s a workflow for figuring it all out.
We can’t measure total_system_airflow without introducing variance. There are lots of sources of variance, including the act of measurement itself.
We introduce system-level variance by measuring different aspects of machine performance. This can be done using the machine’s built-in diagnostic tools, whose readings can then be compared to measurements taken outside the system.
Measuring machine performance at multiple levels (machine i in environment j) adds yet more uncertainty from the environment. Adding a varying-intercept term allows for testing in different environments.
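To make the varying-intercept idea concrete, here is a minimal simulation sketch in plain Python. It is not the model from this post: the function name, the true airflow of 700, and both standard deviations are made-up values for illustration. Each environment j gets its own offset, and every reading in that environment shares it.

```python
import random
import statistics


def simulate_measurements(true_airflow, env_sd, meas_sd, n_envs, n_reps, seed=0):
    """Simulate airflow readings for one machine tested in several environments.

    All arguments are hypothetical illustration values, not estimates from the post.
    """
    rng = random.Random(seed)
    readings = []
    for j in range(n_envs):
        # Varying intercept: one offset shared by every reading in environment j.
        env_offset = rng.gauss(0.0, env_sd)
        for _ in range(n_reps):
            # Measurement error is drawn fresh for each individual reading.
            noise = rng.gauss(0.0, meas_sd)
            readings.append((j, true_airflow + env_offset + noise))
    return readings


readings = simulate_measurements(
    true_airflow=700.0, env_sd=35.0, meas_sd=5.0, n_envs=4, n_reps=10
)

# Average reading per environment: these differ mainly because of env_offset.
env_means = [
    statistics.mean(y for j, y in readings if j == k) for k in range(4)
]
print(env_means)
```

With a large env_sd relative to meas_sd, the per-environment averages spread out far more than the readings within any one environment, which is the pattern the varying intercept is meant to absorb.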
I’m going to code my model up in Stan. Non-centered re-parameterization works best.
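For readers new to the term, here is a small Python sketch of what non-centering means. I am assuming it is applied to the environment offset, scaled by a standard deviation like sys_sig; the helper names are hypothetical and this is not the Stan code itself. The centered form draws the offset directly at scale sys_sig, while the non-centered form draws a standard normal z and multiplies it by sys_sig afterwards. Both describe the same distribution, but the sampler only has to explore z on a fixed unit scale.

```python
import random


def centered_offset(rng, sys_sig):
    """Centered form: offset ~ Normal(0, sys_sig)."""
    return rng.gauss(0.0, sys_sig)


def non_centered_offset(rng, sys_sig):
    """Non-centered form: z ~ Normal(0, 1), then offset = sys_sig * z."""
    z = rng.gauss(0.0, 1.0)
    return sys_sig * z


# With the same random seed, both forms produce the same offset,
# which shows they are two ways of writing one distribution.
a = centered_offset(random.Random(1), 35.0)
b = non_centered_offset(random.Random(1), 35.0)
print(a, b)
```

In a hierarchical model with only a few environments, the scale parameter is poorly pinned down by the data, and that is the usual situation where the non-centered form samples more reliably.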
sys_sig represents variation from the environment. Most of the variance comes from the environment. sigma represents variation across multiple measurements on the machine itself.
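As a rough check on that claim, the sketch below plugs the posterior point estimates for sys_sig and sigma from the summary further down into a simple variance share. This assumes both are standard deviations on the same scale and that the two sources add independently; plugging in point estimates also ignores the wide posterior intervals, so treat the result as a back-of-the-envelope figure. The function name is hypothetical.

```python
def env_variance_share(sys_sig, sigma):
    """Fraction of total variance attributed to the environment term.

    Assumes sys_sig and sigma are standard deviations of independent sources.
    """
    env_var = sys_sig ** 2
    meas_var = sigma ** 2
    return env_var / (env_var + meas_var)


# Posterior means from the summary table: sys_sig = 34.78, sigma = 26.68.
share_at_means = env_variance_share(34.78, 26.68)

# Posterior medians from the summary table: sys_sig = 33.00, sigma = 4.74.
share_at_medians = env_variance_share(33.00, 4.74)

print(f"share at posterior means:   {share_at_means:.2f}")
print(f"share at posterior medians: {share_at_medians:.2f}")
```

Either way the environment term takes more than half of the variance, though how much more depends heavily on which point estimate is used.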
I love Bayesian statistics because it directly estimates the parameter I want to know: the population parameter for the system itself.
parameter    mean  se_mean      sd    2.5%     25%     50%     75%    97.5%  n_eff  Rhat
b0           0.13     0.11    4.87   -9.40   -3.27    0.19    3.52     9.52   2110  1.00
b1[1]      736.85     7.13  182.06  364.96  614.85  741.02  859.51  1094.83    652  1.01
b1[2]      599.41     5.36  145.70  315.17  502.22  599.18  697.40   884.46    740  1.00
b1[3]      789.73     6.20  168.34  464.11  679.87  791.09  904.09  1119.74    737  1.00
g0           0.00     0.02    1.01   -1.99   -0.70    0.00    0.65     1.97   1974  1.00
b_tesp      -0.03     0.02    1.01   -1.95   -0.71   -0.04    0.67     1.97   2045  1.00
b_sys_cfm    0.68     0.01    0.16    0.38    0.58    0.69    0.79     1.00    749  1.00
sys_sig     34.78     4.04   34.66    0.20    1.93   33.00   52.74   123.00     74  1.04
b_dl        -0.02     0.03    0.79   -1.55   -0.56   -0.03    0.51     1.55    748  1.00
sigma       26.68     4.24   38.78    0.12    0.98    4.74   43.99   118.07     84  1.03
lp__       -23.44     0.52    4.71  -33.61  -26.46  -22.74  -19.92   -15.69     81  1.06
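Before reading too much into these numbers, it helps to scan the n_eff and Rhat columns. The sketch below copies those two columns from the summary above and flags parameters that fall outside thresholds I am assuming here (n_eff below 100, Rhat above 1.01). These cutoffs are common rules of thumb, not settings from the original analysis, and the function name is hypothetical.

```python
# (parameter, n_eff, Rhat) copied from the summary table above.
summary = [
    ("b0", 2110, 1.00),
    ("b1[1]", 652, 1.01),
    ("b1[2]", 740, 1.00),
    ("b1[3]", 737, 1.00),
    ("g0", 1974, 1.00),
    ("b_tesp", 2045, 1.00),
    ("b_sys_cfm", 749, 1.00),
    ("sys_sig", 74, 1.04),
    ("b_dl", 748, 1.00),
    ("sigma", 84, 1.03),
    ("lp__", 81, 1.06),
]


def flag_poor_mixing(rows, min_n_eff=100, max_rhat=1.01):
    """Return parameter names with low effective sample size or high Rhat.

    The default thresholds are assumed rules of thumb, not fixed standards.
    """
    return [
        name
        for name, n_eff, rhat in rows
        if n_eff < min_n_eff or rhat > max_rhat
    ]


flagged = flag_poor_mixing(summary)
print(flagged)
```

Under these thresholds the two scale parameters, sys_sig and sigma, and the log posterior lp__ are flagged, so the variance estimates deserve more caution than the b1[i] terms.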
The three b1[i] terms, one per system, estimate the true expected output of each machine. The other coefficients capture variance and give ranges for equipment calibration and performance effects.