Tiered Service Architecture for Remote Patient Monitoring
Siddharth Chandak1, Isha Thapa2, Nicholas Bambos1,2 and David Scheinker2,3
1 Department of Electrical Engineering, Stanford University, USA.
2 Department of Management Science & Engineering, Stanford University, USA.
3 School of Medicine, Stanford University, USA.
{chandaks, ishadt, bambos, dscheink}@stanford.edu
Abstract
We develop a remote patient monitoring (RPM) service architecture, which has two tiers of monitoring: ordinary and intensive. The patient’s health state improves or worsens in each time period according to certain probabilities, which depend on the monitoring tier. The patient incurs a “loss of quality of life” cost or an “invasiveness” cost, which is higher under intensive monitoring than under ordinary. On the other hand, their health improves faster under intensive monitoring than under ordinary. In each period, the service decides which monitoring tier to use, based on the health of the patient. We investigate the optimal policy for making that choice by formulating the problem using dynamic programming. We first provide analytic conditions for selecting ordinary vs intensive monitoring in the asymptotic regime where the number of health states is large. In the general case, we investigate the optimal policy numerically. We observe a threshold behavior, that is, when the patient’s health drops below a certain threshold the service switches them to intensive monitoring, while ordinary monitoring is used during adequately good health states of the patient. The modeling and analysis provides a general framework for managing RPM services for various health conditions with medically/clinically defined system parameters.
I Introduction
Remote Patient Monitoring (RPM) is increasingly receiving attention as a method for monitoring patients with certain medical conditions in their normal living/working environments to increase their quality of life and the level of delivered health care [1, 2, 3]. This is becoming feasible via advancements in wearable medical devices, for example, wearable glucose monitors [4], smart watches with vital sign monitoring capabilities (e.g., heart rate, ECG, pulse oximetry) [5, 6], and other such sensors. Further, such devices are increasingly networked and can transmit and receive data over the Internet and act as edge devices in communication with computation servers in the Cloud.
Studies have shown the effectiveness of RPM for various medical conditions. For example, continuous glucose monitoring has been shown to improve glycemic control in patients with diabetes [7, 8]. Smart watches have been effectively used to monitor stress, movement disorders, sleep patterns, blood pressure, heart disease, and COVID-19 [5]. Other RPM devices have also been used to manage and track cardiac conditions, such as heart failure, arrhythmia, and hypertension [3]. These studies highlight the potential of RPM to improve patient outcomes and quality of life by allowing for timely interventions and personalized care.
However, the question remains of how intensively to monitor patients. Intensive monitoring schemes could range from remotely collecting more data on the patient health state and administering more medical intervention remotely (e.g., alerting the patient to increase medication dosage) to calling the patient into an urgent care facility.
While aggressive monitoring may provide more comprehensive data, it can be resource intensive, draining the wearable device battery [6] faster and requiring clinicians to review more RPM data. From the patient’s perspective, intensive monitoring carries a higher “loss of quality of life” cost or “invasiveness” cost, since it would normally be more invasive to their personal lifestyle and can result in treatment fatigue [9]. On the other hand, intensive monitoring (and correspondingly elevated medical intervention) would enable early detection of and intervention for adverse events, and hence the patient’s health is expected to improve faster than under ordinary monitoring. Therefore, to account for this trade-off between the invasiveness cost and the possibility of an early intervention, there is an inherent need for a systematic approach to determining the appropriate level of monitoring, based on the patient’s health state.
In this paper we develop an RPM service architecture, where the patient is placed under less or more intensive levels or tiers of monitoring, based on their health state. We then study the optimal monitoring strategy for this model and how it varies with different parameters. One would intuitively expect that a patient would be placed under intensive monitoring when their health state deteriorates; on the other hand, they would be returned to ordinary monitoring when their health state improves enough. The decision to switch from ordinary to intensive monitoring (and/or vice versa) requires a systematic analysis and depends heavily on the various parameters of the service in a rather subtle and complicated way, as analyzed in the following sections.
Of course, the design of an RPM service for a specific medical condition is highly dependent on the specifics of that condition and requires specialized medical knowledge. The point of this paper is not to design a particular RPM service but to provide a general framework and systematic methodology for RPM services based on tunable parameters (e.g., health improvement or deterioration probabilities, monitoring options and invasiveness costs) so as to make justifiable monitoring choices. The parameters will have to be decided on and tuned by medical/clinical experts for condition-specific RPM services. While we work with a simplified model here, the intuition gained from the analysis can help clinicians make more informed monitoring decisions.
In section II, we develop the model of evolution of the patient health state under ordinary and intensive monitoring and demonstrate how it can be managed, using the methodology of dynamic programming. In section III, we provide some analytical results on the optimal management policy in the asymptotic regime of a large number of health states and provide conditions on the parameters for choosing ordinary vs intensive monitoring. In section IV, we numerically investigate the structure of optimal monitoring policies and demonstrate that they place the patient under intensive monitoring when the health state deteriorates below a certain threshold; otherwise they use ordinary monitoring. Finally, in section V, we discuss some extensions. Appendix A contains proofs for results presented in section III.
II The RPM Service Model
Consider a patient who can be in a health state in each service time period . The RPM service places the patient in a monitoring/intervention state in each time period , abbreviated to monitoring state, where denotes ordinary monitoring and intensive monitoring. Thus, one can view the monitoring-patient joint state as the system or service state at time .
A higher patient health state corresponds to the patient having better health. In particular, the lowest health state is critical in the sense that, when the patient drifts into that state, they go beyond the scope of the current service model; at that point other emergency and/or more severe medical interventions are required, which are outside the scope of this service. Because of that, the states and are absorbing for the Markovian evolution of the health state, as explained below. Indeed, when the patient enters health state under any monitoring state or , the service evolution stops, as other medical measures/interventions are initiated.
As seen below, in defining costs incurred at the various states, we first take the patient’s quality of life point of view. Under ordinary monitoring, the patient incurs a constant cost at any state with . Correspondingly, under intensive monitoring, the patient incurs a constant cost at any state with . These costs reflect the invasiveness loss for the patient. One may argue about the costs in more elaborate ways, for example, including patient risk factors and operational considerations of the service. For simplicity, we focus here on the invasiveness argument mentioned above.
Of special interest is the critical health state , where this model ceases to apply. In either state or , the patient’s health is critical and a cost is incurred.
We finally define the transition costs (and ), which are associated with the service transitioning the patient from ordinary to intensive monitoring (and vice versa, respectively).
We model the system as a controlled Markov chain. Such models are commonly used for medical decision making [10, 11]. We try to stay as simple as possible, yet still capture the essence of the problem and get insights into its solution. At the beginning of every time period , the service takes the decision/action (control) to either keep the monitoring state the same (as in the previous time period) or switch it to the alternate monitoring state. Formally, the decision/action space is and each (state, action) pair is associated with a cost given by the function . The transition probabilities are given by where and . The cost functions and transition probabilities are defined as follows.
1. At health state :
No action is taken, as the service ceases operation. A cost of is incurred.
2. At health states :
a)
Ordinary Monitoring (), no Switching ():
Does not induce a monitoring change, and the system state transitions as follows:
i)
with prob.
ii)
with prob. ,
and a cost is incurred. Note that above is used to account for with prob. since is the highest health state.
b)
Ordinary Monitoring () with Switching ():
Induces a switch to intensive monitoring , and the system state transitions as follows:
i)
with prob.
ii)
with prob. ,
and a cost is incurred.
c)
Intensive Monitoring (), no Switching ():
Does not induce a monitoring change, and the system state transitions as follows:
i)
with prob.
ii)
with prob. ,
and a cost is incurred.
d)
Intensive Monitoring () with Switching ():
Induces a switch to ordinary monitoring , and the system state transitions as follows:
i)
with prob.
ii)
with prob. ,
and a cost is incurred.
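The four cases above can be sketched as a single one-period transition function. This is a minimal illustration under our own assumptions, not the exact specification: the tier labels "o" and "i", the names `step`, `p_up` and `H`, and the choice that health moves down by one whenever it does not improve are all hypothetical. We also assume that a switch takes effect immediately, so the period's transition uses the probabilities of the tier switched to.

```python
import random

ORDINARY, INTENSIVE = "o", "i"  # hypothetical labels for the two monitoring tiers


def step(h, m, switch, p_up, H, rng):
    """Advance the joint (monitoring, health) state by one period.

    h      : current health state, 1 <= h <= H (0 is the critical state)
    m      : current monitoring tier, ORDINARY or INTENSIVE
    switch : True to move to the other tier, False to keep the current one
    p_up   : dict mapping tier -> probability that health improves by one
    H      : highest health state
    rng    : a random.Random instance
    """
    if h == 0:
        raise ValueError("critical state is absorbing; the service has stopped")
    # Assumption: the switch takes effect before the health transition is drawn.
    if switch:
        m_next = INTENSIVE if m == ORDINARY else ORDINARY
    else:
        m_next = m
    if rng.random() < p_up[m_next]:
        h_next = min(h + 1, H)  # health cannot rise above the highest state
    else:
        h_next = h - 1          # assumption: otherwise health worsens by one
    return m_next, h_next
```

Repeated calls to `step` until the health state reaches 0 generate one sample path of the service under a given decision rule.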
We can easily incorporate health state dependent costs and transition probabilities, but for simplicity we assume constant ones here. We make the following natural assumptions.
Assumption 1.
a)
The transition probabilities satisfy: .
b)
The costs satisfy: .
The first assumption 1.a) intuitively states that the patient’s health improves faster under intensive monitoring, rather than under ordinary. Regarding assumption 1.b), it is naturally expected that , as the patient’s “annoyance” is higher under intensive monitoring/intervention than under ordinary. Further, given the severity of entering the critical state , it is naturally expected that , and practically is expected to be much larger than .
II-A Optimal Monitoring Control
We study this problem under the discounted cost setting of the dynamic programming methodology [12], hence, costs incurred time periods into the future (with respect to present) are discounted by a factor of with . Starting from state , the total expected (discounted) cost to be incurred is
when control action is taken by the service, at cost introduced above, when its state is at time , until the patient enters the critical state at time and the service ceases operation. Hence, is the time the patient spends in the service, and at time the critical cost is incurred, however, discounted to . Thus, discounting by implicitly reflects the patient’s desire to stay longer in service, hence, incur the critical cost further in future and discounted to .
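For concreteness, the discounted objective described above can be written out as follows. The symbols are labels chosen here for illustration and are assumptions about notation: $s_t$ for the joint state at time $t$, $a_t$ for the action, $c(\cdot,\cdot)$ for the per-period cost, $C_c$ for the critical cost, $\alpha$ for the discount factor, and $T$ for the time at which the critical state is reached.

```latex
J(s_0) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{T-1} \alpha^{t}\, c(s_t, a_t) \;+\; \alpha^{T} C_c \;\middle|\; s_0 \right],
\qquad 0 < \alpha < 1 .
```

Since $\alpha^{T}$ shrinks as $T$ grows, a policy that delays arrival at the critical state reduces the contribution of $C_c$, at the price of whatever per-period invasiveness cost it pays along the way.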
A (stationary) monitoring policy is a rule mapping each state to a control action to be taken at that state. The value function of a policy is the total expected (discounted) cost the system will incur until reaching the critical state and stop, when it starts from state at time . That is,
and satisfies the dynamic programming equation [12].
for all , given the Markovian evolution dynamics of the system, specified by the state transition probabilities defined above.
The goal is to find an optimal policy which minimizes over all policies , i.e., for every over all policies . For simplicity, we define which satisfies the following dynamic programming equation [12].
and can be solved numerically to yield the optimal policy , that is, what optimal decision to take when the patient is in health state under monitoring . For the state transition probabilities and costs defined before, this dynamic programming equation unfolds into:
(i)
For health states
(ii)
At health state ,
(iii)
At the critical health state ,
Note that in every above, the first term corresponds to the control keeping the existing monitoring state, while the second corresponds to the control switching to the alternate monitoring state and incurring the switching cost.
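A dynamic programming equation of this form can be solved by value iteration. The sketch below is one possible implementation under stated assumptions rather than the authors' code: the names `value_iteration`, `p_up`, `cost`, `switch_cost` and `C_c` are hypothetical, health is assumed to worsen by one whenever it does not improve, and a switch is assumed to take effect in the same period (the switching cost is added to the cost and transition of the new tier).

```python
def value_iteration(H, alpha, p_up, cost, switch_cost, C_c,
                    tol=1e-10, max_iter=100_000):
    """Solve the two-tier monitoring problem on health states 0..H.

    p_up        : dict tier -> probability that health improves by one
    cost        : dict tier -> per-period invasiveness cost
    switch_cost : dict (from_tier, to_tier) -> one-off switching cost
    C_c         : cost incurred on entering the critical state h = 0
    Returns (V, policy): V[m][h] is the optimal cost from state (m, h);
    policy[m][h - 1] is the tier used in that period.
    """
    tiers = ("o", "i")
    other = {"o": "i", "i": "o"}

    def one_period(V, h, tier):
        # Cost of spending this period under `tier`, then continuing optimally.
        up = V[tier][min(h + 1, H)]
        down = V[tier][h - 1]
        return cost[tier] + alpha * (p_up[tier] * up + (1.0 - p_up[tier]) * down)

    def options(V, h, m):
        stay = one_period(V, h, m)
        switch = switch_cost[(m, other[m])] + one_period(V, h, other[m])
        return stay, switch

    V = {m: [C_c] + [0.0] * H for m in tiers}
    for _ in range(max_iter):
        V_new = {m: [C_c] + [min(options(V, h, m)) for h in range(1, H + 1)]
                 for m in tiers}
        diff = max(abs(V_new[m][h] - V[m][h]) for m in tiers for h in range(H + 1))
        V = V_new
        if diff < tol:
            break

    policy = {}
    for m in tiers:
        policy[m] = []
        for h in range(1, H + 1):
            stay, switch = options(V, h, m)
            policy[m].append(m if stay <= switch else other[m])  # ties: stay
    return V, policy
```

Because the discount factor is strictly below one, the update is a contraction and the iteration converges from any starting guess; the stopping tolerance `tol` is an arbitrary choice.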
II-B Simplified RPM Service
In order to reduce the number of parameters for tractability of the analysis, we make the following simplification.
Definition 1.
Simplified RPM Service.
a)
The cost for ordinary monitoring is set to zero: .
b)
The switching costs are set to zero: .
Assumption 1 is still satisfied.
Given the limited space of this short paper, we work with the simplified RPM service below (analysis and numerical results), where interesting insights emerge. The zero cost of invasiveness under ordinary monitoring has no significant impact on the results, and is merely a technical assumption to make our analysis easier. The assumption of zero transition costs holds in several applications where intensive monitoring merely involves a higher rate of collecting data about the patient’s health. We briefly comment on how non-zero transition costs affect our results in Section V.
The simplified RPM service is illustrated in Figure 1 below.
The dynamic programming equations for given above reduce in this case to the following:
(i)
For health states ,
(1)
(ii)
At health state ,
(2)
(iii)
At the critical health state ,
(3)
Note that, in the absence of switching costs (that is, ), for any health state the optimal value is the same under both ordinary and intensive monitoring.
III Asymptotic Analysis
The optimal policy and value function can be computed numerically (in general) from the dynamic programming equations (1)-(3) for any given set of system parameters. But to develop intuition and characterize the optimal policy, in this section, we analyse the problem in the asymptotic regime of a large number of health states , i.e., a dynamic range of health states . This allows for analytic tractability of the optimal policy and closed-form conditions on our policies of interest. We make the following assumptions in this section.
Assumption 2.
Large Asymptotic Regime.
a)
The number of health states is very large, i.e., the system operates in the asymptotic regime of .
b)
Under ordinary monitoring the patient’s health drifts downwards, i.e., the improvement probability is and the worsening one is
Assumption 2.a) allows us to use tools from random walk analysis, as done in Lemma 1, making the analysis tractable. Assumption 2.b) allows for the Markov chain to remain stable (positive recurrent) in the asymptotic regime.
The results obtained in this asymptotic regime can be thought of as an approximation for the RPM service with finite , when grows large. In the next section, we numerically demonstrate that this asymptotic approximation tracks the optimal policy and the value function for our RPM for a number of health states as low as .
We define as the set of policies under which the service chooses the same action irrespective of the monitoring state, that is,
This implies that and we can restrict our attention to the set . For simplicity, we introduce notation for and similarly the notation .
We next define an important policy where the patient stays under ordinary monitoring at all health states, i.e., for all . Note that and we define the corresponding value function . Our first lemma gives the value function for this policy and presents an important property about the optimal policy.
Lemma 1.
For the simplified RPM (Definition 1) and under Assumption 2,
a)
The value function for the policy is given by:
where
Note that for .
b)
For any choice of parameters, there exists such that, under the optimal policy, the patient prefers to stay in ordinary monitoring above health state , i.e., , for all .
For the simplified RPM, the cost of invasiveness under ordinary monitoring is zero. Then , where is the time taken to reach health state , when started at health state and when the patient always stays under ordinary monitoring. here is precisely the hitting time of state for a random walk initiated at state . The proof for this lemma then follows from the moment generating function of the hitting time for an -state random walk.
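The hitting-time argument can be checked numerically. The sketch below uses the standard first-passage generating function of a simple random walk: if health rises by one with probability p and falls by one with probability q = 1 - p, the time T to first move one state below the start satisfies E[α^T] = λ, where λ is the smaller root of αp λ² - λ + αq = 0. Descending from state h to the critical state takes h such independent passages, so the always-ordinary cost is C_c λ^h. The function names, the parameter values, and the convention that the critical cost is discounted by α^T are assumptions made here for illustration.

```python
import math


def descent_factor(alpha, p_up):
    """E[alpha**T] for the first passage of a +/-1 random walk to one state
    below its starting point (up w.p. p_up, down w.p. 1 - p_up)."""
    if not (0.0 < alpha < 1.0 and 0.0 < p_up < 1.0):
        raise ValueError("need 0 < alpha < 1 and 0 < p_up < 1")
    q_down = 1.0 - p_up
    disc = 1.0 - 4.0 * alpha ** 2 * p_up * q_down  # positive since p*q <= 1/4
    return (1.0 - math.sqrt(disc)) / (2.0 * alpha * p_up)


def always_ordinary_closed_form(h, alpha, p_up, C_c):
    """Asymptotic (infinitely many health states) cost of never intervening."""
    return C_c * descent_factor(alpha, p_up) ** h


def always_ordinary_numeric(H, alpha, p_up, C_c, tol=1e-11, max_iter=20_000):
    """Policy evaluation for the same policy on a finite chain 0..H."""
    V = [C_c] + [0.0] * H
    for _ in range(max_iter):
        V_new = [C_c] + [
            alpha * (p_up * V[min(h + 1, H)] + (1.0 - p_up) * V[h - 1])
            for h in range(1, H + 1)
        ]
        diff = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if diff < tol:
            break
    return V
```

For health states far from the top of a finite chain, the two computations should agree closely, since the effect of the upper boundary decays geometrically with the distance from it.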
An important implication of the above lemma is that the policy where the patient chooses to stay in intensive monitoring for all health states is never optimal. Our next theorem shows that the policy is actually optimal for a large choice of parameters.
Theorem 1.
Under Assumption 2, the policy is optimal () for the simplified RPM (Definition 1) when the parameters satisfy
We next define a threshold-policy characterized by the health state . These are the policies under which there exists a threshold such that the patient stays in intensive monitoring when the patient’s health is below or at the threshold and in ordinary monitoring when their health is better than . Note that . So for and for . Our next theorem gives a set of conditions under which is the optimal policy for some threshold .
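A threshold policy is simple to represent and to recognize in code. In the sketch below, a policy is a list giving the monitoring tier for health states 1 through H; the labels "o" and "i", the function names, and the convention that a threshold of 0 means ordinary monitoring everywhere are our own illustrative choices.

```python
def threshold_policy(H, h_star):
    """Intensive monitoring ("i") for health states 1..h_star,
    ordinary monitoring ("o") for health states h_star+1..H."""
    return ["i" if h <= h_star else "o" for h in range(1, H + 1)]


def find_threshold(policy):
    """Return the threshold if `policy` has threshold form, otherwise None.

    policy[h - 1] is the tier used at health state h.
    """
    h_star = sum(1 for tier in policy if tier == "i")
    if policy == threshold_policy(len(policy), h_star):
        return h_star
    return None
```

Applying `find_threshold` to a numerically computed policy gives a quick check of whether it has the threshold structure discussed here.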
Theorem 2.
Under Assumption 2, the policy is optimal () for some threshold for the simplified RPM (Definition 1) when the following two conditions are satisfied:
Condition a) above is the complement of the condition in Theorem 1. Condition b) is an additional condition which our proof requires for the threshold policy to be optimal. In the asymptotic regime, we strongly believe that condition b) is not necessary, and condition a) alone is sufficient. Hence we believe that in the asymptotic regime, condition a) alone dictates what the optimal policy is and that the optimal policy can only be of two forms - and . This is reinforced by the numerical analysis, presented next.
IV Performance
In this section, we glean insights into the optimal policy by numerically solving the dynamic programming equations (1)-(3).
Figure 2 depicts the two policies discussed in the last section and a sample set of parameters under which they are optimal. Figure 2(a) shows the policy , under which the patient stays in ordinary monitoring at all health states. Figure 2(b) shows the policy with threshold , where the patient stays in intensive monitoring for health states and in ordinary monitoring for health states . Let model 2(a) use the set of parameters and model 2(b) use the set of parameters . Then and with are optimal for model 2(a) and 2(b), respectively.
Note that the parameters for model 2(a) satisfy , which is sufficient for Theorem 1 to hold. Similarly the parameters for model 2(b) satisfy , which is condition (a) in Theorem 2. Note, however, that the parameters in model 2(b) do not satisfy condition (b) of Theorem 2, implying that the condition is not necessary.
When is finite, there also exist instances where the optimal policy is , where the patient chooses to stay in intensive monitoring for all health states . But we observed that this policy is optimal only in extreme cases where is very small or is very close to . Hence we do not further analyse this policy here.
Figure 3: The optimal (blue) value function (numerically computed) compared to its asymptotic counterpart (red, obtained from Lemma 1) for various health states in a system with . Note the close proximity of the two plots.
Figure 4: Dependence of the (numerically computed) optimal monitoring policy on the (a) cost ratio (with fixed ); on (b) (with fixed ); and on (c) (with fixed ). We set in all cases. Below the vertical orange dashed line, the optimal policy is (ordinary monitoring is used for all health states). This line is positioned at the point where the condition of Theorem 1 achieves equality. Above this value the policy changes to and each threshold is marked in blue.
Figure 3 shows how closely our asymptotic analysis in Section III relates to the actual solution of the dynamic programming equations. The parameters are chosen such that the optimal policy is . We calculate for this model, and compare it with the value function obtained for the asymptotic case (Lemma 1). As observed in the plot, the value function obtained for is almost identical to the asymptotic approximation . The accuracy of the asymptotic approximation in predicting the optimal policy is further demonstrated by our next result.
We next study the impact of different parameters on the optimal policy . Figure 4(a) shows how the optimal policy (numerically computed) varies with the cost ratio . For the optimal policy is , while for it is with varied thresholds. Note that satisfies the condition in Theorem 1 with equality (the value of which satisfies ). This shows that the condition obtained under the asymptotic assumption is a good indicator for our original problem with finite health states. As the ratio grows, the cost incurred on reaching the critical health state increases, and it becomes optimal for the patient to stay under intensive monitoring until their health significantly improves.
Figure 4(b) shows how the optimal policy varies as the probability increases. The optimal policy is for and with varied thresholds for . Again, solves the condition in Theorem 1 with equality. As the probability increases, the patient’s health is more likely to improve under intensive monitoring, incentivizing the patient to stay under intensive monitoring for longer. Finally, Figure 4(c) shows the impact of on the optimal policy. As increases, the patient incurs a higher discounted cost on reaching the critical state, and hence they stay under intensive monitoring for longer.
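A parameter sweep of the kind behind Figure 4(a) can be sketched as follows for the simplified service (zero ordinary-monitoring cost and zero switching costs). The function names and all numerical values are hypothetical and are not the parameters used in the figures. Health is assumed to worsen by one whenever it does not improve.

```python
def solve_simplified(H, alpha, p_o, p_i, C_i, C_c, tol=1e-10, max_iter=200_000):
    """Value iteration for the simplified service on health states 0..H.

    p_o, p_i : probability of improving under ordinary / intensive monitoring
    C_i      : per-period cost of intensive monitoring (ordinary costs nothing)
    C_c      : cost incurred on entering the critical state h = 0
    Returns (V, policy) with policy[h - 1] in {"o", "i"} for h = 1..H.
    """
    def q_values(V, h):
        up, down = V[min(h + 1, H)], V[h - 1]
        q_ord = alpha * (p_o * up + (1.0 - p_o) * down)
        q_int = C_i + alpha * (p_i * up + (1.0 - p_i) * down)
        return q_ord, q_int

    V = [C_c] + [0.0] * H
    for _ in range(max_iter):
        V_new = [C_c] + [min(q_values(V, h)) for h in range(1, H + 1)]
        diff = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if diff < tol:
            break

    policy = []
    for h in range(1, H + 1):
        q_ord, q_int = q_values(V, h)
        policy.append("o" if q_ord <= q_int else "i")  # ties go to ordinary
    return V, policy


def sweep_cost_ratio(ratios, H=20, alpha=0.95, p_o=0.3, p_i=0.6):
    """For each ratio C_c / C_i (with C_i fixed to 1), count the health states
    placed under intensive monitoring by the computed policy."""
    counts = {}
    for ratio in ratios:
        _, policy = solve_simplified(H, alpha, p_o, p_i, C_i=1.0, C_c=ratio)
        counts[ratio] = policy.count("i")
    return counts


if __name__ == "__main__":
    for ratio, count in sweep_cost_ratio([0.5, 5.0, 50.0, 500.0]).items():
        print(f"C_c/C_i = {ratio:7.1f}: intensive monitoring at {count} health states")
```

When the computed policy has threshold form, the reported count coincides with the threshold. Sweeping the improvement probability under intensive monitoring or the discount factor instead only requires changing which argument is varied.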
V Conclusions and Extensions
We have developed a two-tier service architecture for remote patient monitoring (RPM), where the service policy decides whether to place the patient under ordinary or intensive monitoring, given their health state. The optimal policy is first analyzed in asymptotic regimes and conditions are established for choosing ordinary vs intensive monitoring. The policy is then numerically computed and the dependence of its behavior on various key parameters is investigated.
An important extension would be to consider a more general model, which includes non-zero transition costs. Based on numerical experiments performed in the general case, the optimal policy in this case would be a threshold policy with two thresholds instead of the one observed in this paper. A patient under ordinary monitoring would be switched to intensive when their health state deteriorates below a certain lower health threshold, and a patient under intensive monitoring would switch to ordinary when their health state improves above an upper health threshold. There are also other direct extensions, e.g., the costs and probabilities of transitions could also be made dependent on the health state, allowing for a more realistic model.
References
[1] F. A. C. d. Farias, C. M. Dagostini, Y. d. A. Bicca, V. F. Falavigna, and A. Falavigna, “Remote patient monitoring: a systematic review,” Telemedicine and e-Health, vol. 26, no. 5, pp. 576–583, 2020.
[2] L. P. Malasinghe, N. Ramzan, and K. Dahal, “Remote patient monitoring: a comprehensive study,” Journal of Ambient Intelligence and Humanized Computing, vol. 10, pp. 57–76, 2019.
[3] A. Zinzuwadia, J. M. Goldberg, M. A. Hanson, and J. D. Wessler, “Continuous cardiology: the intersection of telehealth and remote patient monitoring,” in Emerging Practices in Telehealth. Elsevier, 2023, pp. 97–115.
[4] I. Lee, D. Probst, D. Klonoff, and K. Sode, “Continuous glucose monitoring systems-current status and future perspectives of the flagship technologies in biosensor research,” Biosensors and Bioelectronics, vol. 181, p. 113054, 2021.
[5] M. Masoumian Hosseini, S. T. Masoumian Hosseini, K. Qayumi, S. Hosseinzadeh, and S. S. Sajadi Tabar, “Smartwatches in healthcare medicine: assistance and monitoring; a scoping review,” BMC Medical Informatics and Decision Making, vol. 23, no. 1, p. 248, 2023.
[6] Y. B. David, T. Geller, I. Bistritz, I. Ben-Gal, N. Bambos, and E. Khmelnitsky, “Wireless body area network control policies for energy-efficient health monitoring,” Sensors, vol. 21, no. 12, p. 4245, 2021.
[7] M. I. Maiorino, S. Signoriello, A. Maio, P. Chiodini, G. Bellastella, L. Scappaticcio, M. Longo, D. Giugliano, and K. Esposito, “Effects of continuous glucose monitoring on metrics of glycemic control in diabetes: a systematic review with meta-analysis of randomized controlled trials,” Diabetes Care, vol. 43, no. 5, pp. 1146–1156, 2020.
[8] P. Prahalad, D. Scheinker, M. Desai, V. Y. Ding, F. K. Bishop, M. Y. Lee, J. Ferstad, D. P. Zaharieva, A. Addala, R. Johari et al., “Equitable implementation of a precision digital health program for glucose management in individuals with newly diagnosed type 1 diabetes,” Nature Medicine, pp. 1–9, 2024.
[9] B. W. Heckman, A. R. Mathew, and M. J. Carpenter, “Treatment burden and treatment fatigue as barriers to health,” Current Opinion in Psychology, vol. 5, pp. 31–36, Oct. 2015.
[10] L. N. Steimle and B. T. Denton, “Markov decision processes for screening and treatment of chronic diseases,” in Markov Decision Processes in Practice, pp. 189–222, 2017.
[11] O. Alagoz, et al., “Markov decision processes: a tool for sequential decision making under uncertainty,” Medical Decision Making, vol. 30, no. 4, pp. 474–483, 2010.
[12] D. Bertsekas, Dynamic programming and optimal control. Athena Scientific, 2012, vol. II.
[13] W. Feller, An introduction to probability theory and its applications, Volume 2. John Wiley & Sons, 1991, vol. 81.
for all . Here denotes the time at which the patient reaches health state . Since , for all . This implies that , where is the time at which the patient reaches health state .
Consider an infinite-state 1-dimensional random walk with probability of moving forward and probability of moving backward . Let be the time taken to hit state for a random walk initialized at state . Then defined above is equal to for this random walk and . Note that . Now,
Here equality (a) follows from the independence of and equality (b) follows from the fact that follow the same distribution.
Since , the probability that the patient reaches state in the random walk starting at state is . Then [13, Chapter XIV, eqn. 4.8] gives us and
where
b)
Let . Then
Since , note that
for all . Consider a policy such that . Then . Now for any , as for . This implies that the policy cannot be optimal, and hence under the optimal policy, the patient prefers to stay in ordinary monitoring for all health states . This completes the proof for Lemma 1.
Since , eqn. (A-B) is not satisfied for . This implies that is not optimal. This implies that under the optimal policy the patient will stay under intensive monitoring for some state. Define (respectively, ). denotes the Q-functions, where action (or , respectively) is taken when initialized at health state and then actions are taken using the optimal policy. Note that if , and otherwise.
Suppose is monotonically decreasing as increases, then if action is optimal at some , then it will also be optimal at and so on (i.e., ). As condition (a) already enforces that policy is not optimal, action has to be taken at state under the optimal policy. Also, Lemma 1 part (b) shows that there exists such that the optimal policy for health states above is , which implies that policy is played at some state. Hence the optimal policy has to be for some .
Hence we just need to show that is monotonically decreasing as increases to prove that is optimal.
We can show that
Now,
Hence if is true for all then is monotonically decreasing with . Now we know using [12, Proposition 2.2] that
and
With some further manipulation, we can show that
We can show that is always less than for . Hence if , then and hence is monotonically decreasing with . Under the additional condition that , this implies that is the optimal policy for some threshold .
∎