Yijia Wang

Publications and Papers Under Review

Approximate Dynamic Programming / Machine Learning

Faster Approximate Dynamic Programming by Freezing Slow States
Yijia Wang, Daniel R. Jiang
Submitted, 2023.
Brief Description: We consider fast-slow MDPs, where certain states move “fast” while other parts of the state space transition more “slowly.” This is common when decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. We propose several new algorithms, each based on the idea of periodically “freezing” and then “releasing” slow states, leading to dramatic computational benefits.
arXiv
Structured Actor-Critic for Managing Public Health Points-of-Dispensing
Yijia Wang, Daniel R. Jiang
Under revision, 2022.
Brief Description: We consider the setting of public health medical inventory control/dispensing and propose a new actor-critic algorithm that tracks both policy and value function approximations. The algorithm utilizes structure in both the policy and value to improve the empirical convergence rate. We also provide a case study for the problem of dispensing naloxone (an overdose reversal drug) amidst the ongoing opioid crisis.
arXiv
One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning
Chaosheng Dong, Xiaojie Jin, Weihao Gao, Yijia Wang, Hongyi Zhang, Xiang Wu, Jianchao Yang, Xiaobing Liu
The first International Workshop on Data-Efficient Machine Learning (DeMaL 2021).
Brief Description: We propose recording a constant amount of information per instance from the forward passes in large-scale machine learning systems. The extra information measurably improves the selection of which data instances should participate in forward and backward passes. We propose a novel optimization framework to analyze this problem and provide an efficient approximation algorithm, within the framework of mini-batch gradient descent, as a practical solution.
arXiv
Inverse multiobjective optimization through online learning
Chaosheng Dong, Yijia Wang, Bo Zeng
Submitted to ICLR 2023.
Brief Description: We study the problem of learning the objective functions or constraints of a multiobjective decision-making model from a set of sequentially arriving decisions. In particular, these decisions might not be exact: they may carry measurement noise or reflect the bounded rationality of decision makers. We propose a general online learning framework that addresses this problem using inverse multiobjective optimization. More precisely, we develop two online learning algorithms with implicit update rules that can handle noisy data.
arXiv
Exploration via Cost-Aware Subgoal Design
Yijia Wang, Matthias Poloczek, Daniel R. Jiang
Preliminary version at Task-Agnostic Reinforcement Learning Workshop at ICLR 2019.
Brief Description: We consider problems where an agent faces an unknown task (drawn from a distribution of MDPs) in the future and is given prior opportunities to “practice” on related tasks where the interactions are still expensive. We propose a one-step Bayes-optimal algorithm for selecting subgoal designs, along with the number of episodes and the episode length during training, to efficiently maximize the expected performance of the agent at test time.
arXiv

Supply Chain Management / Game Theory

Capacity procurement in logistics service supply chain with demand updating and rational expectation behavior
Weihua Liu, Donglei Zhu, Yijia Wang
Asia-Pacific Journal of Operational Research, 34(06), 2017.
Brief Description: This paper discusses the applicable conditions of two capacity procurement sub-strategies with two ordering opportunities. At the second ordering opportunity, sub-strategy I only allows increasing the purchase quantity, while sub-strategy II allows either increasing or reducing the order quantity. The system is composed of an integrator and a provider. Under the assumption of Bayesian updating, this paper investigates the conditions under which sub-strategy II outperforms sub-strategy I. The main results are verified through numerical analysis.
Online
Quality control game model in logistics service supply chain based on different combinations of risk attitude
Weihua Liu, Yijia Wang
International Journal of Production Economics, 161, pp. 181-191, 2015.
Brief Description: We consider the quality control game in a supply chain with an integrator and a provider, and study the impact of different combinations of the two players' risk attitudes on the mixed-strategy Nash equilibrium. Results show that the integrator prefers a risk-seeking provider, which leads to a lower supervision probability and a higher compliance probability.
Online
A multi-period order allocation model of two-echelon logistics service supply chain based on inequity aversion theory
Weihua Liu, Zhicheng Liang, Yang Liu, Yijia Wang, Qian Wang
International Journal of Shipping and Transport Logistics, 7(2), pp. 197-220, 2015.
Brief Description: We consider a multi-period order allocation problem in a supply chain with an integrator and multiple providers who care about inequity. The integrator has two goals: to maximise its profit and to maximise the comprehensive order allocation utility (COAU) of the providers. The results show that the profit of the integrator and the COAU of the providers decrease as the inequity aversion coefficients increase, and that applying postponed improvement to the order allocation model yields better results than prompt improvement.
Online
The influence analysis of number of functional logistics service providers on quality supervision game in LSSC with compensation strategy
Weihua Liu, Yijia Wang, Zhicheng Liang, Xiaoyan Liu
Abstract and Applied Analysis, Vol. 2014, special issue (1), pp. 558-577, 2014.
Brief Description: We consider the quality control game in a supply chain with an integrator and multiple providers, and study the impact of competition among providers on the mixed-strategy Nash equilibrium. Results show that, under competition, the ordinary mixed payment contract cannot optimize all the quality supervision game parameters. We therefore incorporate different compensation mechanisms into the model, including fixed, linear, and nonlinear mechanisms, and identify the optimal ones.
Online
A time scheduling model of logistics service supply chain based on the customer order decoupling point: a perspective from the constant service operation time
Weihua Liu, Yi Yang, Haitao Xu, Xiaoyan Liu, Yijia Wang, Zhicheng Liang
The Scientific World Journal, 2014.
Brief Description: We study time scheduling in a supply chain, taking the customer order decoupling point into account, and show that the gap between the order completion time and the scheduling time should be limited. We also show that the gain in comprehensive supply chain performance from increasing the integrator's relationship coefficient is limited.
Online