Abstract
We develop a framework for obtaining polynomial time approximation schemes (PTAS) for a class of stochastic dynamic programs. Using our framework, we obtain the first PTAS for the following stochastic combinatorial optimization problems:
1) Probemax [Munagala, 2016]: We are given a set of n items, each item i in [n] has a value X_i which is an independent random variable with a known (discrete) distribution pi_i. We can probe a subset P subseteq [n] of items sequentially. Each time after {probing} an item i, we observe its value realization, which follows the distribution pi_i. We can adaptively probe at most m items and each item can be probed at most once. The reward is the maximum among the m realized values. Our goal is to design an adaptive probing policy such that the expected value of the reward is maximized. To the best of our knowledge, the best known approximation ratio is 11/e, due to Asadpour et al. [Asadpour and Nazerzadeh, 2015]. We also obtain PTAS for some generalizations and variants of the problem.
2) Committed Pandora's Box [Weitzman, 1979; Singla, 2018]: We are given a set of n boxes. For each box i in [n], the cost c_i is deterministic and the value X_i is an independent random variable with a known (discrete) distribution pi_i. Opening a box i incurs a cost of c_i. We can adaptively choose to open the boxes (and observe their values) or stop. We want to maximize the expectation of the realized value of the last opened box minus the total opening cost.
3) Stochastic Target [{I}lhan et al., 2011]: Given a predetermined target T and n items, we can adaptively insert the items into a knapsack and insert at most m items. Each item i has a value X_i which is an independent random variable with a known (discrete) distribution. Our goal is to design an adaptive policy such that the probability of the total values of all items inserted being larger than or equal to T is maximized. We provide the first bicriteria PTAS for the problem.
4) Stochastic Blackjack Knapsack [Levin and Vainer, 2014]: We are given a knapsack of capacity C and probability distributions of n independent random variables X_i. Each item i in [n] has a size X_i and a profit p_i. We can adaptively insert the items into a knapsack, as long as the capacity constraint is not violated. We want to maximize the expected total profit of all inserted items. If the capacity constraint is violated, we lose all the profit. We provide the first bicriteria PTAS for the problem.
BibTeX  Entry
@InProceedings{fu_et_al:LIPIcs:2018:9060,
author = {Hao Fu and Jian Li and Pan Xu},
title = {{A PTAS for a Class of Stochastic Dynamic Programs}},
booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
pages = {56:156:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770767},
ISSN = {18688969},
year = {2018},
volume = {107},
editor = {Ioannis Chatzigiannakis and Christos Kaklamanis and D{\'a}niel Marx and Donald Sannella},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2018/9060},
URN = {urn:nbn:de:0030drops90609},
doi = {10.4230/LIPIcs.ICALP.2018.56},
annote = {Keywords: stochastic optimization, dynamic program, markov decision process, block policy, approximation algorithm}
}
Keywords: 

stochastic optimization, dynamic program, markov decision process, block policy, approximation algorithm 
Collection: 

45th International Colloquium on Automata, Languages, and Programming (ICALP 2018) 
Issue Date: 

2018 
Date of publication: 

04.07.2018 