MDP Modeling in GAMS


Hi there,

I'm developing an MDP model in GAMS and can't work out what is causing these errors:

1. First, I keep getting "Error 203: Too few arguments for function". The function is the standard Bellman value update V(s) = max_a (R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')).

2. At the final display of the optimal policy, I keep getting "Error 767: Unexpected symbol will terminate the loop - symbol replaced by )" around the smax, and "Error 409: Unrecognisable item". The equation there is the Bellman action value Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s'). I have sketched how I read both equations in GAMS just below.
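
For context, here is how I read those two equations in GAMS terms. This is only a rough sketch with placeholder names (q, v, r, p, gamma, s, sp and a are not my actual identifiers; sp would just be an alias of s), so the syntax here may well be part of my problem:

* Bellman action value: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')
q(s,a) = r(s,a) + gamma*sum(sp, p(s,a,sp)*v(sp));
* Bellman state value: best action value, using smax over the set of actions rather than max over a list
v(s) = smax(a, q(s,a));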

sets
regions /r1*r11/
actions /A, B, C/
states /low, medium, high/
states_next /low, medium, high/;

parameters
reward(states,actions) /low.A -5, low.B 1, low.C 5, medium.A -5, medium.B 1, medium.C 5, high.A -5, high.B 1, high.C 5/
discount_rate /0.95/
value(states, regions)
value_new(states, regions)
epsilon /0.01/
max_iter /1000/
iter;

value(states, regions)=0;
value_new(states, regions)=0;
iter=0;

* Transition probability
* Probability of transitioning from one state to another when an action is taken
* Format: (region, current state, action, next state)
* Actions A=415V, B=33/11kV, C=330/132kV
Set transition_prob(regions, states, actions, states_next) /r1.high.A.low 0.54, r1.high.B.low 0.54, r1.high.C.low 0.54,
r2.medium.A.low 0.54, r2.medium.B.low 0.54, r2.medium.C.low 0.54,
r3.medium.A.low 0.54, r3.medium.B.low 0.54, r3.medium.C.low 0.54,
r4.medium.A.low 0.54, r4.medium.B.low 0.54, r4.medium.C.low 0.54,
r5.low.A.low 0.54, r5.low.B.low 0.54, r5.low.C.low 0.54,
r6.low.A.low 0.54, r6.low.B.low 0.54, r6.low.C.low 0.54,
r7.low.A.low 0.54, r7.low.B.low 0.54, r7.low.C.low 0.5,
r8.low.A.low 0.54, r8.low.B.low 0.54, r8.low.C.low 0.54,
r9.low.A.low 0.54, r9.low.B.low 0.54, r9.low.C.low 0.54,
r10.low.A.low 0.54, r10.low.B.low 0.54, r10.low.C.low 0.54,
r11.low.A.low 0.54, r11.low.B.low 0.54, r11.low.C.low 0.54/;

* Value iteration to convergence
while((iter = max_iter) or ((value_new - value) < epsilon),
*while(iter = max_iter,
    iter = iter + 1;
    value = value_new;

    loop(regions,
        loop(states,
            loop(actions,
                value_new(states, regions) = max(reward(states, actions) +
                    discount_rate * sum(transition_prob(regions, states, actions, states_next) * value_new(states, regions)),
                    value_new(states, regions))
            );
        );
    );
);


* Print the optimal policy
display "Optimal policy for each region:";

loop(regions,
    display regions;
    loop(states,
        display states, " action: ", actions(smax(reward(states, actions) +
            discount_rate *
            sum(transition_prob(regions, states, actions, states_next) * value(states_next, regions))
            , actions))

    );
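
For what it's worth, below is the overall structure I think I am aiming for. It is only a rough sketch and I am not confident it is valid GAMS: it assumes transition_prob is declared as a parameter rather than a set, that states_next is an alias of states (alias (states, states_next);) so value can be referenced with either index, and it introduces q_value, policy and diff, which are hypothetical symbols that are not in my code above:

parameters
    q_value(regions, states, actions)  'action values'
    policy(regions, states, actions)   'greedy policy indicator'
    diff                               'largest change per sweep';

diff = 1;
iter = 0;
value(states, regions) = 0;

while((iter < max_iter) and (diff > epsilon),
    iter = iter + 1;
*   Bellman action value for every (region, state, action)
    q_value(regions, states, actions) = reward(states, actions)
        + discount_rate*sum(states_next,
              transition_prob(regions, states, actions, states_next)
              *value(states_next, regions));
*   State value is the best action value
    value_new(states, regions) = smax(actions, q_value(regions, states, actions));
*   Convergence check on the largest absolute change
    diff = smax((states, regions), abs(value_new(states, regions) - value(states, regions)));
    value(states, regions) = value_new(states, regions);
);

*   Mark the action(s) attaining the maximum value in each region and state
policy(regions, states, actions)$(q_value(regions, states, actions) >= value(states, regions) - 1e-6) = 1;
display value, policy;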

Please kindly assist.
 