MDP Modeling in GAMS

Hi there,

I'm developing an MDP (Markov decision process) model in GAMS and can't understand the errors I'm getting:

1. I keep getting "Error 203: Too few arguments for function". The function is the standard Bellman update V(s) = max_a (R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')).

2. At the final display of the optimal policy, I get "Error 767: Unexpected symbol will terminate the loop - symbol replaced by )" around the smax, and "Error 409: Unrecognisable item". The equation is the Bellman action-value update Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s').

sets
   regions     / r1*r11 /
   actions     / A, B, C /
   states      / low, medium, high /
   states_next / low, medium, high /;

parameters
   reward(states, actions) / low.A -5,    low.B 1,    low.C 5,
                             medium.A -5, medium.B 1, medium.C 5,
                             high.A -5,   high.B 1,   high.C 5 /
   discount_rate / 0.95 /
   value(states, regions)
   value_new(states, regions)
   epsilon / 0.01 /
   max_iter / 1000 /
   iter;

value(states, regions) = 0;
value_new(states, regions) = 0;
iter = 0;

* Transition probability
* Probability of transitioning from one state to another when an action is taken
* Format: (region, current state, action, next state)
* Actions A=415V, B=33/11kV, C=330/132kV
Set transition_prob(regions, states, actions, states_next) /
   r1.high.A.low 0.54,   r1.high.B.low 0.54,   r1.high.C.low 0.54,
   r2.medium.A.low 0.54, r2.medium.B.low 0.54, r2.medium.C.low 0.54,
   r3.medium.A.low 0.54, r3.medium.B.low 0.54, r3.medium.C.low 0.54,
   r4.medium.A.low 0.54, r4.medium.B.low 0.54, r4.medium.C.low 0.54,
   r5.low.A.low 0.54,    r5.low.B.low 0.54,    r5.low.C.low 0.54,
   r6.low.A.low 0.54,    r6.low.B.low 0.54,    r6.low.C.low 0.54,
   r7.low.A.low 0.54,    r7.low.B.low 0.54,    r7.low.C.low 0.5,
   r8.low.A.low 0.54,    r8.low.B.low 0.54,    r8.low.C.low 0.54,
   r9.low.A.low 0.54,    r9.low.B.low 0.54,    r9.low.C.low 0.54,
   r10.low.A.low 0.54,   r10.low.B.low 0.54,   r10.low.C.low 0.54,
   r11.low.A.low 0.54,   r11.low.B.low 0.54,   r11.low.C.low 0.54 /;

* Value iteration to convergence
while((iter = max_iter) or ((value_new - value) < epsilon),
*while(iter = max_iter,
   iter = iter + 1;
   value = value_new;

   loop(regions,
      loop(states,
         loop(actions,
            value_new(states, regions) = max(reward(states, actions) +
               discount_rate * sum(transition_prob(regions, states, actions, states_next) * value_new(states, regions)),
               value_new(states, regions))
         );
      );
   );
);


* Print the optimal policy
display "Optimal policy for each region:";

loop(regions,
   display regions;
   loop(states,
      display states, " action: ", actions(smax(reward(states, actions) +
         discount_rate *
         sum(transition_prob(regions, states, actions, states_next) * value(states_next, regions))
         , actions))
   );

Please kindly assist.
 
A few things stand out:

1. In GAMS, sum and smax both take the controlling index set as their first argument, which is what triggers "too few arguments". The sum needs states_next up front, and the maximisation over actions is written with the set first:

value_new(states, regions) = smax(actions, reward(states, actions) + discount_rate * sum(states_next, transition_prob(regions, states, actions, states_next) * value(states_next, regions)));

2. smax returns the maximum value, not the action that attains it, so action = smax(..., actions) cannot recover the policy; you have to compare each action's value against the maximum over actions.

3. transition_prob holds numeric probabilities, so it must be declared as a Parameter, not a Set; a set only carries yes/no membership, and the 0.54 values never reach the arithmetic.

4. states_next is declared as a separate set, so value(states_next, regions) is a domain violation; declare it as an alias of states instead: Alias (states, states_next);
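
Putting those pieces together, here is a minimal self-contained sketch of the whole model. It is only a sketch under a few assumptions: each region evolves independently, you load the full transition data so that every (region, state, action) row sums to 1 (the 0.26 and 0.20 entries below are placeholders, only the 0.54 comes from your post), and the names tprob, q, value_new, ap and policy are mine, not anything GAMS requires:

Sets
   regions / r1*r11 /
   states  / low, medium, high /
   actions / A, B, C /;
Alias (states, states_next), (actions, ap);

Parameter reward(states, actions) / low.A -5,    low.B 1,    low.C 5,
                                    medium.A -5, medium.B 1, medium.C 5,
                                    high.A -5,   high.B 1,   high.C 5 /;

* transition probabilities belong in a Parameter, not a Set
Parameter tprob(regions, states, actions, states_next);
* one example row; 0.26 and 0.20 are placeholders so the row sums to 1 --
* load your full transition data here
tprob('r1','high','A','low')    = 0.54;
tprob('r1','high','A','medium') = 0.26;
tprob('r1','high','A','high')   = 0.20;

Scalar discount_rate / 0.95 /
       epsilon       / 0.01 /
       max_iter      / 1000 /
       iter          / 0    /
       delta         / 1    /;

Parameter value(states, regions), value_new(states, regions), q(regions, states, actions);
value(states, regions) = 0;

* value iteration: sweep until the largest change falls below epsilon
while(iter < max_iter and delta > epsilon,
   iter = iter + 1;
*  Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')
   q(regions, states, actions) = reward(states, actions)
      + discount_rate * sum(states_next,
           tprob(regions, states, actions, states_next) * value(states_next, regions));
   value_new(states, regions) = smax(actions, q(regions, states, actions));
   delta = smax((states, regions), abs(value_new(states, regions) - value(states, regions)));
   value(states, regions) = value_new(states, regions);
);

* greedy policy: flag every action whose Q value attains the maximum
* (the tolerance guards against floating-point noise in the comparison)
Set policy(regions, states, actions);
policy(regions, states, actions)$(q(regions, states, actions) >= smax(ap, q(regions, states, ap)) - 1e-9) = yes;

display value, policy;

Note that reward as posted does not depend on regions, so every region will end up with the same policy unless the transition data differs by region.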
 
