This project implements a risk-sensitive reinforcement learning approach to modeling human decision-making in the Iowa Gambling Task (IGT). The implementation closely follows the original experimental parameters from Bechara et al. (1994) and incorporates prospect theory and Conditional Value at Risk (CVaR) to capture human-like risk sensitivity.
Task parameters, based on Bechara et al. (1994):
- Number of trials: 200 (two phases of 100 trials each)
- Deck configurations (encoded in the payoff sketch after this list):
- Deck A (High risk, high punishment):
- Reward: +100 per selection
- Punishment: -150 to -350 (frequency: 50%)
- Net expected value: -25 per card
- Deck B (High risk, infrequent punishment):
- Reward: +100 per selection
- Punishment: -1250 (frequency: 10%)
- Net expected value: -25 per card
- Deck C (Low risk, low reward):
- Reward: +50 per selection
- Punishment: -50 (frequency: 50%)
- Net expected value: +25 per card
- Deck D (Low risk, infrequent punishment):
- Reward: +50 per selection
- Punishment: -250 (frequency: 10%)
- Net expected value: +25 per card
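A minimal sketch of how these deck payoffs could be encoded and sampled; the dictionary layout, field names, and `draw_card` helper are illustrative rather than the actual `src/igt_env.py` structure:

```python
import numpy as np

# Illustrative deck table matching the parameters above; not the actual src/igt_env.py layout.
DECKS = {
    "A": {"reward": 100, "loss_range": (-350, -150),   "loss_prob": 0.50},  # net EV ~ -25/card
    "B": {"reward": 100, "loss_range": (-1250, -1250), "loss_prob": 0.10},  # net EV ~ -25/card
    "C": {"reward": 50,  "loss_range": (-50, -50),     "loss_prob": 0.50},  # net EV ~ +25/card
    "D": {"reward": 50,  "loss_range": (-250, -250),   "loss_prob": 0.10},  # net EV ~ +25/card
}

def draw_card(deck_key, rng):
    """Sample one card: fixed reward plus a probabilistic punishment."""
    deck = DECKS[deck_key]
    loss = rng.uniform(*deck["loss_range"]) if rng.random() < deck["loss_prob"] else 0.0
    return deck["reward"] + loss

# Example: rng = np.random.default_rng(0); draw_card("A", rng)
```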
Prospect Theory Parameters (based on Tversky & Kahneman, 1992; value-function sketch after this list):
- α (value function curvature for gains): 0.88
- β (value function curvature for losses): 0.88
- λ (loss aversion coefficient): 2.25
- Reference point: Dynamic, updated based on running average
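These parameters plug into the Tversky & Kahneman (1992) value function. A minimal sketch, in which the running-average reference update and its step size are assumptions about this implementation:

```python
ALPHA, BETA, LAMBDA = 0.88, 0.88, 2.25  # gain curvature, loss curvature, loss aversion

def prospect_value(outcome, reference):
    """Tversky & Kahneman (1992) value function applied to the gain/loss
    relative to the current reference point."""
    x = outcome - reference
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * ((-x) ** BETA)

def update_reference(reference, outcome, step=0.1):
    """Dynamic reference point: exponential running average of observed outcomes
    (the 0.1 step size is illustrative)."""
    return reference + step * (outcome - reference)
```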
Conditional Value at Risk (CVaR) Parameters (estimation sketch after this list):
- α (confidence level): 0.05
- λ_risk (risk sensitivity): 0.7
- Window size: 20 trials
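A sketch of how the empirical CVaR at α = 0.05 could be estimated over the 20-trial window and blended with the mean return via λ_risk; the blending formula is an assumption about this implementation (see Rockafellar & Uryasev, 2000, for the measure itself):

```python
import numpy as np

CVAR_ALPHA = 0.05   # confidence level: average the worst 5% of outcomes
LAMBDA_RISK = 0.7   # weight on the risk term
WINDOW = 20         # number of recent trials considered

def cvar(returns, alpha=CVAR_ALPHA):
    """Empirical CVaR: mean of the worst alpha-fraction of the given returns."""
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()

def risk_adjusted_value(recent_returns):
    """Blend the mean return with its CVaR; larger LAMBDA_RISK means more risk aversion."""
    window = np.asarray(recent_returns[-WINDOW:], dtype=float)
    return (1 - LAMBDA_RISK) * window.mean() + LAMBDA_RISK * cvar(window)
```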
Learning Parameters (update-loop sketch after this list):
- Learning rate (α): 0.1
- Discount factor (γ): 0.95
- Exploration rate (ε): Linear decay from 1.0 to 0.1
- Batch size: 32
- Memory buffer size: 10000
- Target network update frequency: 100 steps
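A condensed sketch of how these hyperparameters typically fit together in a DQN update step; the variable names and Huber loss here are assumptions, not taken from `src/train.py`:

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F

GAMMA, BATCH_SIZE = 0.95, 32
BUFFER_SIZE, TARGET_UPDATE_EVERY = 10_000, 100

replay_buffer = deque(maxlen=BUFFER_SIZE)  # holds (state, action, reward, next_state, done)

def train_step(policy_net, target_net, optimizer, step):
    """One minibatch TD update; the target network is synced every 100 steps."""
    if len(replay_buffer) < BATCH_SIZE:
        return
    batch = random.sample(replay_buffer, BATCH_SIZE)
    s, a, r, s2, done = (np.asarray(x) for x in zip(*batch))
    s = torch.as_tensor(s, dtype=torch.float32)
    s2 = torch.as_tensor(s2, dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.as_tensor(r, dtype=torch.float32)
    done = torch.as_tensor(done, dtype=torch.float32)

    q = policy_net(s).gather(1, a).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * target_net(s2).max(dim=1).values
    loss = F.smooth_l1_loss(q, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(policy_net.state_dict())
```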
- Custom IGT environment following the Gymnasium (OpenAI Gym) API (skeleton sketch after this list)
- State space: [last_reward, running_average, deck_frequencies]
- Action space: Discrete(4) representing decks A-D
- Reward structure matching Bechara et al. (1994)
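A minimal environment skeleton illustrating the state and action spaces above under the Gymnasium API; it reuses the hypothetical `draw_card` helper from the deck sketch and is not the actual `src/igt_env.py`:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class IGTEnvSketch(gym.Env):
    """Illustrative IGT environment: 4 decks; observation is
    [last_reward, running_average, four deck-selection frequencies]."""

    def __init__(self, n_trials=200):
        super().__init__()
        self.n_trials = n_trials
        self.action_space = spaces.Discrete(4)  # decks A-D
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(6,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.last_reward, self.running_avg = 0, 0.0, 0.0
        self.deck_counts = np.zeros(4)
        return self._obs(), {}

    def step(self, action):
        reward = draw_card("ABCD"[action], self.np_random)  # hypothetical helper from the deck sketch
        self.t += 1
        self.deck_counts[action] += 1
        self.last_reward = reward
        self.running_avg += (reward - self.running_avg) / self.t
        terminated = self.t >= self.n_trials
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        freqs = self.deck_counts / max(1, self.t)
        return np.array([self.last_reward, self.running_avg, *freqs], dtype=np.float32)
```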
Baseline Model (network sketch after this list):
- Standard DQN with 3-layer neural network
- Layer sizes: [64, 128, 64]
- ReLU activation
- Adam optimizer (lr=0.001)
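A sketch of a Q-network with the listed layer sizes and optimizer; the factory-function name and the 6-dimensional state input are assumptions:

```python
import torch.nn as nn
import torch.optim as optim

def make_baseline_qnet(state_dim=6, n_actions=4, lr=1e-3):
    """Maps a state vector to Q-values for the four decks using
    hidden layers of sizes [64, 128, 64] with ReLU activations."""
    net = nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(),
        nn.Linear(64, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )
    optimizer = optim.Adam(net.parameters(), lr=lr)
    return net, optimizer
```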
Risk-Sensitive Model (target-computation sketch after this list):
- Modified DQN incorporating prospect theory value function
- CVaR risk measure in Q-value computation
- Same architecture as baseline
- Additional risk-processing layers
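One plausible way these pieces combine in the Q-value computation, with the reward passed through the prospect-theory value function and the bootstrapped return penalized by the CVaR term; the exact composition below is an assumption about this implementation, reusing the hypothetical helpers sketched earlier:

```python
import torch

def risk_sensitive_td_target(reward, next_state, done, target_net, reference, recent_returns,
                             gamma=0.95, lambda_risk=0.7):
    """Illustrative risk-sensitive TD target: prospect-theory utility of the
    immediate reward plus a discounted, CVaR-penalized bootstrap value."""
    utility = prospect_value(reward, reference)                   # prospect-theory sketch above
    risk_penalty = lambda_risk * max(0.0, -cvar(recent_returns))  # CVaR sketch above
    with torch.no_grad():
        bootstrap = target_net(next_state).max(dim=-1).values
    return utility + gamma * (1.0 - done) * bootstrap - risk_penalty
```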
Phase 1 (Exploration): Episodes 1-100
- Higher exploration rate (ε: 1.0 → 0.3)
- Focus on learning deck characteristics
- More weight on immediate rewards
Phase 2 (Exploitation): Episodes 101-200
- Lower exploration rate (ε: 0.3 → 0.1)
- Increased risk sensitivity
- More weight on long-term value
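A sketch of a two-phase linear ε schedule matching the numbers above; the piecewise form is an assumption:

```python
def epsilon(episode):
    """Linear decay: 1.0 -> 0.3 over episodes 1-100, then 0.3 -> 0.1 over 101-200."""
    if episode <= 100:
        return 1.0 - (1.0 - 0.3) * (episode - 1) / 99
    return 0.3 - (0.3 - 0.1) * (min(episode, 200) - 101) / 99
```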
Expected human behavior, based on Bechara et al. (1994):
- Initial exploration period: ~30 trials
- Gradual shift to advantageous decks
- Final deck preferences:
- Decks A/B: ~15% each
- Decks C/D: ~35% each
Baseline Model:
- Faster initial learning
- Higher mean rewards
- Less risk-sensitive behavior
Risk-Sensitive Model:
- Slower initial learning
- More human-like deck preferences
- Better matches human risk aversion patterns
Final deck preferences (Risk-Sensitive Model vs Human Data):
- Deck A: 13.7% vs 15.4%
- Deck B: 13.0% vs 20.2%
- Deck C: 37.2% vs 34.2%
- Deck D: 36.1% vs 30.1%
- Risk aversion increases over time
- Strong correlation between losses and subsequent risk-averse choices
- CVaR effectively captures human-like loss aversion
- `src/igt_env.py`: IGT environment implementation
- `src/train.py`: Training loop and model definitions
- `src/visualization.py`: Results visualization
- `src/process_results.py`: Data processing and analysis
See `requirements.txt` for the full list. Key packages:
- PyTorch 1.9.0+
- Gymnasium 0.26.0+
- NumPy 1.21.0+
- Pandas 1.5.0+
- Plotly 5.3.0+
- Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1-3), 7-15.
- Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297-323.
- Rockafellar, R. T., & Uryasev, S. (2000). Optimization of conditional value-at-risk. Journal of Risk, 2, 21-42.
- Bechara, A., Damasio, H., Tranel, D., & Damasio, A. R. (1997). Deciding advantageously before knowing the advantageous strategy. Science, 275(5304), 1293-1295.
- Worthy, D. A., Pang, B., & Byrne, K. A. (2013). Decomposing the roles of perseveration and expected value representation in models of the Iowa gambling task. Frontiers in Psychology, 4, 640.
- Implement additional risk measures (e.g., entropy, variance)
- Explore different neural architectures
- Add real-time visualization during training
- Incorporate physiological measures from human studies
- Extend to other decision-making tasks
MIT License - See LICENSE file for details