We use policy gradient to help agents learn optimal policies in a competitive multi-agent contextual bandit setting
Project: CSKrishna/Optimal-bidding-policy-using-Policy-Gradient-in-a-Multi-agent-Contextual-Bandit-setting (GitHub). Written in Jupyter Notebook.