SrcLog.com

tongjingqi

👤 Developer

2 repositories on SrcLog

2 Repos

449 Stars

10 Forks

449 Watchers

Repositories (2)

tongjingqi/AI-Can-Learn-Scientific-Taste

We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference modeling and alignment problem.

389 stars, 10 forks

tongjingqi/MathTrap (Python)

In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset, MATHTRAP, by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K.

60 stars, 0 forks