RL Post-Training for LLMs: DPO, GRPO, and OPD Evolution Tutorial · Teamily