DEEP REINFORCEMENT LEARNING BASED OPTIMAL X2026