A high-performance differentiable cloth simulator written from scratch in C++/CUDA, using a novel adjoint-based gradient computation scheme that efficiently handles contact-rich dynamics and enables scalable optimization, robotic control, and system identification.