Text-based diffusion models have made significant breakthroughs in generating high-quality images and videos from textual descriptions. However, the lengthy sampling time of the denoising process remains a significant bottleneck in practical applications. Previous methods either ignore the statistical relationships between adjacent steps or rely on attention or feature similarity between them, which often works only with specific network structures. To address this issue, we propose a novel training-free method that both improves the efficiency of diffusion models and minimizes the additional resources required.