For the connected components computation, replace the manual Parquet checkpointing with DataFrame.checkpoint().
Simplify the code by leveraging the built-in DataFrame.checkpoint() API, which is available in Spark ≥ 2.3.
Reduce potential correctness issues and maintenance burden, especially those related to S3 eventual consistency, manual path management, and explicit reloads.
Align with Spark best practices, where checkpointing can be triggered eagerly or lazily, depending on workflow needs.
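As a rough illustration of the eager and lazy modes mentioned above, here is a minimal sketch. It is not the project's actual code: the object name, the `edges` DataFrame, and the checkpoint directory `/tmp/cc-checkpoints` are hypothetical placeholders, and a local Spark session is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpoint-sketch")
      .master("local[*]")
      .getOrCreate()

    // A checkpoint directory must be set before checkpoint() is called.
    // The path below is a hypothetical placeholder.
    spark.sparkContext.setCheckpointDir("/tmp/cc-checkpoints")

    import spark.implicits._
    // Hypothetical stand-in for an intermediate DataFrame in the algorithm.
    val edges: DataFrame = Seq((1L, 2L), (2L, 3L), (4L, 5L)).toDF("src", "dst")

    // Eager (the default): the data is written out immediately and the
    // returned DataFrame has a truncated lineage, with no manual reload.
    val eagerCp: DataFrame = edges.checkpoint()

    // Lazy: the checkpoint is written when the first action runs.
    val lazyCp: DataFrame = edges.checkpoint(eager = false)
    lazyCp.count() // this action triggers the actual checkpoint write

    spark.stop()
  }
}
```

With this approach Spark manages the checkpoint files under the configured directory, so the caller does not need to build output paths or read the data back explicitly.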