Description
TL;DR
I ran across a bug in Spark 3.5 and GraphFrames 0.8.3: when using connected components, if `spark.sql.adaptive.enabled` is not set to false, the results returned are incorrect, with many edges seemingly ignored when finding paths.
I've been working on a project that is unfortunately in an air-gapped environment, so I can't share code, but I'll try to provide whatever information I can.
We have been using GraphFrames connected components for a few years, and we recently migrated to Spark 3.5.0 and GraphFrames 0.8.3.
We also migrated from a YARN cluster with HDFS to a k8s cluster with MinIO object storage.
For us, both connected components and BFS are returning results that look like they are not using edges that are present in the edge set.
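One cheap way to confirm this symptom on a small sample is to check the basic invariant of connected components: both endpoints of every edge must end up in the same component. The sketch below is plain Python, not our actual code, and the names `edges_crossing_components`, `edges`, and `component_of` are made up for illustration. It assumes the edge list and the vertex-to-component mapping have already been collected to the driver. It only catches components that were wrongly split, not ones that were wrongly merged.

```python
def edges_crossing_components(edges, component_of):
    """Return the edges whose two endpoints were assigned different components.

    edges: iterable of (src, dst) pairs (hypothetical sample data).
    component_of: dict mapping vertex id -> component id.
    A correct connected-components result yields an empty list.
    """
    return [(s, d) for s, d in edges if component_of[s] != component_of[d]]


# Toy data: a-b-c form one component, d-e another.
edges = [("a", "b"), ("b", "c"), ("d", "e")]
consistent = {"a": 0, "b": 0, "c": 0, "d": 3, "e": 3}
split = {"a": 0, "b": 0, "c": 2, "d": 3, "e": 3}  # edge (b, c) was "ignored"

print(edges_crossing_components(edges, consistent))
print(edges_crossing_components(edges, split))
```

Any edge reported by this check is one that the result treated as if it were not in the edge set.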
If we set `spark.sql.adaptive.enabled = false`, then the results appear to be calculated as we would have expected.
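For anyone who wants to try the same workaround, here is a rough PySpark sketch of what it looks like. This is not our code (which I can't share); the function name `connected_components_without_aqe`, the `AQE_OFF` dict, and the `checkpoint_dir` argument are placeholders. It assumes `vertices` has an `id` column and `edges` has `src` and `dst` columns, as GraphFrames expects, and that the graphframes package is available on the cluster.

```python
# Setting that made the results look correct for us.
AQE_OFF = {"spark.sql.adaptive.enabled": "false"}


def connected_components_without_aqe(vertices, edges, checkpoint_dir):
    """Run GraphFrames connected components with adaptive query execution off.

    vertices / edges are Spark DataFrames; checkpoint_dir is a placeholder
    path (connectedComponents needs a checkpoint directory by default).
    """
    # Imported lazily so the module loads even where Spark is not installed.
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.getOrCreate()
    for key, value in AQE_OFF.items():
        spark.conf.set(key, value)  # disable AQE for this session
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    return GraphFrame(vertices, edges).connectedComponents()
```

The same setting can also be passed at submit time with `--conf spark.sql.adaptive.enabled=false` instead of being set in code.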
I mostly wanted to put this here in case anyone else is pulling their hair out over this one. If anyone has ideas about what could be causing it, it would be good to fix; silent errors like this can be really nasty.