
Connected Components gives wrong results #453

@wisundstrom

Description


TL;DR
I ran across a bug in Spark 3.5 with GraphFrames 0.8.3: when using connected components, unless `spark.sql.adaptive.enabled` is set to `false`, the results are incorrect, and many edges seem to be ignored when finding paths.

I've been working on a project that is unfortunately in an air-gapped environment, so I can't share code, but I'll try to provide whatever information I can.

We have been using GraphFrames connected components for a few years, and we recently migrated to Spark 3.5.0 and GraphFrames 0.8.3.
We also migrated from a YARN cluster with HDFS to a k8s cluster with MinIO object storage.

For us, both connected components and BFS return results that appear to ignore edges that are present in the edge set.

If we set `spark.sql.adaptive.enabled` to `false`, the results come out as we would expect.
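One way to catch this kind of silent error is to check a basic invariant: in a correct connected-components result, the two endpoints of every edge must be assigned the same component. The sketch below is a hypothetical helper (`edges_split_across_components` is not part of GraphFrames) that takes a vertex-to-component mapping and an edge list and returns every edge that violates the invariant. It assumes the graph is small enough to collect to the driver. With GraphFrames, the mapping could be built from the `id` and `component` columns of `g.connectedComponents()`, and the edges from the `src` and `dst` columns of `g.edges`.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable, Mapping


def edges_split_across_components(
    components: Mapping[Hashable, Hashable],
    edges: Iterable[tuple[Hashable, Hashable]],
) -> list[tuple[Hashable, Hashable]]:
    """Return every edge whose endpoints were assigned different components.

    In a correct connected-components result this list is always empty,
    because an edge puts its two endpoints into the same component.
    """
    split = []
    for src, dst in edges:
        # A vertex missing from the result is also treated as a violation.
        if src not in components or dst not in components:
            split.append((src, dst))
        elif components[src] != components[dst]:
            split.append((src, dst))
    return split


# Toy data (hypothetical): the edge ("b", "c") should merge all three vertices
# into one component, but this result behaves as if that edge were ignored.
components = {"a": 1, "b": 1, "c": 2}
edges = [("a", "b"), ("b", "c")]
print(edges_split_across_components(components, edges))  # [('b', 'c')]
```

A non-empty result indicates that edges present in the edge set were not used. The check only detects components that were split apart. It does not detect vertices that were merged incorrectly.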

Mostly I wanted to put this here in case anyone else is pulling their hair out over this one. If anyone has ideas about what could be causing it, it would be good to fix; silent errors like this can be really nasty.
