0 votes

Hi, this is something for which I would normally make a pull request, but you don't have a public API, so I thought it best to create a bug report here instead.

Problem:

If one accidentally (or programmatically) passes an object that isn't a Spark DataFrame to the `write_with_schema` function in `dataiku.spark`, the underlying code tries to access the Spark context on that assumed DataFrame and crashes with an internal Dataiku error:

 

[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils]  -   File "/opt/dataiku-dss-5.1.2/python/dataiku/spark/__init__.py", line 139, in write_with_schema
[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils]  -     write_schema_from_dataframe(dataset, dataframe)
[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils]  -   File "/opt/dataiku-dss-5.1.2/python/dataiku/spark/__init__.py", line 122, in write_schema_from_dataframe
[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils]  -     dsc = __dataikuSparkContext(dataframe._sc._jvm)
[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils]  - AttributeError: 'Dataset' object has no attribute '_sc'

This can easily happen if a function returns `None` and that result gets passed to the writer instead of a DataFrame, resulting in the same kind of AttributeError.
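For illustration, here is a hedged sketch of how that can slip in. The helper name `build_dataframe` and the dataset name are hypothetical; only `dataiku.Dataset` and `dataiku.spark.write_with_schema` are the actual API calls mentioned above:

```python
# Hypothetical reproduction sketch; build_dataframe and "my_output_dataset"
# are illustrative names, not part of DSS.
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

def build_dataframe(ctx):
    # Bug: the DataFrame is built but never returned, so the caller gets None.
    ctx.createDataFrame([(1, "a")], ["id", "value"])

df = build_dataframe(sqlContext)               # df is None here
output = dataiku.Dataset("my_output_dataset")
dkuspark.write_with_schema(output, df)         # AttributeError when accessing df._sc
```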

 

Solution:

A single line asserting that the `dataframe` object is a Spark DataFrame could be added just before `dataiku/spark/__init__.py` line 122, where the underlying Spark context is accessed. Raising a `TypeError` would offer a little more help to the user than the current stack trace.
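For illustration only, a minimal sketch of what such a guard might look like; the helper name is mine, not Dataiku's, and only `pyspark.sql.DataFrame` is standard:

```python
# Minimal sketch of the proposed type check, not the actual Dataiku implementation.
from pyspark.sql import DataFrame


def _ensure_spark_dataframe(dataframe):
    """Fail fast with a clear TypeError instead of an AttributeError on `_sc`."""
    if not isinstance(dataframe, DataFrame):
        raise TypeError(
            "write_with_schema expects a pyspark.sql.DataFrame, got %s"
            % type(dataframe).__name__
        )
```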


1 Answer

0 votes
Thank you very much for this report and for investigating a solution. I'll pass this information on to the development team.