From this great tutorial on building webapps (http://learn.dataiku.com/howto/code/webapps/use-python-backend.html), I can see that I can use a Python backend to handle larger volumes of data.

Does this extend to PySpark?
It's a good question! We're currently testing this functionality and should have an answer soon.
