Hello all,
I am working on a project where I need to access images and other files from an S3 folder. The folder is in my flow, paired with a Python recipe that performs the computation.
Ideally, I would like to access these files through a directory path, the same way I would in a project on my local machine.
Any help is much appreciated!
I am not sure what you are asking here. You can create a Dataiku managed folder in an S3 bucket and then access that managed folder from Python. Is that what you want?
Apologies for the vagueness. I already have a Dataiku managed folder set up within the S3 bucket, and I currently have a Python recipe attached to that folder in the flow. My current roadblock is a package that requires, as a parameter, the path of a file within the folder.
I printed the current working directory, which is:
/data/dataiku/dss_data/jupyter-run/dku-workdirs/[PROJ_NAME]/notebook_editor_for_[FORMULA_NAME]/ipythondir/profile_default/db
The directory of the S3 bucket within AWS is:
AmazonS3/Buckets/[dept.]/dataiku/[PROJ_NAME]/[*folder*]
I'm just confused regarding the file structure of Dataiku, and how to access this folder.
Hope that cleared things up, thanks!
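A managed folder stored on S3 is not mounted on the DSS server's local filesystem, so you cannot reach it through the working directory. To interact with a Dataiku managed folder, you need to use the Dataiku API. Because this code may also run outside of the DSS server, you should use the external API (the dataikuapi package). Here is some sample code: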
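from datetime import datetime
import dataikuapi

# Connection details for the DSS instance (replace with your own values)
host = "http://localhost:11200"
apiKey = "some_key"
client = dataikuapi.DSSClient(host, apiKey)

# Handle on the managed folder, looked up by its ID within the project
project = client.get_project('MY_PROJECT')
folder = project.get_managed_folder("my_folder_id")

# Print the size, modification time, and path of every file in the folder
for content in folder.list_contents()['items']:
    # lastModified is in milliseconds, so convert it to seconds first
    last_modified_seconds = content["lastModified"] / 1000
    last_modified_str = datetime.fromtimestamp(last_modified_seconds).strftime("%Y-%m-%d %H:%M:%S")
    print("size=%s mtime=%s %s" % (content["size"], last_modified_str, content["path"]))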
Full API method list here: https://developer.dataiku.com/latest/api-reference/python/managed-folders.html#dataikuapi.dss.manage...
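If the package really needs a file path, one option is to copy the file out of the managed folder to local disk first and pass that local path along. Below is a minimal sketch of that idea, not a definitive implementation. It assumes that get_file() on the folder handle returns a requests-style streaming response (worth checking against the API reference linked above). The helper name download_to_temp and the example path are hypothetical and not part of the Dataiku API.

```python
import os
import tempfile


def download_to_temp(folder, remote_path, tmp_dir=None):
    """Copy one managed-folder file to local disk and return the local path.

    Assumption: `folder` is a managed-folder handle whose get_file(path)
    returns a requests-style response exposing iter_content().
    """
    # Use the caller's directory if given, otherwise create a fresh temp dir
    tmp_dir = tmp_dir or tempfile.mkdtemp(prefix="dss_folder_")
    local_path = os.path.join(tmp_dir, os.path.basename(remote_path))

    # Stream the remote file to disk in 1 MiB chunks
    response = folder.get_file(remote_path)
    with open(local_path, "wb") as out:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            out.write(chunk)
    return local_path


# Hypothetical usage, reusing `folder` from the sample code above:
# local_path = download_to_temp(folder, "/images/example.png")
# ...then pass local_path to the package that expects a file path.
```

The paths you pass in are the ones reported under "path" by list_contents(). The files land in a temporary directory, so remove them once you are done if the folder is large.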