I am currently trying to connect the Common Crawl S3 bucket to Dataiku.

I have tried different configurations. However, I am not sure what to enter as "Access Key" and "Secret Key". I assume it is not my private AWS credentials.
Does anyone have experience with that?

Thanks,

Matthew

3 Answers

Best answer

Hi,

Thanks for your patience. Somehow, I can't manage to connect to the commoncrawl bucket.
My most recent error is the following:

So I am really unsure whether the bucket can be accessed from Dataiku at all.

Even though the bucket is public, if your AWS key does not have your account's full permissions (i.e. if it belongs to a restricted IAM user), you need to grant explicit access to the commoncrawl bucket by attaching the following policy to your IAM user:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1503647467000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::commoncrawl/*",
                "arn:aws:s3:::commoncrawl"
            ]
        }
    ]
}
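If you need the same read-only grant for another public bucket, the policy can be generated rather than hand-edited. This is a minimal sketch using only the standard library; the function name `read_only_bucket_policy` is hypothetical, and the optional `Sid` from the answer is left out. It keeps both resource ARNs because `s3:ListBucket` applies to the bucket ARN itself, while `s3:GetObject` applies to the objects under `bucket/*`.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy allowing list + read access to a single S3 bucket.

    Hypothetical helper: mirrors the policy above, minus the optional Sid.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",  # objects: needed for s3:GetObject
                    f"arn:aws:s3:::{bucket}",    # bucket: needed for s3:ListBucket
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy JSON, ready to paste into the IAM console.
    print(json.dumps(read_only_bucket_policy("commoncrawl"), indent=4))
```

If you manage IAM from Python rather than the console, the resulting string could be attached as an inline user policy with boto3's `iam.put_user_policy(UserName=..., PolicyName=..., PolicyDocument=json.dumps(policy))`; the user and policy names are yours to choose.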
That works, thanks a lot!
Hi,

Credential-less access to S3 is not supported. However, since the "commoncrawl" bucket is public, using your private AWS credentials will work.

Hi,

this is my current setup:

However, when adding an S3 dataset, I get the following error:

"Could not list buckets: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: BFBCCF653E7B199D)"
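This error means S3 recomputed the request signature from the secret key it has on file and got a different value from the one sent. A common cause is a secret key that was mistyped, pasted with surrounding whitespace, or swapped with the access key. The sketch below assumes the client signs with AWS Signature Version 4 and derives the SigV4 signing key with the standard library, to show that any change to the secret key's bytes yields a different key (and therefore a different signature). The function names are hypothetical, the example secret is the placeholder AWS uses in its documentation, and this is not Dataiku's own implementation.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key; `date` is formatted as YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def check_pasted_key(value: str) -> list[str]:
    """Flag copy-paste problems that silently change the signing key."""
    problems = []
    if value != value.strip():
        problems.append("leading/trailing whitespace")
    return problems


if __name__ == "__main__":
    secret = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"  # AWS docs placeholder
    clean = sigv4_signing_key(secret, "20150830", "us-east-1")
    pasted = sigv4_signing_key(secret + " ", "20150830", "us-east-1")
    print(clean == pasted)               # False: one trailing space changes the key
    print(check_pasted_key(secret + " "))
```

In practice, re-copying both keys from the IAM console (or generating a fresh key pair) and checking that the access key and secret key fields are not swapped is usually the quickest fix.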

