Hi,
I am following this tutorial on working with PDFs and managed folders:
https://knowledge.dataiku.com/latest/code/managed-folders/tutorial-managed-folders.html
But reading the PDF with tabula doesn't work. I get this error message:
UnsupportedOperation: seek
My managed folder is on S3. How can I read this file?
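The error suggests that the download stream does not support seek, which tabula appears to need. Can you try adding this at the beginning of the code:
import io
and then calling
tables = read_pdf(io.BytesIO(stream.read()), pages="12-26", multiple_tables=True)
instead of
tables = read_pdf(stream, pages="12-26", multiple_tables=True)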
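To illustrate why the workaround helps, here is a minimal sketch using only the standard library. NonSeekableStream and to_seekable are hypothetical names made up for this example; the class only imitates a download stream that cannot seek, and is not how Dataiku or S3 actually implement it. The commented lines at the end assume the folder name, file path and tutorial-style calls, so adapt them to your project.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a streamed download that cannot seek."""

    def __init__(self, payload: bytes):
        self._inner = io.BytesIO(payload)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        # Copy the next chunk of the payload into the caller's buffer.
        data = self._inner.read(len(b))
        b[:len(data)] = data
        return len(data)


def to_seekable(stream) -> io.BytesIO:
    """Read the whole stream into memory and return a seekable buffer."""
    return io.BytesIO(stream.read())


raw = NonSeekableStream(b"%PDF-1.4 demo")
try:
    raw.seek(0)  # the base class does not implement seek
except io.UnsupportedOperation as err:
    print("UnsupportedOperation:", err)

buf = to_seekable(NonSeekableStream(b"%PDF-1.4 demo"))
buf.seek(0)  # works: BytesIO lives entirely in memory

# In the recipe, the same idea would look roughly like this
# (folder name and path are placeholders, not taken from the thread):
# folder = dataiku.Folder("my_folder")
# with folder.get_download_stream("/report.pdf") as stream:
#     tables = read_pdf(to_seekable(stream), pages="12-26", multiple_tables=True)
```

Note that this loads the entire PDF into memory, which is fine for typical reports but may be a problem for very large files.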