Gemini handle the pdf file? #158

Open
helai78 opened this issue May 21, 2024 · 4 comments
Labels
component:quickstarts Issues/PR referencing quickstarts folder status:awaiting response Awaiting a response from the author type:help Support-related issues

Comments

@helai78

helai78 commented May 21, 2024

Description of the feature request:

https://ai.google.dev/gemini-api/docs/prompting_with_media?lang=python
Based on the link above, it seems that Gemini does not work with PDF files.
Is my understanding right?

What problem are you trying to solve with this feature?

No response

Any other information you'd like to share?

No response

@helai78 helai78 added component:examples Issues/PR referencing examples folder component:quickstarts Issues/PR referencing quickstarts folder type:feature request New feature request/enhancement labels May 21, 2024
@singhniraj08
Collaborator

@helai78, as shown in the documentation, the supported text formats are noted here. The Gemini API does not support PDF files yet, because the application/pdf MIME type is not supported. Alternatively, you can use AI Studio to work with PDF files using Gemini. Thank you!

@singhniraj08 singhniraj08 added status:awaiting response Awaiting a response from the author type:help Support-related issues and removed type:feature request New feature request/enhancement component:examples Issues/PR referencing examples folder labels May 22, 2024
@helai78
Author

helai78 commented May 22, 2024

Hello @singhniraj08, thank you for your clarification.

The AI Studio you mentioned is the Vertex AI Gemini API, which can handle PDF files. Vertex AI is part of Google Cloud, which means 90 days free for me. Is my understanding correct?

Could you tell me any alternatives for handling PDF files with Gemini 1.5 Pro?

Thanks in advance.

@anusonawane

Hello @helai78,
Currently, there's no direct support for uploading PDF files, but we can work around this by converting the PDF to images and extracting text separately.
https://github.com/google-gemini/cookbook/blob/main/quickstarts/PDF_Files.ipynb
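
A minimal sketch of that workaround, assuming Poppler's `pdftoppm` and `pdftotext` command-line tools are installed. The helper names (`pdftoppm_command`, `pdftotext_command`, `convert_pdf`) are hypothetical, and this is not necessarily how the linked notebook does it:

```python
import shutil
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, out_prefix, dpi=150):
    """Build the Poppler command that renders every PDF page to a PNG file."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def pdftotext_command(pdf_path, txt_path):
    """Build the Poppler command that extracts the PDF's embedded text layer."""
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def convert_pdf(pdf_path, out_dir):
    """Render page images and extract text into out_dir (hypothetical helper).

    Returns (sorted list of page image paths, extracted text).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for tool in ("pdftoppm", "pdftotext"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found; install Poppler first")
    txt_path = out_dir / "document.txt"
    # pdftoppm writes page-1.png, page-2.png, ... (zero-padded for long PDFs).
    subprocess.run(pdftoppm_command(pdf_path, out_dir / "page"), check=True)
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    images = sorted(out_dir.glob("page-*.png"))
    return images, txt_path.read_text(encoding="utf-8", errors="replace")
```

The page images and the extracted text can then be sent to the model as separate prompt parts. Note that `pdftotext` only reads an embedded text layer, so scanned PDFs would still need OCR.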

@helai78
Author

helai78 commented Jul 10, 2024

> Hello @helai78, Currently, there's no direct support for uploading PDF files, but we can work around this by converting the PDF to images and extracting text separately. https://github.com/google-gemini/cookbook/blob/main/quickstarts/PDF_Files.ipynb

Hello @anusonawane,
I do almost the same thing you mentioned: I use Tesseract to OCR the text from the images.
The problem is that the images need to be categorized into types: text, data chart, and picture. OCR only works well for images of text, not for data charts or pictures, and I only have a limited number of tokens. Still, it is a very good challenge...
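
One rough way to approach that split, sketched under assumptions: treat a page as "text" when OCR returns enough words, and otherwise send the image itself so charts and pictures are not lost. The `min_words` threshold and the helper names are hypothetical, and this heuristic does not tell charts apart from pictures:

```python
def route_page(ocr_text, min_words=30):
    """Return "text" if OCR found enough words, else "image".

    min_words is an arbitrary, assumed threshold; tune it on real pages.
    """
    return "text" if len(ocr_text.split()) >= min_words else "image"


def build_parts(pages, min_words=30):
    """Turn (image_path, ocr_text) pairs into a list of prompt parts.

    Text-heavy pages are sent as OCR text (cheaper in tokens); pages with
    little recognizable text are sent as the image itself.
    """
    parts = []
    for image_path, ocr_text in pages:
        if route_page(ocr_text, min_words) == "text":
            parts.append({"kind": "text", "value": ocr_text.strip()})
        else:
            parts.append({"kind": "image", "value": image_path})
    return parts


# The OCR strings would come from Tesseract, e.g. pytesseract.image_to_string
# (an assumption here; any OCR tool that returns a string would do).
pages = [("page-1.png", "word " * 40), ("page-2.png", "Fig. 3")]
parts = build_parts(pages)
```

Sending only the low-text pages as images keeps the token cost down, at the price of occasionally misrouting a dense chart whose labels OCR well.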
