VideoBLIP, Flan-T5-XL, fine-tuned on Ego4D
A VideoBLIP model built on BLIP-2, with Flan-T5-XL (a large language model with roughly 3 billion parameters) as its LLM backbone.
Model description
VideoBLIP is an augmented BLIP-2 that can handle videos.
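One common way to let an image model such as BLIP-2 accept video is to run the image encoder on every frame and concatenate the per-frame features along the sequence axis before passing them on. The NumPy sketch below shows only that tensor bookkeeping. It assumes a (batch, channel, time, height, width) input layout, and `fake_image_encoder` is a hypothetical stand-in for a real vision encoder. Whether VideoBLIP follows exactly this scheme is an assumption here; the official repository is the authority on the actual implementation.

```python
import numpy as np


def fake_image_encoder(images: np.ndarray, seq_len: int = 4, hidden: int = 8) -> np.ndarray:
    """Hypothetical stand-in for a frame-level vision encoder.

    Maps (N, C, H, W) images to (N, seq_len, hidden) features. Each image's
    "features" are just its mean pixel value, so the output is easy to check.
    """
    n = images.shape[0]
    means = images.reshape(n, -1).mean(axis=1)
    return np.broadcast_to(means[:, None, None], (n, seq_len, hidden)).copy()


def encode_video(video: np.ndarray) -> np.ndarray:
    """Encode a (B, C, T, H, W) video batch frame by frame (assumed layout).

    Returns (B, T * seq_len, hidden): per-frame features concatenated in time order.
    """
    b, c, t, h, w = video.shape
    # Move time next to batch, then fold it into the batch axis: (B*T, C, H, W).
    frames = video.transpose(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
    feats = fake_image_encoder(frames)  # (B*T, S, D)
    _, s, d = feats.shape
    # Unfold time and flatten it into the sequence axis: (B, T*S, D).
    return feats.reshape(b, t * s, d)


# Toy clip: 2 videos, 3 channels, 5 frames of 16x16; frame ti is filled with ti.
video = np.zeros((2, 3, 5, 16, 16), dtype=np.float32)
for ti in range(5):
    video[:, :, ti] = ti
out = encode_video(video)
print(out.shape)  # (2, 20, 8)
```

Folding time into the batch axis lets an unchanged image encoder process every frame in one call; the frame order is preserved in the output sequence, so positions 4 to 7 of each row hold the features of the second frame.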
Bias, Risks, Limitations, and Ethical Considerations
This VideoBLIP model uses off-the-shelf Flan-T5 as its language model, so it inherits the same risks and limitations as Flan-T5:
According to Rae et al. (2021), language models, including Flan-T5, can potentially be used to generate language in harmful ways. Flan-T5 should not be used directly in any application without a prior assessment of the safety and fairness concerns specific to that application.
VideoBLIP has not been tested in real-world applications and should not be deployed directly in any application. Researchers should first carefully assess the safety and fairness of the model in the specific context in which it will be deployed.
How to use
For code examples, please refer to the official repository.
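Video models typically receive a fixed number of frames sampled from each clip rather than every frame. The helper below is a minimal sketch of uniform sampling, assuming you decode the video yourself and pick frames by index. The function name `sample_frame_indices` is hypothetical, and the frame count of 8 used in the example is an arbitrary choice, not a value taken from VideoBLIP's training setup; check the official repository for the preprocessing the model actually expects.

```python
def sample_frame_indices(num_video_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` frame indices spread evenly across a video.

    Splits the video into `num_samples` equal-length segments and takes the
    frame at the midpoint of each one. Indices may repeat for very short clips.
    """
    if num_video_frames <= 0 or num_samples <= 0:
        raise ValueError("both counts must be positive")
    segment = num_video_frames / num_samples
    return [min(int(segment * (i + 0.5)), num_video_frames - 1) for i in range(num_samples)]


# A 100-frame clip sampled down to 8 frames (8 is an illustrative choice).
print(sample_frame_indices(100, 8))  # [6, 18, 31, 43, 56, 68, 81, 93]
```

Taking segment midpoints rather than segment starts keeps the sampled frames away from the very first and last frames, which are often transitions or blank.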