OpenVLA: An Open-Source Vision-Language-Action Model
Paper: arXiv:2406.09246
An OpenVLA checkpoint fine-tuned on the LIBERO-Spatial dataset.
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch
# Load the model (trust_remote_code is required for the custom OpenVLA classes)
model = AutoModelForVision2Seq.from_pretrained(
"yihannwang/openvla-libero-spatial-epoch-05-step-000685",
trust_remote_code=True,
torch_dtype=torch.bfloat16
).to("cuda")
# Load the processor
processor = AutoProcessor.from_pretrained(
"yihannwang/openvla-libero-spatial-epoch-05-step-000685",
trust_remote_code=True
)
# Predict an action from a single camera observation
from PIL import Image
image = Image.open("observation.jpg")
prompt = "In: What action should the robot take to pick up the object?\nOut:"
inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)
action = model.predict_action(**inputs, unnorm_key="libero_spatial_no_noops", do_sample=False)
print(action) # 7-DoF action vector
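The prompt above wraps a task instruction in a fixed template, and the printed action is a 7-element vector. The sketch below does not load the model. It shows one way to build the prompt for an arbitrary instruction and to label the parts of an action. The helper names `build_prompt` and `split_action` are hypothetical, and the lowercasing step and the assumed ordering (three translation deltas, three rotation deltas, one gripper command) are assumptions, not something this checkpoint's API guarantees.

```python
from typing import Dict, Sequence, Tuple

# Template copied from the prompt string used in the example above.
PROMPT_TEMPLATE = "In: What action should the robot take to {instruction}?\nOut:"


def build_prompt(instruction: str) -> str:
    """Wrap a natural-language instruction in the prompt template."""
    # Assumption: instructions are lowercased and carry no trailing period.
    cleaned = instruction.strip().rstrip(".").lower()
    if not cleaned:
        raise ValueError("instruction must not be empty")
    return PROMPT_TEMPLATE.format(instruction=cleaned)


def split_action(action: Sequence[float]) -> Dict[str, Tuple[float, ...]]:
    """Label the components of a 7-DoF action vector.

    Assumed layout (hypothetical): [dx, dy, dz, droll, dpitch, dyaw, gripper].
    """
    if len(action) != 7:
        raise ValueError(f"expected 7 values, got {len(action)}")
    values = tuple(float(v) for v in action)
    return {
        "translation": values[:3],
        "rotation": values[3:6],
        "gripper": values[6:],
    }


if __name__ == "__main__":
    print(build_prompt("Pick up the object."))
    print(split_action([0.01, -0.02, 0.0, 0.0, 0.0, 0.1, 1.0]))
```

The string returned by `build_prompt` can be passed to `processor` in place of the hard-coded `prompt`. The output of `predict_action` can be passed to `split_action` if it is first converted to a flat sequence of seven numbers.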
To evaluate on the LIBERO-Spatial tasks:
python experiments/robot/libero/run_libero_eval.py \
--model_family openvla \
--pretrained_checkpoint yihannwang/openvla-libero-spatial-epoch-05-step-000685 \
--task_suite_name libero_spatial \
--center_crop False \
--num_trials_per_task 50
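The command above runs 50 trials per task. As a minimal sketch of how per-task and overall success rates could be tallied from such trials, consider the following. The `summarize` function and its input shape (a mapping from task name to a list of per-trial booleans) are hypothetical; this is not the log format written by `run_libero_eval.py`, and the task names are placeholders.

```python
from typing import Dict, List, Tuple


def summarize(results: Dict[str, List[bool]]) -> Tuple[Dict[str, float], float]:
    """Return (per-task success rate, overall success rate).

    The overall rate is computed over all trials pooled together, so tasks
    with more trials carry proportionally more weight.
    """
    per_task: Dict[str, float] = {}
    total_successes = 0
    total_trials = 0
    for task, outcomes in results.items():
        if not outcomes:
            raise ValueError(f"task {task!r} has no trials")
        successes = sum(outcomes)  # True counts as 1, False as 0
        per_task[task] = successes / len(outcomes)
        total_successes += successes
        total_trials += len(outcomes)
    if total_trials == 0:
        raise ValueError("no trials recorded")
    return per_task, total_successes / total_trials


if __name__ == "__main__":
    # Placeholder outcomes for two hypothetical tasks, 50 trials each.
    demo = {
        "task_a": [True] * 40 + [False] * 10,
        "task_b": [True] * 25 + [False] * 25,
    }
    rates, overall = summarize(demo)
    for name, rate in rates.items():
        print(f"{name}: {rate:.1%}")
    print(f"overall: {overall:.1%}")
```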
@article{kim2024openvla,
title={OpenVLA: An Open-Source Vision-Language-Action Model},
author={Kim, Moo Jin and Pertsch, Karl and Karamcheti, Siddharth and Xiao, Ted and Balakrishna, Ashwin and Nair, Suraj and Rafailov, Rafael and Foster, Ethan and Lam, Grace and Sanketi, Pannag and Vuong, Quan and Kollar, Thomas and Burchfiel, Benjamin and Tedrake, Russ and Sadigh, Dorsa and Levine, Sergey and Liang, Percy and Finn, Chelsea},
journal={arXiv preprint arXiv:2406.09246},
year={2024}
}
MIT License