Keras 3: Object Detection with Vision Transformers
- 2025-08-21 22:00:02

Author: Karan V. Dave
Date created: March 27, 2022
Last modified: November 20, 2023
Description: A simple Keras implementation of object detection using Vision Transformers.
(i) This example uses Keras 3.
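Introduction

The Vision Transformer (ViT) architecture introduced by Alexey Dosovitskiy et al. demonstrates that a pure transformer, applied directly to sequences of image patches, can perform well on image classification tasks.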
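To make "sequences of image patches" concrete, here is a minimal NumPy sketch that cuts an image into non-overlapping square patches and flattens each patch into one vector, so the image becomes a sequence of tokens. The helper `extract_patches` and the patch size of 32 are illustrative assumptions; this is not the patching layer that Keras itself provides.

```python
import numpy as np


def extract_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a (num_patches, patch_size * patch_size * C) array.

    Assumes H and W are divisible by patch_size. Patches are ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p, C) -> (rows, cols, p, p, C): group the pixels of each patch together
    patches = image.reshape(rows, patch_size, cols, patch_size, c).transpose(0, 2, 1, 3, 4)
    # Flatten every patch into a single token vector
    return patches.reshape(rows * cols, patch_size * patch_size * c)


image = np.random.rand(224, 224, 3).astype("float32")
tokens = extract_patches(image, patch_size=32)
print(tokens.shape)  # (49, 3072): a 7 x 7 grid of patches, each 32 * 32 * 3 values
```

Each row of `tokens` is what a transformer would see as one sequence element; in a ViT these vectors are then linearly projected and given position embeddings.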
In this Keras example, we implement an object detection ViT and train it on the Caltech 101 dataset to detect an airplane in the given image.
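The loading code in this example stores each airplane's box as four numbers relative to the image width and height, so a model trained on those targets predicts relative coordinates. Drawing a detection on an image therefore means mapping them back to pixels. A minimal sketch, assuming a prediction arrives as a plain `(x1, y1, x2, y2)` tuple in that relative form; `to_pixel_box` is a hypothetical helper, not part of Keras or of the original example.

```python
def to_pixel_box(norm_box, width, height):
    """Map a relative (x1, y1, x2, y2) box to integer pixel coordinates.

    Values are clamped to [0, 1] first, because a regression output is not
    guaranteed to stay inside the image.
    """

    def clamp(v):
        return min(max(float(v), 0.0), 1.0)

    x1, y1, x2, y2 = (clamp(v) for v in norm_box)
    return (
        round(x1 * width),
        round(y1 * height),
        round(x2 * width),
        round(y2 * height),
    )


# Hypothetical prediction for a 300 x 200 (width x height) photo
print(to_pixel_box((0.1, 0.25, 0.9, 0.75), width=300, height=200))  # (30, 50, 270, 150)
```

Clamping is a design choice for display only; whether to clamp during training or evaluation is left open here.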
Imports and setup

import os

os.environ["KERAS_BACKEND"] = "jax"  # @param ["tensorflow", "jax", "torch"]

import numpy as np
import keras
from keras import layers
from keras import ops
import matplotlib.pyplot as plt
import cv2
import scipy.io
import shutil

Prepare dataset

We use the Caltech 101 dataset.
# Path to images and annotations
path_images = "./101_ObjectCategories/airplanes/"
path_annot = "./Annotations/Airplanes_Side_2/"

path_to_downloaded_file = keras.utils.get_file(
    fname="caltech_101_zipped",
    origin="https://data.caltech.edu/records/mzrjq-6wc02/files/caltech-101.zip",
    extract=True,
    archive_format="zip",  # downloaded file format
    cache_dir="/",  # cache under /datasets/ (cache_dir joined with the default cache_subdir)
)

download_base_dir = os.path.dirname(path_to_downloaded_file)

# Extract the tar files found inside the main zip file
shutil.unpack_archive(
    os.path.join(download_base_dir, "caltech-101", "101_ObjectCategories.tar.gz"), "."
)
shutil.unpack_archive(
    os.path.join(download_base_dir, "caltech-101", "Annotations.tar"), "."
)

# List of paths to images and annotations
image_paths = [
    f for f in os.listdir(path_images) if os.path.isfile(os.path.join(path_images, f))
]
annot_paths = [
    f for f in os.listdir(path_annot) if os.path.isfile(os.path.join(path_annot, f))
]

# Sorting keeps image i aligned with annotation i
image_paths.sort()
annot_paths.sort()

image_size = 224  # resize input images to this size

images, targets = [], []

# Loop over the annotations and images, preprocess them and store them in lists
for i in range(0, len(annot_paths)):
    # Access bounding box coordinates
    annot = scipy.io.loadmat(path_annot + annot_paths[i])["box_coord"][0]

    top_left_x, top_left_y = annot[2], annot[0]
    bottom_right_x, bottom_right_y = annot[3], annot[1]

    image = keras.utils.load_img(
        path_images + image_paths[i],
    )
    (w, h) = image.size[:2]  # PIL reports (width, height) of the original image

    # Resize images
    image = image.resize((image_size, image_size))

    # Convert image to array and append to list
    images.append(keras.utils.img_to_array(image))

    # Scale the bounding box relative to the original image size and append to list
    targets.append(
        (
            float(top_left_x) / w,
            float(top_left_y) / h,
            float(bottom_right_x) / w,
            float(bottom_right_y) / h,
        )
    )

# Convert the lists to numpy arrays and split into train (80%) and test (20%) sets
(x_train), (y_train) = (
    np.asarray(images[: int(len(images) * 0.8)]),
    np.asarray(targets[: int(len(targets) * 0.8)]),
)

(x_test), (y_test) = (
    np.asarray(images[int(len(images) * 0.8) :]),
    np.asarray(targets[int(len(targets) * 0.8) :]),
)
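The loading loop reorders each `box_coord` row and divides by the original width and height before the image is resized. Because every coordinate becomes a fraction of its own axis, the targets stay valid after the image is stretched to 224 x 224, even when the aspect ratio changes. The sketch below isolates that arithmetic and the 80/20 split so they can be checked without downloading the dataset. `normalize_caltech_box` and `split_80_20` are hypothetical helpers, and the index order is taken from the loop above rather than from Caltech's own documentation.

```python
import numpy as np


def normalize_caltech_box(box_coord, width, height):
    """Turn a box_coord row into relative (x1, y1, x2, y2).

    Index order mirrors the loading loop: [0] top y, [1] bottom y,
    [2] left x, [3] right x.
    """
    top_y, bottom_y, left_x, right_x = (float(v) for v in box_coord[:4])
    return (left_x / width, top_y / height, right_x / width, bottom_y / height)


def split_80_20(images, targets):
    """Keep the first 80% (by index) for training and the rest for testing, unshuffled."""
    cut = int(len(images) * 0.8)
    train = (np.asarray(images[:cut]), np.asarray(targets[:cut]))
    test = (np.asarray(images[cut:]), np.asarray(targets[cut:]))
    return train, test


# Hypothetical 400 x 200 (width x height) image, box from (40, 20) to (360, 180)
print(normalize_caltech_box([20, 180, 40, 360], width=400, height=200))
# (0.1, 0.1, 0.9, 0.9)

# Ten dummy 224 x 224 RGB "images" with matching dummy targets
(x_tr, y_tr), (x_te, y_te) = split_80_20(
    [np.zeros((224, 224, 3), dtype="float32")] * 10, [(0.0, 0.0, 1.0, 1.0)] * 10
)
print(x_tr.shape, x_te.shape)  # (8, 224, 224, 3) (2, 224, 224, 3)
```

Because the split follows sorted file order without shuffling, the test set is simply the last 20% of files; shuffling with a fixed seed before splitting would be one alternative if file order correlated with image content.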