Using sam-vit-base to help check whether a truck's pull-out rain cover (tarp) is fully closed

2025-09-20 22:03:03

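This continues the technical feasibility study for the same requirement.

Previous post: "DINOv2 + yolov8 + opencv: checking whether a truck's pull-out rain cover is fully closed" (CSDN blog), blog.csdn.net/u011564831/article/details/145800227. That post defined a semantic segmentation model on top of DINOv2: a pretrained ViT (Vision Transformer) backbone with frozen parameters to reduce computation, plus a custom segmentation head (segmentation_head) that upsamples the 37x37 feature map and produces a 3-class result (background, truck bed, cover). Its forward method takes a 518x518 input image and outputs segmentation logits. With DINOv2 segmentation, however, it was unclear how well the approach fit this scenario.

I then found another segmentation model, SAM: huggingface.co/facebook/sam-vit-base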

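DINOv2 compared with SAM

Aspect | DINOv2 weakness | SAM strength
--- | --- | ---
Design goal | Not designed for segmentation; needs an extra segmentation head | Supports segmentation natively, works out of the box
Resolution and detail | Low-resolution feature map, serious loss of detail | High-resolution masks that preserve detail
Semantic understanding | No direct semantic output; needs supervised training | Zero-shot segmentation; prompts can be used to add semantics
Computational efficiency | High compute cost, needs extra processing | The image is encoded once and each prompt is decoded cheaply; prompt-driven use is more flexible
Prompt support | No prompt mechanism, no interactivity | Supports several prompt types, adapts well

DINOv2: generalizes well, but has to be fine-tuned for the specific task (here, semantic segmentation). It does not depend on prompts, which also means it cannot quickly focus on a target region through user interaction the way SAM can.

SAM: supports prompt-driven segmentation (points and boxes in the released model; text prompts were only explored in the SAM paper), so it adapts quickly to different scenes and user needs. Without any prompt, SAM can also generate many candidate masks automatically, which makes it more flexible.

Using SAM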

I used a picture provided by the client, taken from a similar camera position.

First, segment the whole image with the official API.

Download facebook/sam-vit-base:

huggingface-cli download --resume-download facebook/sam-vit-base --local-dir ./sam-vit-base/

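Run the program.

In the output mask image, the space above the truck bed is recognized as a separate region.

Building on the pipeline's whole-image segmentation, we next need to focus attention on where the truck is.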
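One way to narrow whole-image masks down to the truck is to keep only the masks that lie mostly inside a region of interest. The sketch below assumes each mask is a boolean NumPy array of shape [height, width] and that a box (x1, y1, x2, y2) with non-negative coordinates is available. The helper name masks_in_box and the 0.8 threshold are illustrative assumptions, not part of the original workflow.

```python
import numpy as np


def masks_in_box(masks, box, min_inside=0.8):
    """Return the indices of masks whose pixels lie mostly inside box.

    masks: iterable of boolean arrays shaped [height, width]
    box: (x1, y1, x2, y2) in pixels, x2 and y2 exclusive
    min_inside: hypothetical threshold on the fraction of mask pixels in the box
    """
    x1, y1, x2, y2 = box
    kept = []
    for idx, mask in enumerate(masks):
        mask = np.asarray(mask, dtype=bool)
        area = mask.sum()
        if area == 0:
            continue  # an empty mask carries no information
        inside = mask[y1:y2, x1:x2].sum()
        if inside / area >= min_inside:
            kept.append(idx)
    return kept


# Toy 6x8 image: one mask inside the box, one outside it
inside_mask = np.zeros((6, 8), dtype=bool)
inside_mask[1:3, 1:4] = True
outside_mask = np.zeros((6, 8), dtype=bool)
outside_mask[4:6, 5:8] = True
print(masks_in_box([inside_mask, outside_mask], box=(0, 0, 4, 4)))  # [0]
```

A box detector such as yolov8 could supply the box, which is the direction the rest of this post takes with prompt points instead.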

How the input point guides the model: the input point [[450, 600]] is a user-supplied prompt that tells SAM which region of the image to attend to (assumed here to be the center or a salient part of the truck). SAM uses the point as a seed and generates the masks most relevant to it. Later I will use yolov8 to detect the truck and find this point automatically.

SAM's segmentation ability: SAM was trained at large scale (the SA-1B dataset) and can segment arbitrary objects from a prompt. Its Transformer architecture and mask decoder capture the context around the input point and produce high-quality masks. For trunk2.jpg, SAM may pick out the truck's outline and return several candidate masks (for example the whole truck, the truck bed, or the background).

Role of the IoU scores: iou_scores are the model's own estimate of each candidate mask's quality (predicted IoU). The highest-scoring mask (0.9669 in one run) is usually the one that covers the main object around the prompt. For example, if [450, 600] lies on the truck, the best mask segments the main part of the truck rather than the background or a minor region.

Post-processing restores the resolution: post_process_masks resizes the masks from the model's resolution (for example 256x256) to the original image size (1440x2560), so the mask aligns with the image pixel for pixel and covers the main object precisely.

Visualization highlights the main object: show_mask overlays the mask in a random color so the segmented region stands out. Because best_mask is already the top-scoring candidate, the final picture concentrates on the truck.

    import torch
    from PIL import Image
    from transformers import SamModel, SamProcessor, pipeline
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.ndimage import label  # connected-region analysis

    # Check GPU availability and pick the device
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Load the model and processor
    model = SamModel.from_pretrained("./sam-vit-base").to(device)
    processor = SamProcessor.from_pretrained("./sam-vit-base")

    # Load the image
    img_url = "./trunk2.jpg"
    raw_image = Image.open(img_url).convert("RGB")

    # Prompt point (x, y)
    input_points = [[[450, 600]]]

    # Preprocess the inputs and move them to the device
    inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)

    # Run inference
    with torch.no_grad():
        outputs = model(**inputs)

    # Post-process the masks back to the original image size
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )[0]  # [point_batch, num_masks, height, width]
    scores = outputs.iou_scores.cpu()  # [1, point_batch, num_masks]
    print("scores:", scores)
    print("masks shape:", masks.shape)

    # Select the best mask by predicted IoU.
    # Indexing masks[best_mask_idx] alone would hit the point-batch axis and
    # return all candidates stacked as [num_masks, height, width].
    best_mask_idx = scores.argmax(dim=2).item()
    best_mask = masks[0, best_mask_idx]  # [height, width]
    print("best_mask shape:", best_mask.shape)

    # Analyze the mask
    best_mask_np = best_mask.numpy()
    if best_mask_np.dtype != bool:
        best_mask_np = best_mask_np > 0.5  # binarize if not already boolean

    # Count the pixels covered by the mask
    object_pixels = np.sum(best_mask_np)
    total_pixels = best_mask_np.size
    print(f"Pixels covered by mask: {object_pixels} / {total_pixels} ({object_pixels / total_pixels * 100:.2f}%)")

    # Connected-region analysis
    labeled_array, num_features = label(best_mask_np)
    print(f"Number of connected regions: {num_features}")
    for i in range(1, num_features + 1):
        area = np.sum(labeled_array == i)
        print(f"Region {i} pixels: {area}")

    # Free GPU memory
    del outputs
    torch.cuda.empty_cache()

    # Generate whole-image masks with the pipeline
    generator = pipeline("mask-generation", model="./sam-vit-base", device=0 if device == "cuda" else -1)
    outputs_pipeline = generator(
        img_url, points_per_batch=64, pred_iou_thresh=0.88, stability_score_thresh=0.9
    )

    # Visualization helper
    def show_mask(mask, ax, random_color=False):
        if random_color:
            color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
        else:
            color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
        mask = np.array(mask)
        if mask.ndim == 3:
            mask = mask[0]  # take the first channel
        elif mask.ndim != 2:
            raise ValueError(f"Unexpected mask shape {mask.shape}")
        h, w = mask.shape
        mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
        ax.imshow(mask_image)

    # Visualize and save the pipeline masks
    plt.figure(figsize=(8, 8))
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    for mask in outputs_pipeline["masks"]:
        show_mask(mask, ax=ax, random_color=True)
    plt.axis("off")
    plt.savefig("./pipeline_masks.png", bbox_inches="tight", pad_inches=0, dpi=300)
    plt.close()

    # Visualize and save the best mask from manual inference
    plt.figure(figsize=(8, 8))
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    show_mask(best_mask, ax=ax, random_color=True)
    plt.axis("off")
    plt.title("Manual Inference Best Mask")
    plt.savefig("./manual_best_mask.png", bbox_inches="tight", pad_inches=0, dpi=300)
    plt.close()

    print("Images saved to the current directory: pipeline_masks.png and manual_best_mask.png")

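Using device: cuda
scores: tensor([[[0.8627, 0.7113, 0.7364]]])
masks shape: torch.Size([1, 3, 1440, 2560])
best_mask shape: torch.Size([3, 1440, 2560])
Pixels covered by mask: 1505626 / 11059200 (13.61%)
Number of connected regions: 16
Region 1 pixels: 4150
Region 2 pixels: 3762
Region 3 pixels: 51
Region 4 pixels: 39
Region 5 pixels: 353
Region 6 pixels: 2605
Region 7 pixels: 2
Region 8 pixels: 6
Region 9 pixels: 222
Region 10 pixels: 175
Region 11 pixels: 59
Region 12 pixels: 1493623
Region 13 pixels: 43
Region 14 pixels: 69
Region 15 pixels: 119
Region 16 pixels: 348
Images saved to the current directory: pipeline_masks.png and manual_best_mask.png

Note that best_mask in this log has shape [3, 1440, 2560] and that 11059200 = 3 x 1440 x 2560. The run that produced it indexed the mask tensor as masks[best_mask_idx], so the statistics cover all three candidate masks stacked together rather than the single best one.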
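In the log above, one region holds 1493623 of the 1505626 mask pixels and the other 15 are small fragments, which suggests a simple cleanup: keep only the largest connected region. The following is a minimal sketch of that idea on a toy array, written with a plain breadth-first flood fill so it needs nothing beyond NumPy. The function name largest_component is hypothetical, and for full-size images scipy.ndimage.label (already imported in the listing above) is the faster way to get the labels.

```python
from collections import deque

import numpy as np


def largest_component(mask):
    """Return a boolean mask that keeps only the largest 4-connected region."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    best = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        # Breadth-first flood fill from an unvisited foreground pixel
        seen[sy, sx] = True
        queue = deque([(sy, sx)])
        region = []
        while queue:
            y, x = queue.popleft()
            region.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(region) > len(best):
            best = region
    out = np.zeros_like(mask)
    if best:
        ys, xs = zip(*best)
        out[list(ys), list(xs)] = True
    return out


demo = np.array(
    [
        [1, 1, 0, 0, 1],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 0],
    ],
    dtype=bool,
)
cleaned = largest_component(demo)
print(int(cleaned.sum()))  # 4: only the 2x2 block survives
```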

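Image of the best mask

Computing the coverage ratio

Overall approach

Goal: take the image trunk2.jpg as input and detect where the truck is. Use the YOLOv8 result to derive prompt points for the truck bed and the tarp. Generate segmentation masks for the truck bed and the tarp with SAM. Compute the intersection of the truck-bed mask and the tarp mask to judge the coverage ratio. Visualize the result and save the images.

Main steps:
1. Detect the truck: use YOLOv8 to find the truck's bounding box.
2. Derive the prompt points: define center points for the truck bed and the tarp from the geometry of the bounding box.
3. Generate the masks: segment the truck-bed and tarp regions with SAM.
4. Compute the coverage: compare how much the two masks overlap.
5. Visualize: show the segmentation results and the intersection region.

Deriving the prompt points

    x1, y1, x2, y2 = truck_boxes[0]
    truck_width = x2 - x1
    truck_height = y2 - y1

    truck_bed_x = x1 + truck_width // 2
    truck_bed_y = y1 + int(truck_height * 0.75)
    truck_bed_point = [[[truck_bed_x, truck_bed_y]]]
    print(f"Truck-bed center point: {truck_bed_point}")

    tarp_x = x1 + truck_width // 2
    tarp_y = y1 + int(truck_height * 0.25)
    tarp_point = [[[tarp_x, tarp_y]]]
    print(f"Tarp center point: {tarp_point}")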

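These ratios have to be adjusted to the actual camera angle; in the example below, the prompt points land in the wrong place.

I adjusted the prompt positions by hand.

The code assumes that only the first detected truck is processed: it takes that truck's bounding box [x1, y1, x2, y2] and computes the truck's width and height.

The points are then formatted into the prompt structure SAM expects, [[[x, y]]]. In short, geometric assumptions turn the YOLOv8 detection into SAM input prompt points.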
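Because the right ratios depend on the camera angle, it can help to make them parameters and to clamp the result so a prompt never falls outside the image. This is a small sketch under those assumptions; the function name prompt_point, its parameters, and the example box are hypothetical.

```python
def prompt_point(box, fx, fy, image_size):
    """Map a relative position in a box to a prompt point [[[x, y]]].

    box: (x1, y1, x2, y2) in pixels
    fx, fy: fractions of the box width/height measured from its top-left
            corner (values outside 0..1 place the point outside the box)
    image_size: (width, height), used to clamp the point into the image
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    x = int(x1 + (x2 - x1) * fx)
    y = int(y1 + (y2 - y1) * fy)
    # Clamp so the prompt always lies inside the image
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return [[[x, y]]]


box = (100, 200, 500, 400)  # hypothetical truck box
print(prompt_point(box, 0.5, 0.75, (2560, 1440)))  # [[[300, 350]]]
print(prompt_point(box, 0.5, -2.0, (2560, 1440)))  # [[[300, 0]]], clamped to the top edge
```

Keeping the ratios as arguments makes it easy to store one pair per camera position instead of editing the script.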

Selecting the best mask

    truck_bed_scores = outputs_truck_bed.iou_scores.cpu()
    tarp_scores = outputs_tarp.iou_scores.cpu()
    print("truck bed scores:", truck_bed_scores)
    print("masks_truck_bed shape:", masks_truck_bed.shape)
    print("tarp scores:", tarp_scores)
    print("masks_tarp shape:", masks_tarp.shape)

    truck_bed_best_idx = truck_bed_scores.argmax(dim=2).item()
    tarp_best_idx = tarp_scores.argmax(dim=2).item()

    # The mask tensors have shape [point_batch, num_masks, height, width].
    # Index the single prompt point first, then the best candidate.
    truck_bed_mask = masks_truck_bed[0, truck_bed_best_idx]
    tarp_mask = masks_tarp[0, tarp_best_idx]

Fetch the IoU scores and print the mask shapes to inspect SAM's output. Use argmax to select the index of the highest-scoring mask. The mask tensor has shape [point_batch, num_masks, height, width], so the first axis (size 1 for a single prompt point) is indexed with 0 and the second with the best index. Comparing the index against shape[0] would wrongly report it as out of range, and falling back to masks[0] would return all candidates at once. Extract the best masks: truck_bed_mask and tarp_mask, each of shape [height, width].

Purpose: select the candidate mask that the model rates highest for each prompt point.
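The indexing can be checked without loading the model. The sketch below uses NumPy stand-ins with the same axis layout as the tensors printed in this post (scores as [1, point_batch, num_masks], masks as [point_batch, num_masks, height, width]); the values and the tiny height and width are made up for illustration.

```python
import numpy as np

# Stand-ins with the same axis layout as the SAM outputs
scores = np.array([[[0.72, 0.87, 0.79]]])    # [1, point_batch, num_masks]
masks = np.zeros((1, 3, 4, 6), dtype=bool)   # [point_batch, num_masks, H, W]
masks[0, 1, 1:3, 2:5] = True                 # candidate 1 has 6 foreground pixels

best_idx = int(scores[0, 0].argmax())
best_mask = masks[0, best_idx]               # a single [H, W] mask

print(best_idx, best_mask.shape, int(best_mask.sum()))  # 1 (4, 6) 6
print(masks[0].shape)  # (3, 4, 6): indexing only the first axis keeps every candidate
```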

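Computing the coverage

    truck_bed_mask_np = truck_bed_mask.numpy() > 0.5
    tarp_mask_np = tarp_mask.numpy() > 0.5

    intersection = np.logical_and(truck_bed_mask_np, tarp_mask_np)
    truck_bed_area = np.sum(truck_bed_mask_np)
    intersection_area = np.sum(intersection)
    coverage_ratio = intersection_area / truck_bed_area if truck_bed_area > 0 else 0

    print(f"Truck-bed area: {truck_bed_area} pixels")
    print(f"Intersection area: {intersection_area} pixels")
    print(f"Coverage ratio: {coverage_ratio * 100:.2f}%")

    if coverage_ratio >= 0.95:
        print("The space above the truck bed is fully covered by the tarp")
    else:
        print("The space above the truck bed is not fully covered by the tarp")

Binarize the masks (> 0.5) into boolean arrays. Compute the intersection (logical_and): the region where the truck bed and the tarp overlap. Compute the areas:

truck_bed_area: number of pixels in the truck-bed mask.
intersection_area: number of pixels in the intersection.

The coverage ratio is intersection_area / truck_bed_area, and the check is whether it is close to 100% (threshold 95%). Purpose: quantify how much of the truck bed the tarp covers and provide a basis for the decision.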
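The same calculation can be wrapped in a function and checked on a toy case where the answer is known by construction. This is a sketch: the function name coverage_ratio and the toy arrays are illustrative, and 0.95 is the threshold used in this post.

```python
import numpy as np


def coverage_ratio(bed_mask, tarp_mask):
    """Fraction of truck-bed pixels that are also tarp pixels (0.0 if no bed)."""
    bed = np.asarray(bed_mask, dtype=bool)
    tarp = np.asarray(tarp_mask, dtype=bool)
    bed_area = int(bed.sum())
    if bed_area == 0:
        return 0.0
    return float(np.logical_and(bed, tarp).sum()) / bed_area


bed = np.zeros((4, 10), dtype=bool)
bed[1:3, 0:10] = True   # 20 truck-bed pixels
tarp = np.zeros((4, 10), dtype=bool)
tarp[0:4, 0:9] = True   # overlaps 18 of them
ratio = coverage_ratio(bed, tarp)
print(f"{ratio:.2f}", ratio >= 0.95)  # 0.90 False
```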

Visualizing the result

Define the show_mask function, which turns a mask into a colored image and overlays it on the original image. Show the truck-bed and tarp masks side by side and save them as truck_bed_and_tarp_masks.png. Show the intersection region and save it as intersection_mask.png.

    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    show_mask(truck_bed_mask, ax=ax, random_color=True)
    plt.title("Truck Bed Mask")
    plt.axis("off")

    plt.subplot(1, 2, 2)
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    show_mask(tarp_mask, ax=ax, random_color=True)
    plt.title("Tarp Mask")
    plt.axis("off")
    plt.savefig("./truck_bed_and_tarp_masks.png", bbox_inches="tight", pad_inches=0, dpi=300)
    plt.close()

    plt.figure(figsize=(8, 8))
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    show_mask(intersection, ax=ax, random_color=True)
    plt.title("Intersection of Truck Bed and Tarp")
    plt.axis("off")
    plt.savefig("./intersection_mask.png", bbox_inches="tight", pad_inches=0, dpi=300)
    plt.close()
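The overlay works by broadcasting: a [height, width, 1] boolean mask multiplied by a [1, 1, 4] RGBA color gives an image that is transparent outside the mask and tinted inside it. A minimal sketch of just that step, without matplotlib, is shown here; the helper name mask_to_rgba is hypothetical, and the default color is the one used in show_mask.

```python
import numpy as np


def mask_to_rgba(mask, color=(30 / 255, 144 / 255, 255 / 255), alpha=0.6):
    """Turn a boolean [H, W] mask into an [H, W, 4] RGBA overlay."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    rgba = np.array([*color, alpha], dtype=float)
    # Broadcasting: [H, W, 1] * [1, 1, 4] -> [H, W, 4]
    return mask.reshape(h, w, 1) * rgba.reshape(1, 1, 4)


demo = np.array([[True, False], [False, True]])
overlay = mask_to_rgba(demo)
print(overlay.shape)                        # (2, 2, 4)
print(overlay[0, 0, 3], overlay[0, 1, 3])   # 0.6 0.0
```

Passing the result to ax.imshow on top of the original image draws only the masked pixels, because the alpha channel is zero everywhere else.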

After some thought and several rounds of changes and experiments, I arrived at the following code:

    import torch
    from PIL import Image
    from transformers import SamModel, SamProcessor
    from ultralytics import YOLO
    import matplotlib.pyplot as plt
    import numpy as np

    # Check GPU availability and pick the device
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Load the YOLOv8 model
    yolo_model = YOLO("./yolov8n.pt")

    # Load the SAM model and processor
    sam_model = SamModel.from_pretrained("./sam-vit-base").to(device)
    sam_processor = SamProcessor.from_pretrained("./sam-vit-base")

    # Load the image
    img_url = "./trunk2.jpg"
    raw_image = Image.open(img_url).convert("RGB")

    # Detect trucks with YOLOv8
    results = yolo_model(raw_image, conf=0.3)
    truck_boxes = []
    for result in results:
        for box in result.boxes:
            cls = int(box.cls[0])
            if cls == 7:  # COCO class ID 7 is "truck"
                x1, y1, x2, y2 = box.xyxy[0].tolist()
                truck_boxes.append((int(x1), int(y1), int(x2), int(y2)))
                print(f"Truck detected: ({x1}, {y1}, {x2}, {y2})")

    if not truck_boxes:
        raise ValueError("No truck detected in the image; check yolov8n.pt or the image content")

    # Only the first detected truck is processed
    x1, y1, x2, y2 = truck_boxes[0]
    truck_width = x2 - x1
    truck_height = y2 - y1

    # Prompt-point derivation (assumes a side view: cab on the left, bed on the right)
    # truck_bed_point: 80% across the box, 25% of the box height below its top edge
    truck_bed_x = x1 + int(truck_width * 0.8)
    truck_bed_y = y1 + truck_height * 0.25
    truck_bed_point = [[[truck_bed_x, truck_bed_y]]]
    print(f"Truck-bed center point: {truck_bed_point}")

    # tarp_point: 60% across the box, 20% of the box height above its top edge
    tarp_x = x1 + int(truck_width * 0.6)
    tarp_y = y1 - truck_height * 0.20
    tarp_point = [[[tarp_x, tarp_y]]]
    print(f"Tarp center point: {tarp_point}")

    # Plot the prompt points (for debugging)
    plt.figure(figsize=(8, 8))
    plt.imshow(np.array(raw_image))
    plt.scatter([truck_bed_x], [truck_bed_y], c="red", label="Truck Bed Point")
    plt.scatter([tarp_x], [tarp_y], c="blue", label="Tarp Point")
    plt.legend()
    plt.axis("off")
    plt.savefig("./prompt_points.png", bbox_inches="tight", dpi=300)
    plt.close()

    # Generate the truck-bed and tarp masks with SAM
    inputs_truck_bed = sam_processor(raw_image, input_points=truck_bed_point, return_tensors="pt").to(device)
    inputs_tarp = sam_processor(raw_image, input_points=tarp_point, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs_truck_bed = sam_model(**inputs_truck_bed)
        outputs_tarp = sam_model(**inputs_tarp)

    # Post-process the masks back to the original image size
    masks_truck_bed = sam_processor.image_processor.post_process_masks(
        outputs_truck_bed.pred_masks.cpu(),
        inputs_truck_bed["original_sizes"].cpu(),
        inputs_truck_bed["reshaped_input_sizes"].cpu(),
    )[0]  # [point_batch, num_masks, height, width]
    masks_tarp = sam_processor.image_processor.post_process_masks(
        outputs_tarp.pred_masks.cpu(),
        inputs_tarp["original_sizes"].cpu(),
        inputs_tarp["reshaped_input_sizes"].cpu(),
    )[0]

    # Select the best masks
    truck_bed_scores = outputs_truck_bed.iou_scores.cpu()
    tarp_scores = outputs_tarp.iou_scores.cpu()
    print("truck bed scores:", truck_bed_scores)
    print("masks_truck_bed shape:", masks_truck_bed.shape)
    print("tarp scores:", tarp_scores)
    print("masks_tarp shape:", masks_tarp.shape)

    truck_bed_best_idx = truck_bed_scores.argmax(dim=2).item()
    tarp_best_idx = tarp_scores.argmax(dim=2).item()

    # Index the single prompt point first, then the best candidate.
    # Indexing with the best index alone hits the point-batch axis (size 1).
    truck_bed_mask = masks_truck_bed[0, truck_bed_best_idx]
    tarp_mask = masks_tarp[0, tarp_best_idx]
    print("truck_bed_mask shape:", truck_bed_mask.shape)
    print("tarp_mask shape:", tarp_mask.shape)

    # Free GPU memory
    del outputs_truck_bed, outputs_tarp
    torch.cuda.empty_cache()

    # Visualization helper
    def show_mask(mask, ax, random_color=False):
        if random_color:
            color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
        else:
            color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
        mask = np.array(mask)
        if mask.ndim == 3:
            mask = mask[0]
        elif mask.ndim != 2:
            raise ValueError(f"Unexpected mask shape {mask.shape}")
        h, w = mask.shape
        mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
        ax.imshow(mask_image)

    # Visualize the truck-bed and tarp masks
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    show_mask(truck_bed_mask, ax=ax, random_color=True)
    plt.title("Truck Bed Mask")
    plt.axis("off")
    plt.subplot(1, 2, 2)
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    show_mask(tarp_mask, ax=ax, random_color=True)
    plt.title("Tarp Mask")
    plt.axis("off")
    plt.savefig("./truck_bed_and_tarp_masks.png", bbox_inches="tight", pad_inches=0, dpi=300)
    plt.close()

    # Compute the coverage
    truck_bed_mask_np = truck_bed_mask.numpy() > 0.5
    tarp_mask_np = tarp_mask.numpy() > 0.5
    intersection = np.logical_and(truck_bed_mask_np, tarp_mask_np)
    truck_bed_area = np.sum(truck_bed_mask_np)
    intersection_area = np.sum(intersection)
    coverage_ratio = intersection_area / truck_bed_area if truck_bed_area > 0 else 0
    print(f"Truck-bed area: {truck_bed_area} pixels")
    print(f"Intersection area: {intersection_area} pixels")
    print(f"Coverage ratio: {coverage_ratio * 100:.2f}%")
    if coverage_ratio >= 0.95:
        print("The space above the truck bed is fully covered by the tarp")
    else:
        print("The space above the truck bed is not fully covered by the tarp")

    # Visualize the intersection region
    plt.figure(figsize=(8, 8))
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    show_mask(intersection, ax=ax, random_color=True)
    plt.title("Intersection of Truck Bed and Tarp")
    plt.axis("off")
    plt.savefig("./intersection_mask.png", bbox_inches="tight", pad_inches=0, dpi=300)
    plt.close()

    print("Images saved to the current directory: truck_bed_and_tarp_masks.png and intersection_mask.png")

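Output of the run:

Using device: cuda

0: 384x640 1 truck, 103.4ms
Speed: 9.4ms preprocess, 103.4ms inference, 437.4ms postprocess per image at shape (1, 3, 384, 640)
Truck detected: (20.757080078125, 515.2138671875, 1079.369140625, 1426.887451171875)
Truck-bed center point: [[[761, 241.7]]]
Tarp center point: [[[549, 360.13]]]
truck bed scores: tensor([[[0.7191, 0.8671, 0.7891]]])
masks_truck_bed shape: torch.Size([1, 3, 1440, 2560])
tarp scores: tensor([[[0.8107, 0.9023, 0.8953]]])
masks_tarp shape: torch.Size([1, 3, 1440, 2560])
Warning: truck_bed_best_idx (1) exceeds the number of masks (1); using the first mask
Warning: tarp_best_idx (1) exceeds the number of masks (1); using the first mask
truck_bed_mask shape: torch.Size([3, 1440, 2560])
tarp_mask shape: torch.Size([3, 1440, 2560])
Truck-bed area: 308711 pixels
Intersection area: 276499 pixels
Coverage ratio: 89.57%
The space above the truck bed is not fully covered by the tarp

Two things stand out in this log. The warnings fire because the run compared the best index with shape[0] of the mask tensor, which is the point-batch axis (1) rather than the number of candidates (3); it then fell back to index 0, so each selected "mask" is the whole [3, 1440, 2560] stack and the areas sum over all three candidates. The logged prompt points also do not equal what the ratios 0.8/0.25 and 0.6/-0.20 give for the logged box, so they apparently come from a manually adjusted version of the point derivation.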

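Image masks

Intersection of the truck-bed and tarp masks

I tried adjusting thresholds when computing the masks, but the tarp mask was still not satisfactory. So I went back to generating generic masks with the pipeline and then using prompt points to pick out just the masks that are needed.

Pipeline generic masks + prompt points to get the truck-bed and tarp masks

Pipeline generic masks

First, specify the prompt points by hand:

Truck-bed center point: [[[761, 241.7]]]
Tarp center point: [[[549, 360.13]]]

Run the program: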

    import torch
    from PIL import Image
    from transformers import SamModel, SamProcessor, pipeline
    import matplotlib.pyplot as plt
    import numpy as np

    # Check GPU availability and pick the device
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Load the model and processor
    model = SamModel.from_pretrained("./sam-vit-base").to(device)
    processor = SamProcessor.from_pretrained("./sam-vit-base")

    # Load the image
    img_url = "./trunk2.jpg"
    raw_image = Image.open(img_url).convert("RGB")

    # Prompt point (x, y) for the manual inference
    input_points = [[[450, 600]]]

    # Preprocess the inputs and move them to the device
    inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)

    # Run inference
    with torch.no_grad():
        outputs = model(**inputs)

    # Post-process the masks back to the original image size
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )[0]  # [point_batch, num_masks, height, width]
    scores = outputs.iou_scores.cpu()  # [1, point_batch, num_masks]
    print("scores:", scores)
    print("masks shape:", masks.shape)

    # Select the best mask by predicted IoU
    best_mask_idx = scores.argmax(dim=2).item()
    best_mask = masks[0, best_mask_idx]  # [height, width]
    print("best_mask shape:", best_mask.shape)

    # Free GPU memory
    del outputs
    torch.cuda.empty_cache()

    # Generate whole-image masks with the pipeline
    generator = pipeline("mask-generation", model="./sam-vit-base", device=0 if device == "cuda" else -1)
    outputs_pipeline = generator(
        img_url, points_per_batch=64, pred_iou_thresh=0.88, stability_score_thresh=0.9
    )

    # Prompt points (x, y)
    truck_bed_point = (761, 241)  # truck-bed prompt point
    tarp_point = (549, 360)       # tarp prompt point
    points = [truck_bed_point, tarp_point]

    mask_counter = 0  # running index for the output file names

    def show_mask(mask, ax, points, random_color=False):
        """Save a separate image for every mask that contains one of the points."""
        global mask_counter
        points = [(int(x), int(y)) for x, y in points]

        mask = np.array(mask)
        if mask.ndim == 3:  # [channels, height, width]
            mask = mask[0]  # take the first channel
        elif mask.ndim != 2:
            raise ValueError(
                f"Unexpected mask shape {mask.shape}; expected [height, width] or [channels, height, width]"
            )
        h, w = mask.shape

        # Which of the given points fall inside this mask?
        contains_points = [(x, y) for x, y in points if 0 <= y < h and 0 <= x < w and mask[y, x]]
        if not contains_points:
            return

        if random_color:
            color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
        else:
            color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])

        # Draw on a new figure rather than on the ax that was passed in
        plt.figure(figsize=(8, 8))
        plt.imshow(np.array(raw_image))  # raw_image is a module-level global
        new_ax = plt.gca()
        mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
        new_ax.imshow(mask_image)
        plt.title(f"Mask containing points: {contains_points}")
        plt.axis("off")

        filename = f"./mask_{mask_counter}.png"
        plt.savefig(filename, bbox_inches="tight", pad_inches=0, dpi=300)
        plt.close()
        mask_counter += 1
        print(f"Saved mask to {filename}, containing points: {contains_points}")

    # Go through the pipeline masks and save the ones that contain a prompt point
    plt.figure(figsize=(8, 8))
    plt.imshow(np.array(raw_image))
    ax = plt.gca()
    for mask in outputs_pipeline["masks"]:
        show_mask(mask, ax=ax, points=points, random_color=True)
    plt.axis("off")
    plt.savefig("./pipeline_masks.png", bbox_inches="tight", pad_inches=0, dpi=300)
    plt.close()

    print("Image saved to the current directory: pipeline_masks.png")
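To go from "save every mask that contains a prompt point" to one mask per prompt, a small lookup helper is enough. The sketch below assumes each pipeline mask is a boolean [height, width] array. The helper name mask_at_point is hypothetical, and preferring the smallest matching mask is an assumption of this sketch (it favors the most specific region), not something the original script does.

```python
import numpy as np


def mask_at_point(masks, point):
    """Return the smallest mask that contains point (x, y), or None."""
    x, y = int(point[0]), int(point[1])
    best = None
    for mask in masks:
        mask = np.asarray(mask, dtype=bool)
        h, w = mask.shape
        if 0 <= y < h and 0 <= x < w and mask[y, x]:
            if best is None or mask.sum() < best.sum():
                best = mask
    return best


# Toy masks: one covers the whole 5x5 image, one is a 2x2 patch
whole = np.ones((5, 5), dtype=bool)
patch = np.zeros((5, 5), dtype=bool)
patch[1:3, 1:3] = True

print(int(mask_at_point([whole, patch], (1, 1)).sum()))  # 4: the patch wins
print(int(mask_at_point([whole, patch], (4, 4)).sum()))  # 25: only the whole-image mask matches
print(mask_at_point([patch], (4, 4)))                    # None
```

With the names from the listing above, truck_bed_mask = mask_at_point(outputs_pipeline["masks"], truck_bed_point) and the same call with tarp_point would give the two masks that the coverage calculation needs.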

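Inspect the generated masks

Segmentation mask of the space above the truck bed

Segmentation mask of the tarp

The result is much better now. Compute the coverage ratio from these two masks: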

    # truck_bed_mask and tarp_mask are the pipeline masks that contain the
    # respective prompt points (None if no such mask was found)
    if truck_bed_mask is not None and tarp_mask is not None:
        # Binarize the masks
        truck_bed_mask_np = truck_bed_mask > 0.5
        tarp_mask_np = tarp_mask > 0.5
        h, w = truck_bed_mask_np.shape

        # Compute the intersection and the areas
        intersection = np.logical_and(truck_bed_mask_np, tarp_mask_np)
        truck_bed_area = np.sum(truck_bed_mask_np)
        intersection_area = np.sum(intersection)
        coverage_ratio = intersection_area / truck_bed_area if truck_bed_area > 0 else 0

        print(f"Truck-bed area: {truck_bed_area} pixels")
        print(f"Intersection area: {intersection_area} pixels")
        print(f"Tarp coverage: {coverage_ratio * 100:.2f}%")

        # Visualize the intersection
        plt.figure(figsize=(8, 8))
        plt.imshow(np.array(raw_image))
        ax_inter = plt.gca()
        inter_color = np.array([1.0, 0.0, 0.0, 0.6])  # red marks the intersection
        inter_image = intersection.reshape(h, w, 1) * inter_color.reshape(1, 1, -1)
        ax_inter.imshow(inter_image)
        plt.title(f"Intersection (Coverage: {coverage_ratio * 100:.2f}%)")
        plt.axis("off")
        plt.savefig("./intersection_mask.png", bbox_inches="tight", pad_inches=0, dpi=300)
        plt.close()

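Result of the run. This figure is fairly plausible, given that a pixel ratio cannot account for the true area under perspective.

Truck-bed area: 178494 pixels
Intersection area: 123257 pixels
Tarp coverage: 69.05%

Intersection of the truck bed and the tarp

The part of the truck bed detected as uncovered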
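The uncovered part is the complement of the intersection within the truck bed: pixels that belong to the truck-bed mask but not to the tarp mask. A minimal sketch with toy arrays follows; the function name uncovered_region is hypothetical, and the bounding box is only an illustrative way to report where the gap is.

```python
import numpy as np


def uncovered_region(bed_mask, tarp_mask):
    """Truck-bed pixels that are not covered by the tarp."""
    bed = np.asarray(bed_mask, dtype=bool)
    tarp = np.asarray(tarp_mask, dtype=bool)
    return np.logical_and(bed, np.logical_not(tarp))


bed = np.zeros((3, 8), dtype=bool)
bed[1, :] = True        # 8 truck-bed pixels in the middle row
tarp = np.zeros((3, 8), dtype=bool)
tarp[:, :6] = True      # tarp reaches only the first 6 columns

gap = uncovered_region(bed, tarp)
ys, xs = np.nonzero(gap)
gap_box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
print(int(gap.sum()), gap_box)  # 2 (6, 1, 7, 1)
```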
