I've been racking my brain trying to customize the TensorFlow object detection webcam tutorial so that it counts how many objects of each class are detected. I trained my custom detection model from the efficientdet_d0_coco17_tpu-32 model, and I am using the 'detect_from_webcam.py' tutorial script. I was able to get detection working and the class labels displaying on screen. Now I would like to display how many objects of each class are detected.

I have looked at and tried the TensorFlow Object Counting API, but I just can't figure out how to integrate it with my custom-trained model.

Forgive me if this is a silly question; I am just starting out with Python and machine learning in general. Thanks in advance for your help!

I am using TensorFlow 2.4.1 and Python 3.7.0.

Can anyone help me or point me to what I would need to add to count the objects detected?

This is the command I use to run the script from CMD:

python detect_from_webcam.py -m research\object_detection\inference_graph\saved_model -l research\object_detection\Training\labelmap.pbtxt

This is the script:

import numpy as np
import argparse
import tensorflow as tf
import cv2
import pathlib

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
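from api import object_counting_api  # from the TensorFlow Object Counting API repo; not used below
from utils import backbone  # from the TensorFlow Object Counting API repo; not used below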
# patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1

# Patch the location of gfile
tf.gfile = tf.io.gfile


def load_model(model_path):
    model = tf.saved_model.load(model_path)
    return model


def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis,...]
    
    # Run inference
    output_dict = model(input_tensor)

    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections
    #print(num_detections)
    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    
    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                                    output_dict['detection_masks'], output_dict['detection_boxes'],
                                    image.shape[0], image.shape[1])      
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5, tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()
    
    return output_dict


def run_inference(model, category_index, cap):
    
    while True:
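        ret, image_np = cap.read()
        if not ret:
            # Stop if no frame could be read from the webcam
            cap.release()
            cv2.destroyAllWindows()
            break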
        
        # Actual detection.
        output_dict = run_inference_for_single_image(model, image_np)
        # Visualization of the results of a detection.
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks_reframed', None),
            use_normalized_coordinates=True,
            line_thickness=8)
           
        cv2.imshow('object_detection', cv2.resize(image_np, (1920, 1080)))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cap.release()
            cv2.destroyAllWindows()
            break


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Detect objects inside webcam videostream')
    parser.add_argument('-m', '--model', type=str, required=True, help='Model Path')
    parser.add_argument('-l', '--labelmap', type=str, required=True, help='Path to Labelmap')
    args = parser.parse_args()

    detection_model = load_model(args.model)
    category_index = label_map_util.create_category_index_from_labelmap(args.labelmap, use_display_name=True)
    
    cap = cv2.VideoCapture(0)
    run_inference(detection_model, category_index, cap)
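One way to get per-class counts from the script above is to filter output_dict by score and tally the class names. The sketch below is a minimal illustration, not part of the tutorial or the Object Counting API; the function name count_detections_per_class is hypothetical, and it assumes output_dict has the shape returned by run_inference_for_single_image (parallel 'detection_classes' and 'detection_scores' arrays) and that category_index comes from label_map_util.create_category_index_from_labelmap.

```python
from collections import Counter


def count_detections_per_class(output_dict, category_index, threshold=0.5):
    """Count detections per class name, keeping only scores above `threshold`.

    Hypothetical helper: assumes `output_dict` holds parallel
    'detection_classes' and 'detection_scores' sequences with the batch
    dimension already removed, as in run_inference_for_single_image.
    """
    counts = Counter()
    for class_id, score in zip(output_dict['detection_classes'],
                               output_dict['detection_scores']):
        # Skip low-confidence detections (strictly-greater test, like the visualizer)
        if score <= threshold:
            continue
        # Look up the display name; fall back to the numeric id if it is missing
        entry = category_index.get(int(class_id))
        name = entry['name'] if entry else str(int(class_id))
        counts[name] += 1
    return counts


if __name__ == '__main__':
    # Made-up label map and detections, for illustration only
    category_index = {1: {'id': 1, 'name': 'dog'}, 2: {'id': 2, 'name': 'cat'}}
    fake_output = {'detection_classes': [1, 1, 2, 2],
                   'detection_scores': [0.9, 0.6, 0.8, 0.3]}
    print(dict(count_detections_per_class(fake_output, category_index)))
    # {'dog': 2, 'cat': 1}
```

In run_inference, this could be called right after run_inference_for_single_image, e.g. counts = count_detections_per_class(output_dict, category_index), and the result printed or drawn on image_np. Using 0.5 as the default cut-off matches the default min_score_thresh of visualize_boxes_and_labels_on_image_array, so the counts should agree with the boxes on screen; note that this function also draws at most max_boxes_to_draw boxes (20 by default), so with very crowded frames the two can differ.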
  • I think you forgot to ask a question. Commented Mar 16, 2021 at 23:18
  • len(output_dict) will give you the count of boxes, which would be the count of objects detected. Commented Mar 17, 2021 at 3:17
  • Do you need just the value, or do you want to display it on the video? Commented Jan 9, 2022 at 17:47
  • @pratap In TensorFlow v2, you need somewhat more elaborate steps to get the count. Commented Jan 9, 2022 at 17:49
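Regarding the len(output_dict) suggestion: in the question's run_inference_for_single_image, output_dict is a Python dictionary, so len() returns the number of keys rather than the number of boxes. The box count is stored under 'num_detections', and since exported detection models often return low-confidence boxes as well, a score threshold is usually applied before counting. The toy dictionary below uses made-up values purely to show the difference.

```python
import numpy as np

# A dictionary shaped like the one run_inference_for_single_image returns
# (made-up values: three boxes remaining after the num_detections slice)
output_dict = {
    'detection_boxes': np.zeros((3, 4), dtype=np.float32),
    'detection_classes': np.array([1, 1, 2], dtype=np.int64),
    'detection_scores': np.array([0.9, 0.7, 0.2], dtype=np.float32),
    'num_detections': 3,
}

print(len(output_dict))                                     # number of keys: 4
print(output_dict['num_detections'])                        # number of boxes: 3
print(int((output_dict['detection_scores'] > 0.5).sum()))   # boxes scoring above 0.5: 2
```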

2 Answers

You can count objects in an image using single_image_object_counting.py from the TensorFlow Object Counting API. Just replace ssd_mobilenet_v1_coco_2018_01_28 with your own model containing the inference graph.

You can refer to the code below:

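from api import object_counting_api  # modules from the TensorFlow Object Counting API repo
from utils import backbone

MODEL_DIR = "your_model_dir"  # placeholder: your own model, in place of ssd_mobilenet_v1_coco_2018_01_28
input_video = "image.jpg"  # path to the input image

detection_graph, category_index = backbone.set_model(MODEL_DIR)

is_color_recognition_enabled = False  # set to True to enable color prediction for the detected objects

# targeted objects counting
result = object_counting_api.single_image_object_counting(input_video, detection_graph, category_index, is_color_recognition_enabled)

print(result)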

For more details, see the Object Counting API's documentation.

Note: this answer doesn't write the detection count onto the image or video; it just computes the count as a single value.

After reviewing a lot of Python code, I managed to get the detection count for a given class:

from object_detection.utils import label_map_util

# PATH_TO_LABELS, model, image_np and run_inference_for_single_image come from the tutorial linked below
threshold = 0.5
labels = ["dog"]  # class names to count; set to None to count every class
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
detection_count = 0
output_dict = run_inference_for_single_image(model, image_np)

for i in range(output_dict['num_detections']):
    # skip detections whose score is not above the threshold
    if output_dict['detection_scores'][i] <= threshold:
        continue
    # count the detection if its class is one of the expected classes
    class_name = category_index[output_dict['detection_classes'][i]]['name']
    if labels is None or class_name in labels:
        detection_count += 1

With the detection count value ready to use, you could add it to an image or video.
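For the video case, the count could be drawn onto each frame with OpenCV's cv2.putText before the frame is shown with cv2.imshow. The sketch below is a minimal illustration under assumptions: counts is a plain dict mapping class names to counts (for this answer, {"dog": detection_count}), the helper names format_count_lines and draw_counts are hypothetical, and the position, font, and color are arbitrary choices.

```python
def format_count_lines(counts):
    """Turn a {class_name: count} mapping into one text line per class, sorted by name."""
    return [f"{name}: {count}" for name, count in sorted(counts.items())]


def draw_counts(image, counts, origin=(10, 30), line_height=30):
    """Draw each count line onto `image` (a BGR NumPy array) in place and return it."""
    import cv2  # imported here so format_count_lines can be used without OpenCV

    x, y = origin
    for line in format_count_lines(counts):
        # org is the bottom-left corner of the text; green text, thickness 2, anti-aliased
        cv2.putText(image, line, (x, y), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (0, 255, 0), 2, cv2.LINE_AA)
        y += line_height
    return image
```

In a webcam loop, calling draw_counts(image_np, {"dog": detection_count}) after the boxes are drawn and before cv2.imshow would overlay the count on each frame.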

I will share the full code when it's ready. It is based on this tutorial:

https://colab.research.google.com/github/tensorflow/models/blob/master/research/object_detection/colab_tutorials/object_detection_tutorial.ipynb
