Numpy reader changes #5

SundarRajan98 · 2024-02-19T07:09:52Z

No description provided.

fiona-gladwin

Please address review comments
Change copyright in newly added files

rocAL/include/api/rocal_api_augmentation.h

fiona-gladwin · 2024-02-20T03:21:58Z

rocAL/include/api/rocal_api_augmentation.h

@@ -1098,4 +1098,6 @@ extern "C" RocalTensor ROCAL_API_CALL rocalSSDRandomCrop(RocalContext context, R
                                                         RocalTensorLayout output_layout = ROCAL_NONE,
                                                         RocalTensorOutputType output_datatype = ROCAL_UINT8);

+extern "C" RocalTensor ROCAL_API_CALL rocalSetLayout(RocalContext context, RocalTensor input,


This API sets the layout for any tensor. In that case, why is it included in api_augmentations.h?

The numpy loader doesn't provide a layout for the loader tensor, so I'm using the set-layout function to specify the layout for augmentations.

Which layouts will be set for the numpy reader? Where will this be called?
I think we need to rename output_layout to just layout.

rocAL/include/api/rocal_api_data_loaders.h

fiona-gladwin · 2024-02-20T03:23:49Z

rocAL/include/api/rocal_api_data_loaders.h

+ * \param [in] context Rocal context
+ * \param [in] source_path A NULL terminated char string pointing to the location on the disk
+ * \param [in] internal_shard_count Defines the parallelism level by internally sharding the input dataset and load/decode using multiple decoder/loader instances. Using shard counts bigger than 1 improves the load/decode performance if compute resources (CPU cores) are available.
+ * \param [in] is_output Determines if the user wants the loaded images to be part of the output or not.


Is it loaded images here?
Please rephrase to match the API. This also applies to the other description lines where "image" is used.

The numpy arrays are 4D images, but I'll rephrase this as "loaded arrays".

rocAL_pybind/amd/rocal/plugin/pytorch.py

rocAL_pybind/amd/rocal/readers.py

fiona-gladwin · 2024-02-20T05:22:46Z

rocAL_pybind/rocal_pybind.cpp

    m.def("rocalResetLoaders", &rocalResetLoaders);
    m.def("videoMetaDataReader", &rocalCreateVideoLabelReader, py::return_value_policy::reference);
    // rocal_api_augmentation.h
+    m.def("setLayout", &rocalSetLayout,


Check whether this must come under augmentation.h.

Dataloader APIs are used for dataset-specific functions, but this one seemed to be a tensor-specific function, which is why I added it in api_augmentations.h.
@shobana-mcw to check

swetha097 · 2024-02-28T06:50:35Z

tests/cpp_api_tests/rocAL_unittests/rocAL_unittests.cpp

+                            out_f_buffer = (float *)output_tensor_list->at(idx)->buffer();
+
+                        out_buffer = (unsigned char *)malloc(output_tensor_list->at(idx)->data_size() / 4);
+                        // convert_float_to_uchar_buffer(out_f_buffer, out_buffer, output_tensor_list->at(idx)->data_size() / 4);


Remove the commented-out code on line 793.

swetha097 · 2024-02-28T06:50:48Z

tests/cpp_api_tests/rocAL_unittests/rocAL_unittests.cpp

+                            out_f16_buffer = (half *)output_tensor_list->at(idx)->buffer();
+
+                        out_buffer = (unsigned char *)malloc(output_tensor_list->at(idx)->data_size() / 2);
+                        // convert_float_to_uchar_buffer(out_f16_buffer, out_buffer, output_tensor_list->at(idx)->data_size() / 2);


Remove the commented code

swetha097 · 2024-02-28T06:52:23Z

rocAL_pybind/examples/rocAL_api_numpy_reader.py

@@ -0,0 +1,48 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function


Check whether the three imports at the top are necessary, and remove them if they are not.
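One way to check this, assuming the example only has to run on Python 3: the standard `__future__` module records the release in which each feature became mandatory. A minimal sketch follows; the helper name `redundant_future_imports` is made up for illustration and is not part of the PR.

```python
import __future__
import sys


def redundant_future_imports(names):
    """Return the __future__ features that are already mandatory on this interpreter."""
    redundant = []
    for name in names:
        feature = getattr(__future__, name)
        # getMandatoryRelease() returns a version tuple such as (3, 0, 0, "alpha", 0),
        # or None if the feature was dropped before becoming mandatory.
        mandatory = feature.getMandatoryRelease()
        if mandatory is not None and sys.version_info >= mandatory:
            redundant.append(name)
    return redundant


if __name__ == "__main__":
    print(redundant_future_imports(["absolute_import", "division", "print_function"]))
```

If all three names come back, the imports have no effect on the running interpreter and can be dropped without changing behaviour.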

swetha097 · 2024-02-28T06:53:08Z

rocAL_pybind/examples/rocAL_api_numpy_reader.py

+import amd.rocal.fn as fn
+import amd.rocal.types as types
+import sys
+import os


This import seems unused. Please remove it.

swetha097 · 2024-02-28T06:53:31Z

rocAL_pybind/examples/rocAL_api_numpy_reader.py

+        print("+++++++++++++++++++++++++++++EPOCH+++++++++++++++++++++++++++++++++++++",epoch)
+        for i , [it] in enumerate(numpyIteratorPipeline):
+            print(it.shape)
+            print("************************************** i *************************************",i)


Add a space after the comma.

swetha097 · 2024-02-28T06:54:06Z

rocAL_pybind/examples/rocAL_api_numpy_reader.py

+    print(len(numpyIteratorPipeline))
+    for epoch in range(1):
+        print("+++++++++++++++++++++++++++++EPOCH+++++++++++++++++++++++++++++++++++++",epoch)
+        for i , [it] in enumerate(numpyIteratorPipeline):


Why does `it` need to be inside square brackets?
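For context, `[it]` as a `for` target is sequence unpacking: it works only when each item yielded by the iterator is a sequence of exactly one element, and it binds `it` to that element. A small sketch with a stand-in iterator follows. `fake_iterator` is hypothetical and only mimics an iterator that yields one-element lists; it is not the rocAL iterator.

```python
def fake_iterator(num_batches):
    """Stand-in for an iterator that yields a one-element list per batch."""
    for batch_idx in range(num_batches):
        yield [f"batch-{batch_idx}"]


def collect_unpacked(iterator):
    """Unpack each one-element list by using a bracketed loop target."""
    items = []
    for i, [it] in enumerate(iterator):
        items.append((i, it))
    return items


def collect_raw(iterator):
    """Without the brackets, the loop variable is the list itself."""
    return [(i, it) for i, it in enumerate(iterator)]


if __name__ == "__main__":
    print(collect_unpacked(fake_iterator(2)))
    print(collect_raw(fake_iterator(2)))
```

If the iterator ever yields more than one tensor per batch, the bracketed form raises a `ValueError`, so writing `for i, it in ...` and indexing explicitly may be the clearer choice.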

swetha097 · 2024-02-28T06:55:19Z

rocAL_pybind/amd/rocal/readers.py

+
+
+def numpy(*inputs, file_root='', num_shards=1,
+          random_shuffle=False, shard_id=0, stick_to_shard=False, pad_last_batch=False):


Please move
random_shuffle=False, shard_id=0, stick_to_shard=False, pad_last_batch=False):
to the previous line.

swetha097 · 2024-02-28T06:56:37Z

rocAL_pybind/amd/rocal/readers.py

@@ -350,3 +350,15 @@ def mxnet(path, stick_to_shard=False, pad_last_batch=False):
    mxnet_metadata = b.mxnetReader(
        Pipeline._current_pipeline._handle, *(kwargs_pybind.values()))
    return mxnet_metadata
+
+
+def numpy(*inputs, file_root='', num_shards=1,


Add a docstring explaining each argument.

swetha097 · 2024-02-28T06:59:06Z

rocAL_pybind/amd/rocal/plugin/pytorch.py

@@ -29,6 +29,91 @@
 import ctypes


+class ROCALNumpyIterator(object):
+    def __init__(self, pipeline, tensor_dtype=types.FLOAT, device="cpu", device_id=0, return_roi=False):


Should this be part of pytorch.py? @shobana-mcw, please add your comments.

swetha097 · 2024-02-28T07:00:46Z

rocAL_pybind/amd/rocal/plugin/generic.py

@@ -29,6 +29,95 @@
 import amd.rocal.types as types
 import ctypes

+class ROCALNumpyIterator(object):


The Numpy iterator seems to have been added in both generic.py and pytorch.py.

We should think about either keeping it here or making it a custom iterator in the respective example pipelines, as was done for the Coco readers.

@shobana-mcw Please add your comments on this

swetha097 · 2024-02-28T07:02:03Z

rocAL/source/readers/image/numpy_data_reader.cpp

+#include "numpy_data_reader.h"
+
+#include <commons.h>
+


remove blank lines

swetha097 · 2024-02-28T07:05:45Z

rocAL/source/readers/image/numpy_data_reader.cpp

+    size_t read_size = 0;
+    for (unsigned d = 0; d < shapes[dim]; d++) {
+        read_size += copy_array_data<T>(startPtr, strides, shapes, dim + 1);
+        startPtr += strides[dim + 1];


Change from camelCase to snake_case.

swetha097 · 2024-02-28T07:25:05Z

rocAL/include/loaders/image/numpy_loader.h

+    LoaderModuleStatus load_next() override;
+    void initialize(ReaderConfig reader_cfg, DecoderConfig decoder_cfg, RocalMemType mem_type, unsigned batch_size, bool keep_orig_size = false) override;
+    void set_output(Tensor* output_image) override;
+    void set_random_bbox_data_reader(std::shared_ptr<RandomBBoxCrop_MetaDataReader> randombboxcrop_meta_data_reader) override;


void set_random_bbox_data_reader(std::shared_ptr<RandomBBoxCrop_MetaDataReader> randombboxcrop_meta_data_reader) override { THROW("set_random_bbox_data_reader is not compatible with this implementation") };

Please change this to THROW an error, as shown above.

swetha097 · 2024-02-28T07:26:30Z

rocAL/source/loaders/image/numpy_loader.cpp

+            while ((file_counter != _batch_size) && _reader->count_items() > 0) {
+                auto read_ptr = data + _image_size * file_counter;
+                auto max_shape = _output_tensor->info().max_shape();
+                size_t readSize = _reader->open();


Change from camelCase to snake_case.

swetha097 · 2024-02-28T07:27:29Z

rocAL/source/loaders/image/node_numpy_loader_single_shard.cpp

+*/
+
+#include "node_numpy_loader_single_shard.h"
+


remove this blank line

swetha097 · 2024-02-28T07:29:00Z

rocAL/source/api/rocal_api_data_loaders.cpp

+    auto context = static_cast<Context*>(p_context);
+    try {
+        auto max_dimensions = evaluate_numpy_data_set(StorageType::NUMPY_DATA, DecoderType::SKIP_DECODE,
+                                                      source_path);


Move this to the previous line.

swetha097 · 2024-02-28T07:29:29Z

rocAL/source/api/rocal_api_data_loaders.cpp

+    try {
+        auto max_dimensions = evaluate_numpy_data_set(StorageType::NUMPY_DATA, DecoderType::SKIP_DECODE,
+                                                      source_path);
+        auto dtype = max_dimensions.at(max_dimensions.size() - 1);


Do the dimensions also store the data type?

swetha097 · 2024-02-28T07:31:37Z

rocAL/include/readers/image/numpy_data_reader.h

+    template <size_t N>
+    bool check_and_skip_string(const char*& ptr, const char (&what)[N]);
+    template <size_t N>
+    void skip_field(const char*& ptr, const char (&name)[N]);


Should we add comments here describing what these functions do?

swetha097 · 2024-02-28T07:34:41Z

rocAL/include/loaders/image/numpy_loader.h

+#include "circular_buffer.h"
+#include "commons.h"
+#include "image_read_and_decode.h"
+//


remove this //

swetha097 · 2024-02-28T07:35:41Z

rocAL/include/loaders/image/numpy_loader_sharded.h

+    void fast_forward_through_empty_loaders();
+    size_t _prefetch_queue_depth;
+    Tensor *_output_tensor;
+    std::shared_ptr<RandomBBoxCrop_MetaDataReader> _randombboxcrop_meta_data_reader = nullptr;


This member can be removed. Instead, have set_random_bbox_data_reader report that this loader does not implement it.

swetha097 · 2024-02-28T07:36:41Z

rocAL/include/api/rocal_api_data_loaders.h

+ * \return Reference to the output tensor
+ */
+extern "C"  RocalTensor  ROCAL_API_CALL rocalNumpyFileSource(
+            RocalContext p_context,


Please align the arguments.

swetha097 · 2024-02-28T07:36:58Z

rocAL/include/api/rocal_api_data_loaders.h

+ * \return Reference to the output tensor
+ */
+extern "C"  RocalTensor  rocalNumpyFileSourceSingleShard(
+            RocalContext p_context,


Please align the arguments; refer to the other functions above.

swetha097

Added review comments; please address them.

shobana-mcw

Please address the review comments.
I will review the cpp files and add comments.

shobana-mcw · 2024-02-28T06:52:11Z

rocAL/include/api/rocal_api_augmentation.h

@@ -1098,4 +1098,6 @@ extern "C" RocalTensor ROCAL_API_CALL rocalSSDRandomCrop(RocalContext context, R
                                                         RocalTensorLayout output_layout = ROCAL_NONE,
                                                         RocalTensorOutputType output_datatype = ROCAL_UINT8);

+extern "C" RocalTensor ROCAL_API_CALL rocalSetLayout(RocalContext context, RocalTensor input,


Which layouts will be set for the numpy reader? Where will this be called?
I think we need to rename output_layout to just layout.

shobana-mcw · 2024-02-28T06:58:01Z

rocAL/include/loaders/image/node_numpy_loader.h

+*/
+
+#pragma once
+#include "graph.h"


Do you need this include? Check this.

shobana-mcw · 2024-02-28T06:58:42Z

rocAL/include/loaders/image/node_numpy_loader.h

+    /// \param internal_shard_count Defines the amount of parallelism user wants for the load and decode process to be handled internally.
+    /// \param source_path Defines the path that includes the image dataset
+    /// \param load_batch_count Defines the quantum count of the images to be loaded. It's usually equal to the user's batch size.
+    /// The loader will repeat images if necessary to be able to have images in multiples of the load_batch_count,


Change these comments, since they are worded for images.

shobana-mcw · 2024-02-28T06:59:30Z

rocAL/include/loaders/image/node_numpy_loader_single_shard.h

+    /// \param  user_shard_id shard id from user
+    /// \param source_path Defines the path that includes the numpy array dataset
+    /// \param load_batch_count Defines the quantum count of the numpy arrays to be loaded. It's usually equal to the user's batch size.
+    /// The loader will repeat samples if necessary to be able to have samples in multiples of the load_batch_count,


Change these comments, since they are worded for images.

shobana-mcw · 2024-02-28T07:22:57Z

rocAL/include/loaders/image/numpy_loader.h

+    LoaderModuleStatus set_cpu_sched_policy(struct sched_param sched_policy);
+    void set_gpu_device_id(int device_id);
+    std::vector<std::string> get_id() override;
+    decoded_image_info get_decode_image_info() override;


Do we use this API in numpy reader?

shobana-mcw · 2024-02-28T07:25:27Z

rocAL/include/loaders/image/numpy_loader.h

+    size_t _image_size;
+    std::thread _load_thread;
+    RocalMemType _mem_type;
+    decoded_image_info _decoded_img_info;


Address this.

shobana-mcw · 2024-02-28T07:55:29Z

rocAL/include/loaders/image_source_evaluator.h

@@ -41,10 +41,14 @@ enum class MaxSizeEvaluationPolicy {
 class ImageSourceEvaluator {
   public:
    ImageSourceEvaluatorStatus create(ReaderConfig reader_cfg, DecoderConfig decoder_cfg);
+    ImageSourceEvaluatorStatus create(ReaderConfig reader_cfg);


Can we introduce another file for numpy source evaluator? Are there any common functions required?

shobana-mcw · 2024-02-28T07:58:07Z

rocAL/include/pipeline/master_graph.h

@@ -389,3 +391,39 @@ inline std::shared_ptr<VideoLoaderSingleShardNode> MasterGraph::add_node(const s

    return node;
 }
+
+template <>


Add comments.

shobana-mcw · 2024-02-28T08:01:42Z

rocAL/include/readers/image/image_reader.h

@@ -162,6 +183,10 @@ class Reader {
    //! Copies the data of the opened item to the buf
    virtual size_t read_data(unsigned char *buf, size_t read_size) = 0;

+    virtual const NumpyHeaderData get_numpy_header_data() { THROW("Not Implemented") }


Add comments for these APIs.

Do we need to introduce numpy APIs here?
Can we try multiple inheritance?

Both APIs are needed for numpy source evaluator
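For reference, the header such an API would expose is the one defined by the `.npy` file format: a magic string, a version, a header length, and a Python dict literal with `descr`, `fortran_order`, and `shape`. A minimal Python sketch of reading it follows. The function name `read_npy_header` is made up here, and this is not the PR's C++ implementation.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def read_npy_header(buf):
    """Parse the header of a .npy byte buffer and return (header_dict, data_offset)."""
    if buf[:6] != MAGIC:
        raise ValueError("not a .npy buffer")
    major = buf[6]
    # Version 1.x stores the header length as a little-endian uint16,
    # later versions as a little-endian uint32.
    if major == 1:
        (header_len,) = struct.unpack_from("<H", buf, 8)
        header_start = 10
    else:
        (header_len,) = struct.unpack_from("<I", buf, 8)
        header_start = 12
    # Versions 1 and 2 use latin1 for the header text, version 3 uses UTF-8.
    encoding = "latin1" if major < 3 else "utf-8"
    header_text = buf[header_start:header_start + header_len].decode(encoding)
    # The header is a dict literal padded with spaces and terminated by a newline.
    header = ast.literal_eval(header_text.strip())
    return header, header_start + header_len
```

The returned offset is where the raw array data starts, so a source evaluator could read the shape and dtype of every file without touching the data itself.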

shobana-mcw · 2024-02-28T08:57:26Z

rocAL/source/api/rocal_api_augmentation.cpp

@@ -2155,3 +2155,26 @@ rocalNop(
    }
    return output;
 }
+
+RocalTensor ROCAL_API_CALL


This API should not be part of this file.

shobana-mcw · 2024-02-28T16:19:26Z

rocAL/source/loaders/image/node_numpy_loader.cpp

+    if (internal_shard_count < 1)
+        THROW("Shard count should be greater than or equal to one")
+    _loader_module->set_output(_outputs[0]);
+    // Set reader and decoder config accordingly for the NumpyLoaderNode


Change the comments. We don't use a decoder config.

shobana-mcw

Please move the numpy-related files to a new folder called numpy. They need not be under images. This change is needed for both the loader and the reader.

shobana-mcw · 2024-02-28T16:20:14Z

rocAL/source/loaders/image/node_numpy_loader_single_shard.cpp

+    if (shard_id >= shard_count)
+        THROW("Shard is should be smaller than shard count")
+    _loader_module->set_output(_outputs[0]);
+    // Set reader and decoder config accordingly for the NumpyLoaderNode


Remove the decoder config reference from the comment.

shobana-mcw · 2024-02-28T16:24:31Z

rocAL/source/loaders/image/numpy_loader_sharded.cpp

+
+    increment_loader_idx();
+
+    // Since loaders may have different number of images loaded, some run out earlier than other.


Change the comments that refer to images.

shobana-mcw · 2024-02-28T16:25:00Z

rocAL/source/loaders/image/numpy_loader_sharded.cpp

+}
+
+void NumpyLoaderSharded::set_random_bbox_data_reader(std::shared_ptr<RandomBBoxCrop_MetaDataReader> randombboxcrop_meta_data_reader) {
+    _randombboxcrop_meta_data_reader = randombboxcrop_meta_data_reader;


Address this.

shobana-mcw · 2024-02-28T16:25:22Z

rocAL/source/loaders/numpy_source_evaluator.cpp

@@ -0,0 +1,65 @@
+/*
+Copyright (c) 2019 - 2023 Advanced Micro Devices, Inc. All rights reserved.


Change the copyright year.

shobana-mcw · 2024-02-28T16:26:43Z

rocAL/source/pipeline/tensor.cpp

+            unsigned *tensor_shape = _roi[i].end;
+            tensor_shape[i] = _max_shape[i];


Address this.

shobana-mcw · 2024-02-28T16:28:07Z

rocAL/source/readers/image/numpy_data_reader.cpp

+        LOG("NumpyDataReader ShardID [" + TOSTR(_shard_id) + "] Replicated " + _folder_path + _last_file_name + " " + TOSTR((_batch_count - _in_batch_read_count)) + " times to fill the last batch")
+    }
+    if (!_file_names.empty())
+        LOG("NumpyDataReader ShardID [" + TOSTR(_shard_id) + "] Total of " + TOSTR(_file_names.size()) + " images loaded from " + _full_path)


Change the reference to "images".

shobana-mcw · 2024-02-28T16:30:08Z

rocAL/source/readers/image/numpy_data_reader.cpp

+        std::string subfolder_path = _full_path + "/" + entry_name_list[dir_count];
+        filesys::path pathObj(subfolder_path);
+        if (filesys::exists(pathObj) && filesys::is_regular_file(pathObj)) {
+            // ignore files with extensions .tar, .zip, .7z


Where are we checking for this?

shobana-mcw · 2024-02-28T16:32:00Z

rocAL/source/readers/image/numpy_data_reader.cpp

+}
+
+const RocalTensorDataType NumpyDataReader::get_numpy_dtype(const std::string& format) {
+    if (format == "u1") return RocalTensorDataType::UINT8;


Switch case will be more readable.
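The same mapping can also be written as a lookup table. The sketch below is in Python rather than C++. Only the `"u1"` to UINT8 pair comes from the diff above; the `DataType` enum and the remaining entries are assumptions based on the standard NumPy type codes, not the types rocAL actually supports.

```python
from enum import Enum


class DataType(Enum):
    """Hypothetical stand-in for a tensor data type enum."""
    UINT8 = "uint8"
    INT8 = "int8"
    UINT32 = "uint32"
    INT32 = "int32"
    FLOAT16 = "float16"
    FLOAT32 = "float32"


# NumPy type code -> data type. Extend as more types are supported.
_FORMAT_TO_DTYPE = {
    "u1": DataType.UINT8,
    "i1": DataType.INT8,
    "u4": DataType.UINT32,
    "i4": DataType.INT32,
    "f2": DataType.FLOAT16,
    "f4": DataType.FLOAT32,
}


def get_numpy_dtype(fmt):
    """Map a NumPy format string such as 'u1' or '<f4' to a DataType."""
    # Drop an optional byte-order prefix ('<', '>', '|' or '=').
    code = fmt.lstrip("<>|=")
    try:
        return _FORMAT_TO_DTYPE[code]
    except KeyError:
        raise ValueError(f"Unsupported numpy format: {fmt!r}") from None
```

With a table, adding a type is a one-line change and unsupported formats fail in a single place.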

HazarathKumarM · 2024-02-28T17:00:45Z

rocAL_pybind/examples/rocAL_api_numpy_reader.py

@@ -0,0 +1,48 @@
+from __future__ import absolute_import
+from __future__ import division


Looks like there's no division operation being used in this file. Please remove it if it is not used.

HazarathKumarM · 2024-02-28T17:13:14Z

rocAL_pybind/examples/rocAL_api_numpy_reader.py

+import os
+
+def main():
+    if  len(sys.argv) < 3:


It should be 'if len(sys.argv) < 4' here, as you require three arguments, and argv[0] by default contains the Python script name.
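A minimal sketch of the check being asked for. The three argument names are placeholders, since the diff does not show which arguments the example takes, and `parse_args` is a made-up helper.

```python
def parse_args(argv):
    """Return the three positional arguments, or exit with a usage message."""
    # argv[0] is the script name, so three user arguments mean len(argv) == 4.
    if len(argv) < 4:
        raise SystemExit(f"usage: {argv[0]} <arg1> <arg2> <arg3>")
    return argv[1], argv[2], argv[3]
```

In the example script this would be called as `parse_args(sys.argv)` at the top of `main()`.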

HazarathKumarM · 2024-02-28T17:15:48Z

rocAL_pybind/examples/rocAL_api_numpy_reader.py

+    numpyIteratorPipeline = ROCALNumpyIterator(pipeline, tensor_dtype=types.UINT8)
+    print(len(numpyIteratorPipeline))
+    for epoch in range(1):
+        print("+++++++++++++++++++++++++++++EPOCH+++++++++++++++++++++++++++++++++++++",epoch)


Add a space before epoch.

Adding numpy reader support to rocAL

8d34d4c

SundarRajan98 requested review from shobana-mcw, swetha097 and fiona-gladwin February 19, 2024 07:09

SundarRajan98 self-assigned this Feb 19, 2024

SundarRajan98 changed the base branch from develop to param_vx_changes February 19, 2024 07:16

SundarRajan98 added 2 commits February 19, 2024 07:57

Removing missed instances of files and seed arguments

56eecf2

Fixing build issues

effbd86

fiona-gladwin reviewed Feb 20, 2024

Resolving review comments

3361c10

swetha097 reviewed Feb 28, 2024

swetha097 reviewed Feb 28, 2024

shobana-mcw reviewed Feb 28, 2024

SundarRajan98 requested a review from HazarathKumarM February 28, 2024 10:25

Resolving review comments

1dc2159

shobana-mcw reviewed Feb 28, 2024

HazarathKumarM reviewed Feb 28, 2024
