Skip to content
This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Make reduce() not call assignment operator over uninitialized memory #208

Closed
ghost opened this issue Aug 3, 2012 · 1 comment
Closed
Labels
duplicate Already exists.

Comments

@ghost
Copy link

ghost commented Aug 3, 2012

I found out that the reduce operation on the OMP back-end just assigns values over uninitialized memory. Here's a patch to solve this problem by using uninitialized_fill on the first pass.

diff -r b84efa1fa1f0 thrust/system/omp/detail/reduce.inl
--- a/thrust/system/omp/detail/reduce.inl   Mon Jul 30 22:05:36 2012 -0700
+++ b/thrust/system/omp/detail/reduce.inl   Fri Aug 03 14:23:57 2012 -0400
@@ -51,12 +51,14 @@
   // allocate storage for the initializer and partial sums
   // XXX use select_system for Tag
   thrust::detail::temporary_array<OutputType,System> partial_sums(system, decomp1.size() + 1);
+
+  // set first element of temp array to init
+  //partial_sums[0] = init;
+  partial_sums.uninitialized_fill_n(partial_sums.begin(), 1, init);

-  // set first element of temp array to init
-  partial_sums[0] = init;

   // accumulate partial sums (first level reduction)
-  thrust::system::omp::detail::reduce_intervals(system, first, partial_sums.begin() + 1, binary_op, decomp1);
+  thrust::system::omp::detail::reduce_intervals_uninitialized(system, first, partial_sums.begin() + 1, binary_op, decomp1);

   // reduce partial sums (second level reduction)
   thrust::system::omp::detail::reduce_intervals(system, partial_sums.begin(), partial_sums.begin(), binary_op, decomp2);
diff -r b84efa1fa1f0 thrust/system/omp/detail/reduce_intervals.h
--- a/thrust/system/omp/detail/reduce_intervals.h   Mon Jul 30 22:05:36 2012 -0700
+++ b/thrust/system/omp/detail/reduce_intervals.h   Fri Aug 03 14:23:57 2012 -0400
@@ -44,6 +44,17 @@
                       BinaryFunction binary_op,
                       Decomposition decomp);

+template <typename System,
+          typename InputIterator,
+          typename OutputIterator,
+          typename BinaryFunction,
+          typename Decomposition>
+void reduce_intervals_uninitialized(dispatchable<System> &system,
+                                    InputIterator input,
+                                    OutputIterator output,
+                                    BinaryFunction binary_op,
+                                    Decomposition decomp);
+
 } // end namespace detail
 } // end namespace omp
 } // end namespace system
diff -r b84efa1fa1f0 thrust/system/omp/detail/reduce_intervals.inl
--- a/thrust/system/omp/detail/reduce_intervals.inl Mon Jul 30 22:05:36 2012 -0700
+++ b/thrust/system/omp/detail/reduce_intervals.inl Fri Aug 03 14:23:57 2012 -0400
@@ -86,6 +86,63 @@
 #endif // THRUST_DEVICE_COMPILER_IS_OMP_CAPABLE
 }

+template <typename System,
+          typename InputIterator,
+          typename OutputIterator,
+          typename BinaryFunction,
+          typename Decomposition>
+void reduce_intervals_uninitialized(dispatchable<System> &system,
+                                    InputIterator input,
+                                    OutputIterator output,
+                                    BinaryFunction binary_op,
+                                    Decomposition decomp)
+{
+  // we're attempting to launch an omp kernel, assert we're compiling with omp support
+  // ========================================================================
+  // X Note to the user: If you've found this line due to a compiler error, X
+  // X you need to enable OpenMP support in your compiler.                  X
+  // ========================================================================
+  THRUST_STATIC_ASSERT( (thrust::detail::depend_on_instantiation<InputIterator,
+                        (THRUST_DEVICE_COMPILER_IS_OMP_CAPABLE == THRUST_TRUE)>::value) );
+
+#if (THRUST_DEVICE_COMPILER_IS_OMP_CAPABLE == THRUST_TRUE)
+  typedef typename thrust::iterator_value<OutputIterator>::type OutputType;
+
+  // wrap binary_op
+  thrust::detail::host_function<BinaryFunction,OutputType> wrapped_binary_op(binary_op);
+
+  typedef thrust::detail::intptr_t index_type;
+
+  index_type n = static_cast<index_type>(decomp.size());
+
+#if (THRUST_DEVICE_COMPILER_IS_OMP_CAPABLE == THRUST_TRUE)
+# pragma omp parallel for
+#endif // THRUST_DEVICE_COMPILER_IS_OMP_CAPABLE
+  for(index_type i = 0; i < n; i++)
+  {
+    InputIterator begin = input + decomp[i].begin();
+    InputIterator end   = input + decomp[i].end();
+
+    if (begin != end)
+    {
+      OutputType sum = *begin;
+
+      ++begin;
+
+      while (begin != end)
+      {
+        sum = wrapped_binary_op(sum, *begin);
+        ++begin;
+      }
+
+      OutputIterator tmp = output + i;
+      //*tmp = sum;
+      uninitialized_fill(system, tmp, tmp+1, sum);
+    }
+  }
+#endif // THRUST_DEVICE_COMPILER_IS_OMP_CAPABLE
+}
+
 } // end namespace detail
 } // end namespace omp
 } // end namespace system```
@jaredhoberock
Copy link
Contributor

Hi Daniel, nice work on the patch. This issue is a actually a special case of #35.

To fix this problem comprehensively, temporary_array's constructor [1] needs to use either uninitialized_copy or uninitialized_fill instead of thrust::copy or nothing, respectively. But to avoid unnecessary overhead, it should omit uninitialized_fill when the value type is POD.

[1] https://github.com/thrust/thrust/blob/master/thrust/detail/temporary_array.inl#L65

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
duplicate Already exists.
Projects
None yet
Development

No branches or pull requests

1 participant