Dev cuda #18

Halfmuh · 2018-12-16T13:05:15Z

CUDA implementation of the spatial mesh (SpatialMeshCu) and the field solver (FieldSolver).
The map_particle_charge methods are commented out (they conflict with variables that are now stored in GPU memory).
TODO: implement inner regions.


noooway · 2018-12-21T11:46:51Z

FieldSolver.cu

+	is_on_low_border = ((threadIdx.x == (blockDim.x - 1)) && (blockIdx.x == (gridDim.x - 1)));
+	is_inside_borders = !(is_on_low_border || is_on_up_border);
+
+	e.x = -(1 / (1 + is_inside_borders)) * GradientComponent(


(1 / (1 + is_inside_borders))
Do I understand correctly that this should be 1/2 for a point inside the region and 1 for a point on the border?
I'm worried that, because of rounding, it currently evaluates to either 1 or 0.
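The concern can be checked on the host, without a GPU. The sketch below assumes `is_inside_borders` is a `bool` (or an integer holding 0 or 1); the helper names `weight_integer` and `weight_double` are hypothetical and do not appear in the PR.

```cpp
// Form used in the diff: every operand is an integer, so the division
// truncates. An inner point gives 1 / 2 == 0, a border point gives 1 / 1 == 1.
int weight_integer(bool is_inside_borders) {
    return 1 / (1 + is_inside_borders);
}

// Possible fix (an assumption, not the author's code): floating-point
// literals force the division to be done in double precision.
double weight_double(bool is_inside_borders) {
    return 1.0 / (1.0 + is_inside_borders);
}
```

With the integer form, the field component at every inner point is multiplied by 0. The floating-point form gives the 1/2 and 1 weights the question describes.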

noooway · 2018-12-21T12:11:37Z

FieldSolver.cu

+
+	offset = d_n_nodes[0].x;
+	is_on_up_border = ((threadIdx.x == 0) && (blockIdx.x == 0));
+	is_on_low_border = ((threadIdx.x == (blockDim.x - 1)) && (blockIdx.x == (gridDim.x - 1)));


is_on_up_border = ((threadIdx.x == 0) && (blockIdx.x == 0));
is_on_low_border = ((threadIdx.x == (blockDim.x - 1)) && (blockIdx.x == (gridDim.x - 1)));

For e.y these lines are the same as above for the e.x case. Further down, for e.z, they are unchanged as well.
Is that really how it should be? Shouldn't the indexing be changed to something like ((threadIdx.y == 0) && (blockIdx.y == 0))?
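A possible way to make the border test depend on the axis is sketched below as plain host code. `Idx3` stands in for CUDA's built-in `threadIdx`, `blockIdx`, `blockDim` and `gridDim`. The `axis` parameter and the `component` helper are assumptions for illustration, not code from the PR.

```cpp
// Host-side stand-in for CUDA's 3-component built-in index variables.
struct Idx3 { unsigned x, y, z; };

// Select one component: axis 0 -> x, 1 -> y, 2 -> z.
unsigned component(const Idx3& v, int axis) {
    return axis == 0 ? v.x : (axis == 1 ? v.y : v.z);
}

// True when the thread handles the first node along `axis`.
bool is_on_up_border(int axis, const Idx3& thread_idx, const Idx3& block_idx) {
    return component(thread_idx, axis) == 0 && component(block_idx, axis) == 0;
}

// True when the thread handles the last node along `axis`.
bool is_on_low_border(int axis, const Idx3& thread_idx, const Idx3& block_idx,
                      const Idx3& block_dim, const Idx3& grid_dim) {
    return component(thread_idx, axis) == component(block_dim, axis) - 1
        && component(block_idx, axis) == component(grid_dim, axis) - 1;
}
```

For a thread with `threadIdx = (3, 0, 0)` in block `(1, 0, 0)` of a 2x2x2 grid of 4x4x4 blocks, the x-based test reports the low border while the y-based test reports the up border. Reusing the x flags for e.y would therefore treat this node incorrectly.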

Could you explain once more how the indexing is organized?

The number of nodes in spat_mesh is read from the config.

To run on the GPU, all nodes are split into chunks by these functions:

dim3 SpatialMeshCu::GetThreads() {
    return dim3(16, 16, n_nodes.z / 16);
}

dim3 SpatialMeshCu::GetBlocks(dim3 nThreads) {
    return dim3(n_nodes.x / nThreads.x, n_nodes.y / nThreads.y, 16);
}

Then we launch the kernel on the GPU with this number of blocks and threads.
Each thread then processes only its "own" element of spat_mesh.
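Because both functions use integer division, this launch covers the whole mesh only for certain node counts. A minimal host-side sketch of that check follows. `Dim3` is a stand-in for CUDA's `dim3`, and `launch_covers_mesh` and `threads_per_block` are hypothetical helpers that are not part of the PR.

```cpp
// Host-side stand-in for CUDA's dim3.
struct Dim3 { int x, y, z; };

// Same arithmetic as SpatialMeshCu::GetThreads() quoted above.
Dim3 get_threads(const Dim3& n_nodes) {
    return {16, 16, n_nodes.z / 16};
}

// Same arithmetic as SpatialMeshCu::GetBlocks() quoted above.
Dim3 get_blocks(const Dim3& n_nodes, const Dim3& n_threads) {
    return {n_nodes.x / n_threads.x, n_nodes.y / n_threads.y, 16};
}

// True when threads * blocks equals the node count along every axis.
bool launch_covers_mesh(const Dim3& n_nodes) {
    Dim3 t = get_threads(n_nodes);
    if (t.z == 0) return false;  // fewer than 16 nodes along z: no threads at all
    Dim3 b = get_blocks(n_nodes, t);
    return t.x * b.x == n_nodes.x
        && t.y * b.y == n_nodes.y
        && t.z * b.z == n_nodes.z;
}

// Threads per block requested by this configuration.
int threads_per_block(const Dim3& n_nodes) {
    Dim3 t = get_threads(n_nodes);
    return t.x * t.y * t.z;
}
```

A 32x32x32 mesh is covered exactly, while 30 nodes along x leaves the last 14 nodes without a thread. Note also that 128 nodes along z requests 2048 threads per block, which is above the 1024-thread limit of current NVIDIA GPUs.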

To map this onto a one-dimensional array, the following function is used:

__device__ int GetIdx() {
    //int xStepthread = 1;
    int xStepBlock = blockDim.x;
    int yStepThread = d_n_nodes[0].x;
    int yStepBlock = yStepThread * blockDim.y;
    int zStepThread = d_n_nodes[0].x * d_n_nodes[0].y;
    int zStepBlock = zStepThread * blockDim.z;
    return (threadIdx.x + blockIdx.x * xStepBlock)
        + (threadIdx.y * yStepThread + blockIdx.y * yStepBlock)
        + (threadIdx.z * zStepThread + blockIdx.z * zStepBlock);
}

That is, each GPU thread is assigned its own point of spat_mesh, and the thread number can be converted to a spat_mesh index with the formula spat_mesh_idx = threadIdx.x + blockIdx.x * blockDim.x (or something like that), and similarly for the y and z components?
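The question can be checked by replaying the GetIdx() arithmetic on the host and comparing it with the per-axis formula. In this sketch `Dim3` replaces CUDA's built-in variables and `n_nodes` replaces `d_n_nodes[0]`. `mesh_coord` and `flat_idx` are hypothetical helper names.

```cpp
// Host-side stand-in for CUDA's dim3 / uint3.
struct Dim3 { int x, y, z; };

// Host copy of the GetIdx() arithmetic; the built-in variables are passed in.
int get_idx(const Dim3& thread_idx, const Dim3& block_idx,
            const Dim3& block_dim, const Dim3& n_nodes) {
    int x_step_block = block_dim.x;
    int y_step_thread = n_nodes.x;
    int y_step_block = y_step_thread * block_dim.y;
    int z_step_thread = n_nodes.x * n_nodes.y;
    int z_step_block = z_step_thread * block_dim.z;
    return (thread_idx.x + block_idx.x * x_step_block)
         + (thread_idx.y * y_step_thread + block_idx.y * y_step_block)
         + (thread_idx.z * z_step_thread + block_idx.z * z_step_block);
}

// Per-axis mesh coordinate: thread index plus block offset.
int mesh_coord(int thread, int block, int block_dim) {
    return thread + block * block_dim;
}

// Row-major flattening of (x, y, z), with x varying fastest.
int flat_idx(int x, int y, int z, const Dim3& n_nodes) {
    return x + y * n_nodes.x + z * n_nodes.x * n_nodes.y;
}
```

For every thread, `get_idx` equals `flat_idx` applied to the three `mesh_coord` values. So under these assumptions the answer is yes: each axis follows thread + block * blockDim, and the array index is x + y * n_x + z * n_x * n_y.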


noooway · 2018-12-29T11:04:52Z

SpatialMeshCu.cu

+	int idx = zIdx + threadIdx.x * xStepThread + blockIdx.x * xStepBlock
+		+ threadIdx.y * yStepThread + blockIdx.y * yStepBlock;
+	potential[idx] = ((double)(1 - blockIdx.z)) * d_boundary[NEAR]
+		+ (blockIdx.z * d_boundary[FAR]);


All these optimizations are great, of course. But right now there is a bug somewhere in how the boundary conditions are set, and I can't immediately tell where exactly.

noooway · 2018-12-30T10:00:06Z

SpatialMeshCu.cu

+	blocks = dim3(n_nodes.x / 4, n_nodes.y / 4, 2);
+	SetBoundaryConditionOrthoZ <<< blocks, threads >>> (d_potential);
+	cuda_status = cudaDeviceSynchronize();
+	cuda_status_check(cuda_status, debug_message);


If I understood the original idea behind these functions correctly, they should be called either with 2 threads and 1 block along Z, or with 1 thread and 2 blocks.
I suspect that when both the blocks and the threads are set to 2, the result will be wrong.

threads = dim3(4, 4, 2);
blocks = dim3(n_nodes.x / 4, n_nodes.z / 4, 2);
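The suspicion can be illustrated by counting how many distinct positions a launch creates along the third dimension. The sketch assumes a two-sided boundary needs exactly two of them, one per side. `z_slots`, `z_launch_index` and `matches_two_sides` are hypothetical helpers, not functions from the PR.

```cpp
// Number of distinct (threadIdx.z, blockIdx.z) pairs created by a launch.
int z_slots(int threads_z, int blocks_z) {
    return threads_z * blocks_z;
}

// Position along z under the usual thread + block * blockDim convention.
int z_launch_index(int thread_z, int block_z, int threads_z) {
    return thread_z + block_z * threads_z;
}

// A two-sided boundary (for example NEAR and FAR) needs exactly two slots.
bool matches_two_sides(int threads_z, int blocks_z) {
    return z_slots(threads_z, blocks_z) == 2;
}
```

With 2 threads and 2 blocks there are four slots, indexed 0 to 3. Depending on how the kernel derives the node index from them, two of those slots either repeat work or write to nodes that are not on the boundary.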

Halfmuh added 11 commits December 6, 2018 03:59

SpatialMeshCu

3ace667

code for memory initialization and for setting initial and boundary conditions

MAke try

7ec2ad5

fix sythax

06feba5

fix sythax

80158c5

spatial mesh cuda hdf5 read write

a7914e3

cleaning unnecessary methods

272946a

Cuda field solver

a6a8552

no inner regions, no convergence

cleaning

a7dabae

fixes+ convergence

6fd3adb

Spatial mesh + Field solver realised on cuda

9d3938c

particle charge map interaction with mesh excluded inner regions excluded

simple set device

fc27c97

noooway changed the base branch from master to devCuda December 18, 2018 20:13

noooway added 2 commits December 19, 2018 00:15

In FieldSolver ComputePhiNext: neibhour -> neighbour

9dfe305

In FieldSolver.cu minor formatting fixes

3de46aa

noooway reviewed Dec 18, 2018

FieldSolver.cu

noooway reviewed Dec 18, 2018

FieldSolver.cu (outdated)

This was referenced Dec 20, 2018

In GPU version remove assert-convergence check from FieldSolver #19

Open

In GPU version add additional checks to number of spatial nodes provided by user in config #20

Open

noooway reviewed Dec 21, 2018


noooway and others added 10 commits December 22, 2018 13:38

In main.cpp fix undeclared cudaStatus (cudaError_t status; -> `cudaError_t cudaStatus;`)

35729a8

PhiSolver fix jacobi - cuda part

8b99ef1

merge solver fixes

a5a237a

explicit double Z grad component on cuda

9c564ec

memory access violation fix

37f105f

cuda run params thread.x/y/z=4

583add7

spatial mesh debug message extended

c1db6d9

constants copying fix

c10d9d7

temp border double variables

b762ae0

debug log extended: copying constants success

9f7a5d7

noooway added 2 commits December 29, 2018 12:32

Reminder to determine number of threads dynamically

04cf213

Explicit functions to map between thread, volume and array indexes

505d4a6

noooway reviewed Dec 29, 2018

SpatialMeshCu.cu (outdated)

Attempt to fix boundary conditions

9e5656b

noooway reviewed Dec 29, 2018


noooway added 15 commits December 29, 2018 14:32

Rename vol_idx -> mesh_idx

b4a2d69

Rewrite SetBoundaryConditionOrthoX

46ab2ac

Fix n of blocks in set_boundary_conditions

802b392

Use blockIdx instead of threadIdx to determine boundary side

765ccf9

Remove d_n_nodes from SetBoundaryConditionsX argument

a0d122e

Change Makefile to work in GoogleColab

7be0487

Attemp to simplify Makefile

15db6fc

Fix include guards for SpatialMeshCu.cuh

638c2ae

Try to distinguish between system and local includes

d44be2b

Remove -fstack-protector-strong option for NVCC

8128935

Remove -Wformat option from nvcc

9f14719

Remove -Werror=format-security from nvcc

1c4b412

Remove -Wall from nvcc

2b34b0c

Distinguish between system and local includes

082377e

Remove c_boundary in copy_boundary_to_device

766d6a1

noooway reviewed Dec 30, 2018


noooway and others added 9 commits December 30, 2018 13:10

Attempt to simplify boundary conditions setting

52e6c69

wrong arguments order fix

e47f92b

Merge remote-tracking branch 'origin/devCuda' into devCuda

849cd5d

Merge branch 'devCuda' into DebugSpatMeshCu

63e9ad4

PhiNext computation Signs

a8113d1

explicit double boundary conditions

f0141fa

1 jacobi iteration

c83d23e

jacobi iter 150 again

c447ea8

Merge branch 'DebugSpatMeshCu' into devCuda

d9ae6e7
