-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathspec.txt
1650 lines (1650 loc) · 106 KB
/
spec.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
2. Directives
852 This chapter describes the syntax and behavior of the OpenACC directives. In C and C++, Open853 ACC directives are specified using the #pragma mechanism provided by the language. In Fortran,
854 OpenACC directives are specified using special comments that are identified by a unique sentinel.
855 Compilers will typically ignore OpenACC directives if support is disabled or not provided.
856 2.1 Directive Format
857 In C and C++, OpenACC directives are specified with the #pragma mechanism. The syntax of an
858 OpenACC directive is:
859 #pragma acc directive-name [clause-list] new-line
860 Each directive starts with #pragma acc. The remainder of the directive follows the C and C++
861 conventions for pragmas. Whitespace may be used before and after the #; whitespace may be
862 required to separate words in a directive. Preprocessing tokens following the #pragma acc are
863 subject to macro replacement. Directives are case-sensitive.
864 In Fortran, OpenACC directives are specified in free-form source files as
865 !$acc directive-name [clause-list]
866 The comment prefix (!) may appear in any column, but may only be preceded by whitespace (spaces
867 and tabs). The sentinel (!$acc) must appear as a single word, with no intervening whitespace.
868 Line length, whitespace, and continuation rules apply to the directive line. Initial directive lines
869 must have whitespace after the sentinel. Continued directive lines must have an ampersand (&) as
870 the last nonblank character on the line, prior to any comment placed in the directive. Continuation
871 directive lines must begin with the sentinel (possibly preceded by whitespace) and may have an
872 ampersand as the first non-whitespace character after the sentinel. Comments may appear on the
873 same line as a directive, starting with an exclamation point and extending to the end of the line. If
874 the first nonblank character after the sentinel is an exclamation point, the line is ignored.
875 In Fortran fixed-form source files, OpenACC directives are specified as one of
876 !$acc directive-name [clause-list]
877 c$acc directive-name [clause-list]
878 *$acc directive-name [clause-list]
879 The sentinel (!$acc, c$acc, or *$acc) must occupy columns 1-5. Fixed form line length,
880 whitespace, continuation, and column rules apply to the directive line. Initial directive lines must
881 have a space or zero in column 6, and continuation directive lines must have a character other than
882 a space or zero in column 6. Comments may appear on the same line as a directive, starting with an
883 exclamation point on or after column 7 and continuing to the end of the line.
884 In Fortran, directives are case-insensitive. Directives cannot be embedded within continued state885 ments, and statements must not be embedded within continued directives. In this document, free
886 form is used for all Fortran OpenACC directive examples.
887 Only one directive-name can appear per directive, except that a combined directive name is consid888 ered a single directive-name.
27
The OpenACC
R API Version 3.3 2.3. Internal Control Variables
889 The order in which clauses appear is not significant unless otherwise specified. A program must not
890 depend on the order of evaluation of expressions in clause arguments or on any side effects of the
891 evaluations. (See examples below.) Clauses may be repeated unless otherwise specified.
892 H H
893 Examples
894
895 • In the following example, the order and number of evaluations of ++i and calls to foo()
896 and bar() are unspecified.
897 #pragma acc parallel \
898 num_gangs(foo(++i)) \
899 num_workers(bar(++i)) \
900 async(foo(++i))
901 { ... }
902 See Section 2.5.1 for the parallel construct.
903 • In the following example, if the implementation knows that array is not present in the
904 current device memory, it may omit calling size().
905 #pragma acc update \
906 device(array[0:size()])
907 if_present
908 See Section 2.14.4 for the update directive.
909 N N
910
911 2.2 Conditional Compilation
912 The _OPENACC macro name is defined to have a value yyyymm where yyyy is the year and mm is
913 the month designation of the version of the OpenACC directives supported by the implementation.
914 This macro must be defined by a compiler only when OpenACC directives are enabled. The version
915 described here is 202211.
916 2.3 Internal Control Variables
917 An OpenACC implementation acts as if there are internal control variables (ICVs) that control the
918 behavior of the program. These ICVs are initialized by the implementation, and may be given
919 values through environment variables and through calls to OpenACC API routines. The program
920 can retrieve values through calls to OpenACC API routines.
921 The ICVs are:
922 • acc-current-device-type-var - controls which type of device is used.
923 • acc-current-device-num-var - controls which device of the selected type is used.
924 • acc-default-async-var - controls which asynchronous queue is used when none appears in an
925 async clause.
28
The OpenACC
R API Version 3.3 2.4. Device-Specific Clauses
926 2.3.1 Modifying and Retrieving ICV Values
927 The following table shows environment variables or procedures to modify the values of the internal
928 control variables, and procedures to retrieve the values:
ICV Ways to modify values Way to retrieve value
acc-current-device-type-var acc_set_device_type acc_get_device_type
set device_type
init device_type
ACC_DEVICE_TYPE
acc-current-device-num-var acc_set_device_num acc_get_device_num
set device_num
init device_num
ACC_DEVICE_NUM
acc-default-async-var acc_set_default_async acc_get_default_async
set default_async
929
930 The initial values are implementation-defined. After initial values are assigned, but before any
931 OpenACC construct or API routine is executed, the values of any environment variables that were
932 set by the user are read and the associated ICVs are modified accordingly. There is one copy of
933 each ICV for each host thread that is not generated by a compute construct. For threads that are
934 generated by a compute construct the initial value for each ICV is inherited from the local thread.
935 The behavior for each ICV is as if there is a copy for each thread. If an ICV is modified, then a
936 unique copy of that ICV must be created for the modifying thread.
937 2.4 Device-Specific Clauses
938 OpenACC directives can specify different clauses or clause arguments for different devices using
939 the device_type clause. Clauses that precede any device_type clause are default clauses.
940 Clauses that follow a device_type clause up to the end of the directive or up to the next
941 device_type clause are device-specific clausesfor the device types specified in the device_type
942 argument. For each directive, only certain clauses may be device-specific clauses. If a directive has
943 at least one device-specific clause, it is device-dependent, and otherwise it is device-independent.
944 The argument to the device_type clause is a comma-separated list of one or more device ar945 chitecture name identifiers, or an asterisk. An asterisk indicates all device types that are not named
946 in any other device_type clause on that directive. A single directive may have one or several
947 device_type clauses. The device_type clauses may appear in any order.
948 Except where otherwise noted, the rest of this document describes device-independent directives, on
949 which all clauses apply when compiling for any device type. When compiling a device-dependent
950 directive for a particular device type, the directive is treated as if the only clauses that appear are (a)
951 the clauses specific to that device type and (b) all default clauses for which there are no like-named
952 clauses specific to that device type. If, for any device type, the resulting directive is nonconforming,
953 then the original directive is nonconforming.
954 The supported device types are implementation-defined. Depending on the implementation and the
955 compiling environment, an implementation may support only a single device type, or may support
956 multiple device types but only one at a time, or may support multiple device types in a single
957 compilation.
29
The OpenACC
R API Version 3.3 2.4. Device-Specific Clauses
958 A device architecture name may be generic, such as a vendor, or more specific, such as a partic959 ular generation of device; see Appendix A Recommendations for Implementers for recommended
960 names. When compiling for a particular device, the implementation will use the clauses associated
961 with the device_type clause that specifies the most specific architecture name that applies for
962 this device; clauses associated with any other device_type clause are ignored. In this context,
963 the asterisk is the least specific architecture name.
964 Syntax
965 The syntax of the device_type clause is
966 device_type( * )
967 device_type( device-type-list )
968
969 The device_type clause may be abbreviated to dtype.
970 H H
971 Examples
972
973 • On the following directive, worker appears as a device-specific clause for devices of type
974 foo, but gang appears as a default clause and so applies to all device types, including foo.
975 #pragma acc loop gang device_type(foo) worker
976 • The first directive below is identical to the previous directive except that loop is replaced
977 with routine. Unlike loop, routine does not permit gang to appear with worker,
978 but both apply for device type foo, so the directive is nonconforming. The second directive
979 below is conforming because gang there applies to all device types except foo.
980 // nonconforming: gang and worker not permitted together
981 #pragma acc routine gang device_type(foo) worker
982
983 // conforming: gang and worker for different device types
984 #pragma acc routine device_type(foo) worker \
985 device_type(*) gang
986 • On the directive below, the value of num_gangs is 4 for device type foo, but it is 2 for all
987 other device types, including bar. That is, foo has a device-specific num_gangs clause,
988 so the default num_gangs clause does not apply to foo.
989 !$acc parallel num_gangs(2) &
990 !$acc device_type(foo) num_gangs(4) &
991 !$acc device_type(bar) num_workers(8)
992 • The directive below is the same as the previous directive except that num_gangs(2) has
993 moved after device_type(*) and so now does not apply to foo or bar.
994 !$acc parallel device_type(*) num_gangs(2) &
995 !$acc device_type(foo) num_gangs(4) &
996 !$acc device_type(bar) num_workers(8)
997 N N
998
30
The OpenACC
R API Version 3.3 2.5. Compute Constructs
999 2.5 Compute Constructs
1000 Compute constructs indicate code that should be executed on the current device. It is implementa1001 tion defined how users specify for which accelerators that code is compiled and whether it is also
1002 compiled for the host.
1003 2.5.1 Parallel Construct
1004 Summary
1005 This fundamental construct starts parallel execution on the current device.
1006 Syntax
1007 In C and C++, the syntax of the OpenACC parallel construct is
1008 #pragma acc parallel [clause-list] new-line
1009 structured block
1010
1011 and in Fortran, the syntax is
1012 !$acc parallel [ clause-list ]
1013 structured block
1014 !$acc end parallel
1015 or
1016 !$acc parallel [ clause-list ]
1017 block construct
1018 [!$acc end parallel]
1019 where clause is one of the following:
1020 async [ ( int-expr ) ]
1021 wait [ ( int-expr-list ) ]
1022 num_gangs( int-expr-list )
1023 num_workers( int-expr )
1024 vector_length( int-expr )
1025 device_type( device-type-list )
1026 if( condition )
1027 self [ ( condition ) ]
1028 reduction( operator : var-list )
1029 copy( var-list )
1030 copyin( [ readonly: ] var-list )
1031 copyout( [ zero: ] var-list )
1032 create( [ zero: ] var-list )
1033 no_create( var-list )
1034 present( var-list )
1035 deviceptr( var-list )
1036 attach( var-list )
1037 private( var-list )
1038 firstprivate( var-list )
1039 default( none | present )
31
The OpenACC
R API Version 3.3 2.5. Compute Constructs
1040 Description
1041 When the program encounters an accelerator parallel construct, one or more gangs of workers
1042 are created to execute the accelerator parallel region. The number of gangs, and the number of
1043 workers in each gang and the number of vector lanes per worker remain constant for the duration of
1044 that parallel region. Each gang begins executing the code in the structured block in gang-redundant
1045 mode even if there is only a single gang. This means that code within the parallel region, but outside
1046 of a loop construct with gang-level worksharing, will be executed redundantly by all gangs.
1047 One worker in each gang begins executing the code in the structured block of the construct. Note:
1048 Unless there is a loop construct within the parallel region, all gangs will execute all the code within
1049 the region redundantly.
1050 If the async clause does not appear, there is an implicit barrier at the end of the accelerator parallel
1051 region, and the execution of the local thread will not proceed until all gangs have reached the end
1052 of the parallel region.
1053 The copy, copyin, copyout, create, no_create, present, deviceptr, and attach
1054 data clauses are described in Section 2.7 Data Clauses. The private and firstprivate
1055 clauses are described in Sections 2.5.13 and Sections 2.5.14. The device_type clause is de1056 scribed in Section 2.4 Device-Specific Clauses. Implicitly determined data attributes are described
1057 in Section 2.6.2. Restrictions are described in Section 2.5.4.
1058 2.5.2 Serial Construct
1059 Summary
1060 This construct defines a region of the program that is to be executed sequentially on the current
1061 device. The behavior of the serial construct is the same as that of the parallel construct
1062 except that it always executes with a single gang of a single worker with a vector length of one.
1063 Note: The serial construct may be used to execute sequential code on the current device,
1064 which removes the need for data movement when the required data is already present on the device.
1065 Syntax
1066 In C and C++, the syntax of the OpenACC serial construct is
1067 #pragma acc serial [clause-list] new-line
1068 structured block
1069
1070 and in Fortran, the syntax is
1071 !$acc serial [ clause-list ]
1072 structured block
1073 !$acc end serial
1074 or
1075 !$acc serial [ clause-list ]
1076 block construct
1077 [!$acc end serial]
1078 where clause is as for the parallel construct except that the num_gangs, num_workers, and
1079 vector_length clauses are not permitted.
32
The OpenACC
R API Version 3.3 2.5. Compute Constructs
1080 2.5.3 Kernels Construct
1081 Summary
1082 This construct defines a region of the program that is to be compiled into a sequence of kernels for
1083 execution on the current device.
1084 Syntax
1085 In C and C++, the syntax of the OpenACC kernels construct is
1086 #pragma acc kernels [ clause-list ] new-line
1087 structured block
1088
1089 and in Fortran, the syntax is
1090 !$acc kernels [ clause-list ]
1091 structured block
1092 !$acc end kernels
1093 or
1094 !$acc kernels [ clause-list ]
1095 block construct
1096 [!$acc end kernels]
1097 where clause is one of the following:
1098 async [ ( int-expr ) ]
1099 wait [ ( int-expr-list ) ]
1100 num_gangs( int-expr )
1101 num_workers( int-expr )
1102 vector_length( int-expr )
1103 device_type( device-type-list )
1104 if( condition )
1105 self [ ( condition ) ]
1106 copy( var-list )
1107 copyin( [ readonly: ] var-list )
1108 copyout( [ zero: ] var-list )
1109 create( [ zero: ] var-list )
1110 no_create( var-list )
1111 present( var-list )
1112 deviceptr( var-list )
1113 attach( var-list )
1114 default( none | present )
1115 Description
1116 The compiler will split the code in the kernels region into a sequence of accelerator kernels. Typi1117 cally, each loop nest will be a distinct kernel. When the program encounters a kernels construct,
1118 it will launch the sequence of kernels in order on the device. The number and configuration of gangs
1119 of workers and vector length may be different for each kernel.
33
The OpenACC
R API Version 3.3 2.5. Compute Constructs
1120 If the async clause does not appear, there is an implicit barrier at the end of the kernels region,
1121 and the local thread execution will not proceed until the entire sequence of kernels has completed
1122 execution.
1123 The copy, copyin, copyout, create, no_create, present, deviceptr, and attach
1124 data clauses are described in Section 2.7 Data Clauses. The device_type clause is described
1125 in Section 2.4 Device-Specific Clauses. Implicitly determined data attributes are described in Sec1126 tion 2.6.2. Restrictions are described in Section 2.5.4.
1127 2.5.4 Compute Construct Restrictions
1128 The following restrictions apply to all compute constructs:
1129 • A program may not branch into or out of a compute construct.
1130 • Only the async, wait, num_gangs, num_workers, and vector_length clauses
1131 may follow a device_type clause.
1132 • At most one if clause may appear. In Fortran, the condition must evaluate to a scalar logical
1133 value; in C or C++, the condition must evaluate to a scalar integer value.
1134 • At most one default clause may appear, and it must have a value of either none or
1135 present.
1136 • A reduction clause may not appear on a parallel construct with a num_gangs clause
1137 that has more than one argument.
1138 2.5.5 Compute Construct Errors
1139 • An acc_error_wrong_device_type error is issued if the compute construct was not
1140 compiled for the current device type. This includes the case when the current device is the
1141 host multicore.
1142 • An acc_error_device_type_unavailable error is issued if no device of the cur1143 rent device type is available.
1144 • An acc_error_device_unavailable error is issued if the current device is not avail1145 able.
1146 • An acc_error_device_init error is issued if the current device cannot be initialized.
1147 • An acc_error_execution error is issued if the execution of the compute construct on
1148 the current device type fails and the failure can be detected.
1149 • Explicit or implicitly determined data attributes can cause an error to be issued; see Sec1150 tion 2.7.3.
1151 • An async or wait clause can cause an error to be issued; see Sections 2.16.1 and 2.16.2.
1152 See Section 5.2.2.
1153 2.5.6 if clause
1154 The if clause is optional.
34
The OpenACC
R API Version 3.3 2.5. Compute Constructs
1155 When the condition in the if clause evaluates to true., the region will execute on the current device.
1156 When the condition in the if clause evaluates to false, the local thread will execute the region.
1157 2.5.7 self clause
1158 The self clause is optional.
1159 The self clause may have a single condition-argument. If the condition-argument is not present it
1160 is assumed to evaluate to true. When both an if clause and a self clause appear and the condition
1161 in the if clause evaluates to false, the self clause has no effect.
1162 When the condition evaluates to true, the region will execute on the local device. When the condition
1163 in the self clause evaluates to false, the region will execute on the current device.
1164 2.5.8 async clause
1165 The async clause is optional; see Section 2.16 Asynchronous Behavior for more information.
1166 2.5.9 wait clause
1167 The wait clause is optional; see Section 2.16 Asynchronous Behavior for more information.
1168 2.5.10 num gangs clause
1169 The num_gangs clause is allowed on the parallel and kernels constructs. On a parallel
1170 construct, it may have one, two, or three arguments. The values of the integer expressions define
1171 the number of parallel gangs along dimensions one, two, and three that will execute the parallel
1172 region. If it has fewer than three arguments, the missing values are treated as having the value 1.
1173 The total number of gangs must be at least 1 and is the product of the values of the arguments. On a
1174 kernels construct, the num_gangs clause must have a single argument, the value of which will
1175 define the number of parallel gangs that will execute each kernel created for the kernels region.
1176 If the num_gangs clause does not appear, an implementation-defined default will be used which
1177 may depend on the code within the construct. The implementation may use a lower value than
1178 specified based on limitations imposed by the target architecture.
1179 2.5.11 num workers clause
1180 The num_workers clause is allowed on the parallel and kernels constructs. The value
1181 of the integer expression defines the number of workers within each gang that will be active after
1182 a gang transitions from worker-single mode to worker-partitioned mode. If the clause does not
1183 appear, an implementation-defined default will be used; the default value may be 1, and may be
1184 different for each parallel construct or for each kernel created for a kernels construct. The
1185 implementation may use a different value than specified based on limitations imposed by the target
1186 architecture.
1187 2.5.12 vector length clause
1188 The vector_length clause is allowed on the parallel and kernels constructs. The value
1189 of the integer expression defines the number of vector lanes that will be active after a worker transi1190 tions from vector-single mode to vector-partitioned mode. This clause determines the vector length
1191 to use for vector or SIMD operations. If the clause does not appear, an implementation-defined
35
The OpenACC
R API Version 3.3 2.5. Compute Constructs
1192 default will be used. This vector length will be used for loop constructs annotated with the vector
1193 clause, as well as loops automatically vectorized by the compiler. The implementation may use a
1194 different value than specified based on limitations imposed by the target architecture.
1195 2.5.13 private clause
1196 The private clause is allowed on the parallel and serial constructs; it declares that a copy
1197 of each item on the list will be created for each gang in all dimensions.
1198 Restrictions
1199 • See Section 2.17.1 Optional Arguments for discussion of Fortran optional arguments in private
1200 clauses.
1201 2.5.14 firstprivate clause
1202 The firstprivate clause is allowed on the parallel and serial constructs; it declares that
1203 a copy of each item on the list will be created for each gang, and that the copy will be initialized with
1204 the value of that item on the local thread when a parallel or serial construct is encountered.
1205 Restrictions
1206 • See Section 2.17.1 Optional Arguments for discussion of Fortran optional arguments in
1207 firstprivate clauses.
1208 2.5.15 reduction clause
1209 The reduction clause is allowed on the parallel and serial constructs. It specifies a
1210 reduction operator and one or more vars. It implies copy clauses as described in Section 2.6.2. For
1211 each reduction var, a private copy is created for each parallel gang and initialized for that operator.
1212 At the end of the region, the values for each gang are combined using the reduction operator, and
1213 the result combined with the value of the original var and stored in the original var. If the reduction
1214 var is an array or subarray, the array reduction operation is logically equivalent to applying that
1215 reduction operation to each element of the array or subarray individually. If the reduction var
1216 is a composite variable, the reduction operation is logically equivalent to applying that reduction
1217 operation to each member of the composite variable individually. The reduction result is available
1218 after the region.
1219 The following table lists the operators that are valid and the initialization values; in each case, the
1220 initialization value will be cast into the data type of the var. For max and min reductions, the
1221 initialization values are the least representable value and the largest representable value for that data
1222 type, respectively. At a minimum, the supported data types include Fortran logical as well as
1223 the numerical data types in C (e.g., _Bool, char, int, float, double, float _Complex,
1224 double _Complex), C++ (e.g., bool, char, wchar_t, int, float, double), and Fortran
1225 (e.g., integer, real, double precision, complex). However, for each reduction operator,
1226 the supported data types include only the types permitted as operands to the corresponding operator
1227 in the base language where (1) for max and min, the corresponding operator is less-than and (2) for
1228 other operators, the operands and the result are the same type.
36
The OpenACC
R API Version 3.3 2.6. Data Environment
C and C++ Fortran
operator initialization
value
operator initialization
value
+ 0 + 0
* 1 * 1
max least max least
min largest min largest
& ˜0 iand all bits on
| 0 ior 0
ˆ 0 ieor 0
&& 1 .and. .true.
|| 0 .or. .false.
.eqv. .true.
.neqv. .false.
1229
1230 Restrictions
1231 • A var in a reduction clause must be a scalar variable name, an aggregate variable name,
1232 an array element, or a subarray (refer to Section 2.7.1).
1233 • If the reduction var is an array element or a subarray, accessing the elements of the array
1234 outside the specified index range results in unspecified behavior.
1235 • The reduction var may not be a member of a composite variable.
1236 • If the reduction var is a composite variable, each member of the composite variable must be
1237 a supported datatype for the reduction operation.
1238 • See Section 2.17.1 Optional Arguments for discussion of Fortran optional arguments in
1239 reduction clauses.
1240 2.5.16 default clause
1241 The default clause is optional. At most one default clause may appear. It adjusts what
1242 data attributes are implicitly determined for variables used in the compute construct as described in
1243 Section 2.6.2.
1244 2.6 Data Environment
1245 This section describes the data attributes for variables. The data attributes for a variable may be
1246 predetermined, implicitly determined, or explicitly determined. Variables with predetermined data
1247 attributes may not appear in a data clause that conflicts with that data attribute. Variables with
1248 implicitly determined data attributes may appear in a data clause that overrides the implicit attribute.
1249 Variables with explicitly determined data attributes are those which appear in a data clause on a
1250 data construct, a compute construct, or a declare directive. See Section A.3.3 for recommended
1251 diagnostics related to data attributes.
1252 OpenACC supports systems with accelerators that have discrete memory from the host, systems
1253 with accelerators that share memory with the host, as well as systems where an accelerator shares
1254 some memory with the host but also has some discrete memory that is not shared with the host.
1255 In the first case, no data is in shared memory. In the second case, all data is in shared memory.
1256 In the third case, some data may be in shared memory and some data may be in discrete memory,
37
The OpenACC
R API Version 3.3 2.6. Data Environment
1257 although a single array or aggregate data structure must be allocated completely in shared or discrete
1258 memory. When a nested OpenACC construct is executed on the device, the default target device for
1259 that construct is the same device on which the encountering accelerator thread is executing. In that
1260 case, the target device shares memory with the encountering thread.
1261 2.6.1 Variables with Predetermined Data Attributes
1262 The loop variable in a C for statement or Fortran do statement that is associated with a loop
1263 directive is predetermined to be private to each thread that will execute each iteration of the loop.
1264 Loop variables in Fortran do statements within a compute construct are predetermined to be private
1265 to the thread that executes the loop.
1266 Variables declared in a C block or Fortran block construct that is executed in vector-partitioned
1267 mode are private to the thread associated with each vector lane. Variables declared in a C block
1268 or Fortran block construct that is executed in worker-partitioned vector-single mode are private to
1269 the worker and shared across the threads associated with the vector lanes of that worker. Variables
1270 declared in a C block or Fortran block construct that is executed in worker-single mode are private
1271 to the gang and shared across the threads associated with the workers and vector lanes of that gang.
1272 A procedure called from a compute construct will be annotated as seq, vector, worker, or
1273 gang, as described Section 2.15 Procedure Calls in Compute Regions. Variables declared in seq
1274 routine are private to the thread that made the call. Variables declared in vector routine are private
1275 to the worker that made the call and shared across the threads associated with the vector lanes of
1276 that worker. Variables declared in worker or gang routine are private to the gang that made the
1277 call and shared across the threads associated with the workers and vector lanes of that gang.
1278 2.6.2 Variables with Implicitly Determined Data Attributes
1279 When implicitly determining data attributes on a compute construct, the following clauses are visi1280 ble and variable accesses are exposed to the compute construct:
1281 • Visible default clause: The nearest default clause appearing on the compute construct
1282 or a lexically containing data construct.
1283 • Visible data clause: Any data clause on the compute construct, a lexically containing data
1284 construct, or a visible declare directive.
1285 • Exposed variable access: Any access to the data or address of a variable at a point within the
1286 compute construct where the variable is not private to a scope lexically enclosed within the
1287 compute construct.
1288 Note: In the argument of C’s sizeof operator, the appearance of a variable is not an exposed
1289 access because neither its data nor its address is accessed. In the argument of a reduction
1290 clause on an enclosed loop construct, the appearance of a variable that is not otherwise
1291 privatized is an exposed access to the original variable.
1292 On a compute or combined construct, if a variable appears in a reduction clause but no other
1293 data clause, it is treated as if it also appears in a copy clause. Otherwise, for any variable, the
1294 compiler will implicitly determine its data attribute on a compute construct if all of the following
1295 conditions are met:
1296 • There is no default(none) clause visible at the compute construct.
38
The OpenACC
R API Version 3.3 2.6. Data Environment
1297 • An access to the variable is exposed to the compute construct.
1298 • The variable does not appear in a data clause visible at the compute construct.
1299 An aggregate variable will be treated as if it appears either:
1300 • In a present clause if there is a default(present) clause visible at the compute con1301 struct.
1302 • In a copy clause otherwise.
1303 A scalar variable will be treated as if it appears either:
1304 • In a copy clause if the compute construct is a kernels construct.
1305 • In a firstprivate clause otherwise.
1306 Note: Any default(none) clause visible at the compute construct applies to both aggregate
1307 and scalar variables. However, any default(present) clause visible at the compute construct
1308 applies only to aggregate variables.
1309 Restrictions
1310 • If there is a default(none) clause visible at a compute construct, for any variable access
1311 exposed to the compute construct, the compiler requires the variable to appear either in an
1312 explicit data clause visible at the compute construct or in a firstprivate, private, or
1313 reduction clause on the compute construct.
1314 • If a scalar variable appears in a reduction clause on a loop construct that has a parent
1315 parallel or serial construct, and if the reduction’s access to the original variable is
1316 exposed to the parent compute construct, the variable must appear either in an explicit data
1317 clause visible at the compute construct or in a firstprivate, private, or reduction
1318 clause on the compute construct. Note: Implementations are encouraged to issue a compile1319 time diagnostic when this restriction is violated to assist users in writing portable OpenACC
1320 applications.
1321 If a C++ lambda is called in a compute region and does not appear in a data clause, then it is
1322 treated as if it appears in a copyin clause on the current construct. A variable captured by a
1323 lambda is processed according to its data types: a pointer type variable is treated as if it appears
1324 in a no_create clause; a reference type variable is treated as if it appears in a present clause;
1325 for a struct or a class type variable, any pointer member is treated as if it appears in a no_create
1326 clause on the current construct. If the variable is defined as global or file or function static, it must
1327 appear in a declare directive.
1328 2.6.3 Data Regions and Data Lifetimes
1329 Data in shared memory is accessible from the current device as well as to the local thread. Such
1330 data is available to the accelerator for the lifetime of the variable. Data not in shared memory must
1331 be copied to and from device memory using data constructs, clauses, and API routines. A data
1332 lifetime is the duration from when the data is first made available to the accelerator until it becomes
1333 unavailable. For data in shared memory, the data lifetime begins when the data is allocated and
1334 ends when it is deallocated; for statically allocated data, the data lifetime begins when the program
1335 begins and does not end. For data not in shared memory, the data lifetime begins when it is made
1336 present and ends when it is no longer present.
39
The OpenACC
R API Version 3.3 2.6. Data Environment
1337 There are four types of data regions. When the program encounters a data construct, it creates a
1338 data region.
1339 When the program encounters a compute construct with explicit data clauses or with implicit data
1340 allocation added by the compiler, it creates a data region that has a duration of the compute construct.
1341 When the program enters a procedure, it creates an implicit data region that has a duration of the
1342 procedure. That is, the implicit data region is created when the procedure is called, and exited when
1343 the program returns from that procedure invocation. There is also an implicit data region associated
1344 with the execution of the program itself. The implicit program data region has a duration of the
1345 execution of the program.
1346 In addition to data regions, a program may create and delete data on the accelerator using enter
1347 data and exit data directives or using runtime API routines. When the program executes
1348 an enter data directive, or executes a call to a runtime API acc_copyin or acc_create
1349 routine, each var on the directive or the variable on the runtime API argument list will be made live
1350 on accelerator.
1351 2.6.4 Data Structures with Pointers
1352 This section describes the behavior of data structures that contain pointers. A pointer may be a
1353 C or C++ pointer (e.g., float*), a Fortran pointer or array pointer (e.g., real, pointer,
1354 dimension(:)), or a Fortran allocatable (e.g., real, allocatable, dimension(:)).
1355 When a data object is copied to device memory, the values are copied exactly. If the data is a data
1356 structure that includes a pointer, or is just a pointer, the pointer value copied to device memory
1357 will be the host pointer value. If the pointer target object is also allocated in or copied to device
1358 memory, the pointer itself needs to be updated with the device address of the target object before
1359 dereferencing the pointer in device memory.
1360 An attach action updates the pointer in device memory to point to the device copy of the data
1361 that the host pointer targets; see Section 2.7.2. For Fortran array pointers and allocatable arrays,
1362 this includes copying any associated descriptor (dope vector) to the device copy of the pointer.
1363 When the device pointer target is deallocated, the pointer in device memory should be restored
1364 to the host value, so it can be safely copied back to host memory. A detach action updates the
1365 pointer in device memory to have the same value as the corresponding pointer in local memory;
1366 see Section 2.7.2. The attach and detach actions are performed by the copy, copyin, copyout,
1367 create, attach, and detach data clauses (Sections 2.7.4-2.7.13), and the acc_attach and
1368 acc_detach runtime API routines (Section 3.2.29). The attach and detach actions use attachment
1369 counters to determine when the pointer in device memory needs to be updated; see Section 2.6.8.
1370 2.6.5 Data Construct
1371 Summary
1372 The data construct defines vars to be allocated in the current device memory for the duration of
1373 the region, whether data should be copied from local memory to the current device memory upon
1374 region entry, and copied from device memory to local memory upon region exit.
1375 Syntax
1376 In C and C++, the syntax of the OpenACC data construct is
40
The OpenACC
R API Version 3.3 2.6. Data Environment
1377 #pragma acc data [clause-list] new-line
1378 structured block
1379 and in Fortran, the syntax is
1380 !$acc data [clause-list]
1381 structured block
1382 !$acc end data
1383 or
1384 !$acc data [clause-list]
1385 block construct
1386 [!$acc end data]
1387 where clause is one of the following:
1388 if( condition )
1389 async [( int-expr )]
1390 wait [( wait-argument )]
1391 device_type( device-type-list )
1392 copy( var-list )
1393 copyin( [readonly:]var-list )
1394 copyout( [zero:]var-list )
1395 create( [zero:]var-list )
1396 no_create( var-list )
1397 present( var-list )
1398 deviceptr( var-list )
1399 attach( var-list )
1400 default( none | present )
1401 Description
1402 Data will be allocated in the memory of the current device and copied from local memory to device
1403 memory, or copied back, as required. The data clauses are described in Section 2.7 Data Clauses.
1404 Structured reference counters are incremented for data when entering a data region, and decre1405 mented when leaving the region, as described in Section 2.6.7 Reference Counters. The device_type
1406 clause is described in Section 2.4 Device-Specific Clauses.
1407 Restrictions
1408 • At least one copy, copyin, copyout, create, no_create, present, deviceptr,
1409 attach, or default clause must appear on a data construct.
1410 • Only the async and wait clauses may follow a device_type clause.
1411 if clause
1412 The if clause is optional; when there is no if clause, the compiler will generate code to allocate
1413 space in the current device memory and move data from and to the local memory as required. When
1414 an if clause appears, the program will conditionally allocate memory in and move data to and/or
1415 from device memory. When the condition in the if clause evaluates to false, no device memory
1416 will be allocated, and no data will be moved. When the condition evaluates to true, the data will be
1417 allocated and moved as specified. At most one if clause may appear.
41
The OpenACC
R API Version 3.3 2.6. Data Environment
1418 async clause
1419 The async clause is optional; see Section 2.16 Asynchronous Behavior for more information.
1420 Note: The async clause only affects operations directly associated with this particular data con1421 struct, such as data transfers. Execution of the associated structured block or block construct remains
1422 synchronous to the local thread. Nested OpenACC constructs, directives, and calls to runtime li1423 brary routines do not inherit the async clause from this construct, and the programmer must take
1424 care to not accidentally introduce race conditions related to asynchronous data transfers.
1425 wait clause
1426 The wait clause is optional; see Section 2.16 Asynchronous Behavior for more information.
1427 default clause
1428 The default clause is optional. At most one default clause may appear. It adjusts what data
1429 attributes are implicitly determined for variables used in lexically contained compute constructs as
1430 described in Section 2.6.2.
1431 Errors
1432 • See Section 2.7.3 for errors due to data clauses.
1433 • See Sections 2.16.1 and 2.16.2 for errors due to async or wait clauses.
1434 2.6.6 Enter Data and Exit Data Directives
1435 Summary
1436 An enter data directive may be used to define vars to be allocated in the current device memory
1437 for the remaining duration of the program, or until an exit data directive that deallocates the data.
1438 They also tell whether data should be copied from local memory to device memory at the enter
1439 data directive, and copied from device memory to local memory at the exit data directive. The
1440 dynamic range of the program between the enter data directive and the matching exit data
1441 directive is the data lifetime for that data.
1442 Syntax
1443 In C and C++, the syntax of the OpenACC enter data directive is
1444 #pragma acc enter data clause-list new-line
1445 and in Fortran, the syntax is
1446 !$acc enter data clause-list
1447 where clause is one of the following:
1448 if( condition )
1449 async [( int-expr )]
1450 wait [( wait-argument )]
1451 copyin( var-list )
1452 create( [zero:]var-list )
1453 attach( var-list )
1454 In C and C++, the syntax of the OpenACC exit data directive is
42
The OpenACC
R API Version 3.3 2.6. Data Environment
1455 #pragma acc exit data clause-list new-line
1456 and in Fortran, the syntax is
1457 !$acc exit data clause-list
1458 where clause is one of the following:
1459 if( condition )
1460 async [( int-expr )]
1461 wait [( wait-argument )]
1462 copyout( var-list )
1463 delete( var-list )
1464 detach( var-list )
1465 finalize
1466 Description
1467 At an enter data directive, data may be allocated in the current device memory and copied from
1468 local memory to device memory. This action enters a data lifetime for those vars, and will make
1469 the data available for present clauses on constructs within the data lifetime. Dynamic reference
1470 counters are incremented for this data, as described in Section 2.6.7 Reference Counters. Pointers
1471 in device memory may be attached to point to the corresponding device copy of the host pointer
1472 target.
1473 At an exit data directive, data may be copied from device memory to local memory and deal1474 located from device memory. If no finalize clause appears, dynamic reference counters are
1475 decremented for this data. If a finalize clause appears, the dynamic reference counters are set
1476 to zero for this data. Pointers in device memory may be detached so as to have the same value as
1477 the original host pointer.
1478 The data clauses are described in Section 2.7 Data Clauses. Reference counting behavior is de1479 scribed in Section 2.6.7 Reference Counters.
1480 Restrictions
1481 • At least one copyin, create, or attach clause must appear on an enter data direc1482 tive.
1483 • At least one copyout, delete, or detach clause must appear on an exit data direc1484 tive.
1485 if clause
1486 The if clause is optional; when there is no if clause, the compiler will generate code to allocate or
1487 deallocate space in the current device memory and move data from and to local memory. When an
1488 if clause appears, the program will conditionally allocate or deallocate device memory and move
1489 data to and/or from device memory. When the condition in the if clause evaluates to false, no
1490 device memory will be allocated or deallocated, and no data will be moved. When the condition
1491 evaluates to true, the data will be allocated or deallocated and moved as specified.
1492 async clause
1493 The async clause is optional; see Section 2.16 Asynchronous Behavior for more information.
43
The OpenACC
R API Version 3.3 2.6. Data Environment
1494 wait clause
1495 The wait clause is optional; see Section 2.16 Asynchronous Behavior for more information.
1496 finalize clause
1497 The finalize clause is allowed on the exit data directive and is optional. When no finalize
1498 clause appears, the exit data directive will decrement the dynamic reference counters for vars
1499 appearing in copyout and delete clauses, and will decrement the attachment counters for point1500 ers appearing in detach clauses. If a finalize clause appears, the exit data directive will
1501 set the dynamic reference counters to zero for vars appearing in copyout and delete clauses,
1502 and will set the attachment counters to zero for pointers appearing in detach clauses.
1503 Errors
1504 • See Section 2.7.3 for errors due to data clauses.
1505 • See Sections 2.16.1 and 2.16.2 for errors due to async or wait clauses.
1506 2.6.7 Reference Counters
1507 When device memory is allocated for data not in shared memory due to data clauses or OpenACC
1508 API routine calls, the OpenACC implementation keeps track of that section of device memory and
1509 its relationship to the corresponding data in host memory.
1510 Each section of device memory is associated with two reference counters per device, a structured
1511 reference counter and a dynamic reference counter. The structured and dynamic reference counters
1512 are used to determine when to allocate or deallocate data in device memory. The structured reference
1513 counter for a section of memory keeps track of how many nested data regions have been entered for
1514 that data. The initial value of the structured reference counter for static data in device memory (in a
1515 global declare directive) is one; for all other data, the initial value is zero. The dynamic reference
1516 counter for a section of memory keeps track of how many dynamic data lifetimes are currently active
1517 in device memory for that section. The initial value of the dynamic reference counter is zero. Data
1518 is considered present if the sum of the structured and dynamic reference counters is greater than
1519 zero.
1520 A structured reference counter is incremented when entering each data or compute region that con1521 tain an explicit data clause or implicitly-determined data attributes for that section of memory, and
1522 is decremented when exiting that region. A dynamic reference counter is incremented for each
1523 enter data copyin or create clause, or each acc_copyin or acc_create API routine
1524 call for that section of memory. The dynamic reference counter is decremented for each exit
1525 data copyout or delete clause when no finalize clause appears, or each acc_copyout
1526 or acc_delete API routine call for that section of memory. The dynamic reference counter will
1527 be set to zero with an exit data copyout or delete clause when a finalize clause ap1528 pears, or each acc_copyout_finalize or acc_delete_finalize API routine call for
1529 the section of memory. The reference counters are modified synchronously with the local thread,
1530 even if the data directives include an async clause. When both structured and dynamic reference
1531 counters reach zero, the data lifetime in device memory for that data ends.
1532 2.6.8 Attachment Counter
1533 Since multiple pointers can target the same address, each pointer in device memory is associated
1534 with an attachment counter per device. The attachment counter for a pointer is initialized to zero
44
The OpenACC
R API Version 3.3 2.7. Data Clauses
1535 when the pointer is allocated in device memory. The attachment counter for a pointer is set to one
1536 whenever the pointer is attached to new target address, and incremented whenever an attach action
1537 for that pointer is performed for the same target address. The attachment counter is decremented
1538 whenever a detach action occurs for the pointer, and the pointer is detached when the attachment
1539 counter reaches zero. This is described in more detail in Section 2.7.2 Data Clause Actions.
1540 A pointer in device memory can be assigned a device address in two ways. The pointer can be
1541 attached to a device address due to data clauses or API routines, as described in Section 2.7.2
1542 Data Clause Actions, or the pointer can be assigned in a compute region executed on that device.
1543 Unspecified behavior may result if both ways are used for the same pointer.
1544 Pointer members of structs, classes, or derived types in device or host memory can be overwritten
1545 due to update directives or API routines. It is the user’s responsibility to ensure that the pointers
1546 have the appropriate values before or after the data movement in either direction. The behavior of
1547 the program is undefined if any of the pointer members are attached when an update of a composite
1548 variable is performed.
1549 2.7 Data Clauses
1550 Data clauses may appear on the parallel construct, serial construct, kernels construct,
1551 data construct, the enter data and exit data directives, and declare directives. In the
1552 descriptions, the region is a compute region with a clause appearing on a parallel, serial, or
1553 kernels construct, a data region with a clause on a data construct, or an implicit data region
1554 with a clause on a declare directive. If the declare directive appears in a global context,
1555 the corresponding implicit data region has a duration of the program. The list argument to each
1556 data clause is a comma-separated collection of vars. On a declare directive, the list argument
1557 of a copyin, create, device_resident, or link clause may include a Fortran common
1558 block name enclosed within slashes. On any directive, for any clause except deviceptr and
1559 present, the list argument may include a Fortran common block name enclosed within slashes
1560 if that common block name also appears in a declare directive link clause. In all cases, the
1561 compiler will allocate and manage a copy of the var in the memory of the current device, creating a
1562 visible device copy of that var, for data not in shared memory.
1563 OpenACC supports accelerators with discrete memories from the local thread. However, if the
1564 accelerator can access the local memory directly, the implementation may avoid the memory allo1565 cation and data movement and simply share the data in local memory. Therefore, a program that
1566 uses and assigns data on the host and uses and assigns the same data on the accelerator within a
1567 data region without update directives to manage the coherence of the two copies may get different
1568 answers on different accelerators or implementations.
1569 Restrictions
1570 • Data clauses may not follow a device_type clause.
1571 • See Section 2.17.1 Optional Arguments for discussion of Fortran optional arguments in data
1572 clauses.
1573 2.7.1 Data Specification in Data Clauses
1574 In C and C++, a subarray is an array name followed by an extended array range specification in
1575 brackets, with start and length, such as
1576 AA[2:n]
45
The OpenACC
R API Version 3.3 2.7. Data Clauses
1577 If the lower bound is missing, zero is used. If the length is missing and the array has known size, the
1578 size of the array is used; otherwise the length is required. The subarray AA[2:n] means elements
1579 AA[2], AA[3], . . . , AA[2+n-1].
1580 In C and C++, a two dimensional array may be declared in at least four ways:
1581 • Statically-sized array: float AA[100][200];
1582 • Pointer to statically sized rows: typedef float row[200]; row* BB;
1583 • Statically-sized array of pointers: float* CC[200];
1584 • Pointer to pointers: float** DD;
1585 Each dimension may be statically sized, or a pointer to dynamically allocated memory. Each of
1586 these may be included in a data clause using subarray notation to specify a rectangular array:
1587 • AA[2:n][0:200]
1588 • BB[2:n][0:m]
1589 • CC[2:n][0:m]
1590 • DD[2:n][0:m]
1591 Multidimensional rectangular subarrays in C and C++ may be specified for any array with any com1592 bination of statically-sized or dynamically-allocated dimensions. For statically sized dimensions, all
1593 dimensions except the first must specify the whole extent to preserve the contiguous data restriction,
1594 discussed below. For dynamically allocated dimensions, the implementation will allocate pointers
1595 in device memory corresponding to the pointers in local memory and will fill in those pointers as
1596 appropriate.
1597 In Fortran, a subarray is an array name followed by a comma-separated list of range specifications
1598 in parentheses, with lower and upper bound subscripts, such as
1599 arr(1:high,low:100)
1600 If either the lower or upper bounds are missing, the declared or allocated bounds of the array, if
1601 known, are used. All dimensions except the last must specify the whole extent, to preserve the
1602 contiguous data restriction, discussed below.
1603 Restrictions
1604 • In Fortran, the upper bound for the last dimension of an assumed-size dummy array must be
1605 specified.
1606 • In C and C++, the length for dynamically allocated dimensions of an array must be explicitly
1607 specified.
1608 • In C and C++, modifying pointers in pointer arrays during the data lifetime, either on the host
1609 or on the device, may result in undefined behavior.
1610 • If a subarray appears in a data clause, the implementation may choose to allocate memory for
1611 only that subarray on the accelerator.
1612 • In Fortran, array pointers may appear, but pointer association is not preserved in device mem1613 ory.
46
The OpenACC
R API Version 3.3 2.7. Data Clauses
1614 • Any array or subarray in a data clause, including Fortran array pointers, must be a contiguous
1615 section of memory, except for dynamic multidimensional C arrays.
1616 • In C and C++, if a variable or array of composite type appears, all the data members of the
1617 struct or class are allocated and copied, as appropriate. If a composite member is a pointer
1618 type, the data addressed by that pointer are not implicitly copied.
1619 • In Fortran, if a variable or array of composite type appears, all the members of that derived
1620 type are allocated and copied, as appropriate. If any member has the allocatable or
1621 pointer attribute, the data accessed through that member are not copied.
1622 • If an expression is used in a subscript or subarray expression in a clause on a data construct,
1623 the same value is used when copying data at the end of the data region, even if the values of
1624 variables in the expression change during the data region.
1625 2.7.2 Data Clause Actions
1626 Most of the data clauses perform one or more the following actions. The actions test or modify one
1627 or both of the structured and dynamic reference counters, depending on the directive on which the
1628 data clause appears.
1629 Present Increment Action
1630 A present increment action is one of the actions that may be performed for a present (Sec1631 tion 2.7.5), copy (Section 2.7.6), copyin (Section 2.7.7), copyout (Section 2.7.8), create
1632 (Section 2.7.9), or no_create (Section 2.7.10) clause, or for a call to an acc_copyin or
1633 acc_create (Section 3.2.18) API routine. See those sections for details.
1634 A present increment action for a var occurs only when var is already present in device memory.
1635 A present increment action for a var increments the structured or dynamic reference counter for var.
1636 Present Decrement Action
1637 A present decrement action is one of the actions that may be performed for a present (Section
1638 2.7.5), copy (Section 2.7.6), copyin (Section 2.7.7), copyout (Section 2.7.8), create (Sec1639 tion 2.7.9), no_create (Section 2.7.10), or delete (Section 2.7.11) clause, or for a call to an
1640 acc_copyout or acc_delete (Section 3.2.19) API routine. See those sections for details.
1641 A present decrement action for a var occurs only when var is already present in device memory.
1642 A present decrement action for a var decrements the structured or dynamic reference counter for
1643 var, if its value is greater than zero. If the device memory associated with var was mapped to
1644 the device using acc_map_data, the dynamic reference count may not be decremented to zero,
1645 except by a call to acc_unmap_data. If the reference counter is already zero, its value is left
1646 unchanged.
1647 Create Action
1648 A create action is one of the actions that may be performed for a copyout (Section 2.7.8) or
1649 create (Section 2.7.9) clause, or for a call to an acc_create API routine (Section 3.2.18). See
1650 those sections for details.
47
The OpenACC
R API Version 3.3 2.7. Data Clauses
1651 A create action for a var occurs only when var is not already present in device memory.
1652 A create action for a var:
1653 • allocates device memory for var; and
1654 • sets the structured or dynamic reference counter to one.
1655 Copyin Action
1656 A copyin action is one of the actions that may be performed for a copy (Section 2.7.6) or copyin
1657 (Section 2.7.7) clause, or for a call to an acc_copyin API routine (Section 3.2.18). See those
1658 sections for details.
1659 A copyin action for a var occurs only when var is not already present in device memory.
1660 A copyin action for a var:
1661 • allocates device memory for var;
1662 • initiates a copy of the data for var from the local thread memory to the corresponding device
1663 memory; and
1664 • sets the structured or dynamic reference counter to one.
1665 The data copy may complete asynchronously, depending on other clauses on the directive.
1666 Copyout Action
1667 A copyout action is one of the actions that may be performed for a copy (Section 2.7.6) or
1668 copyout (Section 2.7.8) clause, or for a call to an acc_copyout API routine (Section 3.2.19).
1669 See those sections for details.
1670 A copyout action for a var occurs only when var is present in device memory.
1671 A copyout action for a var:
1672 • performs an immediate detach action for any pointer in var;
1673 • initiates a copy of the data for var from device memory to the corresponding local thread
1674 memory; and
1675 • deallocates device memory for var.
1676 The data copy may complete asynchronously, depending on other clauses on the directive, in which
1677 case the memory is deallocated when the data copy is complete.
1678 Delete Action
1679 A delete action is one of the actions that may be performed for a present (Section 2.7.5),
1680 copyin (Section 2.7.7), create (Section 2.7.9), no_create (Section 2.7.10), or delete (Sec1681 tion 2.7.11) clause, or for a call to an acc_delete API routine (Section 3.2.19). See those sections
1682 for details.
1683 A delete action for a var occurs only when var is present in device memory.
1684 A delete action for var:
48
The OpenACC
R API Version 3.3 2.7. Data Clauses
1685 • performs an immediate detach action for any pointer in var; and
1686 • deallocates device memory for var.
1687 Attach Action
1688 An attach action is one of the actions that may be performed for a present (Section 2.7.5),
1689 copy (Section 2.7.6), copyin (Section 2.7.7), copyout (Section 2.7.8), create (Section 2.7.9),
1690 no_create (Section 2.7.10), or attach (Section 2.7.11) clause, or for a call to an acc_attach
1691 API routine (Section 3.2.29). See those sections for details.
1692 An attach action for a var occurs only when var is a pointer reference.
1693 If the pointer var is in shared memory or is not present in the current device memory, or if the
1694 address to which var points is not present in the current device memory, no action is taken. If the
1695 attachment counter for var is nonzero and the pointer in device memory already points to the device
1696 copy of the data in var, the attachment counter for the pointer var is incremented. Otherwise, the
1697 pointer in device memory is attached to the device copy of the data by initiating an update for the
1698 pointer in device memory to point to the device copy of the data and setting the attachment counter
1699 for the pointer var to one. If the pointer is a null pointer, the pointer in device memory is updated to
1700 have the same value. The update may complete asynchronously, depending on other clauses on the
1701 directive. The implementation schedules pointer updates after any data copies due to copyin actions
1702 that are performed for the same directive.
1703 Detach Action
1704 A detach action is one of the actions that may be performed for a present (Section 2.7.5),
1705 copy (Section 2.7.6), copyin (Section 2.7.7), copyout (Section 2.7.8), create (Section 2.7.9),
1706 no_create (Section 2.7.10), delete (Section 2.7.11), or detach (Section 2.7.11) clause, or
1707 for a call to an acc_detach API routine (Section 3.2.29). See those sections for details.
1708 A detach action for a var occurs only when var is a pointer reference.
1709 If the pointer var is in shared memory or is not present in the current device memory, or if the
1710 attachment counter for var for the pointer is zero, no action is taken. Otherwise, the attachment
1711 counter for the pointer var is decremented. If the attachment counter is decreased to zero, the
1712 pointer is detached by initiating an update for the pointer var in device memory to have the same
1713 value as the corresponding pointer in local memory. The update may complete asynchronously,
1714 depending on other clauses on the directive. The implementation schedules pointer updates before
1715 any data copies due to copyout actions that are performed for the same directive.
1716 Immediate Detach Action
1717 An immediate detach action is one of the actions that may be performed for a detach (Section
1718 2.7.11) clause, or for a call to an acc_detach_finalize API routine (Section 3.2.29). See
1719 those sections for details.
1720 An immediate detach action for a var occurs only when var is a pointer reference and is present in
1721 device memory.
1722 If the attachment counter for the pointer is zero, the immediate detach action has no effect. Other1723 wise, the attachment counter for the pointer set to zero and the pointer is detached by initiating an
1724 update for the pointer in device memory to have the same value as the corresponding pointer in local
49
The OpenACC
R API Version 3.3 2.7. Data Clauses
1725 memory. The update may complete asynchronously, depending on other clauses on the directive.
1726 The implementation schedules pointer updates before any data copies due to copyout actions that
1727 are performed for the same directive.
1728 2.7.3 Data Clause Errors
1729 An error is issued for a var that appears in a copy, copyin, copyout, create, and delete
1730 clause as follows:
1731 • An acc_error_partly_present error is issued if part of var is present in the current
1732 device memory but all of var is not.
1733 • An acc_error_invalid_data_section error is issued if var is a Fortran subarray
1734 with a stride that is not one.
1735 • An acc_error_out_of_memory error is issued if the accelerator device does not have
1736 enough memory for var.
1737 An error is issued for a var that appears in a present clause as follows:
1738 • An acc_error_not_present error is issued if var is not present in the current device
1739 memory at entry to a data or compute construct.
1740 • An acc_error_partly_present error is issued if part of var is present in the current
1741 device memory but all of var is not.
1742 See Section 5.2.2.
1743 2.7.4 deviceptr clause
1744 The deviceptr clause may appear on structured data and compute constructs and declare
1745 directives.
1746 The deviceptr clause is used to declare that the pointers in var-list are device pointers, so the
1747 data need not be allocated or moved between the host and device for this pointer.
1748 In C and C++, the vars in var-list must be pointer variables.
1749 In Fortran, the vars in var-list must be dummy arguments (arrays or scalars), and may not have the
1750 Fortran pointer, allocatable, or value attributes.
1751 For data in shared memory, host pointers are the same as device pointers, so this clause has no
1752 effect.
1753 2.7.5 present clause
1754 The present clause may appear on structured data and compute constructs and declare di1755 rectives. The present clause specifies that vars in var-list are in shared memory or are already
1756 present in the current device memory due to data regions or data lifetimes that contain the construct
1757 on which the present clause appears.
1758 For each var in var-list, if var is in shared memory, no action is taken; if var is not in shared memory,
1759 the present clause behaves as follows:
1760 • At entry to the region:
50
The OpenACC
R API Version 3.3 2.7. Data Clauses
1761 – An attach action is performed if var is a pointer reference, and a present increment
1762 action with the structured reference counter is performed if var is not a null pointer.
1763 • At exit from the region:
1764 – If the structured reference counter for var is zero, no action is taken.
1765 – Otherwise, a detach action is performed if var is a pointer reference, and a present decrement
1766 action with the structured reference counter is performed if var is not a null pointer. If
1767 both structured and dynamic reference counters are zero, a delete action is performed.
1768 The errors in Section 2.7.3 Data Clause Errors may be issued for this clause.
1769 2.7.6 copy clause
1770 The copy clause may appear on structured data and compute constructs and on declare direc1771 tives.
1772 For each var in var-list, if var is in shared memory, no action is taken; if var is not in shared memory,
1773 the copy clause behaves as follows:
1774 • At entry to the region:
1775 – If var is present and is not a null pointer, a present increment action with the structured
1776 reference counter is performed.
1777 – If var is not present, a copyin action with the structured reference counter is performed.
1778 – If var is a pointer reference, an attach action is performed.
1779 • At exit from the region:
1780 – If the structured reference counter for var is zero, no action is taken.
1781 – Otherwise, a detach action is performed if var is a pointer reference, and a present decrement
1782 action with the structured reference counter is performed if var is not a null pointer. If
1783 both structured and dynamic reference counters are zero, a copyout action is performed.
1784 The errors in Section 2.7.3 Data Clause Errors may be issued for this clause.
1785 For compatibility with OpenACC 2.0, present_or_copy and pcopy are alternate names for
1786 copy.