The new function of chgres_cube to output the results in netcdf 4 format #689
Comments
@TingLei-NOAA I was able to write netcdf 4 files. There is a way to adjust the cache size, but I just used the defaults for now. How can we test this? |
@GeorgeGayno-NOAA Great! If you need me to test this new function, which branch should I use? |
George,
Thank you!
I have also cc'd Ming, Shun and others on this.
We will keep you posted through this issue.
Ting
…______________________________
Ting Lei
Lynker at NOAA/NWS/NCEP/EMC
5830 University Research Ct., Cubicle 2765
College Park, MD 20740
***@***.***
301-683-3624
On Tue, Sep 27, 2022 at 11:53 AM GeorgeGayno-NOAA wrote:
@TingLei-NOAA Use this branch:
https://github.com/GeorgeGayno-NOAA/UFS_UTILS/tree/feature/netcdf4
|
Hi George,
The changes look good to me. There is no code to control the chunk size,
right? The chunk size will be decided by the system default, right?
Thanks,
Ming
|
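When using netcdf4 files, you can set the cache_size, cache_nelems and cache_preemption in the call to nf90_create. I am using the default values as I don't know how to set them for your process. Do you have an idea of how to set these arguments? See https://docs.unidata.ucar.edu/netcdf-fortran/current/f90_datasets.html#f90-nf90_create |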
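As a rough illustration of the cache arguments discussed in this comment, the sketch below passes cache_size and cache_nelems to nf90_create. The program name, file name and both values are placeholders invented for this example, not settings used by chgres_cube. cache_preemption is deliberately left at the library default.

```fortran
program create_with_cache
  use netcdf
  implicit none

  ! Hypothetical settings for illustration only; suitable values depend on the workload.
  integer, parameter :: cache_bytes = 64 * 1024 * 1024  ! chunk cache size, in bytes
  integer, parameter :: cache_slots = 1009              ! number of chunk slots in the cache
  integer :: ncid, status

  ! Create a netCDF-4/HDF5 file, overriding two of the chunk cache defaults.
  ! cache_preemption is omitted, so the library default is kept.
  status = nf90_create("cache_example.nc", ior(nf90_netcdf4, nf90_clobber), ncid, &
                       cache_size=cache_bytes, cache_nelems=cache_slots)
  if (status /= nf90_noerr) then
    print *, "nf90_create failed: ", trim(nf90_strerror(status))
    stop 1
  end if

  status = nf90_close(ncid)
  if (status /= nf90_noerr) then
    print *, "nf90_close failed: ", trim(nf90_strerror(status))
    stop 1
  end if
end program create_with_cache
```

Whether non-default cache settings help at all would have to be measured for the GSI read pattern; the sketch only shows where the arguments go.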
Hi George,
Dusan set up the chunk size in the FMS library, as described in this
excerpt from his email:
^^
I made some changes to FMS to explicitly set the chunk sizes of each
variable to be equal to its dimension lengths.
Please see my branch https://github.com/DusanJovic-NOAA/FMS/tree/chunks
V
He needed to define the chunk size for each variable, as he did in
netcdf_io.F90 in the FMS library
(one copy of that code on Hera
is /scratch2/NCEPDEV/fv3-cam/Ting.Lei/dr-dusan//FMS/fms2_io/netcdf_io.F90,
around line 928):
^^
if (present(dimensions)) then
  allocate(dimids(size(dimensions)))
  allocate(dimlens(size(dimensions)))
  do i = 1, size(dimids)
    dimids(i) = get_dimension_id(fileobj%ncid, &
                                 trim(dimensions(i)), msg=append_error_msg)
    dimlens(i) = get_dimension_len(fileobj%ncid, &
                                   dimids(i), msg=append_error_msg)
  enddo
  err = nf90_def_var(fileobj%ncid, trim(variable_name), vtype, dimids, &
                     varid, chunksizes=dimlens)
  deallocate(dimids)
  deallocate(dimlens)
V
In the above, the chunk for each variable is defined as the whole block of
the multi-dimensional array, so each variable is stored as a single chunk.
This kind of setup gives us a significant speedup when GSI processes
these files.
Thank you!
Ting
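For readers without access to FMS, the same idea can be sketched with plain netcdf-fortran calls. This is a minimal standalone example, not the FMS or chgres_cube code: the file name, variable name and grid sizes are made up for illustration, and nf90_inquire_dimension stands in for the FMS helper get_dimension_len.

```fortran
program chunk_whole_variable
  use netcdf
  implicit none

  ! Hypothetical grid dimensions for illustration only.
  integer, parameter :: nx = 96, ny = 96, nz = 64
  integer :: ncid, varid, dimids(3), dimlens(3), chunks(3), i
  logical :: is_contiguous

  call check(nf90_create("chunk_example.nc", nf90_netcdf4, ncid))
  call check(nf90_def_dim(ncid, "lon", nx, dimids(1)))
  call check(nf90_def_dim(ncid, "lat", ny, dimids(2)))
  call check(nf90_def_dim(ncid, "lev", nz, dimids(3)))

  ! Look up each dimension length, as the FMS excerpt does with its own helpers.
  do i = 1, size(dimids)
    call check(nf90_inquire_dimension(ncid, dimids(i), len=dimlens(i)))
  end do

  ! Chunk sizes equal to the dimension lengths: one chunk spans the whole variable.
  call check(nf90_def_var(ncid, "wind_example", nf90_float, dimids, varid, &
                          chunksizes=dimlens))

  ! Read the storage layout back to confirm what was defined.
  call check(nf90_inquire_variable(ncid, varid, contiguous=is_contiguous, &
                                   chunksizes=chunks))
  print *, "contiguous:  ", is_contiguous
  print *, "chunk sizes: ", chunks

  call check(nf90_close(ncid))

contains

  ! Stop with the library's error message if a netCDF call fails.
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program chunk_whole_variable
```

The chunk sizes of the resulting file can also be inspected from the command line with ncdump -h -s.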
|
@TingLei-NOAA I added chunking to the atmospheric file at 2038e2e. Can you please test my branch and check for performance improvements. Here is a check (using ncdump -h -s) of one of the wind records. The chunk sizes are set to the length of each dimension:
|
@GeorgeGayno-NOAA That is great! Just one question: how should we define parameters like i_target_out? I could not find that in your PR. |
That is the 'i' dimension of the output grid. That is set by the user. My test used a C96 grid. |
George, thanks for your answer. I think this is exactly what we need.
What is your plan for pushing it to the main branch?
Or should we use this branch for the time being and wait until this change
is pushed to the main branch?
Thanks a lot for any further clarification.
Ting
…______________________________
Ting Lei
Physical Scientist, Contractor with Lynker in support of
EMC/NCEP/NWS/NOAA
5830 University Research Ct., Cubicle 2765
College Park, MD 20740
***@***.***
301-683-3624
|
@TingLei-NOAA We need to prove that these changes improve performance before merging. And I only added chunking to the atmospheric file so far. The surface and lateral boundary files do not have chunking. Come up with a testing strategy and I can help you. |
George,
Got it!
I will set up a comparison case to show its benefit for GSI as soon as
possible and let you know.
Thank you!
Ting
|
Ting,
Thank you for updating this. We will make sure to merge the changes into the RRFS
repo.
Shun
|
Hi, George,
I am trying to test the new chgres_cube in GSL's RRFS 3 km CONUS runs,
but it seems the namelist used there is not compatible with the new chgres_cube.
Could you please point me to one that works with your branch?
Thank you!
Ting
|
What namelist error are you getting? Can I look at the log file and scripts? |
Ting,
Please delete "fix_dir_input_grid" and try.
Thanks,
Ming
|
George,
I am rerunning jobs to finish a chgres_cube case. After it finishes, I
will send the links you need if it still fails with Ming's fix.
Ming,
Thanks. I will see if it fixes the problem.
Regards,
Ting
|
@GeorgeGayno-NOAA
RESOURCE STATISTICS************** |
@TingLei-NOAA So, should we close this issue? Should we update chgres to output netcdf4 (without adding chunking)? |
Contiguous storage will generally be faster than chunked storage to write,
but does not allow compression. Are you sure you don't want compression?
Switching from netCDF-4 to netCDF-4 classic and back will have no impact on
performance. The only difference is that the netCDF-4 classic format will
not allow you to create anything from the enhanced data model - that is, a
netcdf-4 classic file is a netCDF-4/HDF5 file that does not use any of the
new types, or multiple unlimited dimensions. It uses only the classic model
of netCDF.
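To make the contiguous versus chunked trade-off concrete, here is a small sketch that defines one variable of each kind in the same netCDF-4 file. The file name, variable names, grid sizes and the deflate level are assumptions made for this example only.

```fortran
program storage_layouts
  use netcdf
  implicit none

  ! Hypothetical grid dimensions for illustration only.
  integer, parameter :: nx = 96, ny = 96
  integer :: ncid, dimids(2), varid_contig, varid_deflate

  call check(nf90_create("storage_example.nc", nf90_netcdf4, ncid))
  call check(nf90_def_dim(ncid, "lon", nx, dimids(1)))
  call check(nf90_def_dim(ncid, "lat", ny, dimids(2)))

  ! Contiguous storage: the data is written as one block, with no compression.
  call check(nf90_def_var(ncid, "field_contiguous", nf90_float, dimids, &
                          varid_contig, contiguous=.true.))

  ! Chunked storage with zlib compression (deflate level 1 chosen arbitrarily).
  call check(nf90_def_var(ncid, "field_compressed", nf90_float, dimids, &
                          varid_deflate, chunksizes=(/ nx, ny /), deflate_level=1))

  call check(nf90_close(ncid))

contains

  ! Stop with the library's error message if a netCDF call fails.
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program storage_layouts
```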
|
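@edwardhartnett Thanks a lot for your comments and explanation. We need netcdf 4 because the current GSI parallel IO for fv3-lam is parallelized on top of hdf 5, so it only works with netcdf 4. Whether we will need compression in the near future, I have no idea. @hu5970 and Shun Liu could be in a better position to answer. @GeorgeGayno-NOAA, for the purpose of this issue, yes, it can be closed. The decision on whether to include the second change (the chunk size setup) is up to you, given Ed's information. Thanks! |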
@TingLei-NOAA You said "we need netcdf 4" for the GSI. If I close this issue, chgres will continue to output "netcdf4 classic". Is that what you want? |
@GeorgeGayno-NOAA Sorry for the
confusion I caused. Your first change (making the output netcdf 4 with contiguous storage) is definitely what we need. Thanks.
|
@TingLei-NOAA Ok. We can always revisit chunking later. |
Compiled the branch at 05b976e on Cactus and ran the consistency tests. All tests failed. None of the data values were different. Only the global attributes. Example:
|
Compiled the branch at 05b976e on Hera, Jet and Orion. All tests failed as they did on Cactus. This is the expected result. Will submit a PR. |
@TingLei-NOAA Please review the PR - #704. |
OK, I'm not sure if this is relevant but there is some confusion here about netCDF-4 vs. netCDF-4 classic. A netCDF-4 classic file is a netCDF-4 file that adheres to the netCDF classic data model. A netCDF-4 classic file is a netCDF-4 file and can easily be read by any netCDF-4 application. So you do not need to turn off the NC_CLASSIC_MODEL flag in order to improve performance.
If you take away that flag, netCDF will allow you to create elements of the enhanced model in the file. For example, in a file with NC_CLASSIC_MODEL, there can only be one unlimited dimension. If you try to create a second unlimited dimension, you will get an error. But create a file without NC_CLASSIC_MODEL and you will be able to create as many unlimited dimensions as you want. In both cases, a netCDF-4/HDF5 file results, and both files can be read by any netCDF program.
So the difference between classic and not is simply that classic files restrict what you can add to the file, in a way that exactly matches the behavior of netCDF classic. Without NC_CLASSIC_MODEL, netCDF allows you to use features that were not present in classic netCDF, including multiple unlimited dimensions, user-defined types, and unsigned integer types. But there is no performance difference between netCDF-4 files created with or without NC_CLASSIC_MODEL. |
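The unlimited-dimension example from the comment above can be tried with a short program along these lines. It is only a sketch: the file and dimension names are invented, and it prints whatever status message the library returns rather than assuming a particular one.

```fortran
program classic_model_demo
  use netcdf
  implicit none
  integer :: ncid, dimid1, dimid2, status

  ! netCDF-4/HDF5 file restricted to the classic data model.
  call check(nf90_create("classic_model.nc", ior(nf90_netcdf4, nf90_classic_model), ncid))
  call check(nf90_def_dim(ncid, "time", nf90_unlimited, dimid1))
  ! A second unlimited dimension is expected to be rejected here.
  status = nf90_def_dim(ncid, "member", nf90_unlimited, dimid2)
  print *, "classic model, second unlimited dimension: ", trim(nf90_strerror(status))
  call check(nf90_close(ncid))

  ! netCDF-4/HDF5 file using the enhanced data model.
  call check(nf90_create("enhanced_model.nc", nf90_netcdf4, ncid))
  call check(nf90_def_dim(ncid, "time", nf90_unlimited, dimid1))
  status = nf90_def_dim(ncid, "member", nf90_unlimited, dimid2)
  print *, "enhanced model, second unlimited dimension: ", trim(nf90_strerror(status))
  call check(nf90_close(ncid))

contains

  ! Stop with the library's error message if a netCDF call fails.
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program classic_model_demo
```

Both output files are netCDF-4/HDF5 files; only the set of features allowed at creation time differs.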
@edwardhartnett Thanks a lot for your information, very helpful! |
Previously, the coldstart files were netcdf4-classic. Fixes #689.
Currently, chgres_cube generates its results in netcdf 4 classic format, with a hardwired setup.
This issue is opened to add the ability to generate netcdf 4 files. There are two reasons for this request:
1. The FV3-LAM cold start files generated by chgres_cube are read in and updated by GSI. Currently the GSI IO interface for fv3-lam (including the cold start files generated by chgres_cube) uses netcdf 4 parallel IO. When netcdf 4 classic files are used for GSI, some minor differences are generated in the final analysis fields.
Hence, it is expected that chgres_cube can create netcdf 4 files. It would be very helpful if this new function could also take care of the chunk size/shape in the generated netcdf 4 files, because this has a significant impact on the performance of GSI parallel IO for fv3-lam, and it could also affect IO for the FV3 model.
2. For the fv3-lam forecast model, it was found that cold start files in netcdf 4 classic cause a slower initialization than runs using restart files in netcdf 4 (though the reasons are still to be identified).
Hence, it would be very helpful if chgres_cube could generate files in netcdf 4 format.
I am also cc'ing our colleagues directly involved in this issue, Shun Liu, Ming Hu (@hu5970) and Eric Rodger, who can clarify this problem.