
Optimizing conforming rt fes #638

Merged: 13 commits merged into master from optimizing_conforming_rt_fes on Sep 6, 2021
Conversation

@amartinhuertas (Member) commented Aug 13, 2021

See issue #637. This PR optimizes

    function get_cell_shapefuns(model::DiscreteModel,
                                cell_reffe::AbstractArray{<:GenericRefFE{RaviartThomas}},
                                ::DivConformity)

as per @fverdugo's comments.
@codecov-commenter commented Aug 13, 2021

Codecov Report

Merging #638 (cbd25b6) into master (81c055e) will decrease coverage by 0.00%.
The diff coverage is 89.09%.


@@            Coverage Diff             @@
##           master     #638      +/-   ##
==========================================
- Coverage   87.70%   87.70%   -0.01%     
==========================================
  Files         134      134              
  Lines       14315    14320       +5     
==========================================
+ Hits        12555    12559       +4     
- Misses       1760     1761       +1     
Impacted Files Coverage Δ
src/FESpaces/AffineFEOperators.jl 100.00% <ø> (ø)
src/Fields/ArrayBlocks.jl 87.98% <0.00%> (-0.42%) ⬇️
src/Geometry/BoundaryDiscreteModels.jl 91.66% <ø> (ø)
src/TensorValues/SymTensorValueTypes.jl 67.08% <75.00%> (ø)
src/TensorValues/ThirdOrderTensorValueTypes.jl 56.14% <80.00%> (ø)
src/FESpaces/DivConformingFESpaces.jl 100.00% <100.00%> (+3.57%) ⬆️
src/Geometry/SkeletonTriangulations.jl 96.17% <100.00%> (ø)
src/TensorValues/SymFourthOrderTensorValueTypes.jl 77.77% <100.00%> (ø)
src/TensorValues/TensorValueTypes.jl 80.26% <100.00%> (ø)
src/TensorValues/VectorValueTypes.jl 80.43% <100.00%> (ø)
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 87ff667...cbd25b6.

@amartinhuertas (Member, author) commented

Hi @fverdugo ... the tests in this PR failed due to a VERY strange error:

https://github.com/gridap/Gridap.jl/pull/638/checks?check_run_id=3321610616#step:6:131

In particular, the symbol _transform_rt_shapefuns no longer exists in the source code! (I grepped for it, and there is no such symbol in the whole source directory.)

Could it be some sort of dirty cache in GitHub Actions or something like that? Have you ever seen something similar?

@amartinhuertas (Member, author) commented Aug 13, 2021

@fverdugo (Member) commented

Checks are not displayed. Perhaps a dummy commit to re-run them?

@amartinhuertas (Member, author) commented

> Checks are not displayed. Perhaps a dummy commit to re-run them?

Ok. I have pushed a new commit and the issue seems to have disappeared ... weird!

function get_cell_dof_basis(model::DiscreteModel,
                            cell_reffe::AbstractArray{<:GenericRefFE{RaviartThomas}},
                            ::DivConformity)

To this end:
  (1) Replaced a function by a new Map instance.
  (2) Evaluated the Jacobian at all integration points up front
      (instead of on a cell-by-cell basis).

Did not observe significant performance improvements, though.
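For context, here is a rough sketch of the pattern in point (1), using only the generic Gridap.Arrays Map API; the struct name and arguments below are made up for illustration and are not the PR's actual code. A Map implements return_cache/evaluate!, so lazy_map can allocate scratch space once and reuse it for every cell.

```julia
import Gridap.Arrays: return_cache, evaluate!
using Gridap.Arrays: Map, CachedArray, setsize!, lazy_map

# Hypothetical Map standing in for the transformation applied to the RT dof basis.
struct TransformDofBasisSketch <: Map end

function return_cache(::TransformDofBasisSketch, dofs::AbstractVector)
  # Scratch array allocated once; lazy_map reuses it across cells.
  CachedArray(copy(dofs))
end

function evaluate!(cache, ::TransformDofBasisSketch, dofs::AbstractVector)
  setsize!(cache, size(dofs))
  r = cache.array
  r .= dofs  # placeholder for the actual per-cell transformation
  r
end

# Applied lazily over a cell array, e.g.:
# cell_dofs_t = lazy_map(TransformDofBasisSketch(), cell_dofs)
```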
@amartinhuertas (Member, author) commented

@fverdugo ... FYI ... this PR is ready for review

@@ -296,6 +296,14 @@ function lazy_map(::typeof(evaluate),a::LazyArray{<:Fill{<:BlockMap}},x::Abstrac
lazy_map(k,args...)
end

function lazy_map(k::Broadcasting,a::LazyArray{<:Fill{<:BlockMap}})
@fverdugo (Member) commented Aug 17, 2021

This is the only part that I am not sure about in this PR.

Perhaps replacing Broadcasting by BroadcastingFieldOpMap makes more sense.

@amartinhuertas (Member, author) replied

> This is the only part that I am not sure about in this PR.

Ok, this is what I was also wondering; see #637 (comment).

> Perhaps replacing Broadcasting by BroadcastingFieldOpMap makes more sense.

To put you in context (if you were not already), this lazy_map overload is currently triggered from this line:

https://github.com/gridap/Gridap.jl/blob/master/src/ReferenceFEs/RaviartThomasRefFEs.jl#L392

If I understand correctly, you want to defer this "change of order" of operations until evaluation, right? Apart from consistency with other parts of the code, what are the benefits of this? (For my understanding.)

@amartinhuertas (Member, author) commented

Also, @fverdugo ... do you have any insight on this question? #637 (comment)

@fverdugo (Member) commented

My concern is that this

function lazy_map(k::Broadcasting,a::LazyArray{<:Fill{<:BlockMap}})
  args = map(i->lazy_map(k,i),a.args)
  bm = a.maps.value
  lazy_map(bm,args...)
end

is only valid when k represents a linear operation.

Unfortunately, BroadcastingFieldOpMap will not fix it, since you are broadcasting the gradient.

I would say that v in

∇v = lazy_map(Broadcasting(∇),v)

is not "multifield" yet. Thus, ArrayBlock and BlockMap should not be involved here.

If v is "multifield" for some reason, I would try to fix this, and then you will not need to introduce the new method dispatch definition above.

@fverdugo (Member) commented

> Also, @fverdugo ... do you have any insight on this question? #637 (comment)

Assembly routines can allocate large amounts of memory (e.g. intermediate sparse representations of your matrix). The explicit call to gc() is to free them as soon as possible.

In any case, after commenting out the gc call, there is still a large plateau in #637 (comment).

Do you know what is causing this? Perhaps it is also related to gc.
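As a minimal, self-contained illustration of this pattern (plain Julia, no Gridap calls; the function name and sizes are made up): the large temporaries are dropped and GC.gc() is called right after the step that created them.

```julia
function build_and_free(n)
  tmp = rand(n, n)   # stand-in for the large intermediate assembly buffers
  A = tmp * tmp'     # the result we actually keep
  tmp = nothing      # drop the reference to the temporaries...
  GC.gc()            # ...and force collection right away, as the driver does
  return A
end

A = build_and_free(1_000)
```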

@amartinhuertas (Member, author) commented

> Assembly routines can allocate large amounts of memory (e.g. intermediate sparse representations of your matrix). The explicit call to gc() is to free them as soon as possible.

Ok, this is what I thought. Did you observe a positive impact of gc on memory consumption? Have you studied this in detail? Perhaps (this is speculation, I am not an expert in Julia internals) forcing the gc to run at a particular moment in time, instead of letting it decide when it is best to run, is what causes the performance issues.

> In any case, after commenting out the gc call, there is still a large plateau in #637 (comment).
> Do you know what is causing this? Perhaps it is also related to gc.

Ok, that is a good observation. This plateau was also present in the flame graphs prior to deactivating gc, and it occurs during the solve stage. As you say, looking at the RT solve column before and after gc, there has been an increase in the computational time of RT solve, but this increase is not as significant as the one in Assembly when we have explicit gc.

@amartinhuertas (Member, author) commented

Looking at the Total column helps in determining the best compromise.

@amartinhuertas (Member, author) commented

> is only valid when k represents a linear operation.

By "linear" you mean k(x+y) = k(x) + k(y) and k(αx) = αk(x)?

I do not see the relationship between being linear in this sense and the incorrectness of defining the result of broadcasting an operation over a vector block as the vector block of broadcasting the operation over its blocks. (Surely I am missing something.)

> If v is "multifield" for some reason, I would try to fix this, and then you will not need to introduce the new method dispatch definition above.

Ok, let me explore this and I will come back to you if I find a solution.

@fverdugo (Member) commented

If k is not linear, i.e., k(αx) != αk(x), zero blocks in the input can result in non-zero blocks in the output.
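A self-contained toy illustration of this point (plain Julia, no Gridap types): if only the non-zero blocks of a block vector are stored and a kernel k is applied block by block, the missing blocks are implicitly assumed to satisfy k(0) = 0, which holds for linear k but not in general.

```julia
# Block vector with 2 blocks, stored sparsely: block 2 is omitted (implicitly zero).
stored_blocks = Dict(1 => [1.0, 2.0])

# Applying k block by block only touches the stored blocks.
apply_blockwise(k, blocks) = Dict(i => k(b) for (i, b) in blocks)

k_linear(x)    = 2 .* x   # k(0) == 0, so skipping the zero block is correct
k_nonlinear(x) = x .+ 1   # k(0) != 0, so the omitted block 2 should become non-zero

apply_blockwise(k_linear, stored_blocks)     # ok: block 2 is still (implicitly) zero
apply_blockwise(k_nonlinear, stored_blocks)  # wrong: block 2 silently stays zero
```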

@amartinhuertas (Member, author) commented

> I would say that v in ∇v = lazy_map(Broadcasting(∇),v) is not "multifield" yet. Thus, ArrayBlock and BlockMap should not be involved here.
> If v is "multifield" for some reason, I would try to fix this, and then you will not need to introduce the new method dispatch definition above.

@fverdugo ... I have been thinking carefully about this comment. While we may try to achieve this, let me stress that the comment is NOT consistent with what Gridap is currently doing in analogous situations. To illustrate my point, I wrote the following MWE:

using Gridap
using Gridap.CellData
n = 3
domain = (0,1,0,1)
partition = (n,n)
model = simplexify(CartesianDiscreteModel(domain, partition))

order = 1
reffeᵤ = ReferenceFE(lagrangian,Float64,order)
reffeₚ = ReferenceFE(lagrangian,Float64,order)
V = TestFESpace(model,reffeᵤ,conformity=:H1)
Q = TestFESpace(model,reffeₚ,conformity=:H1)
Y = MultiFieldFESpace([V,Q])

v=get_fe_basis(V)
y=get_fe_basis(Y)
vb,_=y
∇(vb)

and carefully debugged the last line. This line triggers the following line:

cell_∇a = lazy_map(Broadcasting(∇),get_data(a))

which in turn triggers:

function return_cache(k::Broadcasting,f::ArrayBlock{A,N}) where {A,N}

Perhaps I am missing something, but I do not see how this is different from what I am trying to achieve with the method dispatch I have introduced. The difference between

cell_∇a = lazy_map(Broadcasting(∇),get_data(a))

and my particular case is that, in the former, the array to which we apply the Broadcasting(gradient) operation is a Fill array of VectorBlock instances, while in the latter, the array to which we apply the Broadcasting(gradient) operation is a LazyArray with BlockMap as a kernel. I guess that this is a new case that had not been considered before, and this is why we need the definition for optimization purposes.

> If k is not linear, i.e., k(αx) != αk(x), zero blocks in the input can result in non-zero blocks in the output.

See my comment above. Isn't

function return_cache(k::Broadcasting,f::ArrayBlock{A,N}) where {A,N}

violating the property you are referring to?

@amartinhuertas (Member, author) commented Aug 19, 2021

> To put you in context (if you were not already), this lazy_map overload is currently triggered from this line:
>
> https://github.com/gridap/Gridap.jl/blob/master/src/ReferenceFEs/RaviartThomasRefFEs.jl#L392

Ok. I have now realized (I have run the debugger to double-check) that the statement I am quoting is WRONG (sorry about that!).

In particular, the new lazy_map overload is triggered from this line:

cell_∇a = lazy_map(Broadcasting(∇),get_data(a))

Here get_data(a) is a LazyArray with a BlockMap kernel (instead of a Fill array of VectorBlocks, as with Lagrangian FEs).
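To make the two layouts concrete, here is a rough sketch, assuming that BlockMap(n, i) from Gridap.Fields builds an n-block vector with only block i filled and accepts plain arrays as block entries; the per-cell data below is made up.

```julia
using Gridap.Arrays: lazy_map
using Gridap.Fields: BlockMap

# Made-up per-cell data, only to show the array layout being discussed.
ncells    = 4
cell_vals = [rand(3) for _ in 1:ncells]

# Layout reported here for RT FEs: a LazyArray whose maps are a Fill of a BlockMap,
# so each entry is a block vector with block 1 filled and block 2 untouched.
cell_blocks = lazy_map(BlockMap(2, 1), cell_vals)
typeof(cell_blocks)  # LazyArray{<:Fill{<:BlockMap},...}

# For Lagrangian FEs, by contrast, get_data(a) is reportedly already a Fill array
# whose repeated entry is a VectorBlock, so Broadcasting(∇) goes straight through
# return_cache(::Broadcasting, ::ArrayBlock).
```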

@fverdugo (Member) commented

> In particular, the new lazy_map overload is triggered from this line:

Ok, I see! Then we can add the new method overload you proposed, but restricted to the gradient to be on the safe side (i.e. to avoid using it with non-linear kernels k). Do you think it's enough for your use case?

function lazy_map(k::Broadcasting{typeof(gradient)},a::LazyArray{<:Fill{<:BlockMap}})
  args = map(i->lazy_map(k,i),a.args)
  bm = a.maps.value
  lazy_map(bm,args...)
end

@amartinhuertas (Member, author) commented

> Then we can add the new method overload you proposed, but restricted to the gradient to be on the safe side (i.e. to avoid using it with non-linear kernels k).

Ok, deal.

Is there any example of a non-linear kernel in Gridap.jl used in conjunction with Broadcasting, or is it a future possibility?

I want to make sure I have understood the point (to avoid introducing the same issue in the future).

> Do you think it's enough for your use case?

Yes. (I have double-checked.)

It remains to decide what to do with the explicit call to GC(). It is currently commented out in this PR.

@fverdugo (Member) commented Sep 6, 2021

We can remove the call to gc() as you propose. This can be readily merged, right?

@amartinhuertas (Member, author) commented

I think so. It can be merged.

@fverdugo merged commit 9086208 into master Sep 6, 2021
@fverdugo deleted the optimizing_conforming_rt_fes branch September 6, 2021 09:15