Check efficiency of RT implementation #637
I think that we need to replace

```julia
lazy_map(_transform_rt_shapefuns,
         cell_reffe,
         get_cell_map(Triangulation(model)),
         lazy_map(Broadcasting(constant_field), sign_flip))
```

by

```julia
cell_fs = lazy_map(get_shape_functions, cell_reffe)
lazy_map(ContraPiolaMap(), cell_fs,
         get_cell_map(Triangulation(model)),
         lazy_map(Broadcasting(constant_field), sign_flip))
```

and remove the former. Perhaps idem with the cell dofs.
Baseline commit 67e5ee1 (2D Darcy on a box). Profile data attached below.
After the first optimization pointed out by @fverdugo above (92a69ab), we have improved performance: the width of the mountains on the left-hand-side panel (assembly) has been reduced. There is still type instability in the evaluation of the monomial basis (I will take a look). The huge amount of time spent in the explicit GC call after assembly is surprising; anyway, I guess @fverdugo is already aware of this ...
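Type instability of this kind can be reproduced without Gridap. Below is a minimal illustrative sketch (the coefficient containers and `evaluate_basis` are made-up names, not Gridap code): storing the basis data with a non-concrete element type forces boxing and allocations on every evaluation, while a concretely typed container does not.

```julia
# Coefficients stored with abstract eltype force dynamic dispatch,
# much like a type-unstable basis cache; concrete eltype does not.
unstable_coeffs = Any[1.0, 2.0, 3.0]   # eltype Any    -> type-unstable
stable_coeffs   = [1.0, 2.0, 3.0]      # eltype Float64 -> type-stable

# Evaluate a toy monomial expansion: sum_i c_i * x^(i-1).
evaluate_basis(coeffs, x) = sum(c * x^(i - 1) for (i, c) in enumerate(coeffs))

# Warm up, then compare per-call allocations: the unstable version
# allocates because the result type cannot be inferred.
evaluate_basis(unstable_coeffs, 0.5)
evaluate_basis(stable_coeffs, 0.5)
a_unstable = @allocated evaluate_basis(unstable_coeffs, 0.5)
a_stable   = @allocated evaluate_basis(stable_coeffs, 0.5)
println("unstable: $a_unstable bytes, stable: $a_stable bytes")
```

`@code_warntype evaluate_basis(unstable_coeffs, 0.5)` shows the red `Any` annotations that correspond to the flat, interpreter-heavy regions in the flamegraph.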
After marathon sessions of profiling, I think I have detected a performance pitfall in the current version of Gridap when working with differential operators (e.g., gradient) applied to ArrayBlocks of fields. The pitfall is reproduced with the MWE below. I have come up with a solution in commit dc4350f. I am concerned about the following:
```julia
using Test
using Gridap
import Gridap: ∇, divergence
using LinearAlgebra
using Gridap.CellData
using Profile
using ProfileView

u(x) = VectorValue(2*x[1],x[1]+x[2])
divergence(::typeof(u)) = (x) -> 3
p(x) = x[1]-x[2]
∇p(x) = VectorValue(1,-1)
∇(::typeof(p)) = ∇p
f(x) = u(x) + ∇p(x)

domain = (0,1,0,1)
partition = (100,100)
order = 1
model = CartesianDiscreteModel(domain,partition)

V = FESpace(model,ReferenceFE(raviart_thomas,Float64,order),conformity=:Hdiv,
            dirichlet_tags=[5,6])
Q = FESpace(model,ReferenceFE(lagrangian,Float64,order); conformity=:L2)
U = TrialFESpace(V,u)
P = TrialFESpace(Q)
Y = MultiFieldFESpace([V, Q])
X = MultiFieldFESpace([U, P])

trian = Triangulation(model)
degree = 2
dΩ = Measure(trian,degree)
points = get_cell_points(dΩ.quad)

v = get_fe_basis(U)
pb = get_trial_fe_basis(Q)
div_v_q = ∫(divergence(v)*pb)dΩ
div_v_q_auto = div_v_q.dict[dΩ.quad.trian]
div_v_q_man = lazy_map(evaluate,get_data(divergence(v)*pb),get_data(points))

a((u, p),(v, q)) = ∫( (∇⋅v)*p )*dΩ
y = get_fe_basis(Y)
x = get_trial_fe_basis(X)
dc = a(x,y)
div_v_q_auto_block = dc.dict[dΩ.quad.trian]

@noinline function myloop!(cache,arr)
  l = 0
  for i in 1:length(arr)
    ai = getindex!(cache,arr,i)
    l += length(ai)
  end
  l
end

# Performance pitfall is here
cache = array_cache(div_v_q_auto_block);
@profile myloop!(cache,div_v_q_auto_block)

# Without blocks it is OK, no performance pitfall.
cache = array_cache(div_v_q_auto);
@profile myloop!(cache,div_v_q_auto)
```
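The same mechanism can be mimicked in plain Julia, with no Gridap dependency: iterating a container whose element type is non-concrete (as happens when blocks carry abstractly typed entries) pays a dynamic dispatch on every `length` call, whereas a concretely typed container is fully inferred. This is only an illustrative sketch of the pitfall, not Gridap's `ArrayBlock`:

```julia
# A concretely typed array of blocks vs. one whose eltype is Any.
concrete = [rand(3) for _ in 1:10_000]     # Vector{Vector{Float64}}
boxed    = Any[rand(3) for _ in 1:10_000]  # Vector{Any}: forces dispatch

@noinline function total_length(arr)
    l = 0
    for a in arr
        l += length(a)  # dynamic dispatch when eltype(arr) == Any
    end
    l
end

total_length(concrete); total_length(boxed)  # warm up
t_concrete = @elapsed total_length(concrete)
t_boxed    = @elapsed total_length(boxed)
println("concrete: $t_concrete s, boxed: $t_boxed s")
```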
After the second optimization (dc4350f), we have improved the performance of the assembly even more. An additional mountain has disappeared on the left-hand side of the flamegraph. We are getting there!
The "mountains" look very nice now! If you are able to fix the plateau, you would easily get a 2x improvement.
The plateau is the GC (explicit call after assembly). Do you think this is solvable?
What is the size of the mesh in this example?
100x100 quads
This GC time seems like a lot! What happens if you comment out that line? Are you running with --check-bounds=no?
No, I was not. Results with --check-bounds=no are in the table below: a 5-8% improvement, though the GC plateau is still there.
The assembly times improved dramatically! Why did you add that GC() line after assembly? (I guess to reduce permanent memory allocation, right?)
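One way to quantify the GC plateau instead of eyeballing the flamegraph: `@timed` reports the GC time spent inside a call, and the cost of the explicit collection can be timed separately. A generic sketch (allocation-heavy work standing in for assembly, not the actual Darcy driver):

```julia
# Allocation-heavy work standing in for the assembly loop.
function assembly_like_work(n)
    acc = 0.0
    for _ in 1:n
        acc += sum(rand(100))  # many short-lived allocations
    end
    acc
end

assembly_like_work(10)                      # warm up
stats = @timed assembly_like_work(100_000)  # gctime = GC during the work
full_gc = @timed GC.gc()                    # explicit collection, as after assembly
println("work: $(stats.time) s, GC inside work: $(stats.gctime) s")
println("explicit GC.gc(): $(full_gc.time) s")
```

If the explicit `GC.gc()` time dominates, the plateau is the price of the full collection itself rather than of allocations during assembly.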
New optimization: using DIV and integrating over the reference domain. Only very mild improvements observed. I would say this is because of the optimization for CartesianGrids: the Jacobian is the same for all cells, and this is exploited. For unstructured meshes I would expect higher improvements.
I looked at this, and I came up with the optimization in commit a250431. However, I did not observe significant performance improvements. This might be related to the fact that the examples I have been playing with use AffineMap, CartesianMap, etc.; i.e., for general unstructured mappings we might observe an improvement. For the record, the code I used to measure the improvement is the following one:

```julia
using Test
using Gridap
using Gridap.Geometry
using Gridap.ReferenceFEs
using Gridap.FESpaces
using Gridap.CellData
using Gridap.TensorValues
using Gridap.Fields
using Gridap.Io
using Profile
using ProfileView
using BenchmarkTools

order = 1
reffe = ReferenceFE(raviart_thomas,order)
domain = (0,1,0,1)
partition = (100,100)
model = CartesianDiscreteModel(domain,partition) |> simplexify
V = FESpace(model,reffe,conformity=DivConformity())
U = TrialFESpace(V)

v(x) = VectorValue(-0.5*x[1]+1.0,-0.5*x[2])
vh = interpolate(v,V)
e = v - vh
Ω = Triangulation(model)
dΩ = Measure(Ω,2*order)
el2 = sqrt(sum( ∫( e⋅e )*dΩ ))
@test el2 < 1.0e-10

@noinline function myloop!(cache,arr)
  l = 0
  for i in 1:length(arr)
    ai = getindex!(cache,arr,i)
    l += length(ai)
  end
  l
end

array = Gridap.FESpaces._cell_vals(V,v)
@time cache = array_cache(array);
@benchmark myloop!(cache,array)
```
By looking at `Gridap.jl/src/FESpaces/DivConformingFESpaces.jl`, line 116 at commit 67e5ee1, I would say that we are reevaluating the reference polynomials each time we visit a new cell.
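The fix suggested here amounts to tabulating the reference shape functions once at the (shared) quadrature points and reusing that table for every cell, so per-cell work applies only the cell-dependent transformation. A plain-Julia sketch of the pattern with a toy 1D basis (the names and the scaling stand-in are illustrative, not the actual RT basis or the Piola map):

```julia
# Toy reference shape functions and shared quadrature points.
shape_funs  = (x -> 1 - x, x -> x)
quad_points = range(0, 1; length = 4)

# Tabulate once: ref_table[i, q] = shape function i at point q.
ref_table = [ϕ(q) for ϕ in shape_funs, q in quad_points]

# Per-cell work applies only cell-dependent data (a toy scaling
# standing in for the Piola map and sign flips); the reference
# polynomials are never reevaluated.
cell_values(ref_table, scaling) = scaling .* ref_table

cells = rand(100)
vals  = [cell_values(ref_table, c) for c in cells]
println(size(ref_table), " table reused for ", length(vals), " cells")
```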