Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type instability when reading a BAM file #18

Open
phaverty opened this issue Apr 18, 2018 · 3 comments
Open

Type instability when reading a BAM file #18

phaverty opened this issue Apr 18, 2018 · 3 comments

Comments

@phaverty
Copy link

I don't know if this is a significant bottleneck, but I noticed some type instability when looping over BAM file records:

function read_bam(bam_file)
# iterate over BAM records
chr = String[]
left_pos = Int64[]
right_pos = Int64[]
for record in reader
push!(chr, BAM.refname(record))
push!(left_pos, leftposition(record))
push!(right_pos, rightposition(record))
end
DataFrame(chr = chr, left = left_pos, right = right_pos)
end

@code_warntype read_bam(some_file)

!julia> @code_warntype read_bam(bam_file)
Variables:
#self#
bam_file
record::ANY
#temp#::ANY
chr::Array{String,1}
left_pos::Array{Int64,1}
right_pos::Array{Int64,1}
itemT

Body:
begin
chr::Array{String,1} = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{String,1}, svec(Any, Int64), Array{String,1}, 0, 0, 0)) # line 4:
left_pos::Array{Int64,1} = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), Array{Int64,1}, 0, 0, 0)) # line 5:
right_pos::Array{Int64,1} = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), Array{Int64,1}, 0, 0, 0)) # line 6:
SSAValue(0) = Main.reader
#temp#::ANY = (Base.start)(SSAValue(0))::ANY
9:
unless !((Base.done)(SSAValue(0), #temp#::ANY)::ANY)::ANY goto 39
SSAValue(1) = (Base.next)(SSAValue(0), #temp#::ANY)::ANY
record::ANY = (Core.getfield)(SSAValue(1), 1)::ANY
#temp#::ANY = (Core.getfield)(SSAValue(1), 2)::ANY # line 7:
SSAValue(5) = ($(QuoteNode(BioAlignments.BAM.refname)))(record::ANY)::String
$(Expr(:inbounds, false))
# meta: location array.jl push! 652
SSAValue(6) = (Base.bitcast)(UInt64, (Base.check_top_bit)(1)::Int64)
$(Expr(:foreigncall, :(:jl_array_grow_end), Void, svec(Any, UInt64), :(chr), 0, SSAValue(6), 0)) # line 653:
# meta: location abstractarray.jl endof 134
# meta: location abstractarray.jl linearindices 99
# meta: location abstractarray.jl indices1 71
# meta: location abstractarray.jl indices 64
SSAValue(9) = (Base.arraysize)(chr::Array{String,1}, 1)::Int64
# meta: pop location
# meta: pop location
# meta: pop location
# meta: pop location
(Base.arrayset)(chr::Array{String,1}, SSAValue(5), (Base.select_value)((Base.slt_int)(SSAValue(9), 0)::Bool, 0, SSAValue(9))::Int64)::Array{String,1}
# meta: pop location
$(Expr(:inbounds, :pop)) # line 8:
(Main.push!)(left_pos::Array{Int64,1}, (Main.leftposition)(record::ANY)::ANY)::Array{Int64,1} # line 9:
(Main.push!)(right_pos::Array{Int64,1}, (Main.rightposition)(record::ANY)::ANY)::Array{Int64,1}
37:
goto 9
39: # line 11:
return $(Expr(:invoke, MethodInstance for (::Core.#kw#Type)(::Array{Any,1}, ::Type{DataFrames.DataFrame}), :($(QuoteNode(Type))), :($(Expr(:invoke, MethodInstance for vector_any(::Any, ::Vararg{Any,N} where N), :(Base.vector_any)$
end::DataFrames.DataFrame

I think the ANY type for record and temp indicate some type instability

@TransGirlCodes
Copy link
Member

Thats very interesting that record should be of type ANY. At first glance, I think that should not be happening. Thanks for flagging this!

@phaverty
Copy link
Author

phaverty commented Apr 19, 2018 via email

@phaverty
Copy link
Author

When following the iteration suggestion in the "Performance Tips" section, no type instability is observed. e.g. for:

reader = open(BAM.Reader, "data.bam")
record = BAM.Record()
while !eof(reader)
read!(reader, record)
# do something
end

Variables:
#self#
genome_name::String
reader::BioAlignments.BAM.Reader{IOStream}
info::GenomicVectors.GenomeInfo{Int64}
chr::Array{String,1}
left_pos::Array{Int64,1}
right_pos::Array{Int64,1}
record::BioAlignments.BAM.Record
itemT@_9
itemT@_10
itemT@_11

Body:

Maybe just update the docs to make this the standard way of looping over a BAM?

@CiaranOMara CiaranOMara transferred this issue from BioJulia/BioAlignments.jl Jul 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants