Some TU Datasets do not work - Error when loading #195

Closed
chrisn-pik opened this issue Dec 19, 2022 · 6 comments · Fixed by #196 or #203

@chrisn-pik

Unfortunately, some TU datasets cannot be loaded. The following example illustrates the problem for the dataset "Cuneiform":

using MLDatasets
tudata = TUDataset("Cuneiform")

The following error message occurs:

ERROR: at row 1, column 1 : ErrorException("file entry \"0,\" cannot be converted to Int64")
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] dlm_fill(T::DataType, offarr::Vector{Vector{Int64}}, dims::Tuple{Int64, Int64}, has_header::Bool, sbuff::String, auto::Bool, eol::Char)
    @ DelimitedFiles /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:515
  [3] readdlm_string(sbuff::String, dlm::Char, T::Type, eol::Char, auto::Bool, optsd::Dict{Symbol, Union{Char, Integer, Tuple{Integer, Integer}}})
    @ DelimitedFiles /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:471
  [4] readdlm_auto(input::String, dlm::Char, T::Type, eol::Char, auto::Bool; opts::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ DelimitedFiles /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:247
  [5] readdlm_auto
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:233 [inlined]
  [6] #readdlm#6
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:226 [inlined]
  [7] readdlm
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:226 [inlined]
  [8] #readdlm#1
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:57 [inlined]
  [9] readdlm
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:57 [inlined]
 [10] TUDataset(name::String; dir::Nothing)
    @ MLDatasets ~/.julia/packages/MLDatasets/OYCcg/src/datasets/graphs/tudataset.jl:74
 [11] TUDataset(name::String)
    @ MLDatasets ~/.julia/packages/MLDatasets/OYCcg/src/datasets/graphs/tudataset.jl:54

caused by: file entry "0," cannot be converted to Int64
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] store_cell(dlmstore::DelimitedFiles.DLMStore{Int64}, row::Int64, col::Int64, quoted::Bool, startpos::Int64, endpos::Int64)
    @ DelimitedFiles /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:388
  [3] dlm_fill(T::DataType, offarr::Vector{Vector{Int64}}, dims::Tuple{Int64, Int64}, has_header::Bool, sbuff::String, auto::Bool, eol::Char)
    @ DelimitedFiles /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:510
  [4] readdlm_string(sbuff::String, dlm::Char, T::Type, eol::Char, auto::Bool, optsd::Dict{Symbol, Union{Char, Integer, Tuple{Integer, Integer}}})
    @ DelimitedFiles /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:471
  [5] readdlm_auto(input::String, dlm::Char, T::Type, eol::Char, auto::Bool; opts::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ DelimitedFiles /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:247
  [6] readdlm_auto
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:233 [inlined]
  [7] #readdlm#6
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:226 [inlined]
  [8] readdlm
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:226 [inlined]
  [9] #readdlm#1
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:57 [inlined]
 [10] readdlm
    @ /usr/share/julia/stdlib/v1.8/DelimitedFiles/src/DelimitedFiles.jl:57 [inlined]
 [11] TUDataset(name::String; dir::Nothing)
    @ MLDatasets ~/.julia/packages/MLDatasets/OYCcg/src/datasets/graphs/tudataset.jl:74
 [12] TUDataset(name::String)
    @ MLDatasets ~/.julia/packages/MLDatasets/OYCcg/src/datasets/graphs/tudataset.jl:54
  
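A minimal reproduction of this kind of error, assuming (this is not shown in the report) that the failing file holds comma-separated integer rows such as `0, 1`. When `readdlm` is called without an explicit delimiter it splits on whitespace, so the first cell becomes `"0,"`, which cannot be parsed as an `Int`. Passing `','` as the delimiter avoids this. The file contents below are made up for illustration.

```julia
using DelimitedFiles: readdlm

# Hypothetical file contents: integer columns separated by a comma and a space
content = "0, 1\n2, 3\n"

# Without a delimiter, readdlm splits on whitespace, so the first cell is "0,"
whitespace_ok = try
    readdlm(IOBuffer(content), Int)
    true
catch err
    println("whitespace-delimited read failed: ", err)
    false
end

# With ',' as the delimiter, each cell holds a single integer
# (Julia's integer parser tolerates the leading space in " 1")
data = readdlm(IOBuffer(content), ',', Int)
println(data)
```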

A different problem arises for "Fingerprint":

tudata = TUDataset("Fingerprint")
AssertionError: all(sort(unique(graph_indicator)) .== 1:length(unique(graph_indicator)))
Stacktrace:
 [1] TUDataset(name::String; dir::Nothing)
   @ MLDatasets ~/.julia/packages/MLDatasets/OYCcg/src/datasets/graphs/tudataset.jl:66
 [2] TUDataset(name::String)
   @ MLDatasets ~/.julia/packages/MLDatasets/OYCcg/src/datasets/graphs/tudataset.jl:54
 [3] top-level scope
   @ REPL[5]:1

@Dsantra92
Collaborator

Thanks @chrisn-pik for reporting this!!

@CarloLucibello
Member

@chrisn-pik, can you confirm that everything works on the latest tagged version?

@chrisn-pik
Author

@CarloLucibello Thanks for the quick fix. I no longer get an error message for Cuneiform; however, I still cannot load Fingerprint, see #195 (comment).

Dsantra92 reopened this on Dec 30, 2022
@Dsantra92
Collaborator

In our code, we expect the graph indices in graph_indicator to be numbered consecutively (1 to n). That does not seem to be the case here, and the format does not specify it. I will make a PR shortly.
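The assertion in the Fingerprint traceback requires the distinct graph IDs in `graph_indicator` (one entry per node) to be exactly `1, 2, ..., n`. The sketch below restates that check as a standalone function; `ids_are_serial` is a hypothetical name used only for illustration, not a function from MLDatasets. It shows that a single graph with no nodes, whose ID therefore never appears, is enough to make the check fail.

```julia
# Hypothetical helper mirroring the assertion in the traceback:
# the distinct graph IDs must be exactly 1, 2, ..., n
function ids_are_serial(graph_indicator::AbstractVector{<:Integer})
    ids = sort(unique(graph_indicator))
    return ids == 1:length(ids)
end

# Node i belongs to graph graph_indicator[i]
serial = [1, 1, 2, 3, 3]    # graphs 1, 2, 3 all have nodes
gapped = [1, 1, 3, 4, 4]    # graph 2 has no nodes, so ID 2 never appears

println(ids_are_serial(serial))  # true
println(ids_are_serial(gapped))  # false
```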

@Dsantra92
Collaborator

It looks like there is a discrepancy in this particular dataset.

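using DelimitedFiles: readdlm
# readdlm does not expand "~" itself, so expand the home directory explicitly
path = expanduser("~/.julia/datadeps/TUDataset/Fingerprint/Fingerprint_graph_indicator.txt")
readdlm(path, Int) |> unique |> length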

This gives 2149, which means only 2149 graphs have any nodes. However, the original documentation and other files such as graph_labels.txt state that there are 2800 graphs, so the error when loading the dataset is expected.
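The comparison above can be taken one step further: the IDs in `1:num_graphs` that never appear in the graph indicator are exactly the graphs without nodes. A minimal sketch on made-up toy data, assuming the total number of graphs is the number of entries in the graph labels file; the variable names are illustrative only.

```julia
# Toy stand-ins for the contents of the graph indicator and graph labels files
graph_indicator = [1, 1, 2, 4, 4, 4]   # one entry per node: the graph it belongs to
graph_labels    = [0, 1, 1, 0, 1]      # one entry per graph

num_graphs   = length(graph_labels)            # graphs declared by the dataset
nonempty_ids = unique(graph_indicator)         # graphs that own at least one node
empty_ids    = setdiff(1:num_graphs, nonempty_ids)

println("declared graphs: ", num_graphs)             # 5
println("graphs with nodes: ", length(nonempty_ids)) # 3
println("graphs without nodes: ", empty_ids)         # [3, 5]
```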

Removing the check and changing a few lines would let the data load, but should we?
cc: @CarloLucibello

@CarloLucibello
Member

We can turn the error into a warning.
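A sketch of what turning the error into a warning could look like, using Julia's built-in `@warn` logging macro. The function `check_graph_ids` and its arguments are hypothetical and not taken from the MLDatasets source; the idea is that the loader reports the graphs without nodes and carries on instead of throwing.

```julia
# Hypothetical validation step: warn (instead of asserting) when some of the
# declared graphs never appear in the graph indicator, i.e. have no nodes.
function check_graph_ids(graph_indicator::AbstractVector{<:Integer}, num_graphs::Integer)
    missing_ids = setdiff(1:num_graphs, unique(graph_indicator))
    if !isempty(missing_ids)
        @warn "$(length(missing_ids)) of $num_graphs graphs have no nodes" missing_ids
        return false
    end
    return true
end

# Loading continues in both cases; only the second call logs a warning
check_graph_ids([1, 1, 2, 3], 3)   # returns true
check_graph_ids([1, 1, 3], 3)      # warns that graph 2 has no nodes, returns false
```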
