Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submit to CRAN #9

Closed
meztez opened this issue Nov 17, 2020 · 27 comments
Closed

Submit to CRAN #9

meztez opened this issue Nov 17, 2020 · 27 comments

Comments

@meztez
Copy link
Owner

meztez commented Nov 17, 2020

At some point in the future, not now, now is not the time.

@jeroen
Copy link
Contributor

jeroen commented Dec 11, 2023

Before you can submit to CRAN you will have to request them to add the prerequisites on their Debian servers, otherwise you can't pass the incoming checks.

I think the best way is to send an email to CRAN@R-project.org requesting to add the libgrpc++-dev and protobuf-compiler-grpc packages to their server. The other deps are already there, see this list.

@meztez
Copy link
Owner Author

meztez commented Dec 11, 2023

Sent

@gaborcsardi
Copy link
Collaborator

I suspect the we'll need to fix this as well: https://github.com/meztez/bigrquerystorage/actions/runs/7168946042/job/19518216883#step:8:125

Files which contain pragma(s) suppressing diagnostics:
  'src/google/api/annotations.pb.h' 'src/google/api/client.pb.h'
  'src/google/api/field_behavior.pb.h' 'src/google/api/http.pb.h'
  'src/google/api/launch_stage.pb.h' 'src/google/api/resource.pb.h'
  'src/google/cloud/bigquery/storage/v1/arrow.pb.h'
  'src/google/cloud/bigquery/storage/v1/avro.pb.h'
  'src/google/cloud/bigquery/storage/v1/protobuf.pb.h'
  'src/google/cloud/bigquery/storage/v1/storage.pb.h'
  'src/google/cloud/bigquery/storage/v1/stream.pb.h'
  'src/google/cloud/bigquery/storage/v1/table.pb.h'
  'src/google/rpc/status.pb.h'

E.g. by porting the fix from the old configure.R before Jeroen's PR.

@jeroen
Copy link
Contributor

jeroen commented Dec 12, 2023

I didn't see these warnings on winbuilder so they may not be checked on Windows: https://win-builder.r-project.org/0GJ2L4z4Zm9u/

@jeroen
Copy link
Contributor

jeroen commented Dec 15, 2023

CRAN has added libgrpc++-dev and protobuf-compiler-grpc: r-devel/rcheckserver@380349a

I think your package is ready to be submitted.

Note that it may still take a while, the first submission will be scheduled for manual review by a cran assistent which takes a while and sometimes requires a few rounds of resubmitting to some requests.

@meztez
Copy link
Owner Author

meztez commented Dec 22, 2023

@jeroen @gaborcsardi

Got this answer from CRAN. I'll look into it after the holidays.


Prof Brian Ripley ripley@stats.ox.ac.uk
06 h 10 (il y a 2 heures)
à Bruno, CRAN, Tomas

This downloads pre-compiled software on macOS, contrary to the CRAN
policy (and Dr Ligges confirms permission was not given).

The README says

'Windows

The package will automatically download a static build of the system
requirements during installation.'

although that is not clear from the check log. That too would be a
policy violation.

The package has been removed from CRAN.


@jeroen
Copy link
Contributor

jeroen commented Dec 22, 2023

Ugh, I guess you have to ask them for permission to download libs (grpc is not available on their mac/win servers)

@meztez
Copy link
Owner Author

meztez commented Dec 22, 2023

@jeroen Oh, well I guess I'll ask for permission then. Thanks for the input, it gives me confidence to follow up with CRAN. I don't know if I would have taken that step by myself.

@jeroen
Copy link
Contributor

jeroen commented Dec 22, 2023

Well it is not certain they will agree. Uwe Ligges is usually helpful and but Brian Ripley can be very unreasonable. So the best path is probably to ask Uwe and explain him the grpc library is not available on their Mac or Windows servers.

@gaborcsardi
Copy link
Collaborator

the grpc library is available on their Mac or Windows servers.

not available, you mean?

Btw. I just opened https://bugs.r-project.org/show_bug.cgi?id=18641

@jeroen
Copy link
Contributor

jeroen commented Dec 22, 2023

Yes sorry typo 🤦

By the way, in the mean while users can install binary packages from: https://meztez.r-universe.dev/bigrquerystorage

@meztez
Copy link
Owner Author

meztez commented Dec 22, 2023

I sent this:

Hi Dr Ligges,

In your opinion, what would be the best way forward to respect CRAN policy and get the package published?

Happy holidays to you and the people in your life.

Bruno

Context :

Some members of the R community have communicated to me their wish to see package bigrquerystorage (https://github.com/meztez/bigrquerystorage) published to CRAN. Github users -jeroen, -gaborcsardi and -hadley have helped
me with compatibility on all platforms. (r-dbi/bigrquery#425, r-dbi/bigrquery#381)

It is a companion package to bigrquery that uses the BigQuery Storage API based on their gRPC protocol instead of their regular REST API. It provides better performance for larger datasets.

Unfortunately, it has a dependency (gRPC) that is not yet available on Windows and Mac CRAN servers, as Jeroen tells me. To remedy that, I understand the configure script was modified to install precompiled binaries for Mac and Windows. I was advised to ask for explicit permission after receiving Dr Ripley's communication.

On Mac

curl -sfL "https://autobrew.github.io/scripts/grpc" > autobrew
):

On Windows https://github.com/meztez/bigrquerystorage/blob/main/tools/winlibs.R

  url <- if(grepl("aarch", R.version$platform)){
    "https://github.com/r-windows/bundles/releases/download/grpc-1.51.1/grpc-1.51.1-clang-aarch64.tar.xz"
  } else if(grepl("clang", Sys.getenv('R_COMPILED_BY'))){
    "https://github.com/r-windows/bundles/releases/download/grpc-1.51.1/grpc-1.51.1-clang-x86_64.tar.xz"
  } else if(getRversion() >= "4.3") {
    "https://github.com/r-windows/bundles/releases/download/grpc-1.51.1/grpc-1.51.1-ucrt-x86_64.tar.xz"
  } else {
    "https://github.com/rwinlib/grpc/archive/refs/tags/1.33.2-2.tar.gz"
  }
  download.file(url, basename(url), quiet = TRUE)

@gaborcsardi
Copy link
Collaborator

@meztez https://bugs.r-project.org/show_bug.cgi?id=18641 has been fixed and Rtools43 has grpc now. So we could try to give this another go, if you can gather the strength for another submission.

We would need to update the build process on Windows slightly, and then re-submit. I can submit a PR for the changes.

Also, if you don't have strength to do it, I completely understand. In this case I can do the first submission, and then once it is safely on CRAN, pass the package to you in the next submission.

@eitsupi
Copy link
Contributor

eitsupi commented Feb 6, 2024

Note that arrow may be removed from CRAN soon (apache/arrow#39806).
Perhaps you would prefer to make your submission after the arrow issue is resolved.......

It might also be an interesting improvement to remove arrow from the required dependencies and replace it with nanoarrow.

@meztez
Copy link
Owner Author

meztez commented Feb 6, 2024

@eitsupi

Note that arrow may be removed from CRAN soon (apache/arrow#39806). Perhaps you would prefer to make your submission after the arrow issue is resolved.......

It might also be an interesting improvement to remove arrow from the required dependencies and replace it with nanoarrow.

The arrow dependencies that would have to be replaced are:

rdr <- RecordBatchStreamReader$create(unlist(raws))
,
tb <- rdr$read_table()
,
tb <- Table$create(

I've taken a look at nanoarrow and could not find drop in replacement at first glance. Would you have further suggestion on how this could be done?

What would be the benefits of using nanoarrow in this instance?

Thank you

@eitsupi
Copy link
Contributor

eitsupi commented Feb 6, 2024

What would be the benefits of using nanoarrow in this instance?

The biggest advantage is that there is no need to have the huge and time-consuming to build arrow package as a dependency. The R ADBC-related packages and the Arrow-related methods of the DBI package are all based on nanoarrow.
https://arrow.apache.org/adbc/current/r/index.html
https://dbi.r-dbi.org/articles/DBI-arrow.html

I've taken a look at nanoarrow and could not find drop in replacement at first glance. Would you have further suggestion on how this could be done?

The correspondence between the objects of the arrow package and those of the nanoarrow package may be understood from this file.
https://github.com/apache/arrow-nanoarrow/blob/a41fb2be59f44b6bd6bf5a014aef253041fdc10f/r/R/pkg-arrow.R

Since the Table equivalent object exists only in Apache Arrow's C++ implementation, I believe nanoarrow uses the struct type nanoarrow_array_stream to exchange data in Table format.

@meztez
Copy link
Owner Author

meztez commented Feb 6, 2024

@eitsupi I've tried to work out how to use a two elements raws list (schema, data) and turn into a data.frame using nanoarrow. I cannot figure it out at the moment, I'm sorry.

> dt <- as_nanoarrow_array(raws[[2]], schema = infer_nanoarrow_schema(raws[[1]])) |> as.data.frame()
Error in as.data.frame.nanoarrow_array(as_nanoarrow_array(raws[[2]], schema = infer_nanoarrow_schema(raws[[1]]))) : 
  Can't convert array with type uint8 to data.frame()
> 

@thisisnic
Copy link

@paleolimbot may be able to help here

@paleolimbot
Copy link

I'm guessing here that raws[[1]] is the serialized record batch and raws[[2]] is the serialized schema? nanoarrow has an IPC reader but it hasn't been wired up in the R package yet. Until it has, I'm not sure it can help you here!

@gaborcsardi
Copy link
Collaborator

@paleolimbot What exactly do you mean by "IPC reader"? Does that basically mean reading the data from a socket?

Maybe we could do something simple, like writing all the data out to disk, and then reading it from there?

@meztez
Copy link
Owner Author

meztez commented Feb 13, 2024

Could we push directly to a nanoarrow object instead of doing this C++ vector push to R raws to arrow table?

I'm thinking pushing directly into a nanoarrow object could save some time?

Do we need arrow to convert back into a R data.frame?

@meztez
Copy link
Owner Author

meztez commented Feb 18, 2024

@paleolimbot
Copy link

Sorry I missed the ping!

What exactly do you mean by "IPC reader"?

I believe that "IPC" (which can be used for inter-process communication, like via a socket, but works anywhere you might want to stream a table from a server to a client) is the format that you are currently using the arrow package to read (because when I browsed the code I saw a RecordBatchStreamReader). In order to use nanoarrow to replace that bit of code, I (or somebody) would need to add a binding for nanoarrow (C)'s "ArrowIpcArrayStreamReader". The Python version of that is here: apache/arrow-nanoarrow#388 .

Could we push directly to a nanoarrow object

It seems like your input here is a fully downloaded raw vector of the entire stream. When the IPC reader has R bindings, that would give you a nanoarrow_array_stream, on which you could call as.data.frame(). Technically you wouldn't have to materialize the bytes in advance (you could stream the input).

Until arrow is back on CRAN, I think we will have to hold.

I also had trouble submitting a small nanoarrow patch because it had arrow in Suggests 🙁

@paleolimbot
Copy link

I gave the implementation in R a go, which should make it into the next release in the next two months or so ( apache/arrow-nanoarrow#390 ). You should be able to nanoarrow::read_nanoarrow(bqs_ipc_stream(...)) |> as.data.frame() (no pressure).

@gaborcsardi
Copy link
Collaborator

In the meanwhile, it seems like arrow is back on CRAN [1], so we can also submit the current version, and then later switch to nanoarrow.

[1] https://fosstodon.org/deck/@nic_crane@mastodon.social/111981942640349340

@meztez
Copy link
Owner Author

meztez commented Feb 23, 2024

In the meanwhile, it seems like arrow is back on CRAN [1], so we can also submit the current version, and then later switch to nanoarrow.

[1] https://fosstodon.org/deck/@nic_crane@mastodon.social/111981942640349340

Let's hope third time is the charm.

@meztez
Copy link
Owner Author

meztez commented Feb 26, 2024

Thanks,
on its way to CRAN.

Best,
Benjamin Altmann

@meztez meztez closed this as completed Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants