Read file using Box file path #223

Open
rrpaleja opened this issue Dec 13, 2021 · 5 comments

Comments

@rrpaleja

Hello! Is there a function to read a file using the Box file path? E.g. if the file path is "folder1/folder2/temp_file.csv"

I didn't see one, so I wrote a quick loop that walks the path one component at a time to get the file ID:

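library(boxr)
library(data.table)

file_path <- "folder1/folder2/temp_file.csv"

# Split the path into its components: "folder1", "folder2", "temp_file.csv"
split_file_path <- strsplit(file_path, "/")[[1]]

# Start at the current Box working directory and descend one component at a time
dir_id_last <- box_getwd()
for (i in seq_along(split_file_path)) {
  ls_list <- box_ls(dir_id = dir_id_last, limit = 1000, fields = "name")
  # Keep the ID of the entry whose name matches this path component;
  # after the last iteration this is the ID of the file itself
  dir_id_last <- rbindlist(ls_list)[name == split_file_path[i], id]
}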

There is probably a better (more efficient and cleaner) way to create this function. Do you think this is something that could be implemented?
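
As a rough sketch of what such a function might look like, the loop above could be wrapped so that it fails with a clear error when a path component is missing. The names `resolve_box_path()` and `list_fun` are hypothetical and not part of boxr; `list_fun` is assumed to take a folder ID and return a list of entries that each have `id` and `name` fields. The example runs against an in-memory stand-in listing, so it needs no Box connection.

```r
# Hypothetical helper (not part of boxr): walk a "/"-separated path and
# return the ID of the final component, or stop with an informative error.
resolve_box_path <- function(path, root_id, list_fun) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts)]  # drop empty pieces from leading or doubled slashes
  current_id <- root_id
  for (part in parts) {
    entries <- list_fun(current_id)
    # Keep only the entries whose name matches this path component
    hit <- Filter(function(e) identical(e$name, part), entries)
    if (length(hit) == 0) {
      stop("No entry named '", part, "' in folder ", current_id, call. = FALSE)
    }
    current_id <- hit[[1]]$id
  }
  current_id
}

# Stand-in for a Box listing, keyed by folder ID (illustrative data only).
fake_tree <- list(
  "0"  = list(list(id = "11", name = "folder1", type = "folder")),
  "11" = list(list(id = "22", name = "folder2", type = "folder")),
  "22" = list(list(id = "33", name = "temp_file.csv", type = "file"))
)
fake_ls <- function(dir_id) fake_tree[[as.character(dir_id)]]

file_id <- resolve_box_path("folder1/folder2/temp_file.csv", "0", fake_ls)
```

In real use, `list_fun` could presumably be something like `function(id) box_ls(dir_id = id, limit = 1000, fields = "name")`, mirroring the call in the loop above. Note that this makes one listing request per path component.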

Thanks

@nathancday
Member

nathancday commented Dec 13, 2021 via email

@tamuanand

Hi @rrpaleja

In your example, are you reading the file folder1/folder2/temp_file.csv via the Box URL, or via a Box folder that is mounted/synced on your laptop?

@ijlyttle
Member

This use case sounds well suited to {boxrdrive} (https://github.com/r-box/boxrdrive), which is based on the file path rather than the folder_id.

@seth127

seth127 commented Feb 23, 2022

Hello. Loving this package, by the way.

I'm looking for the same thing here (specifically, passing a file path to box_ul() and box_dl()). Would there be any interest in me making a PR if I get an implementation that I like? If so, any tips or gotchas to consider in the implementation?

Also, would there be other functions beyond box_ul(), box_dl() (and probably box_ls()) that you would think should have this option?

@IsabelFE

Hello, I've created a couple of functions to work around this issue so that I can use directory names and file names in my code:

  • The custom function box_subfolder() returns the folder ID of a subfolder, given the parent folder ID and the subfolder's name. If the named subfolder does not exist, it is created. See the code at the end.
  • The custom function box_read_file() reads a file by its name instead of its ID, given the ID of the folder that contains it. See the code at the end.

I start my project by defining a project_folder_id, which is saved in the .Renviron file. Then I create as many folders and subfolders as I need. In this example there is a "data" folder with two subfolders ("dataframes" and "outputs"), and "outputs" contains a further subfolder, "plots":

# Set Box folder ID for the parent data folder for the project
project_folder_id <- Sys.getenv("PROJECT_FOLDER_ID")

# Create or retrieve the subfolder IDs for each desired subfolder
data_folder_id <- box_subfolder(project_folder_id, "data")
dataframes_folder_id <- box_subfolder(data_folder_id, "dataframes")
outputs_folder_id <- box_subfolder(data_folder_id, "outputs")
plots_folder_id <- box_subfolder(outputs_folder_id, "plots")
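
If the intermediate IDs are not needed, the same chain of calls could be driven by a path string with `Reduce()`. This is only a sketch: `box_subfolder_path()` and its `subfolder_fun` argument are hypothetical names, and `subfolder_fun(parent_id, name)` is assumed to return the child folder's ID, as box_subfolder() does. A stand-in function is used so that the example runs without a Box connection.

```r
# Hypothetical helper: apply a "get or create subfolder" function along a path.
# Returns the ID of the last folder in the path.
box_subfolder_path <- function(root_id, path, subfolder_fun) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts)]  # ignore empty components
  # Reduce() feeds each result back in as the next parent ID, starting from root_id
  Reduce(subfolder_fun, parts, root_id)
}

# Stand-in for box_subfolder(): builds a readable fake ID instead of calling Box.
fake_subfolder <- function(parent_id, name) paste0(parent_id, "/", name)

plots_id <- box_subfolder_path("root", "data/outputs/plots", fake_subfolder)
```

With the real function, the call would presumably look like `box_subfolder_path(project_folder_id, "data/outputs/plots", box_subfolder)`.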

Now I can read files with my other custom function using their name rather than ID:

dataframe_csv <- box_read_file(dataframes_folder_id, "values.csv")

And write files back to any of the saved folder IDs:

box_write(dataframe_csv, dir_id = dataframes_folder_id, file_name = "dataframe.csv")

I hope this is useful for other people. Not sure if this is the best way to do it, any suggestions are welcome.

box_subfolder()

box_subfolder <- function(parent_folder_id, subfolder_name) {
  # Get the list of items in the parent folder
  items <- box_ls(parent_folder_id)
  
  # Initialize a variable to store the ID of the existing folder if found
  existing_folder_id <- NULL
  
  # Iterate over all items in the list
  for (item in items) {
    # Check if the current item has the 'name' and 'type' attributes
    if ("name" %in% names(item) && "type" %in% names(item)) {
      # Check if the item is a folder and matches the subfolder name
      if (item$type == "folder" && item$name == subfolder_name) {
        existing_folder_id <- item$id
        break
      }
    } else {
      warning("Item does not have the expected 'name' and 'type' attributes.")
    }
  }
  
  # If an existing folder was found, return its ID with a message
  if (!is.null(existing_folder_id)) {
    message("The folder '", subfolder_name, "' already exists with ID: ", existing_folder_id)
    return(existing_folder_id)
  } else {
    # Create a new folder if it doesn't exist, and return its ID with a message
    new_folder <- box_dir_create(dir_name = subfolder_name, parent_dir_id = parent_folder_id)
    message("Created a new folder '", subfolder_name, "' with ID: ", new_folder$id)
    return(new_folder$id)
  }
}

box_read_file()

box_read_file <- function(folder_id, file_name) {
  # Get the list of items in the folder
  items <- box_ls(folder_id)
  
  # Initialize a variable to store the ID of the existing file if found
  file_id <- NULL
  
  # Iterate over all items in the list
  for (item in items) {
    # Check if the current item has the 'name' and 'type' attributes
    if ("name" %in% names(item) && "type" %in% names(item)) {
      # Check if the item is a file and matches the file name
      if (item$type == "file" && item$name == file_name) {
        file_id <- item$id
        break
      }
    } else {
      warning("Item does not have the expected 'name' and 'type' attributes.")
    }
  }
  # If an existing file was found, get its ID and read the file
  if (!is.null(file_id)) {
    file_data <- box_read(file_id, trust = TRUE)
    return(file_data)
  } else {
    stop("File not found")
  }
}
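
Both functions above repeat the same search over the folder listing, so one possible refinement is to factor that search into a small helper. The sketch below uses a hypothetical `find_item_id()` that works on any list of items with `id`, `name` and `type` fields (the fields the two functions already rely on). It is shown on made-up data rather than a real `box_ls()` result.

```r
# Hypothetical helper: return the ID of the first item with the given name
# and type, or NULL if there is no match. Items lacking a 'name' or 'type'
# field simply never match, because identical(NULL, "x") is FALSE.
find_item_id <- function(items, name, type) {
  for (item in items) {
    if (identical(item$type, type) && identical(item$name, name)) {
      return(item$id)
    }
  }
  NULL
}

# Made-up listing for illustration only.
items <- list(
  list(id = "101", name = "plots", type = "folder"),
  list(id = "102", name = "values.csv", type = "file")
)

csv_id <- find_item_id(items, "values.csv", "file")
missing_id <- find_item_id(items, "values.csv", "folder")
```

With such a helper, box_subfolder() would only need to decide whether to create the folder when the result is NULL, and box_read_file() whether to stop with "File not found".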
