check up on missing content-type header #127

Closed · sckott opened this issue Dec 15, 2016 · 15 comments

sckott commented Dec 15, 2016

see CrossRef/rest-api-doc#172

sckott commented Feb 14, 2017

not fixed yet

legg0028 commented:

Is there a way to work around this? I am getting the error:

Error in if (x$headers$content-type == "text/plain") { : argument is of length zero

I can't find a way to work around this error, and it has really stopped my project from moving forward.
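
For context, this error is what R produces when if() receives a zero-length value, which is consistent with the missing content-type header this issue tracks. A minimal sketch:

headers <- list()                        # a response with no content-type header
headers$`content-type` == "text/plain"   # NULL == "text/plain" gives logical(0)
if (headers$`content-type` == "text/plain") print("plain")
# Error in if (...) { : argument is of length zero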

sckott commented Mar 31, 2017

@legg0028 Can you show me the code you used that generated that error?

legg0028 commented Mar 31, 2017

@sckott any help is appreciated :) Try to ignore my basic coding skills haha

library(rcrossref)  # cr_works(), cr_works_()
library(jsonlite)   # fromJSON()

# Get the total number of articles (for a publisher) found
publisher_meta <- cr_works(filter = c(publisher_name = "Springer Nature",
                                      type = "journal-article"),
                     limit = 0)

totNumFound <- publisher_meta$meta$total_results
numLimit <- 1000 #How many articles to attempt to get information for in each loop, 1000 is the max
cursor = "*"

downloadDone <- FALSE

# Get information for articles in the specific journal
while (downloadDone != TRUE) {
  # Get information for articles from the publisher
  journals <- cr_works_(filter = c(publisher_name = "Springer Nature", type = "journal-article"),
                        cursor = cursor,
                        cursor_max = 3000,
                        limit = numLimit)
  for (i in 1:length(journals)) {
    temp <- fromJSON(journals[[i]])
    temp <- temp$message$items
    
    # Check if we finished getting all our article data
    if (length(title) >= totNumFound) {
      downloadDone <- TRUE
    } else {
      assign("cursor", fromJSON(journals[[length(journals)]])$message$'next-cursor', .GlobalEnv)
    }
  }
}

sckott commented Mar 31, 2017

Thanks, but title is undefined. Also, where exactly does the error occur in the code above: the call to cr_works at the top, or the call to cr_works_?
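
To help pinpoint the failing call, base R can report the call stack after an error. A quick sketch:

traceback()               # run immediately after the error to see the call stack
options(error = recover)  # or set beforehand to browse the frames at the error site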

legg0028 commented Mar 31, 2017

@sckott sorry, I tried to trim the code for simplicity. The error occurs at the cr_works_ call, within the loop. The thing is, this code can run for hours before the error even occurs; that is, the while loop can iterate many times before actually producing an error. Here is the full, untrimmed code, with title defined.

# ----------------------------------------------------------------------------------
# ----- Get a list of all ISSNs from a publisher -----------------------------------
# ----------------------------------------------------------------------------------
library(rcrossref)  # cr_works(), cr_works_()
library(jsonlite)   # fromJSON()

# Get the total number of articles (for a publisher) found
publisher_meta <- cr_works(filter = c(publisher_name = "Springer Nature",
                                      type = "journal-article"),
                     limit = 0)

totNumFound <- publisher_meta$meta$total_results
numLimit <- 1000 #How many articles to attempt to get information for in each loop, 1000 is the max
cursor = "*"

downloadDone <- FALSE

# Set up the lists that we will populate
publisher <- list()
volume <- list()
issue <- list()
page <- list()
journal_name <- list()
DOI <- list()
title <- list()
author <- list()
ISSN <- list()
subject <- list()

# Get information for articles in the specific journal
while (downloadDone != TRUE) {
  # Get information for articles from the publisher
  journals <- cr_works_(filter = c(publisher_name = "Springer Nature", type = "journal-article"),
                        cursor = cursor,
                        cursor_max = 3000,
                        limit = numLimit)
  for (i in 1:length(journals)) {
    temp <- fromJSON(journals[[i]])
    temp <- temp$message$items
    
    assign("publisher", append(publisher, temp$publisher), .GlobalEnv)
    assign("volume", append(volume, temp$volume), .GlobalEnv)
    assign("issue", append(issue, temp$issue), .GlobalEnv)
    assign("page", append(page, temp$page), .GlobalEnv)
    assign("journal_name", append(journal_name, temp$'container-title'), .GlobalEnv)
    assign("DOI", append(DOI, temp$DOI), .GlobalEnv)
    assign("title", append(title, temp$title), .GlobalEnv)
    assign("author", append(author, temp$author), .GlobalEnv)
    assign("ISSN", append(ISSN, temp$ISSN), .GlobalEnv)
    assign("subject", append(subject, temp$subject), .GlobalEnv)
    
    # Check if we finished getting all our article data
    if (length(title) >= totNumFound) {
      downloadDone <- TRUE
    } else {
      assign("cursor", fromJSON(journals[[length(journals)]])$message$'next-cursor', .GlobalEnv)
    }
  }
}

# Consolidate all info into one data.frame
articleData <- as.data.frame(cbind(publisher = as.character(publisher),
                                   volume = as.character(volume),
                                   issue = as.character(issue),
                                   page = as.character(page),
                                   journal_name = as.character(journal_name),
                                   DOI = as.character(DOI),
                                   title = as.character(title),
                                   author = as.character(author),
                                   ISSN = as.character(ISSN),
                                   subject = as.character(subject)),
                             stringsAsFactors = FALSE)

sckott added a commit that referenced this issue Apr 1, 2017
first check for missing content-type header - and pass if not there

also bump license year to 2017
update all man files for new roxygen2, and require that new version
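
A minimal sketch of the kind of guard that commit message describes, assuming a response object x with a header list (this is not the actual rcrossref source):

ctype <- x$headers$`content-type`
# pass when the header is missing; otherwise only proceed for text/plain
if (is.null(ctype) || grepl("text/plain", ctype, fixed = TRUE)) {
  # parse the response body
}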
sckott commented Apr 1, 2017

@legg0028 try again after reinstalling. I wasn't able to reproduce the error since I didn't run the entire thing, but I think I know what was wrong and pushed a change that should fix it. Let me know if you can verify that you no longer get the error.
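
For reference, reinstalling the development version from GitHub (assuming the package lives at ropensci/rcrossref) looks like:

# install the in-development version containing the fix
devtools::install_github("ropensci/rcrossref")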

legg0028 commented Apr 1, 2017

@sckott still testing the code. I have the cr_works_ call wrapped in a tryCatch now, so I should catch all errors that occur. Only 4% done so far, but I did get this error:
simpleWarning: 500: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://mds4:8983/solr/crmds1: parsing error - (works)

The updated code I am using is as follows:

# ----------------------------------------------------------------------------------
# ----- Get a list of all ISSNs from a publisher -----------------------------------
# ----------------------------------------------------------------------------------
library(rcrossref)  # cr_works(), cr_works_()
library(jsonlite)   # fromJSON()

# Get the total number of articles (for a publisher) found
publisher_meta <- cr_works(filter = c(publisher_name = "Springer Nature",
                                      type = "journal-article"),
                     limit = 0)

totNumFound <- publisher_meta$meta$total_results
numLimit <- 1000 #How many articles to attempt to get information for in each loop, 1000 is the max
cursor = "*"

downloadDone <- FALSE

# Set up the lists that we will populate
publisher <- list()
volume <- list()
issue <- list()
page <- list()
journal_name <- list()
DOI <- list()
title <- list()
author <- list()
ISSN <- list()
subject <- list()

# Get information for articles in the specific journal
while (downloadDone != TRUE) {
  wasWarning <- FALSE
  wasError <- FALSE
  
  # Get information for articles from the publisher
  tryCatch({
    journals <- cr_works_(filter = c(publisher_name = "Springer Nature", type = "journal-article"),
                          cursor = cursor,
                          cursor_max = 3000,
                          limit = numLimit)
  }, warning = function(w) {
    print(w)
    wasWarning <<- TRUE  # `<<-`: a plain `<-` would only set a local variable inside the handler
  }, error = function(e) {
    print(e)
    wasError <<- TRUE
  }, finally = {
    if ((wasWarning != TRUE) & (wasError != TRUE)) {
      for (i in 1:length(journals)) {
        temp <- fromJSON(journals[[i]])
        temp <- temp$message$items
        
        assign("publisher", append(publisher, temp$publisher), .GlobalEnv)
        assign("volume", append(volume, temp$volume), .GlobalEnv)
        assign("issue", append(issue, temp$issue), .GlobalEnv)
        assign("page", append(page, temp$page), .GlobalEnv)
        assign("journal_name", append(journal_name, temp$'container-title'), .GlobalEnv)
        assign("DOI", append(DOI, temp$DOI), .GlobalEnv)
        assign("title", append(title, temp$title), .GlobalEnv)
        assign("author", append(author, temp$author), .GlobalEnv)
        assign("ISSN", append(ISSN, temp$ISSN), .GlobalEnv)
        assign("subject", append(subject, temp$subject), .GlobalEnv)
        
        # Check if we finished getting all our article data
        if (length(title) >= totNumFound) {
          downloadDone <- TRUE
        } else {
          assign("cursor", fromJSON(journals[[length(journals)]])$message$'next-cursor', .GlobalEnv)
        }
      }
    }
  })
  # Progress bar
  pb <- txtProgressBar(min = 0, max = totNumFound, style = 3)
  setTxtProgressBar(pb,length(title))
}
close(pb)
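
One scoping caveat in the tryCatch pattern above: an assignment inside a handler function is local to that handler, so the flags use `<<-` to update the variables in the enclosing environment. A minimal illustration:

flag <- FALSE
tryCatch(warning("boom"), warning = function(w) flag <<- TRUE)
flag
# [1] TRUE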

sckott commented Apr 1, 2017

That appears to be a warning, not an error, which means the code still proceeds.
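
The distinction matters because a warning lets execution continue, while an error stops it. A minimal illustration:

f <- function() {
  warning("something non-fatal happened")  # emitted, but execution continues
  "result still returned"
}
f()
# [1] "result still returned" (plus the warning message)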

legg0028 commented Apr 3, 2017

@sckott The code has been running for 24 hours and it looks like the error has not occurred again. The issue should be fixed, but I will let you know if anything does go wrong. Thanks for all your help.

sckott commented Apr 4, 2017

@legg0028 did it work this time?

legg0028 commented Apr 6, 2017

@sckott I have not gotten an error, but R crashes at about 50% (with no error returned). The .csv I am exporting is over 3 million lines long, so maybe that has something to do with it. I will continue testing and get back to you.

sckott commented Apr 6, 2017

If that's the case, you may want to write (append) to a file as the loop runs, so that even on a failure you don't lose any data.

legg0028 commented Apr 6, 2017

@sckott yeah, that is exactly what I am doing: appending to the .csv every 5000 results and saving the cursor value (for deep paging) in a .txt file. That way, if it crashes or whatever, I just run it again and it picks up from where it left off.
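
A rough sketch of that checkpointing scheme (the file names and the chunk data frame are hypothetical):

# append each chunk of results; write the header only on the first write
first <- !file.exists("articles.csv")
write.table(chunk, "articles.csv", append = !first, sep = ",",
            col.names = first, row.names = FALSE)
writeLines(cursor, "cursor.txt")  # save the deep-paging cursor

# on restart, resume from the saved cursor if one exists
if (file.exists("cursor.txt")) cursor <- readLines("cursor.txt")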

legg0028 commented Apr 7, 2017

Looks like the issue is sorted; I ran the code through with no issues.

Thanks for all your help.

@sckott sckott added this to the v0.7 milestone Apr 7, 2017
@sckott sckott closed this as completed Apr 26, 2017