check up on missing content-type header #127

Closed · sckott opened this issue Dec 15, 2016 · 15 comments

sckott commented Dec 15, 2016

see CrossRef/rest-api-doc#172

sckott commented Feb 14, 2017

not fixed yet

legg0028 commented:

Is there a way to work around this? I am getting the error:

Error in if (x$headers$content-type == "text/plain") { : argument is of length zero

I can't find a way to work around this error, and it has really stopped my project from moving forward.
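
For context, this error is what R produces when if() receives a zero-length value, which is consistent with the missing content-type header this issue tracks. A minimal sketch:

headers <- list()                        # a response with no content-type header
headers$`content-type` == "text/plain"   # NULL == "text/plain" gives logical(0)
if (headers$`content-type` == "text/plain") print("plain")
# Error in if (...) { : argument is of length zero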

sckott commented Mar 31, 2017

@legg0028 Can you show me the code you used that generated that error?

legg0028 commented Mar 31, 2017

@sckott any help is appreciated :) Try to ignore my basic coding skills haha

library(rcrossref)  # cr_works(), cr_works_()
library(jsonlite)   # fromJSON()

# Get the total number of articles (for a publisher) found
publisher_meta <- cr_works(filter = c(publisher_name = "Springer Nature",
                                      type = "journal-article"),
                     limit = 0)

totNumFound <- publisher_meta$meta$total_results
numLimit <- 1000 #How many articles to attempt to get information for in each loop, 1000 is the max
cursor = "*"

downloadDone <- FALSE

# Get information for articles in the specific journal
while (downloadDone != TRUE) {
  # Get information for articles from the publisher
  journals <- cr_works_(filter = c(publisher_name = "Springer Nature", type = "journal-article"),
                        cursor = cursor,
                        cursor_max = 3000,
                        limit = numLimit)
  for (i in 1:length(journals)) {
    temp <- fromJSON(journals[[i]])
    temp <- temp$message$items
    
    # Check if we finished getting all our article data
    if (length(title) >= totNumFound) {
      downloadDone <- TRUE
    } else {
      assign("cursor", fromJSON(journals[[length(journals)]])$message$'next-cursor', .GlobalEnv)
    }
  }
}

sckott commented Mar 31, 2017

Thanks, but title is undefined. Also, where exactly does the error occur in the code above: the call to cr_works at the top, or the call to cr_works_?
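
To help pinpoint the failing call, base R can report the call stack after an error. A quick sketch:

traceback()               # run immediately after the error to see the call stack
options(error = recover)  # or set beforehand to browse the frames at the error site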

legg0028 commented Mar 31, 2017

@sckott sorry, I tried to trim the code for simplicity. The error occurs at the cr_works_ call, within the loop. The thing is, this code can run for hours before the error even occurs; that is, the while loop can iterate many times before actually producing an error. Here is the full, untrimmed code, with title defined.

# ----------------------------------------------------------------------------------
# ----- Get a list of all ISSNs from a publisher -----------------------------------
# ----------------------------------------------------------------------------------
library(rcrossref)  # cr_works(), cr_works_()
library(jsonlite)   # fromJSON()

# Get the total number of articles (for a publisher) found
publisher_meta <- cr_works(filter = c(publisher_name = "Springer Nature",
                                      type = "journal-article"),
                     limit = 0)

totNumFound <- publisher_meta$meta$total_results
numLimit <- 1000 #How many articles to attempt to get information for in each loop, 1000 is the max
cursor = "*"

downloadDone <- FALSE

# Set up the lists that we will populate
publisher <- list()
volume <- list()
issue <- list()
page <- list()
journal_name <- list()
DOI <- list()
title <- list()
author <- list()
ISSN <- list()
subject <- list()

# Get information for articles in the specific journal
while (downloadDone != TRUE) {
  # Get information for articles from the publisher
  journals <- cr_works_(filter = c(publisher_name = "Springer Nature", type = "journal-article"),
                        cursor = cursor,
                        cursor_max = 3000,
                        limit = numLimit)
  for (i in 1:length(journals)) {
    temp <- fromJSON(journals[[i]])
    temp <- temp$message$items
    
    assign("publisher", append(publisher, temp$publisher), .GlobalEnv)
    assign("volume", append(volume, temp$volume), .GlobalEnv)
    assign("issue", append(issue, temp$issue), .GlobalEnv)
    assign("page", append(page, temp$page), .GlobalEnv)
    assign("journal_name", append(journal_name, temp$'container-title'), .GlobalEnv)
    assign("DOI", append(DOI, temp$DOI), .GlobalEnv)
    assign("title", append(title, temp$title), .GlobalEnv)
    assign("author", append(author, temp$author), .GlobalEnv)
    assign("ISSN", append(ISSN, temp$ISSN), .GlobalEnv)
    assign("subject", append(subject, temp$subject), .GlobalEnv)
    
    # Check if we finished getting all our article data
    if (length(title) >= totNumFound) {
      downloadDone <- TRUE
    } else {
      assign("cursor", fromJSON(journals[[length(journals)]])$message$'next-cursor', .GlobalEnv)
    }
  }
}

# Consolidate all info into one data.frame
articleData <- as.data.frame(cbind(publisher = as.character(publisher),
                                   volume = as.character(volume),
                                   issue = as.character(issue),
                                   page = as.character(page),
                                   journal_name = as.character(journal_name),
                                   DOI = as.character(DOI),
                                   title = as.character(title),
                                   author = as.character(author),
                                   ISSN = as.character(ISSN),
                                   subject = as.character(subject)),
                             stringsAsFactors = FALSE)

sckott added a commit that referenced this issue Apr 1, 2017
first check for missing content-type header - and pass if not there

also bump license year to 2017
update all man files for new roxygen2, and require that new version
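
A minimal sketch of the kind of guard that commit message describes, assuming a response object x with a header list (this is not the actual rcrossref source):

ctype <- x$headers$`content-type`
# pass when the header is missing; otherwise only proceed for text/plain
if (is.null(ctype) || grepl("text/plain", ctype, fixed = TRUE)) {
  # parse the response body
}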
sckott commented Apr 1, 2017

@legg0028 try again after reinstalling. I wasn't able to reproduce the error since I didn't run the entire thing, but I think I know what was wrong and pushed a change that should fix it. Let me know if you can verify that you no longer get the error.
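
For reference, reinstalling the development version from GitHub (assuming the package lives at ropensci/rcrossref) looks like:

# install the in-development version containing the fix
devtools::install_github("ropensci/rcrossref")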

legg0028 commented Apr 1, 2017

@sckott still testing the code. I have the cr_works_ call wrapped in a tryCatch now, so I should catch all errors that occur. Only 4% done so far, but I did get this error:
simpleWarning: 500: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://mds4:8983/solr/crmds1: parsing error - (works)

The updated code I am using is as follows:

# ----------------------------------------------------------------------------------
# ----- Get a list of all ISSNs from a publisher -----------------------------------
# ----------------------------------------------------------------------------------
library(rcrossref)  # cr_works(), cr_works_()
library(jsonlite)   # fromJSON()

# Get the total number of articles (for a publisher) found
publisher_meta <- cr_works(filter = c(publisher_name = "Springer Nature",
                                      type = "journal-article"),
                     limit = 0)

totNumFound <- publisher_meta$meta$total_results
numLimit <- 1000 #How many articles to attempt to get information for in each loop, 1000 is the max
cursor = "*"

downloadDone <- FALSE

# Set up the lists that we will populate
publisher <- list()
volume <- list()
issue <- list()
page <- list()
journal_name <- list()
DOI <- list()
title <- list()
author <- list()
ISSN <- list()
subject <- list()

# Get information for articles in the specific journal
while (downloadDone != TRUE) {
  wasWarning <- FALSE
  wasError <- FALSE
  
  # Get information for articles from the publisher
  tryCatch({
    journals <- cr_works_(filter = c(publisher_name = "Springer Nature", type = "journal-article"),
                          cursor = cursor,
                          cursor_max = 3000,
                          limit = numLimit)
  }, warning = function(w) {
    print(w)
    wasWarning <<- TRUE  # `<<-`: a plain `<-` would only set a local variable inside the handler
  }, error = function(e) {
    print(e)
    wasError <<- TRUE
  }, finally = {
    if ((wasWarning != TRUE) & (wasError != TRUE)) {
      for (i in 1:length(journals)) {
        temp <- fromJSON(journals[[i]])
        temp <- temp$message$items
        
        assign("publisher", append(publisher, temp$publisher), .GlobalEnv)
        assign("volume", append(volume, temp$volume), .GlobalEnv)
        assign("issue", append(issue, temp$issue), .GlobalEnv)
        assign("page", append(page, temp$page), .GlobalEnv)
        assign("journal_name", append(journal_name, temp$'container-title'), .GlobalEnv)
        assign("DOI", append(DOI, temp$DOI), .GlobalEnv)
        assign("title", append(title, temp$title), .GlobalEnv)
        assign("author", append(author, temp$author), .GlobalEnv)
        assign("ISSN", append(ISSN, temp$ISSN), .GlobalEnv)
        assign("subject", append(subject, temp$subject), .GlobalEnv)
        
        # Check if we finished getting all our article data
        if (length(title) >= totNumFound) {
          downloadDone <- TRUE
        } else {
          assign("cursor", fromJSON(journals[[length(journals)]])$message$'next-cursor', .GlobalEnv)
        }
      }
    }
  })
  # Progress bar
  pb <- txtProgressBar(min = 0, max = totNumFound, style = 3)
  setTxtProgressBar(pb,length(title))
}
close(pb)
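
One scoping caveat in the tryCatch pattern above: an assignment inside a handler function is local to that handler, so the flags use `<<-` to update the variables in the enclosing environment. A minimal illustration:

flag <- FALSE
tryCatch(warning("boom"), warning = function(w) flag <<- TRUE)
flag
# [1] TRUE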

sckott commented Apr 1, 2017

That appears to be a warning, not an error, which means the code still proceeds.
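
The distinction matters because a warning lets execution continue, while an error stops it. A minimal illustration:

f <- function() {
  warning("something non-fatal happened")  # emitted, but execution continues
  "result still returned"
}
f()
# [1] "result still returned" (plus the warning message)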

legg0028 commented Apr 3, 2017

@sckott The code has been running for 24 hours and it looks like the error has not occurred again. The issue should be fixed, but I will let you know if anything does go wrong. Thanks for all your help.

sckott commented Apr 4, 2017

@legg0028 did it work this time?

legg0028 commented Apr 6, 2017

@sckott I have not gotten an error, but R crashes at about 50% (with no error returned). The .csv I am exporting is over 3 million lines long, so maybe that has something to do with it. I will continue testing and get back to you.

sckott commented Apr 6, 2017

If that's the case, you may want to write (append) to a file as the loop runs, so that even on a failure you don't lose any data.

legg0028 commented Apr 6, 2017

@sckott yeah, that is exactly what I am doing: appending to the .csv every 5000 results and saving the cursor value (for deep paging) in a .txt file. That way, if it crashes or whatever, I just run it again and it picks up from where it left off.
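
A rough sketch of that checkpointing scheme (the file names and the chunk data frame are hypothetical):

# append each chunk of results; write the header only on the first write
first <- !file.exists("articles.csv")
write.table(chunk, "articles.csv", append = !first, sep = ",",
            col.names = first, row.names = FALSE)
writeLines(cursor, "cursor.txt")  # save the deep-paging cursor

# on restart, resume from the saved cursor if one exists
if (file.exists("cursor.txt")) cursor <- readLines("cursor.txt")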

legg0028 commented Apr 7, 2017

Looks like the issue is sorted; I ran the code through with no issues.

Thanks for all your help.

@sckott sckott added this to the v0.7 milestone Apr 7, 2017
@sckott sckott closed this as completed Apr 26, 2017