Jump 0 did not finish counting rows exactly where jump 1 found its first good line start #2561

Closed
st-pasha opened this issue Jan 10, 2018 · 0 comments

> fread("~/Downloads/issue682.txt", verbose=T)
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 8 threads (omp_get_max_threads()=8, nth=8)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
  0/1 column will be read as boolean
[02] Opening the file
  Opening file ../datatable/issue682.txt
  File opened, size = 2.028MB (2126138 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  File ends abruptly with 'e'. This should be fine but if a problem does occur, please report that problem as a bug and workaround it by appending a newline to properly end the last record; e.g. 'echo >> ../datatable/issue682.txt'.
  \n has been found in the data so any mixture of line endings is allowed other than \r-only line endings. This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<x0>>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep ...
  Detected 1 columns on line 1. This line is either column names or first data row. Line starts as: <<x0>>
  Quote rule picked = 0
  fill=false and the most number of columns found is 1
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 101 because (2126137 bytes from row 1 to eof) / (2 * 10145 jump0size) == 104
  Type codes (jump 000)    : A  Quote rule 0
  Type codes (jump 100)    : A  Quote rule 0
  'header' determined to be true because all columns are type string and a better guess is not possible
  =====
  Sampled 10166 rows (handled \n inside quoted fields) at 101 jump points
  Bytes from first data row on line 1 to the end of last row: 2126137
  Line length: mean=100.80 sd=100.71 min=2 max=878
  Estimated number of rows: 2126137 / 100.80 = 21093
  Initial alloc = 42186 rows (21093 + 100%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
  =====
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A
[10] Allocate memory for the datatable
  Allocating 1 column slots (1 - 0 dropped) with 42186 rows
[11] Read the data
  jumps=[0..2), chunk_size=1063068, total_size=2126134
Error in fread("~/Downloads/issue682.txt", verbose = T) : 
  Jump 0 did not finish counting rows exactly where jump 1 found its first good line start: prevEnd(0x10af4f902)<<fYlYVw4P03rJSEQCY8NYkAh8di7R9sn6kgrHD8GO>> != thisStart(prevEnd+42)<<kvFLaVmWeVKfnZLJNg32mhl65IiYSD05IHzLy1FANXnDGtuIbm>>

issue682.txt
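
For reference, the row estimate and initial allocation reported in steps [07] and [10] of the log above can be reproduced from the printed figures. This is only a sketch: the inputs are the rounded values shown in the log, the variable names are made up for illustration, and the rounding choices (`ceiling`, `floor`) are assumptions rather than something the log confirms.

```r
# Figures copied from the verbose output (step [07]); the log prints them rounded.
bytes    <- 2126137   # bytes from first data row to the end of the last row
mean_len <- 100.80    # mean sampled line length
sd_len   <- 100.71    # standard deviation of sampled line length
min_len  <- 2         # shortest sampled line

# Estimated row count: total bytes divided by mean line length.
# Rounding up is an assumption; the log only shows the final integer.
estn <- ceiling(bytes / mean_len)

# Generous guess: divide by a pessimistically short line length,
# max(mean - 2*sd, min), as described in the "Initial alloc" line.
raw_alloc <- bytes / max(mean_len - 2 * sd_len, min_len)

# Clamp the guess between 1.1x and 2.0x the estimate.
alloc <- floor(min(max(raw_alloc, 1.1 * estn), 2.0 * estn))

print(estn)
print(alloc)
```

Because the standard deviation is almost equal to the mean, `mean - 2*sd` is negative, so the minimum line length of 2 is used and the allocation ends up at the upper clamp, which matches the "+ 100%" in the log.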
