Jump 0 did not finish counting rows exactly where jump 1 found its first good line start #2561

st-pasha · 2018-01-10T03:18:39Z

> fread("~/Downloads/issue682.txt", verbose=T)
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
  Using 8 threads (omp_get_max_threads()=8, nth=8)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
  0/1 column will be read as boolean
[02] Opening the file
  Opening file ../datatable/issue682.txt
  File opened, size = 2.028MB (2126138 bytes).
  Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
  File ends abruptly with 'e'. This should be fine but if a problem does occur, please report that problem as a bug and workaround it by appending a newline to properly end the last record; e.g. 'echo >> ../datatable/issue682.txt'.
  \n has been found in the data so any mixture of line endings is allowed other than \r-only line endings. This is common and ideal.
[05] Skipping initial rows if needed
  Positioned on line 1 starting: <<x0>>
[06] Detect separator, quoting rule, and ncolumns
  Detecting sep ...
  Detected 1 columns on line 1. This line is either column names or first data row. Line starts as: <<x0>>
  Quote rule picked = 0
  fill=false and the most number of columns found is 1
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 101 because (2126137 bytes from row 1 to eof) / (2 * 10145 jump0size) == 104
  Type codes (jump 000)    : A  Quote rule 0
  Type codes (jump 100)    : A  Quote rule 0
  'header' determined to be true because all columns are type string and a better guess is not possible
  =====
  Sampled 10166 rows (handled \n inside quoted fields) at 101 jump points
  Bytes from first data row on line 1 to the end of last row: 2126137
  Line length: mean=100.80 sd=100.71 min=2 max=878
  Estimated number of rows: 2126137 / 100.80 = 21093
  Initial alloc = 42186 rows (21093 + 100%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
  =====
[08] Assign column names
[09] Apply user overrides on column types
  After 0 type and 0 drop user overrides : A
[10] Allocate memory for the datatable
  Allocating 1 column slots (1 - 0 dropped) with 42186 rows
[11] Read the data
  jumps=[0..2), chunk_size=1063068, total_size=2126134
Error in fread("~/Downloads/issue682.txt", verbose = T) : 
  Jump 0 did not finish counting rows exactly where jump 1 found its first good line start: prevEnd(0x10af4f902)<<fYlYVw4P03rJSEQCY8NYkAh8di7R9sn6kgrHD8GO>> != thisStart(prevEnd+42)<<kvFLaVmWeVKfnZLJNg32mhl65IiYSD05IHzLy1FANXnDGtuIbm>>

Attached file: issue682.txt
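The row estimate and initial allocation in step [07] follow directly from the numbers printed in the log. The Python sketch below reproduces them. Two details are assumptions that match these numbers but are not taken from fread's source: the estimate is rounded up, and the raw allocation is truncated to an integer before clamping. The function name `estimate_alloc` is hypothetical.

```python
import math


def estimate_alloc(bytes_total, mean_len, sd_len, min_len):
    """Hypothetical helper reproducing the step [07] arithmetic from the log.

    Assumptions: estn is rounded up with ceil, and the allocation is
    bytes / max(mean - 2*sd, min), truncated to int, then clamped to
    [1.1*estn, 2.0*estn] as the log line describes.
    """
    estn = math.ceil(bytes_total / mean_len)
    # Pessimistic line length: two standard deviations below the mean,
    # but never shorter than the shortest sampled line.
    pessimistic_len = max(mean_len - 2 * sd_len, min_len)
    raw = int(bytes_total / pessimistic_len)
    lo, hi = int(1.1 * estn), 2 * estn
    alloc = min(max(raw, lo), hi)
    return estn, alloc


estn, alloc = estimate_alloc(2126137, 100.80, 100.71, 2)
print(estn, alloc)  # 21093 42186
```

Because the sampled sd (100.71) is almost as large as the mean (100.80), `mean - 2*sd` is negative, so the 2-byte minimum line length is used. The resulting 1063068 rows exceed the 2.0*estn ceiling, which is why the log reports an initial allocation of "21093 + 100%".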

st-pasha added the bug and fread labels Jan 10, 2018

st-pasha added this to the v1.10.6 milestone Jan 10, 2018

This was referenced Jan 10, 2018:

Master task for fread bugs / proposals #2247 (closed)

Jump 0 did not finish counting rows exactly where jump 1 found its first good line start h2oai/datatable#682 (closed)

mattdowle mentioned this issue Feb 14, 2018

Better jump sync and run-on #2627 (merged)

mattdowle added a commit that referenced this issue Feb 14, 2018: Added test for #2561 (2ca20ae)

mattdowle closed this as completed in #2627 Feb 16, 2018
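The error in this issue is a consistency check between adjacent jumps. Jump 0 reads whole lines until it passes the boundary of its chunk, and the position where it stops (prevEnd) must coincide with the first line start that jump 1 accepted (thisStart). A minimal Python sketch of that invariant follows. All helper names are hypothetical, and `is_good` is a stand-in for whatever test fread applies when deciding that a line start is "good". This is an illustration of the check, not fread's implementation and not the fix made in #2627.

```python
def line_end(buf: bytes, start: int) -> int:
    """Offset just past the line beginning at start (len(buf) at EOF)."""
    nl = buf.find(b"\n", start)
    return len(buf) if nl == -1 else nl + 1


def next_line_start(buf: bytes, pos: int) -> int:
    """First offset >= pos that begins a line (0, or just after a newline)."""
    if pos <= 0:
        return 0
    nl = buf.find(b"\n", pos - 1)
    return len(buf) if nl == -1 else nl + 1


def find_good_start(buf, pos, is_good):
    """Hypothetical: from pos, skip whole lines until one passes is_good."""
    start = next_line_start(buf, pos)
    while start < len(buf) and not is_good(buf[start:line_end(buf, start)]):
        start = line_end(buf, start)
    return start


def read_until(buf, start, boundary):
    """Read whole lines from start; stop at the first line start >= boundary."""
    pos = start
    while pos < len(buf) and pos < boundary:
        pos = line_end(buf, pos)
    return pos


def check_sync(buf, nchunks, is_good):
    """Return (jump, prevEnd, thisStart) for the first mismatch, else None."""
    size = len(buf)
    boundaries = [size * i // nchunks for i in range(nchunks + 1)]
    starts = [0] + [find_good_start(buf, b, is_good) for b in boundaries[1:-1]]
    for i in range(1, nchunks):
        prev_end = read_until(buf, starts[i - 1], boundaries[i])
        if prev_end != starts[i]:
            return (i - 1, prev_end, starts[i])
    return None


data = b"x0\naaaa\nbbbb\ncccc\ndddd\n"
print(check_sync(data, 2, lambda line: True))                      # None
print(check_sync(data, 2, lambda line: not line.startswith(b"c")))  # (0, 13, 18)
```

When every line passes `is_good`, jump 0 stops exactly where jump 1 starts. If jump 1's test rejects the line at the boundary, it skips ahead, so thisStart lands past prevEnd (here prevEnd+5, analogous to the prevEnd+42 in the reported error).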
