Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lists only 16 or less files when a glob string including no globstar. #112

Closed
sttk opened this issue May 22, 2022 · 15 comments · Fixed by #119
Closed

Lists only 16 or less files when a glob string including no globstar. #112

sttk opened this issue May 22, 2022 · 15 comments · Fixed by #119
Assignees
Labels

Comments

@sttk
Copy link
Contributor

sttk commented May 22, 2022

What were you expecting to happen?

Lists more than 16 files in a directory.

What actually happened?

Lists only 16 files with a glob including no globstar.

Please give us a sample

$ npm init -y
$ npm install glob-stream
$ mkdir in out
$ for n in $(seq 100); do echo "hello" > in/test_${n}.txt; done
$ cat <<EOF > test.js
const { pipeline, Transform, Writable } = require('stream');                    
const GlobStream = require('glob-stream');
const fs = require('fs');
const gs = GlobStream('./in/*.txt');
let count = 0;
const ts = Transform({
  objectMode: true,
  transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});
const ws = fs.createWriteStream('./out/out.txt');
pipeline(gs, ts, ws, () => console.log('count =', count));
EOF
$

Terminal output / screenshots

$ node test.js     
count = 16
$ wc -l out/out.txt
      16 out/out.txt

Please provide the following information:

  • OS & version [e.g. MacOS Catalina 10.15.4]: macOS Monterey 12.4
  • node version (run node -v): v16.15.0, v10.23.0
  • npm version (run npm -v): 8.5.5
  • glob-stream version: 7.0.0

Additional information

When replaces the above glob string to './in/**/*.txt', the result was as follows:

$ node test.js     
count = 100
$ wc -l out/out.txt
     100 out/out.txt
@sttk sttk added the bug label May 22, 2022
@phated
Copy link
Member

phated commented May 22, 2022

@sttk if this is the full example, you've forgotten to create a Writeable stream, so it never sinks and you hit the highWaterMark

@phated
Copy link
Member

phated commented May 22, 2022

Nvm, I see the writeable stream is fs

@phated
Copy link
Member

phated commented May 22, 2022

However, I'm very confused because I was 99% sure that you aren't allowed to use an object stream with fs writeStream. It's a text based stream.

@sttk
Copy link
Contributor Author

sttk commented May 22, 2022

@phated This cause is that glob process multiple readdir only when a given glob string include globstar.

https://github.com/isaacs/node-glob/blob/3bfec21dd180ddf4672880176ad33af6296a167e/glob.js#L360

@sttk
Copy link
Contributor Author

sttk commented May 22, 2022

@phated

you aren't allowed to use an object stream with fs writeStream

In this Transform, I changed an object to a string.
In the above sample code, the Transform convert an object to a string.

const ts = Transform({
  objectMode: true,
  transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});

@sttk
Copy link
Contributor Author

sttk commented May 22, 2022

The following is the logs of isGlobStar here by the above sample code when a glob string is './in/*.txt':

$ node test.js
glob isGlobStar false
count = 16
$

And the logs when a glob string is './in/**/*.txt':

$ node test.js
glob isGlobStar true
glob isGlobStar false
glob isGlobStar false
glob isGlobStar true
glob isGlobStar false
...
glob isGlobStar false
glob isGlobStar true
count = 100
$

@sttk
Copy link
Contributor Author

sttk commented May 22, 2022

I noticed this issue when I run the test of vinyl-fs. This test failed.

@sttk
Copy link
Contributor Author

sttk commented May 22, 2022

https://github.com/isaacs/node-glob/blob/3bfec21dd180ddf4672880176ad33af6296a167e/glob.js#L412-L432

To skip this part lists all files even if a glob string includes no globstar.

@sttk
Copy link
Contributor Author

sttk commented May 28, 2022

The following code which uses GlobStream in readable.js directly works well.

$ cat <<EOF > test2.js
const { pipeline, Transform } = require('stream');
const fs = require('fs');
const GlobStream = require('glob-stream/readable');
const gs = GlobStream('./in/*.txt', []);
let count = 0;
const ts = Transform({
  objectMode: true,
  transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});
const ws = fs.createWriteStream('./out/out.txt');
pipeline(gs, ts, ws, () => console.log('count =', count));
EOF
$ node test2.js
count = 100

The following code which also uses pumpify and unique-stream like index.js works well.

$ cat <<EOF > test3.js
const { pipeline, Transform } = require('stream');
const fs = require('fs');
const pumpify = require('pumpify');
const GlobStream = require('glob-stream/readable');
const UniqueStream = require('unique-stream');
const gs = GlobStream('./in/*.txt', []);
const uniq = UniqueStream();
const pump = pumpify.obj(gs, uniq);
let count = 0;
const ts = Transform({
  objectMode: true,
  transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});
const ws = fs.createWriteStream('./out/out.txt');
pipeline(pump, ts, ws, () => console.log('count =', count));
EOF
$ node test3.js
count = 100

And the following code which also uses ordered-read-stream like index.js doesn't work well.

$ cat <<EOF > test4.js
const { pipeline, Transform } = require('stream');                              
const fs = require('fs');
const pumpify = require('pumpify');
const GlobStream = require('glob-stream/readable');
const UniqueStream = require('unique-stream');
const Combine = require('ordered-read-streams');
const gs = GlobStream('./in/*.txt', []);
const agg = new Combine(gs);
const uniq = UniqueStream();
const pump = pumpify.obj(agg, uniq);
let count = 0;
const ts = Transform({
  objectMode: true,
  transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});
const ws = fs.createWriteStream('./out/out.txt');
pipeline(pump, ts, ws, () => console.log('count =', count));
EOF
$ node test4.js 
count = 16

@sttk
Copy link
Contributor Author

sttk commented May 28, 2022

@phated I suspect that this issue is due to ordered-read-streams from the above results.

@sttk
Copy link
Contributor Author

sttk commented Aug 21, 2022

This issue is not due to ordered-read-streams.
The cause is seemed that glob-stream/readable uses highWaterMark as limit of read count.

$ cat <<EOF > test3.js
const fs = require('fs');
const { Readable, pipeline } = require('stream');
const Combine = require('ordered-read-streams');
let cnt = 1;
let max = 100;
let read_count = 0;
const rs = new Readable({
  highWaterMark: 20,
  objectMode: true,
  read(size) {
    read_count++;
    for (i = 0; i < size; i++, cnt++) {
      if (cnt > max) {
        this.push(null);
        break;
      }
      this.push(`${cnt}\n`);
    }
  },
});
const cs = new Combine([rs]);
let drain_count = 0;
const ws = fs.createWriteStream('out.txt', { highWaterMark: 10 });
ws.on('drain', () => {
  drain_count++;
});
ws.on('finish', () => {
  console.log(`read_count = ${read_count}`);
  console.log(`drain_count = ${drain_count}`);
});
pipeline(cs, ws); 
EOF
$ node test5.js 
read_count = 6
drain_count = 24
$ wc -l out.txt
     100 out.txt
$

@phated
Copy link
Member

phated commented Aug 22, 2022

The cause is seemed that glob-stream/readable uses highWaterMark as limit of read count.

That means you aren't draining the streams correctly.

@btakita
Copy link

btakita commented Aug 24, 2022

I'm having a similar issue when trying to create a simple writable stream with a listener to the 'data' event. Only 16 results are read before the process completes. Could you provide a simple example where a large directory is globbed, console.info is called for each data event while listening to all of the files that should be returned by the glob?

Here is an example where only the first 16 data events are called.

// 16 results
glob_stream(join(process.cwd(), '*'))
  .on('data', data=>console.info(data))
  .on('end', ()=>console.info('done'))

// 16 results
glob_stream(join(process.cwd(), '*')).pipe(new Writable({
  objectMode: true,
  write(chunk, _encoding, next) {
    console.debug(chunk)
    next()
  }
}))

I also tried using the mississippi library with .pipe, as you do in your tests & cannot figure out how to have the entire glob complete.

Even piping to process.stdout throws an error since this library streams objects. This seems very basic, but I can't find any examples of consuming streaming objects properly for any library. Hopefully this thread to serve posterity by resulting with a simple example of properly consuming an object stream.

// TypeError [ERR_INVALID_ARG_TYPE]: The "chunk" argument must be of type string or an instance of Buffer or Uint8Array. Received an instance of Object
glob_stream(join(dir_path, '*')).pipe(process.stdout)

@btakita

This comment was marked as off-topic.

@phated
Copy link
Member

phated commented Jan 4, 2023

I've added 2 tests for this at 790c0cb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants