Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web: support other encodings than UTF-8 #79275

Closed
bpasero opened this issue Aug 16, 2019 · 40 comments
Closed

Web: support other encodings than UTF-8 #79275

bpasero opened this issue Aug 16, 2019 · 40 comments
Assignees
Labels
feature-request Request for new features or functionality file-encoding File encoding type issues insiders-released Patch has been released in VS Code Insiders on-release-notes Issue/pull request mentioned in release notes on-testplan web Issues related to running VSCode in the web
Milestone

Comments

@bpasero
Copy link
Member

bpasero commented Aug 16, 2019

Ideas: #79232

@bpasero bpasero added feature-request Request for new features or functionality web Issues related to running VSCode in the web labels Aug 16, 2019
@LeuisKen
Copy link
Contributor

LeuisKen commented Sep 5, 2019

Is there any plan for this issue ?

@bpasero
Copy link
Member Author

bpasero commented Apr 11, 2020

Webpack seems possible for iconv-lite but not without modifications because it seems they do not want to bundle the stream support into the browser (related ashtuchkin/iconv-lite#204). We would need:

  • to fork iconv-lite and make sufficient changes to include the streams support
  • export all streams that we actually use from node.js or rewrite our code to be node.js stream agnostic
  • possibly reduce the size of iconv-lite to only include the encodings we support (I measured ~1MB library size after webpack)

I used the following webpack.config.js:

const path = require('path');

module.exports = {
    entry: 'iconv-lite',
    devtool: 'inline-source-map',
    mode: 'production',
    output: {
        path: path.resolve(__dirname, 'lib'),
        filename: 'iconv-lite-umd.js',
        libraryTarget: 'umd',
        globalObject: 'typeof self !== \'undefined\' ? self : this'
    }
};

@ashtuchkin
Copy link

Hey, iconv-lite author here. The issue you mentioned should be fixed now (needs some more checking in other environments). As an additional benefit, the Streaming API should work in the browser if 'stream' module is included (or if enabled explicitly). Let me know if you have any questions about enabling it.

I'm trying my best to keep the size to a minimum; in my measurements, with streams, iconv-lite takes ~320Kb minified by webpack (I just used webpack cli with default params). I guess 1Mb is the source map stuff. Would that be small enough for your purposes?

@bpasero
Copy link
Member Author

bpasero commented May 18, 2020

@ashtuchkin thanks a lot for commenting here, I am planning to pick this up soon (coming months) and will report back to you how it goes 👍

@bpasero bpasero self-assigned this May 18, 2020
@bpasero bpasero modified the milestones: Backlog, On Deck May 18, 2020
@ashtuchkin
Copy link

Awesome, looking forward to it!

@gyzerok
Copy link
Contributor

gyzerok commented May 25, 2020

Hello @bpasero!

I am in desperate need of this feature, so I was thinking maybe it is a good opportunity for me to contribute. However currently I have zero knowledge about vscode codebase, so I decided it'd be good to verify it with you before diving in.

Would you be open to this being an outside contribution? Do you think adding this feature is doable for the first time contributor? Would it be possible for you to provide me some guidance as which files/places in the project this relates to?

@bpasero
Copy link
Member Author

bpasero commented May 25, 2020

Thanks for the offer, but not having fully understood yet how to approach this from the VSCode side makes this a bit harder for me to ask for help. I think there are a few steps towards the solution that maybe can be decoupled and worked on individually. I think we need a way to consume iconv-lite in UMD form to make it work both in browser and desktop similar to what we did with https://github.com/microsoft/semver-umd. E.g. have a NPM module we can consume that is the webpacked version of iconv-lite.

One thing not clear to me yet is how to get node.js streams into VSCode web, because we heavily depend on node.js streams for all encoding related things. I am not sold in having to use node.js streams so maybe an alternate solution is to convince iconv-lite to have a very simple API that different stream solutions can be plugged into?

The code for encoding handling lives in nativeTextFileService which only exists in desktop for now.

@ashtuchkin
Copy link

iconv-lite fully supports webpack (especially the 0.6 version I'm working on right now), so UMD package similar to semver-umd should work great. Before I release 0.6, please use master branch for now if you want to check it out.

As for the streams - iconv-lite internally uses a simplified stream interface for all codecs that is also compatible with Node's StringDecoder class (but doesn't depend on it). Basically the interface is the following:

interface Decoder {
  write(buf: Buffer): string;
  end(): string | undefined;
}

interface Encoder {
  write(str: string): Buffer;
  end(): Buffer | undefined;
}

// usage:
iconv.getDecoder(encoding): Decoder
iconv.getEncoder(encoding): Encoder

I'm wondering if that would help?

@gyzerok
Copy link
Contributor

gyzerok commented May 26, 2020

Awesome, thank you!

Looks like I have all the puzzle pieces. Will take a deeper look during the weekend. An opportunity to contribute to VSCode (which I have been happily using for 4 years) is very exciting :)

@bpasero
Copy link
Member Author

bpasero commented May 26, 2020

As for the streams - iconv-lite internally uses a simplified stream interface

@ashtuchkin does that mean I can pass in "something" that matches that interface and it will work? I would need this to be exposed from the TypeScript types of iconv-lite to work properly. I think today the type is still a node.js one:

https://github.com/ashtuchkin/iconv-lite/blob/de9c10bc9c424f49f451f34a9b09408f40b0aba7/lib/index.d.ts#L15-L17

If that was changed, maybe VSCode could use its own stream helpers and not the node.js ones.

@ashtuchkin
Copy link

ashtuchkin commented May 26, 2020

I can add these methods (getEncoder and getDecoder) to typescript definition, currently they are not there. I was treating them as non-public for the time being as they are not usually useful for Node.js. I did get requests to make them public before, so it probably makes sense to just do it.

The link you gave is for decodeStream/encodeStream - these are for Node.js streams, you're right. These are not what I was talking about.

does that mean I can pass in "something" that matches that interface and it will work?

It's the other way around: iconv-lite itself provides encoder/decoder objects with this interface and you can wrap it with your stream implementation/helpers to do the conversion.

@ashtuchkin
Copy link

As an example, I'm wrapping these Encoder/Decoder stream objects with Node.js Transformer class to create Node.js-compatible transformer streams here https://github.com/ashtuchkin/iconv-lite/blob/master/lib/streams.js#L27 (this.conv is the Encoder)

@ashtuchkin
Copy link

ashtuchkin commented May 26, 2020

Added methods to .d.ts in the latest master commit (ashtuchkin/iconv-lite@4c3eff9).

@gyzerok
Copy link
Contributor

gyzerok commented May 28, 2020

Hello @bpasero!

I've taken a bit closer look into the source code and am thinking that I would really like to try to contribute this feature. However I totally understand that it is not something small and you might want to do it internally inside VSCode team.

So I would kindly ask you to give me a chance here and champion me if it is at all possible. From my side I will try to split the work into smaller chunks to make it easier for you to review it and give your feedback. If at any point in time you will think that this collaboration isn't working for any reason, I'll step down.

Here is my initial plan of how to complete the task:

  1. Combine StringDecoder-like interface from iconv-lite with VSCode steam to decouple nativeTextFileService and encoding from node streams. This way I can better understand the whole process and test the whole story working on the code that should actually work already.
  2. Create UMD build of iconv-lite.
  3. Move encoding module from node to common.
  4. Reuse my knowledge and possibly code from 1. to make multiple encodings work on the web the same way it works on node.
  5. Possibly write some tests or migrate tests from node side (if any) to be used cross-platform.

How does this sound to you?

@bpasero
Copy link
Member Author

bpasero commented May 28, 2020

Yeah that sounds good to me. The first step of removing node.js streams for all encoding matters and using our own streams VSCode provides that are web supported is a good starting point.

Btw, we use another library jschardet to support "guessing" of encodings. You will probably see that during the process. I am not sure if that one can run in the web as well, I have not looked into it yet. But luckily the use of this library is limited to very few places so the adoption should be much easier compared to iconv-lite.

Timing wise we are entering endgame next week, I will probably not have a lot of time to look into this until the week after next week, but feel free to create a PR for the first step. I even think the second step can be done too. For that purpose though I would go and create a (Microsoft owned) GH repository to host the UMD version of iconv-lite. Does that make sense that MS publishes this module even though you contribute to it?

@gyzerok
Copy link
Contributor

gyzerok commented May 28, 2020

Timing wise we are entering endgame next week, I will probably not have a lot of time to look into this until the week after next week, but feel free to create a PR for the first step. I even think the second step can be done too.

Good luck with upcoming release then. Seeing this small dot above the gear always makes me happy! Let's see how far I can get with the first step by the time you are done with Thanos :)

Btw, we use another library jschardet to support "guessing" of encodings. You will probably see that during the process. I am not sure if that one can run in the web as well, I have not looked into it yet.

Hopefully it won't be a big deal. Will take a closer look during the process.

For that purpose though I would go and create a (Microsoft owned) GH repository to host the UMD version of iconv-lite. Does that make sense that MS publishes this module even though you contribute to it?

Sure, it absolutely makes sense.

@gyzerok
Copy link
Contributor

gyzerok commented May 28, 2020

@ashtuchkin do you think you can make exposing getEncoder/getDecoder types a patch release for 0.5? This will allow the first part of my work to be independent from 0.6 release timeline.

@bpasero
Copy link
Member Author

bpasero commented Jun 18, 2020

@gyzerok yeah makes sense, how would VSCode access this buffer shim? I think iconv-lite-umd should possibly export it as from its index.d.ts?

I want to be careful to only really use it for iconv-lite and not spread it around anywhere else in code, so it can only be used in encoding.ts ideally.

@bpasero
Copy link
Member Author

bpasero commented Jun 18, 2020

@gyzerok I think the best is to expose methods from the UMD Module that use Uint8Array and internally forward to iconv-lite as shimmed Buffer

@gyzerok
Copy link
Contributor

gyzerok commented Jun 18, 2020

@bpasero

yeah makes sense, how would VSCode access this buffer shim? I think iconv-lite-umd should possibly export it as from its index.d.ts?

Not sure what you are asking here.

I want to be careful to only really use it for iconv-lite and not spread it around anywhere else in code, so it can only be used in encoding.ts ideally.

From types perspective we do not expose any details of the fact that any shim is used, because only VSBuffer goes outside of encoding.ts.

@gyzerok
Copy link
Contributor

gyzerok commented Jun 18, 2020

@bpasero both Node.js Buffer and shim buffer extend Uint8Array. And VSBuffer internally refers to the thing it wraps as Uint8Array.

So even if some code will use vsBuffer.buffer it will think of it as Uint8Array which is true.

@ashtuchkin
Copy link

+1. Also for decoding, I'm 99% sure you can pass Uint8Array instead of Buffer and it'll work fine.

@bpasero
Copy link
Member Author

bpasero commented Jun 18, 2020

@gyzerok yeah anything really works for me for as long as it stays isolated to encoding.ts, please go ahead as you see fit and I can give feedback from a PR 👍 . I already commented in iconv-lite repo that you can feel free to use VSBuffer helpers if it helps. VSBuffer is our way of supporting both Buffer if available or fallback to Uint8Array.

bpasero added a commit that referenced this issue Jun 18, 2020
Co-authored-by: Benjamin Pasero <benjpas@microsoft.com>
bpasero added a commit that referenced this issue Jun 20, 2020
* move encoding.ts to common for #79275

* load iconv-lite-umd and jschardet in web version for #79275

* move EncodingOracle to the AbstractTextFileService for #79275

* review

* update to new iconv-lite-umd

* add workaround for jschardet types

* fix indentation

* add iconv-lite-umd and jschardet to workbench.html

* fix paths for modules

Co-authored-by: Benjamin Pasero <benjpas@microsoft.com>
@gyzerok
Copy link
Contributor

gyzerok commented Jun 22, 2020

@ashtuchkin while working further on the implementation I've discovered that there is one case, where Uint8Array is not accepted in browser environment. Here is the code which is responsible: https://github.com/ashtuchkin/iconv-lite/blob/v0.6.0/encodings/internal.js#L48-L59

Internally StringDecoder calls Buffer.toString(encoding) which doesn't work for Uint8Array. Simple test case:

const iconv = require('iconv-lite-umd');
iconv.decode(Uint8Array.from([...iconv.encode('Lorem ipsum', 'utf16le')]), 'utf16le');

The easy way to fix it would be to change InternalDecoder to be as follows:

function InternalDecoder(options, codec) {
    this.decoder = new StringDecoder(codec.enc);
}

InternalDecoder.prototype.write = function(buf) {
    if (Buffer.isBuffer(buf)) {
        return this.decoder.write(buf);
    }

    return this.decoder.write(Buffer.from(buf));
}

InternalDecoder.prototype.end = function() {
    return this.decoder.end();
}

Do you think this is an acceptable change for iconv-lite?

@bpasero bpasero modified the milestones: June 2020, July 2020 Jun 30, 2020
bpasero added a commit that referenced this issue Jul 3, 2020
…ce for #79275 (#100804)

* move encoding logic from NativeTextFileService to AbstractTextFileService for #79275

* some cleanup things - just cosmetic

* fix tests

* review

* use correct comparison

Co-authored-by: Benjamin Pasero <benjpas@microsoft.com>
@bpasero bpasero closed this as completed in 4e4941b Jul 4, 2020
@bpasero
Copy link
Member Author

bpasero commented Jul 4, 2020

Thanks @gyzerok for driving this from the VSCode side and @ashtuchkin for the iconv-lite side 👍

@gyzerok
Copy link
Contributor

gyzerok commented Jul 5, 2020

I would like to also thank everyone envolved:

  • @LeuisKen for reporting the original issue and inspiring me to give it a shot with his own contributions
  • @bpasero for lying down initial plans and helping me to get this done
  • @ashtuchkin for being so responsive on iconv-lite side and giving amazing advice on bundle size reduction

Love the open-source community! ❤️

gjsjohnmurray pushed a commit to gjsjohnmurray/vscode that referenced this issue Jul 6, 2020
…ce for microsoft#79275 (microsoft#100804)

* move encoding logic from NativeTextFileService to AbstractTextFileService for microsoft#79275

* some cleanup things - just cosmetic

* fix tests

* review

* use correct comparison

Co-authored-by: Benjamin Pasero <benjpas@microsoft.com>
gjsjohnmurray pushed a commit to gjsjohnmurray/vscode that referenced this issue Jul 6, 2020
gjsjohnmurray pushed a commit to gjsjohnmurray/vscode that referenced this issue Jul 6, 2020
* fix microsoft#79275 enable encodings for web

* fix one more place and prettify object

* 💄

Co-authored-by: Benjamin Pasero <benjpas@microsoft.com>
@bpasero bpasero added on-testplan on-release-notes Issue/pull request mentioned in release notes labels Jul 15, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Aug 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature-request Request for new features or functionality file-encoding File encoding type issues insiders-released Patch has been released in VS Code Insiders on-release-notes Issue/pull request mentioned in release notes on-testplan web Issues related to running VSCode in the web
Projects
None yet
Development

No branches or pull requests

5 participants
@ashtuchkin @bpasero @gyzerok @LeuisKen and others