Alpha.10 and beta.15 #275

Merged
merged 16 commits into from
Nov 18, 2024
12 changes: 11 additions & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 3.0.0-beta.14
current_version = 3.0.0-beta.15
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\-(?P<release>\w+).(?P<num>\d+)
@@ -37,3 +37,13 @@ replace = "version": "{new_version}"
[bumpversion:file:tree-sitter-usfm3/package-lock.json]
search = "version": "{current_version}"
replace = "version": "{new_version}"

[bumpversion:file:node-usfm-parser/package.json]

[bumpversion:file:web-usfm-parser/package.json]
search = "version": "{current_version}"
replace = "version": "{new_version}"

[bumpversion:file:web-usfm-parser/README.md]
search = npm/usfm-grammar-web@{current_version}
replace = npm/usfm-grammar-web@{new_version}
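To see what the `parse` pattern in this config extracts from a version string, here is a minimal sketch in JavaScript. The pattern is translated from Python's `(?P<name>...)` syntax to JavaScript's `(?<name>...)`; the helper name `parseVersion` is hypothetical and is not part of bumpversion or this repository.

```javascript
// JavaScript translation of the `parse` pattern from .bumpversion.cfg.
// Note: the separator before <num> is an unescaped ".", so it matches
// any single character, exactly as in the original pattern.
const VERSION_PATTERN =
  /(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)-(?<release>\w+).(?<num>\d+)/;

// Hypothetical helper: split a version string into its named parts.
function parseVersion(version) {
  const match = VERSION_PATTERN.exec(version);
  if (match === null) {
    return null; // e.g. a plain "3.0.0" has no release part
  }
  const { major, minor, patch, release, num } = match.groups;
  return {
    major: Number(major),
    minor: Number(minor),
    patch: Number(patch),
    release,
    num: Number(num),
  };
}

console.log(parseVersion("3.0.0-beta.15"));
console.log(parseVersion("3.0.0"));
```

A version without a release suffix does not match the pattern, which is why the sketch returns `null` for it.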
12 changes: 9 additions & 3 deletions .github/workflows/check-on-push.yml
@@ -84,7 +84,7 @@ jobs:
uses: actions/setup-node@v4
with:
node-version: 20
- name: Run tests
- name: Build grammar
run: |
cd tree-sitter-usfm3
npm install --save nan
@@ -93,8 +93,12 @@ jobs:
- name: Install dependencies
run: |
cd node-usfm-parser
sed -i '/"tree-sitter-usfm3":.*/d' package.json
npm install .
npm install ../tree-sitter-usfm3
- name: Run tests
run: |
cd node-usfm-parser
node_modules/mocha/bin/mocha.js --timeout=40000 --grep "Include|Exclude|wild|Compare" --invert

Run-Web-tests:
@@ -107,7 +111,7 @@ jobs:
uses: actions/setup-node@v4
with:
node-version: 20
- name: Run tests
- name: Build grammar
run: |
cd tree-sitter-usfm3
npm install --save nan
@@ -122,7 +126,9 @@ jobs:
cp node_modules/web-tree-sitter/tree-sitter.js src/web-tree-sitter/
cp node_modules/web-tree-sitter/tree-sitter.wasm ./
cp ../tree-sitter-usfm3/tree-sitter-usfm3.wasm ./tree-sitter-usfm.wasm

- name: Run tests
run: |
cd web-usfm-parser
node_modules/mocha/bin/mocha.js --timeout=40000 --grep "Include|Exclude|wild|Compare" --invert
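The `--grep ... --invert` combination in the test steps skips every test whose title matches the pattern and runs the rest. A rough sketch of that selection rule follows; `selectTests` and the test titles are made up for illustration and are not taken from the repository's test suite.

```javascript
// The pattern passed to --grep in the workflow.
const GREP = new RegExp("Include|Exclude|wild|Compare");

// Hypothetical helper mimicking the selection rule: with invert = true,
// a test is kept only when its title does NOT match the pattern.
function selectTests(titles, pattern, invert) {
  return titles.filter((title) => pattern.test(title) !== invert);
}

// Made-up test titles, for illustration only.
const titles = [
  "Basic parsing of a minimal USFM file",
  "Include markers filter on USJ",
  "Exclude markers filter on USJ",
  "Compare generated USX with testsuite samples",
  "Parse wildcard-selected files",
];

console.log(selectTests(titles, GREP, true)); // tests that would still run
console.log(selectTests(titles, GREP, false)); // tests that are skipped
```

The match is case-sensitive, so a title containing "include" in lower case would not be skipped by this pattern.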


48 changes: 48 additions & 0 deletions .github/workflows/npm-publish.yml
@@ -49,3 +49,51 @@ jobs:
env:
NODE_AUTH_TOKEN: ${{secrets.npm_token}}

Publish-node-usfm-grammar:
needs: Test-grammar
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 20.4
registry-url: https://registry.npmjs.org/
- run: |
cd node-usfm-parser
sed -i '/"tree-sitter-usfm3":.*/d' package.json
npm install .
npm run build
npm publish .
env:
NODE_AUTH_TOKEN: ${{secrets.npm_token}}

Publish-web-usfm-grammar:
needs: Test-grammar
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 20.4
registry-url: https://registry.npmjs.org/
- name: Build Grammar
run: |
cd tree-sitter-usfm3
npm install --save nan
npm install --save-dev tree-sitter-cli
./node_modules/.bin/tree-sitter generate
./node_modules/.bin/tree-sitter build --wasm
cp tree-sitter-usfm3.wasm ../web-usfm-parser/tree-sitter-usfm.wasm
- name: Install dependencies
run: |
cd web-usfm-parser/
npm install .
cp ./node_modules/web-tree-sitter/tree-sitter.js src/web-tree-sitter/
cp ./node_modules/web-tree-sitter/tree-sitter.wasm ./
- name: Build and publish
run: |
cd web-usfm-parser/
npm run build
npm publish .
env:
NODE_AUTH_TOKEN: ${{secrets.npm_token}}
27 changes: 26 additions & 1 deletion docs/Dev_notes.md
@@ -49,6 +49,29 @@ pytest -k "not compare_usx_with_testsuite_samples and not testsuite_usx_with_rnc

```

In node module:

```bash
cd node-usfm-parser
npm run test

# to run selectively
node_modules/mocha/bin/mocha.js --timeout 40000 --grep "Compare" --bail
node_modules/mocha/bin/mocha.js --timeout 40000 test/basic.js
```

In web module:

```bash
cd web-usfm-parser
npm run test

# to run selectively
node_modules/mocha/bin/mocha.js --timeout 40000 --grep "Compare" --bail
node_modules/mocha/bin/mocha.js --timeout 40000 test/basic.js
```


## How to build and publish JS web module for local Development

First compile the grammar and get the wasm file
@@ -57,7 +80,7 @@ cd tree-sitter-usfm3
export PATH=$PATH:./node_modules/.bin
tree-sitter generate
tree-sitter build --wasm
cp tree-sitter-usfm.wasm ../web-usfm-parser/
cp tree-sitter-usfm3.wasm ../web-usfm-parser/tree-sitter-usfm.wasm
cd ..
```
After npm install, copy the `tree-sitter.js` file from `node_modules/web-tree-sitter` to the `js-usfm-parser/src/web-tree-sitter` folder to include it in the bundle. Also copy the `tree-sitter.wasm` file to `js-usfm-parser/` to be included in the npm packaging.
@@ -86,6 +109,8 @@ npm install -g verdaccio # need not do again if done once
verdaccio # runs a server at localhost:4873
touch .npmrc
echo "registry=http://localhost:4873 # OR http://0.0.0.0:4873" > .npmrc
rm -r .parcel-cache
npm run build
npm publish .
```

36 changes: 0 additions & 36 deletions docs/react-usage.md

This file was deleted.

130 changes: 116 additions & 14 deletions node-usfm-parser/README.md
@@ -11,19 +11,93 @@ npm install usfm-grammar
```

## Usage
Here's how you can use USFM Grammar in your JavaScript/TypeScript projects:

### Importing, parsing USFM, checking errors

```javascript
const {USFMParser} = require('usfm-grammar');

const USFM = '\\id GEN\n\\c 1\n\\p\n\\v 1 In the begining..\\v 2 some more text'
const usfmParser = new USFMParser(USFM);
console.log(usfmParser.errors)
```

### USJ
Here's how you can use USFM Grammar in your JavaScript projects to work with the JSON format, USJ:

```javascript
const {USFMParser} = require('usfm-grammar');

const USFM = '\\id GEN\n\\c 1\n\\p\n\\v 1 In the begining..\\v 2 some more text'
const USJ = usfmParser.toUSJ()
console.log(USJ);
const usfmParser = new USFMParser(USFM);

const USJ = usfmParser.toUSJ() // USFM to USJ
console.log(JSON.stringify(USJ, null, 2));

const usfmParser2 = new USFMParser(usfmString=null, fromUsj=USJ)
const usfmParser2 = new USFMParser(null, USJ) // USJ to USFM; arguments are positional (usfmString, fromUsj)
const usfmGen = usfmParser2.usfm;
console.log(usfmGen);
```
Working with USJ also gives options such as filtering selected markers to edit the original USFM content. To understand how `excludeMarkers`, `includeMarkers`, `combineTexts` and `Filter` work, refer to the section on [filtering on USJ](#filtering-on-usj).

### USX

To work with the XML format, USX:
```javascript
const {USFMParser} = require('usfm-grammar');
const { DOMImplementation, XMLSerializer } = require('xmldom');

const USFM = '\\id GEN\n\\c 1\n\\p\n\\v 1 In the begining..\\v 2 some more text'
const usfmParser = new USFMParser(USFM);

const usxElem = usfmParser.toUSX() // USFM to USX
const usxSerializer = new XMLSerializer();
const usx = usxSerializer.serializeToString(usxElem);

console.log(usx);

const usfmParser2 = new USFMParser(null, null, usxElem) // USX to USFM; arguments are positional (usfmString, fromUsj, fromUsx)
const usfmGen = usfmParser2.usfm;
console.log(usfmGen);
```

### Autofix and Validation
Experimental Validation and Autofix feature for USFM:
```javascript
const {Validator} = require("usfm-grammar");

const wrongUSFM="\\id GEN\n\\c 1\n\\v 1 test verse"
const checker = new Validator();
const resp = checker.isValidUSFM(wrongUSFM); // true or false
console.log(checker.message) // List of errors if present

const editedUSFM = checker.autoFixUSFM(wrongUSFM);
console.log(checker.message); // Report on autofix attempt

```

Validation of USJ format:
```javascript
const {Validator} = require("usfm-grammar");
const simpleUSJ = {
type: 'USJ',
version: '0.3.0',
content: [
{ type: 'book', marker: 'id', code: 'GEN', content: [] },
{ type: 'chapter', marker: 'c', number: '1', sid: 'GEN 1' },
{ type: 'para', marker: 'p', content: [
{type: 'verse', marker: 'v', number: 1 },
"In the begining..",
{type: 'verse', marker: 'v', number: 2 }
] }
]
}
const checker = new Validator();
console.log(checker.isValidUSJ(simpleUSJ));
console.log(checker.message);
```
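To make the expected shape of a USJ document more concrete, here is a minimal hand-written shape check. It is only an illustration under stated assumptions: `looksLikeUsj` and `checkNode` are hypothetical names, the rules are inferred from the sample object above, and this is not the schema or logic that `Validator.isValidUSJ()` uses.

```javascript
// Hypothetical check for one content item: either plain text or an
// object with a string "type" and, optionally, a "content" array.
function checkNode(node, path, errors) {
  if (typeof node === "string") {
    return; // plain text is a valid content item
  }
  if (node === null || typeof node !== "object" || Array.isArray(node)) {
    errors.push(`${path}: expected a string or an object`);
    return;
  }
  if (typeof node.type !== "string") {
    errors.push(`${path}: missing string property "type"`);
  }
  if (node.content !== undefined) {
    if (!Array.isArray(node.content)) {
      errors.push(`${path}.content: expected an array`);
      return;
    }
    node.content.forEach((child, i) =>
      checkNode(child, `${path}.content[${i}]`, errors));
  }
}

// Hypothetical root check: returns a list of problems, empty if none.
function looksLikeUsj(doc) {
  const errors = [];
  if (doc === null || typeof doc !== "object" || doc.type !== "USJ") {
    errors.push('root: "type" must be "USJ"');
    return errors;
  }
  if (typeof doc.version !== "string") {
    errors.push('root: "version" must be a string');
  }
  if (!Array.isArray(doc.content)) {
    errors.push('root: "content" must be an array');
    return errors;
  }
  doc.content.forEach((child, i) => checkNode(child, `content[${i}]`, errors));
  return errors;
}

const sample = {
  type: "USJ",
  version: "0.3.0",
  content: [
    { type: "book", marker: "id", code: "GEN", content: [] },
    { type: "para", marker: "p", content: ["Some text"] },
  ],
};
console.log(looksLikeUsj(sample)); // list of problems found in the sample
console.log(looksLikeUsj({ type: "USJ", version: "0.3.0", content: [{ marker: "p" }] }));
```

A real schema-based validator checks far more than this (allowed types, marker names, attribute formats), so treat the sketch as a reading aid only.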

### From ESM Project

When using in an ES module, if `import {USFMParser} from 'usfm-grammar'` doesn't work for you, you could try:
```javascript
@@ -33,25 +107,53 @@ const {USFMParser} = pkg;
...
```

## API Documentation
### Filtering on USJ
Filtering on USJ, the JSON output, is a feature that allows data extraction, markup cleaning, etc. The arguments `excludeMarkers` and `includeMarkers` of the method `USFMParser.toUSJ()` make this possible. `USFMParser.toList()` also accepts these inputs and performs similar operations. CLI versions of these arguments replicate the filtering feature there.

- *excludeMarkers*

The first input parameter to `toUSJ()` and `toList()` of the `USFMParser` class. Defaults to `null`. When provided, all markers except those listed are included in the output.

- *includeMarkers*

The second input parameter to `toUSJ()` and `toList()` of the `USFMParser` class. Defaults to `null`. When provided, only the markers listed are included in the output. `includeMarkers` is applied before `excludeMarkers`.


### `USFMParser.toUSJ(): Object`
Converts a USFM string to a USJ object.
- *combineTexts*

- `usfmString`: The input USFM string.
The fourth input parameter to `toUSJ()` and `toList()` of the `USFMParser` class. Defaults to `true`. After filtering out markers like paragraphs and characters, we are left with the texts from within them, unless 'text-in-excluded-parent' is also excluded. These text snippets may come as separate components in the contents list. When this option is `true`, consecutive text snippets are concatenated. The concatenation is done in a punctuation- and space-aware manner. Users who need more control over space handling, or who for any other reason prefer the text snippets as separate components in the output, can set this to `false`.

Returns: A JSON-like object representing the USJ.
- *usfm_grammar.Filter*

### `USFMParser.usjToUsfm(usjObject: Object): string`
Converts a USJ object to a USFM string.
This class provides a set of enums that can be passed as the `excludeMarkers` and `includeMarkers` inputs, so that users need not list individual markers. The class has the following options:
```
BOOK_HEADERS : identification and introduction markers
TITLES : section headings and associated markers
COMMENTS : comment markers like \rem
PARAGRAPHS : paragraph markers like \p, poetry markers, list table markers
CHARACTERS : all character level markups like \em, \w, \wj etc and their nested versions with +
NOTES : foot note, cross-reference and their content markers
STUDY_BIBLE : \esb and \cat
BCV : \id, \c and \v
TEXT : 'text-in-excluded-parent'
```
To inspect which markers are in each of these options, simply print it out, e.g. `console.log(Filter.TITLES)`. These can be used individually or concatenated to get the desired filtering of markers and data:
```javascript
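// Arguments are positional: toUSJ(excludeMarkers, includeMarkers, ...)
// Assuming each Filter member is an array of marker names, combine them with spread
let output = usfmParser.toUSJ(null, Filter.BCV)
output = usfmParser.toUSJ(null, [...Filter.BCV, ...Filter.TEXT])
output = usfmParser.toUSJ([...Filter.PARAGRAPHS, ...Filter.CHARACTERS])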
```
- Inner contents of excluded markers

- `usjObject`: The input USJ object.
For markers like `\p`, `\q`, etc., excluding them only removes them from the hierarchy and retains the inner contents, such as `\v` and text, that come inside them. But for certain other markers like `\f`, `\x`, `\esb`, etc., excluding them also excludes their inner contents. Following is the set of all markers whose inner contents are discarded if they are mentioned in `excludeMarkers` or not included in `includeMarkers`.
```
BOOK_HEADERS, TITLES, COMMENTS, NOTES, STUDY_BIBLE
```
:warning: Generally, it is recommended NOT to use both `excludeMarkers` and `includeMarkers` together, as it could lead to unexpected behaviours and data loss. For instance, if `includeMarkers` has `\fk` and `excludeMarkers` has `\f`, the output will not contain `\fk`, as all inner contents of `\f` will be discarded.
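The difference between the two kinds of excluded markers can be sketched on a small USJ-like content list. This is an illustrative model only: `excludeFromContent` and `DROP_WITH_CONTENTS` are hypothetical names, the marker set is a small subset chosen for the example, and the library's actual filtering code may work differently.

```javascript
// Illustrative subset of markers whose inner contents are discarded
// together with the marker when it is excluded.
const DROP_WITH_CONTENTS = new Set(["f", "x", "esb"]);

// Hypothetical sketch of the exclusion rule; not the library's code.
function excludeFromContent(content, excluded) {
  const out = [];
  for (const item of content) {
    if (typeof item === "string") {
      out.push(item); // text is kept as-is in this sketch
      continue;
    }
    const hasContent = Array.isArray(item.content);
    const inner = hasContent ? excludeFromContent(item.content, excluded) : [];
    if (!excluded.includes(item.marker)) {
      // Marker is kept: copy it with its filtered contents.
      const copy = { ...item };
      if (hasContent) copy.content = inner;
      out.push(copy);
    } else if (!DROP_WITH_CONTENTS.has(item.marker)) {
      // Excluded container such as \p: drop the marker, keep what was inside.
      out.push(...inner);
    }
    // Excluded note-like marker such as \f: dropped along with its contents.
  }
  return out;
}

const content = [
  { type: "para", marker: "p", content: [
    { type: "verse", marker: "v", number: "1" },
    "In the beginning",
    { type: "note", marker: "f", content: [
      { type: "char", marker: "fk", content: ["beginning"] },
      "a note",
    ] },
  ] },
];

// \p is flattened; \f disappears together with the \fk inside it.
console.log(JSON.stringify(excludeFromContent(content, ["p", "f"]), null, 2));
// Excluding only \p keeps the footnote and its contents.
console.log(JSON.stringify(excludeFromContent(content, ["p"]), null, 2));
```

The first call also shows the data-loss case from the warning: once `\f` is excluded, nothing nested inside it, including `\fk`, can reach the output.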

Returns: The converted USFM string.

## Contributing
Contributions are welcome! If you find any issues or have suggestions for improvements, feel free to open an issue or create a pull request on [GitHub](https://github.com/your-username/usfm-grammar).
Contributions are welcome! If you find any issues or have suggestions for improvements, feel free to open an issue or create a pull request on [GitHub](https://github.com/Bridgeconn/usfm-grammar).

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
8 changes: 4 additions & 4 deletions node-usfm-parser/package.json
@@ -1,7 +1,7 @@
{
"name": "usfm-grammar",
"version": "3.0.0-alpha.9",
"description": "Parser using tree-sitter-usfm3, to convert usfm to usj format.",
"version": "3.0.0-beta.15",
"description": "Uses the tree-sitter-usfm3 parser to convert USFM files to other formats such as USJ, USX, and CSV, and converts them back to USFM",
"main": "./dist/cjs/index.cjs",
"module": "./dist/es/index.mjs",
"scripts": {
@@ -10,7 +10,7 @@
},
"repository": {
"type": "git",
"url": "https://github.com/Bridgeconn/usfm-grammar/js-usfm-parser"
"url": "https://github.com/Bridgeconn/usfm-grammar"
},
"keywords": [
"USFM",
@@ -28,7 +28,7 @@
"dependencies": {
"ajv": "^8.17.1",
"tree-sitter": "0.21.1",
"tree-sitter-usfm3": "3.0.0-beta.9",
"tree-sitter-usfm3": "3.0.0-beta.15",
"xmldom": "^0.6.0",
"xpath": "^0.0.34"
},
2 changes: 1 addition & 1 deletion node-usfm-parser/src/usfmParser.js
@@ -173,7 +173,7 @@ Only one of USFM, USJ or USX is supported in one object.`)
if (!ignoreErrors && this.errors.length > 0) {
let errorString = this.errors.join("\n\t");
throw new Error(
`Errors present:\n\t${errorString}\nUse ignoreErrors = true to generate output despite errors.`,
`Errors present:\n\t${errorString}\nUse ignoreErrors = true, as third parameter of toUSJ(), to generate output despite errors.`,
);
}
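The guard in this hunk can be tried out in isolation with a stand-alone sketch. The function name `ensureNoErrors` is hypothetical; in the module the check lives inside the parser and reads `this.errors`, while here the error list is passed in as an argument.

```javascript
// Hypothetical stand-alone version of the guard shown in the diff.
function ensureNoErrors(errors, ignoreErrors = false) {
  if (!ignoreErrors && errors.length > 0) {
    const errorString = errors.join("\n\t");
    throw new Error(
      `Errors present:\n\t${errorString}\nUse ignoreErrors = true, as third parameter of toUSJ(), to generate output despite errors.`,
    );
  }
}

// Made-up error message, for illustration only.
const parseErrors = ["At 2:0: unexpected marker"];

try {
  ensureNoErrors(parseErrors); // throws: errors present and not ignored
} catch (err) {
  console.log(err.message);
}

ensureNoErrors(parseErrors, true); // does not throw: errors are ignored
ensureNoErrors([]); // does not throw: nothing to report
```

The updated message matters because `ignoreErrors` is positional: callers have to pass placeholder values for the first two parameters of `toUSJ()` before it.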
