feat(import): multiple docx output files

adobe · Jul 6, 2022 · fa4610b
1 parent 780d0a3
commit fa4610b

Showing 10 changed files with 2,246 additions and 2,243 deletions.
diff --git a/README.md b/README.md
@@ -20,9 +20,12 @@ In the `URL(s)` field, give a list of page URLs to be imported (e.g. {https://ww
 
 ### Transformation file
 
-A default html to Markdown is applied by you can / need to provide your own. Initially the import transformation file is fetched at http://localhost:3001/tools/importer/import.js (can be changed in the options). Create the file using the following template:
+A default HTML to Markdown transformation is applied, but you can / need to provide your own. Initially the import transformation file is fetched at http://localhost:3001/tools/importer/import.js (can be changed in the options). Create the file using one of the following templates:
 
-https://gist.github.com/kptdobe/8a726387ecca80dde2081b17b3e913f7
+- if you need to create a single md/docx file from each input page, you can use this template: https://gist.github.com/kptdobe/8a726387ecca80dde2081b17b3e913f7
+- if you need to create multiple md/docx files from each input page, you must use this template: https://gist.github.com/kptdobe/7bf50b69194884171b12874fc5c74588
+
+Note that in their current state, the 2 templates do exactly the same thing, but the second one uses the `transform` method, whose returned array can contain more than one element. See the guidelines for an example.
 
 ### Guidelines
 

diff --git a/css/import/import.css b/css/import/import.css
@@ -160,3 +160,11 @@
 .import #import-markdown-preview td {
   padding: 0 6px;
 }
+
+.import #import-file-picker-container {
+  width: 100%;
+}
+
+.import #import-file-picker-container sp-picker {
+  width: 100%;
+}
diff --git a/import.html b/import.html
@@ -80,6 +80,7 @@ <h3>Page preview</h3>
                             <sp-tab label="Preview" value="import-preview"></sp-tab>
                             <sp-tab label="Markdown" value="import-markdown"></sp-tab>
                             <sp-tab label="HTML" value="import-html"></sp-tab>
+                            <div id="import-file-picker-container"></div>
                             <sp-tab-panel value="import-preview">
                                 <sp-theme color="light" scale="medium">
                                     <div id="import-markdown-preview"></div>

diff --git a/importer-guidelines.md b/importer-guidelines.md
@@ -16,15 +16,33 @@ Out of the box, the importer should be able to consume any page and output a Mar
 
 Such a rule is very straight forward to implement: it is usually a set of DOM operations: create new, move or delete DOM elements.
 
-In your `import.js` transformation file, you can implement 2 methods:
+In your `import.js` transformation file, you can implement 2 modes: 
+- one input / one output
+- one input / multiple outputs
 
-- `transformDOM: ({ document, url, html }) => {}`: implement here your transformation rules and return the DOM element that needs to be transformed to Markdown (default is `document.body` but usually a `main` element is more relevant).
+#### one input / one output
+
+You must implement those 2 methods:
+
+- `transformDOM: ({ document, url, html, params }) => {}`: implement here your transformation rules and return the DOM element that needs to be transformed to Markdown (default is `document.body` but usually a `main` element is more relevant).
   - `document`: the incoming DOM
   - `url`: the current URL being imported
   - `html`: the original HTML source (when loading the DOM as a document, some things are cleaned up, having the raw original HTML is sometimes useful)
-- `generateDocumentPath: ({ document, url }) => {}`: return a path that describes the document being transformed - allows you to define / filter the page name and the folder structure in which the document should be stored (default is the current url pathname with the trailing slash and the `.html`)
+  - `params`: parameters provided by the importer. The only one so far is `originalURL`, the URL of the page being imported (`url` is the proxied URL)
+- `generateDocumentPath: ({ document, url, html, params }) => {}`: return a path that describes the document being transformed - allows you to define / filter the page name and the folder structure in which the document should be stored (default is the current url pathname with the trailing slash and the `.html`). The parameters are the same as above.
+
+This is the simpler version of the implementation. You can achieve the same by implementing the `transform` method as described below.
+
+#### one input / multiple outputs
+
+You must implement this method:
+- `transform: ({ document, url, html, params }) => {}`: implement here your transformation rules and return an array of `{ element, path }` pairs, where `element` is a DOM element that needs to be transformed to Markdown and `path` is the path to the exported file.
   - `document`: the incoming DOM
   - `url`: the current URL being imported
+  - `html`: the original HTML source (when loading the DOM as a document, some things are cleaned up, having the raw original HTML is sometimes useful)
+  - `params`: parameters provided by the importer. The only one so far is `originalURL`, the URL of the page being imported (`url` is the proxied URL)
+
+The idea is simple: return a list of elements that will be converted to docx and stored at the path location.
 
 ## Rule examples
 
@@ -241,6 +259,68 @@ Output is then:
 # Hello World
 ![](https://www.sample.com/images/helloworld.png);
 ```
+
+### Multiple outputs
+
+If you need to transform one page into multiple Word documents (fragments, banners, author pages...), you can use the `transform` method.
+
+Input DOM:
+
+```html
+<html>
+  <head></head>
+  <body>
+    <main>
+      <h1>Hello World</h1>
+      <div class="hero" style="background-image: url(https://www.sample.com/images/helloworld.png);"></div>
+    </main>
+  </body>
+</html>
+```
+
+With the following `import.js`, you will get 2 md / docx documents:
+
+```js
+{
+  transform: ({ document, params }) => {
+    const main = document.querySelector('main');
+    // keep a reference to the image
+    const image = main.querySelector('.hero');
+
+    // remove the image from main, otherwise we'll get it in both documents
+    WebImporter.DOMUtils.remove(main, [
+      '.hero',
+    ]);
+
+    return [{
+      element: main,
+      path: '/main',
+    }, {
+      element: image,
+      path: '/image',
+    }];
+  },
+}
+```
+
+Outputs are:
+
+`/main.md`
+
+```md
+# Hello World
+```
+
+`/image.md`
+
+```md
+![](https://www.sample.com/images/helloworld.png);
+```
+
+Note:
+- be careful with the DOM elements you are working with. You always work on the same document, so removing or changing elements for one output may have an impact on the other outputs.
+- you may have as many outputs as you want (limit not tested yet).
+
 ### More samples
 
 Sites in the https://github.com/hlxsites/ organisation have all be imported. There are many different implementation cover a lot of use cases.
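
The two modes described in the `importer-guidelines.md` changes can be reduced to a single shape: a list of `{ element, path }` pairs. The following is a minimal sketch of that idea, not the importer's actual code. `resolveOutputs` is a hypothetical helper, plain values stand in for DOM elements, and using the URL pathname as the default path is an assumption made for illustration only.

```javascript
// Hypothetical helper (not part of the importer): turn either mode of an
// import.js transformation object into a list of { element, path } pairs.
function resolveOutputs(projectTransform, input) {
  // one input / multiple outputs: `transform` returns the pairs directly
  if (typeof projectTransform.transform === 'function') {
    const out = projectTransform.transform(input);
    return Array.isArray(out) ? out : [out];
  }

  // one input / one output: combine the two single-purpose methods
  const element = typeof projectTransform.transformDOM === 'function'
    ? projectTransform.transformDOM(input)
    : input.document.body; // documented default element
  const path = typeof projectTransform.generateDocumentPath === 'function'
    ? projectTransform.generateDocumentPath(input)
    : new URL(input.url).pathname; // assumed default, for illustration only

  return [{ element, path }];
}

// Stand-in input: a real call would receive a DOM document.
const input = {
  document: { body: 'BODY' },
  url: 'https://www.sample.com/a/b',
  html: '',
  params: { originalURL: 'https://www.sample.com/a/b' },
};

const single = resolveOutputs({
  transformDOM: () => 'MAIN',
  generateDocumentPath: () => '/main',
}, input);

const multiple = resolveOutputs({
  transform: () => [
    { element: 'MAIN', path: '/main' },
    { element: 'IMAGE', path: '/image' },
  ],
}, input);
```

Seen this way, a one input / one output `import.js` is the special case where the list holds a single pair.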

diff --git a/js/import/import.ui.js b/js/import/import.ui.js
@@ -12,6 +12,7 @@
 /* global CodeMirror, showdown, html_beautify, ExcelJS */
 import { initOptionFields, attachOptionFieldsListeners } from '../shared/fields.js';
 import { getDirectoryHandle, saveFile } from '../shared/filesystem.js';
+import { asyncForEach } from '../shared/utils.js';
 import PollImporter from '../shared/pollimporter.js';
 import alert from '../shared/alert.js';
 
@@ -38,6 +39,8 @@ const IS_BULK = document.querySelector('.import-bulk') !== null;
 const BULK_URLS_HEADING = document.querySelector('#import-result h2');
 const BULK_URLS_LIST = document.querySelector('#import-result ul');
 
+const IMPORT_FILE_PICKER_CONTAINER = document.getElementById('import-file-picker-container');
+
 const ui = {};
 const config = {};
 const importStatus = {
@@ -68,20 +71,48 @@ const setupUI = () => {
   ui.markdownPreview.innerHTML = ui.showdownConverter.makeHtml('Run an import to see some markdown.');
 };
 
-const updateImporterUI = (out) => {
-  const { md, html: outputHTML, originalURL } = out;
+const loadResult = ({ md, html: outputHTML }) => {
+  ui.transformedEditor.setValue(html_beautify(outputHTML));
+  ui.markdownEditor.setValue(md || '');
+
+  const mdPreview = ui.showdownConverter.makeHtml(md);
+  ui.markdownPreview.innerHTML = mdPreview;
+
+  // remove existing classes and styles
+  Array.from(ui.markdownPreview.querySelectorAll('[class], [style]')).forEach((t) => {
+    t.removeAttribute('class');
+    t.removeAttribute('style');
+  });
+};
+
+const updateImporterUI = (results, originalURL) => {
   if (!IS_BULK) {
-    ui.transformedEditor.setValue(html_beautify(outputHTML));
-    ui.markdownEditor.setValue(md || '');
+    IMPORT_FILE_PICKER_CONTAINER.innerHTML = '';
+    const picker = document.createElement('sp-picker');
+    picker.setAttribute('size', 'm');
+
+    results.forEach((result, index) => {
+      const { path } = result;
+
+      // add result to picker list
+      const item = document.createElement('sp-menu-item');
+      item.innerHTML = path;
+      if (index === 0) {
+        item.setAttribute('selected', true);
+        picker.setAttribute('label', path);
+        picker.setAttribute('value', path);
+      }
+      picker.appendChild(item);
+    });
 
-    const mdPreview = ui.showdownConverter.makeHtml(md);
-    ui.markdownPreview.innerHTML = mdPreview;
+    IMPORT_FILE_PICKER_CONTAINER.append(picker);
 
-    // remove existing classes and styles
-    Array.from(ui.markdownPreview.querySelectorAll('[class], [style]')).forEach((t) => {
-      t.removeAttribute('class');
-      t.removeAttribute('style');
+    picker.addEventListener('change', (e) => {
+      const r = results.filter((i) => i.path === e.target.value)[0];
+      loadResult(r);
     });
+
+    loadResult(results[0]);
   } else {
     const li = document.createElement('li');
     const link = document.createElement('sp-link');
@@ -101,6 +132,12 @@ const clearResultPanel = () => {
   BULK_URLS_HEADING.innerText = 'Importing...';
 };
 
+const clearImportStatus = () => {
+  importStatus.imported = 0;
+  importStatus.total = 0;
+  importStatus.rows = [];
+};
+
 const disableProcessButtons = () => {
   IMPORT_BUTTON.disabled = true;
 };
@@ -127,6 +164,23 @@ const getProxyURLSetup = (url, origin) => {
   };
 };
 
+const postImportProcess = async (results, originalURL) => {
+  await asyncForEach(results, async ({ docx, filename, path }) => {
+    const data = {
+      status: 'Success',
+      url: originalURL,
+      path,
+    };
+
+    const includeDocx = !!docx;
+    if (includeDocx) {
+      await saveFile(dirHandle, filename, docx);
+      data.docx = filename;
+    }
+    importStatus.rows.push(data);
+  });
+};
+
 const createImporter = () => {
   config.importer = new PollImporter({
     origin: config.origin,
@@ -140,25 +194,14 @@ const getContentFrame = () => document.querySelector(`${PARENT_SELECTOR} iframe`
 const attachListeners = () => {
   attachOptionFieldsListeners(config.fields, PARENT_SELECTOR);
 
-  config.importer.addListener(async (out) => {
+  config.importer.addListener(async ({ results }) => {
     const frame = getContentFrame();
-    out.originalURL = frame.dataset.originalURL;
-    const includeDocx = !!out.docx;
+    const { originalURL } = frame.dataset;
 
-    updateImporterUI(out, includeDocx);
+    updateImporterUI(results, originalURL);
+    postImportProcess(results, originalURL);
 
-    const data = {
-      status: 'Success',
-      url: out.originalURL,
-      path: out.path,
-    };
-    if (includeDocx) {
-      const { docx, filename } = out;
-      await saveFile(dirHandle, filename, docx);
-      data.docx = filename;
-    }
-    importStatus.rows.push(data);
-    alert.success(`Import of page ${frame.dataset.originalURL} completed.`);
+    alert.success(`Import of page ${originalURL} completed.`);
   });
 
   config.importer.addErrorListener(({ url, error: err }) => {
@@ -168,6 +211,8 @@ const attachListeners = () => {
   });
 
   IMPORT_BUTTON.addEventListener('click', (async () => {
+    clearImportStatus();
+
     if (IS_BULK) {
       clearResultPanel();
       if (config.fields['import-show-preview']) {
@@ -196,9 +241,6 @@ const attachListeners = () => {
       }
     }
 
-    importStatus.imported = 0;
-    importStatus.rows = [];
-
     const field = IS_BULK ? 'import-urls' : 'import-url';
     const urlsArray = config.fields[field].split('\n').reverse().filter((u) => u.trim() !== '');
     importStatus.total = urlsArray.length;
@@ -242,14 +284,14 @@ const attachListeners = () => {
               const includeDocx = !!dirHandle;
 
               window.setTimeout(async () => {
-                const { originalURL } = frame.dataset;
-                const { replacedURL } = frame.dataset;
+                const { originalURL, replacedURL } = frame.dataset;
                 if (frame.contentDocument) {
                   try {
                     config.importer.setTransformationInput({
                       url: replacedURL,
                       document: frame.contentDocument,
                       includeDocx,
+                      params: { originalURL },
                     });
                     await config.importer.transform();
                   } catch (e) {
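
`postImportProcess` relies on `asyncForEach` from `../shared/utils.js`, which is not part of this diff. Below is a minimal sketch of such a helper, assuming it awaits each callback in order so that files are saved one after another; the real implementation may differ. `fakeSave` is a hypothetical stand-in for `saveFile` that only records completion order.

```javascript
// Assumed behaviour: run an async callback on each item, strictly in sequence.
async function asyncForEach(array, callback) {
  for (let index = 0; index < array.length; index += 1) {
    // Wait for the current item to finish before starting the next one.
    await callback(array[index], index, array);
  }
}

// Hypothetical stand-in for saveFile: resolves after a delay and logs the name.
const saved = [];
const fakeSave = (filename, delayMs) => new Promise((resolve) => {
  setTimeout(() => {
    saved.push(filename);
    resolve();
  }, delayMs);
});

// The slower first save still completes before the second one starts.
const done = asyncForEach(
  [
    { filename: '/main.docx', delayMs: 20 },
    { filename: '/image.docx', delayMs: 0 },
  ],
  ({ filename, delayMs }) => fakeSave(filename, delayMs),
);
```

A plain `Array.prototype.forEach` with an async callback would not wait for each promise, so a helper along these lines is one way to keep the saves ordered.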

diff --git a/js/shared/pollimporter.js b/js/shared/pollimporter.js
@@ -73,46 +73,62 @@ export default class PollImporter {
   }
 
   async transform() {
+    const {
+      includeDocx, url, document, params,
+    } = this.transformation;
+
     try {
-      let out;
-      if (this.transformation.includeDocx) {
-        out = await WebImporter.html2docx(
-          this.transformation.url,
-          this.transformation.document,
+      let results;
+      if (includeDocx) {
+        const out = await WebImporter.html2docx(
+          url,
+          document,
           this.projectTransform,
+          params,
         );
 
-        const { path } = out;
-        out.filename = `${path}.docx`;
+        results = Array.isArray(out) ? out : [out];
+        results.forEach((result) => {
+          const { path } = result;
+          result.filename = `${path}.docx`;
+        });
       } else {
-        out = await WebImporter.html2md(
-          this.transformation.url,
-          this.transformation.document,
+        const out = await WebImporter.html2md(
+          url,
+          document,
           this.projectTransform,
+          params,
         );
+        results = Array.isArray(out) ? out : [out];
       }
 
       this.listeners.forEach((listener) => {
         listener({
-          ...out,
-          url: this.transformation.url,
+          results,
+          url,
         });
       });
     } catch (err) {
       this.errorListeners.forEach((listener) => {
         listener({
-          url: this.transformation.url,
+          url,
           error: err,
         });
       });
     }
   }
 
-  setTransformationInput({ url, document, includeDocx = false }) {
+  setTransformationInput({
+    url,
+    document,
+    includeDocx = false,
+    params,
+  }) {
     this.transformation = {
       url,
       document,
       includeDocx,
+      params,
     };
   }
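
The normalization in `PollImporter.transform()` accepts either a single result or an array from `WebImporter.html2docx` / `WebImporter.html2md`. Isolated as a standalone function, the logic looks roughly like the sketch below. `toResults` is a hypothetical name, and the plain objects only mimic the `path` field of a real result.

```javascript
// Hypothetical extraction of the normalization step shown in the diff.
function toResults(out, includeDocx) {
  // Older transformations return one object, newer ones may return an array.
  const results = Array.isArray(out) ? out : [out];

  if (includeDocx) {
    // Each docx output is saved under its own path-derived filename.
    results.forEach((result) => {
      result.filename = `${result.path}.docx`;
    });
  }
  return results;
}

const docxResults = toResults({ path: '/main' }, true);
const mdResults = toResults([{ path: '/main' }, { path: '/image' }], false);
```

Wrapping the single-object case in an array is what lets listeners receive `results` in the same shape for both modes.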