Merge branch 'master' into qurator-omni
nl0 authored Oct 10, 2024
2 parents 14f1a02 + 6590bf2 commit 07b7f6a
Showing 13 changed files with 280 additions and 109 deletions.
7 changes: 7 additions & 0 deletions .markdownlint.jsonc
@@ -0,0 +1,7 @@
{
"default": true,
"no-blanks-blockquote": false,
"no-duplicate-heading": {
"siblings_only": true
}
}
4 changes: 1 addition & 3 deletions .markdownlintignore
@@ -1,10 +1,8 @@
# Not ready for lint yet
gendocs
lambdas
py-shared
testdocs

# Autogenerated
# Autogenerated
docs/api-reference

.git
2 changes: 1 addition & 1 deletion catalog/app/containers/Admin/Buckets/Buckets.tsx
@@ -1187,7 +1187,7 @@ function TabulatorCard({
<Card
className={className}
disabled={disabled}
title="Tabulation (Longitudinal Querying)"
title="Tabulator (Longitudinal Querying)"
>
<TabulatorForm bucket={bucket} tables={tabulatorTables} />
</Card>
@@ -30,7 +30,7 @@ const ConfigEditor = React.lazy(() =>

const defaultConfig = `schema:
- name: column1 # specify the schema
type: Utf8
type: STRING
source:
type: quilt-packages
package_name: "" # specify a RegEx for matching packages
4 changes: 0 additions & 4 deletions docs/.markdownlint.jsonc

This file was deleted.

311 changes: 234 additions & 77 deletions docs/CHANGELOG.md

Large diffs are not rendered by default.

11 changes: 7 additions & 4 deletions lambdas/indexer/README.md
@@ -1,11 +1,14 @@
<!-- markdownlint-configure-file {"line-length": {"code_blocks": false}} -->
# Notes on message formats

`indexer.py` consumes messages from an SQS Queue.
Each message contains one or more S3 events (these may be synthetic events
created by the bulk indexing process).

## Sample message
```
message: {

```python
{
'messageId': 'f4feb40f-16a5-47af-89dc-091bd0fae1e2',
'receiptHandle': 'AQEBv6rRxc4+CRSi3RWY64HqOIzu+dJEWnMCAwVgyogUBDY4a1fBoEp6mnx3qy5AO/A+qvTVRWq6lWS3D2iDc8pUGfj8BAJ2/G21/mA2OqDF8e0JdItwu+haRiFzsH87W+5HAwGjIi13Yltf1UjaZoBbrdX+jOlx2lbMTgJOgAzK6ZrHnYaJdTsY72izxAY+3zm4x7U4Cg79uGj6IezWNW+ZjlsEg20tkvexQXPr6AaTbJ0cei+IVueSTy5WUiBMjTgmKxvJEWoLr3BzUvy7uI1ECJx/6m2ya5+M0161ufyYMFqYljYFe2InV2G79fXdW2pYkHy0xnbMKLlQpmOkQyJWyyYV9J6i9MO9Qkp9l0gnyxykw9eOZ/9bn0iV5p+aoRwhkopS6e1jhx8HMtTAs30TM6Uw1TFU+vPAMPu6syIMABs=',
'body': '{"Message": "{\\"Records\\": [{\\"eventName\\": \\"ObjectCreated:Put\\", \\"s3\\": {\\"bucket\\": {\\"name\\": \\"quilt-search-test\\"}}},]}"}',
@@ -24,7 +27,8 @@ message: {
```

## Sample event
```

```python
{
'eventName': 'ObjectCreated:Put',
's3': {
@@ -40,4 +44,3 @@ message: {
}
}
```
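
Reviewer note: the sample message above nests its S3 events two levels deep, as a JSON string under `Message` inside the JSON string in `body`. A minimal sketch of unpacking that shape follows. The helper name `extract_s3_events` and the trimmed sample are assumptions for illustration, not code from `indexer.py`. Note that the abbreviated `body` shown in the README has a trailing comma inside `Records`, so it would not parse with `json.loads` as written.

```python
import json


def extract_s3_events(message):
    """Return the S3 event records carried by one SQS message (hypothetical helper)."""
    # 'body' is a JSON string; its 'Message' field is itself a JSON string
    body = json.loads(message["body"])
    payload = json.loads(body["Message"])
    # a message may carry one or more events under 'Records'
    return payload.get("Records", [])


# Trimmed, well-formed stand-in for the sample message (illustrative values only)
sample_event = {
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "quilt-search-test"}},
}
sample_message = {
    "messageId": "example-id",
    "body": json.dumps({"Message": json.dumps({"Records": [sample_event]})}),
}

events = extract_s3_events(sample_message)
for event in events:
    print(event["eventName"], event["s3"]["bucket"]["name"])
```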

5 changes: 3 additions & 2 deletions lambdas/indexer/test/README.md
@@ -1,9 +1,10 @@
# Rudimentary tests

## TODO

* [ ] run inside of Docker container that mimics Lambda env
* https://blog.quiltdata.com/an-easier-way-to-build-lambda-deployment-packages-with-docker-instead-of-ec2-9050cd486ba8
* https://nvbn.github.io/2015/09/09/pytest-docker/
* <https://blog.quiltdata.com/an-easier-way-to-build-lambda-deployment-packages-with-docker-instead-of-ec2-9050cd486ba8>
* <https://nvbn.github.io/2015/09/09/pytest-docker/>

* [ ] parameterize Python version (Lambda functions can be 2.7, 3.6, etc.)
* [ ] integrate withe rest of unit testing suite?
1 change: 1 addition & 0 deletions lambdas/indexer/test/data/extended/README.md
@@ -1,3 +1,4 @@
<!-- markdownlint-disable first-line-h1 -->
If you run `pytest --extended` the test harness will look for parquet, etc. files
in this folder and run a few conditional tests. This is useful for testing
one-off / customer issues
1 change: 1 addition & 0 deletions lambdas/s3hash/CHANGELOG.md
@@ -1,3 +1,4 @@
<!-- markdownlint-disable line-length -->
# Changelog

Changes are listed in reverse chronological order (newer entries at the top).
5 changes: 3 additions & 2 deletions lambdas/shared/tests/data/fcs/README.md
@@ -1,4 +1,5 @@
<!-- markdownlint-disable first-line-h1 -->
See notebook for original source of files

normal.fcs was Accuri - C6 - A01 H2O.fcs
meta-only.fcs was BD - FACS Aria II - Compensation Controls_G710 Stained Control.fcs
* normal.fcs was Accuri - C6 - A01 H2O.fcs
* meta-only.fcs was BD - FACS Aria II - Compensation Controls_G710 Stained Control.fcs
35 changes: 20 additions & 15 deletions lambdas/thumbnail/tests/data/pdf2image-README.md
@@ -1,63 +1,68 @@
<!-- markdownlint-configure-file {"line-length": {"code_blocks": false}} -->
# Developer notes

* How we get pdftoppm running with MS fonts
* On MacBook testing: jpeg faster than png, pdftoppm faster than pdftocairo

## Extracting pdftoppm, pdftocairo (via poppler-utils) as standalones
```

```sh
yum install yum-utils rpmdevtools
yumdownloader poppler-utils
yumdownloader poppler-utils

# Extract files from .rpm
rpmdev-extract *.rpm
```

Then you need to run the binaries by hand and discover which .o files are missing. These files
were found to be children of /usr/lib64 and when they are copied to the same, /usr/bin/pdftoppm works.
Then you need to run the binaries by hand and discover which .o files are
missing. These files were found to be children of /usr/lib64 and when they
are copied to the same, /usr/bin/pdftoppm works.

Consider setting [`LD_LIBRARY_PATH`](https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-runtime)
to include /somedir/usr/lib64 so Linux can find the libs.


## Getting MS fonts

* [Installing cabextract on Amazon Linux](https://aws.amazon.com/premiumsupport/knowledge-center/ec2-enable-epel/)
* [Adding MS fonts to linux](http://mscorefonts2.sourceforge.net/)

Above step will add fonts to `cp -r /usr/share/fonts`

## Making the fonts discoverable via pdftoppm, etc.
## Making the fonts discoverable via pdftoppm, etc

* [fonts.conf](https://stackoverflow.com/questions/46486261/include-custom-fonts-in-aws-lambda)
* `fc-cache` (unix util)

```
```sh
export FONTCONFIG_PATH=/io/fonts/
```

## `pdftoppm`

* Page numbering starts at 1 (not 0)
* Providing size=INT to a convert function ensures largest dimension == INT
* first_page can be negative (lib rounds up to 1)
* last_page can be > than total pages (lib takes min)
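
Reviewer note: the page-range and sizing behavior listed above can be sketched in plain Python. Both helpers below are hypothetical illustrations of the described rules, not code from pdf2image or pdftoppm, and the rounding used in `fit_largest_dimension` is an assumption.

```python
def clamp_page_range(first_page, last_page, total_pages):
    """Mirror the notes: pages start at 1, and out-of-range bounds are clamped."""
    first = max(first_page, 1)            # negative or zero rounds up to page 1
    last = min(last_page, total_pages)    # beyond the end falls back to the last page
    return first, last


def fit_largest_dimension(width, height, size):
    """Scale (width, height) so that the largest dimension equals `size`."""
    longest = max(width, height)
    # rounding to the nearest pixel is an assumption made for this sketch
    return round(width * size / longest), round(height * size / longest)


print(clamp_page_range(-3, 99, 10))  # → (1, 10)
print(fit_largest_dimension(612, 792, 1024))
```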

No format (=ppm) faster than JPEG faster than PNG

```
In [8]: %timeit imgs = pdf2image.convert_from_path("tests/data/MUMmer.pdf", fmt="png",size=(1024,768))
```Python console
In [8]: %timeit imgs = pdf2image.convert_from_path("tests/data/MUMmer.pdf", fmt="png",size=(1024,768))
4.3 s ± 162 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [9]: %timeit imgs = pdf2image.convert_from_path("tests/data/MUMmer.pdf", fmt="jpeg", size=(1024,768))
In [9]: %timeit imgs = pdf2image.convert_from_path("tests/data/MUMmer.pdf", fmt="jpeg", size=(1024,768))
970 ms ± 58.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: %timeit imgs = pdf2image.convert_from_path("tests/data/MUMmer.pdf", size=(1024,768))
In [10]: %timeit imgs = pdf2image.convert_from_path("tests/data/MUMmer.pdf", size=(1024,768))
798 ms ± 37.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

jpeg also outputs faster:
```
In [14]: %timeit imgs[3].save("tmp-1024-768.png")

```Python console
In [14]: %timeit imgs[3].save("tmp-1024-768.png")
198 ms ± 4.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [15]: %timeit imgs[3].save("tmp-1024-768.jpeg")
In [15]: %timeit imgs[3].save("tmp-1024-768.jpeg")
44.1 ms ± 4.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
1 change: 1 addition & 0 deletions py-shared/CHANGELOG.md
@@ -1,3 +1,4 @@
<!-- markdownlint-disable line-length -->
# Changelog

Changes are listed in reverse chronological order (newer entries at the top).