Handle weird characters #114

Open
boogheta opened this issue Dec 3, 2018 · 4 comments

@boogheta
Contributor

boogheta commented Dec 3, 2018

Apparently the parsed texts contain a lot of crazy characters :)

cat */procedure/*/texte/texte.json | awk '{for(i=1;i<=NF;i++)if(!a[$i]++)print $i}' FS="" | sort | grep -vi '[a-z0-9]' | sed 's/.*/`&`/'

`´`
```
`×`
`<`
`≤`
`=`
`>`
`°`
` `
`_`
`­`
`-`
`,`
`;`
`:`
`?`
`/`
`.`
`·`
`¸`
`'`
`‘`
`"`
`(`
`)`
`[`
`]`
`{`
`}`
`§`
`€`
`*`
`\`
`&`
`#`
`%`
`+`
`̀`
`́`
`ʺ`
`ˮ`
`​`
`□`
`…`
`‰`
`•`
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
``
`⅓`
`�`
`¼`
`²`
`³`
`º`
` `
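
Since several of the characters above are invisible once pasted, the same inventory can be sketched in Python with code points, Unicode categories and names attached. This is only an illustration, not part of the project: the function name `inventory_odd_chars` is hypothetical, the glob pattern is copied from the command above, and unlike the `grep` filter it only skips ASCII letters and digits, so accented letters will show up in the report too.

```python
import glob
import unicodedata
from collections import Counter


def inventory_odd_chars(text):
    """Return (code point, category, name, count) for each character
    that is not an ASCII letter or digit, sorted by code point."""
    counts = Counter(
        ch for ch in text if not (ch.isascii() and ch.isalnum())
    )
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            # Private-use and unassigned code points have no name.
            unicodedata.name(ch, "<unnamed>"),
            n,
        )
        for ch, n in sorted(counts.items())
    ]


if __name__ == "__main__":
    # Same files as the shell pipeline, read as raw text.
    chunks = []
    for path in glob.glob("*/procedure/*/texte/texte.json"):
        with open(path, encoding="utf-8") as f:
            chunks.append(f.read())
    for row in inventory_odd_chars("".join(chunks)):
        print(*row, sep="\t")
```

Characters reported with category `Co` and no name are private-use code points, which only make sense together with the font that draws them.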
@mdamien
Member

mdamien commented Dec 3, 2018

For all the characters that do not render, it's because of some Senate texts like this one:

https://www.lafabriquedelaloi.fr/articles.html?loi=pjl17-567&article=1er&etape=4

image

The text: https://www.senat.fr/leg/pjl17-631.html

@mdamien mdamien added the bug label Dec 3, 2018
@boogheta
Contributor Author

boogheta commented Dec 3, 2018

Since they are hard to see, here is a screenshot :)
lfdll-weird-chars

@mdamien
Member

mdamien commented Dec 5, 2018

Got it, the AN is using a font named "Numero" to display the pastilles.

@font-face {
 font-family: "Numero";
 src: url('/monalisa/resources/fonts/pastilles.woff') format('woff');
}

This font, of course 😔, relies on crazy Unicode characters.

Fix incoming!
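
One possible shape for such a cleanup, as a rough sketch only: it assumes the pastille glyphs live in the Unicode Private Use Area (general category `Co`), which this thread does not confirm, and it is not the commit referenced below. The name `strip_categories` and its default are hypothetical.

```python
import unicodedata


def strip_categories(text, drop=("Co",)):
    """Remove every character whose Unicode general category is in `drop`.

    The default only drops private-use characters (assumed here to be
    the font-specific pastille glyphs).
    """
    return "".join(ch for ch in text if unicodedata.category(ch) not in drop)


# Passing drop=("Co", "Cf") would also remove format characters such as
# the soft hyphen and the zero-width space.
```

Dropping the glyphs outright also discards whatever numbering the pastilles encode, so mapping them to plain digits would be preferable wherever the correspondence is known.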

mdamien added a commit that referenced this issue Dec 5, 2018
This should drastically reduce the number of crazy Unicode characters we have
@davidbgk
Contributor

davidbgk commented Dec 5, 2018

😱 we're working hard on getting that info in a computable format, but it's harder than expected. We'll keep you posted…
