belfiore-search
🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌻🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸
🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩
This project ingests a set of records from a CSV file into Orama, a modern, lightweight full-text search engine, both to evaluate its performance and to perform useful searches.
Every Italian citizen has a unique ID called the FISCAL CODE (codice fiscale), often used for tax and health-insurance purposes. The code is generated by an algorithm: apart from your personal details, it requires an external CODE, commonly called the Belfiore code, which becomes part of the final output, just before the last (check) character.
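Since the Belfiore code sits at a fixed position inside the fiscal code, it can be read back directly. The following is a minimal sketch, assuming the standard 16-character layout (characters 12-15 hold the Belfiore code, character 16 is the check character) and no "omocodia" substitution of digits with letters. `extractBelfiore` is a hypothetical helper, not part of this project, and the sample fiscal code is made up (its check character is not computed).

```typescript
// Hypothetical helper: read the Belfiore code out of an Italian fiscal code.
// Assumptions: standard 16-character layout, where 1-based positions 12-15
// hold the Belfiore code and position 16 is the check character; no
// "omocodia" substitution (digits replaced by letters) is handled here.
function extractBelfiore(fiscalCode: string): string {
  const code = fiscalCode.trim().toUpperCase();
  if (!/^[A-Z0-9]{16}$/.test(code)) {
    throw new Error(`Expected 16 alphanumeric characters, got "${fiscalCode}"`);
  }
  // 0-based indices 11..14 correspond to 1-based positions 12..15.
  return code.slice(11, 15);
}

// Made-up fiscal code, used only to show the slicing.
console.log(extractBelfiore("RSSMRA80A01A401X")); // A401
```

The extracted value is what the search API returns in the CODCATASTALE field of each document (A401 for ARICCIA in the example response).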
Every city in Italy has its own code, and you can get the updated list at this url. As a user, you can easily remember the name of the city where you were born, but not its code. This API provides instant, real-time search and retrieval of the Belfiore code, starting from substrings of a city name, province, etc. (you actually get much more, since the entire archive of Italian cities is ingested).
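To make "search from a substring, optionally narrowed by province" concrete, here is a naive, in-memory illustration. It is not how Orama tokenizes, indexes or scores documents; it only mirrors the idea that a name match can be filtered by SIGLAPROVINCIA, as the API's help message describes. `naiveSearch` and the `Comune` type are hypothetical; the ARICCIA row comes from the example response in this README, the other rows are illustrative.

```typescript
// Naive illustration only: Orama performs real full-text search with scoring.
// This just filters records whose name contains a substring and, optionally,
// keeps only those in a given province.
interface Comune {
  DENOMINAZIONE_IT: string;
  SIGLAPROVINCIA: string;
  CODCATASTALE: string;
}

function naiveSearch(records: Comune[], nameSubstring: string, province?: string): Comune[] {
  const needle = nameSubstring.trim().toUpperCase();
  return records.filter(
    (r) =>
      r.DENOMINAZIONE_IT.toUpperCase().includes(needle) &&
      (province === undefined || r.SIGLAPROVINCIA === province.toUpperCase()),
  );
}

// Illustrative sample data.
const sample: Comune[] = [
  { DENOMINAZIONE_IT: "ARICCIA", SIGLAPROVINCIA: "RM", CODCATASTALE: "A401" },
  { DENOMINAZIONE_IT: "ROMA", SIGLAPROVINCIA: "RM", CODCATASTALE: "H501" },
  { DENOMINAZIONE_IT: "MILANO", SIGLAPROVINCIA: "MI", CODCATASTALE: "F205" },
];

console.log(naiveSearch(sample, "ric").map((r) => r.CODCATASTALE).join(",")); // A401
console.log(naiveSearch(sample, "a", "rm").length); // 2
```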
The project uses:
- Node.js server environment
- TypeScript programming language
- Orama db and search engine
- csv for CSV parsing
- I like to use nodeenv to manage my Node.js projects, so:
- Create and activate a Node.js virtual env with the LTS version of Node.js (currently node:18.16.1):
cd belfiore-search
nodeenv -n lts .nvenv
source .nvenv/bin/activate
- To build the project locally, run in order:
npm install
npm run build
npm run vitest
npm run setup
The last command starts the ingestion of the documents from the CSV into the db. Once it completes (the operation takes approx. 2 seconds since Orama was updated to its latest version), the database is persisted to ./comuni.msp.
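The project relies on the csv library for parsing. To show the shape of the records that end up in the database, here is a dependency-free sketch that turns CSV text with a header row into one document per row, keyed by column name. It assumes the delimiter is known and that there are no quoted fields, escaped delimiters or embedded newlines, all of which a real CSV parser handles; `csvToDocuments` is a hypothetical helper, not the project's ingestion code, and the sample uses only a few of the dataset's columns.

```typescript
// Hypothetical helper: converts CSV text (first non-empty line = header) into
// records keyed by column name. Assumes no quoted fields, escaped delimiters
// or embedded newlines; a real CSV parser handles those cases.
type Row = Record<string, string>;

function csvToDocuments(text: string, delimiter = ","): Row[] {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const header = lines[0].split(delimiter).map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const values = line.split(delimiter);
    const doc: Row = {};
    header.forEach((column, i) => {
      // Missing trailing values become empty strings.
      doc[column] = (values[i] ?? "").trim();
    });
    return doc;
  });
}

const csv = [
  "ID,CODCATASTALE,DENOMINAZIONE_IT,SIGLAPROVINCIA,DATACESSAZIONE",
  "438,A401,ARICCIA,RM,9999-12-31",
  "17567,A401,ARICCIA,RM,1935-03-06",
].join("\n");

console.log(csvToDocuments(csv).length); // 2
```

Each resulting object corresponds to one document inserted into Orama; the real dataset has more columns (see the example response below).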
- Now you can run a local HTTP server and perform full-text searches by running:
npm run start
(or npm run restart, the preferred way, as it executes some checks on the latest release of the original dataset)
WARNING! CORS is deliberately not enforced :)
{
"message": "You search with the following params. If you pass both of them, the second is used as a filter on the results",
"extra": " Optional params 'limit' (default is 10) and 'offset' (default is 0)",
"params": [
"DENOMINAZIONE_IT",
"SIGLAPROVINCIA"
]
}
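A client can pass these params as a query string. The sketch below only builds the request URL with the standard URL and URLSearchParams APIs. The base URL (http://localhost:3000, the port published in the docker run example) and the root path are assumptions, since the exact route is not documented here, and `buildSearchUrl` is a hypothetical helper.

```typescript
// Hypothetical client helper. Assumption: the API reads its params from the
// query string of the given endpoint; adjust the base URL/path to the real route.
interface SearchParams {
  DENOMINAZIONE_IT?: string;
  SIGLAPROVINCIA?: string;
  limit?: number; // server default: 10
  offset?: number; // server default: 0
}

function buildSearchUrl(baseUrl: string, params: SearchParams): string {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset params so the server falls back to its defaults.
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const searchUrl = buildSearchUrl("http://localhost:3000/", {
  DENOMINAZIONE_IT: "ariccia",
  SIGLAPROVINCIA: "RM",
  limit: 5,
});
console.log(searchUrl);
// http://localhost:3000/?DENOMINAZIONE_IT=ariccia&SIGLAPROVINCIA=RM&limit=5
```

With Node.js 18 (the version suggested above), the resulting URL can be passed to the global fetch.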
{
"elapsed": {
"raw": 3515375,
"formatted": "3ms"
},
"hits": [
{
"id": "37899817-5378",
"score": 5.860792320596953,
"document": {
"ID": "438",
"DATAISTITUZIONE": "1935-03-07",
"DATACESSAZIONE": "9999-12-31",
"CODISTAT": "058009",
"CODCATASTALE": "A401",
"DENOMINAZIONE_IT": "ARICCIA",
"DENOMTRASLITTERATA": "ARICCIA",
"ALTRADENOMINAZIONE": "",
"ALTRADENOMTRASLITTERATA": "",
"ID_PROVINCIA": "58",
"IDPROVINCIAISTAT": "058",
"IDREGIONE": "12",
"IDPREFETTURA": "RM",
"STATO": "A",
"SIGLAPROVINCIA": "RM",
"FONTE": "",
"DATAULTIMOAGG": "2016-06-17",
"COD_DENOM": ""
}
},
{
"id": "37899817-5371",
"score": 5.860792320596953,
"document": {
"ID": "17567",
"DATAISTITUZIONE": "1871-01-15",
"DATACESSAZIONE": "1935-03-06",
"CODISTAT": "058009",
"CODCATASTALE": "A401",
"DENOMINAZIONE_IT": "ARICCIA",
"DENOMTRASLITTERATA": "ARICCIA",
"ALTRADENOMINAZIONE": "",
"ALTRADENOMTRASLITTERATA": "",
"ID_PROVINCIA": "58",
"IDPROVINCIAISTAT": "058",
"IDREGIONE": "12",
"IDPREFETTURA": "",
"STATO": "C",
"SIGLAPROVINCIA": "RM",
"FONTE": "",
"DATAULTIMOAGG": "2016-06-17",
"COD_DENOM": ""
}
}
],
"count": 2
}
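The response shape above can be described with TypeScript types. The sketch below types only a subset of the document fields, based on the example response, and collects the distinct Belfiore codes (CODCATASTALE) together with the city names they appear with; the type and function names are hypothetical, not the project's own.

```typescript
// Hypothetical types inferred from the example response (subset of fields).
interface ComuneDocument {
  ID: string;
  DATAISTITUZIONE: string;
  DATACESSAZIONE: string;
  CODCATASTALE: string;
  DENOMINAZIONE_IT: string;
  SIGLAPROVINCIA: string;
}

interface SearchResponse {
  elapsed: { raw: number; formatted: string };
  hits: { id: string; score: number; document: ComuneDocument }[];
  count: number;
}

// Map each distinct Belfiore code to the city name(s) it was found with.
function belfioreCodes(res: SearchResponse): Map<string, string[]> {
  const codes = new Map<string, string[]>();
  for (const { document: doc } of res.hits) {
    const cityNames = codes.get(doc.CODCATASTALE) ?? [];
    if (!cityNames.includes(doc.DENOMINAZIONE_IT)) cityNames.push(doc.DENOMINAZIONE_IT);
    codes.set(doc.CODCATASTALE, cityNames);
  }
  return codes;
}

// The two ARICCIA hits from the example response, reduced to the typed fields.
const exampleResponse: SearchResponse = {
  elapsed: { raw: 3515375, formatted: "3ms" },
  hits: [
    { id: "37899817-5378", score: 5.860792320596953, document: { ID: "438", DATAISTITUZIONE: "1935-03-07", DATACESSAZIONE: "9999-12-31", CODCATASTALE: "A401", DENOMINAZIONE_IT: "ARICCIA", SIGLAPROVINCIA: "RM" } },
    { id: "37899817-5371", score: 5.860792320596953, document: { ID: "17567", DATAISTITUZIONE: "1871-01-15", DATACESSAZIONE: "1935-03-06", CODCATASTALE: "A401", DENOMINAZIONE_IT: "ARICCIA", SIGLAPROVINCIA: "RM" } },
  ],
  count: 2,
};

console.log(JSON.stringify(Array.from(belfioreCodes(exampleResponse)))); // [["A401",["ARICCIA"]]]
```

Both historical records of ARICCIA share the same code, so a single entry is returned.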
The limit and offset params come from Orama and are used to paginate the results. If they are not passed, they default to 10 and 0 respectively.
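limit and offset follow the usual windowing semantics: the first offset hits are skipped and at most limit hits are returned. A minimal sketch, assuming 1-based page numbers (a hypothetical convention, not something the API defines), that converts a page number into these params and shows the equivalent slice on an in-memory array; both helper names are hypothetical.

```typescript
// Hypothetical helper: convert a 1-based page number into limit/offset params.
function pageToParams(page: number, pageSize = 10): { limit: number; offset: number } {
  if (!Number.isInteger(page) || page < 1) throw new Error("page must be an integer >= 1");
  return { limit: pageSize, offset: (page - 1) * pageSize };
}

// Equivalent windowing on a local array: skip `offset` items, keep at most `limit`.
function applyWindow<T>(items: T[], limit = 10, offset = 0): T[] {
  return items.slice(offset, offset + limit);
}

const page3 = pageToParams(3);
console.log(JSON.stringify(page3)); // {"limit":10,"offset":20}

const numbers = Array.from({ length: 25 }, (_, i) => i + 1);
console.log(JSON.stringify(applyWindow(numbers, page3.limit, page3.offset))); // [21,22,23,24,25]
```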
Meaning of the fields, in Italian.
If you search for a city by its name, for example using the DENOMINAZIONE_IT param, you may obtain a "historical view" of the city, since similar documents with different validity intervals in the past can exist:
...
"DATAISTITUZIONE": "1871-01-15",
"DATACESSAZIONE": "1935-03-06",
...
or not (DATACESSAZIONE is in the future, so the document represents the current state of the city):
...
"DATAISTITUZIONE": "1937-10-26",
"DATACESSAZIONE": "9999-12-31",
...
Results are sorted by DATACESSAZIONE in descending order.
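Because the dates are ISO 8601 strings (YYYY-MM-DD), they compare correctly as plain strings, and 9999-12-31 acts as an open-ended cessation date. The sketch below, with hypothetical helper names and the two ARICCIA records from the example response as sample data, finds the record valid on a given date and reproduces the descending order by DATACESSAZIONE. It assumes both boundary dates are inclusive, which is consistent with the sample (one record ends on 1935-03-06 and the next starts on 1935-03-07) but is not stated explicitly.

```typescript
// Hypothetical helpers. Dates are "YYYY-MM-DD" strings, so string comparison
// matches chronological order.
interface Period {
  ID: string;
  DATAISTITUZIONE: string;
  DATACESSAZIONE: string;
}

// Assumption: both DATAISTITUZIONE and DATACESSAZIONE are inclusive bounds.
function isValidOn(doc: Period, date: string): boolean {
  return doc.DATAISTITUZIONE <= date && date <= doc.DATACESSAZIONE;
}

// Newest first, descending by DATACESSAZIONE; returns a sorted copy.
function sortByCessationDesc<T extends Period>(docs: T[]): T[] {
  return [...docs].sort((a, b) =>
    a.DATACESSAZIONE < b.DATACESSAZIONE ? 1 : a.DATACESSAZIONE > b.DATACESSAZIONE ? -1 : 0,
  );
}

const ariccia: Period[] = [
  { ID: "17567", DATAISTITUZIONE: "1871-01-15", DATACESSAZIONE: "1935-03-06" },
  { ID: "438", DATAISTITUZIONE: "1935-03-07", DATACESSAZIONE: "9999-12-31" },
];

console.log(sortByCessationDesc(ariccia).map((d) => d.ID).join(",")); // 438,17567
console.log(ariccia.find((d) => isValidOn(d, "1900-01-01"))?.ID); // 17567
console.log(ariccia.find((d) => isValidOn(d, "1935-03-07"))?.ID); // 438
```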
You can find the latest Docker image on Docker Hub 🐳. If you want to build it locally, you can run:
docker buildx build --platform=linux/amd64,linux/arm64 . -t giufus/belfiore-search
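If buildx reports "No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load", add --push or --load as suggested. To build only for your local platform, you can instead run:
docker build . -t giufus/belfiore-search
Then start the container, publishing port 3000:
docker run -p 3000:3000 -d giufus/belfiore-search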
TODO:
- Implement a performance test
- Add details about the CF (fiscal code) algorithm
- Add CI/CD (and hopefully a free hosting service)