Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove nb_hits update in config file #48

Merged
merged 8 commits into from
Jul 10, 2020

Conversation

renehernandez
Copy link
Contributor

@renehernandez renehernandez commented Jul 9, 2020

Description

Fix #17

Verifying nb_hits is still printed to console

I ran docs_scraper with my local changes against a testing meilisearch instance using a config file for our internal documentation infrastructure

Config file

Real URLs have been anonymized

{
  "index_uid": "docs",
  "start_urls": [
    {
      "url": "https://site1.docs.company.com",
      "selectors_key": "mkdocs"
    },
    {
      "url": "https://site2.docs.company.com",
      "selectors_key": "mkdocs"
    },
    {
      "url": "https://site3.docs.company.com",
      "selectors_key": "mkdocs"
    },
    {
      "url": "https://site4.docs.company.com",
      "selectors_key": "mkdocs"
    },
    {
      "url": "https://site5.docs.company.com",
      "selectors_key": "mkdocs"
    },
    {
      "url": "https://brand.company.com",
      "selectors_key": "brand"
    },
    {
      "url": "https://api.company.com/docs",
      "selectors_key": "api"
    }
  ],
  "sitemap_urls": [],
  "stop_urls": [
    "https://brand.company.com/download"
  ],
  "selectors": {
    "mkdocs": {
      "lvl0": {
        "selector": ".wy-menu-vertical a.current",
        "global": true,
        "default_value": "Documentation"
      },
      "lvl1": ".section h1",
      "lvl2": ".section h2",
      "lvl3": ".section h3",
      "lvl4": ".section h4",
      "lvl5": ".section h5",
      "lvl6": ".section h6",
      "text": ".section p, .section li, .section blockquote, .section pre"
    },
    "brand": {
      "lvl0": {
        "selector": "ul.navbar-nav a.active",
        "global": true,
        "default_value": "Documentation"
      },
      "lvl1": "section.resume-section h1",
      "lvl2": "section.resume-section h2",
      "lvl3": "section.resume-section h3",
      "lvl4": "section.resume-section h4",
      "lvl5": "section.resume-section h5",
      "lvl6": "section.resume-section h6",
      "text": "section.resume-section p, section.resume-section li, section.resume-section div.subheading, section.resume-section pre"
    },
    "api": {
      "lvl0": {
        "selector": ".toc-wrapper a.active",
        "global": true
      },
      "lvl1": "div.page-wrapper h1",
      "lvl2": "div.page-wrapper h2",
      "lvl3": "div.page-wrapper h3",
      "lvl4": "div.page-wrapper h4",
      "lvl5": "div.page-wrapper h5",
      "lvl6": "div.page-wrapper h6",
      "text": "div.page-wrapper p, div.page-wrapper li, div.page-wrapper blockquote, div.page-wrapper pre"
    }
  }
}

Docs-scraper output

$env:MEILISEARCH_API_KEY="master"; $env:MEILISEARCH_HOST_URL="http://localhost:7700"; pipenv run ./docs_scraper ../config.json
> Docs-Scraper: https://site1.docs.company.com 2 records)
> Docs-Scraper: https://site2.docs.company.com 38 records)
> Docs-Scraper: https://site3.docs.company.com 12 records)
> Docs-Scraper: https://brand.company.com 128 records)
> Docs-Scraper: https://site4.docs.company.com 13 records)
> Docs-Scraper: https://site5.docs.company.com 13 records)
> Docs-Scraper: https://site3.docs.company.com/frontend/ 10 records)
> Docs-Scraper: https://brand.company.com/designer-handbook.html 81 records)
> Docs-Scraper: https://api.company.com/docs 1776 records)
....
> Docs-Scraper: https://site1.docs.company.com/path1/ 38 records)
> Docs-Scraper: https://site1.docs.company.com/path2/ 24 records)
> Docs-Scraper: https://site1.docs.company.com/path3/ 36 records)
> Docs-Scraper: https://site1.docs.company.com/path4/ 31 records)
> Docs-Scraper: https://site1.docs.company.com/path5/ 48 records)

Nb hits: 4788

Changes

  • Remove nb_hits_updater.py file
  • Remove update_nb_hits_value function from config_loader
  • Remove update_nb_hits variable and envvar, since it is not used in the code

Copy link
Member

@curquiza curquiza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @renehernandez! thanks for your contribution!

Could you remove the previous nb_hits: XXX that is printed at the end if the config file contains a nb_hits field?
The goal is to stay consistent since we don't update the number of hits anymore but still want a display to know how many links were scraped.
Of course, the config file still has to accept the nb_hits fields to not be breaking.

@renehernandez
Copy link
Contributor Author

renehernandez commented Jul 10, 2020

Could you remove the previous nb_hits: XXX that is printed at the end if the config file contains a nb_hits field?

That is already gone as part of the removal of the NbHitsUpdater class, which in master branch is being printed at

print('previous nb_hits: {}\n'.format(self.previous_nb_hits))

@renehernandez
Copy link
Contributor Author

Of course, the config file still has to accept the nb_hits fields to not be breaking.

I did another test run, this time with the config.json having a nb_hits field and it worked as expected.

@renehernandez renehernandez requested a review from curquiza July 10, 2020 12:36
Copy link
Member

@curquiza curquiza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @renehernandez! It seems perfect from my POV! 😁
I like this kind of PR where we only remove lines!

For your next contribution, could you follow the contribution guidelines we recently added to this repo? 🙂
Indeed, we recommend creating a new branch on your fork repo instead of pointing your master to the master of the main repo. It's more convenient if I want to test and check your changes, and more convenient for you if you want to work on different issues at the same time or if we finally don't merge your PR 😉

Thanks again for your contributions, it's really helpful!!
If you like contributing in python project, we maintain a meilisearch-python repository. In general, you can find all the repos we maintain here.

I don't remember if I suggest you join us to our Slack, but here is the link. It can be useful if you need direct chatting.

Merging that!

@curquiza curquiza merged commit 025f8ed into meilisearch:master Jul 10, 2020
@renehernandez
Copy link
Contributor Author

@curquiza Will take it into account for future contributions.

I started here, because it is the project that I have been interacting the most with in the meilisearch ecosystem. In my day-to-day job I work more with golang and ruby, plus k8s infrastructure, so I am also interested/available to lend a hand in those areas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove nb_hits update in config file
2 participants