-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move metadata cache one layer up #6153
Move metadata cache one layer up #6153
Conversation
994f50b
to
199c7b5
Compare
Codecov Report
@@ Coverage Diff @@
## master #6153 +/- ##
==========================================
+ Coverage 76.92% 76.95% +0.03%
==========================================
Files 47 47
Lines 9388 9375 -13
==========================================
- Hits 7222 7215 -7
+ Misses 2166 2160 -6
Continue to review full report at Codecov.
|
@betodealmeida @mistercrunch Please take a look |
fe747fb
to
214eaa1
Compare
1fd8623
to
06b3a25
Compare
superset/models/core.py
Outdated
""" | ||
if not self.allow_multi_schema_metadata_fetch: | ||
return [] | ||
tables_dict = self.db_engine_spec.fetch_result_sets(self, 'table') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result from the method fetch_results_sets
is super confusing... took me a while to understand line 879.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was feeling confused as well. let me see whether I can refactor it as well.
superset/cli.py
Outdated
print('Fetching {} datasources ...'.format(database.name)) | ||
try: | ||
database.all_table_names_in_database( | ||
db_id=database.id, force=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to pass database.id
here, if this is a method of database
? Looking at other places where this method is called, looks like db_id
is always the same as database.id
, no?
superset/models/core.py
Outdated
cache_timeout=None, force=False): | ||
""" | ||
Parameters need to be passed as keyword arguments. | ||
If cache=True, db_id must be passed in for setting cache key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you want the database id to be part of the key. In that case, I think it's better to create a new decorator reusing cache_util.memoized_func
, that extracts the id
from self
. Here's a simple example showing the technique:
def instance_memo(attribute, cache={}):
def decorator(func):
def decorated(self, *args, **kwargs):
# in this example, key is just `Database.id`
key = getattr(self, attribute)
if key not in cache:
print('Writing to cache')
cache[key] = func(self, *args, **kwargs)
else:
print('Reading from cache')
return cache[key]
return decorated
return decorator
class Database:
def __init__(self, id):
self.id = id
@instance_memo('id')
def method(self):
pass
Calling it we get:
>>> db1 = Database(1)
>>> db1.method()
Writing to cache
>>> db1.method()
Reading from cache
>>> db1.method()
Reading from cache
>>> db2 = Database(2)
>>> db2.method()
Writing to cache
>>> db2.method()
Reading from cache
>>> db2.method()
Reading from cache
superset/models/core.py
Outdated
inspector=self.inspector, schema=schema) | ||
except Exception as e: | ||
logging.exception(e) | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need pass
here, you only need it in empty blocks (eg, if line 927 didn't have the logging).
except Exception: | ||
inspector=self.inspector, schema=schema) | ||
except Exception as e: | ||
logging.exception(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, you can remove pass
below.
@betodealmeida Comments addressed. Please take another look. |
Awesome! Thanks, @youngyjd! |
CACHE_CONFIG = {'CACHE_TYPE': 'null'} | ||
TABLE_NAMES_CACHE_CONFIG = {'CACHE_TYPE': 'null'} | ||
CACHE_CONFIG = {'CACHE_TYPE': 'simple'} | ||
TABLE_NAMES_CACHE_CONFIG = {'CACHE_TYPE': 'simple'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you changed this for testing — can you revert the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! My bad
* Update wording * nit update for api endpoint url * move metadata cache one layer up * refactor cache * fix flake8 and DatabaseTablesAsync * nit * remove logging for cache * only fetch for all tables that allows cross schema fetch * default allow_multi_schema_metadata_fetch to False * address comments * remove unused defaultdict * flake 8 (cherry picked from commit c552c12)
* Update wording * nit update for api endpoint url * move metadata cache one layer up * refactor cache * fix flake8 and DatabaseTablesAsync * nit * remove logging for cache * only fetch for all tables that allows cross schema fetch * default allow_multi_schema_metadata_fetch to False * address comments * remove unused defaultdict * flake 8 (cherry picked from commit c552c12)
* Update wording * nit update for api endpoint url * move metadata cache one layer up * refactor cache * fix flake8 and DatabaseTablesAsync * nit * remove logging for cache * only fetch for all tables that allows cross schema fetch * default allow_multi_schema_metadata_fetch to False * address comments * remove unused defaultdict * flake 8
pyhive
is callinguse default
when a connection is initiated, https://github.com/dropbox/PyHive/blob/2c2446bf905ea321aac9dcdd3fa033909ff0b0b5/pyhive/hive.py#L205. This statement took 3 seconds in my experiment, so even metadata cache is enabled for Hive, user experience is still not very good.Moving the cache one layer up can avoid running
use default
statement.