-
-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to mitigate current Timeout issues in searches #1755
Comments
The flip side of this is that identification is unpredictable. (200 woodfrogs) and (500 woodfrogs) are not discoverable by any one ID search, where http://arctos.database.museum/SpecimenResults.cfm?taxon_name=Rana%20sylvatica finds them all - when it doesn't time out. Locality data are very similar (perhaps even less predictable) - the "any geog" term matches anything (including data from external sources) but is slow, specific fields are unpredictable (less so for Curators, possibly) but fast. Something I was looking at this AM looks like this... so isn't very discoverable by locality data. |
We moved the 'anytaxon' field to the top so the general public would be more likely to find what they were looking for - many would type in a higher taxon like a family name (eg "Silphidae") into what was the scientific name/identification field and be surprised that they were only finding specimens that were identified to family and no members of that family (id'd to species for example). Solution: have a different looking search form for logged-in users than the search form that the public gets when they aren't logged in? |
I think the user that spawned this was "us" and I recently dealt with something similar (but completely different!) involving a "not-us" grad student who was looking for a genus in "any taxon." Not to hijack this Issue TOO much, but the obvious solution is to throw hardware at this. I've also been dealing with various things, both internal (eg, part display, media access) and external (our "volunteers" at VertNet are restless) that could greatly benefit from an Oracle upgrade. Is there some way we can swing a major infrastructure upgrade? |
Yes, this was supposed to make public searches easier, but not if it times
out. I had a complaint from a local researcher about this last week: "any
taxon" = mammalia, "any geog" = california" timed out.
Is this something we need more disk space for at TACC?
Just saw Dusty's reply . . .
How much? Grant proposal with TACC?
…On Tue, Oct 23, 2018 at 1:59 PM DerekSikes ***@***.***> wrote:
We moved the 'anytaxon' field to the top so the general public would be
more likely to find what they were looking for - many would type in a
higher taxon like a family name (eg "Silphidae") into what was the
scientific name/identification field and be surprised that they were only
finding specimens that were identified to family and no *members* of that
family (id'd to species for example).
Solution: have a different looking search form for logged-in users than
the search form that the public gets when they aren't logged in?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1755 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AOH0hGCrizy8OiCjqjeJ7P7nmUvHwftdks5un3UQgaJpZM4X2e0F>
.
|
What's the hardware needed and let's get some estimates! Should we get Chris/TACC on this discussion? Otherwise if we leave Any Taxon as the default, are there tweaks to what it is searching on to avoid the timeout error. Because that is the issue and not doing anything isnt an option |
Space isn't the problem, processors are. The answer to "how much?" is always "more!" CA + Mammalia is the intersection of two ridiculously large caches. There are some organizational things we could do to make that faster (stripe both the data and the indexes across multiple independent disks that can be read simultaneously, for example), but it's mostly a matter or raw processors - we ultimately still have to sort through a ~half-billion rows to come up with an answer. There's also a bit of a positive feedback loop here. With more processors I could spend less time dealing with performance issues and more doing cool stuff with data, which would all take more resources. I'm fairly confident that I can find a use for just about any number of processors! |
So is this a proposal we can write with TACC?
…On Tue, Oct 23, 2018 at 2:56 PM dustymc ***@***.***> wrote:
Space isn't the problem, processors are.
The answer to "how much?" is always "more!" CA + Mammalia is the
intersection of two ridiculously large caches. There are some
organizational things we could do to make that faster (stripe both the data
and the indexes across multiple independent disks that can be read
simultaneously, for example), but it's mostly a matter or raw processors -
we ultimately still have to sort through a ~half-billion rows to come up
with an answer.
There's also a bit of a positive feedback loop here. With more processors
I could spend less time dealing with performance issues and more doing cool
stuff with data, which would all take more resources. I'm fairly confident
that I can find a use for just about any number of processors!
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1755 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AOH0hBHHd5co9SDyC8rcVwsBsIg6Ytwyks5un4KPgaJpZM4X2e0F>
.
|
Possibly even to TACC?? I don't know how any of that works. |
How many processors is Arctos limited to now?
What's a lot more? give me a number that makes you giddy. I need some
numbers to crunch....
…On Tue, Oct 23, 2018 at 2:23 PM dustymc ***@***.***> wrote:
with TACC?
Possibly even to TACC?? I don't know how any of that works.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1755 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACZ_0eZBCqFu-a6Nw7RfstlseLIEcNscks5un4i0gaJpZM4X2e0F>
.
|
I have no idea how to answer that. The box I can see is running 8 E5-2660s, but I don't know if that's what's actually available to Oracle or not - I think there's a bunch of VM magic somewhere in the stack. I suppose "giddy" would be something like Amazon's auto scaling, where it can pull from some ridiculous number of processors as needed. TACC is probably in the best position to provide an answer to what's possible/realistic. |
Per email 18 Dec 2018: Simple (temp?) solution: Move the "Search Any..." fields for geography and identification to the bottom of the set of fields in that block, and add a warning message "May time out - try a more specific search in the above fields" |
There is significant functionality in any_geog (and any_taxon) that is not accessible in any other way. I'm hesitant to hide them, even if they do cause a flurry of "Arctos is broken!!" emails 3 or 4 times per year. Less than 10% of registered users (and 0% of public users) have the "Locality" pane expanded by default. I think moving it and removing it are about equivalent for "not-us," and "we" (who created these data) are the only ones who have much of a chance of finding anything without the 'anything' option. @mkoo proposed replacing any_taxon with scientificname (eg, current identification). What is the proposed replacement for any_geog? |
I use the any_geography field ALL the time when searching for ethnological and historical items as we rarely have our collections geolocated (getting more each day) - this is the main way I cast a wide net. If I get too many, then I move to Specific Locality. Our users would certainly have trouble finding the right way to look up Alaska village names if this was dropped in the hierarchy. |
I am getting timeouts all too often, just now I can't even get a result for
all Alaska north slope Vulpes using a bounding box on the map. This should
be a couple hundred specimens. I eliminated some columns, chose tissues
only, etc. I can get a red fox result but not arctic fox. I am having to go
to vert net to find my specimens, not good.
…On Tue, Dec 18, 2018 at 8:18 AM dustymc ***@***.***> wrote:
There is significant functionality in any_geog (and any_taxon) that is not
accessible in any other way. I'm hesitant to hide them, even if they do
cause a flurry of "Arctos is broken!!" emails 3 or 4 times per year.
Less than 10% of registered users (and 0% of public users) have the
"Locality" pane expanded by default. I think moving it and removing it are
about equivalent for "not-us," and "we" (who created these data) are the
only ones who have much of a chance of finding anything without the
'anything' option.
@mkoo <https://github.com/mkoo> proposed replacing any_taxon with
scientificname (eg, current identification).
What is the proposed replacement for any_geog?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1755 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AIDLepaq5e__E72yyjP7CF9PXtilq3_uks5u6SNRgaJpZM4X2e0F>
.
--
Aren Gunderson
Mammal Collection Manager
University of Alaska Museum of the North <http://www.uaf.edu/museum>
1962 Yukon Drive
Fairbanks, AK 99775
amgunderson@alaska.edu
907-474-6947
|
@amgunderson I dug one of your queries (Vulpes and coordinates) out of the logs. As I found it, execution time was 00:01:21.31 which will time the UI out. With last_part_location removed it runs in 00:00:01.12 which should easily complete. I don't think we can cache part detail due to datatype limitations, which are currently 4KB. Supposedly Oracle 12 ups that to 32KB, which probably is enough for a cache. 12 also allegedly "speaks" JSON, so we could potentially do more with the cache, and it's probably cheaper to build it. This seems to be one of a growing list of things that an upgrade would make easy to solve, even if it doesn't directly solve it by just performing better. |
While we figure this out, maybe we should think about changing the explanation for the field, or adding suggestions? |
I tried some stuff from the recent timeout logs, there seem to be two major categories. Many of the queries complete well under the timeout limit - the performance of the DB is not the only factor here, it probably has to do with other queries/activity. This seems like something an Oracle performance consultant could fix - eg, it likely depends on things like the physical arrangement of disks. There are a several timeouts caused by lots of attributes and similarly-arranged data (#1755 (comment)). We could reduce the complexity of the data public users view (#857), reduce the number of attributes (#1623), cache attributes individually in FLAT (would be a fairly significant maintenance load increase, I'm not 100% sure it'll fit anyway), or possibly exploit Oracle 12's increased VARCHAR size and JSON capabilities (if we ever get migrated to 12!) to cache all attributes as a data object. @ewommack there's some discussion of the timeout error message (and other relevant stuff) at #1564. A fair number of these queries do seem to be "us" and "we," very much unlike public users, should know how we've organized the data and therefore how to find them without using the "any" options. |
I agree with @ewommack <https://github.com/ewommack> that having an
explanation in the interface would be a big help. I am especially concerned
because public users who don't know to click on see more options get
timeouts and give up, and then have a bad impression of our interface. But
even collection managers, especially from incoming collections, can use
some pointers with a helpful message. Another thing that can be done is to
go into a previous search and reduce the number of fields returned in
add/remove data fields. But that is a very obscure solution and it would
help to explain that as an option.
I talked with Jim Stuart, and suggested this may be related to their state
firewall. He is going to do the same search at home to see if he gets the
same timeout. He did successfully search on a smaller subset.
Dusty, which of the things on your list are actionable? Can we get a list
of specific things the AWG and steering committee can work towards? What do
we need to do to migrate to Oracle 12?
…On Wed, Feb 27, 2019 at 11:01 AM dustymc ***@***.***> wrote:
I tried some stuff from the recent timeout logs, there seem to be two
major categories.
Many of the queries complete well under the timeout limit - the
performance of the DB is not the only factor here, it probably has to do
with other queries/activity. This seems like something an Oracle
performance consultant could fix - eg, it likely depends on things like the
physical arrangement of disks.
There are a several timeouts caused by lots of attributes and
similarly-arranged data (#1755 (comment)
<#1755 (comment)>).
We could reduce the complexity of the data public users view (#857
<#857>), reduce the number of
attributes (#1623 <#1623>),
cache attributes individually in FLAT (would be a fairly significant
maintenance load increase, I'm not 100% sure it'll fit anyway), or possibly
exploit Oracle 12's increased VARCHAR size and JSON capabilities (if we
ever get migrated to 12!) to cache all attributes as a data object.
@ewommack <https://github.com/ewommack> there's some discussion of the
timeout error message (and other relevant stuff) at #1564
<#1564>. A fair number of these
queries do seem to be "us" and "we," very much unlike public users, should
know how we've organized the data and therefore how to find them without
using the "any" options.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1755 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AOH0hLG5eBV0gsKLZN8Q56jTRnBbj-FDks5vRsf0gaJpZM4X2e0F>
.
|
I completely agree, but that's only half the story: we have to balance that against "public users find half of our woodfrogs (none of our Silphidae, etc., etc., etc.) and give up, and then have a bad impression of our interface."
See #1564 - I don't think it's particularly obscure, but I'm definitely up for ideas re: how to make it more obvious. I can do that at any time. Maybe something like Emily's awesome tutorial (http://handbook.arctosdb.org/how_to/How-to-Create-Agents.html#tutorial-video) would be useful here.
TACC is working on migrating TEST to 12. I won't completely know how we might exploit those capabilities until I have a working copy.
We could restrict certain "results fields" to Operators - it would take some development, but I don't think it would be overly difficult. I dislike this - making the same data available to everyone should be a priority. Perhaps we could somewhat mitigate that by just providing all attributes as JSON instead of individual "columns" but that will fail for some records. (Oracle 12 will fix much, maybe even all, of that as well.) We can discuss the number of attributes=="result columns" - #1623. We could cache individual attributes in columns, but I'd prefer to play with Oracle12 first. We could pay for an Oracle consultant at any time. We could discuss general performance with TACC - eg, maybe the queries that time out in the UI and work for me are related to network/webserver/sunspots and not Oracle as all. (Seems a bit unlikely, but still worth investigating.) Sometimes specific queries can be tuned - I'm happy to take a look if you want to pass any known problems on. |
Expensive columns on for >200 users are:
That is not necessarily most traffic - users with rarely-used columns may perform most queries, public users may turn things on, etc - but I think it's reasonably close. Adding only those to FLAT might offload enough processing to made a difference, and shouldn't be too much work/maintenance. |
Dusty, can you explain a bit as to what you mean by offloading to FLAT? Is
FLAT indexed data? I'm trying to understand all this.
…On Wed, Feb 27, 2019 at 3:14 PM dustymc ***@***.***> wrote:
Expensive columns on for >200 users are:
- weight
- age
- age class
- total length
- tail length
- reproductive data
- hind foot with claw
That is not necessarily most traffic - users with rarely-used columns may
perform most queries, public users may turn things on, etc - but I think
it's reasonably close.
Adding only those to FLAT might offload enough processing to made a
difference, and shouldn't be too much work/maintenance.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1755 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AOH0hNky8ghM5ElTsJRCfUDyMk8Cam5lks5vRwNUgaJpZM4X2e0F>
.
|
Yes, essentially that. Attributes can't be smooshed into spreadsheets very readily (there might be 20 sex determinations) so they're pulled into results by calling a procedure which smooshes them into something that'll fit in a table cell. That's fairly easy to write code to and there's not much complexity involved, but the procedure is expensive and runs for every attribute in every row of every search result (that includes an attribute). FLAT caches those data as text when they're created/changed. The computational costs are essentially pre-paid. The remaining costs are (mostly) just in maintenance and storage - the cache uses ~3 times as much disk (which I don't think is much of a problem), and there's a bunch of code to maintain. Adding a couple hundred attributes that change all the time to the cache just isn't something that I think we can afford. Adding the half-dozen that are used the most and probably won't ever change seems more realistic. It looks to me like most of the problem queries are "us." Geography-cache queries usually work, taxonomy-cache queries usually work, combinations of them sorta-usually work, any of that combined with a bunch of curatorial-type attributes fails. State (or county or whatever) and/or identification and a bunch of curatorial-type attributes usually works. Can we come up with a "welcome to Arctos, don't use the 'any' things" tutorial to send out with operator invitations or something?? |
I just dug through a bunch of timeouts from the last ~48 hours. Data from the logs and some comments below. The attribute thing does not seem to be a major factor in this. There may be other reasons to be more aggressive in caching it, but that won't fix most of these. Some (maybe most) of these seem perfectly appropriate to me - the user did exactly what I'd have done (eg, because the taxonomy is waffly or it's a "minor" geography term that might be anywhere). Others would have been better able to answer the question (or what I guess the question was) using "normal" terms. any geog "colorado"
returns 415 rows from places like... North America, United States, Wyoming, Sweetwater County, Colorado River for example - "stuff about which someone at some time used the string 'colorado'" - which is probably not what the user was looking for. We definitely need better documentation, and I'll play with more specific suggestions. ("that timed out - click here to try state_prov=colorado instead"). No clue how realistic that is yet. Almost all of these completed in well under a minute. SOMETHING other than direct DB performance is a factor here. I'll ask TACC to check the webserver and such, but I suspect it's other things simultaneously going on in the DB and the solution comes back to an Oracle tuneup. (wild baseless guess: we need a disk dedicated to the temp tables these queries generate) rawipaddress: 129.108.202.164,129.114.52.171 source: UT-EL-PASO User: public SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'GREAT BLUE HERON%' ) Execution Time: Elapsed: 00:00:02.08 Suggestion: none, if the query was for "things that have been called great blue heron" Problem: other stuff was busy?? rawipaddress: 50.236.76.57,129.114.52.171 Comcast username: (academic non-operator) SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'total length') total_length,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'TAMIAS MINIMUS%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('DMNS:MAMM') ) AND cache_anygeog.geostring like '%COLORADO%' Execution Time: Elapsed: 00:00:44.29 Suggestion: use state_prov (if that's the Colorado in question) and identified as (if that's the scope) Problem: 2 expensive terms + other stuff was busy?? rawipaddress: 164.64.199.11,129.114.52.171 NM-STATE-GOVERNMENT SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND cache_anygeog.geostring like '%VALENCIA%' Elapsed: 00:00:10.23 Suggestion: none, it's hard to say what "Valencia" is or where we might have entered it problem: other stuff was busy?? rawipaddress: 136.152.143.22,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MULTIFASCIATA%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:HERP') ) AND cache_anygeog.geostring like '%CHINA%' Elapsed: 00:00:30.68 suggestion:
rawipaddress: 129.108.81.21,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('UTEP:HERB') ) AND cache_anygeog.geostring like '%IMPERIAL COUNTY%' Elapsed: 00:00:07.61 suggestion: use county rawipaddress: 187.189.107.207,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'APHELOCOMA WOODHOUSEII%' ) Elapsed: 00:00:02.16 suggestion: MAYBE use identifiedas, but this is probably what I'd have done rawipaddress: 201.108.154.176,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'PUMA%' ) AND cache_anygeog.geostring like '%MEXICO %' Elapsed: 00:00:16.77 suggestion:
rawipaddress: 164.64.199.11,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MEPHITIS MACROURA%' ) AND cache_anygeog.geostring like '%NEW MEXICO%'; Elapsed: 00:00:11.54 suggestion: use identifiedas and state rawipaddress: 164.64.199.11,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'NYCTINOMOPS%' ) AND cache_anygeog.geostring like '%NEW MEXICO%' Elapsed: 00:00:12.10 suggestion: use identifiedas and state rawipaddress: 129.72.142.82,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'FALCO%' ) AND cache_anygeog.geostring like '%WYOMING%' Elapsed: 00:00:16.91 suggestion: use identifiedas and state rawipaddress: 192.12.184.6,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.county county,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,getMonthCollected(filtered_flat.began_date,filtered_flat.ended_date) month_collected,filtered_flat.year year_collected,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'sex') sex,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.cataloged_item_type='specimen' AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('ALMNH:ES','ASNHC:MAMM','ASNHC:HERB','STAR:ALG','ASUMZ:HERP','ASUMZ:INVERT','ASUMZ:BIRD','ASUMZ:BIVALVE','ASUMZ:FISH','STAR:HERB','ASUMZ:ENTO','ASUMZ:MAMM','APSU:HERP','APSU:BIRD','APSU:FISH','APSU:MAMM','UWBM:HERP','UWBM:MAMM','CHAS:HERP','CHAS:EGG','CHAS:BIRD','CHAS:ES','CHAS:EH','CHAS:FISH','CHAS:ENTO','CHAS:MAMM','CHAS:INV','CHAS:HERB','CHAS:TEACH','COA:HERP','COA:EGG','COA:BIRD','COA:ENTO','COA:MAMM','DMNS:HERP','DMNS:EGG','DMNS:BIRD','DMNS:MAMM','DMNS:INV','DMNS:PARA','HWML:PARA','KNWROBS:FISH','KNWR:ENTO','KNWR:INV','KNWROBS:MAMM','KNWROBS:HERB','KNWR:HERB','KWP:ENTO','UCSC:HERP','UCSC:BIRD','UCSC:FISH','UCSC:MAMM','MLZ:EGG','MLZ:BIRD','MLZ:MAMM','MSB:HERP','MSB:BIRD','MSB:FISH','MSB:HOST','MSBOBS:MAMM','MSB:MAMM','MSB:PARA','MVZOBS:HERP','MVZ:HERP','MVZ:HILD','MVZ:EGG','MVZOBS:BIRD','MVZ:BIRD','MVZOBS:MAMM','MVZ:MAMM','UMNH:HERP','UMNH:BIRD','UMNH:MAMM','NMU:HERP','NMU:BIRD','NMU:MAMM','NMU:PARA','OWU:AMPH','OWU:BIRD','OWU:ENTO','OWU:EH','OWU:FISH','OWU:GEOL','OWU:HERB','OWU:INV','OWU:MAMM','OWU:ES','OWU:PARA','OWU:REPT','NBSB:BIRD','USNPC:PARA','UAM:ALG','UAM:HERP','UAM:ARC','UAMOBS:BIRD','UAM:BIRD','UAMB:HERB','UAM:ES','UAM:ENV','UAM:EH','UAMOBS:FISH','UAM:FISH','UAMOBS:ENTO','UAM:ENTO','UAM:INV','UAMOBS:MAMM','UAM:MAMM','UAM:HERB','UC-SCFS:BIRD','UC-SCFS:MAMM','UCM:HERP','UCM:EGG','UCM:BIRD','UCM:FISH','UCM:MAMM','UCM:OBS','UMZM:BIRD','UMZM:MAMM','UNR:HERP','UNR:EGG','UNR:BIRD','UNR:FISH','UNR:MAMM','UNM:ES','UTEPOBS:HERP','UTEP:HERPOS','UTEP:HERP','UTEP:ARC','UTEP:BIRD','UTEP:ES','UTEP:FISH','UTEPOBS:ENTO','UTEP:ENTO','UTEP:INV','UTEP:MAMM','UTEP:HERB','UTEP:TEACH','UTEP:ZOO','UWYMV:HERP','UWYMV:BIRD','UWYMV:FISH','UWYMV:MAMM','WNMU:BIRD','WNMU:FISH','WNMU:MAMM') ) AND upper(filtered_flat.phylorder) like 'RODENTIA%' AND UPPER(filtered_flat.state_prov) LIKE '%NEW MEXICO%' 89159 rows selected. Elapsed: 00:01:04.39 suggestion: use class and state rawipaddress: 98.238.128.19,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MUS%' ) DLM (successfully!) trying to break it rawipaddress: 136.152.143.41,129.114.52.171 SELECT distinct flat.collection_object_id,flat.guid guid,flat.scientific_name scientific_name,flat.country country,flat.state_prov state_prov,flat.spec_locality spec_locality,flat.verbatim_date verbatim_date,flat.dec_lat dec_lat,flat.dec_long dec_long,flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM flat WHERE flat.collection_object_id IS NOT NULL and flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' ) AND flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:BIRD') ) AND upper(PARTS) LIKE '%SKIN%' no rows selected Elapsed: 00:01:06.63 suggestion: use identifiedas rawipaddress: 131.216.46.100,129.114.52.171 SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN fake_coordinate_error ON (filtered_flat.locality_id = fake_coordinate_error.locality_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MAMMAL%' ) AND ( ( 36.23007635975191 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR 34.953528210949415 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR fake_coordinate_error.swlat between 34.953528210949415 and 36.23007635975191 OR fake_coordinate_error.nelat between 34.953528210949415 and 36.23007635975191 ) AND ( -115.437744140625 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR -114.49374389648438 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR fake_coordinate_error.swlong between -115.437744140625 and -114.49374389648438 OR fake_coordinate_error.nelong between -115.437744140625 and -114.49374389648438 ) ) 1186 rows selected. Elapsed: 00:00:04.02 suggestion: use class |
Thanks Dusty!
I can see my time out where I was playing around in the system yesterday after the conversation started. I was trying out different combos of options to see what might cause it to time out.
cheers,
Beth
On Thu, Feb 28, 2019 at 1:17 PM dustymc <notifications@github.com<mailto:notifications@github.com>> wrote:
◆ This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources.
I just dug through a bunch of timeouts from the last ~48 hours. Data from the logs and some comments below.
The attribute thing does not seem to be a major factor in this. There may be other reasons to be more aggressive in caching it, but that won't fix most of these.
Some (maybe most) of these seem perfectly appropriate to me - the user did exactly what I'd have done (eg, because the taxonomy is waffly or it's a "minor" geography term that might be anywhere).
Others would have been better able to answer the question (or what I guess the question was) using "normal" terms. any geog "colorado"
select higher_geog from geog_auth_rec where GEOG_AUTH_REC_ID in (select GEOG_AUTH_REC_ID from cache_anygeog where GEOSTRING like '%COLORADO%') group by higher_geog order by higher_geog;
returns 415 rows from places like...
North America, United States, Wyoming, Sweetwater County, Colorado River
North America, United States, Wyoming, Uinta County, Colorado River
South America
South America, Argentina
South America, Argentina, Cordoba
South America, Argentina, Formosa
South America, Argentina, Neuquen
South America, Argentina, Rio Negro
South America, Argentina, Salta
South America, Argentina, Salta, Cafayate Department
South America, Bolivia, Potosi
South America, Brazil, Distrito Federal
South America, Brazil, Rondonia
South America, Chile, Araucania
South America, Colombia, Boyaca
South America, Ecuador
South America, Ecuador, Pichincha
South America, Ecuador, Santo Domingo
West Indies, Netherlands, Netherlands Antilles, Aruba Island Territory, Lesser Antilles, Aruba
West Indies, Trinidad and Tobago, Saint Patrick Parish, Lesser Antilles, Trinidad
West Indies, Trinidad and Tobago, Victoria Parish, Lesser Antilles, Trinidad
no higher geography recorded
for example - "stuff about which someone at some time used the string 'colorado'" - which is probably not what the user was looking for. We definitely need better documentation, and I'll play with more specific suggestions. ("that timed out - click here to try state_prov=colorado instead"). No clue how realistic that is yet.
Almost all of these completed in well under a minute. SOMETHING other than direct DB performance is a factor here. I'll ask TACC to check the webserver and such, but I suspect it's other things simultaneously going on in the DB and the solution comes back to an Oracle tuneup. (wild baseless guess: we need a disk dedicated to the temp tables these queries generate)
…________________________________
rawipaddress: 129.108.202.164,129.114.52.171
source: UT-EL-PASO
User: public
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'GREAT BLUE HERON%' )
Execution Time: Elapsed: 00:00:02.08
Suggestion: none, if the query was for "things that have been called great blue heron"
Problem: other stuff was busy??
________________________________
rawipaddress: 50.236.76.57,129.114.52.171
Comcast
username: (academic non-operator)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'total length') total_length,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'TAMIAS MINIMUS%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('DMNS:MAMM') ) AND cache_anygeog.geostring like '%COLORADO%'
Execution Time: Elapsed: 00:00:44.29
Suggestion: use state_prov (if that's the Colorado in question) and identified as (if that's the scope)
Problem: 2 expensive terms + other stuff was busy??
________________________________
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND cache_anygeog.geostring like '%VALENCIA%'
;
Elapsed: 00:00:10.23
Suggestion: none, it's hard to say what "Valencia" is or where we might have entered it
problem: other stuff was busy??
________________________________
rawipaddress: 136.152.143.22,129.114.52.171
University of California at Berkeley (UCAB-1)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MULTIFASCIATA%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:HERP') ) AND cache_anygeog.geostring like '%CHINA%'
Elapsed: 00:00:30.68
suggestion:
1. use country
2. not clear what 'MULTIFASCIATA' refers to - probably needs either a beginning % or more in identified-as?
________________________________
rawipaddress: 129.108.81.21,129.114.52.171
UT-EL-PASO
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('UTEP:HERB') ) AND cache_anygeog.geostring like '%IMPERIAL COUNTY%'
Elapsed: 00:00:07.61
suggestion: use county
________________________________
rawipaddress: 187.189.107.207,129.114.52.171
MX-TPTE-LACNIC (mexico)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'APHELOCOMA WOODHOUSEII%' )
Elapsed: 00:00:02.16
suggestion: MAYBE use identifiedas, but this is probably what I'd have done
________________________________
rawipaddress: 201.108.154.176,129.114.52.171
MX-USCV4-LACNIC (mexico)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'PUMA%' ) AND cache_anygeog.geostring like '%MEXICO %'
Elapsed: 00:00:16.77
suggestion:
1. MAYBE use identifiedas, but this is probably what I'd have done
2. use country
________________________________
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MEPHITIS MACROURA%' ) AND cache_anygeog.geostring like '%NEW MEXICO%';
Elapsed: 00:00:11.54
suggestion: use identifiedas and state
________________________________
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'NYCTINOMOPS%' ) AND cache_anygeog.geostring like '%NEW MEXICO%'
Elapsed: 00:00:12.10
suggestion: use identifiedas and state
________________________________
rawipaddress: 129.72.142.82,129.114.52.171
UWYO
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'FALCO%' ) AND cache_anygeog.geostring like '%WYOMING%'
Elapsed: 00:00:16.91
suggestion: use identifiedas and state
________________________________
rawipaddress: 192.12.184.6,129.114.52.171
Los Alamos National Laboratory (LANL-2)
username: D*********1@gmail.com<mailto:1@gmail.com>
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.county county,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,getMonthCollected(filtered_flat.began_date,filtered_flat.ended_date) month_collected,filtered_flat.year year_collected,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'sex') sex,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.cataloged_item_type='specimen' AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('ALMNH:ES','ASNHC:MAMM','ASNHC:HERB','STAR:ALG','ASUMZ:HERP','ASUMZ:INVERT','ASUMZ:BIRD','ASUMZ:BIVALVE','ASUMZ:FISH','STAR:HERB','ASUMZ:ENTO','ASUMZ:MAMM','APSU:HERP','APSU:BIRD','APSU:FISH','APSU:MAMM','UWBM:HERP','UWBM:MAMM','CHAS:HERP','CHAS:EGG','CHAS:BIRD','CHAS:ES','CHAS:EH','CHAS:FISH','CHAS:ENTO','CHAS:MAMM','CHAS:INV','CHAS:HERB','CHAS:TEACH','COA:HERP','COA:EGG','COA:BIRD','COA:ENTO','COA:MAMM','DMNS:HERP','DMNS:EGG','DMNS:BIRD','DMNS:MAMM','DMNS:INV','DMNS:PARA','HWML:PARA','KNWROBS:FISH','KNWR:ENTO','KNWR:INV','KNWROBS:MAMM','KNWROBS:HERB','KNWR:HERB','KWP:ENTO','UCSC:HERP','UCSC:BIRD','UCSC:FISH','UCSC:MAMM','MLZ:EGG','MLZ:BIRD','MLZ:MAMM','MSB:HERP','MSB:BIRD','MSB:FISH','MSB:HOST','MSBOBS:MAMM','MSB:MAMM','MSB:PARA','MVZOBS:HERP','MVZ:HERP','MVZ:HILD','MVZ:EGG','MVZOBS:BIRD','MVZ:BIRD','MVZOBS:MAMM','MVZ:MAMM','UMNH:HERP','UMNH:BIRD','UMNH:MAMM','NMU:HERP','NMU:BIRD','NMU:MAMM','NMU:PARA','OWU:AMPH','OWU:BIRD','OWU:ENTO','OWU:EH','OWU:FISH','OWU:GEOL','OWU:HERB','OWU:INV','OWU:MAMM','OWU:ES','OWU:PARA','OWU:REPT','NBSB:BIRD','USNPC:PARA','UAM:ALG','UAM:HERP','UAM:ARC','UAMOBS:BIRD','UAM:BIRD','UAMB:HERB','UAM:ES','UAM:ENV','UAM:EH','UAMOBS:FISH','UAM:FISH','UAMOBS:ENTO','UAM:ENTO','UAM:INV','UAMOBS:MAMM','UAM:MAMM','UAM:HERB','UC-SCFS:BIRD','UC-SCFS:MAMM','UCM:HERP','UCM:EGG','UCM:BIRD','UCM:FISH','UCM:MAMM','UCM:OBS','UMZM:BIRD','UMZM:MAMM','UNR:HERP','UNR:EGG','UNR:BIRD','UNR:FISH','UNR:MAMM','UNM:ES','UTEPOBS:HERP','UTEP:HERPOS','UTEP:HERP','UTEP:ARC','UTEP:BIRD','UTEP:ES','UTEP:FISH','UTEPOBS:ENTO','UTEP:ENTO','UTEP:INV','UTEP:MAMM','UTEP:HERB','UTEP:TEACH','UTEP:ZOO','UWYMV:HERP','UWYMV:BIRD','UWYMV:FISH','UWYMV:MAMM','WNMU:BIRD','WNMU:FISH','WNMU:MAMM') ) AND upper(filtered_flat.phylorder) like 'RODENTIA%' AND UPPER(filtered_flat.state_prov) LIKE '%NEW MEXICO%'
89159 rows selected.
Elapsed: 00:01:04.39
suggestion: use class and state
NOTE: has attributes
________________________________
rawipaddress: 98.238.128.19,129.114.52.171
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MUS%' )
DLM (successfully!) trying to break it
________________________________
rawipaddress: 136.152.143.41,129.114.52.171
University of California at Berkeley (UCAB-1)
username: operator
SELECT distinct flat.collection_object_id,flat.guid guid,flat.scientific_name scientific_name,flat.country country,flat.state_prov state_prov,flat.spec_locality spec_locality,flat.verbatim_date verbatim_date,flat.dec_lat dec_lat,flat.dec_long dec_long,flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM flat WHERE flat.collection_object_id IS NOT NULL and flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' ) AND flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:BIRD') ) AND upper(PARTS) LIKE '%SKIN%'
no rows selected
Elapsed: 00:01:06.63
suggestion: use identifiedas
________________________________
rawipaddress: 131.216.46.100,129.114.52.171
Nevada System of Higher Education (NEVAD-9)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN fake_coordinate_error ON (filtered_flat.locality_id = fake_coordinate_error.locality_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MAMMAL%' ) AND ( ( 36.23007635975191 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR 34.953528210949415 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR fake_coordinate_error.swlat between 34.953528210949415 and 36.23007635975191 OR fake_coordinate_error.nelat between 34.953528210949415 and 36.23007635975191 ) AND ( -115.437744140625 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR -114.49374389648438 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR fake_coordinate_error.swlong between -115.437744140625 and -114.49374389648438 OR fake_coordinate_error.nelong between -115.437744140625 and -114.49374389648438 ) )
1186 rows selected.
Elapsed: 00:00:04.02
suggestion: use class
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#1755 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ARs_RLDWONj0jT-ehA4-wk3va2pbH8ZTks5vSDemgaJpZM4X2e0F>.
--
Elizabeth Wommack, PhD
Curator and Collections Manager of Vertebrates
University of Wyoming Museum of Vertebrates
Berry Biodiversity Conservation Center
University of Wyoming,
Laramie, WY 82071
ewommack@<mailto:ewommack@berkeley.edu>uwyo.edu<http://uwyo.edu/>
www.uwymv.<http://www.uwymv.edu/>org
|
YAY!!! NM-STATE-GOVERNMENT - was probably looking for Valencia County. Using the county field does it. Not sure how to direct people there other than what we do already? WOW - LANL is searching our stuff? Wonder what they are doing with the data? Interesting! |
I did what I can for getLatestPartLocation - I doubt you'll see a real difference, but it's slightly more efficient than it used to be.... |
Thanks Dusty! I didn't realize I closed this issue by accident. Still feeling pretty new to GitHub. |
As we are going to be presenting on Arctos next month at the SPNHC meeting, this is now becoming an even more critical issue in need of resolution. We absolutely must get rid of the any taxon element and any geographic element search fields, or hide them somewhere very deep and put big red warnings on them. Right now, anyone going to our search page will use those two boxes, and they will get a visually nasty and discouraging red letter message of failure. This is not how we reach the public or serve our own members. I just searched on a single species name and a single locality term for a record I know exists, and got the timeout. I put the same search terms in the identification and specific locality boxes and got results within a second. Replace those boxes with Identification and Locality fields, please. |
And, to clarify, I was searching as a public user. |
Removal is not mitigation; it's just removing a powerful tool capable of piercing the veil of our amazingly inconsistent data. If we must shoot ourselves in the foot, hopefully temporarily until we can find some actual way to supply the resources required by these hugely powerful options, can we limit it to one foot? I'd vote fairly enthusiastically for keeping the taxonomy option.
And from that last point, maybe we can shoot ourselves in that one foot with a slightly smaller gun by implementing a "any curatorially-asserted geography" option, which would somewhat alleviate the issue @AJLinn brought up? That's maybe 10% of the functionality of the current thing, but at least it'll find "fairbanks" spread across specloc/verbloc/quad/county without searching 4 times. |
This morning I searched on Falco sparverius from Roswell, using those
fields, and got a timeout. I just tried again with the taxon only and it
works.
So perhaps if we prevent people from trying to use both at the same time,
by getting rid of the geography search or burying it?
Maybe we need a "search locality" that can pull only from specific or
verbatim locality terms? Not sure if that is any better - probably would
confuse people who think that means "California".
Make Specific Locality more visible?
What does MCZ do? Their interface is simple and clear.
…On Thu, Apr 25, 2019 at 8:49 AM dustymc ***@***.***> wrote:
Removal is not mitigation; it's just removing a powerful tool capable of
piercing the veil of our amazingly inconsistent data.
If we must shoot ourselves in the foot, hopefully temporarily until we can
find some actual way to supply the resources required by these hugely
powerful options, can we limit it to one foot? I'd vote fairly
enthusiastically for keeping the taxonomy option.
1. It performs better/times out less than geography. From a recent
timeout, Branta hutchinsii works in about 3 seconds, Colorado takes almost
a minute.
2. It gets more usage than geography. I can't immediately quantify
that, but I think it does...
3. Chances of finding what you're looking for are really good with it,
and really bad without it. There are a finite-ish number of ways of saying
"goose" and Arctos contains most of them. There are an infinite number of
ways of describing places, and the webservices that drive anygeog only find
some of them.
4. Without it, there's little point in proceeding with #2057
<#2057> or #1732
<#1732>, and not much benefit
from all the cleanup/WoRMS/Paleobiodb/Species+/etc. data we've recently
drug in.
And from that last point, maybe we can shoot ourselves in that one foot
with a slightly smaller gun by implementing a "any curatorially-asserted
geography" option, which would somewhat alleviate the issue @AJLinn
<https://github.com/AJLinn> brought up
<#1755 (comment)>?
That's maybe 10% of the functionality of the current thing, but at least
it'll find "fairbanks" spread across specloc/verbloc/quad/county without
searching 4 times.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1755 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADQ7JBDUAR2VPYN457BZIETPSHAHVANCNFSM4F6Z5UCQ>
.
|
I'll throw together a "local place-stuff" search and see what happens.
That still leaves you guessing what we've done with "Fairbanks." (Actual answer: little of this, little of that, ...)
Yes, that's part of where the "any" fields came from. Our internal inconsistencies don't much help.
Use local data, of which they have much less - eg, they have 342,441 "taxonomy rows" and we have 235,653,300. AFAIK they also have a limited locality model - their data are capable of being flattened, ours are not. And I think they overall have much more DBA support; I don't know details, but I suspect if they think they need a tune-up they just get a tune-up.
It's our interface from a few years ago??? |
I tried something fairly unorthodox, and the 'any' options seem to be working and performing reasonably well. Assuming nothing melts as people start making updates, I think this is functional for now. We can talk about reintroducing service-derived terms after we get to 12, assuming we get a functional GMaps API key. Feel free to try to break it, and let me know details if you do. This does nothing for adding attributes-and-such, it's only the 'any' search options. |
So the combined ANY TAXON and ANY GEOGRAPHY search that timed out yesterday works today within a couple seconds, using an exact specific locality and an exact taxon name ("Falco sparverius and Roswell"). Changing the geography parameters to "New Mexico" as higher geog takes much longer but eventually yields the 201 results on public view. Changing to a common name "rabbit" also worked, although I got "rabbitfootgrass". I guess we need to work on our common names. |
Maybe we need to do better with the tooltips? If you KNOW what you're looking for and where it is, the 'any' options are both inefficient and likely to find some stuff you don't care about. Roswell (AFAIK) is just a placename that'll reliably end up in specloc, where Fairbanks is at least 3 things county, quad, specloc). I have no idea how to make that clear in something short enough a user might actually read it. I'm not sure if I can use the current approach to search for "is" or not - it's converting a bunch of stuff to a big concatenation and that to a document which is then indexed, which for some inexplicable reason the Oracle-fairies find easier to navigate than any 'normal' solution. I can look closer if you want.... Right now, the taxonomy search options are limited to 'anything' and 'ranked terms in the collection's preferred classification." There's no reason (other than trying to get users to understand it) that we couldn't add the ability to search WHATEVER - 'family according to GenBank' or 'family according to someone' or 'common name is exactly' or WHATEVER. New Issue...
New Issue as well - short answer is 'yes' but it may be less-trivial than I'd like. |
This issue is continuing to cause problems. Any hope that migration to Postgres and collaboration with TACC will solve the timeouts using Any Geography and Any Taxon? Name: Kirsten Nicholson |
Yes |
Yah I responded to Kirten at length about this today. Even conducting
searches on known locations (country is Canada), all collections, for genus
is Ursus gave her time outs.
So not even using the Any Geography and Any taxa made everything die but
having large searches caused crashing.
Could this be because we have more collections now then previously, or just
more people using Arctos at one time?
Anyway, I agree, remove these 2 boxes, or put a disclaimer next to them!!
…On Mon, Oct 7, 2019 at 4:38 PM Mariel Campbell ***@***.***> wrote:
This issue is continuing to cause problems. Any hope that migration to
Postgres and collaboration with TACC will solve the timeouts using Any
Geography and Any Taxon?
If not, I ask again if we can please delete these search boxes or hide
them from the public.
Name: Kirsten Nicholson
Email: ***@***.***
Message: No matter what search I do, it kicks me out and gives me the
following error message: [Macromedia][Oracle JDBC Driver]Execution timeout
expired. Please help! We need the database for our teaching labs!!
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1755?email_source=notifications&email_token=ABCJF4N3WTIAGIGVVGOQ4QLQNPCBDA5CNFSM4F6Z5UC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEASE2MA#issuecomment-539249968>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABCJF4MLE4JGMWFFPFGD7E3QNPCBDANCNFSM4F6Z5UCQ>
.
--
Carol L. Spencer, Ph.D.
Staff Curator of Herpetology & Researcher
Museum of Vertebrate Zoology
3101 Valley Life Sciences Building
University of California, Berkeley, CA, USA 94720-3160
atrox10@gmail.com or atrox@berkeley.edu
510-643-5778
http://mvz.berkeley.edu/
|
Little of all that, and some other stuff.
My arguments for keeping the 'any...' options are the same as always: we have inconsistent data and they provide the only realistic pathway to it, as frustrating as the timeouts can be. PG+TACC, at least in theory, should fix the performance issues. |
Could we move the 'any' option boxes to the bottom of the Taxonomy and Locality sections and push up the Identification option for Taxonomy, and perhaps push up three options in the collapsed form: State/Province, County, and Specific Locality for Locality? |
The form is pretty easy to manipulate, we can move whatever wherever. Whether we should or not is the bigger question. I see compelling arguments for both cases from here. The vast majority of complaints come from users who also have things like Attributes turned on. Maybe we should instead limit results for non-Operators to only columns in FLAT? |
I don't like limiting search options because I need to be able to get
attributes from collections aI don't have access too. Also, we are trying
to promote Arctos as a source of trait data and a powerful research grade
search tool, which it is not if put limits on public searches. Otherwise,
just go to GBIF.
I strongly support moving the boxes to less visible. We look bad Everytime
people search and the search fails. I keep promoting search tools at
meetings, and then no one can make them work without a personal Arctos
coach.
…On Tue, Oct 8, 2019, 12:53 PM dustymc ***@***.***> wrote:
The form is pretty easy to manipulate, we can move whatever wherever.
Whether we should or not is the bigger question. I see compelling
arguments for both cases from here.
The vast majority of complaints come from users who also have things like
Attributes turned on. Maybe we should instead limit results for
non-Operators to only columns in FLAT?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1755?email_source=notifications&email_token=ADQ7JBCVITHJK3HZPK4KWBDQNTJLPA5CNFSM4F6Z5UC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAVHDFY#issuecomment-539652503>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADQ7JBFPY7DPSMM7WH7V3GDQNTJLPANCNFSM4F6Z5UCQ>
.
|
I agree with Mariel. |
Objections? @DerekSikes ? Does anyone have a way to partially mitigate Derek's concerns? If we do move those out as default, I'll need a replacement. In light of the (internal??) 'Arctos is so complicated' thing I think a single replacement, so we don't go over 8 fields, is necessary. |
Default should be (or at least the top row)
Identification
Include previous IDs? Current ID only Include all IDs Match Type starts
with is (case insensitive) does not contain contains comma-list comma-list
(substring)
and modified from Elizabeth's request
Country, State/Province, County, and Specific Locality for Locality
Why can MCZbase do the Any options but we can't?
…On Wed, Oct 9, 2019 at 8:27 AM dustymc ***@***.***> wrote:
Objections? @DerekSikes <https://github.com/DerekSikes> ? Does anyone
have a way to partially mitigate Derek's concerns?
If we do move those out as default, I'll need a replacement. In light of
the (internal??) 'Arctos is so complicated' thing I think a single
replacement, so we don't go over 8 fields, is necessary.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1755?email_source=notifications&email_token=ADQ7JBFGTQP6KS4AEVQEZ7TQNXS6ZA5CNFSM4F6Z5UC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAYCMDY#issuecomment-540026383>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADQ7JBDZMT6J6LDIJYRTN6DQNXS6ZANCNFSM4F6Z5UCQ>
.
|
Mariel - were you having timeout issues today? So was I. I was trying to search for all records I just loaded in an accession, 1615 records so not that huge. Timed out multiple times before finally working. |
@ccicero I see that in the logs, the query is being slowed down by a lot of attributes and part location and such. If you can trim that up, it'll make things work better. It starts returning data in under a second, but then takes just over a minute to complete. There's also still a fair bit of Chinese traffic today - you might have been competing with that. As far as I know MCZbase has MUCH simpler data - I don't think they consume globalnames or locality webservices. |
Actually, I added a lot of those attributes to my custom results AFTER I got it working because I needed to download them. Will remove in the future, but I don't think that was it. I'll blame it on the Chinese :) |
Thanks - a cheap query failing but then working in a more expensive form later is definitely a sign of things being momentarily overwhelmed. FYI your you+montana+perisoreous query with a ton of attributes on completes in 1:30 in Oracle - the system for which it's tuned, right now when there's basically only one other user (and they're not using much) - or 13 seconds in PG. We do have an actual solution, I just need some more focused time to see it through. |
13 seconds is good! Yay, Postgres!
…On Wed, Nov 20, 2019, 7:55 AM dustymc ***@***.***> wrote:
Thanks - a cheap query failing but then working in a more expensive form
later is definitely a sign of things being momentarily overwhelmed.
FYI your you+montana+perisoreous query with a ton of attributes on
completes in 1:30 in Oracle - the system for which it's tuned, right now
when there's basically only one other user (and they're not using much) -
or 13 seconds in PG. We do have an actual solution, I just need some more
focused time to see it through.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1755?email_source=notifications&email_token=ADQ7JBBL54UGI3YQQEYHKT3QUVFV3A5CNFSM4F6Z5UC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEESHSUI#issuecomment-556038481>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADQ7JBAN7GEYDMXDI37XS7LQUVFV3ANCNFSM4F6Z5UCQ>
.
|
Yes, yay!!! Thanks. |
One order of magnitude is perhaps a more useful way to look at that. Note also and that's an un-tuned single-box development server with the front-end running on a solar-powered calculator (OK not quite but still...). It's faster than our current setup, but more importantly it's much more scalable than our current setup. If we find our way back here again, we can do something about it that doesn't first involve many tens of thousands of dollars in licensing. |
The problem
Conducting searches using Any Taxon or Any Geography is resulting in timeout errors, flaming computers (#1564 ), and general curator crankiness. Majority of users (at MVZ anyway) are simply searching on a taxon and something (like Tissue flag, or county or island etc). They dont expand all the search options and dont get any results since the search times out. (we have already have eliminated browser, network, 3rd app interference as issues).
Describe the solution you'd like
Unlimited resources would help ( #857) but since we dont have that, Id like to suggest some high-priority fix to the interface for the short-term
Alternatives?
Open to suggestions-- anything that will reduce timeout errors. I am not sure what to do with Any Geography at all frankly (and how frequent this is the issue)
Thoughts on this since it would be a change to the default search interface?!
Priority
I would like to have this resolved by date: ASAP
The text was updated successfully, but these errors were encountered: