Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to mitigate current Timeout issues in searches #1755

Closed
mkoo opened this issue Oct 23, 2018 · 67 comments
Closed

How to mitigate current Timeout issues in searches #1755

mkoo opened this issue Oct 23, 2018 · 67 comments
Labels
Display/Interface I don't like the way Arctos looks or it isn't working for me aesthetically. Function-SearchOrDownload Infrastructure-limited Issue which could be resolved, or more easily resolved, with additional computational power

Comments

@mkoo
Copy link
Member

mkoo commented Oct 23, 2018

The problem
Conducting searches using Any Taxon or Any Geography is resulting in timeout errors, flaming computers (#1564 ), and general curator crankiness. Majority of users (at MVZ anyway) are simply searching on a taxon and something (like Tissue flag, or county or island etc). They dont expand all the search options and dont get any results since the search times out. (we have already have eliminated browser, network, 3rd app interference as issues).

Describe the solution you'd like

Unlimited resources would help ( #857) but since we dont have that, Id like to suggest some high-priority fix to the interface for the short-term

  • Take Any Taxon field and move to anywhere below the 'fold', that is, you need to click Show more options to reveal. Replace with the Identification box or Genus, Species (maybe Family?) boxes or simply the DwC field ScientificName box to imply that one should use these fields to conduct majority of fast option searches. True, you may not get everything you want but we can add a caveat message to suggest what a search on Identification As implies.

Alternatives?
Open to suggestions-- anything that will reduce timeout errors. I am not sure what to do with Any Geography at all frankly (and how frequent this is the issue)
Thoughts on this since it would be a change to the default search interface?!

Priority
I would like to have this resolved by date: ASAP

@mkoo mkoo added Priority-Critical (Arctos is broken) Critical because it is breaking functionality. Function-SearchOrDownload Display/Interface I don't like the way Arctos looks or it isn't working for me aesthetically. labels Oct 23, 2018
@dustymc
Copy link
Contributor

dustymc commented Oct 23, 2018

The flip side of this is that identification is unpredictable.

http://arctos.database.museum/SpecimenResults.cfm?scientific_name=Lithobates%20sylvaticus&scientific_name_scope=currentID&scientific_name_match_type=startswith

(200 woodfrogs) and

http://arctos.database.museum/SpecimenResults.cfm?scientific_name=Rana%20sylvatica&scientific_name_scope=currentID&scientific_name_match_type=startswith

(500 woodfrogs) are not discoverable by any one ID search, where

http://arctos.database.museum/SpecimenResults.cfm?taxon_name=Rana%20sylvatica

finds them all - when it doesn't time out.

Locality data are very similar (perhaps even less predictable) - the "any geog" term matches anything (including data from external sources) but is slow, specific fields are unpredictable (less so for Curators, possibly) but fast. Something I was looking at this AM looks like this...

screen shot 2018-10-23 at 12 50 49 pm

so isn't very discoverable by locality data.

@dustymc dustymc added the Infrastructure-limited Issue which could be resolved, or more easily resolved, with additional computational power label Oct 23, 2018
@DerekSikes
Copy link

We moved the 'anytaxon' field to the top so the general public would be more likely to find what they were looking for - many would type in a higher taxon like a family name (eg "Silphidae") into what was the scientific name/identification field and be surprised that they were only finding specimens that were identified to family and no members of that family (id'd to species for example).

Solution: have a different looking search form for logged-in users than the search form that the public gets when they aren't logged in?

@dustymc
Copy link
Contributor

dustymc commented Oct 23, 2018

I think the user that spawned this was "us" and I recently dealt with something similar (but completely different!) involving a "not-us" grad student who was looking for a genus in "any taxon."

Not to hijack this Issue TOO much, but the obvious solution is to throw hardware at this. I've also been dealing with various things, both internal (eg, part display, media access) and external (our "volunteers" at VertNet are restless) that could greatly benefit from an Oracle upgrade. Is there some way we can swing a major infrastructure upgrade?

@campmlc
Copy link

campmlc commented Oct 23, 2018 via email

@mkoo
Copy link
Member Author

mkoo commented Oct 23, 2018

What's the hardware needed and let's get some estimates! Should we get Chris/TACC on this discussion?

Otherwise if we leave Any Taxon as the default, are there tweaks to what it is searching on to avoid the timeout error. Because that is the issue and not doing anything isnt an option

@dustymc
Copy link
Contributor

dustymc commented Oct 23, 2018

Space isn't the problem, processors are.

The answer to "how much?" is always "more!" CA + Mammalia is the intersection of two ridiculously large caches. There are some organizational things we could do to make that faster (stripe both the data and the indexes across multiple independent disks that can be read simultaneously, for example), but it's mostly a matter or raw processors - we ultimately still have to sort through a ~half-billion rows to come up with an answer.

There's also a bit of a positive feedback loop here. With more processors I could spend less time dealing with performance issues and more doing cool stuff with data, which would all take more resources. I'm fairly confident that I can find a use for just about any number of processors!

@campmlc
Copy link

campmlc commented Oct 23, 2018 via email

@dustymc
Copy link
Contributor

dustymc commented Oct 23, 2018

with TACC?

Possibly even to TACC?? I don't know how any of that works.

@mkoo
Copy link
Member Author

mkoo commented Oct 23, 2018 via email

@dustymc
Copy link
Contributor

dustymc commented Oct 23, 2018

I have no idea how to answer that. The box I can see is running 8 E5-2660s, but I don't know if that's what's actually available to Oracle or not - I think there's a bunch of VM magic somewhere in the stack.

I suppose "giddy" would be something like Amazon's auto scaling, where it can pull from some ridiculous number of processors as needed. TACC is probably in the best position to provide an answer to what's possible/realistic.

@ccicero
Copy link

ccicero commented Dec 18, 2018

Per email 18 Dec 2018: Simple (temp?) solution: Move the "Search Any..." fields for geography and identification to the bottom of the set of fields in that block, and add a warning message "May time out - try a more specific search in the above fields"

@dustymc dustymc added this to the Needs Discussion milestone Dec 18, 2018
@dustymc
Copy link
Contributor

dustymc commented Dec 18, 2018

There is significant functionality in any_geog (and any_taxon) that is not accessible in any other way. I'm hesitant to hide them, even if they do cause a flurry of "Arctos is broken!!" emails 3 or 4 times per year.

Less than 10% of registered users (and 0% of public users) have the "Locality" pane expanded by default. I think moving it and removing it are about equivalent for "not-us," and "we" (who created these data) are the only ones who have much of a chance of finding anything without the 'anything' option.

@mkoo proposed replacing any_taxon with scientificname (eg, current identification).

What is the proposed replacement for any_geog?

@AJLinn
Copy link

AJLinn commented Dec 18, 2018

There is significant functionality in any_geog (and any_taxon) that is not accessible in any other way.

I use the any_geography field ALL the time when searching for ethnological and historical items as we rarely have our collections geolocated (getting more each day) - this is the main way I cast a wide net. If I get too many, then I move to Specific Locality. Our users would certainly have trouble finding the right way to look up Alaska village names if this was dropped in the hierarchy.

@amgunderson
Copy link
Contributor

amgunderson commented Dec 18, 2018 via email

@dustymc
Copy link
Contributor

dustymc commented Dec 18, 2018

@amgunderson I dug one of your queries (Vulpes and coordinates) out of the logs.

As I found it, execution time was 00:01:21.31 which will time the UI out.

With last_part_location removed it runs in 00:00:01.12 which should easily complete.

I don't think we can cache part detail due to datatype limitations, which are currently 4KB. Supposedly Oracle 12 ups that to 32KB, which probably is enough for a cache. 12 also allegedly "speaks" JSON, so we could potentially do more with the cache, and it's probably cheaper to build it.

This seems to be one of a growing list of things that an upgrade would make easy to solve, even if it doesn't directly solve it by just performing better.

@ewommack
Copy link

While we figure this out, maybe we should think about changing the explanation for the field, or adding suggestions?
Something about if your query produces and error message, please try adding more specific info to the field. Or if using both the Any Taxon and Any Geographic Feature fields returns an error message, please try using one of the more specific geography fields for your search?

@dustymc
Copy link
Contributor

dustymc commented Feb 27, 2019

I tried some stuff from the recent timeout logs, there seem to be two major categories.

Many of the queries complete well under the timeout limit - the performance of the DB is not the only factor here, it probably has to do with other queries/activity. This seems like something an Oracle performance consultant could fix - eg, it likely depends on things like the physical arrangement of disks.

There are a several timeouts caused by lots of attributes and similarly-arranged data (#1755 (comment)). We could reduce the complexity of the data public users view (#857), reduce the number of attributes (#1623), cache attributes individually in FLAT (would be a fairly significant maintenance load increase, I'm not 100% sure it'll fit anyway), or possibly exploit Oracle 12's increased VARCHAR size and JSON capabilities (if we ever get migrated to 12!) to cache all attributes as a data object.

@ewommack there's some discussion of the timeout error message (and other relevant stuff) at #1564. A fair number of these queries do seem to be "us" and "we," very much unlike public users, should know how we've organized the data and therefore how to find them without using the "any" options.

@campmlc
Copy link

campmlc commented Feb 27, 2019 via email

@dustymc
Copy link
Contributor

dustymc commented Feb 27, 2019

public users who don't know to click on see more options get timeouts and give up, and then have a bad impression of our interface.

I completely agree, but that's only half the story: we have to balance that against "public users find half of our woodfrogs (none of our Silphidae, etc., etc., etc.) and give up, and then have a bad impression of our interface."

a very obscure solution

See #1564 - I don't think it's particularly obscure, but I'm definitely up for ideas re: how to make it more obvious. I can do that at any time.

Maybe something like Emily's awesome tutorial (http://handbook.arctosdb.org/how_to/How-to-Create-Agents.html#tutorial-video) would be useful here.

Oracle 12

TACC is working on migrating TEST to 12. I won't completely know how we might exploit those capabilities until I have a working copy.

actionable

We could restrict certain "results fields" to Operators - it would take some development, but I don't think it would be overly difficult. I dislike this - making the same data available to everyone should be a priority. Perhaps we could somewhat mitigate that by just providing all attributes as JSON instead of individual "columns" but that will fail for some records. (Oracle 12 will fix much, maybe even all, of that as well.)

We can discuss the number of attributes=="result columns" - #1623.

We could cache individual attributes in columns, but I'd prefer to play with Oracle12 first.

We could pay for an Oracle consultant at any time.

We could discuss general performance with TACC - eg, maybe the queries that time out in the UI and work for me are related to network/webserver/sunspots and not Oracle as all. (Seems a bit unlikely, but still worth investigating.)

Sometimes specific queries can be tuned - I'm happy to take a look if you want to pass any known problems on.

@dustymc
Copy link
Contributor

dustymc commented Feb 27, 2019

Expensive columns on for >200 users are:

  • weight
  • age
  • age class
  • total length
  • tail length
  • reproductive data
  • hind foot with claw

That is not necessarily most traffic - users with rarely-used columns may perform most queries, public users may turn things on, etc - but I think it's reasonably close.

Adding only those to FLAT might offload enough processing to made a difference, and shouldn't be too much work/maintenance.

@campmlc
Copy link

campmlc commented Feb 27, 2019 via email

@dustymc
Copy link
Contributor

dustymc commented Feb 27, 2019

Yes, essentially that. Attributes can't be smooshed into spreadsheets very readily (there might be 20 sex determinations) so they're pulled into results by calling a procedure which smooshes them into something that'll fit in a table cell. That's fairly easy to write code to and there's not much complexity involved, but the procedure is expensive and runs for every attribute in every row of every search result (that includes an attribute).

FLAT caches those data as text when they're created/changed. The computational costs are essentially pre-paid. The remaining costs are (mostly) just in maintenance and storage - the cache uses ~3 times as much disk (which I don't think is much of a problem), and there's a bunch of code to maintain. Adding a couple hundred attributes that change all the time to the cache just isn't something that I think we can afford. Adding the half-dozen that are used the most and probably won't ever change seems more realistic.

It looks to me like most of the problem queries are "us." Geography-cache queries usually work, taxonomy-cache queries usually work, combinations of them sorta-usually work, any of that combined with a bunch of curatorial-type attributes fails. State (or county or whatever) and/or identification and a bunch of curatorial-type attributes usually works. Can we come up with a "welcome to Arctos, don't use the 'any' things" tutorial to send out with operator invitations or something??

@dustymc
Copy link
Contributor

dustymc commented Feb 28, 2019

I just dug through a bunch of timeouts from the last ~48 hours. Data from the logs and some comments below.

The attribute thing does not seem to be a major factor in this. There may be other reasons to be more aggressive in caching it, but that won't fix most of these.

Some (maybe most) of these seem perfectly appropriate to me - the user did exactly what I'd have done (eg, because the taxonomy is waffly or it's a "minor" geography term that might be anywhere).

Others would have been better able to answer the question (or what I guess the question was) using "normal" terms. any geog "colorado"

 select higher_geog from geog_auth_rec where GEOG_AUTH_REC_ID in (select GEOG_AUTH_REC_ID from cache_anygeog where GEOSTRING like '%COLORADO%') group by higher_geog order by higher_geog;

returns 415 rows from places like...

North America, United States, Wyoming, Sweetwater County, Colorado River
North America, United States, Wyoming, Uinta County, Colorado River
South America
South America, Argentina
South America, Argentina, Cordoba
South America, Argentina, Formosa
South America, Argentina, Neuquen
South America, Argentina, Rio Negro
South America, Argentina, Salta
South America, Argentina, Salta, Cafayate Department
South America, Bolivia, Potosi
South America, Brazil, Distrito Federal
South America, Brazil, Rondonia
South America, Chile, Araucania
South America, Colombia, Boyaca
South America, Ecuador
South America, Ecuador, Pichincha
South America, Ecuador, Santo Domingo
West Indies, Netherlands, Netherlands Antilles, Aruba Island Territory, Lesser Antilles, Aruba
West Indies, Trinidad and Tobago, Saint Patrick Parish, Lesser Antilles, Trinidad
West Indies, Trinidad and Tobago, Victoria Parish, Lesser Antilles, Trinidad
no higher geography recorded

for example - "stuff about which someone at some time used the string 'colorado'" - which is probably not what the user was looking for. We definitely need better documentation, and I'll play with more specific suggestions. ("that timed out - click here to try state_prov=colorado instead"). No clue how realistic that is yet.

Almost all of these completed in well under a minute. SOMETHING other than direct DB performance is a factor here. I'll ask TACC to check the webserver and such, but I suspect it's other things simultaneously going on in the DB and the solution comes back to an Oracle tuneup. (wild baseless guess: we need a disk dedicated to the temp tables these queries generate)


rawipaddress: 129.108.202.164,129.114.52.171

source: UT-EL-PASO

User: public

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'GREAT BLUE HERON%' )

Execution Time: Elapsed: 00:00:02.08

Suggestion: none, if the query was for "things that have been called great blue heron"

Problem: other stuff was busy??


rawipaddress: 50.236.76.57,129.114.52.171

Comcast

username: (academic non-operator)

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'total length') total_length,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'TAMIAS MINIMUS%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('DMNS:MAMM') ) AND cache_anygeog.geostring like '%COLORADO%'

Execution Time: Elapsed: 00:00:44.29

Suggestion: use state_prov (if that's the Colorado in question) and identified as (if that's the scope)

Problem: 2 expensive terms + other stuff was busy??


rawipaddress: 164.64.199.11,129.114.52.171

NM-STATE-GOVERNMENT

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND cache_anygeog.geostring like '%VALENCIA%'
;

Elapsed: 00:00:10.23

Suggestion: none, it's hard to say what "Valencia" is or where we might have entered it

problem: other stuff was busy??


rawipaddress: 136.152.143.22,129.114.52.171
University of California at Berkeley (UCAB-1)

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MULTIFASCIATA%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:HERP') ) AND cache_anygeog.geostring like '%CHINA%'

Elapsed: 00:00:30.68

suggestion:

  1. use country
  2. not clear what 'MULTIFASCIATA' refers to - probably needs either a beginning % or more in identified-as?

rawipaddress: 129.108.81.21,129.114.52.171
UT-EL-PASO

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('UTEP:HERB') ) AND cache_anygeog.geostring like '%IMPERIAL COUNTY%'

Elapsed: 00:00:07.61

suggestion: use county


rawipaddress: 187.189.107.207,129.114.52.171
MX-TPTE-LACNIC (mexico)

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'APHELOCOMA WOODHOUSEII%' )

Elapsed: 00:00:02.16

suggestion: MAYBE use identifiedas, but this is probably what I'd have done


rawipaddress: 201.108.154.176,129.114.52.171
MX-USCV4-LACNIC (mexico)

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'PUMA%' ) AND cache_anygeog.geostring like '%MEXICO %'

Elapsed: 00:00:16.77

suggestion:

  1. MAYBE use identifiedas, but this is probably what I'd have done
  2. use country

rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MEPHITIS MACROURA%' ) AND cache_anygeog.geostring like '%NEW MEXICO%';

Elapsed: 00:00:11.54

suggestion: use identifiedas and state


rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'NYCTINOMOPS%' ) AND cache_anygeog.geostring like '%NEW MEXICO%'

Elapsed: 00:00:12.10

suggestion: use identifiedas and state


rawipaddress: 129.72.142.82,129.114.52.171
UWYO

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'FALCO%' ) AND cache_anygeog.geostring like '%WYOMING%'

Elapsed: 00:00:16.91

suggestion: use identifiedas and state


rawipaddress: 192.12.184.6,129.114.52.171
Los Alamos National Laboratory (LANL-2)
username: D*********1@gmail.com

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.county county,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,getMonthCollected(filtered_flat.began_date,filtered_flat.ended_date) month_collected,filtered_flat.year year_collected,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'sex') sex,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.cataloged_item_type='specimen' AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('ALMNH:ES','ASNHC:MAMM','ASNHC:HERB','STAR:ALG','ASUMZ:HERP','ASUMZ:INVERT','ASUMZ:BIRD','ASUMZ:BIVALVE','ASUMZ:FISH','STAR:HERB','ASUMZ:ENTO','ASUMZ:MAMM','APSU:HERP','APSU:BIRD','APSU:FISH','APSU:MAMM','UWBM:HERP','UWBM:MAMM','CHAS:HERP','CHAS:EGG','CHAS:BIRD','CHAS:ES','CHAS:EH','CHAS:FISH','CHAS:ENTO','CHAS:MAMM','CHAS:INV','CHAS:HERB','CHAS:TEACH','COA:HERP','COA:EGG','COA:BIRD','COA:ENTO','COA:MAMM','DMNS:HERP','DMNS:EGG','DMNS:BIRD','DMNS:MAMM','DMNS:INV','DMNS:PARA','HWML:PARA','KNWROBS:FISH','KNWR:ENTO','KNWR:INV','KNWROBS:MAMM','KNWROBS:HERB','KNWR:HERB','KWP:ENTO','UCSC:HERP','UCSC:BIRD','UCSC:FISH','UCSC:MAMM','MLZ:EGG','MLZ:BIRD','MLZ:MAMM','MSB:HERP','MSB:BIRD','MSB:FISH','MSB:HOST','MSBOBS:MAMM','MSB:MAMM','MSB:PARA','MVZOBS:HERP','MVZ:HERP','MVZ:HILD','MVZ:EGG','MVZOBS:BIRD','MVZ:BIRD','MVZOBS:MAMM','MVZ:MAMM','UMNH:HERP','UMNH:BIRD','UMNH:MAMM','NMU:HERP','NMU:BIRD','NMU:MAMM','NMU:PARA','OWU:AMPH','OWU:BIRD','OWU:ENTO','OWU:EH','OWU:FISH','OWU:GEOL','OWU:HERB','OWU:INV','OWU:MAMM','OWU:ES','OWU:PARA','OWU:REPT','NBSB:BIRD','USNPC:PARA','UAM:ALG','UAM:HERP','UAM:ARC','UAMOBS:BIRD','UAM:BIRD','UAMB:HERB','UAM:ES','UAM:ENV','UAM:EH','UAMOBS:FISH','UAM:FISH','UAMOBS:ENTO','UAM:ENTO','UAM:INV','UAMOBS:MAMM','UAM:MAMM','UAM:HERB','UC-SCFS:BIRD','UC-SCFS:MAMM','UCM:HERP','UCM:EGG','UCM:BIRD','UCM:FISH','UCM:MAMM','UCM:OBS','UMZM:BIRD','UMZM:MAMM','UNR:HERP','UNR:EGG','UNR:BIRD','UNR:FISH','UNR:MAMM','UNM:ES','UTEPOBS:HERP','UTEP:HERPOS','UTEP:HERP','UTEP:ARC','UTEP:BIRD','UTEP:ES','UTEP:FISH','UTEPOBS:ENTO','UTEP:ENTO','UTEP:INV','UTEP:MAMM','UTEP:HERB','UTEP:TEACH','UTEP:ZOO','UWYMV:HERP','UWYMV:BIRD','UWYMV:FISH','UWYMV:MAMM','WNMU:BIRD','WNMU:FISH','WNMU:MAMM') ) AND upper(filtered_flat.phylorder) like 'RODENTIA%' AND UPPER(filtered_flat.state_prov) LIKE '%NEW MEXICO%'

89159 rows selected.

Elapsed: 00:01:04.39

suggestion: use class and state
NOTE: has attributes


rawipaddress: 98.238.128.19,129.114.52.171

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MUS%' )

DLM (successfully!) trying to break it


rawipaddress: 136.152.143.41,129.114.52.171
University of California at Berkeley (UCAB-1)
username: operator

SELECT distinct flat.collection_object_id,flat.guid guid,flat.scientific_name scientific_name,flat.country country,flat.state_prov state_prov,flat.spec_locality spec_locality,flat.verbatim_date verbatim_date,flat.dec_lat dec_lat,flat.dec_long dec_long,flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM flat WHERE flat.collection_object_id IS NOT NULL and flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' ) AND flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:BIRD') ) AND upper(PARTS) LIKE '%SKIN%'

no rows selected

Elapsed: 00:01:06.63

suggestion: use identifiedas


rawipaddress: 131.216.46.100,129.114.52.171
Nevada System of Higher Education (NEVAD-9)

SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN fake_coordinate_error ON (filtered_flat.locality_id = fake_coordinate_error.locality_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MAMMAL%' ) AND ( ( 36.23007635975191 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR 34.953528210949415 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR fake_coordinate_error.swlat between 34.953528210949415 and 36.23007635975191 OR fake_coordinate_error.nelat between 34.953528210949415 and 36.23007635975191 ) AND ( -115.437744140625 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR -114.49374389648438 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR fake_coordinate_error.swlong between -115.437744140625 and -114.49374389648438 OR fake_coordinate_error.nelong between -115.437744140625 and -114.49374389648438 ) )

1186 rows selected.

Elapsed: 00:00:04.02

suggestion: use class

@ewommack
Copy link

ewommack commented Feb 28, 2019 via email

@Jegelewicz
Copy link
Member

I'll play with more specific suggestions. ("that timed out - click here to try state_prov=colorado instead")

YAY!!!

NM-STATE-GOVERNMENT - was probably looking for Valencia County. Using the county field does it. Not sure how to direct people there other than what we do already?

WOW - LANL is searching our stuff? Wonder what they are doing with the data? Interesting!

@dustymc
Copy link
Contributor

dustymc commented Apr 24, 2019

I did what I can for getLatestPartLocation - I doubt you'll see a real difference, but it's slightly more efficient than it used to be....

@dustymc dustymc reopened this Apr 24, 2019
@marecaguthrie
Copy link

Thanks Dusty! I didn't realize I closed this issue by accident. Still feeling pretty new to GitHub.

@campmlc
Copy link

campmlc commented Apr 25, 2019

As we are going to be presenting on Arctos next month at the SPNHC meeting, this is now becoming an even more critical issue in need of resolution. We absolutely must get rid of the any taxon element and any geographic element search fields, or hide them somewhere very deep and put big red warnings on them. Right now, anyone going to our search page will use those two boxes, and they will get a visually nasty and discouraging red letter message of failure. This is not how we reach the public or serve our own members. I just searched on a single species name and a single locality term for a record I know exists, and got the timeout. I put the same search terms in the identification and specific locality boxes and got results within a second. Replace those boxes with Identification and Locality fields, please.

@campmlc
Copy link

campmlc commented Apr 25, 2019

And, to clarify, I was searching as a public user.
@dustymc

@dustymc
Copy link
Contributor

dustymc commented Apr 25, 2019

Removal is not mitigation; it's just removing a powerful tool capable of piercing the veil of our amazingly inconsistent data.

If we must shoot ourselves in the foot, hopefully temporarily until we can find some actual way to supply the resources required by these hugely powerful options, can we limit it to one foot? I'd vote fairly enthusiastically for keeping the taxonomy option.

  1. It performs better/times out less than geography. From a recent timeout, Branta hutchinsii works in about 3 seconds, Colorado takes almost a minute.
  2. It gets more usage than geography. I can't immediately quantify that, but I think it does...
  3. Chances of finding what you're looking for are really good with it, and really bad without it. There are a finite-ish number of ways of saying "goose" and Arctos contains most of them. There are an infinite number of ways of describing places, and the webservices that drive anygeog only find some of them.
  4. Without it, there's little point in proceeding with Getty Vocabularies as a Taxonomy Source #2057 or Nomenclature 4.0 for Man-Made Items #1732, and not much benefit from all the cleanup/WoRMS/Paleobiodb/Species+/etc. data we've recently drug in.

And from that last point, maybe we can shoot ourselves in that one foot with a slightly smaller gun by implementing a "any curatorially-asserted geography" option, which would somewhat alleviate the issue @AJLinn brought up? That's maybe 10% of the functionality of the current thing, but at least it'll find "fairbanks" spread across specloc/verbloc/quad/county without searching 4 times.

@campmlc
Copy link

campmlc commented Apr 25, 2019 via email

@dustymc
Copy link
Contributor

dustymc commented Apr 25, 2019

I'll throw together a "local place-stuff" search and see what happens.

only from specific or verbatim locality terms

That still leaves you guessing what we've done with "Fairbanks." (Actual answer: little of this, little of that, ...)

confuse people who think that means "California".

Yes, that's part of where the "any" fields came from. Our internal inconsistencies don't much help.

What does MCZ do?

Use local data, of which they have much less - eg, they have 342,441 "taxonomy rows" and we have 235,653,300.

AFAIK they also have a limited locality model - their data are capable of being flattened, ours are not.

And I think they overall have much more DBA support; I don't know details, but I suspect if they think they need a tune-up they just get a tune-up.

Their interface is simple and clear.

It's our interface from a few years ago???

@dustymc
Copy link
Contributor

dustymc commented Apr 26, 2019

I tried something fairly unorthodox, and the 'any' options seem to be working and performing reasonably well.

Assuming nothing melts as people start making updates, I think this is functional for now. We can talk about reintroducing service-derived terms after we get to 12, assuming we get a functional GMaps API key.

Feel free to try to break it, and let me know details if you do.

This does nothing for adding attributes-and-such, it's only the 'any' search options.

@campmlc
Copy link

campmlc commented Apr 26, 2019

So the combined ANY TAXON and ANY GEOGRAPHY search that timed out yesterday works today within a couple seconds, using an exact specific locality and an exact taxon name ("Falco sparverius and Roswell"). Changing the geography parameters to "New Mexico" as higher geog takes much longer but eventually yields the 201 results on public view. Changing to a common name "rabbit" also worked, although I got "rabbitfootgrass". I guess we need to work on our common names.
Logged in with my original two searches also worked, although both took about 12 seconds. Can we add a pop-up that says "Search in Progress" or something?

@dustymc
Copy link
Contributor

dustymc commented Apr 26, 2019

using an exact specific locality and an exact taxon name

Maybe we need to do better with the tooltips? If you KNOW what you're looking for and where it is, the 'any' options are both inefficient and likely to find some stuff you don't care about. Roswell (AFAIK) is just a placename that'll reliably end up in specloc, where Fairbanks is at least 3 things county, quad, specloc). I have no idea how to make that clear in something short enough a user might actually read it.

I'm not sure if I can use the current approach to search for "is" or not - it's converting a bunch of stuff to a big concatenation and that to a document which is then indexed, which for some inexplicable reason the Oracle-fairies find easier to navigate than any 'normal' solution. I can look closer if you want....

Right now, the taxonomy search options are limited to 'anything' and 'ranked terms in the collection's preferred classification." There's no reason (other than trying to get users to understand it) that we couldn't add the ability to search WHATEVER - 'family according to GenBank' or 'family according to someone' or 'common name is exactly' or WHATEVER. New Issue...

Search in Progress"

New Issue as well - short answer is 'yes' but it may be less-trivial than I'd like.

@campmlc
Copy link

campmlc commented Oct 7, 2019

This issue is continuing to cause problems. Any hope that migration to Postgres and collaboration with TACC will solve the timeouts using Any Geography and Any Taxon?
If not, I ask again if we can please delete these search boxes or hide them from the public.

Name: Kirsten Nicholson
Email: kirsten.nicholson@cmich.edu
Message: No matter what search I do, it kicks me out and gives me the following error message: [Macromedia][Oracle JDBC Driver]Execution timeout expired. Please help! We need the database for our teaching labs!!

@dustymc
Copy link
Contributor

dustymc commented Oct 7, 2019

hope that migration to Postgres

Yes

@atrox10
Copy link

atrox10 commented Oct 8, 2019 via email

@dustymc
Copy link
Contributor

dustymc commented Oct 8, 2019

Little of all that, and some other stuff.

  • There's a lot more data to sift through nowadays.
  • The data has a lot more locality-data than it used to.
  • There were a lot of users on this AM.
  • I made a big update that was running in the background (but still eating resources).
  • She had some attributes turned on. There's a 'disclaimer' for that:

Screen Shot 2019-10-07 at 6 37 32 PM

* All I can see suggests she was still using 'any geog,' which also does have a disclaimer

Screen Shot 2019-10-07 at 6 38 20 PM

My arguments for keeping the 'any...' options are the same as always: we have inconsistent data and they provide the only realistic pathway to it, as frustrating as the timeouts can be. PG+TACC, at least in theory, should fix the performance issues.

@ewommack
Copy link

ewommack commented Oct 8, 2019

Could we move the 'any' option boxes to the bottom of the Taxonomy and Locality sections and push up the Identification option for Taxonomy, and perhaps push up three options in the collapsed form: State/Province, County, and Specific Locality for Locality?

@dustymc
Copy link
Contributor

dustymc commented Oct 8, 2019

The form is pretty easy to manipulate, we can move whatever wherever.

Whether we should or not is the bigger question. I see compelling arguments for both cases from here.

The vast majority of complaints come from users who also have things like Attributes turned on. Maybe we should instead limit results for non-Operators to only columns in FLAT?

@campmlc
Copy link

campmlc commented Oct 9, 2019 via email

@ccicero
Copy link

ccicero commented Oct 9, 2019

I agree with Mariel.

@dustymc
Copy link
Contributor

dustymc commented Oct 9, 2019

Objections? @DerekSikes ? Does anyone have a way to partially mitigate Derek's concerns?

If we do move those out as default, I'll need a replacement. In light of the (internal??) 'Arctos is so complicated' thing I think a single replacement, so we don't go over 8 fields, is necessary.

@campmlc
Copy link

campmlc commented Nov 20, 2019 via email

@ccicero
Copy link

ccicero commented Nov 20, 2019

Mariel - were you having timeout issues today? So was I.

I was trying to search for all records I just loaded in an accession, 1615 records so not that huge. Timed out multiple times before finally working.

@dustymc
Copy link
Contributor

dustymc commented Nov 20, 2019

@ccicero I see that in the logs, the query is being slowed down by a lot of attributes and part location and such. If you can trim that up, it'll make things work better. It starts returning data in under a second, but then takes just over a minute to complete. There's also still a fair bit of Chinese traffic today - you might have been competing with that.

As far as I know MCZbase has MUCH simpler data - I don't think they consume globalnames or locality webservices.

@ccicero
Copy link

ccicero commented Nov 20, 2019

Actually, I added a lot of those attributes to my custom results AFTER I got it working because I needed to download them. Will remove in the future, but I don't think that was it. I'll blame it on the Chinese :)

@dustymc
Copy link
Contributor

dustymc commented Nov 20, 2019

Thanks - a cheap query failing but then working in a more expensive form later is definitely a sign of things being momentarily overwhelmed.

FYI your you+montana+perisoreous query with a ton of attributes on completes in 1:30 in Oracle - the system for which it's tuned, right now when there's basically only one other user (and they're not using much) - or 13 seconds in PG. We do have an actual solution, I just need some more focused time to see it through.

@campmlc
Copy link

campmlc commented Nov 20, 2019 via email

@ccicero
Copy link

ccicero commented Nov 20, 2019

Yes, yay!!! Thanks.

@dustymc
Copy link
Contributor

dustymc commented Nov 20, 2019

13 seconds

One order of magnitude is perhaps a more useful way to look at that.

Note also and that's an un-tuned single-box development server with the front-end running on a solar-powered calculator (OK not quite but still...). It's faster than our current setup, but more importantly it's much more scalable than our current setup. If we find our way back here again, we can do something about it that doesn't first involve many tens of thousands of dollars in licensing.

@dustymc dustymc closed this as completed Jul 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Display/Interface I don't like the way Arctos looks or it isn't working for me aesthetically. Function-SearchOrDownload Infrastructure-limited Issue which could be resolved, or more easily resolved, with additional computational power
Projects
None yet
Development

No branches or pull requests