Schema Admin via HTTP [JIRA: RIAK-1715] #10

Open
rzezeski opened this issue Oct 16, 2012 · 15 comments

@rzezeski
Contributor

Yokozuna should have the ability to create and modify schemas
remotely. This issue covers doing so over HTTP.

There are still a lot of questions regarding this issue. The
fundamental ones are:

  • Can a running system cope with schema changes? If so, how can it be
    done safely?
  • Can schemas be modified piecemeal or must it be all-or-nothing?
  • Can JSON be used to read/modify/write the schema?
  • Should concurrent writers/siblings be accounted for? I'm a little
    less worried about concurrent writers in a healthy cluster and more
    worried about partitioned writes. Would PW/PR/DW/W/R=N be good
    enough?
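
On the last question, here is a minimal sketch of what PW=N would buy during a partition. It assumes the usual Riak meaning of PW (the number of primary replicas that must acknowledge a write). The function names and the two-sided partition model are illustrative only, not Yokozuna code.

```python
def write_succeeds(pw: int, reachable_primaries: int) -> bool:
    """A write is acknowledged only if at least `pw` primary replicas respond."""
    return reachable_primaries >= pw


def sides_accepting_writes(n: int, pw: int, primaries_on_side_a: int) -> int:
    """Count how many sides of a two-way partition can still accept a write.

    `primaries_on_side_a` of the `n` primaries are reachable from side A;
    the remaining ones are reachable only from side B.
    """
    side_a = write_succeeds(pw, primaries_on_side_a)
    side_b = write_succeeds(pw, n - primaries_on_side_a)
    return int(side_a) + int(side_b)
```

Under this model, PW=N means at most one side of a partition can accept a schema write, so the two sides cannot diverge. The cost is that schema writes are rejected whenever any primary is unreachable.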

Specification

  • The resource: <host-port>/yokozuna/schema/<schema-name>

GET

  • Return the schema with content type of application/xml.
  • TODO: allow to pick-out subset of schema to return, e.g. a list of
    fields?
  • TODO: allow to return in JSON format?

PUT

  • Accepts text/xml or application/xml.
  • The body is a properly formed Solr schema. See the
    example schema.
  • If the schema name already exists then don't replace the current
    one. Instead return an error to the user stating that it already
    exists. However, there needs to be a way to overwrite a schema in
    case a bad one is uploaded.
  • TODO: Think about adding param overwrite=true to bypass the
    previous check allowing the user to overwrite the current schema
    definition. This has to be thought about carefully because changing
    schemas could cause issues.
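
The GET and PUT rules above can be modelled in memory as a rough sketch. The class name, the specific status codes (404, 415, 400, 409, 204), and the check that the root element is `<schema>` are assumptions for illustration; the spec above only says "return an error". The `overwrite` argument mirrors the `overwrite=true` parameter from the TODO.

```python
import xml.etree.ElementTree as ET

ACCEPTED_TYPES = {"text/xml", "application/xml"}


class SchemaStore:
    """In-memory stand-in for the proposed /yokozuna/schema/<schema-name> resource."""

    def __init__(self):
        self._schemas = {}

    def get(self, name):
        """Return (status, content_type, body) for a GET."""
        if name not in self._schemas:
            return 404, None, None  # assumed status for an unknown schema
        return 200, "application/xml", self._schemas[name]

    def put(self, name, content_type, body, overwrite=False):
        """Return an HTTP-style status code for a PUT."""
        if content_type not in ACCEPTED_TYPES:
            return 415  # assumed: Unsupported Media Type
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            return 400  # assumed: body is not well-formed XML
        if root.tag != "schema":
            return 400  # assumed: a Solr schema has a <schema> root element
        if name in self._schemas and not overwrite:
            return 409  # assumed: Conflict when the name already exists
        self._schemas[name] = body
        return 204
```

Keeping the existence check last means a bad upload is rejected before it can replace a working schema, even when `overwrite` is set.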

POST

TODO: Think about allowing POSTs to add to or modify a subset of a
schema. E.g. adding a new field without read/modify/write of entire
schema.
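
As a sketch of what such a POST handler might do on the server side, the function below adds a single field to an existing schema document. It assumes the schema keeps its fields under a `<fields>` element, and the `indexed`/`stored` defaults are arbitrary choices for illustration; `add_field` is a hypothetical helper, not an existing Yokozuna or Solr call.

```python
import xml.etree.ElementTree as ET


def add_field(schema_xml: str, name: str, field_type: str) -> str:
    """Return a copy of the schema with one extra <field>, rejecting duplicates."""
    root = ET.fromstring(schema_xml)
    fields = root.find("fields")
    if fields is None:
        raise ValueError("schema has no <fields> element")
    if any(f.get("name") == name for f in fields.findall("field")):
        raise ValueError(f"field {name!r} already exists")
    # Defaults below are assumptions; a real API would take them as parameters.
    ET.SubElement(fields, "field", {
        "name": name,
        "type": field_type,
        "indexed": "true",
        "stored": "false",
    })
    return ET.tostring(root, encoding="unicode")
```

The client sends only the new field, so it never has to fetch, edit, and re-upload the whole schema.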

DELETE

TODO: Do we allow deletes of schemas?

@abhinavsingh

@rzezeski My 2 cents over above concerns:

  1. Yokozuna should explore the idea of adapting to schema changes, i.e. a running system would eventually cope with schema changes by itself.
  2. There could also be a flag that lets the developer decide whether Yokozuna should adapt old data to schema changes by itself. If this flag is turned on, Yokozuna can run the relevant reindexing job in the background.
  3. Currently the schema version tag is not utilised by Riak Search. Yokozuna could use these version strings to make the relevant indexes available via different URL paths, e.g. /solr/1.0/select?q= and /solr/1.1/select?q=. Developers can keep consuming the older indexes from /solr/1.0/select?q=, and the new indexes will be available via /solr/1.1/select?q= while they are being rebuilt.
  4. Similarly, there could be a way for the developer to cancel or revert the schema changes in progress. This would also stop/pause the background reindexing job.
  5. Finally, if the developer is happy with schema version 1.1, they can run a garbage collection job that cleans up the indexes for version 1.0.

Having said that, this functionality is bound to put some load on Yokozuna clusters while a reindexing job is running over a large number of documents in the DB.
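
To make points 3 to 5 concrete, here is a minimal sketch of the bookkeeping they imply. Everything here is hypothetical: the class and method names are made up for illustration, and only the /solr/<version>/select?q= path shape comes from the comment above.

```python
from urllib.parse import quote


class VersionedIndexes:
    """Tracks which schema versions currently have a queryable index."""

    def __init__(self, initial_version: str):
        self.ready = {initial_version}  # versions with a complete index
        self.rebuilding = set()         # versions still being reindexed

    def start_rebuild(self, version: str) -> None:
        self.rebuilding.add(version)

    def finish_rebuild(self, version: str) -> None:
        self.rebuilding.remove(version)
        self.ready.add(version)

    def cancel_rebuild(self, version: str) -> None:
        """Point 4: revert a schema change and stop its reindexing job."""
        self.rebuilding.discard(version)

    def garbage_collect(self, version: str) -> None:
        """Point 5: drop the index for a version that is no longer needed."""
        self.ready.discard(version)

    def select_path(self, version: str, query: str) -> str:
        """Point 3: build the versioned query path for a known version."""
        if version not in self.ready | self.rebuilding:
            raise KeyError(f"no index for schema version {version}")
        return f"/solr/{version}/select?q={quote(query)}"
```

A version that is still rebuilding stays queryable here, matching the idea that the new index is available while it is being rebuilt, though its results would be incomplete until the job finishes.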

@rzezeski
Contributor Author

@abhinavsingh Very interesting ideas. Given the fact that Solr cores can be copied/swapped and the active anti-entropy sub-system will repair missing data, perhaps this is doable. There is much to think about here. I think the main issue can be done without considering your points, i.e. they are additions. I'll give it some thought and perhaps create some additional issues. At minimum, Yokozuna should strive to make schema migration not a pain in the ass.

@dreverri
Contributor

This may not be acceptable to everyone but it seems immutable schemas might be acceptable if Yokozuna allowed for a bucket to have many indexes. If a new schema is needed, create a new index with a new immutable schema. AAE will take care of indexing old data. Developers can switch over to the new schema when AAE is done and drop the old index when appropriate.

@rzezeski
Contributor Author

PR #42 addressed the basic concerns in this issue but there are still things that must be addressed. Pushing this issue back another release so it can continue to be iterated upon.

@rzezeski
Contributor Author

It appears that Solr has been doing some work related to this issue.

SOLR-4503 allows fetching schema properties via HTTP.

SOLR-3251 would allow dynamic adding of fields.

SOLR-1147 would allow configuration of solrconfig.xml, which is a bit tangential to the schema but I thought I'd add it here anyways.

SOLR-791 would allow setting the schema during core creation. This doesn't really change anything from a Yokozuna user's perspective but would help with Yokozuna code. Wouldn't need to do direct file copying anymore.

@coderoshi
Contributor

#58 is related to this issue, namely the verification of an uploaded schema.

@coderoshi
Contributor

Should this still be labeled as a "must"? Many of the important items are either done, or undoable. +1 for closing.

@rzezeski
Contributor Author

I agree the basics are there. I'd like to see how Solr upstream deals with adding fields on the fly and then revisit this topic.

I'm still concerned about modifying a schema for an index that already has data. I'm not sure what effects changing field names, field types, or analyzer chains might have. I'm sure it's often not good. In the future I think it would be good if Yokozuna kept track of schema versions via hash + datetime. It might be feasible to design graceful migrations by way of Solr cores. But that is a change that requires some thought and probably a fair amount of code. For now I'd like to punt on it and leave the behavior undefined. Essentially, modifying schemas with existing data should be done with great care for now.

Since this is an umbrella issue I'm going to remove the 'must' tag but leave it open as a reminder to revisit later.
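
The "hash + datetime" idea could look roughly like the sketch below. SHA-256 over the raw schema bytes and a UTC timestamp are assumptions; the comment above does not pick a hash function or time format, and the names here are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class SchemaVersion(NamedTuple):
    content_hash: str      # hex SHA-256 of the raw schema bytes
    recorded_at: datetime  # when this version was recorded (UTC)


def schema_version(schema_xml: bytes, now: Optional[datetime] = None) -> SchemaVersion:
    """Record a version for the given schema content."""
    digest = hashlib.sha256(schema_xml).hexdigest()
    return SchemaVersion(digest, now or datetime.now(timezone.utc))


def schema_changed(previous: SchemaVersion, schema_xml: bytes) -> bool:
    """True if the uploaded content differs from the recorded version."""
    return previous.content_hash != hashlib.sha256(schema_xml).hexdigest()
```

Hashing raw bytes means a whitespace-only edit counts as a new version. Whether that is desirable is a design question this sketch does not settle.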

@DSomogyi

Comment for Jira.

@Basho-JIRA changed the title from "Schema Admin via HTTP" to "Schema Admin via HTTP [JIRA: RIAK-1715]" on Apr 14, 2015

@rzezeski
Contributor Author

rzezeski commented Oct 7, 2021

I wear the wolf shirt.

@jaredmorrow
Contributor

> I wear the wolf shirt.

Obviously not if you haven't fixed this in the past 9 years.

@rzezeski
Contributor Author

rzezeski commented Oct 7, 2021

Problem?

@jaredmorrow
Contributor

Shouldn't you be debugging a printer or something?

@andrewjstone

> Shouldn't you be debugging a printer or something?

He doesn't have time. He's still on projector duty.

@rzezeski
Contributor Author

rzezeski commented Oct 8, 2021

As @vinoski would say, due tomorrow means do tomorrow.
