[platform] on-premise create universe flow does not accurately place masters in multi-region configuration #3999

Closed
ajcaldera1 opened this issue Mar 17, 2020 · 3 comments
Labels: area/platform (Yugabyte Platform), area/ui (All issues relating to UI/UX), priority/high (High Priority)

Comments

@ajcaldera1
Contributor

Steps to repro:

  1. Create an on-prem provider with 3 regions
  2. Add 3 nodes to each region
  3. Create a universe with the 9 nodes - all of the masters get assigned to a single region
@ajcaldera1 ajcaldera1 added the area/ui (All issues relating to UI/UX), priority/high (High Priority), and area/platform (Yugabyte Platform) labels Mar 17, 2020
@ajcaldera1
Contributor Author

So after conducting a repro with 2.1.2, I find that it picks the first 3 nodes provisioned (n1,n2,n3) regardless of location.

@ajcaldera1
Contributor Author

I've isolated the fault. In the on-premise flow we store no subnet information in the availability_zone table, so master selection is non-deterministic. We can work around this by creating the regions/AZs in the on-premise YW flow and then updating the subnet information manually in PG, but that is not desirable and may not always be possible.

yugaware=# select * from availability_zone;
                 uuid                 |    code    |    name    |             region_uuid              | active |          subnet          | config
--------------------------------------+------------+------------+--------------------------------------+--------+--------------------------+--------
 16bbd69a-31d6-49a2-924e-cd13c03c3446 | us-east-1a | us-east-1a | bf20e6ed-db09-4952-ad68-21880de820e5 | t      | subnet-04733d49f678ac6ec |
 c919e08f-e298-4ecb-8b3c-ada5d01a437d | us-east-1b | us-east-1b | bf20e6ed-db09-4952-ad68-21880de820e5 | t      | subnet-0eb5ee784f73d1e66 |
 662aad45-29b8-4e7e-af17-dc4083002b1f | us-east-1c | us-east-1c | bf20e6ed-db09-4952-ad68-21880de820e5 | t      | subnet-056361fd94cef10cb |
 d6845e8f-1d22-43a6-be1d-d4eada7a8115 | us-east-1d | us-east-1d | bf20e6ed-db09-4952-ad68-21880de820e5 | t      | subnet-06f4bd601386fd99a |
 54b4e790-c910-4d52-a84d-eca5ba896833 | us-east-1e | us-east-1e | bf20e6ed-db09-4952-ad68-21880de820e5 | t      | subnet-09a90092a9f124d48 |
 0a65eadc-d317-45df-b75b-764357b3c25b | us-east-1f | us-east-1f | bf20e6ed-db09-4952-ad68-21880de820e5 | t      | subnet-0343ec259b0c12fb1 |
 5e8845d1-ed17-4d80-9dcf-416afe055e1b | us-east-2a | us-east-2a | 0b0db819-b02a-4d81-9c44-4b95566f2045 | t      | subnet-0e7a24bf07b06e281 |
 06dffe4e-4667-40d0-92ac-b01e341e33ce | us-east-2b | us-east-2b | 0b0db819-b02a-4d81-9c44-4b95566f2045 | t      | subnet-00ee5c710e8005aeb |
 3fe1e340-52c7-4cc5-b017-69818d821cca | us-east-2c | us-east-2c | 0b0db819-b02a-4d81-9c44-4b95566f2045 | t      | subnet-093980da8b1434511 |
 47b0dbf9-7fff-4bc9-a2df-da870594ddb1 | us-west-2a | us-west-2a | bbafcf5e-4607-403f-a05b-877f5011aba0 | t      | subnet-0dcb2128d8c5af902 |
 49f6d2af-5a36-4803-a97c-c9330a3b85df | us-west-2b | us-west-2b | bbafcf5e-4607-403f-a05b-877f5011aba0 | t      | subnet-06dfefea866799604 |
 0669effc-5617-4379-bad8-e190a32bc32e | us-west-2c | us-west-2c | bbafcf5e-4607-403f-a05b-877f5011aba0 | t      | subnet-00b1a9a7932ee62fc |
 7867f2c4-e11c-47aa-bcd5-d3bb181622f0 | us-west-2a | us-west-2a | cc5a931c-b0f5-4cda-b2f5-7d535ffc0f48 | t      |                          |
 b1e34d95-bbbf-42ee-acb5-a172c2e8343b | us-east-1a | us-east-1a | 02a42408-fdd6-4c0d-999d-d1839f09326d | t      |                          |
 3535b242-e58c-49a5-b6b6-a92c2251588d | us-east-2a | us-east-2a | 978eb5c6-b330-4a29-af87-265ec532ad39 | t      |                          |
(15 rows)

We may want to consider a unique key constraint on that field or simply defaulting it to the unique key generated for the availability_zone record.
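
A minimal sketch of what the manual PG workaround and the suggested default/constraint could look like (hypothetical statements against the yugaware database, with a made-up constraint name, not a reviewed migration):

```sql
-- Backfill the empty on-prem subnets, defaulting each one to the zone's own
-- uuid so every availability_zone row carries a distinct, non-null value.
UPDATE availability_zone
   SET subnet = uuid::text
 WHERE subnet IS NULL OR subnet = '';

-- Optionally enforce the uniqueness suggested above so the situation cannot recur.
ALTER TABLE availability_zone
    ADD CONSTRAINT uq_availability_zone_subnet UNIQUE (subnet);
```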

Arnav15 added a commit that referenced this issue Mar 27, 2020
Summary:
The current master selection logic used subnets instead of zones and regions, which caused
issues for on-prem flows, where the subnet might be null. Since subnets and AZs are mapped
one-to-one anyway, we can use the zones to select master placement. The updated logic now cycles
the master selection through each region/zone combination before putting a master in the same
region/zone combination again.

Test Plan:
Added a unit test. Also tested by deploying a multi-region/multi-zone universe and
verifying that the masters got sprayed correctly.

Reviewers: ram, bogdan

Reviewed By: bogdan

Subscribers: kannan, jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D8164
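
For intuition, the cycling behavior described in the summary can be sketched directly against the availability_zone table above (an illustrative query only, not the actual Java change from D8164): ranking zones within each region and then interleaving the regions makes every region contribute one zone before any region contributes a second, and the first N rows of that ordering are where the masters would land.

```sql
-- Rank zones within each region, then interleave the regions: all rank-1
-- zones first (one per region), then rank-2 zones, and so on.
SELECT code AS zone, region_uuid
FROM (
    SELECT code,
           region_uuid,
           ROW_NUMBER() OVER (PARTITION BY region_uuid ORDER BY code) AS zone_rank
    FROM availability_zone
    WHERE active
) ranked_zones
ORDER BY zone_rank, region_uuid, code;
```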
@bmatican bmatican assigned Arnav15 and unassigned WesleyW and ramkumarvs Mar 27, 2020
@schoudhury schoudhury changed the title [YW] on-premise create universe flow does not accurately place masters in multi-region configuration [platform] on-premise create universe flow does not accurately place masters in multi-region configuration Apr 13, 2020
@schoudhury
Contributor

This issue has been resolved by the commit referenced above.
