[WIP] experimenting with shuffle locations in scheduler... #3

Closed
yifeih wants to merge 2 commits


Owner

@yifeih yifeih commented May 8, 2019

Trying to add ShuffleLocations to the scheduler as a first-class concept.

Owner Author

yifeih commented May 9, 2019

@mccheah @ifilonenko I'm seeing that the scheduler treats "host" and "location" as two different things. Some code paths delete all map outputs from a particular location (a host/port combination), while others delete based on host alone. To preserve the current behavior, we should probably expose the host from the ShuffleLocation interface.
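To make the host versus location distinction concrete, here is a minimal sketch of what exposing the host could look like. Every name in it (`ShuffleLocation` with `host()` and `port()`, `HostPortLocation`, `MapOutputRegistry`) is hypothetical and chosen for illustration; it is not the actual interface or scheduler code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical shape of the interface; the real ShuffleLocation API is not shown in this thread.
interface ShuffleLocation {
    String host();
    int port();
}

// Illustrative host/port implementation with value equality.
final class HostPortLocation implements ShuffleLocation {
    private final String host;
    private final int port;

    HostPortLocation(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override public String host() { return host; }
    @Override public int port() { return port; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof HostPortLocation)) return false;
        HostPortLocation other = (HostPortLocation) o;
        return port == other.port && host.equals(other.host);
    }

    @Override public int hashCode() { return Objects.hash(host, port); }
}

// Hypothetical stand-in for the scheduler's map output bookkeeping.
final class MapOutputRegistry {
    private final List<ShuffleLocation> outputs = new ArrayList<>();

    void register(ShuffleLocation location) { outputs.add(location); }

    // Drops only the outputs registered at exactly this host/port pair.
    void removeOutputsAtLocation(ShuffleLocation location) {
        outputs.removeIf(o -> o.equals(location));
    }

    // Drops every output on the host, regardless of port.
    void removeOutputsOnHost(String host) {
        outputs.removeIf(o -> o.host().equals(host));
    }

    int size() { return outputs.size(); }
}
```

With `host()` on the interface, both removal paths can be written against the same type, which is the behavior-preserving property described above.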

@squito also brought up a good point: we have no use case for a generic ShuffleLocation that does not follow a host/port combination. The only implementation that might need a more generic form of ShuffleLocation is the individual file servers. For a DFS, we can configure a namenode location and not use ShuffleLocation at all. For async upload, we would use a host/port combination to represent the local location, and the namenode can be configured globally. For the local implementation, the ShuffleLocation is also a host/port combination, since it's just a wrapper around the BlockManagerId, as demonstrated in our PRs. And even for the individual file servers, I can't see us using anything other than a list of host/port combos.

To simplify the API, I think we can safely and explicitly define the shuffle locations metadata stored in the driver to be an array of host/port combinations. With shuffle locations defined explicitly this way, it becomes easier to blacklist by host, since you're no longer dealing with a nebulous object that can be anything. For example, in the individual file server case, you can remove a location from the array of host/port combos if you discover that it is inaccessible.
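As a rough illustration of the proposal, the sketch below models the driver-side metadata for one map output as a list of host/port pairs and shows both removal modes. `HostPort` and `MapOutputLocations` are made-up names, and the rule that an output is lost once no locations remain is my assumption, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical value type for a single host/port combination.
final class HostPort {
    final String host;
    final int port;

    HostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof HostPort)) return false;
        HostPort other = (HostPort) o;
        return port == other.port && host.equals(other.host);
    }

    @Override public int hashCode() { return Objects.hash(host, port); }
}

// Hypothetical driver-side record of where one map output can be fetched from.
final class MapOutputLocations {
    private final List<HostPort> locations;

    MapOutputLocations(HostPort... locations) {
        this.locations = new ArrayList<>(Arrays.asList(locations));
    }

    // Removes a single location found to be inaccessible; returns true if it was present.
    boolean removeLocation(HostPort unreachable) {
        return locations.remove(unreachable);
    }

    // Blacklists by host: removes every location on that host, whatever the port.
    boolean removeHost(String host) {
        return locations.removeIf(l -> l.host.equals(host));
    }

    // Assumption: the output counts as lost only when no location remains.
    boolean isLost() { return locations.isEmpty(); }

    List<HostPort> locations() { return Collections.unmodifiableList(locations); }
}
```

Because every entry is a plain host/port pair, blacklisting by host is a simple filter over the list and needs no knowledge of the shuffle implementation behind it.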

Owner Author

yifeih commented May 13, 2019

Follow-up here: palantir#548

@yifeih yifeih closed this May 13, 2019