[Roadmap] XGBoost 1.0.0 Roadmap #4680
Comments
for other contributors who have no permission to edit the post, please comment here about what you think should be in 1.0.0 |
Also, should we target moving exclusively to the Scala based Rabit tracker (for Spark) in 1.0? |
I am also not a committer, but my company and I are very interested in fixing (or at least mitigating) the performance issue with checkpointing: #3946 |
@trams @thesuperzapper I think this is an overview for everyone to have a feeling for what's coming next. It would be difficult to list everything coming since XGBoost is a community driven project. Just open a PR when it's ready.
@thesuperzapper Let's track the progress. I certainly hope that I can start testing it. :-) |
There is also a secondary consideration: we might not be ready for 1.0 and the API guarantees that come with it. For example, could we do 0.10.0 next instead? |
@thesuperzapper 1.0 is not gonna be a final version. It's just we are trying to do semantic versioning. |
Added some gpu related items. |
would like to get native xgb fix included. |
JSON is removed from the list. See #4683 (comment) |
I raised an issue for my above suggestion: #4781 (To remove the python Rabit tracker) |
Feature importance in the Spark version would be great as well (i.e., an easy way to get the feature importance). |
Added regression test. |
@chenqin I'd like to hear from you about regression tests, since you have experience with managing ML in production. Any suggestions? |
I think regression tests should cover a variety of workloads and benchmark prediction accuracy and stability against the previous version (equal or better) within approximately the same running time. Two candidates off the top of my head are https://archive.ics.uci.edu/ml/datasets/HIGGS and a sparse DMatrix. We can try various tree methods and configurations to ensure good coverage across tree_method and configurations, dataset, and standalone or cluster mode. Disclaimer:
A more organized plan may be to build an automation tool that users can take to benchmark various settings against their private datasets and models in their own data centers. |
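The regression check described above could be sketched roughly as follows. This is a minimal, library-free illustration, not an existing XGBoost tool: the names `RunResult` and `check_regression`, the thresholds `acc_tol` and `time_slack`, and all metric values are hypothetical placeholders. A real harness would fill in the numbers by training XGBoost on datasets such as HIGGS.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunResult:
    """Outcome of one benchmark run (hypothetical structure)."""
    accuracy: float        # higher is better
    train_seconds: float   # wall-clock training time


def check_regression(baseline, candidate, acc_tol=1e-3, time_slack=1.10):
    """Return a list of failure messages; an empty list means no regression.

    baseline, candidate: dicts mapping a config key to a RunResult.
    acc_tol: allowed absolute drop in accuracy (assumed threshold).
    time_slack: allowed training-time ratio vs. baseline (assumed threshold).
    """
    failures = []
    for key, old in baseline.items():
        new = candidate.get(key)
        if new is None:
            failures.append(f"{key}: missing in candidate")
            continue
        if new.accuracy < old.accuracy - acc_tol:
            failures.append(
                f"{key}: accuracy {old.accuracy:.4f} -> {new.accuracy:.4f}")
        if new.train_seconds > old.train_seconds * time_slack:
            failures.append(
                f"{key}: time {old.train_seconds:.1f}s -> {new.train_seconds:.1f}s")
    return failures


# Coverage grid: tree method x dataset x deployment mode.
TREE_METHODS = ["exact", "approx", "hist"]
DATASETS = ["HIGGS", "sparse"]
MODES = ["standalone", "cluster"]
GRID = list(product(TREE_METHODS, DATASETS, MODES))

# Placeholder numbers only, NOT real measurements.
baseline = {key: RunResult(accuracy=0.70, train_seconds=100.0) for key in GRID}
candidate = dict(baseline)
candidate[("hist", "HIGGS", "cluster")] = RunResult(accuracy=0.65,
                                                    train_seconds=130.0)

failures = check_regression(baseline, candidate)
for line in failures:
    print(line)
```

Keying the results by the full (tree method, dataset, mode) tuple means a regression in a single configuration is reported on its own instead of being averaged away.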
We should add fixing #4779 as a requirement to ship 1.0 |
I added #4899 as a cleanup step. |
@dmlc/xgboost-committer Since we have quite a few tasks left for 1.0, maybe we should make an interim release 0.91? |
@hcho3 Or perhaps 0.10.0 |
@thesuperzapper That will confuse the versioning system. I don't mind a 0.91 release, but I still want to see proper procedures for regression tests. |
@trivialfis If master has API changes, shouldn't we bump a major version, which I guess would look like 0.100.0 |
@thesuperzapper The 1.0.0 version is the first version for which we would adopt the semantic versioning scheme, so no, semantic versioning won't apply to the interim release. It's a bit tricky, since we have quite a lot to do before 1.0.0 is released. |
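To illustrate why an interim number such as 0.10.0 could confuse the version ordering, here is a small sketch that compares dotted version strings component by component as integers, which is how semantic versioning orders them. It assumes the latest release at the time was 0.90 (implied by the proposed 0.91, not stated in this thread); `version_key` is a hypothetical helper, not part of XGBoost.

```python
def version_key(version):
    """Split a dotted version into integers, padding to three components."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


previous = "0.90"  # assumed latest release at the time
for proposed in ["0.10.0", "0.91", "0.95", "0.100", "1.0.0"]:
    newer = version_key(proposed) > version_key(previous)
    print(f"{proposed:>7} sorts {'after' if newer else 'before'} {previous}")
```

Under this numeric comparison, 0.10.0 sorts before 0.90, while 0.91, 0.95, 0.100, and 1.0.0 all sort after it.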
@CodingCat How about 0.100 or 0.95? "Preview" sounds like the 1.0.0 release is just around the corner, but we have quite a few major features (PySpark) on the line. |
Does it support weight xgboost ? |
I am not worried about the impression 1.0.0 makes on users. The Spark 3.0 preview is being released this month, but the formal release is next April (around Spark Summit), maybe.
|
@CodingCat At least from the point of view of xgboost4j-spark, that 1.0.0 preview won't be useful for most people, as almost no one is running Spark on Scala 2.12. Additionally, you can't easily get a compiled binary, as https://spark.apache.org/downloads.html doesn't distribute compiled versions of Spark for 2.12 with the Hadoop binaries included. |
Then we should release nothing?
|
@CodingCat @thesuperzapper I thought #4574 would allow for compiling XGBoost with both Scala 2.11 and 2.12? In that case, we should compile XGBoost with 2.11 and upload the JAR to Maven. |
Removed:
I don't think we can get there right now. |
@thesuperzapper It will become easier to develop against the Apache Spark master (3.0) branch and Scala 2.12 after Spark releases a 3.0 preview (targeted pretty soon this fall). I'd expect a much bigger shift to Scala 2.12 in the Spark community after the final 3.0 release (targeted early 2020), but you're right that there isn't a ton of 2.12 usage now. I created #4926 to solicit discussion around the upcoming Spark release. |
#4574 does not allow cross-compilation, so someone may compile a JAR with 2.11 and upload it to Maven. |
Does it support Multi objective learning? |
@douglasren Sadly no. Could you start a new issue so we can discuss it? The term "multi objective" can mean different things depending on context: one objective function for multiple outputs, multiple objectives with one output, or multiple objectives with multiple outputs. |
I would like to cast my vote towards an interim release as well. |
Removed:
|
An interim release would be great as the macOS installation is still a pain right now |
Can we get documented support for learning to rank (pairwise) with XGBoost4J-Spark? Currently, there is no concrete guidance on how to specify training data. There's some confusion around partitioning by groupID and the training data needing to follow the same partition strategy, but it's quite vague. |
I'd like to cast my vote to an interim release as well. We're looking forward to the next version mostly for the missing value fix by @cpfarrell (see #4805). Is there a time estimate related to the next release (major or interim)? PS: @thesuperzapper we're using 2.11 and 2.12 and an interim release would be extremely helpful for us |
@hcho3 Can we create a release branch and have a week or so for testing? |
Yes! |
@hcho3 In addition to a branch, we can also make an official release candidate on GitHub Releases so that the community can have more confidence to test it as well. |
This sounds awesome! Really looking forward to the next release. Let me know if we can help. We're definitely going to test it out at Yelp. |
I will cut a new branch |
Release candidate is now available for Python: #5253. You can try it today by running pip3 install xgboost==1.0.0rc1 |
1.0.0 is now out:
|
@dmlc/xgboost-committer please add your items here by editing this post. Let's ensure that
each item has to be associated with a ticket
major design/refactoring work is associated with an RFC before committing the code
blocking issues must be marked as blocking
breaking changes must be marked as breaking
for other contributors who have no permission to edit the post, please comment here about what you think should be in 1.0.0
I have created three new types of labels: 1.0.0, Blocking, Breaking