Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop win rebased #880

Closed
wants to merge 9 commits into from
Closed

Conversation

alex85k
Copy link
Contributor

@alex85k alex85k commented Jan 22, 2014

Pathces for issue #646

Hope something of this compatibity code could be useful
This one does compile on Linux too (checked on old RHEL with custom-built gcc 4.7.2 and libaries from sources)

@DennisOSRM
Copy link
Collaborator

Great, thanks for the update. I also got a Windows license.

@alex85k
Copy link
Contributor Author

alex85k commented Jan 22, 2014

Great!

By the way, on Linux 67 cucumber tests are failing. Now I am compiling unpatched develop version to compare.

@@ -810,7 +811,7 @@ class StaticRTree : boost::noncopyable {
found_a_nearest_edge = true;
} else
if( DoubleEpsilonCompare(current_perpendicular_distance, min_dist) &&
( 1 == abs(current_edge.id - result_phantom_node.edgeBasedNode ) ) &&
( DoubleEpsilonCompare((double) current_edge.id - result_phantom_node.edgeBasedNode , 1) ) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to double here doesn't make much sense as we are comparing integers here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a problematic place - (unsigned) - (unsigned) is causing compilation errors.
(abs is not compiling due to many possible overloads)

I'll replace it with id1==id2+1 || id1+1==id2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also producing incorrect results (no ide, why). Was fixed :)

@alex85k
Copy link
Contributor Author

alex85k commented Jan 22, 2014

We (mostly I :) ) have a problem. The patching introduced 67 failing tests, in clean develop branch only one fails.

I'll locate the problematic places and improve the patces.

@emiltin
Copy link
Contributor

emiltin commented Jan 22, 2014

thank god for tests :-)
however, note that the test suite is not complete. in particular, none of the program option stuff is covered, nor stuff like geometry encoding.

@emiltin
Copy link
Contributor

emiltin commented Jan 22, 2014

thank god for tests :-)
but keep in mind that the test suite is not complete. for example, none of the program option stuff is covered yet.

@emiltin
Copy link
Contributor

emiltin commented Jan 22, 2014

thank god for tests :-)
but keep in mind that the test suite is not complete, for example the program options stuff is not covered yet.

1 similar comment
@emiltin
Copy link
Contributor

emiltin commented Jan 22, 2014

thank god for tests :-)
but keep in mind that the test suite is not complete, for example the program options stuff is not covered yet.

@DennisOSRM
Copy link
Collaborator

@alex85k Which test is failing in Windows? Can you run the one test verbose and report the output?

@alex85k
Copy link
Contributor Author

alex85k commented Jan 22, 2014

Too many tests fail with incorrect result (not running error), must be something like signed/unsigned datatype mismatch
https://dl.dropboxusercontent.com/u/63393258/__log
https://dl.dropboxusercontent.com/u/63393258/fail.log
They pass correctly on untouched develop, except one (about timestamp, maybe partial incompatible libraries).

I am still unable to run tests on Windows (problem with launching/stopping/connecting service - occupied socket or something like that), there are results of running on Linux with changed code.

I am checking commits one-by-one to isolate bad change.

@alex85k
Copy link
Contributor Author

alex85k commented Jan 22, 2014

Only failing test for current develop is "Basic travel time, 1km scale" here is the log:
https://dl.dropboxusercontent.com/u/63393258/failing_test_on_develop.txt

@alex85k
Copy link
Contributor Author

alex85k commented Jan 22, 2014

Bad commit detected:
alex85k@f4fd2c0
Trying to fix the problem.

@emiltin
Copy link
Contributor

emiltin commented Jan 22, 2014

we should try to get the cucumber tests running on windows. can you create a gist with the command line output?

@alex85k
Copy link
Contributor Author

alex85k commented Jan 22, 2014

There are just timeout errors from routed (after patching .rb files to open exe files and avoid unix utilities).

@alex85k
Copy link
Contributor Author

alex85k commented Jan 22, 2014

I have re-published the fixed patches.

@DennisOSRM
Copy link
Collaborator

I have a lead on the single failing test. It comes down to floating point precision and actual differences in hardware, i.e. the tests pass on some processor/compiler combinations, but fail on others.

@alex85k
Copy link
Contributor Author

alex85k commented Jan 23, 2014

The test scripts were adapted to run on Windows (Linux testing is passing correctly, I checked)

Problem with timeout existed because the routing on my 2-core PC is running very slow on Windows - about 1 second per one viaroute request (8-core Linux server is about 4 times faster).

When the testing process will finish (~1hour), I'll post the duration and failing tests info.

@alex85k
Copy link
Contributor Author

alex85k commented Jan 23, 2014

Wow! The tests passed successfully on Windows! :)
(43 minutes vs 12m50s on Linux server)

(One test is failing on both Linux and Windows (I did not use last commit from develop branch).

@DennisOSRM
Copy link
Collaborator

Great job. The last failing test should also be ok, if you rebase onto latest develop branch

@alex85k
Copy link
Contributor Author

alex85k commented Jan 23, 2014

One more rebase is not a problem :) (the commits in this PR will soon change a little)

@alex85k
Copy link
Contributor Author

alex85k commented Jan 23, 2014

Yes! The last test passed.
Had to change two commits (ambiguos fabs call was already removed in develop, also used boost::math::isnan in one more file).

@alex85k
Copy link
Contributor Author

alex85k commented Jan 23, 2014

By the way, there are some problems with automatic tests on Travis CI (great feature!):
java.io.FileNotFoundException: /usr/share/java/postgis.jar

@DennisOSRM
Copy link
Collaborator

yeah, no idea why travis is failing. will have to investigate at a later point.

@alex85k
Copy link
Contributor Author

alex85k commented Jan 26, 2014

I investigated slow runnig of test on Windows. After many pring-based debuging, trivial answer was found:
if IPV6 is enabled by fdefault, connections to localhost become extremely slow. (it trues IPV6, then use IPV4 as fallback). Replacing address with 127.0.0.1 helped a lot!

All the tests on Windows passed in 12 minutes only (Core 2 Duo E6400, empty test/cache folder)
When cache folder is generated, re-testing takes only 3min 40s!

(I have also used datastore-based launcher from #889 , but inserted osrm-routed shutdown code to avoid strange errors )

@alex85k
Copy link
Contributor Author

alex85k commented Jan 26, 2014

My updated testing sources are here: https://github.com/alex85k/Project-OSRM/commits/experimental/cuke_datastore

@DennisOSRM
Copy link
Collaborator

@alex85k nice progress. Could you write up a gist on how to install dependencies on windows?

@emiltin
Copy link
Contributor

emiltin commented Jan 26, 2014

well done.
test re-run in 1:07 on my Mac. With the osrm-datastore, they re-run in 20s.

@emiltin
Copy link
Contributor

emiltin commented Jan 26, 2014

your branch now includes my commit tat uses osrm-datastore during test. why is it requried to shutdown osrm-routed each time on windows?

@alex85k
Copy link
Contributor Author

alex85k commented Jan 26, 2014

your branch now includes my commit tat uses osrm-datastore during test. why is it requried to shutdown osrm-routed each time on windows?

Yes, I included your datastore-runner (branching result isn git is always some random surprice for me) for demonstration, mine did not work well enough :) (iI did not know about stici variables and correct usinf ifs in Ruby).
When I did not shutdown orsm-datastore, I get multiple errors, especially on second run (when delays are smaller).
Some tests pass, but it is needed to fix or tune something in main code or settings to run tetsts successfully (I can not also run them successfully on Linux).

@alex85k nice progress. Could you write up a gist on how to install dependencies on windows?

Thank you. I have batch files that downloads and install all dependencies, will publish them as soon as possible.

@DennisOSRM
Copy link
Collaborator

Cool. Let's put this batch file in the repo, too

@alex85k
Copy link
Contributor Author

alex85k commented Jan 26, 2014

Here are the instructions and batch files:
https://gist.github.com/alex85k/8637217
Basic libs sources: https://dl.dropboxusercontent.com/u/63393258/Build_OSRM.zip

At this point building is still semi-manual (you need to download some libraries, edit some path in the file and install problematic libraries like stxxl), maybe after your experience with building it can be automated and placed to the repository.

Do not hesitate to remind if I forgot something...

@emiltin
Copy link
Contributor

emiltin commented Jan 26, 2014

i don't have a windows dev setup. but you you try adding a delay after ruby calls osrm-datastore, and see if that make tests pass.

@alex85k
Copy link
Contributor Author

alex85k commented Jan 27, 2014

I have inserted 7 seconds delay before and after osrm-datastore and also adding --threads 1.
The errors are much more rare now, but they exist (however, I did not catch "different tables", only sometimes strange server stops near the end of the .feature file processing.

--threads 1 without delay does not help.
3 or 5 second delay is not enough - I get Table::Different rather often and server stops at random query.
However, this is Windows - shared memory blocks an even named mutexes are emulated via filesystem (Boost decides to do it to imitate POSIX behavior). I can see 1-byte no_running_queries, pending_update, query and update files in boost_interprocess folder.

@emiltin
Copy link
Contributor

emiltin commented Jan 27, 2014

sevend seconds delay sounds pretty obnoxious :-) it sounds to be like something in the launch mechanics is broken.
do you see the same behaviour when you run osrm-datastore and osrm-routed manually?

@emiltin
Copy link
Contributor

emiltin commented Jan 27, 2014

i suggest leaving out the code for using osrm-datastore with cucumber from this PR, and handle it in a separate one.

@DennisOSRM
Copy link
Collaborator

yeah, let's keep this separate.

@alex85k
Copy link
Contributor Author

alex85k commented Jan 27, 2014

OK, this issue is about Windows.compatibility, not about testing. However, I had to reimplement shared-memory base class on Windows (without any experience in interprocess communication, except MPI), so please check that part well before including (and without that part it can not compile) :)

About manual running: if I start osrm-routed manually one time and test with reloading osrm-datastore, the results are the same (small delays - more errors, big delays - rare errors, mostly falling server).

If I reload the data manually, then server seems to work correctly. But I remember one run, when manual loading data from different pbf extract with osrm-datastore gave me the segm. fault of osrm-routed on next query (on Windows).

@alex85k
Copy link
Contributor Author

alex85k commented Jan 27, 2014

I also tried enabling all possible logging (removed output redirecting to file, added debugging output to routed on memory reload). The order of calls seems to be correct even for failing tests: loading to memory finished, then delay, then first request triggers memory reload before answering, then next requests are processed. If osrm-routed fails, it does this somewhere inside the request processing, not always on the last request of the test.

@emiltin
Copy link
Contributor

emiltin commented Jan 27, 2014

it's probably a problem with your reimplementation of shared mem then? @DennisOSRM do you want full windows compatibility including shared mem before merging this branch, or you prefer to merge as-is and fix the shared mem stuff later?

@alex85k
Copy link
Contributor Author

alex85k commented Jan 28, 2014

I am afraid unpatched develop version may have datastore-testing errors too in Linux environment (I had the same errors on RedHat and Ubuntu, but you may recheck to be sure).

@alex85k
Copy link
Contributor Author

alex85k commented Apr 7, 2014

I have updated path to the recent OSRM 0.3.8 and created new pull-request
Maybe this time it could be considered :)

@DennisOSRM
Copy link
Collaborator

@alex85k would you like to update this PR or would you go for an new PR?

@DennisOSRM
Copy link
Collaborator

Closing in favor of #979

@DennisOSRM DennisOSRM closed this Apr 7, 2014
@alex85k alex85k deleted the develop-win-rebased branch October 21, 2015 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants