Keep guava in step with Scio #72

Closed
idreeskhan opened this issue Apr 12, 2018 · 13 comments

@idreeskhan
Contributor

idreeskhan commented Apr 12, 2018

We are currently on Guava 21.0 while Scio is on 20.0. We should investigate a downgrade and any issues it might introduce. I'm concerned the version mismatch may cause unexpected errors at runtime.

@tkhduracell

Might this be the cause of java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkNotNull(Ljava/lang/Object;Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/Object; that I get when running the latest 0.3.0 jar?

@idreeskhan idreeskhan self-assigned this Apr 13, 2018
@idreeskhan idreeskhan added the bug label Apr 13, 2018
@idreeskhan
Contributor Author

That seems to be the case. Fixing now.

@idreeskhan
Contributor Author

Closing; please try 0.3.1 and re-open if there are issues.

@tkhduracell

tkhduracell commented Apr 13, 2018

@idreeskhan Still reproducible with the 0.3.1.jar

[ForkJoinPool-1-worker-5] ERROR org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2018-04-13T15:11:52.057Z: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkNotNull(Ljava/lang/Object;Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/Object;
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryAvroUtils.convertRequiredField(BigQueryAvroUtils.java:169)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryAvroUtils.getTypedCellValue(BigQueryAvroUtils.java:131)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryAvroUtils.convertGenericRecordToTableRow(BigQueryAvroUtils.java:115)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryAvroUtils.convertGenericRecordToTableRow(BigQueryAvroUtils.java:104)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TableRowParser.apply(BigQueryIO.java:381)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TableRowParser.apply(BigQueryIO.java:374)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:204)
	at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase$1.apply(BigQuerySourceBase.java:198)
	at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:581)
	at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:223)
	at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:470)
	at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:465)
	at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:261)
	at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:594)
	at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:347)
	at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:183)
	at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:148)
	at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:68)
	at com.google.cloud.dataflow.worker.DataflowWorker.executeWork(DataflowWorker.java:330)
	at com.google.cloud.dataflow.worker.DataflowWorker.doWork(DataflowWorker.java:302)
	at com.google.cloud.dataflow.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:251)
	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

java -cp ratatool-0.3.1.jar com.spotify.ratatool.diffy.BigDiffy --runner=DataflowRunner \
  --region=europe-west1 \
  --tempLocation=gs://<bucket>/tmp/ \
  --stagingLocation=gs://<bucket>/tmp/ \
  --gcsTempLocation=gs://<bucket>/tmp/ \
  --mode=bigquery --key=user_id \
  --lhs=project:dataset.table --rhs=project:dataset.table \
  --output=gs://<bucket>/table_result
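
For anyone debugging the same error, here is a minimal sketch of a reflection probe that reports where a class is loaded from and whether a given overload exists. The class name `ClasspathProbe` and its helper methods are hypothetical and not part of ratatool. It only inspects the classpath of the local JVM, so it can show what a jar bundles but not what the Dataflow workers actually load.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of ratatool, Scio, or Beam.
public class ClasspathProbe {

    /** True if the class is loadable and has a public method with exactly these parameter types. */
    public static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // initialize=false: look the class up without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            cls.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Location (jar or directory) the class was loaded from, if the JVM exposes it. */
    public static String loadedFrom(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "unknown (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String cls = "com.google.common.base.Preconditions";
        // Erased form of the descriptor in the error: (Object, String, Object) -> Object.
        boolean present = hasMethod(cls, "checkNotNull", Object.class, String.class, Object.class);
        System.out.println(cls + " loaded from: " + loadedFrom(cls));
        System.out.println("checkNotNull(Object, String, Object) present: " + present);
    }
}
```

Assuming the compiled class sits in the current directory, it could be run against the jar with something like `java -cp ratatool-0.3.1.jar:. ClasspathProbe`. If the overload is reported missing, the Guava bundled in that jar is a likely suspect.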

@idreeskhan
Contributor Author

idreeskhan commented Apr 13, 2018

Ack

@idreeskhan idreeskhan reopened this Apr 13, 2018
@idreeskhan
Contributor Author

Seems it can be removed entirely. I'm running some additional tests to verify I didn't break anything else, but it definitely fixes this case.

@tkhduracell

Still reproducible on ratatool-0.3.2-SNAPSHOT. :(
@idreeskhan

@idreeskhan
Contributor Author

idreeskhan commented Apr 13, 2018

Hmm worked for me locally @tkhduracell

with --mode=bigquery and --runner=DataflowRunner

@tkhduracell

Have you tried the public jar?

@idreeskhan
Contributor Author

idreeskhan commented Apr 13, 2018

Local Build:

shasum -a 256 ratatool-0.3.2-SNAPSHOT.jar
364f48303d9298458c8409ebda8d9253736a52fcf086fea7d292ddd650a8971c  

Uploaded Jar:

shasum -a 256 ~/Downloads/ratatool-0.3.2-SNAPSHOT.jar
364f48303d9298458c8409ebda8d9253736a52fcf086fea7d292ddd650a8971c 

@idreeskhan
Contributor Author

idreeskhan commented Apr 15, 2018

This is solely a fat jar problem; it should already be fixed on master for other uses. The fat jar exists mostly for ease of use, but sbt pack could potentially provide a solution instead. That may complicate the Homebrew formula, which requires some investigation, but it should prevent these sorts of issues in the future.
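
One way to see what a fat jar actually bundles is to list its class entries under a package prefix. The sketch below is a hypothetical helper (`JarScan` is not part of ratatool). It assumes Guava classes live under `com/google/common/`, which matches the class named in the stack trace above.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper for inspecting a fat jar; not part of ratatool.
public class JarScan {

    /** Returns the sorted names of all .class entries in the jar whose path starts with prefix. */
    public static List<String> entriesWithPrefix(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.startsWith(prefix) && name.endsWith(".class"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarScan <path-to-jar>");
            System.exit(1);
        }
        List<String> guava = entriesWithPrefix(args[0], "com/google/common/");
        System.out.println(guava.size() + " class entries under com/google/common/");
        // Print only the Preconditions entries, since that is the class in the error.
        guava.stream()
             .filter(name -> name.contains("base/Preconditions"))
             .forEach(System.out::println);
    }
}
```

A non-zero count only shows that some copy of Guava is bundled; it does not say which version. Pairing it with a reflection check on the specific overload gives a clearer picture.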

@idreeskhan
Contributor Author

idreeskhan commented Apr 15, 2018

I've uploaded a gzipped pack folder that works for me end-to-end for BigDiffy and BigSampler with BQ. I will also PR changes to the public Homebrew formula to fix it there. For now it can be run by executing bin/big-diffy; the docs will need updating before a full release. We should potentially also address #67 to simplify the packed installation.

SHA256 is 7684a016e3619e1860cfbb08ae0a3932b5c42d2c8d698927fc9d6fe59be083fb

@idreeskhan
Contributor Author

Closing this issue for now, re-open if the packed folder does not address your issues.
