Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Can not instantiate LogStore class: io.delta.storage.HDFSLogStore #3299

Closed
2 of 8 tasks
abhishekrb19 opened this issue Jun 24, 2024 · 1 comment · Fixed by #3304
Closed
2 of 8 tasks

[BUG] Can not instantiate LogStore class: io.delta.storage.HDFSLogStore #3299

abhishekrb19 opened this issue Jun 24, 2024 · 1 comment · Fixed by #3304
Labels
bug Something isn't working

Comments

@abhishekrb19
Copy link
Contributor

Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Describe the problem

Steps to reproduce

Easy to reproduce using a simple ingest query to ingest a Delta table from the Druid-Delta connector. After upgrading the Delta Kernel dependency from 3.1.0 to 3.2.0, we noticed that ingestion from existing Delta tables setup locally stopped working. The stacktrace is:

java.lang.IllegalArgumentException: Can not instantiate `LogStore` class: io.delta.storage.HDFSLogStore
	at io.delta.kernel.defaults.internal.DefaultEngineErrors.canNotInstantiateLogStore(DefaultEngineErrors.java:26)
	at io.delta.kernel.defaults.internal.logstore.LogStoreProvider.getLogStore(LogStoreProvider.java:88)
	at io.delta.kernel.defaults.engine.DefaultFileSystemClient.listFrom(DefaultFileSystemClient.java:76)
	at io.delta.kernel.internal.snapshot.SnapshotManager.listFrom(SnapshotManager.java:228)
	at io.delta.kernel.internal.snapshot.SnapshotManager.listFromOrNone(SnapshotManager.java:253)
	at io.delta.kernel.internal.snapshot.SnapshotManager.listDeltaAndCheckpointFiles(SnapshotManager.java:294)
	at io.delta.kernel.internal.snapshot.SnapshotManager.getLogSegmentForVersion(SnapshotManager.java:466)
	at io.delta.kernel.internal.snapshot.SnapshotManager.getLogSegmentFrom(SnapshotManager.java:415)
	at io.delta.kernel.internal.snapshot.SnapshotManager.getSnapshotAtInit(SnapshotManager.java:352)
	at io.delta.kernel.internal.snapshot.SnapshotManager.buildLatestSnapshot(SnapshotManager.java:113)
	at io.delta.kernel.internal.TableImpl.getLatestSnapshot(TableImpl.java:55)
	at org.apache.druid.delta.input.DeltaInputSource.createSplits(DeltaInputSource.java:210)
	at org.apache.druid.msq.input.external.ExternalInputSpecSlicer.sliceSplittableInputSource(ExternalInputSpecSlicer.java:119)

The underlying issue seems to be that it's unable to instantiate LogStore, which is coming from a new code path added in https://github.com/delta-io/delta/pull/2770/files.

After adding the dependency to the Druid-Delta connector, I was still seeing the same error:

     <dependency>
      <groupId>io.delta</groupId>
      <artifactId>delta-storage</artifactId>
      <version>${delta-kernel.version}</version>
    </dependency>

Thanks to @vkorukanti who swiftly helped provide a workaround by adding the following line before making calls to the Kernel APIs: Thread.currentThread().setContextClassLoader(LogStore.class.getClassLoader()); before calls to the Kernel API

It's unclear if this workaround is needed for all the Kernel APIs or just specifically when getting the snapshot information - i.e., table.getLatestSnapshot(engine)

Observed results

Ingestion fails.

Further details

The slack thread has some more information: https://delta-users.slack.com/archives/CJ70UCSHM/p1719190251166299

Environment information

  • Delta Lake version: 3.20
  • Spark version: 3.5.0

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
@abhishekrb19 abhishekrb19 added the bug Something isn't working label Jun 24, 2024
abhishekrb19 added a commit to abhishekrb19/incubator-druid that referenced this issue Jun 24, 2024
With the upgrade to Kernel 3.2.0, the Druid Delta connector extension
isn't able to ingest from Delta tables successfully.

Please see delta-io/delta#3299

The underlying problem seems to be coming from
https://github.com/delta-io/delta/blob/master/kernel/kernel-defaults/src/main/java/io/delta/kernel/defaults/internal/logstore/LogStoreProvider.java#L99

This patch is a workaround to setting the thread class loader explictly.
The Kernel community may consider a fix in the next release as it's affected another
connector as well.
abhishekrb19 added a commit to abhishekrb19/incubator-druid that referenced this issue Jun 24, 2024
With the upgrade to Kernel 3.2.0, the Druid Delta connector extension
isn't able to ingest from Delta tables successfully.

Please see delta-io/delta#3299

The underlying problem seems to be coming from
https://github.com/delta-io/delta/blob/master/kernel/kernel-defaults/src/main/java/io/delta/kernel/defaults/internal/logstore/LogStoreProvider.java#L99

This patch is a workaround to setting the thread class loader explictly.
The Kernel community may consider a fix in the next release as it's affected another
connector as well.
@vkorukanti
Copy link
Collaborator

Hi @abhishekrb19, I think we can just use the Class.forName(className) here instead of using the Class.forName(className, classloader from current thread).

Let me know if you want to make a PR to remove those.

abhishekrb19 added a commit to abhishekrb19/delta that referenced this issue Jun 24, 2024
vkorukanti pushed a commit that referenced this issue Jun 24, 2024
…3304)

## Description

Look for the `LogStore` class more broadly in the class loader than just
the thread local's class loader. The thread context class loader
requires the Thread local variables to have the `LogStore` in it, but
this class loader may not have all the dependencies wired up.

## How was this patch tested?
Not tested yet.

Fixes #3299.
abhishekrb19 added a commit to apache/druid that referenced this issue Jun 25, 2024
… Delta table ingestion (#16648)

* Workaround to ingesting from Delta table in 3.2.0.

With the upgrade to Kernel 3.2.0, the Druid Delta connector extension
isn't able to ingest from Delta tables successfully.

Please see delta-io/delta#3299

The underlying problem seems to be coming from
https://github.com/delta-io/delta/blob/master/kernel/kernel-defaults/src/main/java/io/delta/kernel/defaults/internal/logstore/LogStoreProvider.java#L99

This patch is a workaround to setting the thread class loader explictly.
The Kernel community may consider a fix in the next release as it's affected another
connector as well.

* Address review comment: clear the CL after the Thread CL is set.
vkorukanti pushed a commit to vkorukanti/delta that referenced this issue Aug 30, 2024
…`. (delta-io#3304)

Look for the `LogStore` class more broadly in the class loader than just
the thread local's class loader. The thread context class loader
requires the Thread local variables to have the `LogStore` in it, but
this class loader may not have all the dependencies wired up.

Not tested yet.

Fixes delta-io#3299.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants