bug: unable to merge branch with large data change #4448

Closed
wvandeun opened this issue Sep 24, 2024 · 0 comments · Fixed by #4818
Labels: priority/2 (This issue stalls work on the project or its dependents, it's a blocker for a release); type/bug (Something isn't working as expected)

Comments

@wvandeun
Contributor

Component

API Server / GraphQL

Infrahub version

0.16.0

Current Behavior

Merging a branch that contains a large data change (40 devices, 4000 interfaces) fails with the following error message:

{
  "name": "ApolloError",
  "graphQLErrors": [
    {
      "message": "Unable to connect to the database",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "path": [
        "BranchMerge"
      ]
    }
  ],
  "protocolErrors": [],
  "clientErrors": [],
  "networkError": null,
  "message": "Unable to connect to the database"
}

The error only appears after the merge has been running for some time. The database logs suggest that the Neo4j server has run out of Java heap space:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "req-rsp-timeout-task"
Exception in thread "HTTP-Dispatcher" java.lang.OutOfMemoryError: Java heap space
Exception in thread "neo4j.ThroughputMonitor-1" java.lang.OutOfMemoryError: Java heap space
Uncaught error from thread [cc-discovery-actor-system-akka.io.pinned-dispatcher-8]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[cc-discovery-actor-system]
java.lang.OutOfMemoryError: Java heap space
Uncaught error from thread [cc-discovery-actor-system-scheduler-1]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[cc-discovery-actor-system]
java.lang.OutOfMemoryError: Java heap space
Uncaught error from thread [cc-discovery-actor-system-akka.actor.internal-dispatcher-33]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[cc-discovery-actor-system]
java.lang.OutOfMemoryError: Java heap space
2024-09-24 15:46:47.127+0000 INFO  Neo4j Server shutdown initiated by request
2024-09-24 15:46:47.138+0000 INFO  Stopping...
ERROR StatusConsoleListener An exception occurred processing Appender rotatingWriter.neo4j.database.neo4j.db.query.execution.pipelined.failure.csv
 org.apache.logging.log4j.core.appender.AppenderLoggingException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:165)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
        at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
        at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:560)
        at org.apache.logging.log4j.core.config.DefaultReliabilityStrategy.log(DefaultReliabilityStrategy.java:63)
        at org.apache.logging.log4j.core.Logger.log(Logger.java:163)
        at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2168)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2122)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2105)
        at org.apache.logging.log4j.spi.AbstractLogger.printf(AbstractLogger.java:2095)
        at org.neo4j.logging.log4j.RotatingLogFileWriter.printf(RotatingLogFileWriter.java:71)
        at com.neo4j.metrics.output.RotatableCsvReporter.report(RotatableCsvReporter.java:234)
        at com.neo4j.metrics.output.RotatableCsvReporter.reportMeter(RotatableCsvReporter.java:180)
        at com.neo4j.metrics.output.RotatableCsvReporter.report(RotatableCsvReporter.java:144)
        at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:277)
        at com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:206)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.OutOfMemoryError: Java heap space
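
To see at a glance which threads reported the heap exhaustion, the log excerpt above can be summarized with a short script. This is a minimal sketch and not part of the original report: the helper name `summarize_oom` and the `sample` list are illustrative, and the regular expressions only cover the two line shapes that appear in this excerpt.

```python
import re
from collections import Counter

# Line shapes seen in the excerpt above:
#   Exception in thread "NAME" java.lang.OutOfMemoryError: Java heap space
#   Uncaught error from thread [NAME]: Java heap space, shutting down JVM ...
THREAD_PATTERNS = [
    re.compile(r'Exception in thread "([^"]+)" java\.lang\.OutOfMemoryError'),
    re.compile(r"Uncaught error from thread \[([^\]]+)\]: Java heap space"),
]


def summarize_oom(lines):
    """Count heap-space errors per thread name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for pattern in THREAD_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # a line is attributed to at most one thread
    return counts


# A few lines copied from the log excerpt in this issue.
sample = [
    'Exception in thread "HTTP-Dispatcher" java.lang.OutOfMemoryError: Java heap space',
    'Exception in thread "neo4j.ThroughputMonitor-1" java.lang.OutOfMemoryError: Java heap space',
    "Uncaught error from thread [cc-discovery-actor-system-scheduler-1]: Java heap space, "
    "shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for "
    "ActorSystem[cc-discovery-actor-system]",
    "java.lang.OutOfMemoryError: Java heap space",  # no thread name, not counted
]

oom_counts = summarize_oom(sample)
for thread, count in sorted(oom_counts.items()):
    print(f"{thread}: {count}")
```

Lines that carry no thread name, such as the bare `java.lang.OutOfMemoryError` lines, are skipped on purpose so each error is only counted once.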

After this failure, the database no longer seems to restart, and therefore neither does the application server.

This happens on a system that meets the recommended hardware requirements (in this case 12 CPUs and 16 GB of RAM).

Expected Behavior

The merge operation should succeed!

Steps to Reproduce

  • load an instance of Infrahub with the demo schema: inv dev.start demo.load-infra-schema
  • create a branch: infrahubctl branch create test
  • run the following script to load the dataset into the branch test (40 devices, 100 interfaces each): infrahubctl run <script.py> num_devices=40 --branch test
import logging

from infrahub_sdk import InfrahubClient


async def run(client: InfrahubClient, log: logging.Logger, branch: str, num_devices: int = 50) -> None:
    # Create (or update) the site that all test devices belong to.
    site = await client.create("LocationSite", name="atl1")
    await site.save(allow_upsert=True)

    # The value may arrive as a string from the command line, so coerce it.
    num_devices = int(num_devices)

    device_batch = await client.create_batch()
    interface_batch = await client.create_batch()

    # Queue one save per device.
    for i in range(num_devices):
        device = await client.create("InfraDevice", name=f"atl1-test{i}", site=site, type="testing")
        device_batch.add(task=device.save, node=device, allow_upsert=True)
        log.info(f"Added device {device.name.value}")

    # Execute the device batch and keep each saved node in the local store,
    # keyed by name, so the interfaces below can reference it.
    async for node, result in device_batch.execute():
        print(f"device {node.name.value} was created in Infrahub successfully")
        client.store.set(key=node.name.value, node=node)

    # Queue 100 L2 interfaces per device.
    for i in range(num_devices):
        for j in range(100):
            interface = await client.create("InfraInterfaceL2", name=f"Ethernet{j}", l2_mode="Access", speed=10000, device=client.store.get(key=f"atl1-test{i}"))
            interface_batch.add(task=interface.save, node=interface, allow_upsert=True)
            log.info(f"  Added interface {interface.name.value} for device {interface.device.peer.name.value}")

    async for node, result in interface_batch.execute():
        print(f"interface {node.name.value} {node.device.peer.name.value} was created in Infrahub successfully")
  • go to the branch detail page for the test branch
  • merge the branch
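
The last step can probably also be reproduced without the UI by sending the merge mutation directly. The sketch below only builds the GraphQL request body and does not send it. It assumes the mutation is named `BranchMerge` (the name shown in the `path` of the error above) and accepts a `data: {name: ...}` argument with an `ok` field in the response; the argument shape, the selected field, and the helper name `build_branch_merge_payload` are assumptions to check against the instance's GraphQL schema.

```python
import json

# Assumed mutation shape; verify against the GraphQL schema of your instance.
BRANCH_MERGE_MUTATION = (
    "mutation ($name: String!) { BranchMerge(data: {name: $name}) { ok } }"
)


def build_branch_merge_payload(branch: str) -> str:
    """Return the JSON body for a merge request on `branch` (hypothetical helper)."""
    return json.dumps({"query": BRANCH_MERGE_MUTATION, "variables": {"name": branch}})


payload = build_branch_merge_payload("test")
print(payload)
```

The resulting body would be sent as an HTTP POST to the GraphQL endpoint of the Infrahub API server, with whatever authentication the instance requires.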

Additional Information

No response

@wvandeun wvandeun added type/bug Something isn't working as expected group/backend Issue related to the backend (API Server, Git Agent) labels Sep 24, 2024
@exalate-issue-sync exalate-issue-sync bot added priority/2 This issue stalls work on the project or its dependents, it's a blocker for a release and removed group/backend Issue related to the backend (API Server, Git Agent) labels Sep 24, 2024
@exalate-issue-sync exalate-issue-sync bot added this to the Infrahub - 0.16.2 milestone Sep 24, 2024
@exalate-issue-sync exalate-issue-sync bot added the state/planned This issue is planned to be worked on in an upcoming release. label Oct 1, 2024
@exalate-issue-sync exalate-issue-sync bot removed the state/planned This issue is planned to be worked on in an upcoming release. label Nov 5, 2024