
[BUG] Missing performance analyzer configuration files #3042

Closed
guillaume-alvarez opened this issue Jan 4, 2023 · 9 comments · Fixed by #3066
Labels: bug (Something isn't working), docker

Comments

@guillaume-alvarez

Describe the bug
When using the OpenSearch 2.3.0 or 2.4.1 Docker image, /usr/share/opensearch/logs/performance-analyzer.log contains errors about missing configuration files:

11:18:56.798 [rca-controller] ERROR org.opensearch.performanceanalyzer.rca.RcaController - Error reading file /usr/share/opensearch/data/rca_enabled.conf
java.nio.file.NoSuchFileException: /usr/share/opensearch/data/rca_enabled.conf
....
11:18:57.402 [pa-reader] ERROR org.opensearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/opensearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/opensearch/data/batch_metrics_enabled.conf

To Reproduce
Steps to reproduce the behavior:

  1. Start OpenSearch 2.4.1 with docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:latest
  2. Open a shell inside the container.
  3. Execute tail -f /usr/share/opensearch/logs/performance-analyzer.log
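
The steps above can be sketched as a small script. This is only an illustration: the container name `os-pa-repro`, the 30-second wait, and both function names are assumptions and do not come from the report. The `docker run` arguments and the log path are the ones quoted in the steps.

```shell
#!/bin/sh
# Sketch only: the container name and the 30-second wait are assumptions,
# not values taken from the report.

# Print each distinct path reported as missing (NoSuchFileException) on stdin.
missing_conf_paths() {
  grep -o 'NoSuchFileException: [^ ]*' | sed 's/^NoSuchFileException: //' | sort -u
}

# Start a single-node container, wait, then list the missing files from the PA log.
repro_pa_errors() {
  name=${1:-os-pa-repro}   # hypothetical container name
  docker run -d --name "$name" -p 9200:9200 -p 9600:9600 \
    -e "discovery.type=single-node" opensearchproject/opensearch:latest
  sleep 30                 # assumed startup delay
  docker exec "$name" cat /usr/share/opensearch/logs/performance-analyzer.log \
    | missing_conf_paths
}
```

Calling `repro_pa_errors` should list the configuration files the log complains about; `missing_conf_paths` can also be fed a saved copy of the log on standard input.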

Expected behavior
No error with default conf.

Plugins
No specific configuration.
Performance-analyzer is enabled by default.

Screenshots
logs:

11:18:56.798 [rca-controller] ERROR org.opensearch.performanceanalyzer.rca.RcaController - Error reading file /usr/share/opensearch/data/rca_enabled.conf
java.nio.file.NoSuchFileException: /usr/share/opensearch/data/rca_enabled.conf
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:380) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:432) ~[?:?]
	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422) ~[?:?]
	at java.nio.file.Files.newInputStream(Files.java:160) ~[?:?]
	at java.util.Scanner.<init>(Scanner.java:718) ~[?:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.lambda$readRcaEnabledFromConf$1(RcaController.java:344) ~[performance-analyzer-rca-2.3.0.0.jar:?]
	at org.opensearch.performanceanalyzer.core.Util.lambda$invokePrivileged$1(Util.java:57) ~[performance-analyzer-rca-2.3.0.0.jar:?]
	at java.security.AccessController.doPrivileged(AccessController.java:318) ~[?:?]
	at org.opensearch.performanceanalyzer.core.Util.invokePrivileged(Util.java:53) ~[performance-analyzer-rca-2.3.0.0.jar:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.readRcaEnabledFromConf(RcaController.java:342) ~[performance-analyzer-rca-2.3.0.0.jar:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.run(RcaController.java:300) ~[performance-analyzer-rca-2.3.0.0.jar:?]
	at org.opensearch.performanceanalyzer.PerformanceAnalyzerApp.lambda$startRcaTopLevelThread$0(PerformanceAnalyzerApp.java:173) ~[performance-analyzer-rca-2.3.0.0.jar:?]
	at org.opensearch.performanceanalyzer.threads.ThreadProvider.lambda$createThreadForRunnable$0(ThreadProvider.java:45) ~[performance-analyzer-rca-2.3.0.0.jar:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
11:18:57.042 [pa-reader] ERROR org.opensearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/opensearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/opensearch/data/batch_metrics_enabled.conf
Dec 31, 2022 11:18:57 AM org.jooq.tools.JooqLogger info
INFO: 
                                      
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@  @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@        @@@@@@@@@@
@@@@@@@@@@@@@@@@  @@  @@    @@@@@@@@@@
@@@@@@@@@@  @@@@  @@  @@    @@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@    @@  @@  @@@@  @@@@@@@@@@
@@@@@@@@@@    @@  @@  @@@@  @@@@@@@@@@
@@@@@@@@@@        @@  @  @  @@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@  @@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  Thank you for using jOOQ 3.10.8
                                      
11:18:57.402 [pa-reader] ERROR org.opensearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/opensearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/opensearch/data/batch_metrics_enabled.conf
11:18:59.554 [pa-reader] ERROR org.opensearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/opensearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/opensearch/data/batch_metrics_enabled.conf
Dec 31, 2022 11:19:02 AM org.jooq.tools.JooqLogger info
INFO: Single batch             : No bind variables have been provided with a single statement batch execution. This may be due to accidental API misuse

Host/Environment (please complete the following information):

  • OS: Windows+WSL2 or Ubuntu 22.10
  • Version: reproduced on opensearch 2.3.0 or 2.4.1

Additional context
Opened an issue for some other logs: opensearch-project/performance-analyzer#355

@guillaume-alvarez guillaume-alvarez added bug Something isn't working untriaged Issues that have not yet been triaged labels Jan 4, 2023
@guillaume-alvarez guillaume-alvarez changed the title [BUG] Errors with performance analyzer configuration [BUG] Missing performance analyzer configuration Jan 4, 2023
@guillaume-alvarez guillaume-alvarez changed the title [BUG] Missing performance analyzer configuration [BUG] Missing performance analyzer configuration files Jan 4, 2023
@saratvemulapalli
Member

Thanks @guillaume-alvarez for reaching out.
It looks like the tarball is fine but the Docker distribution has problems with the PA configuration.
Let's start with opensearch-build, which packages the Docker distribution, and move the issue to PA if needed.

@saratvemulapalli saratvemulapalli transferred this issue from opensearch-project/OpenSearch Jan 4, 2023
@bbarani
Member

bbarani commented Jan 5, 2023

[Triage] We will look into this issue and update our findings soon. CC: @prudhvigodithi

@prudhvigodithi prudhvigodithi removed the untriaged Issues that have not yet been triaged label Jan 5, 2023
@prudhvigodithi
Collaborator

prudhvigodithi commented Jan 5, 2023

Hey, I'm able to reproduce this with the Docker installation (in both 1.x and 2.x) but not with the tar installation. I suspect this is because of how the PA plugin is started, as recommended and as seen in the entrypoint. Inside the container, however, I can see both files: batch_metrics_enabled.conf and rca_enabled.conf.
@peterzhuamazon

@prudhvigodithi
Collaborator

prudhvigodithi commented Jan 5, 2023

Found that with the tar distribution (./opensearch-tar-install.sh) PA won't auto-start; the user has to start PA manually. Hence we don't see the error with tar, whereas with Docker PA is auto-started (unless we disable it).
But after a few seconds the java.nio.file.NoSuchFileException error no longer appears, and I'm able to curl the PA API:

curl -X GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
{"Disk_Utilization":"%","Cache_Request_Hit":"count","Segments_Memory":"B","Refresh_Time":"ms","ThreadPool_QueueLatency":"count","Merge_Time":"ms","ClusterApplierService_Latency":"ms","PublishClusterState_Latency":"ms","Cache_Request_Size":"B","LeaderCheck_Failure":"count","ThreadPool_QueueSize":"count","Sched_Runtime":"s/ctxswitch","Disk_ServiceRate":"MB/s","Heap_AllocRate":"B/s","Indexing_Pressure_Current_Limits":"B","Sched_Waittime":"s/ctxswitch","ShardBulkDocs":"count","Thread_Blocked_Time":"s/event","VersionMap_Memory":"B","Master_Task_Queue_Time":"ms","IO_TotThroughput":"B/s","Indexing_Pressure_Current_Bytes":"B","Indexing_Pressure_Last_Successful_Timestamp":"ms","Net_PacketRate6":"packets/s","Cache_Query_Hit":"count","IO_ReadSyscallRate":"count/s","Net_PacketRate4":"packets/s","Cache_Request_Miss":"count","ThreadPool_RejectedReqs":"count","Net_TCP_TxQ":"segments/flow","Master_Task_Run_Time":"ms","IO_WriteSyscallRate":"count/s","IO_WriteThroughput":"B/s","Refresh_Event":"count","Flush_Time":"ms","Heap_Init":"B","Indexing_Pressure_Rejection_Count":"count","CPU_Utilization":"cores","Cache_Query_Size":"B","Merge_Event":"count","DocValues_Memory":"B","Cache_FieldData_Eviction":"count","IO_TotalSyscallRate":"count/s","Net_Throughput":"B/s","Paging_RSS":"pages","AdmissionControl_ThresholdValue":"count","Indexing_Pressure_Average_Window_Throughput":"count/s","Cache_MaxSize":"B","IndexWriter_Memory":"B","Net_TCP_SSThresh":"B/flow","IO_ReadThroughput":"B/s","LeaderCheck_Latency":"ms","FollowerCheck_Failure":"count","TermVectors_Memory":"B","HTTP_RequestDocs":"count","Net_TCP_Lost":"segments/flow","GC_Collection_Event":"count","Sched_CtxRate":"count/s","AdmissionControl_RejectionCount":"count","Heap_Max":"B","ClusterApplierService_Failure":"count","PublishClusterState_Failure":"count","Merge_CurrentEvent":"count","Indexing_Buffer":"B","Bitset_Memory":"B","Norms_Memory":"B","Net_PacketDropRate4":"packets/s","Heap_Committed":"B","Net_PacketDropRate6":"packets/s","Thread_Blocked_Event":"count","GC_Collection_Time":"ms","Cache_Query_Miss":"count","Latency":"ms","Shard_State":"count","Thread_Waited_Event":"count","CB_ConfiguredSize":"B","ThreadPool_QueueCapacity":"count","CB_TrippedEvents":"count","Disk_WaitTime":"ms","Data_RetryingPendingTasksCount":"count","AdmissionControl_CurrentValue":"count","Flush_Event":"count","Net_TCP_RxQ":"segments/flow","Points_Memory":"B","Shard_Size_In_Bytes":"B","Thread_Waited_Time":"s/event","HTTP_TotalRequests":"count","ThreadPool_ActiveThreads":"count","Paging_MinfltRate":"count/s","Net_TCP_SendCWND":"B/flow","Cache_Request_Eviction":"count","Segments_Total":"count","FollowerCheck_Latency":"ms","Terms_Memory":"B","Heap_Used":"B","Master_ThrottledPendingTasksCount":"count","CB_EstimatedSize":"B","Indexing_ThrottleTime":"ms","StoredFields_Memory":"B","Master_PendingQueueSize":"count","Cache_FieldData_Size":"B","Paging_MajfltRate":"count/s","ThreadPool_TotalThreads":"count","ShardEvents":"count","Net_TCP_NumFlows":"count","Election_Term":"count"}[opensearch@a82e83b616c1 ~]

@peterzhuamazon @saratvemulapalli @guillaume-alvarez
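
Since the API only answers a few seconds after startup, a retry loop is one way to wait for it before checking the log. The `retry` helper below is a hypothetical sketch, not part of OpenSearch or the PA plugin; only the URL in the usage comment comes from the curl call above.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    if [ "$n" -lt "$attempts" ]; then sleep "$delay"; fi
    n=$((n + 1))
  done
  return 1
}

# Usage (attempt count and delay are arbitrary choices):
#   retry 10 3 curl -fsS localhost:9600/_plugins/_performanceanalyzer/metrics/units
```

The `-f` flag makes curl exit non-zero on HTTP errors, so the loop keeps trying until the endpoint returns a successful response or the attempts run out.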

@peterzhuamazon
Member

Both the TAR and Docker distributions are missing a few lines that the RPM has:

echo 'true' > ${data_dir}/rca_enabled.conf
echo 'true' > ${config_dir}/performance_analyzer_enabled.conf
echo 'true' > ${config_dir}/rca_enabled.conf

The rest is the same across TAR/DOCKER/RPM.
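
For anyone who wants the same defaults on TAR or Docker, the three RPM lines could be wrapped roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the two directories are passed as arguments rather than taken from the packaging scripts, and, unlike the RPM lines, existing files are left untouched.

```shell
#!/bin/sh
# Hypothetical helper: write VALUE to FILE only when FILE does not exist yet.
write_if_absent() {
  if [ ! -e "$1" ]; then echo "$2" > "$1"; fi
}

# enable_pa_defaults DATA_DIR CONFIG_DIR
# Creates the same three files as the RPM lines quoted above, each containing 'true'.
enable_pa_defaults() {
  data_dir=$1; config_dir=$2
  mkdir -p "$data_dir" "$config_dir"
  write_if_absent "$data_dir/rca_enabled.conf" 'true'
  write_if_absent "$config_dir/performance_analyzer_enabled.conf" 'true'
  write_if_absent "$config_dir/rca_enabled.conf" 'true'
}
```

In the Docker image the data directory would presumably be /usr/share/opensearch/data, the path that appears in the error messages.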

@peterzhuamazon peterzhuamazon self-assigned this Jan 9, 2023
@peterzhuamazon
Member

I will fix the issues coming from the install_tar script.

@peterzhuamazon
Member

Update: it seems that all the configs only take effect in the data folder.
https://github.com/opensearch-project/performance-analyzer/blob/6600a14899c00c7229ec7be79c034297f8577bac/src/main/java/org/opensearch/performanceanalyzer/config/PerformanceAnalyzerController.java#L28-L36

It also seems that the missing-file error is harmless, as the user can just add a config file if needed.

Will let the performance analyzer team confirm. Thanks.

@peterzhuamazon
Member

New development: it seems that even when these files do not exist, PA still uses the default values, although the logs report errors about the missing files.

This means we only added these lines for RPM at the time, because RPM requires users to start and stop the service through systemctl only, so we pre-enabled PA and RCA by default.

@peterzhuamazon
Member

peterzhuamazon commented Jan 11, 2023

PA team confirms these details here:

These config files (performance_analyzer_enabled.conf) should be present in the data folder, not in $OPENSEARCH_HOME/config or in $OPENSEARCH_HOME/config/opensearch-performance-analyzer.

We can ignore the logs for now, as batch_metrics will not affect other functionality. I assume this happens only during startup. After both the process and the plugin come up, if the file is still missing, we create it with the default value. If that's not the case and we are still seeing errors after a long time, we should address the issue and create this file with the default value.

Basically, if the files are not present at startup, PA logs the error once and then auto-creates each file with its defined default value.
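
The behavior described above (report the missing file once, fall back to a default, then create the file) can be illustrated with a small shell sketch. It is not the plugin's Java implementation; `read_flag` and its arguments are hypothetical names used only to show the pattern.

```shell
#!/bin/sh
# read_flag FILE DEFAULT
# Hypothetical illustration: print the value stored in FILE. If FILE is missing,
# report the error on stderr, create FILE with DEFAULT, and print DEFAULT.
read_flag() {
  file=$1; default=$2
  if [ -r "$file" ]; then
    cat "$file"
  else
    echo "Error reading file $file" >&2
    printf '%s\n' "$default" > "$file"
    printf '%s\n' "$default"
  fi
}
```

The first call on a missing file prints the error and the default; later calls read the newly created file and stay silent, which matches the errors appearing only around startup.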

We previously enabled it explicitly only on RPM because RPM is not like TAR, which is meant for easier installation and usage. On TAR, with the versions released at the time of writing, users can choose to create the file manually and restart the cluster.
