Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ bug ] 阿里云mongodb无法闯将索引 #1138

Closed
czxin788 opened this issue Jul 21, 2021 · 5 comments
Closed

[ bug ] 阿里云mongodb无法闯将索引 #1138

czxin788 opened this issue Jul 21, 2021 · 5 comments

Comments

@czxin788
Copy link
Contributor

czxin788 commented Jul 21, 2021

重现步骤

提交两个sql报错
image

image

单独提交一个sql就没问题:
image

版本信息

应用版本/分支:

部署方式:Docker、手工部署

@fancy-lee
Copy link
Contributor

你的版本是多少,实例是怎么配置的?django_q启动了吗?看一下后台日志报错?

@weideguo
Copy link
Collaborator

遇到同样的问题,应该是云数据库导致的,获取到的slave信息为云的ip与端口,不能直连

创建补丁 mongo.py.20211117.patch

diff --git a/sql/engines/mongo.py b/sql/engines/mongo.py
index 8e7143f..eef5e1a 100644
--- a/sql/engines/mongo.py
+++ b/sql/engines/mongo.py
@@ -295,9 +295,10 @@ class MongoEngine(EngineBase):
         slave_msg = self.exec_cmd(sql)
         if slave_msg.lower().find('undefined') < 0:
             sp_host = slave_msg.replace("\"", "").split(":")
-            self.host = sp_host[0]
-            self.port = int(sp_host[1])
-            return True
+            #self.host = sp_host[0]
+            #self.port = int(sp_host[1])
+            #return True
+            return False
         else:
             return False

应用补丁
patch -p1 < mongo.py.20211117.patch

@czxin788
Copy link
Contributor Author

我现在升级到archery 1.9.0,还是报上面这个错误,搜了一下issue,原来以前旧版本就有这个问题,应该把你的代码合并到master分支, @weideguo
image

@czxin788
Copy link
Contributor Author

czxin788 commented Sep 7, 2022

经过确认,产生该问题的原因,正是weideguo所说,只有云mongodb才会发生。
发生该问题的场景是,同时写两行和两行以上createIndex才会发生,别的dml语句不会发生这个问题。
在执行createIndex语句时,mongo.py脚本要通过备节点去获取表行数,如果表行数大于500万,会提醒不能在高峰期创建索引。

获取备节点ip的sql语句是:

var host=""; rs.status().members.forEach(function(item) {i=1; if (item.stateStr =="SECONDARY") \
        {host=item.name } }); print(host);

对于自建mongodb,是可以正常获取到备节点ip的端口的,如下name字段:

rs.status().members
	{
		"_id" : 5,
		"name" : "192.168.1.33:27018",
		"health" : 1,
		"state" : 2,
		"stateStr" : "SECONDARY",
		"uptime" : 371036,
		"optime" : {
			"ts" : Timestamp(1662533428, 1),
			"t" : NumberLong(15)
		},
		"optimeDurable" : {
			"ts" : Timestamp(1662533428, 1),
			"t" : NumberLong(15)
		},
		"optimeDate" : ISODate("2022-09-07T06:50:28Z"),
		"optimeDurableDate" : ISODate("2022-09-07T06:50:28Z"),
		"lastHeartbeat" : ISODate("2022-09-07T06:50:37.258Z"),
		"lastHeartbeatRecv" : ISODate("2022-09-07T06:50:37.256Z"),
		"pingMs" : NumberLong(0),
		"lastHeartbeatMessage" : "",
		"syncingTo" : "192.168.1.33:27017",
		"syncSourceHost" : "192.168.1.33:27017",
		"syncSourceId" : 4,
		"infoMessage" : "",
		"configVersion" : 7
	},
	{
		"_id" : 6,
		"name" : "192.168.1.33:27019",
		"health" : 1,
		"state" : 2,
		"stateStr" : "SECONDARY",
		"uptime" : 370935,
		"optime" : {
			"ts" : Timestamp(1662533428, 1),
			"t" : NumberLong(15)
		},
		"optimeDurable" : {
			"ts" : Timestamp(1662533428, 1),
			"t" : NumberLong(15)
		},
		"optimeDate" : ISODate("2022-09-07T06:50:28Z"),
		"optimeDurableDate" : ISODate("2022-09-07T06:50:28Z"),
		"lastHeartbeat" : ISODate("2022-09-07T06:50:37.258Z"),
		"lastHeartbeatRecv" : ISODate("2022-09-07T06:50:37.254Z"),
		"pingMs" : NumberLong(0),
		"lastHeartbeatMessage" : "",
		"syncingTo" : "192.168.1.33:27017",
		"syncSourceHost" : "192.168.1.33:27017",
		"syncSourceId" : 4,
		"infoMessage" : "",
		"configVersion" : 7

可是,对于云mongodb,获取到的name字段,就不是真实的备节点ip地址了,如下:

 rs.status().members
	{
		"_id" : 1,
		"name" : "secondaryNode",
		"health" : 1,
		"state" : 2,
		"stateStr" : "SECONDARY",
		"uptime" : 55036037,
		"optime" : {
			"ts" : Timestamp(1662533378, 1),
			"t" : NumberLong(1)
		},
		"optimeDurable" : {
			"ts" : Timestamp(1662533378, 1),
			"t" : NumberLong(1)
		},
		"optimeDate" : ISODate("2022-09-07T06:49:38Z"),
		"optimeDurableDate" : ISODate("2022-09-07T06:49:38Z"),
		"lastHeartbeat" : ISODate("2022-09-07T06:49:42.818Z"),
		"lastHeartbeatRecv" : ISODate("2022-09-07T06:49:41.451Z"),
		"pingMs" : NumberLong(0),
		"lastHeartbeatMessage" : "",
		"syncingTo" : "",
		"syncSourceHost" : "",
		"syncSourceId" : -1,
		"infoMessage" : "",
		"configVersion" : 1
	},
	{
		"_id" : 2,
		"name" : "hiddenNode",
		"health" : 1,
		"state" : 2,
		"stateStr" : "SECONDARY",
		"uptime" : 55036037,
		"optime" : {
			"ts" : Timestamp(1662533378, 1),
			"t" : NumberLong(1)
		},
		"optimeDurable" : {
			"ts" : Timestamp(1662533378, 1),
			"t" : NumberLong(1)
		},
		"optimeDate" : ISODate("2022-09-07T06:49:38Z"),
		"optimeDurableDate" : ISODate("2022-09-07T06:49:38Z"),
		"lastHeartbeat" : ISODate("2022-09-07T06:49:41.670Z"),
		"lastHeartbeatRecv" : ISODate("2022-09-07T06:49:42.383Z"),
		"pingMs" : NumberLong(0),
		"lastHeartbeatMessage" : "",
		"syncingTo" : "",
		"syncSourceHost" : "",

看到云mongodb获取到的name值是secondaryNode和hiddenNode,这样archery就办法连上备节点执行sql语句了。

为了解决云mongodb不能创建索引的问题,就按照 @weideguo 的方法,把sql/engines/mongo.py的get_salve函数:

 def get_slave(self):
        """获得从节点的port和host"""

        sql = """var host=""; rs.status().members.forEach(function(item) {i=1; if (item.stateStr =="SECONDARY") \
        {host=item.name } }); print(host);"""
        slave_msg = self.exec_cmd(sql)
        if slave_msg.lower().find("undefined") < 0:
            sp_host = slave_msg.replace('"', "").split(":")
            self.host = sp_host[0]
            self.port = int(sp_host[1])
            return True
        else:
            return False

改成

     def get_slave(self):
        """获得从节点的port和host"""

        sql = """var host=""; rs.status().members.forEach(function(item) {i=1; if (item.stateStr =="SECONDARY") \
        {host=item.name } }); print(host);"""
        slave_msg = self.exec_cmd(sql)
        # 如果是阿里云的云mongodb,就不获取备节点ip和端口了,因为获取不到真实的备节点ip,那就干脆直接用主节点来执行sql
        if slave_msg.strip() == 'SECONDARY' or slave_msg.strip() == 'hiddenNode':
            return False
        if slave_msg.lower().find("undefined") < 0:
            sp_host = slave_msg.replace('"', "").split(":")
            self.host = sp_host[0]
            logger.warning(self.host + '  5555')
            self.port = int(sp_host[1])
            return True
        else:
            return False

@Aaron199
Copy link

Aaron199 commented Apr 13, 2023

但是华为云获取的member信息是ip:port方式,这个ip在vpc网络是访问不到的,修复方法使用":"判断,华为云就不适用了
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants