Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUESTION] Holding pipeline objects in a container in RedisCluster mode #520

Closed
srgorti opened this issue Sep 24, 2023 · 9 comments
Closed

Comments

@srgorti
Copy link

srgorti commented Sep 24, 2023

When trying to trying to replace hiredis-vip with Redis-plus-plus in our app, am observing a crash that is described below. Before that, our application uses Redis-cluster with (only) Lua scripts being run in pipeline mode. Currently, with hiredis-vip, a bunch of commands are batched using redisClusterAppendCommand and then redisClusterGetReplies is called for all of them, where each batch may target multiple cluster nodes. Based on my understanding of Redis-plus-plus, I am trying to prototype this as below.

  1. Create a pipeline based on the key. In the first version, the Pipeline is being created based on the slot-id, for which an extra get_cluster_slot function was added to ShardsPool - on the lines described here. Eventually, I plan to base Pipeline creation on Node after I get a basic test working.
  2. Run the eval command on this pipeline
  3. Save a pointer to the pipeline object in a map, with slot-id being the key.
  4. Continue this till the batch size is reached and then call pipeline.exec on each of the pipelines that have commands.
  5. Collect the replies and process them.

While this works for a batch-size of 1, with batch-size of 2, Step 1 crashes when creating the second pipeline. I would like to understand if this approach of saving the Pipeline objects is even supported - because somewhere Redis-plus-plus documentation recommends to limit usage of pipeline objects within a block. If it is supported (even if not recommended), any suggestions/best-practices on this usage approach.

Environment:

  • OS: AmazonLinux on EC2
  • Compiler: gcc 7.3.1
  • hiredis version: v1.2.0
  • redis-plus-plus version: 1.3.10

Additional context

From the stack bt, it is seen to crash when copying (from assignment= operator) the unordered set in the QueuedRedis class. This was happening after the second redisCluster.pipeline(key) call but before the assignment to the object on the map completed.

(gdb) bt
#0  std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<unsigned long, false> > >::_M_deallocate_nodes (this=<optimized out>, __n=0x726f70736e617274)
    at /usr/include/c++/7/bits/hashtable_policy.h:2096
#1  std::_Hashtable<unsigned long, unsigned long, std::allocator<unsigned long>, std::__detail::_Identity, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_move_assign (__ht=..., this=0x174d150) at /usr/include/c++/7/bits/hashtable.h:1148
#2  std::_Hashtable<unsigned long, unsigned long, std::allocator<unsigned long>, std::__detail::_Identity, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::operator= (__ht=..., this=0x174d150) at /usr/include/c++/7/bits/hashtable.h:468
#3  std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> >::operator= (this=0x174d150) at /usr/include/c++/7/bits/unordered_set.h:96
#4  sw::redis::QueuedRedis<sw::redis::PipelineImpl>::operator= (this=0x174d120) at /home/XXXXX/tmp/rpp/redis++/include/sw/redis++/queued_redis.h:43
...

thanks,

@srgorti
Copy link
Author

srgorti commented Sep 25, 2023

Made a small program to repeat the behavior. With this, the crash happens on the first key itself.

#include <iostream>
#include <map>
#include <sw/redis++/redis++.h>

struct Context {
    sw::redis::RedisCluster *prc;
} ctx;

struct SlotPipeline {
    sw::redis::Pipeline  p;
    std::size_t          s;
};

static std::map<std::size_t, SlotPipeline*> slotPipelines;

void process_pair(const std::string& k, const std::string& v, bool do_exec) {
    std::cout << "key= " << k << " val= " << v << std::endl;

    struct SlotPipeline *sp = NULL;
    std::size_t slotId = sw::redis::ShardsPool::get_cluster_slot(k);
    std::cout << "cluster-slot: " << slotId << std::endl;
    std::map<std::size_t, SlotPipeline*>::iterator it = slotPipelines.find(slotId);
    if (it == slotPipelines.end()) {
        sp = (SlotPipeline *)malloc(sizeof(*sp));
        std::cout << "sp->p: type=" << typeid(sp->p).name() << std::endl;
        sp->p = ctx.prc->pipeline(k);
        slotPipelines[slotId] = sp;
    } else {
        sp = it->second;
    }
    sp->p.set(k, v);

    if (do_exec) {
        // TBD: fill this in
    }
}

int main()
{
    ctx.prc = new sw::redis::RedisCluster("tcp://127.0.0.1:30001");

    std::string keys[] = {
        "violet", "indigo", "blue", "green", "yellow", "orange", "red", ""
    };
    std::string values[] = {
        "flowers", "dyes", "skies", "trees", "bananas", "oranges", "apples"
    };

    for (int i = 0; !keys[i].empty(); i++) {
        process_pair(keys[i], values[i], false);
    }
    process_pair("black", "nights", true);
}

Output

 ./rpp-ex3
key= violet val= flowers
cluster-slot: 4135
sp->p: type=N2sw5redis11QueuedRedisINS0_12PipelineImplEEE
zsh: segmentation fault  ./rpp-ex3

@sewenew
Copy link
Owner

sewenew commented Sep 25, 2023

Since I don't know how you implement get_cluster_slot, I cannot reproduce your problem. However, your code has a bug. The length of keys is different from the length of values, and I'm not sure if that caused the problem.

Also please try-catch possible exceptions thrown by redis-plus-plus (you can have a try-catch block to protect the process_pair method), and print the exception error to see what happens.

@srgorti
Copy link
Author

srgorti commented Sep 25, 2023

Thanks for your response. I fixed the difference in the lengths of keys and values and fixed it by adding an empty string "" at the end of values. However, am still seeing the crash.

The diff for get_cluster_slot (wrt 1.3.10) is pasted below. Changes are limited to shards_pool.h and shards_pool.cpp.

--- shards_pool.h	2023-07-09 12:21:34.000000000 +0000
+++ ../../../../../redis-plus-plus/src/sw/redis++/shards_pool.h	2023-09-23 09:36:57.450514555 +0000
@@ -63,6 +63,10 @@

     Shards shards();

+    static std::size_t get_cluster_slot(const StringView &key) {
+        return ShardsPool::_cluster_slot(key);
+    }
+
 private:
     void _move(ShardsPool &&that);

@@ -80,6 +84,8 @@

     std::pair<SlotRange, Node> _parse_slot_info(redisReply &reply) const;

+    static std::size_t _cluster_slot(const StringView &key);
+
     // Get slot by key.
     std::size_t _slot(const StringView &key) const;


--- shards_pool.cpp	2023-07-09 12:21:34.000000000 +0000
+++ ../../../../../redis-plus-plus/src/sw/redis++/shards_pool.cpp	2023-09-23 09:38:34.619876365 +0000
@@ -277,7 +277,7 @@
     }
 }

-Slot ShardsPool::_slot(const StringView &key) const {
+std::size_t ShardsPool::_cluster_slot(const StringView &key) {
     // The following code is copied from: https://redis.io/topics/cluster-spec
     // And I did some minor changes.

@@ -307,6 +307,10 @@
     return crc16(k + s + 1, e - s - 1) & SHARDS;
 }

+Slot ShardsPool::_slot(const StringView &key) const {
+    return ShardsPool::_cluster_slot(key);
+}
+
 Slot ShardsPool::_slot() const {
     return _random(0, SHARDS);
 }

regards,

@sewenew
Copy link
Owner

sewenew commented Sep 26, 2023

Sorry, but I tried your code, and cannot reproduce your problem. Can you wrap your code with a try-catch block to see if any exception thrown by redis-plus-plus? You can output the error message, to see if something bad happens.

Regards

@srgorti
Copy link
Author

srgorti commented Sep 28, 2023

Thanks for your response. If it does not crash for you, that's a good sign for me - maybe some difference in the compiler version.
I have added the exception handling and removed the slot-id stuff. With my g++ version, it still crashes.

#include <iostream>
#include <map>
#include <sw/redis++/redis++.h>

struct Context {
    sw::redis::RedisCluster *prc;
} ctx;

struct SlotPipeline {
    sw::redis::Pipeline  p;
    std::size_t          s;
};

static std::map<std::size_t, SlotPipeline*> slotPipelines;

void process_pair(const std::string& k, const std::string& v, bool do_exec) {
    std::cout << "key= " << k << " val= " << v << std::endl;

    struct SlotPipeline *sp = NULL;
    sp = (SlotPipeline *)malloc(sizeof(*sp));
    std::cout << "sp->p: type=" << typeid(sp->p).name() << std::endl;
    sp->p = ctx.prc->pipeline(k);   // crashes here
    sp->p.set(k, v);

    if (do_exec) {
        // TBD: fill this in
    }
}

int main()
{
    ctx.prc = new sw::redis::RedisCluster("tcp://127.0.0.1:30001");

    std::string keys[] = {
        "violet", "indigo", "blue", "green", "yellow", "orange", "red", ""
    };
    std::string values[] = {
        "flowers", "dyes", "skies", "trees", "bananas", "oranges", "apples", ""
    };

    for (int i = 0; !keys[i].empty(); i++) {
        try {
            process_pair(keys[i], values[i], false);
        }
        catch (const sw::redis::ReplyError &err) {
            std::cout << "process_pair: ReplyError: %s"  << err.what() << std::endl;
        }
        catch (const sw::redis::TimeoutError &err) {
            // reading or writing timeout
            std::cout << "process_pair: TimeoutError: %s" <<  err.what() << std::endl;
        } catch (const sw::redis::ClosedError &err) {
            // the connection has been closed.
            std::cout << "process_pair: ClosedError: %s" <<  err.what() << std::endl;
        } catch (const sw::redis::IoError &err) {
            // there's an IO error on the connection.
            std::cout << "process_pair: IoError: %s" <<  err.what() << std::endl;
        }
        catch (const sw::redis::Error &err) {
           // other errors
           std::cout << "process_pair: Error: %s" <<  err.what() << std::endl;
        }
    }
    process_pair("black", "nights", true);
}

Compiler version:

/usr/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/bin/g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-libmpx --enable-libsanitizer --enable-gnu-indirect-function --enable-libcilkrts --enable-libatomic --enable-libquadmath --enable-libitm --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 7.3.1 20180712 (Red Hat 7.3.1-17) (GCC)

@sewenew
Copy link
Owner

sewenew commented Oct 2, 2023

Does it throw any exception when it crashes? i.e. any of your try-catch block catches any exception? If it does, please show me the error info.

Also please check if you install multiple version of hiredis. If you do, uninstall all but keep only one of them. Otherwise, you might get some wired problem.

I'm on vacation, and sorry for the late reply...

Regards

@srgorti
Copy link
Author

srgorti commented Oct 2, 2023

Does it throw any exception when it crashes? i.e. any of your try-catch block catches any exception? If it does, please show me the error info.

No error/exception seen. That is what make it harder to debug this issue.

Also please check if you install multiple version of hiredis. If you do, uninstall all but keep only one of them. Otherwise, you might get some wired problem.

To avoid this situation, I have stopped linking both the test app and the real app with hiredis-vip. And also moved to static linking with libredis++.a and libhiredis.a. With this, no other version of hiredis influences this executable.

Menwhile, without using pipeline, our application is now working fine with Redis-plus-plus. The only issue is that we are not getting the performance that we were getting by using pipelining with hiredis-vip.

@sewenew
Copy link
Owner

sewenew commented Oct 2, 2023

I'd suggest you have a clean environment without hiredis-vip installed. Otherwise, you might get some wired problem. Like this issue mentioned.

Regards

@sewenew
Copy link
Owner

sewenew commented Nov 18, 2023

Close the issue, since there's no update.

Regards

@sewenew sewenew closed this as completed Nov 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants