Redis migrate slots丢key排查

1.问题

redis slots迁移的时候,在迁移之后key数量会变少.

2.排查

2.1思考

  • redis 3.x也是比较成熟的产品了,为什么会丢key?别人有没有遇到同样的问题?
  • 假设丢key了,如果key是因为expire丢失,那应该是正常,如果没有expire丢失,就是问题了,首先复现问题。

2.2复现问题

0.准备集群

造了两个节点的集群:10.1.100.100:20003和10.1.100.100:20004,最大可使用内存200M,并保证在测试过程中不会导致内存满等其他问题

10.1.100.100:20004> cluster nodes  
2aed426536067077179a3d23875b93b223802dea 10.1.100.100:20003 master - 0 1482132098352 6 connected  
03a53320815d8b5f774810f2d41329007d60ebf4 10.1.100.100:20004 myself,master - 0 0 7 connected 0-16383  

key格式:{test}i, 保证所有的key使用同一个slot。{test}i的slot为6918,并且测试前slot里面没有key.

10.1.100.100:20004> cluster keyslot {test}_i  
(integer) 6918
10.1.100.100:20004> cluster countkeysinslot 6918  
(integer) 0
10.1.100.100:20003> cluster countkeysinslot 6918  
(integer) 0

# 迁移函数,配合redis-trib fix迁移
def migrate_from_4_to_3(slot):  
    cmd="cluster setslot %s migrating 2aed426536067077179a3d23875b93b223802dea" % (slot,)
    cli4.execute_command(cmd)
    cmd="cluster setslot %s importing 03a53320815d8b5f774810f2d41329007d60ebf4" % (slot,)
    cli3.execute_command(cmd)

def migrate_from_3_to_4(slot):  
    cmd="cluster setslot %s migrating 03a53320815d8b5f774810f2d41329007d60ebf4" % (slot,)
    cli3.execute_command(cmd)
    cmd="cluster setslot %s importing 2aed426536067077179a3d23875b93b223802dea" % (slot,)
    cli4.execute_command(cmd)

1.非过期key测试

10.1.100.100:20003> cluster countkeysinslot 6918  
(integer) 0
10.1.100.100:20004> cluster countkeysinslot 6918  
(integer) 0

# 向10.1.100.100:20004写入20000个不带过期时间的key
for i in range(20000):  
    cmd="set {test}_%s %s" % (i, i)
    print cmd, cli4.execute_command(cmd)

10.1.100.100:20004> cluster countkeysinslot 6918  
(integer) 20000

# 迁移slot 6918
migrate_from_4_to_3(6918)  
redis-trib fix 10.1.100.100:20004

#check
10.1.100.100:20003> cluster countkeysinslot 6918  
(integer) 20000
10.1.100.100:20004> cluster countkeysinslot 6918  
(integer) 0

结论: 20000个key全部迁移,没有问题。

2.部分带过期时间的key测试

使用和上面相同的方法,测试{20000个不过期的key,20000个带过期时间的key}的情况。使用了{test}i的slot=6918和{bug}i的slot=7910这两个slot进行了测试。

如果迁移的过程中没有key正在过期,发现迁移后key的数量也会减少

如果有迁移的过程中有key正在过期,那么迁移完成后key的数量少于20000,并且多次实验测试少的key的数量不同,有的时候少几百个,有的时候少2000多个。

说明,如果在迁移slot的过程中,如果有key过期,那么会对那么没有过期时间的key造成影响,导致丢失一些不过期的key.

3.是不是redis-trib的问题?

redis-trib在判断是不是迁移完成时,只判断了getkeysinslot,当getkeysinslot返回空时就直接认为迁移完成了,直接退出。
所以在代码里面添加了countkeysinslot,当两者同时为0时,在尝试判断10次在退出试一下。

# 修改后的redis-trib.rb
...
# Migrate all the keys from source to target using the MIGRATE command
zerocounttime = 0  
while true  
    keys = source.r.cluster("getkeysinslot",slot,o[:pipeline])
    cntkeysinslot = source.r.cluster("countkeysinslot",slot)
    if keys.length == 0 && cntkeysinslot == 0
        while true
            keys = source.r.cluster("getkeysinslot",slot,o[:pipeline])
            cntkeysinslot = source.r.cluster("countkeysinslot",slot)
            if keys.length == 0 && cntkeysinslot == 0
                zerocounttime = zerocounttime + 1
                print "try #{zerocounttime}\n"
            end
            if zerocounttime > 10
                break
            end
        end
    end
    break if zerocounttime > 10
    zerocounttime = 0
    #do migrate keys
    ...

结论:问题同样复现,不是这里的问题。

4.redis新版本如何?

我们用的redis版本为3.2.1,新版本是不是也存在这个问题?

在github上clone了最新的代码

3.x的最新版本3.2.6, 问题可以复现。

4.0, 问题没有复现。并且在测试中发现,4.0的redis-server和当前3.2的无法兼容。

并且在测试中发现,4.0的redis-server和当前3.2的无法兼容。

例如10.1.100.100:20003是3.2.1,10.1.100.100:20004是4.0,cluster nodes如下:

10.1.100.100:20003> cluster nodes  
2aed426536067077179a3d23875b93b223802dea 10.1.100.100:20003 myself,master - 0 0 14 connected 5638  
03a53320815d8b5f774810f2d41329007d60ebf4 10.1.100.100:20004 master,fail? - 1482152712981 1482152711815 13 disconnected 0-5637 5639-16383  
10.1.100.100:20004> cluster nodes  
03a53320815d8b5f774810f2d41329007d60ebf4 10.1.100.100:20004@30004 myself,master - 0 0 13 connected 0-5637 5639-16383  
2aed426536067077179a3d23875b93b223802dea 10.1.100.100:20003@30003 master,fail? - 1482152711827 1482152711026 14 connected 5638  

全部是4.0时,cluster nodes如下:

10.1.100.100:20003> cluster nodes  
03a53320815d8b5f774810f2d41329007d60ebf4 10.1.100.100:20004@30004 master - 0 1482152867134 13 connected 0-5637 5639-16383  
2aed426536067077179a3d23875b93b223802dea 10.1.100.100:20003@30003 myself,master - 0 0 14 connected 5638  
10.1.100.100:20004> cluster nodes  
03a53320815d8b5f774810f2d41329007d60ebf4 10.1.100.100:20004@30004 myself,master - 0 0 13 connected 0-5637 5639-16383  
2aed426536067077179a3d23875b93b223802dea 10.1.100.100:20003@30003 master - 0 1482152864189 14 connected 5638  

所以继续使用3.2.6的排查。

5.问题在哪?

修改redis-trib代码添加迁移的key 的log,修改redis 3.2.6的代码,在源节点的log里面打印处理的key

...
# Migrate all the keys from source to target using the MIGRATE command
while true  
    keys = source.r.cluster("getkeysinslot",slot,o[:pipeline])
    break if keys.length == 0
    print "\nmigrating keys: #{keys}"  # debug log
    begin
        source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
    end
    ...
    print "."*keys.length if o[:dots]
    STDOUT.flush
end  
...

redis 3.2.6

//add log for restoreCommand
    ...
    /* Create the key and set the TTL if any */
    dbAdd(c->db,c->argv[1],obj);
    if (ttl) setExpire(c->db,c->argv[1],mstime()+ttl);
    signalModifiedKey(c->db,c->argv[1]);
    addReply(c,shared.ok);
    server.dirty++;
    if (sdslen(c->argv[1]->ptr) < 12){ // debug log
        serverLog(LL_WARNING,"store key ok, key:%s, ttl:%lld", c->argv[1]->ptr, ttl);
    }
}


// add log for migrateCommand
...
/* Create RESTORE payload and generate the protocol to call the command. */
for (j = 0; j < num_keys; j++) {  
    expireat = getExpire(c->db,kv[j]);
    if (expireat != -1) {
        ttl = expireat-mstime();
        if (ttl < 1) ttl = 1;
    }
    serverAssertWithInfo(c,NULL,rioWriteBulkCount(&cmd,'*',replace ? 5 : 4));
    if (server.cluster_enabled)
        serverAssertWithInfo(c,NULL,
            rioWriteBulkString(&cmd,"RESTORE-ASKING",14));
    else
        serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"RESTORE",7));
    serverAssertWithInfo(c,NULL,sdsEncodedObject(kv[j]));
    serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,kv[j]->ptr,
            sdslen(kv[j]->ptr)));
    serverAssertWithInfo(c,NULL,rioWriteBulkLongLong(&cmd,ttl));
    serverLog(LL_WARNING,"prepare cmd, key:%s, ttl:%lld", kv[j]->ptr, ttl);//debug log
    /* Emit the payload argument, that is the serialized object using
     * the DUMP format. */
    createDumpPayload(&payload,ov[j]);
    serverAssertWithInfo(c,NULL,
        rioWriteBulkString(&cmd,payload.io.buffer.ptr,
                           sdslen(payload.io.buffer.ptr)));
    sdsfree(payload.io.buffer.ptr);

    /* Add the REPLACE option to the RESTORE command if it was specified
     * as a MIGRATE option. */
    if (replace)
        serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"REPLACE",7));
}
...
...
/* Create RESTORE payload and generate the protocol to call the command. */
for (j = 0; j < num_keys; j++) {  
    serverLog(LL_WARNING,"key:%s", kv[j]->ptr);  // debug log
    expireat = getExpire(c->db,kv[j]);
    if (expireat != -1) {
        ttl = expireat-mstime();
        if (ttl < 1) ttl = 1;
    }
    ...
}
...

问题复现,这次丢失的key为:

>>> len(notfound)
7  
>>>
>>> notfound
[4, 3994, 3995, 3996, 3997, 3998, 3999]

在redis-trib和source节点都能看到缺少的key的日志,很明显的看到key被迁移了。

在target的节点看到日志是这样的:

125838:M 21 Dec 11:40:36.390 # store key ok, key:{haha}_399, ttl:0  
125838:M 21 Dec 11:40:36.390 # store key ok, key:{haha}_3990, ttl:0  
125838:M 21 Dec 11:40:36.426 # store key ok, key:{haha}_3991, ttl:0  
125838:M 21 Dec 11:40:36.445 # store key ok, key:{haha}_3992, ttl:0  
125838:M 21 Dec 11:40:36.467 # store key ok, key:{haha}_3993, ttl:0  
125838:M 21 Dec 11:40:36.486 # store key ok, key:{haha}_3994, ttl:1  
125838:M 21 Dec 11:40:36.505 # store key ok, key:{haha}_3995, ttl:14  
125838:M 21 Dec 11:40:36.524 # store key ok, key:{haha}_3996, ttl:22  
125838:M 21 Dec 11:40:36.544 # store key ok, key:{haha}_3997, ttl:25  
125838:M 21 Dec 11:40:36.561 # store key ok, key:{haha}_3998, ttl:30  
125838:M 21 Dec 11:40:36.580 # store key ok, key:{haha}_3999, ttl:32  
125838:M 21 Dec 11:40:36.599 # store key ok, key:{haha}_4, ttl:35  
125838:M 21 Dec 11:40:36.632 # store key ok, key:{haha}_40, ttl:0  
125838:M 21 Dec 11:40:36.632 # store key ok, key:{haha}_400, ttl:0  
125838:M 21 Dec 11:40:36.632 # store key ok, key:{haha}_4000, ttl:0  
125838:M 21 Dec 11:40:36.632 # store key ok, key:{haha}_4001, ttl:0  

所有丢失的key的ttl都不对!

去源节点看一下ttl是啥?

126113:M 21 Dec 11:40:36.561 # key:{haha}_39979  
126113:M 21 Dec 11:40:36.561 # prepare cmd, key:{haha}_39979, ttl:30  
126113:M 21 Dec 11:40:36.561 # key:{haha}_3998  
126113:M 21 Dec 11:40:36.561 # prepare cmd, key:{haha}_3998, ttl:30  
126113:M 21 Dec 11:40:36.561 # key:{haha}_39980  
126113:M 21 Dec 11:40:36.561 # prepare cmd, key:{haha}_39980, ttl:32  
126113:M 21 Dec 11:40:36.580 # key:{haha}_39988  
126113:M 21 Dec 11:40:36.580 # prepare cmd, key:{haha}_39988, ttl:30  
126113:M 21 Dec 11:40:36.580 # key:{haha}_39989  
126113:M 21 Dec 11:40:36.580 # prepare cmd, key:{haha}_39989, ttl:32  
126113:M 21 Dec 11:40:36.580 # key:{haha}_3999  
126113:M 21 Dec 11:40:36.580 # prepare cmd, key:{haha}_3999, ttl:32  
126113:M 21 Dec 11:40:36.580 # key:{haha}_39990  
126113:M 21 Dec 11:40:36.580 # prepare cmd, key:{haha}_39990, ttl:35  

说明所有丢失key的ttl因为没有处理而使用了前一个key的ttl!

问题出在下面代码的for循环,对于不过期的key,ttl应该是0,但是如果前面有过期的key,ttl>0.那么在下一个处理不过期key时,expireat=-1,不会进入if,ttl还是使用前一个ttl,导致一个永不过期的key因为ttl>0而过期。

/* MIGRATE host port key dbid timeout [COPY | REPLACE]
 *
 * On in the multiple keys form:
 *
 * MIGRATE host port "" dbid timeout [COPY | REPLACE] KEYS key1 key2 ... keyN */
void migrateCommand(client *c) {  
    long long ttl, expireat;
    ttl = 0;
    ...

    /* Create RESTORE payload and generate the protocol to call the command. */
    /*
        问题出在这个for循环,对于不过期的key,ttl应该是0,但是如果前面有过期的key,ttl>0.
        在处理不过期key时,expireat=-1,导致ttl还是使用前一个ttl.
        导致一个永不过期的key因为ttl>0而过期。
    */
    for (j = 0; j < num_keys; j++) {
        /
        expireat = getExpire(c->db,kv[j]);
        if (expireat != -1) {
            ttl = expireat-mstime();
            if (ttl < 1) ttl = 1;
        }
        serverAssertWithInfo(c,NULL,rioWriteBulkCount(&cmd,'*',replace ? 5 : 4));
        if (server.cluster_enabled)
            serverAssertWithInfo(c,NULL,
                rioWriteBulkString(&cmd,"RESTORE-ASKING",14));
        else
            serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"RESTORE",7));
        serverAssertWithInfo(c,NULL,sdsEncodedObject(kv[j]));
        serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,kv[j]->ptr,
                sdslen(kv[j]->ptr)));
        serverAssertWithInfo(c,NULL,rioWriteBulkLongLong(&cmd,ttl));

        /* Emit the payload argument, that is the serialized object using
         * the DUMP format. */
        createDumpPayload(&payload,ov[j]);
        serverAssertWithInfo(c,NULL,
            rioWriteBulkString(&cmd,payload.io.buffer.ptr,
                               sdslen(payload.io.buffer.ptr)));
        sdsfree(payload.io.buffer.ptr);

        /* Add the REPLACE option to the RESTORE command if it was specified
         * as a MIGRATE option. */
        if (replace)
            serverAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"REPLACE",7));
    }

6.提PR?

去github看一眼最新的代码,发现就在几天前的12.14号被@badboy发现并fix了,好悲伤~

https://github.com/antirez/redis/pull/3673/files