Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obtain Bitcask write locks using system call flock/2 #257

Closed
wants to merge 1 commit into from

Conversation

andytill
Copy link

@andytill andytill commented Feb 12, 2018

Previous to this change, the OS pid was used to lock the bitcask database and prevent two bitcask instances performing write operations at the same time. If the pid in the lock file is currently taken by an OS process then the bitcask DB may not be written to. Bitcask is not able to clean up the write locks because of a hard shutdown it may not be able to delete the "stale" lock file and obtain a new one for the following reasons:

  • Another process takes the pid in the lock file
  • Riak takes the pid in the lock file

This has been raised in issues #72, #226 and basho/riak#535.

This change uses the flock system call to obtain exclusive write access to the file across OS processes, as long as the other processes use flock
to obtain the lock.

The change maintains the previous behaviour where the DB can be opened even if a write lock cannot be obtained, but returns an error on write operations.

The O_EXCL flag has been removed from the call to open in the nifs because if the lock file still exists, it returns the eexist error but now the proof of the write lock is not the file existing but the call to flock.

…le creation

Previous to this change, the OS pid was used to lock the bitcask database
and prevent two bitcask instances performing write operations at the same
time. However, if bitcask is not able to clean up the write locks because
of a hard shutdown and another process takes that the pid in written to
the lock file, bitcask is not able to take the lock even though it is the
only running instance.

This change uses the flock system call to obtain exclusive write access
to the file across OS processes, as long as the other processes use flock
to obtain the lock.
@russelldb
Copy link
Member

Is this something for which it is even possible to create a Riak test for?

Copy link
Member

@russelldb russelldb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is clean. The tests all pass for me. Raised a question about copyright, but that's not something I personally care about either way, just be aware it may spawn discussion.

I don't have the expertise to say whether this is the right approach to the problem from a c/systems level, though the tests certainly show that it works.

I'd feel better with a more competent c/systems eye on it before it is merged (@tburghart, @Vagabond, (@mrallen1, I notice you HEART this PR) ANY OTHER?)

+1 from me, for what it is worth

@@ -3,6 +3,7 @@
// bitcask: Eric Brewer-inspired key/value store
//
// Copyright (c) 2010 Basho Technologies, Inc. All Rights Reserved.
// Copyright (c) 2018 Workday, Inc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary? We had a chat about this recently (I'll copy you in via mail.)

@russelldb
Copy link
Member

Oh yeah, and if you do merge it, please close the related issues.

@ghost
Copy link

ghost commented Mar 6, 2018

Is there a Riak top level project branch somewhere which is already configured to use your branch @andytill ? The code looks +1 but , just to be sure, I would like to run it against some data if possible.

@andytill
Copy link
Author

andytill commented Mar 6, 2018

Sorry no, it's all linked in our internal riak_ee.

Is there a Riak top level project branch somewhere which is already configured to use your branch @andytill ? The code looks +1 but , just to be sure, I would like to run it against some data if possible.

@Licenser
Copy link
Contributor

Licenser commented Mar 6, 2018

Might it be worth considering fnctl instead of flock for portabilities sake?

https://www.perkin.org.uk/posts/solaris-portability-flock.html

is a good summary of the topic.

@russelldb
Copy link
Member

Thanks @Licenser. You linked to https://github.com/basho/leveldb/blob/44cb7cbc85590280c9a73856470d5880f4015927/util/env_posix.cc#L493 in slack, I'm just commenting here to bring that link into this discussion

@andytill
Copy link
Author

andytill commented Mar 6, 2018

Very difficult I think to reproduce the OS pid on the restarted Riak to provoke the original bug. With the change it is possible to trigger the locked error by starting starting Riak, perform one write, starting a second Riak and performing a write with that which will return an error.

Is this something for which it is even possible to create a Riak test for?

@Licenser
Copy link
Contributor

Licenser commented Mar 6, 2018

Oh yes, sorry @russelldb should have added that too.

@russelldb
Copy link
Member

russelldb commented Mar 6, 2018 via email

@Vagabond
Copy link
Contributor

Vagabond commented Mar 6, 2018

I'd definitely echo the suggestion to look at fnctl, the link Heinz posted even has some compatibility shims in addition to the eleveldb one.

@ghost
Copy link

ghost commented Mar 6, 2018

On the CI server (my laptop) the quickcheck tests fail - output attached.

bitcast-test-failure.txt

It would be brilliant if we could get the tests to pass - then +1 from me.

With the current implementation I caused data loss (last year) when opening a bitcask data directory twice, despite the the second open was flagged as read_only (via a second beam) any attention to this area feels useful.

Imho - fnctl is only a consideration for Solaris users - @heinz is not using Bitcask on Solaris, which means there are zero stakeholders I know of for whom this would cause direct inconvenience.

@ghost
Copy link

ghost commented Mar 6, 2018

wrt above comment - emfile - I've increased file descriptors and rerun the quickcheck tests.

@russelldb
Copy link
Member

@bryanhuntesl that failure is emfile, are you sure you have your ulimit high enough?

@ghost
Copy link

ghost commented Mar 6, 2018

Increased emfile limit to 10000 tests still failing - output attached.

bitcask-tests-failure.txt

@russelldb
Copy link
Member

@bryanhuntesl can you add some info (erl + eqc version, platform) I ran this 10 times today without failure, on my macbook

@ghost
Copy link

ghost commented Mar 6, 2018

Certainly - Macbook - i7/16GB

Erlang R16B02-basho1 (erts-5.10.3) [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

▷  echo $ERL_LIBS (re-formatted)
:/Users/bryanhunt/quickcheck-1.36.1:
/Users/bryanhunt/quickcheck-1.36.1/eqc-1.36.1:
/Users/bryanhunt/quickcheck-1.36.1/pulse-1.36.1:
/Users/bryanhunt/quickcheck-1.36.1/pulse_otp-1.36.1

@russelldb
Copy link
Member

Not gonna moan about basho1 (should be basho10). Interesting. All the same as me. Ah well. Good luck someone with that one!

@ghost
Copy link

ghost commented Mar 6, 2018

Just to confirm - I am running 16b02-basho10 - seems like it's been truncated in the erl console output.

@Licenser
Copy link
Contributor

Licenser commented Mar 7, 2018

I am using bitcask on SmartOS. But that aside I see little reason not to use a POSIX compliant function when there is one.

@ghost
Copy link

ghost commented Mar 7, 2018

Sorry, my mistake, I thought you were doing everything with raw files/Postgres.

@Licenser
Copy link
Contributor

Licenser commented Mar 7, 2018

So many systems ;) in dalmatinerDB yes, it's files/postgres/(leveldb sadly that's tied to _core), snarl my AAA system uses bitcask for the timeouts. I'm happy to make a pullrequest on the pullrequest to move it to fnctl, it seems just to be once place that needs changes.

@ghost
Copy link

ghost commented Mar 7, 2018

Thanks for clarification @Licenser

@andytill
Copy link
Author

andytill commented Mar 7, 2018

@Licenser thanks for offering to make this Solaris compatible, I'd welcome this change to the PR.

@Licenser
Copy link
Contributor

Licenser commented Mar 8, 2018

The lock test does seem to fail on OSX (unmodified code):

    bitcask_lockops_tests: lock_cannot_be_obtained_on_already_locked_file_within_same_os_process_test...*failed*
in function bitcask_lockops_tests:'-lock_cannot_be_obtained_on_already_locked_file_within_same_os_process_test/0-fun-0-'/1 (test/bitcask_lockops_tests.erl, line 34)
**error:{assertEqual,[{module,bitcask_lockops_tests},
              {line,34},
              {expression,"bitcask_lockops : acquire ( write , Dir )"},
              {expected,{error,locked}},
              {value,{ok,<<>>}}]}
  output:<<"">>

update 1

Odd this did work on the second run, not sure as to why.

update 2

The eqc tests segfault :(

  bitcask_nifs: keydir_get_put_test_...........................................................................................make: *** [eunit_nif] Segmentation fault: 11

update 3

the calls tack would be:

Process 22623 stopped
* thread #16, name = '2_scheduler', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000000147dbea8 beam.smp`enif_send(env=0x00000000b06ba868, to_pid=<unavailable>, msg_env=0x0000000015f40818, msg=566859) at erl_nif.c:324 [opt]
   321
   322 	    if (env != NULL) {
   323 		c_p = env->proc;
-> 324 		if (receiver == c_p->common.id) {
   325 		    rp_locks = ERTS_PROC_LOCK_MAIN;
   326 		    flush_me = 1;
   327 		}
Target 0: (beam.smp) stopped.
(lldb) bt
* thread #16, name = '2_scheduler', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000000147dbea8 beam.smp`enif_send(env=0x00000000b06ba868, to_pid=<unavailable>, msg_env=0x0000000015f40818, msg=566859) at erl_nif.c:324 [opt]
    frame #1: 0x0000000014aea28a bitcask.so`msg_pending_awaken(env=0x00000000b06ba868, keydir=0x00007fa0dbf0ba10, msg=566859) at bitcask_nifs.c:2752
    frame #2: 0x0000000014ae5c4f bitcask.so`merge_pending_entries(env=0x00000000b06ba868, keydir=0x00007fa0dbf0ba10) at bitcask_nifs.c:2826
    frame #3: 0x0000000014ae5abd bitcask.so`itr_release_internal(env=0x00000000b06ba868, handle=0x0000000015ac1850) at bitcask_nifs.c:1995
    frame #4: 0x0000000014ae6449 bitcask.so`bitcask_nifs_keydir_resource_cleanup(env=0x00000000b06ba868, arg=0x0000000015ac1850) at bitcask_nifs.c:2920
    frame #5: 0x00000000147de2d2 beam.smp`nif_resource_dtor(bin=<unavailable>) at erl_nif.c:1420 [opt]
    frame #6: 0x00000000147b2173 beam.smp`sweep_off_heap [inlined] erts_bin_free(bp=0x0000000015ac1828) at erl_binary.h:331 [opt]
    frame #7: 0x00000000147b2165 beam.smp`sweep_off_heap(p=0x00000000161824b0, fullsweep=0) at erl_gc.c:2371 [opt]
    frame #8: 0x00000000147b04ec beam.smp`erts_garbage_collect [inlined] do_minor(p=<unavailable>, new_sz=<unavailable>, objv=<unavailable>, nobj=<unavailable>) at erl_gc.c:1180 [opt]
    frame #9: 0x00000000147af16f beam.smp`erts_garbage_collect at erl_gc.c:880 [opt]
    frame #10: 0x00000000147aeef2 beam.smp`erts_garbage_collect(p=<unavailable>, need=<unavailable>, objv=0x00000000b06bac78, nobj=1) at erl_gc.c:452 [opt]
    frame #11: 0x00000000147ada07 beam.smp`erts_gc_after_bif_call(p=0x00000000161824b0, result=<unavailable>, regs=<unavailable>, arity=<unavailable>) at erl_gc.c:373 [opt]
    frame #12: 0x00000000146b3418 beam.smp`process_main at beam_emu.c:2881 [opt]
    frame #13: 0x00000000147450a1 beam.smp`sched_thread_func(vesdp=0x00000000151782c0) at erl_process.c:8118 [opt]
    frame #14: 0x000000001486ea92 beam.smp`thr_wrapper(vtwd=0x00007ffeeb55ae50) at ethread.c:114 [opt]
    frame #15: 0x00007fff717ab6c1 libsystem_pthread.dylib`_pthread_body + 340
    frame #16: 0x00007fff717ab56d libsystem_pthread.dylib`_pthread_start + 377
    frame #17: 0x00007fff717aac5d libsystem_pthread.dylib`thread_start + 13

@andytill
Copy link
Author

Still waiting to get my eqc license approved by work, then I'll try to reproduce.

@llelf
Copy link

llelf commented Dec 29, 2018

Withfcntl(…, F_SETLK, …) (the only thing POSIX has), any close would lead to the process losing all its locks to that file.

@martincox martincox deleted the branch basho:2.0 January 26, 2022 08:32
@martincox martincox closed this Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants