Repeated crashes during merge operations [JIRA: RIAK-1903] #217

Open
binarytemple opened this issue Jun 25, 2015 · 4 comments

Comments

@binarytemple

This is a 1.4.1 (bitcask 1.6.3) deployment. We're seeing repeated crashes during merge operations and couldn't find any similar existing issue. I don't expect a fix for this version, but I'm filing this for identification purposes, in case the problem also exists in 1.7.

2015-06-20 05:10:24 =ERROR REPORT====
** Generic server <0.15562.5572> terminating 
** Last message in was {'DOWN',#Ref<0.0.242202.167542>,process,<0.15503.5572>,normal}
** When Server state == {state,undefined,undefined}
** Reason for termination == 
** {{badmatch,{error,badarg}},[{bitcask_file,handle_info,2,[{file,"src/bitcask_file.erl"},{line,170}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
2015-06-20 05:10:24 =CRASH REPORT====
  crasher:
    initial call: bitcask_file:init/1
    pid: <0.15562.5572>
    registered_name: []
    exception exit: {{{badmatch,{error,badarg}},[{bitcask_file,handle_info,2,[{file,"src/bitcask_file.erl"},{line,170}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,747}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
    ancestors: [<0.15503.5572>]
    messages: []
    links: []
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 24
    reductions: 423
  neighbours:
@Basho-JIRA changed the title from "Repeated crashes during merge operations" to "Repeated crashes during merge operations [JIRA: RIAK-1903]" on Jun 25, 2015
@hcs42

hcs42 commented Jun 25, 2015

This failure happens at line 170 of bitcask_file.erl, in handle_info/2, where closing a bitcask file fails. In 1.4.1, that line reads:

ok = file:close(Fd),

The ok match was changed to _ by @slfritchie and @engelsanchez in this commit, with the following commit message:

Buildbot isn't quite right in the head: I've no idea why I can run this test 3000x without failure, but Buildbot fails

bitcask_qc: merge2_test...
=ERROR REPORT==== 5-May-2014::04:56:18 ===
** Generic server <0.2324.0> terminating
** Last message in was {'DOWN',#Ref<0.0.0.50927>,process,<0.2317.0>,normal}
** When Server state == {state,undefined,undefined}
** Reason for termination ==
** {{badmatch,{error,badarg}},
    [{bitcask_file,handle_info,2,[{file,"src/bitcask_file.erl"},{line,193}]},
     {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}

Fix: Really, we're stopping, so change the pattern match to '_' so we don't care.
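To illustrate what that one-character change does: with ok = file:close(Fd), any return value other than ok raises a badmatch error and crashes the gen_server, while _ = file:close(Fd) simply discards the result. The module below is a hypothetical sketch for illustration, not part of bitcask; it relies on file:close/1 returning {error, badarg} when given something that is not a file handle, such as the atom undefined.

```erlang
%% Hypothetical module for illustration only; not part of bitcask.
-module(close_match_sketch).
-export([strict_close/1, tolerant_close/1]).

%% Style before the commit: any result other than ok raises
%% {badmatch, Result}, which crashes the calling process.
strict_close(Fd) ->
    ok = file:close(Fd),
    stopped.

%% Style after the commit: the result of file:close/1 is ignored,
%% which is acceptable only because the process is stopping anyway.
tolerant_close(Fd) ->
    _ = file:close(Fd),
    stopped.
```

Calling close_match_sketch:tolerant_close(undefined) returns stopped, whereas close_match_sketch:strict_close(undefined) fails with {badmatch,{error,badarg}}, the same termination reason as in the crash reports above.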

@slfritchie, @engelsanchez, do you think it is a problem if this file:close returns {error, badarg} in production? Thank you.

@engelsanchez
Contributor

It sounds like the file has already been closed explicitly before the 'DOWN' message arrives. So this is called first:

ok = file:close(Fd),

And then we try to close the same, already closed file descriptor and get a badarg. It seems it would be more robust to set the fd to undefined after closing it, and to check for undefined before calling file:close in the 'DOWN' handler.
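The suggested fix can be sketched roughly as follows. This is a minimal illustration, assuming a gen_server state record with an fd field; the module fd_close_sketch, the #state{} record, and the functions close_fd/1, handle_down/1 and demo/0 are hypothetical names, not bitcask_file's actual code. Whichever path closes the file first clears the field, so the other path sees undefined and skips the close.

```erlang
%% Hypothetical sketch of the suggested fix; names are illustrative,
%% not bitcask_file's actual code.
-module(fd_close_sketch).
-export([close_fd/1, handle_down/1, demo/0]).

%% Assumed state shape: only the descriptor is modelled here.
-record(state, {fd = undefined}).

%% Explicit close: close the descriptor once, then forget it.
close_fd(#state{fd = undefined} = State) ->
    State;
close_fd(#state{fd = Fd} = State) ->
    ok = file:close(Fd),
    State#state{fd = undefined}.

%% 'DOWN' path: reuses close_fd/1, so an already-closed
%% descriptor is skipped instead of being closed a second time.
handle_down(State) ->
    close_fd(State).

%% Open a scratch file, close it explicitly, then run the 'DOWN' path.
demo() ->
    Path = filename:join(os:getenv("TMPDIR", "/tmp"), "fd_close_sketch.tmp"),
    {ok, Fd} = file:open(Path, [write, raw]),
    Closed = close_fd(#state{fd = Fd}),
    Down = handle_down(Closed),
    ok = file:delete(Path),
    Down#state.fd.
```

In a real gen_server, the 'DOWN' clause of handle_info/2 would then return something like {stop, normal, handle_down(State)}. Running fd_close_sketch:demo() returns undefined, and the second close is never attempted.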

@hcs42

hcs42 commented Jun 26, 2015

Thank you, I'm happy to close the ticket, then.

@engelsanchez
Copy link
Contributor

Leave it open @hcs42. It's a small race worth fixing.
