ClientDataModel

David Anderson edited this page Aug 22, 2023 · 3 revisions

Client data model

This document describes changes to the client in version 6.13 to support distributed storage.

Current

FILE_INFO elements

  • status (present, not present, error)
  • urls
  • bool generated_locally
  • bool upload_when_present
  • bool uploaded
  • bool sticky

Problems

Many. Example: suppose the server asks the client to upload a file that the client doesn't have. Since generated_locally is false and the file is not present, the client will try to download it (from the upload URL!).

Proposed

FILE_INFO elements:

  • status
  • upload_urls
  • download_urls
  • bool uploaded
  • bool sticky

Policy:

  • If a file has a download URL and is not present, download it.
  • If a file has an upload URL, is present, and uploaded is false, upload it.
  • Start a job if each of its input files is either present or marked optional_input.
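The three policy rules can be sketched as predicates over the proposed fields. This is a minimal illustration, not the client's actual code: FileInfo, FileStatus, InputRef and the function names are hypothetical, and treating optional_input as a property of the job's reference to a file is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed FILE_INFO (not the real BOINC class).
enum class FileStatus { NotPresent, Present, Error };

struct FileInfo {
    std::string name;
    FileStatus status = FileStatus::NotPresent;
    std::vector<std::string> upload_urls;
    std::vector<std::string> download_urls;
    bool uploaded = false;
    bool sticky = false;
};

// Rule 1: a file with a download URL that is not present gets downloaded.
bool should_download(const FileInfo& f) {
    return !f.download_urls.empty() && f.status == FileStatus::NotPresent;
}

// Rule 2: a present file with an upload URL that has not been uploaded gets uploaded.
bool should_upload(const FileInfo& f) {
    return !f.upload_urls.empty()
        && f.status == FileStatus::Present
        && !f.uploaded;
}

// A job's reference to an input file. Keeping optional_input on the
// reference rather than on the file is an assumption of this sketch.
struct InputRef {
    const FileInfo* file;
    bool optional_input;
};

// Rule 3: a job can start if every input is present or optional.
bool can_start_job(const std::vector<InputRef>& inputs) {
    for (const InputRef& ref : inputs) {
        assert(ref.file != nullptr);
        if (ref.file->status != FileStatus::Present && !ref.optional_input) {
            return false;
        }
    }
    return true;
}
```

With separate URL lists, the failure described under Problems cannot occur: a file that has only an upload URL and is not present satisfies neither should_download nor should_upload.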

Handling <file_info> elements in scheduler replies:

  • If referenced from an app version or workunit, store its URLs in download_urls.
  • If referenced from a result, store its URL in upload_urls.
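A rough sketch of this routing rule follows. RefKind, FileUrls and store_urls are hypothetical names; the real parser would work on the scheduler reply's XML rather than on a ready-made list of URLs.

```cpp
#include <string>
#include <vector>

// What kind of element in the scheduler reply refers to the file (hypothetical).
enum class RefKind { AppVersion, Workunit, Result };

// Only the URL fields of the proposed FILE_INFO are modeled here.
struct FileUrls {
    std::vector<std::string> upload_urls;
    std::vector<std::string> download_urls;
};

// Files referenced from a result are outputs, so their URLs are upload URLs.
// Files referenced from an app version or workunit are inputs, so their URLs
// are download URLs.
void store_urls(FileUrls& f, RefKind kind, const std::vector<std::string>& urls) {
    std::vector<std::string>& dest =
        (kind == RefKind::Result) ? f.upload_urls : f.download_urls;
    dest.insert(dest.end(), urls.begin(), urls.end());
}
```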

Deprecated fields in scheduler replies:

  • <generated_locally>
  • <upload_when_present>

Handling upload requests:

  • Clear the "uploaded" flag.
  • If the file isn't present, mark the result as an error and put appropriate text in stderr_out.
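In code, the two steps might look like the following. UploadFile, ResultRec and the wording of the error text are all hypothetical; only the "uploaded" flag and stderr_out come from the description above.

```cpp
#include <string>

// Minimal hypothetical views of a file and of the result that refers to it.
struct UploadFile {
    bool present;
    bool uploaded;
};

struct ResultRec {
    bool error;
    std::string stderr_out;
};

// Handle a server request to upload the named file.
void handle_upload_request(UploadFile& f, ResultRec& r, const std::string& name) {
    // Clearing the flag lets the normal upload policy pick the file up again.
    f.uploaded = false;
    if (!f.present) {
        // Nothing to send: report the failure through the result.
        r.error = true;
        r.stderr_out += "upload requested but file is not present: " + name + "\n";
    }
}
```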

Compatibility: we'll change <file_info> to <file> in client_state.xml. We'll still parse <file_info> elements, so state files written by older clients remain readable (upward compatibility). We won't provide backward compatibility: if you upgrade to 6.13.x and then downgrade to 6.12, all tasks and app versions will disappear.

Locally-generated input files

One (hypothetical) class of files: input files which, if not present, are generated computationally by the app. Such files should be listed in the scheduler reply as sticky optional input files with no download URL, and also as optional output files (this causes them to be marked as present once generated).

The app must use file locking to ensure that two jobs don't try to generate the file at the same time.
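One way an app could do this on Unix-like systems is an advisory lock on a separate lock file, taken with flock(). The sketch below is only an illustration under that assumption: try_lock_file and unlock_file are hypothetical helpers, flock() is not available on Windows, and a real app would more likely use whatever locking support its own libraries provide.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include <string>

// Try to take an exclusive, non-blocking advisory lock on lock_path.
// Returns a descriptor that holds the lock, or -1 if the file could not be
// opened or another holder already has the lock.
int try_lock_file(const std::string& lock_path) {
    int fd = open(lock_path.c_str(), O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        return -1;
    }
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Release a lock obtained from try_lock_file.
void unlock_file(int fd) {
    flock(fd, LOCK_UN);
    close(fd);
}
```

A job that gets the lock would generate the file under a temporary name, rename it into place, and then unlock. A job that fails to get the lock would wait and re-check whether the file has appeared. Because the kernel drops the lock when the holder exits, a crashed job does not leave a stale lock behind.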

Upload certificates

Upload certificates are a mechanism to prevent upload server DoS attacks, and also to prevent an attack in which bad guys upload result files for other users' jobs. Currently no projects that I know of use this mechanism, but it is worth keeping available.

Current: the upload certificate is a signature of the rest of the <file_info> element.

Problem: this makes <file_info> elements sensitive to white space changes, and it makes it difficult to parse <file_info> using XML_PARSER.

Solution: the upload certificate is a signature of

<name>x</name><max_nbytes>N</max_nbytes>

(no white space, N written with %.0f).
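The text to be signed can be built in a few lines. The function name upload_cert_text is hypothetical, and the signing step itself is not shown; the point is only that both the signer and the verifier derive the same bytes from the parsed name and max_nbytes, independent of how the original XML was laid out.

```cpp
#include <cstdio>
#include <string>

// Build the canonical text that the upload certificate signs:
// no white space, max_nbytes formatted with %.0f.
std::string upload_cert_text(const std::string& name, double max_nbytes) {
    char num[400];  // large enough for any finite double printed with %.0f
    std::snprintf(num, sizeof(num), "%.0f", max_nbytes);
    return "<name>" + name + "</name><max_nbytes>" + num + "</max_nbytes>";
}
```

For example, upload_cert_text("x", 1000000.0) yields <name>x</name><max_nbytes>1000000</max_nbytes>.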

Server changes

  • file_upload_handler: parse the request with XML_PARSER, then convert to above form
  • process_result_template.cpp: generate signatures as above