22.6 smileutil: Upload CSV Bulk Import File

 

The upload-csv-bulk-import-file command may be used to upload a CSV file to an ETL Importer module by submitting it to the ETL Import Endpoint.

22.6.1 Usage

 
bin/smileutil upload-csv-bulk-import-file -b "username:password" -f "/path/to/sourcefile.csv" -u "http://localhost:9000" -i "etl_module"
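
As a further sketch, the same command can be pointed at a directory and combined with some of the optional arguments described under Options. The directory paths here are placeholders, not defaults, and the script only prints the assembled command so it can be reviewed before anything is uploaded.

```shell
# Placeholder locations (assumptions for illustration); adjust to your installation.
SOURCE_DIR="/path/to/incoming"
DONE_DIR="/path/to/processed"

# Build the argument list:
#   -b PROMPT  ask for credentials interactively
#   -m         move each file after it has been uploaded
#   -r 2       retry up to 2 times after a failure
set -- upload-csv-bulk-import-file \
  -b "PROMPT" \
  -f "$SOURCE_DIR" \
  -u "http://localhost:9000" \
  -i "etl_module" \
  -m "$DONE_DIR" \
  -r 2

# Print the command for review instead of running it.
cmd="bin/smileutil $*"
echo "$cmd"

# To run it for real, call: bin/smileutil "$@"
```

Using "PROMPT" keeps the password out of the shell history and out of any script that wraps the command.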

22.6.2 Options

 
  • -f [filename or directory] (or --filename) – This argument should point to an individual file or to a directory (in which case all files in the directory will be processed). Note that any files with an extension ending in .gz or .bz2 will be expanded during processing.
  • -i [module id] (or --module-id) – This argument should supply the ID of the ETL module on the same node as the JSON Admin API module.
  • -u [url] (or --url) – The base URL for the JSON Admin API server.
  • -b [username:password] (or --basic-auth [username:password]) – (optional) If specified, provides a username and password that will be supplied to the server in an HTTP Basic Authorization header in the form of "username:password". If the value supplied is "PROMPT", smileutil will prompt the user to enter credentials interactively.
  • -m [path] (or --move-after) – (optional) If supplied, source files will be moved to the given directory after they have been uploaded.
  • -s [number] (or --split-rows) – (optional) If supplied, this is the number of rows to send in each batch. See Sending Batches for a Single File below.
  • -q [number] (or --quit-after) – (optional) If supplied, the command will exit after processing this number of files.
  • -r [number] (or --retry-count) – (optional) If supplied, the command will automatically retry after any failure, up to the specified number of times. This is useful in cases where network problems might interrupt an upload. For example, if this parameter is given a value of 2, up to three attempts will be made to deliver an individual file before aborting.
  • -k [count] (or --skip-rows) – (optional) If specified, the command will skip the first N rows instead of delivering them. Note that the very first row in a given file is assumed to be a header and is not skipped or counted. Also note that if multiple files are being transmitted using a directory as the source value, the first N rows across all files are skipped (not the first N rows in each file).
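
The cross-file behavior of --skip-rows can be easier to see with numbers. The following is only an illustrative sketch of the counting rule described above, not smileutil's implementation: the three row counts are hypothetical, they exclude each file's header row, and the files are assumed to be processed in the order listed.

```shell
skip=5            # value passed to -k / --skip-rows
remaining=$skip   # rows still to be skipped across all files
summary=""

# Hypothetical data-row counts for three files (header rows not counted).
for rows in 3 4 6; do
  # Skip as many rows of this file as the remaining budget allows.
  if [ "$remaining" -ge "$rows" ]; then
    skipped=$rows
  else
    skipped=$remaining
  fi
  remaining=$((remaining - skipped))
  sent=$((rows - skipped))
  summary="$summary$skipped/$sent "
  echo "file with $rows rows: skipped $skipped, delivered $sent"
done
```

With these figures, the first file is skipped entirely, the second file loses its first two rows, and the third file is delivered in full.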

22.6.3 Sending Batches for a Single File

 

The -s or --split-rows argument may be used to send a single file to the server in batches of rows rather than all at once.

Each row is processed in its own transaction, and this is not affected by the -s argument. The argument only controls how many rows are sent to the server at a time. The only reason to send rows in batches (as opposed to sending the whole file at once) is so that the server can report progress back to the user. For example, suppose the file has 1,000,000 rows. If they are all sent at once, smileutil receives no response from the server until all 1,000,000 rows have been processed, so there is little visible sign of progress. If the file is instead split into batches of 1,000 rows, the user sees an update every 1,000 rows. Other than this difference, the -s argument has no effect on performance or behavior.
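
The number of progress updates to expect can be estimated with simple arithmetic. This sketch assumes that a final partial batch is still sent as its own batch and that the server reports once per batch; the figures are the ones used in the example above.

```shell
rows=1000000   # data rows in the file
split=1000     # value passed to -s / --split-rows

# Ceiling division: a final partial batch still counts as one batch.
batches=$(( (rows + split - 1) / split ))
echo "$batches batches, so roughly $batches progress updates"
```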