
workload

Workload refers both to the package and the tool. The package is built around an abstraction called Generator, and the tool has various bits for working with generators.

...

make bin/workload will build the workload tool and put it at bin/workload.
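
Once it's built, the quickest tour of what's available is the help output (the authoritative list of subcommands and generators comes from --help itself):

bin/workload --help
bin/workload run --help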

Some notable generators:

bank       Bank models a set of accounts with currency balances
json       JSON reads and writes to keys spread (by default, uniformly at random) across the cluster
kv         KV reads and writes to keys spread (by default, uniformly at random) across the cluster
roachmart  Roachmart models a geo-distributed online storefront with users and orders
tpcc       TPC-C simulates a transaction processing workload using a rich schema of multiple tables
tpch       TPC-H is a read-only workload of "analytics" queries on large datasets.
ycsb       YCSB is the Yahoo! Cloud Serving Benchmark

workload run

workload run <generator> [flags] is the fundamental operation of the workload tool. It runs the queryload of the specified generator, printing out per-operation stats every second: opcount, latencies (p50, p90, p95, p99, max). It also prints out some totals when it exceeds any specified --duration or --max-ops or when ctrl-c'd. The output looks something like

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
      1s        0        15753.6        15825.4      0.8      2.0      3.3     15.2 read
      1s        0          822.2          826.0      2.0      4.1      6.3     14.2 write
      2s        0        14688.7        15257.0      0.8      2.2      3.9     17.8 read
      2s        0          792.0          809.0      2.2      4.5      7.3     11.5 write
      3s        0        15569.2        15361.0      0.8      2.0      3.4     16.8 read
      3s        0          857.0          825.0      2.0      4.5      7.3     10.5 write
[...]

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   11.5s        0         176629        15425.0      0.9      0.8      2.0      3.5     28.3  read
   11.5s        0           9369          818.2      2.5      2.1      5.5      8.9     18.9  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   11.5s        0         185998        16243.2      1.0      0.8      2.4      4.5     28.3
BenchmarkWorkload/generator=ycsb     185998     61564 ns/op

There are a few common flags and each generator can define its own, so be sure to check out workload run <generator> --help to see what’s available. By default, run expects the cluster to be on the default port on localhost, but this can be overridden: workload run <generator> <crdb uri>.
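
For example, a one-minute kv run against a remote node might look like this (the address in the URI is illustrative; substitute your cluster's):

workload run kv --duration=1m 'postgres://root@10.0.0.1:26257?sslmode=disable'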

...

Fixtures are all stored under gs://cockroach-fixtures/workload on Google Cloud Storage. This bucket is multi-regional, so it can be used without paying bandwidth costs from any GCE instance in the US. The bandwidth costs for large fixtures are significant, so think twice before using this from other parts of the world.
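
To see which fixtures already exist, you can list the bucket directly (assuming you have gsutil installed and authenticated):

gsutil ls gs://cockroach-fixtures/workload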

...

workload csv-server starts an HTTP server that returns CSV data for generator tables. If you start one next to each node in a cockroach cluster, you can point the IMPORTs at them and the CSVs are never written anywhere. I'd love to package this up in one command, but for now the best way to make a large fixture is

roachprod create <cluster> -n <nodes>
roachprod put <cluster> <linux build of cockroach> cockroach
roachprod put <cluster> <linux build of workload> workload
roachprod start <cluster>
for i in {1..<nodes>}; do
  roachprod ssh <cluster>:$i -- './workload csv-server --port=8081 &> logs/workload-csv-server.log < /dev/null &';
done
roachprod ssh <cluster>:1 -- ./workload fixtures make <generator> [flags] --csv-server=http://localhost:8081
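To sanity-check that the csv-servers came up cleanly, you can peek at the log file the loop above writes:

roachprod ssh <cluster>:1 -- tail logs/workload-csv-server.log
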

workload check

workload check <generator> [flags] runs any consistency checks defined by the generator. These are expected to pass after the initial table data load (whether via batched inserts or fixtures) or after any duration of queryload. In particular, this may be useful in tests that want to validate the correctness of a cluster that is being stressed in some way (crashes, extreme settings, etc.).
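
For example, to verify TPC-C's invariants after a run (the connection URI is illustrative; see workload check tpcc --help for the exact flags):

workload check tpcc --warehouses=10 'postgres://root@localhost:26257?sslmode=disable'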

...

TODO (In the meantime, check out the --help.)

roachtest

TODO (In the meantime, check out the --help.) See Roachtest overview.

Intro to roachprod and workload (raw notes from a presentation Dan gave)

...

Roachprod is designed to be as unopinionated as possible. It's mostly a way to start/stop/run commands on many nodes at once, with a bit of crdb knowledge baked in.

Install roachprod:

go get -u github.com/cockroachdb/roachprod

The best source of documentation about roachprod is definitely the --help:

roachprod --help

Start a local roachprod cluster with 3 nodes:

roachprod create local -n 3

Download cockroach onto it:

roachprod run local "curl https://binaries.cockroachdb.com/cockroach-v2.0.0.darwin-10.9-amd64.tgz | tar -xz; mv cockroach-v2.0.0.darwin-10.9-amd64/cockroach cockroach"

  • sorry, this should be easier

Run a unix command on one node:

roachprod run local:1 "ls"

Start a cluster!

roachprod start local

  • If export COCKROACH_DEV_LICENSE=crl-<sekret license string in #backup>aW5n is in your .bashrc, start will also apply the license.

View the admin ui for node 1 (note the "1"):

roachprod admin local:1 --open

View the admin ui for all nodes:

roachprod admin local --open

Open a SQL shell:

roachprod sql local:1

Shut it down:

roachprod destroy local

Make one in the cloud (so fast!):

roachprod create dan-crdb -n 4

  • the cluster name must start with your username

  • it will be shut down and collected if you leave it running (the deadline can be increased via roachprod extend; see the sketch below)
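
A sketch of bumping the deadline (check roachprod extend --help; the --lifetime flag and value here are assumptions):

roachprod extend dan-crdb --lifetime=12h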

Download cockroach onto it:

roachprod run dan-crdb "curl https://binaries.cockroachdb.com/cockroach-v2.0.0.linux-amd64.tgz | tar -xvz; mv cockroach-v2.0.0.linux-amd64/cockroach cockroach"

Start it up (leave node 4 for the load generator):

roachprod start dan-crdb:1-3

  • note the 1-3 means nodes 1 to 3

workload

Workload is many things, but its most important purpose is to run queryloads and report throughput/latency statistics. Each of these (kv, tpcc, roachmart) is a "generator" and defines its own set of initial data. Generators are configured by flags, which are very powerful: generator implementors can use them to control almost any behavior.
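
For instance, kv's read/write mix and concurrency are plain flags (flag names per kv's --help; the values are only an illustration):

roachprod ssh dan-crdb:4 -- ./workload run kv --read-percent=95 --concurrency=64 --duration=1m {pgurl:1-3}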

Workload's help is also good:

roachprod ssh dan-crdb:1 -- ./workload --help
roachprod ssh dan-crdb:1 -- ./workload run --help
roachprod ssh dan-crdb:1 -- ./workload run kv --help
roachprod ssh dan-crdb:1 -- ./workload run tpcc --help

Download workload onto your GCE cluster:

roachprod run dan-crdb "curl -L https://edge-binaries.cockroachdb.com/cockroach/workload.LATEST > workload; chmod a+x workload"

Run TPC-C with warehouses=1:

roachprod ssh dan-crdb:4 -- ./workload run tpcc --init --warehouses=1 --duration=1m {pgurl:1-3}

  • note the {pgurl:1-3}, which roachprod expands into the nodes' connection URLs (see below)
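
You can print that expansion yourself with roachprod's pgurl subcommand:

roachprod pgurl dan-crdb:1-3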

Wipe your cluster:

roachprod wipe dan-crdb

And restart it:

roachprod start dan-crdb:1-3

Init-ing TPCC with large numbers of warehouses would be slow, so use a fixture:

roachprod admin dan-crdb:1 --open
roachprod ssh dan-crdb:1 -- ./workload fixtures load tpcc --warehouses=10

Now run it:

roachprod ssh dan-crdb:4 -- ./workload run tpcc --warehouses=10 --duration=1m {pgurl:1-3}