Overview

 

This tutorial describes all the steps necessary to take an enzo movie dataset and generate an animation from it. We will use the supernova dataset as an example. The major steps are as follows:

 

  1. Preparation
    1. Edit enzo movie headers for use by the rest of the toolchain.
  2. Get Dataset Info
    1. amr_stats for number of levels, number of grids, time-range, etc.
    2. extrema_scan for scalar field range.
  3. Subsetting
    1. amr_subsetter for making reduced files for use with Maya + Boxviewer
  4. Temporal Interpolation
    1. Generating the time curve with Maya or manually.
    2. frame_extractor for performing the temporal interpolation

Conventions

The raw supernova dataset is stored on a machine named 'cobalt', and it has the following directory structure:

 

/projects/cosmic/bwoshea/fs_sn_movie/
    movie_data_1/
        MoviePack*.idx_*
        MoviePack*.mdat.0_*
    movie_data_2/
        MoviePack*.idx_*
        MoviePack*.mdat.0_*

 

The reason that there are two movie_data directories is that the simulation had to be restarted (most likely due to a crash of cobalt). This is not an uncommon outcome, so this tutorial will cover this case.

 

For ease of referring to this directory, assume we've put the following in our bash profile:

  export DATA_DIR=/projects/cosmic/bwoshea/fs_sn_movie

 

We'll also assume that we are running all of this on cobalt, an SGI Altix machine, or on co-viz1 through co-viz8, which are SGI Prisms.

 

Text which is to be entered by the user, either on the command line or via a text editor, will be printed in monospace bold:

> amr_stats $PROJECT_DIR/movieHeader1.dat

 

whereas output to the console will be printed in monospace:

Stats for dataset: /projects/cosmic/mahall/fs_supernova/movieHeader1.dat:

 

In-depth

Preparation

 

The first thing to do is to set up a directory for any intermediate files. I set mine up in:

/projects/cosmic/mahall/fs_supernova

Let's assume we've done a

  export PROJECT_DIR=/projects/cosmic/mahall/fs_supernova

 

Also, we need to write some movieHeader.dat files for the enzo output. The movieHeader.dat files contain information about the datatypes, file locations, and other global information about the simulation. Now, enzo should produce movieHeader.dat files for us, but these are usually incomplete. Furthermore, in this case, the movieHeader.dat files didn't seem to make it into the data directories.  Luckily, they are not that hard to write. For the full description, see the enzo_movie man page.

 

Unfortunately, we can't combine the two partial runs into one, so we will need to create two movieHeader.dat files, one for the first set of data ( $DATA_DIR/movie_data_1 ), and one for after the restart ( $DATA_DIR/movie_data_2 ). We'll keep the movieHeader files in $PROJECT_DIR.

 

[movieHeader1.dat]

MovieVersion = 1.4

Endianness = LITTLE

CoordFloatSize = 8

RootReso = 128

DataFloatSize = 8

RecordSize = 88

NumFields = 1

NumCPUs = 32

FileStem = /projects/cosmic/bwoshea/   \

fs_sn_movie/movie_data_1/MoviePack

MinFilenum = 147

MaxFilenum = 401

FieldNames = BaryonDensity

 

[movieHeader2.dat]

MovieVersion = 1.4

Endianness = LITTLE

CoordFloatSize = 8

RootReso = 128

DataFloatSize = 8

RecordSize = 88

NumFields = 1

NumCPUs = 32

FileStem = /projects/cosmic/bwoshea/    \

fs_sn_movie/movie_data_2/MoviePack

MinFilenum = 304

MaxFilenum = 554

FieldNames = BaryonDensity

 

[Note: For purposes of formatting, the FileStem lines were broken up into two lines.  For the actual movieHeader files, the FileStem must appear on a single line]
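Since the movieHeader format is just Key = Value lines, it is easy to script against. Here is a minimal Python sketch for reading one into a dictionary; the parsing rules are inferred from the examples above, not from a formal spec:

```python
def parse_movie_header(text):
    """Parse 'Key = Value' lines from a movieHeader.dat into a dict.

    Assumes one key per line with '=' as the separator, as in the
    examples above; blank lines are skipped.
    """
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        header[key.strip()] = value.strip()
    return header

hdr = parse_movie_header("""MovieVersion = 1.4
Endianness = LITTLE
RootReso = 128
NumCPUs = 32
FieldNames = BaryonDensity
""")
# hdr["RootReso"] == "128", hdr["NumCPUs"] == "32"
```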

 

How did we get these fields? For most of them, we asked the person who ran the simulation. The simulation was run on 32 processors (NumCPUs=32) on a little-endian machine (Endianness=LITTLE). The grid coordinates were stored as double-precision floating-point values (CoordFloatSize=8), as were the scalar data values (DataFloatSize=8). The root grid had a resolution of 128^3 (RootReso=128), and one scalar field representing baryon density was output (NumFields=1, FieldNames=BaryonDensity).

 

The FileStem gives the prefix for the data files; we can use it to point to the correct data directories. To find MinFilenum and MaxFilenum, we need to list each directory. All data files have the form:

MoviePack[NNN].[TYPE]_[CPU]

The 'Filenum' is the first number in the filename (the [NNN]). By listing the directories, we are able to determine the min and max.
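Rather than eyeballing a long listing, the min and max can be pulled out with a short script; the regex below is an assumption based on the MoviePack[NNN].[TYPE]_[CPU] pattern just described:

```python
import re

def filenum_range(filenames):
    """Return (min, max) of the [NNN] component of MoviePack filenames.

    Assumes names of the form MoviePack<NNN>.<type>_<cpu>, per the
    pattern described above.
    """
    pat = re.compile(r"MoviePack(\d+)\.")
    nums = [int(m.group(1)) for name in filenames
            if (m := pat.search(name))]
    return min(nums), max(nums)

# e.g. on a real directory:
# import os
# lo, hi = filenum_range(os.listdir(data_dir))
```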

Dataset Info

Now is the time you might want to get some extra information about the AMR simulation. You will probably need the time range of the simulation, as well as the scalar range of the data (for rendering). If you can get this information directly from the provider of the data, that is probably best. Otherwise, there are some tools to discover this information.

 

To get general information about the dataset, you can use amr_stats. To run it, just pass the locations of all the movieHeader.dat files:

 

> amr_stats $PROJECT_DIR/movieHeader1.dat \

$PROJECT_DIR/movieHeader2.dat

 

amr_stats may take several minutes (~15) to process a dataset of the size of the supernova simulation. Don't be alarmed if you see warnings of the form:

 

Missing:/projects/cosmic/bwoshea/fs_sn_movie/movie_data_2/MoviePack554.idx_0013

 

In most cases, these can be safely ignored (unless ALL the data files are listed as missing).
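If you want to see exactly which files a header implies but the filesystem lacks, a short check like this can help. The <stem><NNN>.idx_<CCCC> layout (CPU number zero-padded to four digits) is an assumption inferred from the warning shown above:

```python
import os

def missing_index_files(stem, min_filenum, max_filenum, num_cpus):
    """List expected index files that don't exist on disk.

    The <stem><NNN>.idx_<CCCC> filename layout is assumed from the
    'Missing:' warning format shown above.
    """
    missing = []
    for n in range(min_filenum, max_filenum + 1):
        for cpu in range(num_cpus):
            path = "%s%d.idx_%04d" % (stem, n, cpu)
            if not os.path.exists(path):
                missing.append(path)
    return missing
```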

 

For the supernova, the output produced is:

 

Stats for dataset: /projects/cosmic/mahall/fs_supernova/movieHeader1.dat:

# of grids: 19510608

# of levels: 13

Domain: ( (0 0 0) (1 1 1) )

Data presorted: Unsorted at grid #1

l=12 t=9.61696 >= l=11 t=9.61696

0

Sorting: ...  Done.

         Level #0 : 128 grids over 4 timesteps.

                Time range =( 9.6179515767634154 - 9.6209515776616978)

         Level #1 : 256 grids over 4 timesteps.

                Time range =( 9.6179515767634154 - 9.6209515776616978)

         Level #2 : 1536 grids over 4 timesteps.

                Time range =( 9.6179515767634154 - 9.6209515776616978)

         Level #3 : 6016 grids over 4 timesteps.

                Time range =( 9.6179515767634154 - 9.6209515776616978)

         Level #4 : 12533 grids over 8 timesteps.

                Time range =( 9.6179515767634154 - 9.6217875501794339)

         Level #5 : 52796 grids over 206 timesteps.

                Time range =( 9.6179515767634154 - 9.621928090434027)

         Level #6 : 100619 grids over 194 timesteps.

                Time range =( 9.6175233198460859 - 9.6219280904340287)

         Level #7 : 151778 grids over 240 timesteps.

                Time range =( 9.6169856839857708 - 9.6219280904340287)

         Level #8 : 95923 grids over 486 timesteps.

                Time range =( 9.6169707002360489 - 9.6219280904340287)

         Level #9 : 30417 grids over 947 timesteps.

                Time range =( 9.6169653806876898 - 9.6219280904340287)

         Level #10 : 114664 grids over 1813 timesteps.

                Time range =( 9.6169629737649576 - 9.6219280904340287)

         Level #11 : 1402366 grids over 4988 timesteps.

                Time range =( 9.6169617623832195 - 9.6219294703418008)

         Level #12 : 17541576 grids over 12194 timesteps.

                Time range =( 9.6169611614504085 - 9.621930663237892)

# of fields: 1

        Field #0  Name: BaryonDensity

                Field Type: Cell-centered

                Field Precision: Float64

                # subfields: 19510608

 

----------------------------------------------------------------------------

 

This gives us the number of levels (13), the number of subgrids (>19 million), and the extent of the root-level grid (for enzo, I believe this will always be a cube with side length 1). Ignore the comments about presorted data and sorting. Then, for each level, amr_stats lists how many grids exist at that level, followed by how many different timesteps exist for that level. On the next line, the time values for the first and last timestep are listed. When extracting individual frames, this range of time values will be important.

 

Note that there is a bit of oddness here in that the first timestep for levels 6 and above is actually earlier than the first root-grid timestep. This probably shouldn't be happening, and those responsible have been sacked^H^H^H^H^H^Hinformed. However, if we start at the first listed timestep, we can assume that we have a valid time range of 9.6169611614504085 to 9.621930663237892 (the latest listed timestep) for the purposes of the temporal interpolation.

 

We also get information about the individual data fields. In this case, there is only one. It tells us that the field is named BaryonDensity and consists of double-precision floating-point data (which we know, since we wrote that in the movieHeader.dat file). It also tells us that the field is cell-centered. All enzo output is cell-centered, so that's also not a surprise. Finally, we can see that there is field data for all 19,510,608 subgrids, which tells us that all available data was written out (enzo does have the ability to output only a specified fraction of the data).

 

The quickest way to get the scalar range of the field data is to ask whoever ran the simulation. The scalar range is needed for applying the transfer function correctly, though an approximate range will probably suffice. In lieu of asking the provider of the data, you can scan the data yourself using the tool extrema_scan. extrema_scan scans a list of files consisting purely of floats (or purely of doubles) and returns the minimum and maximum values encountered. For the enzo_movie format, scalar data is stored in files whose names contain .mdat.0_, so we scan all those files. You can pass the list of files on the command line, but in this case there are too many data files, and the shell complains. So we'll pass the file names on standard input:

 

> find $DATA_DIR -name "*mdat.0*" | extrema_scan -v -

 

If the simulation was run on a machine that stores data in the opposite byte order, you can pass the -s option to swap byte order. In this case, it is unnecessary.

 

By specifying -v, extrema_scan will list each file as it processes it, and print the running min/max values as it goes. Omit this if you find the output too noisy. In either case, the last line of output will be a pair of numbers: the scalar minimum and maximum. Save these somewhere!
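For reference, the core of what extrema_scan does to a file of raw doubles can be sketched in a few lines of Python. This is a stand-in to illustrate the scan, not the real tool; the byte order and 8-byte float size are assumptions that must match the dataset's movieHeader.dat:

```python
import os
import struct
import tempfile

def scan_extrema(path, little_endian=True, chunk_doubles=65536):
    """Return (min, max) over a file of raw 8-byte floats.

    Endianness and DataFloatSize=8 are assumptions; match them to
    the movieHeader.dat of your dataset.
    """
    fmt_char = "<" if little_endian else ">"
    lo, hi = float("inf"), float("-inf")
    with open(path, "rb") as f:
        while True:
            buf = f.read(8 * chunk_doubles)
            if not buf:
                break
            n = len(buf) // 8  # ignore any trailing partial value
            vals = struct.unpack("%s%dd" % (fmt_char, n), buf[:8 * n])
            lo, hi = min(lo, min(vals)), max(hi, max(vals))
    return lo, hi

# Quick demo on a small temporary file of known doubles:
fd, tmp = tempfile.mkstemp()
os.write(fd, struct.pack("<4d", 3.0, -1.5, 2.0, 10.0))
os.close(fd)
lo, hi = scan_extrema(tmp)  # (-1.5, 10.0)
os.remove(tmp)
```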

 

For a dataset the size of the supernova simulation, it can take several hours to scan for extrema, so be patient! You may want to redirect the output into a log file, so that if the process dies for any reason (e.g., cobalt crashes) you can restart from where you left off.

 

Taking Subsets for Interactive use

For use with Maya + Boxviewer, it can be useful to subset the data: due to the number of subgrids, even just loading the boxes can consume gigabytes of RAM. The tool amr_subsetter does this. It takes as input one or more enzo movie files and a list of levels, and outputs an enzo_amr index file. Note that the resulting index file should be used only for viewing boxes, and not for any application where field data might be needed (such as a rendering process).

 

To use the tool, the first step is to decide which levels to include in the subset. For reasonable memory usage and performance, you probably don't want to include more than a few million grids. Looking at the output from amr_stats in the last section, this means we should probably exclude level 12, since it alone has over 17M grids. Also, in enzo output the first few levels are usually static; they don't impart much structural information and can clutter things up a lot, so you might choose only the root-level grids (level 0) and levels 4-11. You can do this by invoking:

 

> cd $PROJECT_DIR

> amr_subsetter -l 0,4-11 subset.idx \

movieHeader1.dat movieHeader2.dat \

> subsetHeader.dat

 

(The -l option takes a comma-separated list of levels or level ranges; spaces are not allowed in the list. The above is equivalent to -l 0,4,5,6,7,8,9,10,11.) This will create a single binary file, subset.idx, with the subset data in it. amr_subsetter will also print the following movie header file to stdout (which, in the above command, we have redirected to the file subsetHeader.dat):

 

MovieVersion = 1.4

Endianness = LITTLE

CoordFloatSize = 8

DtFloatSize = 8

DataFloatSize = 8

RootReso = 128

IndexFilePattern = subset.idx

NumCPUs = 1

MaxFilenum = 0

RecordSize = 88

NumFields = 1

FieldNames = BaryonDensity
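As an aside, the -l level-list grammar is simple enough to expand yourself; this sketch mirrors the comma/range rules described above (illustrative only, not amr_subsetter's actual parser):

```python
def parse_levels(spec):
    """Expand a level spec like '0,4-11' into a sorted list of ints.

    Comma-separated entries; 'a-b' denotes an inclusive range, per
    the -l syntax described above.
    """
    levels = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            levels.update(range(int(lo), int(hi) + 1))
        else:
            levels.add(int(part))
    return sorted(levels)
```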

 

You can see the effect of amr_subsetter by running amr_stats on the output file:

 

> amr_stats subsetHeader.dat

 

Stats for dataset: subsetHeader.dat:

# of grids: 1961224

# of levels: 12

Domain: ( (0 0 0) (1 1 1) )

Data presorted: Unsorted at grid #3

l=11 t=9.61697 >= l=9 t=9.61697

0

Sorting: ...  Done.

         Level #0 : 128 grids over 4 timesteps.

                Time range =( 9.6179515767634154 - 9.6209515776616978)

         Level #1 : 0 grids over 0 timesteps.

 

         Level #2 : 0 grids over 0 timesteps.

 

         Level #3 : 0 grids over 0 timesteps.

 

         Level #4 : 12533 grids over 8 timesteps.

                Time range =( 9.6179515767634154 - 9.6217875501794339)

         Level #5 : 52796 grids over 25 timesteps.

                Time range =( 9.6179515767634154 - 9.621928090434027)

         Level #6 : 100619 grids over 209 timesteps.

                Time range =( 9.6175233198460859 - 9.6219280904340287)

         Level #7 : 151778 grids over 264 timesteps.

                Time range =( 9.6169856839857708 - 9.6219280904340287)

         Level #8 : 95923 grids over 471 timesteps.

                Time range =( 9.6169707002360489 - 9.6219280904340287)

         Level #9 : 30417 grids over 965 timesteps.

                Time range =( 9.6169653806876898 - 9.6219280904340287)

         Level #10 : 114664 grids over 1813 timesteps.

                Time range =( 9.6169629737649576 - 9.6219280904340287)

         Level #11 : 1402366 grids over 5736 timesteps.

                Time range =( 9.6169617623832195 - 9.6219294703418008)

# of fields: 1

        Field #0  Name: BaryonDensity

                Field Type: Cell-centered

                Field Precision: Float64

                # subfields: 1961224

 

Indeed, there are now only about 2 million grids in the dataset. A better way to subset would probably involve reducing the number of timesteps rather than the number of levels, since it might be important to see the very fine levels (which tend to be the most numerous). I'm working on that now.
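The subset's grid count can be cross-checked against the per-level counts amr_stats reported for the full dataset; summing level 0 and levels 4-11 reproduces the number exactly:

```python
# Per-level grid counts reported by amr_stats on the full dataset.
grids = {0: 128, 1: 256, 2: 1536, 3: 6016, 4: 12533, 5: 52796,
         6: 100619, 7: 151778, 8: 95923, 9: 30417, 10: 114664,
         11: 1402366, 12: 17541576}

# Levels kept by '-l 0,4-11':
subset_total = sum(grids[l] for l in [0] + list(range(4, 12)))
# subset_total == 1961224, matching the subset's '# of grids' line
```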

 

Anyway, you can look at the boxes using Maya + BoxViewer by giving it the location of subsetHeader.dat (assuming subset.idx is in the same directory).

 

Temporal Interpolation

As you can see from the output of amr_stats in section 2, an enzo dataset represents the output of a simulation over time. To render a frame, we need to grab a single timestep from that simulation. Unfortunately for us, the simulation does not advance all of the spatial regions in lockstep, as traditional volume animations do. Instead, some regions of space (those covered only by the "coarse", or level 0, grids) are updated in time very infrequently, with rather large jumps in time. In this case there are only 4 level 0 timesteps, which would make for a very poor 900-frame animation. On the other hand, there are over 12,000 level 12 timesteps, so spatial regions covered by those are being updated far faster than we can take advantage of. To add to the confusion, which regions of space are covered by which levels changes over time, so a point which is in a level 12 (frequently updating) region at the beginning of the simulation might be in a level 6 region by the end.

 

The problem of extracting a smooth animation from such a temporally incoherent simulation is the job of frame_extractor. It takes a fair bit of computation, and the datasets can be gigantic, so be prepared to use HPC resources. However, it should not be too difficult to set up. One needs a dataset and a list of points in time for which one would like to extract temporally coherent frames. Then, for each point in time (call it 't'), frame_extractor produces a multi-level dataset where every region of space represents the state of the simulation at time t. frame_extractor also performs the important step of converting cell-centered data to vertex-centered data. (To efficiently render smooth-looking images, vertex-centered data are needed. Without difficult and computationally expensive rendering techniques, cell-centered data will produce very blocky images.)
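The cell-to-vertex conversion is, at its core, an average of the cells that share each vertex. A 1D sketch of the centering change (the real tool works in 3D and across AMR levels; the boundary treatment here is one simple, assumed choice):

```python
def cell_to_vertex_1d(cells):
    """Convert N cell-centered values to N+1 vertex-centered values.

    Interior vertices average the two adjacent cells; boundary
    vertices just take the nearest cell (an assumed boundary choice,
    for illustration only).
    """
    verts = [cells[0]]
    for i in range(1, len(cells)):
        verts.append(0.5 * (cells[i - 1] + cells[i]))
    verts.append(cells[-1])
    return verts
```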

 

The first step in frame extraction is to produce a time curve. This is just a text file consisting of a list of time values that you wish to produce frames for. This can be produced in Maya, or perhaps another tool.

 

A quick way to produce a constant slope time curve is with the following python script:

 

start_time = 9.6169611614504085

end_time = 9.621930663237892

delta_t = end_time - start_time

num_frames = 50

for frame_no in range(num_frames):

    print("%.18g" % (start_time + delta_t * frame_no / float(num_frames - 1)))

 

This script can be saved as a file make_times.py, and invoked as:

> python make_times.py >time_curve_1.txt

 

start_time and end_time are the values we found in step 2, after running amr_stats. When run, this script prints 50 evenly spaced timesteps to standard out. You can capture that and put it into a file for use with frame_extractor. However, note that evenly spaced timesteps are unlikely to be appropriate for an AMR simulation, due to the drastically different time scales in the simulation. The technique described above is best for a test run, or a first look at the data.
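When uniform spacing isn't appropriate, the same script shape can emit a non-uniform curve. As one example (the easing function is just an illustrative choice), a smoothstep-eased curve concentrates frames near the start and end of the time range:

```python
start_time = 9.6169611614504085
end_time = 9.621930663237892
num_frames = 50

def smoothstep(u):
    """Cubic ease-in/ease-out on [0, 1]."""
    return u * u * (3.0 - 2.0 * u)

# Non-uniform time curve: small steps near the endpoints,
# larger steps through the middle of the range.
times = [start_time + (end_time - start_time)
         * smoothstep(i / float(num_frames - 1))
         for i in range(num_frames)]

for t in times:
    print("%.18g" % t)
```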

 

(A note about time values: the meaning and units of the time values are not fixed across simulations. In order to make sense of the time values, you will need to talk to the provider of the simulation. In this case, I found out that each unit of time is 20,800,000 years, and time zero represents the big bang. So this simulation takes place approximately 200 million years after the big bang, and the supernova event lasts about 103,000 years.)
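These figures are easy to verify: with one time unit equal to 20,800,000 years, the start time and duration convert directly:

```python
YEARS_PER_UNIT = 20_800_000  # conversion factor from the simulation's provider

# Time range found earlier with amr_stats:
start_time = 9.6169611614504085
end_time = 9.621930663237892

start_years = start_time * YEARS_PER_UNIT                  # ~200 million years
duration_years = (end_time - start_time) * YEARS_PER_UNIT  # ~103,000 years
```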

 

So, assume we have somehow generated a time curve $PROJECT_DIR/time_curve_1.txt. As frame_extractor will produce at least 2 output files for each timestep (an XML metadata file, and a binary file containing field data), let's create a directory to hold the output:

 

> mkdir $PROJECT_DIR/TimeCurve1

 

Then to run the frame extractor:

 

> cd $PROJECT_DIR

> frame_extractor -v -t time_curve_1.txt    \

       -o 'TimeCurve1/supernova_%04d'       \

       movieHeader1.dat movieHeader2.dat

 

A breakdown of the options follows:

    -v :

        Produce vertex centered output. You want this.

 

    -t time_curve_1.txt :

        extract frames at times listed in the file 'time_curve_1.txt'

 

    -o 'TimeCurve1/supernova_%04d':

        Produce output filenames beginning with the given pattern.

        Any printf pattern based of %d is replaced with the current

        frame number (zero-based), so %04d will create filenames

        containing numbers in the form 0000, 0001, 0002, etc

 

    movieHeader1.dat movieHeader2.dat:

        As always, the list of enzo movieHeader files for the

        simulation

 

 

Running this requires a fair bit of memory (>4 GB for the supernova) and a lot of time. One way to speed it up is to invoke frame_extractor on multiple machines in a sort of cheesy parallelism. To support this, frame_extractor provides the -b, -e, and -s options (standing for beginning, ending, and skip). The idea is to run on N machines, with each machine computing every Nth frame. By passing '-s N' to frame_extractor, it will process only every Nth frame, starting at either 0 (by default) or a frame number passed in with the -b option. So to run on co-viz1 through co-viz4, log in to each machine, 'cd' to $PROJECT_DIR on each, and start the following:

 

co-viz1> frame_extractor -s 4 -b 0 -v \

-t time_curve_1.txt    \

-o 'TimeCurve1/supernova_%04d'       \

movieHeader1.dat movieHeader2.dat

 

 

co-viz2> frame_extractor -s 4 -b 1 -v  <same as above>

 

co-viz3> frame_extractor -s 4 -b 2 -v  <same as above>

 

co-viz4> frame_extractor -s 4 -b 3 -v  <same as above>

 

(You can make a script to ease this process, and I hope to parallelize frame_extractor from within, to make this process less painful.)
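The -b/-s round-robin is easy to sanity-check: under this scheme each frame lands on exactly one machine. A quick sketch of the partitioning (illustrating the scheme, not frame_extractor itself):

```python
def frames_for_machine(begin, skip, total_frames):
    """Frame numbers one machine handles under '-b begin -s skip'."""
    return list(range(begin, total_frames, skip))

# Four machines, 50 frames, as in the commands above:
num_machines, total = 4, 50
assignment = [frames_for_machine(b, num_machines, total)
              for b in range(num_machines)]
# Together the four lists cover frames 0..49 exactly once.
```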