The CYCLIZE Program

The CYCLIZE program can be divided into four main parts. The first part is defining the "cyclization parameters", which describe how to perform the calculation. The second part is defining the "sequence parameters", which describe how the chain to be simulated is constructed. The third part is calling the C programs and running the simulation. The fourth part is analyzing the results.

1) Defining the "cyclization parameters" hash

The cyclization parameters hash (usually called %cyclize_parameters) contains all the information necessary for running the Monte Carlo simulation EXCEPT information on the chain itself. For example:

%cyclize_parameters = (
    "whole_chains"     => 1e8,
    "nrad_stats"       => 1e7,
    "icalcs"           => 100,
    "radial_cutoff"    => 60,
    "axial_cutoff"     => 40,
    "torsional_cutoff" => 36,
    "nkeepers"         => 10,
);

These cyclization parameters will perform a calculation with:

1x10^8 whole chains generated

1x10^7 whole chains analyzed for radial distribution

100 independent calculations

60 angstrom radial cutoff

40 degree axial cutoff (+/- 40 degrees from zero)

36 degree torsional cutoff (+/- 36 degrees from zero)

10 cyclized chains will be kept (in the cyclized_chains.dat file)
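As an illustration of what the three cutoffs mean, the following PERL sketch tests a single chain against them. It is not CYCLIZE's own code: the subroutine name within_cutoffs is hypothetical, and it is an assumption that a chain must satisfy all three cutoffs at once and that the limits are inclusive.

```perl
use strict;
use warnings;

# Cutoff values copied from the example hash above.
my %cyclize_parameters = (
    "radial_cutoff"    => 60,   # angstroms
    "axial_cutoff"     => 40,   # degrees
    "torsional_cutoff" => 36,   # degrees
);

# Hypothetical check (an assumption, not CYCLIZE's code): a chain passes
# when its end-to-end distance is within the radial cutoff and its axial
# and torsional angles lie within +/- the cutoffs around zero.
sub within_cutoffs {
    my ($p, $distance, $axial, $torsional) = @_;
    return ( $distance       <= $p->{radial_cutoff}
          && abs($axial)     <= $p->{axial_cutoff}
          && abs($torsional) <= $p->{torsional_cutoff} ) ? 1 : 0;
}

# Two made-up chains: the second fails the axial cutoff.
print within_cutoffs(\%cyclize_parameters, 12.5, -30, 10) ? "kept\n" : "rejected\n";
print within_cutoffs(\%cyclize_parameters, 12.5,  45, 10) ? "kept\n" : "rejected\n";
```

The angles are compared by absolute value because the cutoffs are described as ranges centered on zero.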

2) Generating the "sequence parameters" hash

The "sequence parameters" hash (usually called %seq) contains the information necessary to construct the particular chain being studied. Ultimately, it is assumed that all DNA chain information can be represented in terms of seven values per base pair step: tilt, tilt flex, roll, roll flex, twist, twist flex, and rise per helix axis (dz).

For instance, two base pair steps of B-form DNA at 25 degrees C might look like this using these values:

num       tilt    flex    roll    flex   twist    flex      dz
  1  B   0.000   4.842   0.000   4.842  34.450   4.388   3.400
  2  B   0.000   4.842   0.000   4.842  34.450   4.388   3.400

CYCLIZE holds this information in a sequence parameters hash.
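To make the seven-values-per-step idea concrete, here is a minimal PERL sketch that stores the B-form values from the table above and prints them in the same column layout. The data layout (step number mapped to a hash of fields), the field names, and the subroutines make_uniform_seq and format_step are all hypothetical; the real %seq format used by CYCLIZE may differ.

```perl
use strict;
use warnings;

# The seven per-step values, in the column order of the table above.
my @fields = qw(tilt tilt_flex roll roll_flex twist twist_flex dz);

# B-form DNA at 25 degrees C, values copied from the table above.
my %b_form = (
    tilt  => 0.000,  tilt_flex  => 4.842,
    roll  => 0.000,  roll_flex  => 4.842,
    twist => 34.450, twist_flex => 4.388,
    dz    => 3.400,
);

# Hypothetical layout (an assumption, not the real %seq format):
# step number => { type => 'B', plus the seven values }.
sub make_uniform_seq {
    my ($nsteps, $type, %values) = @_;
    my %seq;
    for my $i (1 .. $nsteps) {
        $seq{$i} = { type => $type, %values };
    }
    return %seq;
}

# Format one step in the same column layout as the table above.
sub format_step {
    my ($num, $step) = @_;
    return sprintf("%3d  %s" . ("%8.3f" x 7),
                   $num, $step->{type}, map { $step->{$_} } @fields);
}

my %seq = make_uniform_seq(2, 'B', %b_form);
print format_step($_, $seq{$_}), "\n" for sort { $a <=> $b } keys %seq;
```

Keeping the values per step, rather than per chain, leaves room for chains in which individual steps differ from canonical B-form.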

A number of PERL subroutines have been written to facilitate creating the %seq hash. See the documentation for each subroutine to learn how to use it, or look at some of the sample scripts.

3) Running the Monte Carlo simulation and analysis

In order to perform the Monte Carlo simulation, both the "cyclization parameters" and "sequence parameters" hashes must be defined. Once they have been built, the PERL subroutine "Cyclize" will perform the MC simulation.

Cyclize

This PERL subroutine calls two C programs, generate_chains and analyze_chains, to perform the simulation and analysis. Below is a description of how to use each program (note: you needn't know any of this; you can simply call Cyclize and the subroutine will call these programs appropriately):

generate_chains
This program uses the sequence.in file as input and creates two files called "chain_1.dat" and "chain_2.dat" (assuming 2 partitions) that contain "nchains" half-chains generated using Monte Carlo.

USAGE:
generate_chains file seed nchains nparts

Where  file = DNA params file
       seed = random number generator seed (odd 6 digit integer)
    nchains = total number of chains to generate
     nparts = total number of chain partitions to generate
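For readers who want to run generate_chains by hand, the following PERL sketch assembles the argument list in the order of the USAGE line and checks the documented seed constraint first. The subroutine names are hypothetical, the example values are made up, and it is an assumption that a "6 digit" seed may not begin with zero.

```perl
use strict;
use warnings;

# Check the documented seed constraint: an odd six-digit integer.
# Assumption: a leading zero does not count as a digit.
sub is_valid_seed {
    my ($seed) = @_;
    return 0 unless defined $seed && $seed =~ /\A[1-9][0-9]{5}\z/;
    return $seed % 2 == 1 ? 1 : 0;
}

# Assemble the argument list in the order given by the USAGE line.
sub generate_chains_command {
    my ($file, $seed, $nchains, $nparts) = @_;
    die "seed must be an odd 6 digit integer\n" unless is_valid_seed($seed);
    return ("generate_chains", $file, $seed, $nchains, $nparts);
}

my @cmd = generate_chains_command("sequence.in", 123457, 100000, 2);
print join(" ", @cmd), "\n";

# To actually run the program, one could pass the list to system():
# system(@cmd) == 0 or die "generate_chains failed: $?\n";
```

Passing the command as a list to system() avoids shell quoting problems if the file name contains spaces.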

analyze_chains
This program takes as input the chain_1.dat and chain_2.dat files and combines the chains following the algorithm described in the "method" section of this manual. Ultimately, the total number of whole chains analyzed will be (nchains^2) / nicalcs.
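The relation between half-chains and whole chains can be worked through with a short PERL sketch. It applies the formula stated above and its inverse. The subroutine names are hypothetical, and whether the Cyclize subroutine derives nchains from "whole_chains" in exactly this way is an assumption.

```perl
use strict;
use warnings;

# Whole chains analyzed, per the formula in the text: nchains^2 / nicalcs.
sub whole_chains_analyzed {
    my ($nchains, $nicalcs) = @_;
    return $nchains**2 / $nicalcs;
}

# Inverse: half-chains needed to reach a target number of whole chains.
# Rounded up so that the target is always met.
sub half_chains_needed {
    my ($whole_chains, $nicalcs) = @_;
    my $n = sqrt($whole_chains * $nicalcs);
    my $rounded = int($n);
    $rounded++ if $rounded < $n;
    return $rounded;
}

# Values from the example hash: 1e8 whole chains, 100 independent calculations.
my $nchains = half_chains_needed(1e8, 100);
printf "%d half-chains give %g whole chains\n",
       $nchains, whole_chains_analyzed($nchains, 100);
```

With the example values this gives 100000 half-chains, since 100000^2 / 100 = 1x10^8. The quadratic dependence is what makes the half-chain approach economical: doubling the number of half-chains quadruples the number of whole chains analyzed.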

USAGE:
analyze_chains file nchains nicalcs nrad_stats nkeepers rcut acut tcut

Where:       file = DNA parameters file
          nchains = number of half-chains to analyze
          nicalcs = number of independent calculations
       nrad_stats = number of whole chains for which to keep radial statistics
         nkeepers = number of cyclized chains to store to disk
             rcut = radial cutoff (in angstroms)
             acut = axial cutoff (in degrees)
             tcut = torsional cutoff (in degrees)
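The correspondence between the %cyclize_parameters keys and these positional arguments can be sketched in PERL as follows. The subroutine analyze_chains_command is hypothetical, and the key-to-argument mapping is an assumption based on the names. In particular, the third argument is taken to be the number of independent calculations, following the argument list above, with the "icalcs" key supplying it.

```perl
use strict;
use warnings;

# Example hash copied from the cyclization parameters section.
my %cyclize_parameters = (
    "whole_chains"     => 1e8,
    "nrad_stats"       => 1e7,
    "icalcs"           => 100,
    "radial_cutoff"    => 60,
    "axial_cutoff"     => 40,
    "torsional_cutoff" => 36,
    "nkeepers"         => 10,
);

# Hypothetical helper: maps hash keys onto the positional arguments
# nicalcs, nrad_stats, nkeepers, rcut, acut, tcut (assumed mapping).
sub analyze_chains_command {
    my ($file, $nchains, %p) = @_;
    my @keys = qw(icalcs nrad_stats nkeepers
                  radial_cutoff axial_cutoff torsional_cutoff);
    for my $key (@keys) {
        die "missing cyclization parameter: $key\n" unless defined $p{$key};
    }
    return ("analyze_chains", $file, $nchains, map { $p{$_} } @keys);
}

# 100000 half-chains is a made-up example value.
my @analyze_cmd = analyze_chains_command("sequence.in", 100000, %cyclize_parameters);
print join(" ", @analyze_cmd), "\n";
```

As noted above, calling Cyclize makes this step unnecessary; the sketch only shows where each hash value would end up on the command line.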

4) Further analysis of the results

A number of additional programs/scripts are provided to facilitate further analysis of the results after the MC step has finished:

Output files

generate_chains.header
Contains information on how the chains were generated.

analyze_chains.header
Contains information on how the chains were analyzed.

analyze_chains.stats
Contains all the details of the statistical analysis of the results (chi-squared values, per-parameter error values, etc.).

radial.xm radial_fit.xm
The radial distribution function (and the fitted function) in an easy to graph format.

axial.xm axial_fit.xm
The axial distribution function (and the fitted function) in an easy to graph format.

torsional.xm torsional_fit.xm
The torsional distribution function (and the fitted function) in an easy to graph format.

chain_cyclized_freq.xm
This file is a frequency graph of the "most represented" half-chains found in the cyclized whole chains. That is, if your results are being affected by a single rare bent half-chain, it will be shown here as having a large % frequency.

distfreq_halfchain.xm
This file is the distribution frequency data representing the end-to-end distance of the half chains.

distfreq_wholechain.xm
This file is the distribution frequency data representing the end-to-end distance of the half chains found in successfully cyclized whole chains.

Analysis scripts

show_results.pl
A not-so-well-named PERL script that automagically graphs (using XMGR) all three distribution functions in a nice format. The script reads its input parameters from cyclize_xmgr.params (found in the directory /usr/local/lib/cyclize), so edit this file if your version of XMGR has problems (v4.1.2 of XMGR was used in making this file).
USAGE: show_results

show_chains
A C program that reads in the "chain_cyclized.dat" file and saves any number of the cyclized chains in a "rotation matrix" ASCII text form.
USAGE: show_chains sequence.in chain_cyclized.dat 1

rotmat2pdb.pl
This PERL script converts the "rotation matrix" ASCII text file from show_chains into a viewable PDB file format.
USAGE: rotmat2pdb.pl chain_*.rotmat
USAGE: rotmat2pdb.pl wire chain_*.rotmat

get_j.pl
This PERL script "greps" the J-factor from the analyze_chains.header output file and writes it out in a nice form, along with the error.
USAGE: get_j.pl directory_*/an*header

Jon Lapham
Yale University
Department of Chemistry - Crothers Lab