The CYCLIZE Program
The CYCLIZE program can be divided into four main parts. The first is defining the "cyclization parameters", which describe how to perform the calculation. The second is defining the "sequence parameters", which describe how the chain to be simulated is constructed. The third is calling the C programs that run the simulation. The fourth is analyzing the results.

1) Defining the "cyclization parameters" hash

The cyclization parameters hash (usually called %cyclize_parameters) contains all the information necessary for running the Monte Carlo simulation EXCEPT information on the chain itself. For example:
%cyclize_parameters = (
    "whole_chains"     => 1e8,
    "nrad_stats"       => 1e7,
    "icalcs"           => 100,
    "radial_cutoff"    => 60,
    "axial_cutoff"     => 40,
    "torsional_cutoff" => 36,
    "nkeepers"         => 10,
);
These cyclization parameters will perform a calculation with:
  • 1x10^8 whole chains generated
  • 1x10^7 whole chains analyzed for radial distribution
  • 100 independent calculations
  • 60 angstrom radial cutoff
  • 40 degree axial cutoff (+/- 40 degrees from zero)
  • 36 degree torsional cutoff (+/- 36 degrees from zero)
  • 10 cyclized chains will be kept (in the chain_cyclized.dat file)
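
Before launching a long run it can help to confirm that the hash is complete. The following is a minimal sketch and not part of CYCLIZE itself: the subroutine name check_cyclize_parameters is hypothetical, and the rule that every value must be a positive number is an assumption drawn from the example above.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Keys taken from the example hash above.
my @REQUIRED_KEYS = qw(
    whole_chains nrad_stats icalcs
    radial_cutoff axial_cutoff torsional_cutoff nkeepers
);

# Hypothetical helper (not part of CYCLIZE): returns a list of problems
# found in a cyclization parameters hash. An empty list means every
# required key is present and holds a positive number.
sub check_cyclize_parameters {
    my ($params) = @_;
    my @problems;
    for my $key (@REQUIRED_KEYS) {
        my $value = $params->{$key};
        if (!defined $value) {
            push @problems, "missing parameter: $key";
        }
        elsif (!looks_like_number($value) || $value <= 0) {
            # Assumption: all parameters are positive numbers.
            push @problems, "parameter $key must be a positive number";
        }
    }
    return @problems;
}

my %cyclize_parameters = (
    "whole_chains"     => 1e8,
    "nrad_stats"       => 1e7,
    "icalcs"           => 100,
    "radial_cutoff"    => 60,
    "axial_cutoff"     => 40,
    "torsional_cutoff" => 36,
    "nkeepers"         => 10,
);

my @problems = check_cyclize_parameters(\%cyclize_parameters);
print @problems ? join("\n", @problems) . "\n" : "parameters look complete\n";
```

The helper takes a hash reference so that it can be reused on several parameter sets without copying them.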

2) Generating the "sequence parameters" hash

    The "sequence parameters" hash (usually called %seq) contains the information necessary to construct the particular chain being studied. It is assumed that all DNA chain information can be represented by seven values per base pair step: tilt, tilt flex, roll, roll flex, twist, twist flex, and rise along the helix axis (dz).

    For instance, two base pairs of B-form DNA at 25 degrees C might look like this using these values:

    num       tilt    flex    roll    flex   twist    flex      dz
      1  B   0.000   4.842   0.000   4.842  34.450   4.388   3.400
      2  B   0.000   4.842   0.000   4.842  34.450   4.388   3.400
    

    CYCLIZE holds this information in a sequence parameters hash.
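
To make the seven-values-per-step idea concrete, here is a small sketch that stores the two B-form steps above and prints them back in the same tabular form. The data layout (an array of hashes with keys such as tilt_flex) and the subroutine format_step are assumptions chosen for illustration; they are not the documented internals of %seq, which should be built with the subroutines listed in this section.

```perl
use strict;
use warnings;

# Hypothetical layout (an assumption, not the real %seq structure):
# one hash per base pair step holding the seven values described above,
# plus the one-letter name shown in the table.
my @steps = map {
    +{
        name  => "B",
        tilt  => 0.000,  tilt_flex  => 4.842,
        roll  => 0.000,  roll_flex  => 4.842,
        twist => 34.450, twist_flex => 4.388,
        dz    => 3.400,
    }
} 1 .. 2;

# Format one step as a row of the table shown above.
sub format_step {
    my ($num, $step) = @_;
    return sprintf("%3d  %s%8.3f%8.3f%8.3f%8.3f%8.3f%8.3f%8.3f",
        $num, $step->{name},
        @{$step}{qw(tilt tilt_flex roll roll_flex twist twist_flex dz)});
}

print format_step($_ + 1, $steps[$_]), "\n" for 0 .. $#steps;
```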

    A number of PERL subroutines have been written to facilitate creating the %seq hash. See the documentation for each subroutine below to learn how to use it, or look at some of the sample scripts.

  • print_params
  • read_sequence
  • find_positions
  • create_sequence
  • fill_params
  • fill_name
  • fill_Atract_params
  • calc_half_chains

3) Running the Monte Carlo simulation and analysis

    In order to perform the Monte Carlo simulation, the "cyclization parameters" and "sequence parameters" hashes must both be defined. Once they have been built, the PERL subroutine "Cyclize" will perform the MC simulation:
  • Cyclize

    This PERL subroutine calls two C programs, generate_chains and analyze_chains, to perform the simulation and analysis. Below is a description of how to use each program (note: you needn't know any of this; you can simply call Cyclize and the subroutine will call these programs appropriately):

    generate_chains
    This program uses the sequence.in file as input and creates two files, "chain_1.dat" and "chain_2.dat" (assuming 2 partitions), that contain "nchains" half-chains generated using Monte Carlo.

    USAGE:
    generate_chains file seed nchains nparts
    
    Where  file = DNA params file
           seed = random number generator seed (odd 6 digit integer)
        nchains = total number of chains to generate
         nparts = total number of chain partitions to generate
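
For readers who do want to call generate_chains by hand, the sketch below assembles the argument list in the order given by the USAGE line and picks a seed that satisfies the "odd 6 digit integer" requirement. The subroutine names random_seed and generate_chains_command are hypothetical, the value of 100000 chains is only illustrative, and the actual call is left commented out because it assumes generate_chains is on your PATH.

```perl
use strict;
use warnings;

# Pick a random odd six-digit integer (100001 .. 999999).
sub random_seed {
    return 100_001 + 2 * int(rand(450_000));
}

# Build the argument list in the order given by the USAGE line:
# generate_chains file seed nchains nparts
sub generate_chains_command {
    my ($file, $seed, $nchains, $nparts) = @_;
    return ("generate_chains", $file, $seed, $nchains, $nparts);
}

my @cmd = generate_chains_command("sequence.in", random_seed(), 100000, 2);
print "@cmd\n";

# To actually run it (assumes generate_chains is on the PATH):
# system(@cmd) == 0 or die "generate_chains failed: $?";
```

Passing the command to system as a list rather than a single string avoids shell quoting problems in file names.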
    

    analyze_chains
    This program takes chain_1.dat and chain_2.dat as input and combines the chains following the algorithm described in the "method" section of this manual. Ultimately, the total number of whole chains analyzed will be (nchains^2) / nicalcs.
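
As a worked example of this relation, the sketch below computes the number of whole chains for a given nchains and nicalcs, and inverts the formula to find how many half-chains a target number of whole chains would require. The subroutine names are hypothetical, and whether Cyclize itself derives nchains from the "whole_chains" parameter in exactly this way is an assumption.

```perl
use strict;
use warnings;

# Whole chains analyzed, from the relation (nchains^2) / nicalcs.
sub whole_chains_analyzed {
    my ($nchains, $nicalcs) = @_;
    return $nchains**2 / $nicalcs;
}

# Inverse relation: half-chains needed to reach a target number of
# whole chains. Assumption: round up to the next whole number.
sub half_chains_needed {
    my ($whole_chains, $nicalcs) = @_;
    my $n       = sqrt($whole_chains * $nicalcs);
    my $rounded = int($n);
    $rounded++ if $rounded < $n;
    return $rounded;
}

# With the example parameters (1e8 whole chains, 100 independent
# calculations): sqrt(1e8 * 100) = 1e5 half-chains.
printf "%d half-chains\n", half_chains_needed(1e8, 100);
printf "%d whole chains\n", whole_chains_analyzed(1e5, 100);
```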

    USAGE:
    analyze_chains file nchains nicalcs nrad_stats nkeepers rcut acut tcut
    
    Where:       file = DNA parameters file
              nchains = number of half-chains to analyze
              nicalcs = number of independent calculations
           nrad_stats = number of whole chains to keep radial statistics for
             nkeepers = number of cyclized chains to store to disk
                 rcut = radial cutoff (in angstroms)
                 acut = axial cutoff (in degrees)
                 tcut = torsional cutoff (in degrees)
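
One way to read the rcut, acut and tcut arguments is as a joint acceptance test on the end geometry of a whole chain. The sketch below assumes that a chain passes only when all three conditions hold, with the angular cutoffs taken symmetrically about zero as in the parameter example in part 1. The subroutine within_cutoffs and its argument order are hypothetical; this is not the code used by analyze_chains, whose actual procedure is described in the "method" section.

```perl
use strict;
use warnings;

# Hypothetical acceptance test (an assumption, not analyze_chains itself).
#   $r       end-to-end distance in angstroms
#   $axial   axial angle in degrees
#   $torsion torsional angle in degrees
# Returns 1 when all three values fall inside their cutoffs, else 0.
sub within_cutoffs {
    my ($r, $axial, $torsion, $rcut, $acut, $tcut) = @_;
    return ($r <= $rcut
         && abs($axial)   <= $acut
         && abs($torsion) <= $tcut) ? 1 : 0;
}

# Cutoffs from the example parameters: 60 angstroms, 40 and 36 degrees.
print within_cutoffs(55, -30, 10, 60, 40, 36), "\n";  # inside all three
print within_cutoffs(55,  45, 10, 60, 40, 36), "\n";  # axial angle too large
```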
    

4) Further analysis of the results

    A number of additional programs and scripts are provided to facilitate further analysis of the results once the MC step has finished:
  • Output files
  • generate_chains.header
    Contains information on how the chains were generated.
  • analyze_chains.header
    Contains information on how the chains were analyzed.
  • analyze_chains.stats
    Contains all the details of the statistical analysis of the results (chi-squared values, per-parameter error values, etc.)
  • radial.xm radial_fit.xm
    The radial distribution function (and the fitted function) in an easy to graph format.
  • axial.xm axial_fit.xm
    The axial distribution function (and the fitted function) in an easy to graph format.
  • torsional.xm torsional_fit.xm
    The torsional distribution function (and the fitted function) in an easy to graph format.
  • chain_cyclized_freq.xm
    This file is a frequency graph of the "most represented" half-chains found in the cyclized whole chains. That is, if your results are being affected by a single rare bent half-chain, it will show up here with a large percent frequency.
  • distfreq_halfchain.xm
    This file is the distribution frequency data representing the end-to-end distance of the half chains.
  • distfreq_wholechain.xm
    This file is the distribution frequency data representing the end-to-end distance of the half chains found in successfully cyclized whole chains.
  • Analysis scripts
  • show_results.pl
    A not-so-well-named PERL script that automagically graphs (using XMGR) all three distribution functions in a nice format. The script reads its input parameters from cyclize_xmgr.params (found in the directory /usr/local/lib/cyclize), so edit this file if your version of XMGR has problems (v4.1.2 of XMGR was used in making this file).
    USAGE: show_results
  • show_chains
    A C program that reads in the "chain_cyclized.dat" file and saves any number of the cyclized chains in a "rotation matrix" ASCII text form.
    USAGE: show_chains sequence.in chain_cyclized.dat 1
  • rotmat2pdb.pl
    This PERL script converts the "rotation matrix" ASCII text file from show_chains into a viewable PDB file format.
    USAGE: rotmat2pdb.pl chain_*.rotmat
    USAGE: rotmat2pdb.pl wire chain_*.rotmat
  • get_j.pl
    This PERL script "greps" the J-factor from the analyze_chains.header output file and writes it out in a nice form, along with the error.
    USAGE: get_j.pl directory_*/an*header

    Jon Lapham
    Yale University
    Department of Chemistry - Crothers Lab