RenderFarm Design Tips
Rendering is one of the biggest bottlenecks in any growing
studio, whether it runs a single project or several at once.
The studio may do pure 3D work, VFX (visual effects) or
a combination of the two.
Rendering is a computational process that combines the necessary
scene information to produce the frames needed for further
compositing, or the sequence of frames for the editing table.
Initially we tried Muster and Rush as our render tools.
These are generally called network job submission tools.
Their disadvantage is heavy I/O on the network: as the number
of servers keeps growing, performance does not scale
linearly. These tools perform well in a small farm, or in
several separate small farms, but they do not give us a single
huge farm with dynamic load balancing and multiple internal
groups, where a job can be moved between farms.
The OS (Operating System) also plays a major role in the
performance of the Renderfarm:-
Windows has its own overheads when it comes to bigger farms,
so Windows works well with a farm of 20 to 50 compute nodes
combined with a render job submission tool. Even in this
scenario a typical Linux environment can crunch about 30%
more in the same given environment. To boost the same
resources to 100%+ performance, the farm needs to be powered
by grid computing or supercomputing.
The performance of a good Renderfarm also depends on the
type of network, storage and pipeline design.
20 to 50 servers Renderfarm:- Figure-1
Network:- A Renderfarm of about 20 to 50 servers is a small
farm, and it can do a very good job if the network is designed
as a non-blocking architecture using stackable switches. If
the budget is low, even cascaded switches will do a good job.
The Storage Node and Master Node should be in the same subnet
and, if possible, under the same LAN switch. If the LAN
switches are cascaded, it is always better to add an additional
NIC to the Storage Node and Master Node and make this NIC
part of the same compute-node LAN.
Master Node & Storage Node:-
For a small setup it is advisable to have the Master Node and
Storage Node on the same system, but it is always better to
keep an eye on the load. A typical configuration for a good
MN and SN is a dual CPU with 4GB RAM and dual or quad Gigabit
NICs teamed for file serving; the onboard NIC can be used for
the job-submission interface.
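As a rough sanity check on the teamed file-serving links, the
following Python sketch estimates the theoretical share of a
dual or quad Gigabit team that each compute node gets. It assumes
a line rate of 125 MB/s per Gigabit link with no protocol
overhead, and that every node reads at the same time; the
function name is illustrative only.

```python
def per_node_share_mbytes(team_links: int, compute_nodes: int,
                          link_gbit: float = 1.0) -> float:
    """Theoretical MB/s per compute node from a teamed NIC (assumed line rate)."""
    if team_links <= 0 or compute_nodes <= 0:
        raise ValueError("team_links and compute_nodes must be positive")
    # 1 Gbit/s = 1000 Mbit/s = 125 MB/s, ignoring protocol overhead
    team_mbytes = team_links * link_gbit * 1000 / 8
    return team_mbytes / compute_nodes


if __name__ == "__main__":
    for links in (2, 4):          # dual or quad Gigabit team
        for nodes in (20, 50):    # both ends of the small-farm range
            share = per_node_share_mbytes(links, nodes)
            print(f"{links} x 1 Gig, {nodes} nodes: {share:.1f} MB/s per node")
```

Real throughput will be lower than these figures, so they are
best read as an upper bound when choosing between dual and quad
teaming.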
50 to 100 servers Renderfarm:- Figure-2
Network:- In this scenario it is still advisable to stick to
edge switches with stacking capability, as above, but what
needs to be looked at is the connectivity between the SN and
the CN (compute nodes). For better performance, the load on
the switches can be split across multiple SNs and CN groups.
Master Node & Storage Node:-
This is still a below-moderate setup, but it is advisable
to have the Master Node and Storage Node as separate systems.
The configuration of a good MN and SN can be a dual CPU with
4GB RAM and dual or quad Gigabit NICs teamed for file serving;
the onboard NIC can be used for the job-submission interface.
100 to 200 servers Renderfarm:- Figure-3
Network:- This case calls for a simple core switch with a
non-blocking architecture and 10 Gig support for the edge
switches. Each edge switch can have a dual 10 Gig uplink to
the core switch to give a near-non-blocking architecture.
With this design in mind, we can create 3 to 4 groups of
compute nodes across the edge switches to avoid simultaneous
I/O hits on the backplane of an edge switch, which carries
only a dual 10 Gig uplink for 48 Gigabit ports.
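The "near-non-blocking" point can be put in numbers. The sketch
below computes the oversubscription ratio of an edge switch from
the figures in the text (48 Gigabit ports, dual 10 Gig uplink).
The function name and the assumption that every port can run at
full line rate are illustrative, not taken from any switch
vendor's documentation.

```python
def oversubscription_ratio(ports: int, port_gbit: float,
                           uplinks: int, uplink_gbit: float) -> float:
    """Ratio of total edge-port bandwidth to total uplink bandwidth."""
    if min(ports, uplinks) <= 0 or min(port_gbit, uplink_gbit) <= 0:
        raise ValueError("all arguments must be positive")
    return (ports * port_gbit) / (uplinks * uplink_gbit)


if __name__ == "__main__":
    ratio = oversubscription_ratio(ports=48, port_gbit=1.0,
                                   uplinks=2, uplink_gbit=10.0)
    # Number of 1 Gig ports the uplinks can carry at line rate at once
    full_rate_ports = int(2 * 10.0 / 1.0)
    print(f"oversubscription {ratio:.1f}:1, "
          f"{full_rate_ports} of 48 ports at full rate")
```

A ratio above 1:1 means the uplinks saturate if all ports hit
storage at once, which is why spreading the compute groups
across several edge switches helps.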
Master Node & Storage Node:-
Here a somewhat bigger NAS storage comes in to support the
file I/O load on the Storage Node. As a simple estimate,
assume 200 servers each working at 10MBps (that is a big “B”,
i.e. megabytes) in a Windows environment: the bandwidth needed
at the SN is 2000MBps, i.e. 2GBps. That is huge for any older
box, but with new 4G SAN, dual-core CPUs and support for
10 Gig and IB (InfiniBand), it is not a tough job to achieve
this bandwidth.
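The estimate above is simple enough to script. This Python
sketch reproduces the 200 x 10MBps calculation and converts the
result to Gbit/s to count how many 10 Gig links would be needed
at line rate. The helper names are illustrative, and protocol
overhead is ignored, so real deployments need some headroom.

```python
import math


def aggregate_mbytes(servers: int, mbytes_per_server: float) -> float:
    """Total storage bandwidth in MB/s if every server reads at once."""
    return servers * mbytes_per_server


def links_needed(total_mbytes: float, link_gbit: float) -> int:
    """Links of the given speed needed at line rate (no overhead assumed)."""
    total_gbit = total_mbytes * 8 / 1000   # MB/s -> Gbit/s
    return math.ceil(total_gbit / link_gbit)


if __name__ == "__main__":
    total = aggregate_mbytes(200, 10)      # figures from the text
    print(f"{total:.0f} MB/s = {total / 1000:.1f} GB/s "
          f"= {total * 8 / 1000:.0f} Gbit/s")
    print(f"10 Gig links needed: {links_needed(total, 10)}")
```

Changing the per-server rate or the server count shows quickly
how the storage requirement grows with the farm.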
200+ servers RenderFarm:- Figure-4
Network:- The entire scenario changes here. We do not have
much option to go for edge switches unless we are sure about
the data load path; a non-blocking core switch is well suited
to handle such a huge farm with better network I/O.
The SN, MN and CN are all connected to the core switch, or
to the nearest edge switch with dual 10 Gig uplinks.
Master Node & Storage Node:-
This type of setup needs GNS (Global Name Space) with real-time
load balancing for multi-path data access and multi-LUN file
access. The advantage of such a setup is that clients don’t
have to use a single static mount point to access the respective
file system on the storage. It also gives the setup a natural
HA (High Availability) redundant path in case any file server
crashes.
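To illustrate the idea only, here is a toy Python sketch of a
namespace resolver: clients ask for a logical path and are handed
one of several file servers in round-robin order, skipping any
server marked as down. This is not how any particular GNS product
works; the class, its methods and the server names are all
hypothetical.

```python
class NamespaceResolver:
    """Toy stand-in for a global namespace in front of several file servers."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one file server is required")
        self.servers = list(servers)
        self.down = set()      # servers currently considered failed
        self._next = 0         # round-robin cursor

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def resolve(self, logical_path):
        """Return (server, logical_path), round-robin over healthy servers."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server not in self.down:
                return server, logical_path
        raise RuntimeError("no file server available")


if __name__ == "__main__":
    gns = NamespaceResolver(["fs1", "fs2", "fs3"])   # hypothetical names
    print(gns.resolve("/projects/shot010"))
    gns.mark_down("fs2")                             # simulate a crash
    print(gns.resolve("/projects/shot010"))          # fs2 is skipped
```

The client only ever sees the logical path, so a crashed file
server is skipped without changing any mount point on the
compute nodes.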
Note:- Don't forget that InfiniBand is on the market for
more bandwidth across the network. It will certainly cost
more for the extra performance.