Configuring the Network for Big Data

One of the major challenges for any beginner with Big Data infrastructure is setting up the network. Here are a few points that might ease your job.

  • Big data networks need gigabit-class bandwidth (1 Gbps or faster, over suitable copper or optical-fiber cabling). Your traditional 10 Mbps or 100 Mbps networks may suffice for experimental studies, but in production you will want Gigabit networks capable of handling the load. Wi-Fi is a no-no.
  • Pay special attention to the routers and to the port-forwarding and NAT details. Incorrect configuration here can waste more hours than you can imagine.
  • A few software systems, such as Hadoop, may even require you to set up ‘password-less SSH’. Be thorough about how such systems are accessed and who has access to them. Security should be a paramount concern for your big data systems.
  • In the initial stages, it helps to turn off the ‘firewall’ on the machines. Instead, plan for a DMZ with NAT for security.
  • For your experimental studies, when configuring VMs in VirtualBox, you typically want two NICs:
    1. A NAT adapter, primarily for outgoing connections (so that the VM can access the internet).
    2. A Host-Only adapter, which primarily accepts ‘incoming’ connections from peer VMs in your private network.
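The two-adapter layout above could be applied from the host with VirtualBox's VBoxManage command-line tool. The following is a minimal Python sketch, not a definitive recipe: the VM name `hadoop-node1` and the host-only interface name `vboxnet0` are assumptions for illustration and will likely differ on your machine.

```python
import subprocess


def build_nic_command(vm_name, hostonly_if="vboxnet0"):
    """Return the VBoxManage argument list for a NAT + Host-Only setup.

    vm_name and hostonly_if are illustrative; substitute your own values.
    """
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--nic1", "nat",                     # adapter 1: outgoing traffic
        "--nic2", "hostonly",                # adapter 2: peer-VM traffic
        "--hostonlyadapter2", hostonly_if,   # host interface backing adapter 2
    ]


def apply_nic_layout(vm_name, hostonly_if="vboxnet0"):
    """Run the command. The VM should be powered off when this is called."""
    subprocess.run(build_nic_command(vm_name, hostonly_if), check=True)
```

Calling `build_nic_command("hadoop-node1")` only builds the argument list, so you can print and inspect it before running `apply_nic_layout`.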
  • Beware of DHCP configurations. If your scripts access the machines by IP address, DHCP can ruin your day.
    • Some DHCP servers cannot guarantee unique or stable addresses (e.g. the one in VirtualBox).
    • If DHCP is on, it is strongly recommended to access the machines by name (rather than by IP address).
  • DHCP is the best option if you are using image cloning (it avoids configuring the network manually on each clone).
    • In that case, get a reliable DHCP server (with a long enough address lease time and a near-zero chance of address reuse).
    • A DHCP server that issues (and caches) addresses based on the MAC address is strongly recommended (rather than one that simply hands out the next free address in FIFO order).
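To illustrate accessing machines by name rather than by a hard-coded IP address, here is a small Python sketch that looks up the current address each time it is needed. The host names in the usage note (`node1`, `node2`) are hypothetical, and the sketch assumes that your DNS or hosts setup can actually resolve them.

```python
import socket


def resolve(name):
    """Return the current IPv4 address for a host name, or None if lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # Covers resolver failures such as an unknown host name.
        return None


def resolve_all(names):
    """Map each host name to its current address (None if unresolved)."""
    return {name: resolve(name) for name in names}
```

A script could call `resolve_all(["node1", "node2"])` at start-up and refuse to continue if any value is `None`, instead of failing later against a stale address.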
  • When configuring a static IP for a VM, you should also configure the DNS servers properly.
    • For the NAT adapter, the static configuration (inside the VM) would look something like:
      • IP-Address: 10.0.2.x
      • Subnet Mask: 255.255.255.0
      • Gateway: 10.0.2.2
    • For the Host-Only adapter, the static configuration (inside the VM) should look something like:
      • IP-Address: 192.168.56.x
      • Subnet Mask: 255.255.255.0
      • Gateway: 192.168.56.1 (the Host’s IP-Address)
      • DNS Servers: one or more of the servers listed under ‘DNS Servers’ below
    • Some of the tools that can help you configure these properly are:
      • nm-tool: http://linux.die.net/man/1/nm-tool
      • route -n: http://linux.about.com/od/commands/l/blcmdl8_route.htm
      • traceroute: http://linux.die.net/man/8/traceroute
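A static configuration like the ones above is easy to get subtly wrong, for example by pairing the Host-Only address with the NAT gateway. The sketch below uses Python's standard `ipaddress` module to sanity-check one address/mask/gateway triple. The concrete addresses in the usage note (such as `10.0.2.15`) are example values chosen for the `x` placeholder, not values taken from any real setup.

```python
import ipaddress


def check_static_config(ip, netmask, gateway):
    """Return a list of problems found in a static IPv4 configuration."""
    # strict=False lets us pass a host address and derive its network.
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    addr = ipaddress.ip_address(ip)
    gw = ipaddress.ip_address(gateway)

    problems = []
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if addr == gw:
        problems.append("host address is the same as the gateway")
    if addr in (network.network_address, network.broadcast_address):
        problems.append(f"{addr} is reserved in {network}")
    return problems
```

For instance, `check_static_config("10.0.2.15", "255.255.255.0", "10.0.2.2")` returns an empty list, while swapping in the gateway `192.168.56.1` reports that the gateway lies outside `10.0.2.0/24`.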
  • DNS Servers: Configure the fastest DNS server for your location (usually the geographically closest one). You can use ping to measure the latency to each server and select the best from the list below.
    • Google DNS Servers:
      • 8.8.8.8
      • 8.8.4.4
    • OpenDNS Servers:
      • 208.67.222.222
      • 208.67.220.220
    • BSNL / TATA DNS Servers:
      • 218.248.255.212
      • 218.248.241.2
      • 203.124.230.12
      • 203.124.230.21
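Comparing the servers above can be scripted. The text suggests ping; the sketch below swaps in a different measurement, the time taken to open a TCP connection to port 53, because that needs no special privileges from Python. It assumes the servers accept DNS over TCP, so treat the numbers as a rough guide rather than an exact substitute for ping.

```python
import socket
import time


def tcp_latency_ms(server, port=53, timeout=1.0):
    """Time one TCP connect to the server in milliseconds; None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Unreachable, refused, or timed out.
        return None


def rank_servers(latencies):
    """Return the reachable servers sorted from fastest to slowest."""
    reachable = {s: ms for s, ms in latencies.items() if ms is not None}
    return sorted(reachable, key=reachable.get)
```

A typical use would be to build a dictionary such as `{s: tcp_latency_ms(s) for s in ["8.8.8.8", "8.8.4.4"]}` and pass it to `rank_servers`; the first entry of the result is the candidate to configure. Repeating the measurement a few times and averaging would give a steadier picture.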
