Got a nice mention in this post for my CHUG presentation a while back.
Since then we’ve actually ditched the Linux-HA setup. Turns out if you buy hardware that is halfway decent, the chances of failure are pretty low. If you keep your NameNode (NN) metadata on NFS storage, you can get up and running again quickly someplace else. The best thing to do is to use a service DNS CNAME for your NN and JobTracker (JT) services (tools like Hive write the full URI into their metastore database), so moving a service is just a DNS change.
I’m trying to come up with a follow-up presentation on progress (or lack thereof) on those next steps. The working title is “Logs are Evil,” which really gets at the root of the problem with logs: any piece of data that your system generates needs a destination and a lifecycle (retention period, etc.). If you don’t plan for this early, it soon becomes an unwieldy cleanup project, like my kid’s messy room.
So World IPv6 Day is coming June 8, 2011. Should I be stocking water and canned goods in the basement like the doomsayers did in 2000?
What would it take to actually get ready? Well, it turns out that for most folks, nothing. That is because the companies we pay each month (Comcast/RCN/AT&T) for our internet hookup aren’t providing an IPv6 pipe to our houses. No big deal.
But while simply trying to learn more about it, I discovered this post. In it, the author explains how to create an IPv6 tunnel to your house using an Apple AirPort Extreme router. Strictly speaking, this router isn’t necessary to create a tunnel, but Apple did a good job of making it brain-dead easy. NOTE: you will need a static IP for this to work, because the tunnel broker needs to know where to send your packets.
Using the information in the post, I signed up for a free IPv6 tunnel provided by the good folks over at tunnelbroker.net. The IPv6-enabled computers in my house recognized the newly created IPv6 block and picked up addresses on their own, using IPv6’s default stateless autoconfiguration scheme, which derives the host half of the address from the computer’s MAC address. There is a decent picture describing this mechanism here. Keep in mind, my IPv6-enabled devices (read: Macs) already had an IPv6 address of a similar form, with fe80:: in the first 64 bits. That is the link-local block, which is not routable beyond your local network.
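For the curious, here is a minimal Python sketch of how that MAC-based address is formed (the "modified EUI-64" scheme): flip the universal/local bit of the first MAC octet, splice ff:fe into the middle, and append the result to the /64 prefix. The MAC and the 2001:db8:: documentation prefix below are made-up examples, not my real ones, and many newer operating systems default to randomized privacy addresses instead of this scheme.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Return the 64-bit modified EUI-64 interface identifier for a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # splice ff:fe into the middle
    return int.from_bytes(bytes(eui64), "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the MAC-derived interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("stateless autoconfiguration expects a /64 prefix")
    return network.network_address + eui64_interface_id(mac)


if __name__ == "__main__":
    mac = "00:1b:63:84:45:e6"  # example MAC address
    print(slaac_address("fe80::/64", mac))                # fe80::21b:63ff:fe84:45e6
    print(slaac_address("2001:db8:1234:5678::/64", mac))  # 2001:db8:1234:5678:21b:63ff:fe84:45e6
```

The same interface identifier shows up in both the link-local fe80:: address and the tunnel-routed one; only the prefix differs.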
The next step was to create AAAA records in DNS that mapped to my new, externally routable IPv6 address. But here was an added bonus from the tunnelbroker folks: reverse DNS (PTR) records. You see, even though I’ve had a static IP for years, I could never set up reverse DNS, because on a residential account the provider maps those records itself. So the A record for cooldomain.com would be 1.2.3.4, but the PTR record for 1.2.3.4 would be 4-3-2-1.sdk-bsr1.chi-stk.il.static.cable.rcn.com (yuck). Does this matter for most people? Of course not! It is just a nice-to-have.
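IPv6 reverse DNS works like IPv4's in-addr.arpa, just with more digits: the PTR record lives under ip6.arpa, with all 32 hex nibbles of the address written in reverse order, one per label (RFC 3596). A small Python sketch of building that name; the address is a documentation example, and the standard ipaddress module already exposes the same thing as reverse_pointer.

```python
import ipaddress


def ip6_arpa_name(address: str) -> str:
    """Build the ip6.arpa name under which an IPv6 address's PTR record lives."""
    # exploded gives all 32 hex digits, e.g. 2001:0db8:0000:...:0001
    nibbles = ipaddress.IPv6Address(address).exploded.replace(":", "")
    return ".".join(reversed(nibbles)) + ".ip6.arpa"


if __name__ == "__main__":
    addr = "2001:db8::1"
    print(ip6_arpa_name(addr))
    # The standard library computes the same name (Python 3.5+).
    print(ipaddress.IPv6Address(addr).reverse_pointer)
```

Typing those 34-label names by hand is error-prone, which is part of why a broker that manages the ip6.arpa zone for you is such a nice bonus.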
The other interesting thing to note is that if you stick to the default MAC-based addressing scheme, you don’t really need DHCP anymore to hand out addresses the way we do in IPv4.
The final step was simply reconfiguring ssh and Apache to listen on the new IPv6 addresses (or not, if you don’t want those services visible).
WORD OF CAUTION: That warm fuzzy feeling you have sitting behind a NAT’ed router that doesn’t forward incoming connections by default? That is gone once you set this up. It is still “hard” for somebody to portscan your network, since the IPv6 address space is so large, but by creating DNS entries, you are giving the bad guys a starting point.
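To put “hard” in perspective, some back-of-the-envelope arithmetic in Python: a single standard /64 subnet has 2^64 possible host addresses, versus 2^32 for the entire IPv4 Internet. The one-million-probes-per-second scan rate is an assumed figure for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def scan_time_seconds(host_bits: int, probes_per_second: float) -> float:
    """Time to probe every address in a space with the given number of host bits."""
    return 2 ** host_bits / probes_per_second


if __name__ == "__main__":
    rate = 1_000_000  # assumed probes per second
    ipv4_minutes = scan_time_seconds(32, rate) / 60
    ipv6_years = scan_time_seconds(64, rate) / SECONDS_PER_YEAR
    print(f"entire IPv4 space: {ipv4_minutes:.1f} minutes")  # about 71.6 minutes
    print(f"one IPv6 /64:      {ipv6_years:,.0f} years")     # about 584,942 years
```

Brute force is hopeless at that scale, which is exactly why the DNS entries matter: they hand a scanner the handful of addresses worth trying.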
So we are back to “why would I do all this?” Well, other than for the exercise (or bragging rights), you probably shouldn’t. That is, if you are Joe Average.
If you run a website or service, your computer is already publicly addressable. Getting ready for IPv6 is simply a matter of creating AAAA records in DNS, changing some listen/bind addresses on your services, and voilà! The convention seems to be to create a DNS entry like ‘ipv6.mycooldomain.com’ that has only a AAAA record (and no A record), so that it only resolves in IPv6-land for testing. That is, of course, after you get an IPv6 address from your service provider. Perhaps you already have one? Check your network settings (ifconfig) and look for an inet6 address that isn’t the link-local ‘fe80::’ one. If there is a public IPv6 address, you just have to create the AAAA records in DNS and you are done.
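If you would rather not eyeball the ifconfig output, here is a rough Python sketch that pulls out the inet6 addresses and keeps only the globally routable ones, discarding the fe80:: link-local and ::1 loopback entries. The regular expression is an assumption that covers the common macOS and Linux ifconfig formats; other systems print things differently, and the sample addresses are made up.

```python
import ipaddress
import re

# Matches "inet6 fe80::1%lo0 ..." (macOS) and "inet6 addr: fe80::1/64 ..." (older Linux).
INET6_RE = re.compile(r"\binet6\s+(?:addr:\s*)?([0-9A-Fa-f:.]+)")


def public_ipv6_addresses(ifconfig_output: str) -> list:
    """Return the globally routable IPv6 addresses found in ifconfig output."""
    found = []
    for match in INET6_RE.finditer(ifconfig_output):
        addr = ipaddress.IPv6Address(match.group(1))  # the %zone suffix is not captured
        if addr.is_global:  # excludes link-local, loopback, unique-local, etc.
            found.append(addr)
    return found


if __name__ == "__main__":
    sample = (
        "en0: flags=8863<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500\n"
        "\tinet6 fe80::21b:63ff:fe84:45e6%en0 prefixlen 64 scopeid 0x4\n"
        "\tinet6 2001:470:1f11:e1::2 prefixlen 64 autoconf\n"
    )
    print(public_ipv6_addresses(sample))
```

An empty list means you only have link-local addresses and are still waiting on your provider (or a tunnel).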
So what is somebody like Google going to do on June 8th?
% host -t aaaa ipv6.google.com
ipv6.google.com is an alias for ipv6.l.google.com.
ipv6.l.google.com has IPv6 address 2001:4860:800f::93
% host www.google.com
www.google.com is an alias for www.l.google.com.
www.l.google.com has address 22.214.171.124
www.l.google.com has address 126.96.36.199
www.l.google.com has address 188.8.131.52
www.l.google.com has address 184.108.40.206
On June 8th, that second command should return IPv6 addresses as well, similar to the output of the first command. That’s about it. I can hardly contain my excitement ;)
One month last summer, I started a skunkworks project (among other things) to enhance the Leon Levy Expedition to Ashkelon’s use of technology by using an iPad as a data terminal. For several years now, the dig has done away with old-style data entry, with all data going back to a database at the University of Chicago. You can’t imagine the amount of detail they collect and try to correlate. Computers are very much a necessity as the volume of data grows.
Then my wife calls me. Looks like these guys got done first. So much for some nice Apple press.
Darn my regular paying job for always coming first! Well, here is a screenshot from the prototype I am working on.
Last night I gave a talk at the Chicago Hadoop Users Group on using Hadoop to help Orbitz collect and search large volumes of application logs.
Slides are posted if you are interested.
I think there is some useful information in there, including:
- High Availability Name Node/Job Tracker Configuration
- Mostly “real time” log collection into HDFS (current and future direction)
- Creating an interactive “grep” webapp from batch oriented Map/Reduce
- Lots of code & configuration details
As a developer, when you want one machine to talk with another (and you want your packets delivered reliably and in order), you open a TCP connection. Ah, TCP, that tried-and-true communication mechanism that probably drives most of the Internet today.
While speeds have come a long way, packet loss and latency are two very real factors that can seriously eat into your TCP throughput.
Packet loss, you say? This is 2010! Well, it turns out that overloaded switches drop packets when their queues fill up, and that dropping is effectively a form of congestion control. So yes, there is still packet loss in 2010.
Latency from my house is usually in the 50-100 ms range. If you are going overseas, you are looking at 100 ms or more…
I’m not a networking expert, but I know enough to understand this picture:
When developing programs and services that are contained in one data center, you are often dealing with near-zero latency and packet loss. Once you are talking about mobile or multiple data centers, you are living in the short parts of that graph.
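To put rough numbers on this, here is a sketch using two standard rules of thumb rather than anything read off the picture itself: the bandwidth-delay product (how many bytes must be in flight to fill a link) and the Mathis et al. approximation for a single TCP stream's throughput under steady random loss, roughly (MSS / RTT) * (1.22 / sqrt(p)). The MSS, RTT, and loss values are illustrative assumptions.

```python
import math


def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    return bandwidth_bps * rtt_s / 8


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate ceiling for one TCP stream (Mathis et al.): MSS/RTT * C/sqrt(p)."""
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be between 0 and 1")
    c = math.sqrt(3 / 2)  # ~1.22, the constant from the simple periodic-loss model
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    print(f"BDP, 1 Gbit/s at 100 ms: {bandwidth_delay_product_bytes(1e9, 0.100) / 1e6:.1f} MB")
    for rtt in (0.001, 0.050, 0.100):
        mbps = mathis_throughput_bps(1460, rtt, 0.001) / 1e6
        print(f"RTT {rtt * 1000:>5.0f} ms, 0.1% loss: ~{mbps:,.1f} Mbit/s")
```

With the same 0.1% loss, the model gives roughly 450 Mbit/s at 1 ms, 9 Mbit/s at 50 ms, and 4.5 Mbit/s at 100 ms: throughput falls in direct proportion to RTT, which is why the same loss rate that is harmless inside a data center hurts so much across an ocean.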
There are tricks people try, like splitting a transfer across multiple streams, so that a loss doesn’t shrink the overall TCP window as much. This is how programs like GridFTP are able to make maximum use of the network. The theory is summed up nicely in this picture:
Each smaller stream has its own sawtooth, and a loss only cuts the window of the stream it hits, so the overall variation is less than with all your eggs in one basket. It is an interesting idea, and it will be available to us Java programmers once SCTP (which supports multiple streams per association) is baked into Java 7. But this assumes that your problem CAN be broken into multiple streams.
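The arithmetic behind that picture is simple under the usual AIMD assumption that a loss halves only the window of the stream it hits. A small Python sketch, assuming N equal streams sharing the same total window:

```python
def aggregate_after_one_loss(total_window: float, streams: int) -> float:
    """Aggregate window right after a single loss that halves just one of N equal streams."""
    if streams < 1:
        raise ValueError("need at least one stream")
    per_stream = total_window / streams
    return total_window - per_stream / 2


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        remaining = aggregate_after_one_loss(100.0, n)
        print(f"{n} stream(s): {remaining:.2f}% of the window survives one loss")
```

One stream drops to 50% of its window on a loss, while eight streams only dip to 93.75%, so the aggregate sawtooth is much shallower.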
Many have tried building their own solutions on UDP, but that really just moves flow control and the packet delivery guarantee into the application layer. Is this a bad thing? Well, it is if I have to write it. After all, I wouldn’t presume to think that I could do a better job than people who spend all their time researching this stuff.
Which brings me to a couple of years ago, when I became aware of UDT, a project run by a researcher at UIC. Among other things, it is an attempt at an alternative flow control scheme, designed with congestion avoidance in mind. The paper is interesting and can be found here.
So the idea is to take this:
And turn it into this:
Professor Gu has a great PowerPoint presentation here (the source of these pictures) that shows the highlights, if the original paper is too dry a read.
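To make the contrast with TCP's fixed-step sawtooth concrete, here is a toy rate-based controller in Python. It is not UDT's actual algorithm (the paper has the real details); it only illustrates the general idea of sizing each increase by the estimated spare capacity instead of adding a fixed amount, so the rate climbs quickly when the link is mostly idle and creeps as it nears capacity, while a loss trims the rate by a fixed fraction. The known capacity, gain, starting rate, and 1/9 decrease are all illustrative assumptions.

```python
def toy_rate_control(capacity: float, steps: int, loss_steps: set,
                     gain: float = 0.1, decrease: float = 1 / 9,
                     start: float = 0.0) -> list:
    """Simulate a toy rate controller; return the sending rate after each step.

    Hypothetical model: capacity is assumed known (real protocols must estimate it),
    gain scales each increase by the spare capacity, and decrease is the fractional
    cut applied on a loss event.
    """
    rate = start
    history = []
    for step in range(steps):
        if step in loss_steps:
            rate *= 1 - decrease  # multiplicative decrease on loss
        else:
            spare = max(capacity - rate, 0.0)
            rate += gain * spare  # bigger steps when far from capacity
        history.append(rate)
    return history


if __name__ == "__main__":
    rates = toy_rate_control(capacity=1000.0, steps=40, loss_steps={20})
    for step in (0, 10, 19, 20, 30, 39):
        print(f"step {step:2d}: {rates[step]:7.1f} Mbit/s")
```

The interesting design choice is that the step size depends on how far the sender is from the (estimated) link capacity, which is what lets a rate-based scheme fill a fat, long pipe without waiting through TCP's one-segment-per-RTT climb.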
The implementation, found at http://udt.sourceforge.net/, is written in C++, so it wasn’t a simple plug-in replacement for TCP in a Java application. There were some earlier attempts at JNI wrappers, but we all know how that goes in practice. At the time, our project didn’t have the bandwidth to write a native Java implementation, so using UDT went on the shelf…
Looking recently, it seems that somebody else thought a drop-in native Java implementation was a good idea and has made the code available here.
Now I’m not suggesting that UDT should replace TCP for general use, but for some applications (large data transfers on high-speed, high-latency networks) it looks like a nice alternative…
Might be worth a look again…
Well, I finally broke down and created a personal blog. While I’ll sometimes post useless information or observations to the Group Inanity Blog or to my Twitter feed, it seems that I needed a place to put things of a less inane nature.
So 14 years after signing up for the goofy.net domain (so I could have a vanity/constant email address) there is finally some content here that doesn’t look like a parked domain.
Lord help you all…
Luckily, the Internet is now so full of noise, this will most likely go largely unnoticed.