<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>sidenotes &#187; tech</title>
	<atom:link href="http://chetnichols.org/category/tech/feed/" rel="self" type="application/rss+xml" />
	<link>http://chetnichols.org</link>
	<description>and other random technical bantering, by chet nichols</description>
	<lastBuildDate>Sat, 24 Jul 2010 19:47:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>writing a custom Netscaler health check for NTP</title>
		<link>http://chetnichols.org/2010/04/11/writing-a-custom-netscaler-health-check-for-ntp/</link>
		<comments>http://chetnichols.org/2010/04/11/writing-a-custom-netscaler-health-check-for-ntp/#comments</comments>
		<pubDate>Sun, 11 Apr 2010 09:12:48 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[netscaler]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/?p=18</guid>
		<description><![CDATA[There are lots of load balancing products on the market today. Some are good, some are bad. Opinions aside, one such product is the Citrix Netscaler. Now, the Netscaler has built in health checking for many different types of backend service protocols: HTTP, FTP, DNS, and more. However, one protocol not included is NTP. If [...]]]></description>
			<content:encoded><![CDATA[<p>There are lots of load balancing products on the market today. Some are good, some are bad. Opinions aside, one such product is the Citrix Netscaler.</p>
<p>Now, the Netscaler has built-in health checking for many different types of backend service protocols: HTTP, FTP, DNS, and more. However, one protocol not included is NTP. If you happen to be load balancing a farm of NTP servers, and want to ensure you&#8217;re not routing to a hosed NTP daemon (even if it&#8217;s ping-able), it would be nice to be able to health check the service [and take it out of rotation if it's not responding].</p>
<p>Luckily, the Netscaler provides (with documentation) a very flexible way to write your own custom health checks. Using the system, you can test whatever you want, in any way you want, with just a little Perl. Even better, the custom health check subsystem provides a few useful things to simplify the whole process, including:</p>
<ul>
<li>passing in the service IP and port as arguments to your script</li>
<li>passing in a custom argument string (key/value pairs, user/pass info, whatever)</li>
<li>handling the timeout for you from a higher level; if your check doesn&#8217;t respond within -respTimeout seconds, it will be considered a failure.</li>
</ul>
<p>Now, for just the basics, this monitor will actually be pretty easy to put together. We are given the IP/port to send an NTP request to, and we don&#8217;t need to worry about a timeout, because the underlying subsystem will handle that for us. All we really need to do is send a request and [hope] for a response!</p>
<p>That being said, we&#8217;ll need to do a couple things to get to that point. These things are:</p>
<ol>
<li>Load the health check subsystem module.</li>
<li>Load the IO::Socket module and create a socket to send the request on [using the parameters passed in as args].</li>
</ol>
<p>From that point, we can build and send our request. Since we don&#8217;t need to worry about handling a timeout, we can just sit there waiting for a response. First up:</p>
<pre>
use IO::Socket;
use Netscaler::KAS;
</pre>
<p>We&#8217;ll need both of these modules &#8211; one will tap into the health check subsystem, and the other is so we can send our UDP-based request.</p>
<p>Next is our function, which we&#8217;ll just call <i>ntp_probe</i>. It reads the service IP from <i>$_[0]</i> and the service port from <i>$_[1]</i>; both arguments are passed in by probe() when it calls our function.</p>
<p>So, the magic-fu to this is that, later on, we will pass our <i>ntp_probe</i> function as a coderef to the KAS probe() command. That will ultimately result in our function being called with the arguments passed in; it is similar to this:</p>
<pre>
$custom_function->($host,$port,$args);
</pre>
<p>If you look at the KAS code (and since it&#8217;s perl, you can), you can see the exact line; this just gives you an idea of what&#8217;s going on.</p>
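<p>If you want to exercise a probe function before copying it over to the Netscaler, a tiny stand-in for probe() can call it the same way. To be clear, this is a hypothetical harness (<i>fake_probe</i> is a made-up name), not the real Netscaler::KAS code: it only assumes the calling convention shown above and the return convention this post uses, where 0 means the service is up and a non-zero first value, optionally followed by a message, means it is down.</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Netscaler::KAS's probe(); NOT the real module.
# It calls the check the way the post describes:
#   $custom_function->($host, $port, $args)
sub fake_probe {
    my ($custom_function, $host, $port, $args) = @_;
    my ($rc, $msg) = $custom_function->($host, $port, $args);
    # 0 = healthy; anything else = failed, with an optional reason
    return $rc == 0 ? "UP" : "DOWN" . (defined $msg ? " ($msg)" : "");
}

# Two trivial checks to exercise the harness
my $always_up  = sub { return 0 };
my $needs_host = sub {
    my ($host, $port) = @_;
    return (1, "Host or port not specified.") unless $host and $port;
    return 0;
};

print fake_probe($always_up,  "192.0.2.10", 123, ""), "\n";
print fake_probe($needs_host, "",           123, ""), "\n";
</pre>
<p>Running it prints UP for the first check and &#8220;DOWN (Host or port not specified.)&#8221; for the second.</p>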
<p>In any case, after that, we&#8217;ll create a new UDP socket, send the NTP request message down it, and either:</p>
<ul>
<li>return 1 if something wasn&#8217;t defined (ip, port, or sock) &#8211; failure!</li>
<li>return 0 if we got a response &#8211; success!</li>
</ul>
<p>Again, we don&#8217;t need to worry about timeouts, since the subsystem will handle that for us. So, here&#8217;s the function in its entirety:</p>
<pre>
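sub ntp_probe {

    my $host    = $_[0];
    my $port    = $_[1];

    # 48-byte NTP request: first byte 0x1b = LI 0, version 3, mode 3 (client)
    my $req     = "\x1b" . "\0" x 47;

    if(!$host || !$port) {
        return(1,"Host or port not specified.");
    }

    my $sock = IO::Socket::INET->new(
        Proto     => "udp",
        PeerAddr  => $host,
        PeerPort  => $port,
    );

    return(1,"Could not create socket") unless $sock;

    # No timeout handling needed: KAS enforces -respTimeout for us.
    # recv() returns undef on error (e.g. ICMP port unreachable when the
    # NTP daemon is down), so only an actual reply counts as success.
    my $resp;
    if(!defined $sock->send($req) || !defined $sock->recv($resp, 48) || !length $resp) {
        return(1,"No response from NTP server");
    }

    return 0;

}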
</pre>
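<p>About that request: all 48 bytes are zero except the first, which packs three NTP header fields into one byte: a 2-bit leap indicator, a 3-bit version number, and a 3-bit mode. Decoded that way, <i>"\010"</i> is version 1, mode 0 (an old NTPv1-style request that many servers still answer), while 0x1b is version 3, mode 3, a regular client request. If &#8220;any response at all&#8221; feels too lenient, the reply can be checked too: a server answers in mode 4, and a synchronized server reports a stratum between 1 and 15 (RFC 5905 uses 0 for kiss-o&#8217;-death replies and 16 for unsynchronized). The helper names below are hypothetical, and whether a stricter check is worth it for your NTP farm is a judgment call.</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

# Split the first NTP header byte into (leap indicator, version, mode)
sub decode_first_byte {
    my ($byte) = @_;
    return (($byte >> 6) % 4, ($byte >> 3) % 8, $byte % 8);
}

# LI 0, version 3, mode 3 (client): 0*64 + 3*8 + 3 = 27 = 0x1b
my $client_request = chr(0 * 64 + 3 * 8 + 3) . "\0" x 47;

# Hypothetical stricter check for a reply read with recv():
# a full 48-byte header, mode 4 (server), stratum 1..15
sub looks_like_ntp_reply {
    my ($reply) = @_;
    return 0 unless defined $reply and length($reply) >= 48;
    my ($first, $stratum) = unpack('C C', $reply);
    my (undef, undef, $mode) = decode_first_byte($first);
    return 0 if $stratum == 0 or $stratum > 15;
    return $mode == 4 ? 1 : 0;
}

printf "0x1b:  LI=%d VN=%d mode=%d\n", decode_first_byte(ord $client_request);
printf "\\010:  LI=%d VN=%d mode=%d\n", decode_first_byte(8);

# Synthetic reply header: 0x24 = LI 0, version 4, mode 4; stratum 2
my $fake_reply = pack('C C', 0x24, 2) . "\0" x 46;
print looks_like_ntp_reply($fake_reply) ? "reply ok\n" : "reply rejected\n";
</pre>
<p>Inside ntp_probe, a reply that fails looks_like_ntp_reply() would simply take the return(1, ...) failure path.</p>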
<p>That wasn&#8217;t so bad, was it? Now that we have loaded our modules and defined a function, the next step is to call the probe() method with a coderef to our new function, to tell KAS what function to use for probing:</p>
<pre>
probe(\&#038;ntp_probe);
</pre>
<p>And you&#8217;re off to the races! Just don&#8217;t forget to start your script with <i>#!/usr/bin/perl</i>, and it should be good to go <img src='http://chetnichols.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Now, you might be asking &#8220;how do I actually configure the Netscaler to use this?&#8221; That&#8217;s also just as easy. Let&#8217;s pretend our new script is called <i>customNTP.pl</i>. Just plop it into the <b>/nsconfig/monitors/</b> directory, and do something like this from the CLI:</p>
<pre>
add lb monitor "custom-ntp-mon" USER -scriptName customNTP.pl
   -scriptArgs 1 -dispatcherIP 127.0.0.1 -dispatcherPort 3013
   -interval 20 -respTimeout 3
</pre>
<p>With that line, you will now have a working NTP health check (20 second intervals, 3 second timeout). Just bind it to the service(s) that need it and enjoy! Note that the type of monitor is <b>USER</b>; this is the type used for custom health checks.</p>
<p>Well, that&#8217;s all for this evening. There are bits and pieces of info like this around the web, on the Citrix AppExpert site, etc. And, even though it wasn&#8217;t too difficult to figure out, hopefully this will help someone out some day.</p>
<p>Take care!</p>
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2010/04/11/writing-a-custom-netscaler-health-check-for-ntp/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>apache mod_status: the misleading ReqPerSec metric</title>
		<link>http://chetnichols.org/2010/04/03/apache-mod_status-the-misleading-reqpersec-metric/</link>
		<comments>http://chetnichols.org/2010/04/03/apache-mod_status-the-misleading-reqpersec-metric/#comments</comments>
		<pubDate>Sat, 03 Apr 2010 18:18:40 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/?p=77</guid>
		<description><![CDATA[To be honest, I had never thought twice about it. At my past employer, we had implemented tracking request rate for the Apache hosts by parsing out Total Accesses (ie: Total Requests) from server-status (thanks to our friendly module mod_status), publish that every minute, and let our metrics collection subsystem do a delta between the [...]]]></description>
			<content:encoded><![CDATA[<p>To be honest, I had never thought twice about it. At my past employer, we had implemented tracking request rate for the Apache hosts by parsing out Total Accesses (ie: Total Requests) from server-status (thanks to our friendly module mod_status), publish that every minute, and let our metrics collection subsystem do a delta between the current and previous to determine a request rate to publish.</p>
<p>At my new employer, I put something together where the collection subsystem won&#8217;t refer to the previous total for deltas: it just takes whatever you give it and relays that to the metrics backend. Most applications provide you with a request/sec metric these days, so I figured I&#8217;d just use that. Instead of doing deltas on TotalAccesses, I started just relaying ReqPerSec instead. Simple enough, right? For those of you not familiar, here&#8217;s an example of server-status output:</p>
<pre>
Total Accesses: 246900
Total kBytes: 246900
CPULoad: 5
Uptime: 12345
ReqPerSec: 20
BytesPerSec: 20480
BytesPerReq: 1024
BusyWorkers: 15
IdleWorkers: 5
</pre>
<p>Well, amidst my metrics collection testing for the new system, I stumbled upon an interesting detail. I was nailing the Apache instance with a flood of requests using my friend ab (aka apachebench), and I immediately noticed ReqPerSec wasn&#8217;t accurate at all. For example, my test would average 300-400 req/s, but ReqPerSec would display 21 req/s. It was as if I wasn&#8217;t sending it any requests.</p>
<p>Meanwhile, the Total Accesses counter was reflecting the requests I was sending, so I knew the web server was receiving them. Interesting.</p>
<p>It got me thinking: &#8220;I wonder how Apache determines ReqPerSec?&#8221; Time for some source code action! Sure enough, here&#8217;s what I found:</p>
<pre language="c">
ap_rprintf(r, "ReqPerSec: %g\n",
    (float) count / (float) up_time);
</pre>
<p>Interesting! All Apache does to calculate ReqPerSec is take the total number of requests throughout the life of the parent (<b>count</b>) and divide that by the total uptime in seconds (<b>up_time</b>). So, really, ReqPerSec is just an *average* requests per second over the parent&#8217;s lifetime. That&#8217;s NOT what we want. Maybe it&#8217;s useful in some scenarios, but when you need a live metric to let you know what your server is ACTUALLY doing, this isn&#8217;t the one.</p>
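<p>A bit of arithmetic shows why my ab run barely registered. Start from the sample output above (12345 seconds of uptime at 20 req/s), then add a hypothetical 60-second burst at 350 req/s, roughly what I was pushing. The lifetime average below is the same count / up_time division mod_status does; the burst numbers are made up for illustration.</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

# mod_status-style ReqPerSec: total requests over the parent's lifetime
sub lifetime_rate {
    my ($count, $up_time) = @_;
    return $up_time ? $count / $up_time : 0;
}

# Starting point: 12345 s of uptime at an average of 20 req/s
my ($count, $up_time) = (20 * 12345, 12345);

# Hypothetical ab burst: 350 req/s for 60 seconds
my ($burst_rate, $burst_secs) = (350, 60);
my $count_after  = $count + $burst_rate * $burst_secs;
my $uptime_after = $up_time + $burst_secs;

printf "before burst: %.2f req/s\n", lifetime_rate($count, $up_time);
printf "after burst:  %.2f req/s\n", lifetime_rate($count_after, $uptime_after);
# The rate during the burst itself is a delta over that window
printf "during burst: %.2f req/s\n",
    ($count_after - $count) / ($uptime_after - $up_time);
</pre>
<p>The lifetime figure only creeps from 20 to about 21.6 req/s, while the delta over the burst window recovers the full 350 req/s.</p>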
<p>If you want an accurate request rate, <b>your best bet is going to be to collect Total Accesses and do deltas off of the previous collection</b>. I&#8217;m glad that&#8217;s how we did it before!</p>
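<p>For reference, the delta approach is only a few lines. This is a sketch, assuming you fetch the machine-readable variant of the status page (mod_status serves it at <i>server-status?auto</i>) on your own schedule; the fetch itself is left out, and the function names are illustrative. Using Uptime for the interval keeps the rate tied to Apache&#8217;s own clock, and a negative delta (a restart can reset the counters) yields undef instead of a bogus rate.</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

# Parse "Key: value" lines from server-status?auto into a hash ref
sub parse_status {
    my ($text) = @_;
    my %stat;
    for my $line (split /\n/, $text) {
        $stat{$1} = $2 if $line =~ /^([^:]+):\s*(\S+)/;
    }
    return \%stat;
}

# Request rate between two samples, from Total Accesses and Uptime deltas
sub delta_rate {
    my ($prev, $cur) = @_;
    my $requests = $cur->{'Total Accesses'} - $prev->{'Total Accesses'};
    my $seconds  = $cur->{'Uptime'} - $prev->{'Uptime'};
    # Counters went backwards or no time passed: no usable rate (undef)
    return unless $seconds > 0 and $requests >= 0;
    return $requests / $seconds;
}

# Two hypothetical samples taken 60 seconds apart
my $first  = parse_status("Total Accesses: 246900\nUptime: 12345\nReqPerSec: 20\n");
my $second = parse_status("Total Accesses: 267900\nUptime: 12405\nReqPerSec: 21.5961\n");

printf "delta rate: %.1f req/s (ReqPerSec says %s)\n",
    delta_rate($first, $second), $second->{'ReqPerSec'};
</pre>
<p>With those two samples it reports 350.0 req/s, where ReqPerSec still claims 21.5961.</p>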
<p>To a point, this comes down to the cost of metrics. For Apache, Total Accesses is just a simple counter. It returns an accurate value live, on the fly, and makes it very easy for the sysadmin to collect and process. Very little work is done on its end. Even the current implementation of ReqPerSec has very little cost: it just does some simple division on demand, using counters it already tracks anyway.</p>
<p>To keep an actual recent request rate, more resources would be involved. It would need to set a timer (ie: every 60 seconds), have a callback for the timer (to actually handle the calculation), and store counters in memory to calculate deltas. In reality, though, this would have a very minimal impact on server performance and system load (a few extra CPU cycles every minute, storing two little 64-bit counters, etc).</p>
<p><b>So what did I end up doing? I hacked up my own mod_status using shared memory!</b></p>
<p>Using the existing mod_status code, I pulled some code from a shared memory module example and smashed them together to calculate an accurate ReqPerSec metric on the fly, returned when you request server-status.</p>
<p><b>How does it work?</b></p>
<p>When you request server-status, it will go into shared memory, pull out the total accesses and uptime from the previous server-status request, do deltas on the current total accesses and uptime, then return the rate. It will then put the current total accesses and uptime back into shared memory for the next poll (<b>base</b> is our shared memory baseline).</p>
<pre>
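/* base holds the previous poll's snapshot (shared memory);
 * guard against two polls landing in the same second */
float elapsed = (float)up_time - (float)base->prev_uptime;
ap_rprintf(r, "ReqPerSec: %g\n", elapsed > 0 ?
    ( (float)count - (float)base->prev_req_count ) / elapsed : 0.0);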
</pre>
<p>There is of course all of the shared memory code, but the end result is the calculation above. We poll every 60 seconds, so we incur a small cost every minute, but, like I mentioned before, it&#8217;s not noticeable, and it gives us the view we need into the environment.</p>
<p><i>Some of you may also wonder what happens to the first request, since there isn&#8217;t any data in our shared memory space yet. For now, it will be inaccurate, but only for the first request in the lifetime of the parent, which I can deal with until I make it better</i>.</p>
<p>Well, that&#8217;s all for this one. It was an interesting discovery in the world of Apache metrics collection, and an interesting adventure to hack together the shared memory module. I&#8217;m sure there are other scenarios out there like this one; if you come across one, feel free to drop a comment!</p>
<p>I&#8217;m off to the Apple Store now; it&#8217;s iPad launch day <img src='http://chetnichols.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2010/04/03/apache-mod_status-the-misleading-reqpersec-metric/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>use perl: pack() to send floats between intel/sparc with proper endian-ness.</title>
		<link>http://chetnichols.org/2010/02/22/use-perl-pack-to-send-floats-between-intelsparc-with-proper-endian-ness/</link>
		<comments>http://chetnichols.org/2010/02/22/use-perl-pack-to-send-floats-between-intelsparc-with-proper-endian-ness/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:13:39 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/?p=78</guid>
		<description><![CDATA[So I like using Perl, if anyone hasn&#8217;t realized yet. Python seems awesome and all, but I haven&#8217;t had the time to start writing things in Python. I&#8217;m still at the stage when, if a tool needs to be put together, it needs to be done ASAP; I can&#8217;t delay trying to figure out Python [...]]]></description>
			<content:encoded><![CDATA[<p>So I like using Perl, if anyone hasn&#8217;t realized yet. Python seems awesome and all, but I haven&#8217;t had the time to start writing things in Python. I&#8217;m still at the stage when, if a tool needs to be put together, it needs to be done ASAP; I can&#8217;t delay trying to figure out Python particulars. </p>
<p>In any case, one of our metrics publishing systems is wrapped by a few different object classes I put together in Perl (follows a certain protocol created by another group, etc). Of the many data types supported by our metrics receivers, we can do things like 32-bit unsigned integers, single precision (32-bit) floating points, and a few others.</p>
<p>Now, when I would publish the 32-bit uints, it would work great, no issues. The receivers expect a byte stream, so I would use pack() to send my values as binary. For example:</p>
<pre code="Perl">
$val=29;
print $sock pack('N',$val);
</pre>
<p>Easy, whatever. However, when I would send values as a single precision float, it would totally be hosed on the other end. For example:</p>
<pre code="Perl">
$val="30.0";
print $sock pack('f',$val);
</pre>
<p>On the other end, they would get some massive negative number. I don&#8217;t deal with this much; things usually just work. Thankfully, my mind was jogged in regard to byte order: pack() has fixed-order templates for integers (<i>n</i> and <i>N</i> are always big-endian, <i>v</i> and <i>V</i> always little-endian), which is why my <i>N</i> values arrived intact, but a plain <i>f</i> packs a float in the host&#8217;s native byte order, and one architecture may use big-endian while another uses little-endian.</p>
<p>In my searches, I found that Intel stores floating-point values in little-endian byte order, and Sparc stores them in big-endian byte order. Sure enough, I was trying to publish from an Intel-based BSD host to a Sparc-based Solaris host. Aha!</p>
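<p>To see that asymmetry on your own box, compare what <i>N</i> and <i>f</i> produce. <i>N</i> is fixed big-endian (the same bytes as <i>L&gt;</i>), while <i>f</i> follows the host; the sketch detects the host&#8217;s order by comparing a native 32-bit integer (<i>L</i>) with a little-endian one (<i>V</i>). 30.0 as a single-precision float is 0x41F00000, so expect 0000f041 on an Intel box and 41f00000 on a Sparc.</p>
<pre code="Perl">
#!/usr/bin/perl
use strict;
use warnings;

# 'N' is always big-endian ("network order"), whatever the host
my $int_bytes = unpack('H*', pack('N', 29));

# 'f' uses the host's native order; detect that order with a 32-bit int
my $host_is_little = pack('L', 1) eq pack('V', 1);
my $native_float   = unpack('H*', pack('f', 30.0));

print "pack('N', 29)   = $int_bytes on every host\n";
printf "pack('f', 30.0) = %s on this %s-endian host\n",
    $native_float, $host_is_little ? 'little' : 'big';
</pre>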
<p>Luckily, pack() can *also* handle this for us! With a simple <i>&gt;</i> modifier (available since Perl 5.10) added to the <i>f</i> template, I can pack the value in big-endian byte order and resolve our issue. I do that like so:</p>
<pre code="Perl">
$val="30.0";
print $sock pack('f>',$val);
</pre>
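<p>If the receiving side were Perl too, it would decode with the matching template, so neither host&#8217;s native order matters any more. A minimal round trip, with no socket involved, just the four bytes that would go over the wire:</p>
<pre code="Perl">
#!/usr/bin/perl
use strict;
use warnings;

my $val = 30.0;

# Sender: single-precision float, forced big-endian
my $wire = pack('f>', $val);

# Receiver: decode with the matching template, on any architecture
my ($decoded) = unpack('f>', $wire);

printf "wire bytes: %s, decoded: %g\n", unpack('H*', $wire), $decoded;
</pre>
<p>This prints the wire bytes as 41f00000 and decodes them back to 30. One caveat: <i>f</i> is single precision, so a value like 0.1 that is not exactly representable in 32 bits comes back slightly different from the double you started with.</p>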
<p>And, ta da! Worked like a charm. Was it difficult to get it working? No. But, it was fun to have to deal with something like this. On any given day, you don&#8217;t really worry about endian-ness; things just work. This was one of those days where I actually had to use the old noggin, and that was refreshing.</p>
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2010/02/22/use-perl-pack-to-send-floats-between-intelsparc-with-proper-endian-ness/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>calculating dequeue time to tune load balancer queue handling</title>
		<link>http://chetnichols.org/2009/12/26/calculating-dequeue-time-to-tune-load-balancer-queue-handling/</link>
		<comments>http://chetnichols.org/2009/12/26/calculating-dequeue-time-to-tune-load-balancer-queue-handling/#comments</comments>
		<pubDate>Sat, 26 Dec 2009 11:14:03 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[scalability]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/?p=73</guid>
		<description><![CDATA[Maybe there&#8217;s already some obvious answer to this that I just ignored sometime in years past, but at some point, you may use a load balancer that will allow you to queue requests up in the event the backend apps are running slower than normal. Instead of killing the apps, the load balancer sets requests [...]]]></description>
			<content:encoded><![CDATA[<p>Maybe there&#8217;s already some obvious answer to this that I just ignored sometime in years past, but at some point, you may use a load balancer that will allow you to queue requests up in the event the backend apps are running slower than normal. Instead of killing the apps, the load balancer sets requests aside until the app is ready for more, then it sends it off.</p>
<p>However, what happens if the app is <strong>REALLY</strong> in trouble? Your connection queue could very easily grow into thousands of piled up requests all waiting to be serviced. At some point, you&#8217;re going to have to start dropping requests from the queue: it will get so large, that even after the app comes back, you may never recover from the pile of requests (or if you do, it might be 60 seconds worth, at which point your user will probably have given up anyway).</p>
<p>So, here we go.</p>
<p><strong>First</strong>: how many requests per second (max_requests) can your application slice handle? In this case, your application slice will be however many services are being load balanced. Knowing max_requests tells us how much headroom is available at a given slice utilization; we will use that headroom to determine how many requests over and above the current utilization are available to help dequeue.</p>
<p><strong>Second</strong>: What is our average utilization? If on a given day we are 60% utilized, then we should in theory have 40% capacity available to help dequeue in the event of a surge/backup/whatever. That is, once we begin to dequeue, we will actually be running at a full 100% until the queue drains, then we&#8217;ll be back at the usual 60%.</p>
<p><strong>Third</strong>: How long were we backing up for, ie: what is the worst-case scenario we want to prepare for? If underlying apps rely on a database, and the database locks up for 10 seconds, then that will be 10 seconds of queueing we will get to enjoy. If we can&#8217;t afford 10 seconds, then what is the most we&#8217;re willing to deal with?</p>
<p>Equation time! Here are the three variables we have to deal with:</p>
<p><strong>max_requests</strong> = number of requests/sec that our load balanced slice can handle (ie: 400 .. as in 400 req/s)</p>
<p><strong>utilization</strong> = percentage of our average utilization that we want to plan around (ie: 0.6 .. as in 60% avg utilization)</p>
<p><strong>queue_time</strong> = number of seconds we were queueing for (ie: 10 .. in which the db locked up)</p>
<p>And here are some things we can derive:</p>
<p><strong>current_requests</strong> = ( max_requests * utilization ) <em>&lt;--- ie: req/s utilized</em></p>
<p><strong>available_requests</strong> = ( max_requests * ( 1 - utilization ) ) <em>&lt;--- req/s available ... this is also what you should consider headroom</em></p>
<p><strong>queue_size</strong> = ( current_requests * queue_time )</p>
<p>At this point, it&#8217;s simple:</p>
<p><strong>time_to_dequeue</strong> = ( queue_size / available_requests )</p>
<p>Basically, all we&#8217;re saying is, take the number of current requests/sec we&#8217;re handling, and multiply it by the number of seconds we were queueing. This will give us the number of requests that are backed up.</p>
<p>Once our queuing has stopped, and we can begin to dequeue, we will still be handling the usual rate of requests (ie: 60% of our max_requests), but we have that 40% of headroom (ie: available_requests) to handle the queue, so we divide our total queue_size up by the available_requests rate, and that will give us the number of seconds our load balancer will take to dequeue.</p>
<p>Expanded, the equation is this:</p>
<p><strong>time_to_dequeue</strong> = ( ( ( max_requests * utilization ) * queue_time ) / ( max_requests * ( 1 - utilization ) ) )</p>
<p>But wait! The beauty of this is that time_to_dequeue is going to be the same for any given max_requests: it appears in both the numerator and the denominator, so it cancels out. Regardless of our request rate, the time it takes to dequeue will be the same for a given utilization and queue_time.</p>
<p>So, we can simplify, removing anything having to do with request rates, which lets us calculate the dequeue time for any scenario where we know the utilization and how long we were queueing.</p>
<p><strong>time_to_dequeue</strong> = ( ( utilization * queue_time ) / ( 1 - utilization ) )</p>
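<p>As a quick sanity check on that simplification, here is the calculation done both ways in Perl, using the example values above (400 req/s, 60% utilization, 10 seconds of queueing) plus a second, hypothetical slice of 1000 req/s. The function names are just labels for this sketch.</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

# Full form: build the queue from request rates, then drain it with the headroom
sub dequeue_full {
    my ($max_requests, $utilization, $queue_time) = @_;
    my $current_requests   = $max_requests * $utilization;
    my $available_requests = $max_requests * (1 - $utilization);
    my $queue_size         = $current_requests * $queue_time;
    return ($queue_size, $queue_size / $available_requests);
}

# Simplified form: max_requests cancels out
sub dequeue_simple {
    my ($utilization, $queue_time) = @_;
    return ($utilization * $queue_time) / (1 - $utilization);
}

for my $max_requests (400, 1000) {    # 1000 is a hypothetical second slice
    my ($queue_size, $seconds) = dequeue_full($max_requests, 0.6, 10);
    printf "max=%4d req/s: queue=%5.0f requests, dequeue=%.1f s\n",
        $max_requests, $queue_size, $seconds;
}
printf "simplified: dequeue=%.1f s\n", dequeue_simple(0.6, 10);
</pre>
<p>Both slices drain in 15 seconds, matching the simplified formula (0.6 * 10 / 0.4 = 15), but the queue the load balancer has to hold grows from 2400 to 6000 requests; that queue size is the part that still depends on request rates.</p>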
<p>Now, a lot of the time, we need to know how LARGE the queue will be at peak (to configure the load balancer accordingly), and in that case, we will need to know request rates.</p>
<p>Either way, I found this to be an interesting little exercise: it got my mind thinking about math, and having an equation made it much easier to build a table of data and pump it into gnuplot.</p>
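<p>For the gnuplot side, a sketch that prints whitespace-separated columns: utilization first, then time_to_dequeue for a few queue_time values. gnuplot can plot such a file with <i>using 1:2</i>, <i>using 1:3</i> and so on, and it treats the leading # line as a comment. The utilization steps and the 5/10/30 second queue times are arbitrary picks for illustration.</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

# time_to_dequeue = ( utilization * queue_time ) / ( 1 - utilization )
sub time_to_dequeue {
    my ($utilization, $queue_time) = @_;
    return ($utilization * $queue_time) / (1 - $utilization);
}

my @queue_times = (5, 10, 30);    # seconds of queueing to compare

# Header as a gnuplot comment, then one row per 10% utilization step
my @rows = ("# utilization " . join(' ', map { "q=${_}s" } @queue_times));
for my $pct (map { $_ * 10 } 1 .. 9) {
    my $u = $pct / 100;
    push @rows, join(' ', $u, map { sprintf '%.2f', time_to_dequeue($u, $_) } @queue_times);
}
print "$_\n" for @rows;
</pre>
<p>At 60% utilization, the 10-second column reads 15.00 (0.6 * 10 / 0.4); by 90% utilization, even a 5-second stall takes 45 seconds to drain.</p>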
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2009/12/26/calculating-dequeue-time-to-tune-load-balancer-queue-handling/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>adventures in perl: foreach returns pointers to elements</title>
		<link>http://chetnichols.org/2009/12/17/adventures-in-perl-foreach-returns-pointers-to-elements/</link>
		<comments>http://chetnichols.org/2009/12/17/adventures-in-perl-foreach-returns-pointers-to-elements/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 08:58:50 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[perl]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/?p=55</guid>
		<description><![CDATA[I&#8217;m not sure how I&#8217;ve never run into this issue before. In some work I was doing recently, I ran into what I thought was a nasty bug [in my code] but couldn&#8217;t explain it. Without thinking, I started trying to undo this, fix that, hack this, and ignore that. After the meeting I was [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m not sure how I&#8217;ve never run into this issue before. In some work I was doing recently, I ran into what I thought was a nasty bug [in my code] but couldn&#8217;t explain it. Without thinking, I started trying to undo this, fix that, hack this, and ignore that. </p>
<p>After the meeting I was in finished, I was able to sit down and actually put some thought into what was happening. The one thing that stood out was interesting; until then, I had no clue that was how Perl behaved [in that scenario]. So, I wrote a short test script, which sure enough proved the theory, and was able to fix my error.</p>
<p>Take a look at this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #b1b100;">my</span> <span style="color: #0000ff;">@animals</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;cat&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;dog&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;emu&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;frog&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">foreach</span> <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$animal</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">@animals</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;do not eat the %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">$animal</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>which dumps out this:</p>
<p><code>do not eat the cat<br />
do not eat the dog<br />
do not eat the emu<br />
do not eat the frog<br />
</code></p>
<p><b>In my head</b>, the code above would work like this: for each element in the array @animals, copy the data from that element into a new scalar $animal, then print it out. Simple enough. Now, consider this sample:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #b1b100;">my</span> <span style="color: #0000ff;">@animals</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;cat&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;dog&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;emu&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;frog&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">foreach</span> <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$animal</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">@animals</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;do not eat the %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">$animal</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #0000ff;">$animal</span> <span style="color: #339933;">=</span> <span style="color: #000066;">sprintf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;rabid %s&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">$animal</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">foreach</span> <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$animal</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">@animals</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;do not eat the %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">$animal</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>I had *EXPECTED* it to output this:</p>
<p><code>do not eat the cat<br />
do not eat the dog<br />
do not eat the emu<br />
do not eat the frog</code></p>
<p><code>do not eat the cat<br />
do not eat the dog<br />
do not eat the emu<br />
do not eat the frog<br />
</code></p>
<p>For the first loop, again, <b>my thinking</b> was that we copy each array element&#8217;s data into the new scalar $animal, print it, modify it (by adding &#8216;rabid&#8217; to it), but do nothing with the modification (we are just modifying $animal, which I assumed was a copy, so the change would be lost when we iterate to the next element). Then, in our next loop, we iterate over @animals again, initializing $animal yet again for each element, so we just hit the unmodified @animals array and see the same thing.</p>
<p>[un]Surprisingly, <b>I was totally wrong</b>. This was the output I got:</p>
<p><code>do not eat the cat<br />
do not eat the dog<br />
do not eat the emu<br />
do not eat the frog</code></p>
<p><code>do not eat the rabid cat<br />
do not eat the rabid dog<br />
do not eat the rabid emu<br />
do not eat the rabid frog<br />
</code></p>
<p>Wow! What happened? It turns out, as the title suggests, that when you use foreach in Perl, <b>$animal becomes an alias for that array element</b>, not a copy (perlsyn calls it an alias; it behaves much like a pointer to the element). When you make any changes to $animal, the change applies directly to the element in the array. As another example, its for-loop equivalent would be this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #b1b100;">my</span> <span style="color: #0000ff;">@animals</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;cat&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;dog&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;emu&quot;</span><span style="color: #339933;">,</span><span style="color: #ff0000;">&quot;frog&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span> <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$index</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span> <span style="color: #339933;">;</span> <span style="color: #0000ff;">$index</span> <span style="color: #339933;">&lt;</span> <span style="color: #0000ff;">@animals</span> <span style="color: #339933;">;</span> <span style="color: #0000ff;">$index</span><span style="color: #339933;">++</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;do not eat the %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$animals</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">$index</span><span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #0000ff;">$animals</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">$index</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000066;">sprintf</span><span style="color: #009900;">&#40;</span> <span style="color: #ff0000;">&quot;rabid %s&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$animals</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">$index</span><span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Fun! After doing some research after the fact, I found there has been some mild discussion on the topic. Some consider it a bug, but <b>in reality, it works by design</b>. What does this mean for you? Well, if you happen to be iterating over an array using foreach, want to modify each element, and plan on looping over the array multiple times, make sure you understand what&#8217;s going on behind the scenes, or else you&#8217;ll run into the same issue I did. You can copy the loop variable into a temporary variable (plain assignment makes a copy), i.e.:</p>
<p><code>my $new_animal = $animal;</code></p>
<p>or you can iterate using a for-loop and just do it like this:</p>
<p><code>my $animal = $animals[$index];</code></p>
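<p>Here are both workarounds side by side in a small, self-contained sketch, plus a third option that isn&#8217;t from the original post: building a brand-new list with <code>map</code>. All three leave <code>@animals</code> exactly as it started:</p>

```perl
use strict;
use warnings;

my @animals = ("cat", "dog", "emu", "frog");

# Option 1: copy the aliased loop variable into a new lexical first.
foreach my $animal (@animals) {
    my $new_animal = $animal;                  # plain assignment copies the value
    $new_animal = sprintf("rabid %s", $new_animal);
    printf("do not eat the %s\n", $new_animal);
}

# Option 2: index with a C-style for loop and copy the element out.
for (my $index = 0; $index < @animals; $index++) {
    my $animal = $animals[$index];             # copy, not alias
    $animal = sprintf("rabid %s", $animal);
    printf("do not eat the %s\n", $animal);
}

# Option 3 (not from the post): map returns a new list; @animals is only read.
my @rabid = map { "rabid $_" } @animals;

print join(", ", @animals), "\n";    # cat, dog, emu, frog
print join(", ", @rabid), "\n";      # rabid cat, rabid dog, rabid emu, rabid frog
```

<p>One caveat on the <code>map</code> version: inside the block, <code>$_</code> is also an alias to each element, so it&#8217;s only safe because the block reads <code>$_</code> without assigning to it.</p>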
<p>There of course may be some cases where you want to take advantage of the referencing feature, and if you do, all the more power to you.</p>
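<p>A common case where the aliasing is exactly what you want is cleaning up every element in place. A minimal sketch, using a made-up list of whitespace-padded values:</p>

```perl
use strict;
use warnings;

# Hypothetical input: values with stray leading/trailing whitespace.
my @values = ("  alpha ", "beta\t", " gamma");

# $value is an alias, so s/// trims each element of @values in place;
# no index bookkeeping and no second array needed.
foreach my $value (@values) {
    $value =~ s/^\s+|\s+$//g;
}

print join("|", @values), "\n";    # alpha|beta|gamma
```

<p>Just keep in mind this only works when the list holds real variables: looping over literal constants, e.g. <code>for my $x ("a", "b") { $x = "c" }</code>, dies with &#8220;Modification of a read-only value attempted&#8221;.</p>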
<p>Well, that&#8217;s all I have for now. Hopefully this helps someone out some day. Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2009/12/17/adventures-in-perl-foreach-returns-pointers-to-elements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>nxdomain redirection with powerdns and lua</title>
		<link>http://chetnichols.org/2009/11/25/nxdomain-redirection-with-powerdns-and-lua/</link>
		<comments>http://chetnichols.org/2009/11/25/nxdomain-redirection-with-powerdns-and-lua/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 11:31:16 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/?p=20</guid>
		<description><![CDATA[For those of you unfamiliar, NXDOMAIN is a DNS query response for a non-existent domain. An example would be if you start typing google.com, but sneeze halfway through, and end up going to gegooele.com. That domain doesn&#8217;t exist, and you will ultimately receive an NXDOMAIN response from your upstream DNS server instead of an A record [...]]]></description>
			<content:encoded><![CDATA[<p>For those of you unfamiliar, NXDOMAIN is a DNS query response for a non-existent domain. An example would be if you start typing google.com, but sneeze halfway through, and end up going to gegooele.com. That domain doesn&#8217;t exist, and you will ultimately receive an NXDOMAIN response from your upstream DNS server instead of an A record with the IP of the domain.</p>
<p>These days, a lot of ISPs have been doing NXDOMAIN redirection. Instead of handing back the NXDOMAIN response, they will intercept it and send you back an A record with an IP of one of their web servers (let&#8217;s say 1.2.3.4 as an example). Your machine will ultimately think that gegooele.com has the IP 1.2.3.4, connect to that IP, and do a GET request with the HTTP host header of gegooele.com. The ISP&#8217;s web server, which specifically handles NXDOMAIN &#8220;redirected&#8221; HTTP requests, will then take that host header, pass it through some type of dynamic script, and display a nice custom error page with possible corrections, advertisements, etc.</p>
<p>Now, here&#8217;s where PowerDNS comes in: there are companies out there whose specialty is to sell DNS appliances and software solutions, solely for the purpose of doing NXDOMAIN redirection. However, if you are in the market for doing something like this (which is a whole separate discussion altogether, since there are strong opinions about this subject), you can save yourself a chunk of change by instead using PowerDNS (pdns-recursor, to be exact) with built-in Lua scripting. PowerDNS is open source, free, and awesome.</p>
<p>We&#8217;re going to skip the build/compile details for now, and assume you have the latest PowerDNS recursor installed, and that Lua scripting has been compiled in. Chances are, it has. If not, check out <a href="http://www.powerdns.com">http://www.powerdns.com</a> and download the latest source. Once it&#8217;s installed, you just need to write a Lua script to modify DNS results, configure your recursor to use the script, and start up pdns-recursor. Let&#8217;s get to it.</p>
<p>First, you need a Lua script to handle DNS responses. Currently, the recursor only offers two Lua hooks: nxdomain, which is called when a query is about to return NXDOMAIN, and preresolve, which is called before any resolution has taken place. Preresolve is basically for hijacking requests with a static response.</p>
<p>To point the recursor at the script, we&#8217;ll add this line to our <em>recursor.conf</em>:</p>
<p><code>lua-dns-script=/opt/pdns/bin/nxdomain.lua</code></p>
<p>Now, let&#8217;s write our script. At its most basic level, it&#8217;s really easy:</p>
<p><code>function nxdomain (ip, domain, query_type)<br />
  local ips = {}<br />
  -- only rewrite NXDOMAINs for A queries; leave everything else alone<br />
  if query_type ~= pdns.A then return -1, ips end<br />
  ips[1] = { qtype=pdns.A, content="1.2.3.4" }<br />
  ips[2] = { qtype=pdns.A, content="5.6.7.8" }<br />
  return 0, ips<br />
end</code></p>
<p>This script first makes sure the NXDOMAIN response is the result of a request for an A record. If it isn&#8217;t, the function returns -1, which tells the recursor to leave the answer alone. If it is, we build an A record response and return it with a 0 (NOERROR) instead of the NXDOMAIN. Per this code, the client will ultimately receive a round-robin A response with two records.</p>
<p>Now that our script has been written and added to our recursor.conf, let&#8217;s start up the recursor and try it out!</p>
<p><code>$ /opt/pdns/sbin/pdns_recursor --config-dir=/opt/pdns/etc</code></p>
<p>Assuming the script is loaded successfully, you will see the following from the startup output:</p>
<p><code>Nov 25 03:09:36 Loaded 'lua' script from '/opt/pdns/bin/nxdomain.lua'</code></p>
<p>Looks good. Now, let&#8217;s see what happens when we try to hit our non-existent gegooele.com:</p>
<p><code>$ host gegooele.com<br />
gegooele.com has address 1.2.3.4<br />
gegooele.com has address 5.6.7.8</code></p>
<p>And there we go! NXDOMAIN redirection without the need for a commercial solution. Of course, there are more things you can do [with the Lua script]. With the proper log level, you can have it log every time the handler is called. For example, you can add a simple print line:</p>
<p><code>print ("nxdomain handler received: ", ip, domain, query_type)</code></p>
<p>You can also have it make decisions based on incoming IP, hostname/domain, and query type. However, the point of this was just to show the basics of what you can do. For more Lua syntax, and PowerDNS capabilities with Lua, check out the Lua website and PowerDNS wiki. Enjoy!</p>
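<p>The hook itself has to be written in Lua, but the decision logic is easy to prototype first. Here&#8217;s a minimal sketch in Perl that mirrors the contract the script above relies on: return -1 to leave the NXDOMAIN alone, or 0 plus a table of records to answer NOERROR instead. The function names, the 10.0.0.0/8 client restriction, and the <code>.local</code> exclusion are all hypothetical, made up for illustration; they&#8217;d need to be translated back into the Lua nxdomain function to take effect:</p>

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# DNS type codes (RFC 1035 / RFC 3596); the Lua script uses pdns.A instead.
use constant QTYPE_A    => 1;
use constant QTYPE_AAAA => 28;

# Hypothetical policy: only redirect clients inside 10.0.0.0/8.
sub client_in_10_8 {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return 0;
    return ((unpack("N", $packed) & 0xFF000000) == 0x0A000000);
}

# Same contract as the Lua hook: (-1, []) means "leave the NXDOMAIN alone",
# (0, \@records) means "answer NOERROR with these records instead".
sub nxdomain_policy {
    my ($ip, $domain, $qtype) = @_;
    return (-1, []) unless $qtype == QTYPE_A;
    return (-1, []) unless client_in_10_8($ip);
    return (-1, []) if $domain =~ /\.local\.?\z/i;    # hypothetical exclusion
    return (0, [
        { qtype => QTYPE_A, content => "1.2.3.4" },
        { qtype => QTYPE_A, content => "5.6.7.8" },
    ]);
}

my ($rcode, $records) = nxdomain_policy("10.1.2.3", "gegooele.com.", QTYPE_A);
print "rcode=$rcode records=", scalar(@$records), "\n";    # rcode=0 records=2
```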
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2009/11/25/nxdomain-redirection-with-powerdns-and-lua/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>efficient reads on perl file handles: while vs. foreach</title>
		<link>http://chetnichols.org/2009/11/15/efficient-reads-on-perl-file-handles-while-vs-foreach/</link>
		<comments>http://chetnichols.org/2009/11/15/efficient-reads-on-perl-file-handles-while-vs-foreach/#comments</comments>
		<pubDate>Sun, 15 Nov 2009 11:58:47 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/?p=30</guid>
		<description><![CDATA[A while back, I had put a Perl script together to read in some large files, do some data mining, and dump out the results I wanted to see (well, maybe they weren&#8217;t the results I was hoping for, but results nonetheless). Now, when the script ran, it would go through its first few routines [...]]]></description>
			<content:encoded><![CDATA[<p>A while back, I had put a Perl script together to read in some large files, do some data mining, and dump out the results I wanted to see (well, maybe they weren&#8217;t the results I was hoping for, but results nonetheless). Now, when the script ran, it would go through its first few routines nice and quick-like, but when it went to the following routine, it would become unbearably slow. The first couple of times, I didn&#8217;t think much of it; I liked feeling that the script was hard at work (of course).</p>
<p>However, after a few more runs, it was getting kind of ridiculous: the routine in question was the one that did reads from the file handle. Now, I can understand that on a multi-gigabyte file it might take a while (read in each line, parse it through a regex, and populate a data structure). However, it was causing the entire system to hang, and that made no sense at all given its relatively simple instructions. Here&#8217;s an example of what I was doing:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
</pre></td><td class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$fd</span> <span style="color: #339933;">=</span> IO<span style="color: #339933;">::</span><span style="color: #006600;">File</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">new</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;data.out&quot;</span><span style="color: #339933;">,</span>O_RDONLY<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">foreach</span> <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$line</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">&lt;</span><span style="color: #0000ff;">$fd</span><span style="color: #339933;">&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
   <span style="color: #0000ff;">@data</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">$line</span> <span style="color: #339933;">=~</span> <span style="color: #009966; font-style: italic;">/regex_pattern/</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Simple enough. So, on the next run, I started tracking the PID. Now, I saw two things: 1) the CPU was pinned, and 2) memory utilization was going through the roof. Bingo. Basically, my multi-gigabyte file was being read into memory, eating up all the remaining free physical memory, which was forcing the box to start swapping (which was eating up the CPU). Now it made sense why things were going ridiculously slow.</p>
<p>Now, what didn&#8217;t make sense was the fact this was happening at all. For each line of data, I was only pulling about 10% of it into my data structure (bytes parsed vs bytes per line), so I shouldn&#8217;t see memory utilization match that of the file. Additionally, I had worked with large files like this before, on systems with less memory, and never had an issue. What was different?</p>
<p>I took lunch, thought about it, and somewhere in there thought to myself, &#8220;hrmm, I wonder if it has to do with the foreach loop.&#8221; In the past, I had always used a while-loop, but didn&#8217;t really consider there might be an extreme difference between how each was interpreted when reading a file handle. I went back to my desk and modified the code to use a while loop instead:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
</pre></td><td class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$fd</span> <span style="color: #339933;">=</span> IO<span style="color: #339933;">::</span><span style="color: #006600;">File</span><span style="color: #339933;">-&gt;</span><span style="color: #006600;">new</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;data.out&quot;</span><span style="color: #339933;">,</span>O_RDONLY<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$line</span> <span style="color: #339933;">=</span> <span style="color: #339933;">&lt;</span><span style="color: #0000ff;">$fd</span><span style="color: #339933;">&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
   <span style="color: #0000ff;">@data</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">$line</span> <span style="color: #339933;">=~</span> <span style="color: #009966; font-style: italic;">/regex_pattern/</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Sure enough, it tore right through the file, did its parsing, and was done very shortly after. Interesting! So, what happened?</p>
<p>Basically, it comes down to a major difference in how Perl evaluates the file handle read (<code>&lt;$fd&gt;</code>) in a while loop vs. a foreach loop.</p>
<p>In a while-loop, <code>&lt;$fd&gt;</code> is evaluated in scalar context, so Perl reads a single line per iteration until it hits EOF (where the read returns undef and the loop ends). Only the current line needs to be in memory. This is how it should work (and does exactly what we expect it to do), so that makes sense.</p>
<p>In a foreach-loop, however, the list being iterated over is evaluated in list context before the first iteration begins. If the data is already an array, that&#8217;s not a problem. However, if the data is a file handle, <code>&lt;$fd&gt;</code> in list context reads <em>every</em> remaining line of the file into memory and hands the whole list back to the foreach iterator before your loop body runs even once.</p>
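<p>You can watch this happen without a multi-gigabyte file. The sketch below (assuming Perl 5.8 or later, which supports opening a file handle on an in-memory string) checks <code>eof</code> on the very first iteration of each loop: under foreach the handle has already been read to the end, while under while there&#8217;s still data left to read:</p>

```perl
use strict;
use warnings;

# A six-line in-memory "file" stands in for the multi-gigabyte one.
my $data = join("", map { "line $_\n" } 1 .. 6);

# foreach: <$fh> is evaluated in list context before the first iteration,
# so every line has already been read by the time the body runs.
open(my $fh, '<', \$data) or die "open: $!";
my $foreach_at_eof;
foreach my $line (<$fh>) {
    $foreach_at_eof = eof($fh) ? 1 : 0;
    last;    # only the first iteration matters here
}
close($fh);

# while: <$fh> is evaluated in scalar context, one line per iteration,
# so after the first read there are still lines waiting in the handle.
open($fh, '<', \$data) or die "open: $!";
my $while_at_eof;
while (my $line = <$fh>) {
    $while_at_eof = eof($fh) ? 1 : 0;
    last;
}
close($fh);

print "foreach at EOF on first pass: $foreach_at_eof\n";    # 1
print "while at EOF on first pass:   $while_at_eof\n";      # 0
```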
<p>Most of the time, people aren&#8217;t running through multi-gigabyte files, so doing a foreach or a while on a file handle won&#8217;t really be noticeable, even though there are two completely different logic paths being followed behind the scenes.</p>
<p>Ultimately, <strong>use a while loop when reading off of a file handle</strong>. It&#8217;s cleaner, uses less overhead, and has obviously proven itself to be more fit for this task.</p>
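<p>As a slightly fuller sketch of that advice, here&#8217;s a hypothetical helper that streams a handle with a while loop and keeps only the captured field from each matching line, so memory grows with the matches rather than with the file. It uses the three-argument <code>open</code> rather than <code>IO::File</code> (both work fine with a while loop); the helper name, pattern, and sample data are made up for illustration:</p>

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh line by line and collect capture group 1
# from every line matching $pattern.
sub extract_fields {
    my ($fh, $pattern) = @_;
    my @fields;
    while (my $line = <$fh>) {    # scalar context: one line at a time
        chomp $line;
        push @fields, $1 if $line =~ $pattern;
    }
    return @fields;
}

# With a real file this would be:
#   open(my $fh, '<', 'data.out') or die "can't open data.out: $!";
# An in-memory handle keeps the example self-contained.
my $sample = "user=alice id=1\nnoise\nuser=bob id=2\n";
open(my $fh, '<', \$sample) or die "open: $!";
my @users = extract_fields($fh, qr/^user=(\w+)/);
close($fh);

print join(",", @users), "\n";    # alice,bob
```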
<p>There are a lot of Perl hackers out there (many of whom I have worked with) who may end up commenting or shedding some deeper technical insight. By all means, I would love to hear it! If you think I&#8217;ve done a good job summarizing and detailing the issue, that works just as well! Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2009/11/15/efficient-reads-on-perl-file-handles-while-vs-foreach/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>bring OS X to its knees, thanks to sshd via launchd</title>
		<link>http://chetnichols.org/2009/11/11/bring-os-x-to-its-knees-thanks-to-sshd-via-launchd/</link>
		<comments>http://chetnichols.org/2009/11/11/bring-os-x-to-its-knees-thanks-to-sshd-via-launchd/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 09:16:55 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[apple]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/?p=24</guid>
		<description><![CDATA[In my previous post, I had discussed an interesting issue I came across where we were hitting MaxStartups in our modified, standalone sshd configuration, but not via the Mac OS X (and Mac OS X Server) default configuration (since an sshd parent is forked off via the launchd service for each individual SSH connection). Now, [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post, I had discussed an interesting issue I came across where we were hitting <em>MaxStartups</em> in our modified, standalone sshd configuration, but not via the Mac OS X (and Mac OS X Server) default configuration (since an sshd parent is forked off via the launchd service for each individual SSH connection).</p>
<p>Now, that got me thinking. If sshd NEVER hits <em>MaxStartups</em> via the OS X default launchd setup, then it should be relatively easy to DoS an OS X box, right? By default, OS X only allows 256 maximum user processes and 512 file descriptors, so with each new fork of an sshd parent, you can easily eat into that space and essentially block anyone from doing anything once you hit the limit. Even better (or worse), I&#8217;m guessing most OS X (and OS X Server) users don&#8217;t change any of these defaults: they keep sshd via launchd, and never run into any type of issue where they&#8217;d have to increase maxprocs or max file descriptors.</p>
<p>With this idea in my head, I chose to use my wife&#8217;s MacBook Pro as our test server; it could use a little exercise. So, I did what any OS X user would do: I went to System Preferences &gt; Sharing, and then enabled Remote Login. Of course, if I do a &#8216;ps&#8217; to see if it&#8217;s running, I won&#8217;t see anything, because launchd is just hanging out waiting for someone to hit port 22.</p>
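<p>Next, my MacBook Pro has the same system defaults, so I have to raise those on my end to be able to exceed the limit on the receiving end. That&#8217;s easy enough; let&#8217;s increase our max fds and maxprocs in the shell that will run the flood:</p>
<p><code>chets-laptop $ ulimit -n 2048<br />
chets-laptop $ ulimit -u 1024</code></p>
<p>(Note that <code>ulimit</code> is a shell builtin, so it has to run in that same shell; <code>sudo ulimit</code> won&#8217;t work, since sudo can only run external commands. Raising a limit above its hard ceiling additionally requires root.)</p>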
<p>There we go. Now, with my wife&#8217;s laptop willing and ready, let&#8217;s take out our trusty for loop:</p>
<p><code>chets-laptop $ for i in `jot 512 1`; do ssh 192.168.1.110 &amp; done</code></p>
<p>Now, with that running, I jump over to my wife&#8217;s laptop (Terminal already open), and I try to do an &#8216;ls&#8217;. Sure enough:</p>
<p><code>wifes-poor-laptop $ ls<br />
-bash: fork: Resource temporarily unavailable</code></p>
<p>Whoops! At this point, with a flood of SSH connections incoming, the only real thing I can do to stop the DoS is unload sshd from launchd. However, since the machine has hit the maximum number of processes allowed, I can&#8217;t even run the command to unload sshd from launchd. I can&#8217;t kill the sshd processes either, because I&#8217;d only be killing each individual parent, and launchd would just keep forking them off with each new connection. Regardless, killing won&#8217;t work anyway, just like unloading sshd from launchd:</p>
<p><code>wifes-poor-laptop $ sudo launchctl unload /System/Library/LaunchDaemons/ssh.plist<br />
-bash: fork: Resource temporarily unavailable</code></p>
<p>I also tried:</p>
<p><code>wifes-poor-laptop $ sudo killall sshd<br />
-bash: fork: Resource temporarily unavailable</code></p>
<p>Ouch. Of course, I also tried opening some apps (System Preferences, Address Book, etc), and those would just bounce and bounce forever; the system had hit the hard limit, so there was nothing that could be done.</p>
<p>At this point, the <em>ONLY</em> way I could stop the attack was to take the machine off of the network, resulting in the sshd forks timing out and exiting. My wife&#8217;s laptop was actually wired at the time; if it had been on AirPort, I&#8217;m not sure whether the system would have needed an extra process to actually disable AirPort. If so, a machine on the network via AirPort could be left totally, 100% useless.</p>
<p>Regardless, I think this is something people should be aware of before starting up Remote Login. Of course, running any type of remote login daemon is always a security risk, but if you&#8217;re on an open network with a public facing IP, you&#8217;re pretty much exposing your machine to being easily DoS&#8217;d by someone who knows how to write a for loop. You could, of course, set up TCP wrappers to buffer some of that, run ipfw, and whatever else, but I&#8217;m guessing that most OS X users aren&#8217;t going to be doing much in the way of this.</p>
<p>My suggestion: set up your hosts.allow/deny, configure ipfw to look for connection flooding on port 22, take sshd out of launchd, and, of course, only run sshd if you really need to. If someone out there wants to lock up your machine, and you&#8217;re running the OS X defaults, they can do it in about 10 seconds, and you won&#8217;t have a clue what hit you until you suddenly can&#8217;t do anything with your machine anymore.</p>
<p>All things aside, I love OS X and think it&#8217;s a great OS. There are just a few design decisions here and there that still need to be tweaked. Running sshd via the equivalent of xinetd is one of them. Good luck!</p>
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2009/11/11/bring-os-x-to-its-knees-thanks-to-sshd-via-launchd/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ssh_exchange_identification adventures</title>
		<link>http://chetnichols.org/2009/11/08/ssh_exchange_identification-adventures/</link>
		<comments>http://chetnichols.org/2009/11/08/ssh_exchange_identification-adventures/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 10:59:34 +0000</pubDate>
		<dc:creator>chet</dc:creator>
				<category><![CDATA[apple]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://chetnichols.org/blog/?p=10</guid>
		<description><![CDATA[I ran into a fun issue on Friday evening with one of our servers, so I thought I&#8217;d toss it out into the world just in case anyone else runs into a similar issue. We have a rather heavily loaded administrative server, an Xserve, running Mac OS X Server. Additionally, we run a modified sshd configuration, [...]]]></description>
			<content:encoded><![CDATA[<p>I ran into a fun issue on Friday evening with one of our servers, so I thought I&#8217;d toss it out into the world just in case anyone else runs into a similar issue.</p>
<p>We have a rather heavily loaded administrative server, an Xserve, running Mac OS X Server. Additionally, we run a modified sshd configuration, where sshd runs as a standalone service, instead of via launchd as in the default system installation. Is it relevant? Yes.</p>
<p>So, Friday evening rolls around (of course), and I get an e-mail from a co-worker saying that when she tries to log in, she gets that oh-so-common error message:</p>
<p><strong>ssh_exchange_identification: Connection closed by remote host</strong></p>
<p>Usually, in all of my experiences, this has been related to TCP wrappers; you&#8217;ve got a hosts.allow or hosts.deny set up, and you&#8217;re blocking (purposely or by accident), and just need to fix the config. Sometimes you&#8217;re totally screwed, but whatever. In our case, however, we aren&#8217;t using TCP wrappers. We could, since the default sshd in OS X does support it:</p>
<p><code>[chet@myhost]$ strings /usr/sbin/sshd  | grep wrap<br />
Connection refused by tcp wrapper<br />
libwrap refuse returns</code></p>
<p>or, if you have Xcode installed, you can use otool (the OS X equivalent to ldd in Linux):</p>
<p><code>[chet@myhost]$ otool -L /usr/sbin/sshd | grep libwrap<br />
/usr/lib/libwrap.7.dylib (compatibility version 7.0.0, current version 7.6.0)</code></p>
<p>However, we don&#8217;t have a hosts.(allow|deny), so that&#8217;s not the issue. Thinking further, I remembered someone mentioning that the box was under more load than usual, and may have hit some type of system limit (maxprocs, maxttys, etc). To test, I basically flooded the box with ssh connection requests using a super-basic loop that a 3rd grader would write:</p>
<p><code>$ for connection in `jot 500 1`; do<br />
&gt;   ssh myhost &amp;<br />
&gt; done</code></p>
<p>Sure enough, right when I started the loop, I immediately began getting ssh_exchange_identification errors thrown back at me, whereas a single login not amidst a loop would work just fine. Bingo. So, it wasn&#8217;t a system limit per se, but some type of sshd limit that was being triggered within the daemon itself.</p>
<p>Looking through the man pages, I came across an sshd_config option called MaxStartups. Here&#8217;s what the man page had to say:</p>
<blockquote><p><strong>MaxStartups</strong></p>
<p><em> Specifies the maximum number of concurrent unauthenticated con-</em><br />
<em> nections to the sshd daemon.  Additional connections will be</em><br />
<em> dropped until authentication succeeds or the LoginGraceTime</em><br />
<em> expires for a connection.  The default is 10.</em></p></blockquote>
<p>This sounded like a totally plausible cause for the issue. When I would flood, it would make requests so fast that sshd couldn&#8217;t keep up with authentications, so sshd would reach the MaxStartups limit and start blocking (to protect against DoS attacks, etc). To test it out, I changed <em>MaxStartups</em> from 10 to 200, ran my flood loop again, and sure enough, was able to connect without any errors. Perfect.</p>
<p>However, how come we hadn&#8217;t run into it before? Did it have something to do with us running sshd as a standalone daemon instead of via launchd? The standalone configuration is new for us, so we&#8217;re still keeping our eye out for bugs. To test, I ran my same flooding loop on an OS X Server machine running the system default (via launchd) sshd configuration. Sure enough, the issue did <strong>NOT</strong> present itself for that machine. Interesting.</p>
<p>Now, why is this, you may ask? It has to do with the way the sshd daemon runs via launchd.</p>
<p>The launchd service is very similar to xinetd when it comes to running network services: launchd knows what network services are configured under it, and it knows the ports those network services should accept connections on. So, launchd itself will listen on those ports. When a new connection request comes in (example: ssh), launchd will see it&#8217;s for port 22/sshd, fork off an sshd parent, and hand your connection off to that parent. If another connection comes in, launchd will fork off yet another sshd parent, and that 2nd user will get the 2nd parent. See the issue?</p>
<p>The <em>MaxStartups</em> option in sshd_config relies on a single sshd parent being the process that handles all incoming connections, forking off a child for each new one. If that parent sees that &gt;= <em>MaxStartups</em> children are in an unauthenticated state, it will block new connections from being established.</p>
<p>In the case of launchd, however, each connection is a parent forked from launchd, so no parent will ever see more than 1 connection (since it will have no children). In that respect, it totally subverts the sshd_config option for <em>MaxStartups</em>, opening up your machine to DoS attacks via sshd floods. Awesome.</p>
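<p>To make the difference concrete, here&#8217;s a deliberately simplified model in Perl (not sshd&#8217;s actual code) of a burst of 500 connections arriving while none of them manage to finish authenticating. It assumes the simple form of <em>MaxStartups</em> with the default of 10 from the man page above, and ignores the <em>start:rate:full</em> random-early-drop form that OpenSSH also accepts:</p>

```perl
use strict;
use warnings;

use constant MAX_STARTUPS => 10;    # default from the sshd_config man page

# Standalone sshd: one listening parent sees every pending child, so it
# can count them and refuse new connections once the limit is reached.
sub standalone_accepts {
    my ($incoming) = @_;
    my ($unauthenticated, $accepted) = (0, 0);
    for (1 .. $incoming) {
        next if $unauthenticated >= MAX_STARTUPS;   # dropped by sshd
        $unauthenticated++;                         # stuck waiting on PAM
        $accepted++;
    }
    return $accepted;
}

# launchd: every connection gets a brand-new sshd parent, so each parent
# only ever sees its own single connection and the limit never triggers.
sub launchd_accepts {
    my ($incoming) = @_;
    my $accepted = 0;
    for (1 .. $incoming) {
        my $unauthenticated = 0;                    # fresh parent, empty count
        next if $unauthenticated >= MAX_STARTUPS;
        $accepted++;
    }
    return $accepted;
}

printf "standalone: %d of 500 accepted\n", standalone_accepts(500);    # 10
printf "launchd:    %d of 500 accepted\n", launchd_accepts(500);       # 500
```

<p>The standalone listener refuses everything past the 10th pending connection, which is the ssh_exchange_identification behavior from earlier, while the launchd setup never refuses anything at all.</p>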
<p>In any case, now that we understand the issue, how come it was presenting itself under a scenario where people <em>weren&#8217;t</em> trying to flood it with ssh connections? The answer lies somewhere between system load, DirectoryServices, and the pam_securityserver.so PAM module.</p>
<p>When an ssh request comes in, and PAM is enabled, sshd will use the options defined in the /etc/pam.d/sshd file to determine what needs to be done to authorize the user. In our case, we have this configuration (slightly modified from the default OS install, since we&#8217;re not running via launchd):</p>
<p><code>[chet@myhost] $ cat /etc/pam.d/sshd<br />
# sshd: auth account password session<br />
auth       required       pam_nologin.so<br />
auth       optional       pam_afpmount.so<br />
auth       sufficient     pam_securityserver.so<br />
auth       sufficient     pam_unix.so<br />
auth       required       pam_deny.so<br />
account    required       pam_securityserver.so<br />
password   required       pam_deny.so<br />
session    required       pam_permit.so<br />
session    optional       pam_afpmount.so</code></p>
<p>Looking at the file, we notice a couple of entries for pam_securityserver.so. This module is actually the bread and butter of the authentication process: it queries the local DirectoryServices agent running on the machine to do the user authentication, and will receive a success or failure from the agent. The DirectoryServices agent can be configured to authenticate for local users, remote users from an LDAP directory server, or a mix of both. It&#8217;s versatile, but it does a good amount of work. In the case of it being bound to a remote LDAP server, it can also be very slow (depending on if you&#8217;re caching account results or not).</p>
<p>Our issue was two-fold: at the time, the host was heavily loaded with some CPU <strong>and</strong> network intensive processes. The CPU intensive processes were slowing down the communication between pam_securityserver.so and DirectoryServices, and the network intensive processes were slowing down the communication between DirectoryServices and our remote LDAP server. Between the two, authentications became super slow, and incoming, unauthenticated connections were slowly creeping up (especially due to the scripted cron-job logins that had no logic to drop a hung login &#8211; awesome).</p>
<p>However, at that time, things were actually still okay. Slow, but people could still get in. Unfortunately (or luckily), someone sent out a chat message saying the host was acting slow, which of course prompted everyone to try logging in at the same time. At that point, the <em>MaxStartups</em> limit of 10 was hit almost immediately (i.e., 10 people tried logging in, and since authentications were taking forever, all were stuck as unauthenticated), resulting in the ssh_exchange_identification error being sent back to all subsequent logins immediately. This was a nice bit of detail to note: the sshd daemon itself was still very quick to respond, but as soon as it had to hand an authentication off to PAM, it just sat around waiting.</p>
<p>Ultimately, the resolution was to increase MaxStartups on our heavily loaded machines (which is still better than the unregulated launchd sshd), and we will be setting up a separate server to handle all the jobs that were running.</p>
<p>Looking at other options, I noticed while writing this that pam_securityserver.so comes <em>BEFORE</em> pam_unix.so. As a test, I may try using pam_unix.so first, allowing root logins to bypass DirectoryServices, hitting passwd/shadow directly, and hopefully allowing us to jump in on what seems like a hung system due to DS being super slow. I&#8217;ll try to get back with an update if I end up testing this.</p>
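<p>To reason about that reordering, here&#8217;s a simplified model of how an auth stack like the one above gets walked (a sketch, not OpenPAM&#8217;s real code). It only handles the flags in our file: a <em>required</em> failure dooms the result but the stack keeps going, a <em>sufficient</em> success ends the stack immediately as long as no earlier required module failed, and <em>optional</em> results are ignored here. The module closures are hypothetical stand-ins, with <code>$ds_calls</code> counting trips down the slow DirectoryServices path:</p>

```perl
use strict;
use warnings;

# Walk an auth stack and report the overall result plus which modules ran.
# Each entry is [control_flag, module_name, code_ref returning true/false].
sub run_auth_stack {
    my @stack = @_;
    my @consulted;
    my $required_failed = 0;
    for my $entry (@stack) {
        my ($flag, $name, $check) = @$entry;
        push @consulted, $name;
        my $ok = $check->();
        if ($flag eq 'required') {
            $required_failed = 1 unless $ok;
        }
        elsif ($flag eq 'sufficient') {
            return ('success', @consulted) if $ok && !$required_failed;
        }
        # 'optional' results are ignored in this simplified model.
    }
    return ($required_failed ? 'failure' : 'success', @consulted);
}

# Hypothetical stand-ins for the modules in /etc/pam.d/sshd.
my $ds_calls       = 0;
my $nologin        = sub { 1 };                  # no /etc/nologin file present
my $afpmount       = sub { 1 };
my $securityserver = sub { $ds_calls++; 1 };    # the slow DirectoryServices path
my $unix           = sub { 1 };                  # local passwd/shadow check passes
my $deny           = sub { 0 };

my @original = (
    ['required',   'pam_nologin.so',        $nologin],
    ['optional',   'pam_afpmount.so',       $afpmount],
    ['sufficient', 'pam_securityserver.so', $securityserver],
    ['sufficient', 'pam_unix.so',           $unix],
    ['required',   'pam_deny.so',           $deny],
);
my @reordered = @original[0, 1, 3, 2, 4];    # pam_unix.so moved ahead

my ($result1, @path1) = run_auth_stack(@original);
my ($result2, @path2) = run_auth_stack(@reordered);
print "original:  $result1 via ", join(" -> ", @path1), "\n";
print "reordered: $result2 via ", join(" -> ", @path2), "\n";
print "DirectoryServices calls: $ds_calls\n";    # 1
```

<p>In this model, any user that pam_unix.so can verify skips DirectoryServices entirely; a user who only exists in LDAP would still fail pam_unix.so and fall through to pam_securityserver.so, paying for both checks.</p>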
<p>That&#8217;s all from me for now. Feel free to comment on your own experiences, knowledge, other ideas, or if you&#8217;ve run into the same thing. Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://chetnichols.org/2009/11/08/ssh_exchange_identification-adventures/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
