[TriLUG] OT - gigabit switches

Aaron S. Joyner aaron at joyner.ws
Thu Sep 21 02:18:34 EDT 2006


Ryan Leathers wrote:

><...snip...>
>In short, if a way can be
>devised to use L3 data in order to populate the table used by the
>switching process, then it is possible to achieve the performance
>benefits inherent in the process while retaining the value of decisions
>based on L3 hierarchical addressing.  To go deeper into how this is done
>we really need to talk about specific implementations since there is
>more than one way to peel this onion, but this is the gist of it.
><...snip...>
>From a practical standpoint, to me, unless we're in a seriously deep
>network design discussion, making the distinction between routing and L3
>switching is splitting hairs.
Kudos to Ryan for going into great depth, yet resisting the urge to just 
discuss the Cisco specifics.  :)  Honestly, I'm amazed you got through 
that description without using the terms CAM or TCAM.  Allow me to take 
a moment to elucidate this idea, because the terms CAM and 'cam table' 
get so horribly misused and thrown around that I have taken to 
clarifying the origins at every opportunity.  No one has yet provided 
the opportunity, so I'll create one.  :)

First, we must explain how a switch works its magic internally.  It 
needs to take MAC addresses, and associate them with ports, and do so 
very quickly.  This is done by employing a large bank of a rather 
unusual type of memory, called "Content Addressable Memory", or CAM for 
short.  It's usually referenced as CAM memory, or the number of "CAMs" 
that you have, referring to the quantity of memory (more on this in a 
second).  So what does this CAM do for a switch?  It allows the switch 
to tell the memory, "At address 00:00:00:C1:B2:A3, store the number 4".  
Then, when the time comes, it can look up the memory address 
00:00:00:C1:B2:A3 and get back a 4.  This is, in essence, the 
MAC-address-to-port lookup table that a switch uses.  In fact, the old 
Cisco switches used to call the MAC address table the "CAM table", 
because that's literally how it was implemented.  It was of interest to 
know how much CAM memory a switch had, in order to know how many MAC 
addresses a switch could know about at any given time.  This is still 
quoted on the side of most switch boxes, though most people don't pay 
attention to it.  It's also a reasonable differentiator between 
cheap switches and expensive switches, and directly relates to how much 
power they consume (for reasons we'll see in a moment).
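
If it helps to see the idea in code, here is a rough Python sketch of 
what that MAC-address-to-port table does, logically speaking.  This is 
just an illustration of the mapping; real CAM hardware does the lookup 
as a single parallel operation in silicon, and the MAC addresses and 
port numbers below are made up.

    # A toy software analogy of a switch's CAM-based MAC table
    # (illustrative only -- the MACs and port numbers are hypothetical).
    mac_table = {}  # MAC address -> port number

    def learn(src_mac, port):
        """Remember which port a source MAC was last seen on."""
        mac_table[src_mac] = port

    def forward_port(dst_mac):
        """Return the port for a destination MAC, or None (flood)."""
        return mac_table.get(dst_mac)

    learn("00:00:00:C1:B2:A3", 4)
    print(forward_port("00:00:00:C1:B2:A3"))  # -> 4
    print(forward_port("00:00:00:DE:AD:BF"))  # -> None, so flood all ports

The interesting part is that the CAM gives you that dictionary-style 
lookup as a single hardware operation, keyed directly by the MAC address.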

So, how does this fancy-pants memory work?  (If you're really not 
interested, you can skip this paragraph.)  In essence, this is at the 
heart of why switches are fast.  The normal way to implement this in 
software would be to take the MAC address, create a hash of it, and 
store it at that memory location, so you could quickly access it later.  
You then have to deal with potential hash collisions as a corner case, 
but that's roughly it.  The CAM simply implements this at the hardware 
layer, with transistors.  As you might imagine, this isn't terribly 
difficult, but it does consume quite a few more transistors than your 
average memory circuit (which is basically just a set of tiny 
capacitors).  As a result, CAMs chew up considerably more transistors 
than simple RAM, and this is why big fancy switches often draw more than 
just modest amounts of power, and come with big noisy fans to match.  Even if 
you're not using them, powering up all those transistors can generate a 
healthy amount of heat.
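
For the curious, here is what that software approach might look like, 
sketched in Python: a small fixed-size table keyed on a hash of the MAC, 
with chaining to handle collisions.  Again, purely illustrative -- a 
real CAM does the equivalent comparison against every entry at once, in 
hardware.

    # A sketch of the hash-it-in-software approach described above.
    TABLE_SIZE = 256
    buckets = [[] for _ in range(TABLE_SIZE)]  # each bucket: (mac, port) pairs

    def mac_hash(mac):
        return hash(mac) % TABLE_SIZE

    def store(mac, port):
        bucket = buckets[mac_hash(mac)]
        for i, (m, _) in enumerate(bucket):
            if m == mac:              # already known: update the port
                bucket[i] = (mac, port)
                return
        bucket.append((mac, port))    # new entry (or a collision, chained)

    def lookup(mac):
        for m, port in buckets[mac_hash(mac)]:
            if m == mac:
                return port
        return None                   # unknown MAC

    store("00:00:00:C1:B2:A3", 4)
    print(lookup("00:00:00:C1:B2:A3"))  # -> 4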

So, as Ryan suggested, there is a magical way of moving this routing 
decision from a software decision by a CPU* to a hardware decision made 
in a lightning-fast, switch-like fashion.  A layer-3 routing decision, 
at its heart, is a similar kind of beast.  You want to take a certain 
input (this time the destination IP address, as opposed to the MAC 
address as before), and look up the result.  The problem here is that 
the mappings are no longer a simple 1-to-1 map; there is subnetting 
involved.  Enter the TCAM: Ternary Content Addressable Memory.  Now 
this is cool stuff.  TCAMs are effectively a hardware implementation of 
a routing table.  You can program in a certain number of values (think, 
routes), and masks to those values (think subnet masks of those routes), 
and when you look up the location of an IP address in the TCAM, you get 
back the corresponding route, based on what you programmed into it in 
the beginning.  In fact, TCAMs are so cool, and versatile, that all 
manner of subsystems in a typical router can make use of them.  You can 
use them for route matching, for ACLs, or even as simple MAC table 
lookups (a poor use, but if you're just doing switching, why not -- it 
does happen :) ), etc.  On fancier routers you can change how the TCAMs 
are allocated among the various subsystems, but it usually requires a 
full restart of the system (unfortunate but true).  I 
should also mention that TCAMs have substantially more transistors than 
their simple RAM brethren, and also far more than CAMs.  Thus, they 
consume proportionately more power, and because of the cost of making 
CAMs and TCAMs, are also not cheap.  The capabilities of a given router 
or L3 switch have to be balanced with how much heat and power they can 
reasonably be expected to dissipate and consume, respectively, and how 
many TCAMs they can include to meet a given price point.  More CAMs and 
TCAMs result in a more capable device that is more expensive to produce 
and more expensive to operate.
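
If the value-plus-mask idea is hard to picture, here is a small Python 
sketch of the lookup a TCAM performs for routing: every entry is a 
prefix (a value and a mask), and the most specific match wins.  The 
routes below are made up for illustration, and the real thing compares 
against all entries simultaneously in hardware rather than scanning a 
list.

    # Longest-prefix-match routing lookup, the job a TCAM does in one cycle.
    import ipaddress

    # Hypothetical routing table: (prefix, where to send it)
    routes = [
        (ipaddress.ip_network("10.0.0.0/8"),  "core uplink"),
        (ipaddress.ip_network("10.1.2.0/24"), "port 12"),
        (ipaddress.ip_network("0.0.0.0/0"),   "default gateway"),
    ]

    def route_lookup(dst):
        addr = ipaddress.ip_address(dst)
        matches = [(net, hop) for net, hop in routes if addr in net]
        # The most specific (longest) prefix wins.
        net, hop = max(matches, key=lambda m: m[0].prefixlen)
        return hop

    print(route_lookup("10.1.2.99"))  # -> "port 12" (the /24 beats the /8)
    print(route_lookup("10.9.9.9"))   # -> "core uplink"
    print(route_lookup("192.0.2.1"))  # -> "default gateway"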

So, now that I've babbled on about CAMs and TCAMs, perhaps some of you 
are thinking, "Gee, my Linux box doesn't have any of that fancy stuff."  
You would be unfortunately correct.  Generally speaking, a Linux box can 
make up for *some* of these shortcomings with raw horsepower.  Consider 
that dual cores running at 3+ GHz really are a lot of raw speed.  The 
counterargument there is that speed only helps so much.  A modern 
router doing an equivalent of CEF (Cisco Express Forwarding, aka making 
the routing decisions in hardware, via TCAMs, as described above) gets 
the packet in one interface and out the other in about the time it takes 
a Linux kernel to process the interrupt signaling that the packet has 
arrived and read it into memory.  The Linux box still has to make the 
routing decision, then write the packet back out to the Ethernet card.  
This is assuming you don't have an iptables module loaded or anything of 
the sort, which will interject still more delay -- compared to a 
decision that happens in a single clock cycle via a TCAM on the router.
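
To put some rough numbers on that (back-of-the-envelope only, using 
standard Ethernet framing figures rather than measurements from any 
particular box):

    # How many CPU cycles does a 3 GHz core get per packet at gigabit line rate?
    line_rate_bps = 1_000_000_000            # 1 Gb/s
    frame_bits = (64 + 8 + 12) * 8           # min frame + preamble + inter-frame gap
    packets_per_sec = line_rate_bps / frame_bits   # ~1.49 million packets/sec

    cpu_hz = 3_000_000_000                   # one 3 GHz core
    cycles_per_packet = cpu_hz / packets_per_sec   # ~2000 cycles

    print("%.2f Mpps worst case" % (packets_per_sec / 1e6))
    print("~%d cycles to receive, route, and transmit each packet"
          % cycles_per_packet)

Roughly 2000 cycles per minimum-size packet is not much once interrupt 
handling and memory copies take their share, which is exactly the point 
above.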

So it might sound like I'm trashing Linux for routing.  And in some 
ways, I am.  I suppose the lesson to take away is that when latency and 
throughput are of utmost importance, Linux is not necessarily the 
fastest way to go.  On the flip side, it's often much, much more 
flexible, and usually orders of magnitude cheaper.  It's usually a 
decision you only have to make at the edge, because 24 or more ports of 
GigE in a Linux box to be used as an L3 switch isn't exactly easy to 
come by yet.  At the edge, you're usually not as concerned with latency, 
and will 
be glad to have the improved flexibility and convenience of a Linux box.

Okay, I'm done for the night.  :)
Aaron S. Joyner


* - A "route processor" in the networking world, a generic Intel or AMD 
CPU running the Linux kernel in most PC environments.


