[TriLUG] OT - gigabit switches
Aaron S. Joyner
aaron at joyner.ws
Thu Sep 21 02:18:34 EDT 2006
Ryan Leathers wrote:
><...snip...>
>In short, if a way can be
>devised to use L3 data in order to populate the table used by the
>switching process, then it is possible to achieve the performance
>benefits inherent in the process while retaining the value of decisions
>based on L3 hierarchical addressing. To go deeper into how this is done
>we really need to talk about specific implementations since there is
>more than one way to peel this onion, but this is the gist of it.
><...snip...>
>From a practical standpoint, to me, unless we're in a seriously deep
>network design discussion, making the distinction between routing and L3
>switching is splitting hairs.
>
>
Kudos to Ryan for going into great depth, yet resisting the urge to just
discuss the Cisco specifics. :) Unfortunately, I'm practically amazed
you got through that description without using the terms CAM or TCAM.
Allow me to take a moment to elucidate this idea, because the terms
CAM and 'cam table' get so horribly misused and thrown around that I have
taken to clarifying the origins at every opportunity. No one has yet
provided the opportunity, so I'll create one. :)
First, we must explain how a switch works its magic internally. It
needs to take MAC addresses, and associate them with ports, and do so
very quickly. This is done by employing a large bank of a rather
unusual type of memory, called "Content Addressable Memory", or CAM for
short. It's usually referenced as CAM memory, or the number of "CAMs"
that you have, referring to the quantity of memory (more on this in a
second). So what does this CAM do for a switch? It allows the switch
to tell the memory, "At address 00:00:00:C1:B2:A3, store the number 4".
Then, when the time comes, it can look up the memory address
00:00:00:C1:B2:A3 and get back a 4. This is, in essence, the
mac-address to port lookup table that a switch uses. In fact, the old
Cisco switches used to call the mac address table the "CAM table",
because literally, that's how it was implemented. It was of interest to
know how much CAM memory a switch had, in order to know how many MAC
addresses a switch could know about at any given time. This is still
quoted on the side of most switch boxes, though most people don't pay
attention to it. It's also a reasonable differentiator between
cheap switches and expensive switches, and directly relates to how much
power they consume (for reasons we'll see in a moment).
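Just to make the idea concrete, here's a little Python sketch of what the
CAM table does logically: learn the source MAC on the port a frame arrived
on, then look up the destination MAC to pick the egress port (or flood if
it's unknown). It's purely illustrative -- the MACs, ports, and 8K capacity
are made up, and a real switch does this lookup in CAM hardware, not in a
dict:

# Hypothetical model of a switch's MAC-address ("CAM") table.
class MacTable:
    def __init__(self, capacity=8192):           # "8K MAC addresses" on the box
        self.capacity = capacity
        self.table = {}                           # MAC -> port number

    def learn(self, src_mac, ingress_port):
        # Remember which port this source MAC was last seen on.
        if src_mac in self.table or len(self.table) < self.capacity:
            self.table[src_mac] = ingress_port    # e.g. "00:00:00:C1:B2:A3" -> 4

    def lookup(self, dst_mac):
        # None means "unknown destination": flood out every port.
        return self.table.get(dst_mac)

switch = MacTable()
switch.learn("00:00:00:C1:B2:A3", 4)
print(switch.lookup("00:00:00:C1:B2:A3"))         # -> 4
print(switch.lookup("00:00:00:DE:AD:BF"))         # -> None (flood)
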
So, how does this fancy-pants memory work? (If you're really not
interested, you can skip this paragraph.) In essence, this is at the
heart of why switches are fast. The normal way to implement this in
software would be to take the MAC address, create a hash of it, and
store the port number at the memory location given by that hash, so you
could quickly access it later.
You then have to deal with potential hash collisions as a corner case,
but that's roughly it. The CAM simply implements this at the hardware
layer, with transistors. As you might imagine, this isn't terribly
difficult, but it does consume quite a few more transistors than your
average memory circuit (which is basically just a set of tiny
capacitors). As a result, CAMs chew up considerably more transistors than
simple RAM, and this is why big fancy switches often draw more than just
modest amounts of power, and come with big noisy fans to match. Even if
you're not using them, powering up all those transistors can generate a
healthy amount of heat.
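If it helps, here's roughly what that software approach looks like, again
as a hypothetical Python sketch (the bucket count and hash below are
arbitrary choices of mine): hash the MAC down to a small table index, and
chain entries within a bucket to handle collisions. The CAM effectively
performs the same comparison across every entry at once, in hardware:

NUM_BUCKETS = 1024
buckets = [[] for _ in range(NUM_BUCKETS)]        # each bucket holds (mac, port) pairs

def bucket_for(mac):
    # Fold the six octets of the MAC down into a bucket index.
    h = 0
    for octet in mac.split(":"):
        h = (h * 31 + int(octet, 16)) % NUM_BUCKETS
    return h

def store(mac, port):
    bucket = buckets[bucket_for(mac)]
    for i, (m, _) in enumerate(bucket):
        if m == mac:
            bucket[i] = (mac, port)               # update an existing entry
            return
    bucket.append((mac, port))                    # collision: just chain it

def lookup(mac):
    for m, port in buckets[bucket_for(mac)]:
        if m == mac:
            return port
    return None                                   # unknown MAC

store("00:00:00:C1:B2:A3", 4)
print(lookup("00:00:00:C1:B2:A3"))                # -> 4
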
So, as Ryan suggested, there is a magical way of moving this routing
decision from a software decision by a CPU* to a hardware decision made
in a lightning-fast, switch-like fashion. A layer 3 routing decision, at
its heart, is a similar kind of beast. You want to take a certain
input (this time the destination IP address, as opposed to the MAC
address as before), and look up the result. The problem here is that
the mappings are no longer a simple one-to-one map; there is subnetting
involved. Enter the TCAM: Ternary Content Addressable Memory. Now
this is cool stuff. TCAMs are effectively a hardware implementation of
a routing table. You can program in a certain number of values (think
routes), and masks to those values (think subnet masks of those routes),
and when you look up an IP address in the TCAM, you get back the
corresponding route, based on what you programmed into it in the
beginning, with the most specific (longest) match winning, just as in a
routing table. In fact, TCAMs are so cool, and versatile, that all
manner of subsystems in a typical router can make use of them. You
could use them for route matching, for ACLs, or even as simple MAC
table lookups (a poor use, but if you're just doing switching, why not
-- it does happen :) ), etc, etc. As a result, you can change how these
are associated with various subsystems of fancier routers, but it
usually requires a full restart of the system (unfortunate but true). I
should also mention that TCAMs have substantially more transistors than
their simple RAM brethren, and also far more than CAMs. Thus, they
consume proportionately more power, and because of the cost of making
CAMs and TCAMs, are also not cheap. The capabilities of a given router
or L3 switch have to be balanced against how much heat and power it can
reasonably be expected to dissipate and consume, respectively, and how many
TCAMs it can include to meet a given price point. More CAMs and TCAMs
result in a more capable device, one that is more expensive to produce and
more expensive to operate.
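To make the TCAM idea concrete, here's a small Python model of the lookup
it performs: each entry is a value plus a mask (the "don't care" bits), and
the most specific matching entry wins. The routes and next hops below are
made up, and where this loop checks entries one at a time, a real TCAM
compares all of its entries in parallel, in a single clock cycle:

import ipaddress

entries = []                                      # list of (network, next_hop), like a FIB

def program(route, next_hop):
    # Program a value+mask pair into our pretend TCAM.
    entries.append((ipaddress.ip_network(route), next_hop))

def lookup(dst):
    # Return the next hop of the most specific (longest prefix) match.
    addr = ipaddress.ip_address(dst)
    best = None
    for net, hop in entries:
        if addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, hop)
    return best[1] if best else None

program("10.0.0.0/8",  "port 1")
program("10.1.2.0/24", "port 2")
program("0.0.0.0/0",   "port 3")                  # default route
print(lookup("10.1.2.99"))                        # -> port 2 (most specific wins)
print(lookup("192.168.5.5"))                      # -> port 3 (falls to the default)
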
So, now that I've babbled on about CAMs and TCAMs, perhaps some of you
are thinking, "Gee, my Linux box doesn't have any of that fancy stuff."
You would be unfortunately correct. Generally speaking, a Linux box can
make up for *some* of these shortcomings with raw horsepower. Consider
that a dual-core CPU running at 3+ GHz is really a lot of raw speed. The
counter argument there is that speed only helps so much. A modern
router doing an equivalent of CEF (Cisco Express Forwarding, aka making
the routing decisions in hardware, via TCAMs, as described above) gets
the packet in one interface and out the other in about the time it takes
a Linux kernel to process the interrupt that the packet has arrived and
read it into memory. The Linux box still has to make the routing decision,
then write the packet back out to the Ethernet card. And that's assuming
you don't have an iptables module loaded or anything of the sort, which
will interject still more delay, compared to matching that would happen in
a single clock cycle via a TCAM on the router.
So it might sound like I'm trashing Linux for routing. And in some
ways, I am. I suppose the lesson to take away is that when latency and
throughput are of utmost importance, Linux is not necessarily the
fastest way to go. On the flip side, it's often much much more
flexible, and usually orders of magnitude cheaper. It's usually a
decision you only have to make at the edge, because 24 ports or more of
GigE in a Linux box to be used as an L3 switch isn't exactly easy to come
by yet. At the edge, you're usually not as concerned with latency, and will
be glad to have the improved flexibility and convenience of a Linux box.
Okay, I'm done for the night. :)
Aaron S. Joyner
* - A "route processor" in the networking world, a generic Intel or AMD
CPU running the Linux kernel in most PC environments.