Observations and discussion
Throughout the implementation of and experimentation with NCRAWL, we have observed the following effects in practice. We also discuss open issues that we are currently working on.
Effects of the multi-rate environment
The transmission (bit) rate used on each link turns out to be extremely important for network coding, for different reasons in each link direction. Consider the above scenario with Alice, Bob and Jack. As we explained earlier, when f(Jack, Alice) » f(Jack, Bob), Jack's encoded packet must be transmitted at a rate equal to f(Jack, Bob); otherwise Bob cannot receive the packet with high probability. Depending on the topology, this drastically reduces the long-term throughput at Alice. The reverse direction matters as well: when Alice sends packets towards Jack at a high rate (e.g., 54 Mbps) while Bob sends at a lower rate (e.g., 18 Mbps), Jack identifies very few encoding opportunities. The volume of packets originating from Alice is much higher, so only a small fraction of them can be mixed with packets originating from Bob. Hence, for network coding to be efficient, the difference between the rates of the groups must be small. This condition is easy to detect with our grouping mechanism, which allows us to enable network coding dynamically, between specific groups only. We are experimenting with more scenarios in order to draw concrete conclusions and produce graphs.
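The rate-mismatch effect above can be captured by a toy model: with same-size packets in both directions, the number of encoded transmissions at Jack is bounded by the slower flow's packet rate. The sketch below is illustrative only (the function name is ours, and the 54/18 Mbps figures are just the example values from the text); it is not NCRAWL code:

```python
def coding_fraction(pkts_per_sec_a, pkts_per_sec_b):
    """Fraction of the faster flow's packets that find a coding partner
    at the relay (toy model: same-size packets, one partner per packet)."""
    hi = max(pkts_per_sec_a, pkts_per_sec_b)
    if hi == 0:
        return 0.0
    return min(pkts_per_sec_a, pkts_per_sec_b) / hi

# With Alice feeding Jack at 54 Mbps and Bob at 18 Mbps (same packet
# size), only about a third of Alice's packets can be mixed with Bob's:
print(coding_fraction(54, 18))  # -> 0.333...
```

The remaining two thirds of Alice's packets are simply unicast, which is why similar group rates are a precondition for frequent encoding.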
Packet storage and time-out threshold
Let us assume similar traffic patterns for all nodes in the network (e.g., saturated UDP traffic). In order to facilitate network coding, packets from one group need to be temporarily stored at intermediate routers until packets from other groups arrive, so that encoding operations can take place. We call the maximum time period that a packet waits in a queue (for other packets to arrive for encoding) the time-out threshold. In the above scenario, a packet from Bill's group has to be stored in Jack's (the router's) local queue until a packet from Chloe's group arrives, in order for an encoding opportunity to exist. But for how long should a packet be stored in Jack's queue? Clearly, (a) prolonged queuing reduces performance, since it increases the experienced end-to-end delay; (b) short-duration queuing may not allow frequent encoding operations, since packets leave the queues before being encoded (they are simply unicast, with some additional queuing delay). We are experimenting with different packet-storage time-out thresholds. Note that the decision on the time-out threshold is aligned with our grouping policy and the potential for network coding between groups: when f(Alice, Jack) is similar or equal to f(Bob, Jack), packets from Alice and Bob arrive at Jack at similar rates from each direction. Hence, since frequent encoding opportunities exist, a small time-out threshold should not affect the efficacy of the network coding engine.
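The time-out trade-off can be sketched as a small coding buffer: a packet waits for a partner from another group and is flushed as a plain unicast when the threshold expires. This is a hypothetical sketch of the policy described above, not the actual NCRAWL queue implementation:

```python
import collections

class CodingQueue:
    """Toy coding buffer: a packet waits for a partner from another
    group; on time-out it is flushed as a plain unicast.
    (Hypothetical sketch, not the actual NCRAWL implementation.)"""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        # group id -> deque of (arrival_time, packet)
        self.queues = collections.defaultdict(collections.deque)

    def enqueue(self, group, pkt, now):
        self.queues[group].append((now, pkt))

    def dispatch(self, now):
        """Pair packets across distinct groups; unicast expired ones."""
        out = []
        nonempty = [g for g, q in self.queues.items() if q]
        while len(nonempty) >= 2:
            g1, g2 = nonempty[0], nonempty[1]
            _, a = self.queues[g1].popleft()
            _, b = self.queues[g2].popleft()
            out.append(('encoded', a, b))
            nonempty = [g for g, q in self.queues.items() if q]
        for q in self.queues.values():
            while q and now - q[0][0] >= self.timeout_s:
                out.append(('unicast', q.popleft()[1]))
        return out
```

With similar per-group arrival rates, nearly every dispatch pairs packets; with mismatched rates, the surplus packets of the faster group hit the time-out and leave as unicasts.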
Creating and updating neighbor groups
As explained above, Jack needs to receive information about the links that exist in his neighborhood, as well as in the neighborhood of each of his neighbors (recall our earlier justification). For this, NCRAWL enables the exchange of link-quality information between nodes (we provide implementation details later), so that nodes have accurate information about the state of the links maintained by them and by their neighbors. This locally stored information depicts the most recent, temporary state of the local network topology, in terms of the connectivity graph and the highest-throughput bit rate per neighbor link. Nodes use this information to classify their neighbors into groups. An issue here is how frequently this information should be updated, and how long it takes for all nodes to retrieve the updated information. Note that similar issues exist in the design of throughput-aware routing protocols. As we explain later, we build NCRAWL on Roofnet, and hence we have all the link-quality information that is exchanged through the probe-based SRCR routing protocol. We should also state that in dynamic environments the frequency of information updates should be increased. We believe, however, that this issue is more generic, and previous studies on information exchange may apply; we will provide a related list. NCRAWL includes a (distributed) lightweight information exchange protocol. Finally, it is clear that the grouping policy can be changed; it is up to the network designer/administrator to select the policy. As an example, another policy would be to allow Alice and Chloe to communicate with each other, but force them to exchange packets through Jack when the Alice⇔Chloe link is quite poor. This is the approach followed by throughput-aware routing protocols (e.g., RM-AODV, SRCR).
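As a minimal illustration of the classification step, assuming the locally stored table maps each neighbor to the highest-throughput bit rate of its link (the table contents below are hypothetical), grouping could look like:

```python
def group_neighbors(link_rates):
    """Classify neighbors into groups by the highest-throughput bit
    rate of their link to us (one illustrative grouping policy)."""
    groups = {}
    for neighbor, rate in link_rates.items():
        groups.setdefault(rate, []).append(neighbor)
    return groups

# Hypothetical link-quality table at Jack (neighbor -> best rate, Mbps):
links = {'Alice': 54, 'Bill': 54, 'Bob': 18, 'Chloe': 18}
print(group_neighbors(links))  # -> {54: ['Alice', 'Bill'], 18: ['Bob', 'Chloe']}
```

Each update of the link-quality table simply re-runs the classification, which is what makes alternative grouping policies easy to swap in.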
Constructing experimental scenarios
We implement and evaluate NCRAWL on our wireless testbed. First, care needs to be taken in discovering representative scenarios where we can validate our claims using NCRAWL. Given that the testbed currently consists of 9 nodes (as of Nov. 15th, 2008), the set of scenarios that can be examined is limited. Thus, most of our experiments focus on certain relay nodes, which can categorize their neighbors into 2-4 groups at most. Even with such limited topologies, though, we expect to be able to make concrete observations and conclusions. Second, as with most studies that involve routing and cross traffic at intermediate WiFi hops, we do not expect to observe high end-to-end throughput values. For example, we have performed measurements with 3 nodes (the same topology as Alice, Jack and Chloe above) and high-quality links between Alice ⇔ Jack and Chloe ⇔ Jack (i.e., on the order of 25-30 Mbps in all directions, while the highest-throughput bit rate is 54 Mbps). When we inject fully saturated UDP traffic on the path Alice ⇒ Jack ⇒ Chloe as well as on the path Chloe ⇒ Jack ⇒ Alice (in parallel and without enabling network coding), we observe that the average end-to-end throughput at Alice and Chloe is typically not higher than 3.1 Mbps. If we additionally consider that (a) Alice and Chloe could be intermediate relays belonging to a longer route, and (b) Jack may be a relay for more than 2 routes, then we can expect that the cross-throughput without network coding cannot go far beyond the above value. With network coding, and considering scenarios that encode packets from multiple groups, we expect the throughput to increase, but again not to extreme values.
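For intuition on why the coding gain is bounded, the classic back-of-envelope for this topology counts transmissions per delivered packet round: without coding the relay forwards each flow's packet separately, while with XOR coding it broadcasts one encoded packet. The toy model below ignores MAC overhead, losses and rate differences:

```python
def relay_coding_gain(n_flows=2):
    """Transmissions per round without vs. with XOR coding at the relay:
    n sender transmissions plus n separate forwards, versus n sender
    transmissions plus 1 encoded broadcast."""
    without_nc = 2 * n_flows
    with_nc = n_flows + 1
    return without_nc / with_nc

print(relay_coding_gain(2))  # -> 1.333..., i.e., at most ~33% gain here
```

This upper bound is consistent with the expectation stated above: throughput should increase with coding, but not to extreme values.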
Differentiating NCRAWL from COPE
NCRAWL differs from the previously implemented COPE framework in many respects. At this point we do not intend to perform experiments with COPE and compare its performance against that of NCRAWL. The main reason is that COPE considers transmissions at the basic rate (6 Mbps in 802.11a/g), and thus one may directly expect that NCRAWL outperforms COPE, since the former supports multiple bit rates. We provide a list of the major differences between the two frameworks, in terms of design and implementation decisions. (DISCLAIMER: We do not negatively judge COPE; we state the main differences between COPE and our architecture, which implicitly lead to better performance in the long term.)
Avoidance of long packet headers: With NCRAWL, nodes do not need to explicitly state (in future packet headers) which packets have been received and by whom. With COPE, each packet includes an additional COPE header that contains this information, so that nodes know whether their neighbors can perform decoding operations. In contrast, NCRAWL relies on the grouping mechanism, which directly provides such information. Hence, in principle Jack can directly predict that Bill and Chloe have sniffed Alice's and Bob's packets, respectively. A possible (potentially minor) drawback of our design is that Alice's packet may have collided at Bill's antenna (with another packet from someone else); in this case, Bill would be unable to decode Jack's encoded packet. Since nodes acknowledge all the packets that they successfully decode, NCRAWL resolves this issue by having Jack specifically unicast the packet to Bill. Note here that the COPE header is of variable length; the more packets sniffed, the larger the header becomes for future data packets. As a result, this can create extremely long packet headers, thereby increasing the transmission overhead and reducing the throughput. Moreover, this problem is exacerbated at high bit rates, where thousands of packets are sniffed per second. Finally, note that the processing overhead of the sniffed packets is also a major issue: the sniffer has to filter all these packets, store them, and add their IDs to future packet headers. This is a time-consuming task and is expected to significantly degrade the performance of the device at high bit rates.
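For completeness, the coding primitive assumed throughout this discussion is a simple XOR of payloads; the snippet below illustrates both the decoding step and why a lost sniff forces the unicast fallback (the packet contents are made up):

```python
def xor_bytes(a, b):
    """XOR two equal-length payloads (the basic coding primitive)."""
    return bytes(x ^ y for x, y in zip(a, b))

pkt_alice = b'hello from A'   # made-up payloads of equal length
pkt_bob   = b'hello from B'

# Jack mixes the two packets into a single transmission:
encoded = xor_bytes(pkt_alice, pkt_bob)

# Bill sniffed Alice's packet earlier, so he recovers Bob's:
assert xor_bytes(encoded, pkt_alice) == pkt_bob

# If Alice's packet collided at Bill's antenna, he has nothing to XOR
# with, and Jack must fall back to a plain unicast of pkt_bob.
```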
Ability to employ network coding dynamically: As we explained above, in certain scenarios network coding does not offer additional benefits. NCRAWL is able to perform network coding selectively for certain link qualities and between specific groups, by simply observing the state of the links in the different groups. However, the COPE implementation cannot make such decisions online. With COPE, network coding is always performed, and the extra COPE header is always used with data packets.
Aggressiveness in encoding: NCRAWL is more aggressive in exploiting encoding opportunities. While COPE prefers encoding packets of similar lengths, NCRAWL encodes all kinds of packets that reach layer 3 (e.g., data packets, TCP acknowledgments, etc.). Depending on the implementation, searching for packets of appropriate length may incur even more processing overhead. (COPE maintains 2 different virtual queues per node: one for small packets and one for large ones.) Furthermore, we believe that making online decisions about which packets to encode together also increases the processing overhead. Imagine having to (i) decide which packets to encode together (from which neighbors and of what size), (ii) search for and retrieve those packets from the corresponding queues, (iii) encode them (potentially padding some of them), and (iv) transmit them at (say) 54 Mbps: this procedure would have to be performed approximately 4655 times per second (the approximate number of 1460-byte packets leaving the device at 54 Mbps)!
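Encoding packets of different lengths reduces to zero-padding the shorter one before the XOR; the sketch below illustrates the idea only (the original lengths would have to travel in a header, whose format we do not show):

```python
def encode_padded(a, b):
    """XOR two packets of unequal length by zero-padding the shorter
    one; returns the encoded payload and its length (the original
    lengths are assumed to travel in a header, not shown here)."""
    n = max(len(a), len(b))
    enc = bytes(x ^ y for x, y in zip(a.ljust(n, b'\x00'),
                                      b.ljust(n, b'\x00')))
    return enc, n

data = b'x' * 1460            # a full-size data packet
ack  = b'ack'                 # a short TCP acknowledgment
enc, n = encode_padded(data, ack)

# A receiver that sniffed `data` XORs it back in and strips the
# padding (knowing the true length from the header):
dec = bytes(x ^ y for x, y in zip(enc, data.ljust(n, b'\x00')))
assert dec[:len(ack)] == ack
```

The padding itself is cheap; the cost lies in performing this decision-and-encode cycle thousands of times per second at high bit rates, as argued above.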
On the probability of packet reception: NCRAWL is heavily dependent on the grouping mechanism. Jack can directly expect that (with high probability) nodes belonging to the same group have sniffed the same packet. The authors of COPE incorporate the IDs of the sniffed packets into the COPE header. A question we have is: why isn't the COPE header sufficient to let neighbors know about the successfully sniffed packets? Why should encoders (such as Jack) additionally need to estimate the probability that each neighbor has correctly sniffed a packet? COPE argues that at times of severe congestion, reception reports may get lost in collisions. Note, however, that even with a few neighbors such situations can easily occur if each node transmits at high bit rates. We consider that the probabilistic approach (which is also aligned with the design of NCRAWL) is the way to go in general, and that explicit reception reports are rather redundant here.
Setting the destination address in the encoded packets: As in COPE, we perform pseudo-broadcast, by unicasting encoded packets that are meant for broadcast. In addition to COPE, however, we specifically address the encoded packets to the node with the poorest link (in the groups of interest). Consider, for example, group 1 in the above figure, and assume that all links in group 1 use the 6 Mbps transmission rate. Even though the same rate is used for all those links (an inherent design concept of NCRAWL), each link may have a different quality in terms of packet delivery ratio (PDR), and thus the expected transmission count (ETX) and expected transmission time (ETT) metric values may differ among the 1-hop links in the group. With NCRAWL, whenever Jack transmits an encoded packet to groups 1 and 2, he addresses the packet to the node X with the highest ETT value on the link Jack ⇔ X, among all the nodes that will sniff the packet. With this, if Jack receives an 802.11 ACK from X (possibly after a number of MAC retransmissions of his encoded packet), he can be quite confident that all the other nodes (in the respective groups) have managed to successfully overhear his encoded packet. Finally, NCRAWL also employs cumulative ACKs.
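The addressing rule reduces to picking, among the nodes that will sniff the packet, the one whose link from the relay has the highest ETT. A minimal sketch with hypothetical ETT values:

```python
def pick_unicast_target(ett_ms):
    """Address the pseudo-broadcast to the sniffer whose link from the
    relay has the highest ETT: an 802.11 ACK from the weakest link
    strongly suggests the other sniffers overheard the packet too."""
    return max(ett_ms, key=ett_ms.get)

# Hypothetical ETT values (ms) for the nodes that will sniff the packet:
ett = {'Alice': 1.2, 'Bill': 3.4, 'Bob': 0.9, 'Chloe': 2.1}
print(pick_unicast_target(ett))  # -> 'Bill'
```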
The discussion of COPE about fairness: An experimental assessment of COPE relevant to our study is discussed in their Sigcomm'06 paper, Section 7.5.1. As we also observe on our testbed, the sender with the best channel usually captures the medium for long time intervals, even though 802.11 tries to be fair. Recall that the COPE experiments were performed at the basic transmission rate (6 Mbps with 802.11a). No matter what the (same, fixed) rate is, though, the link quality is extremely important for network coding. If Alice is far away from Jack while Bob is quite close, the latter will dominate the channel access. In such scenarios we have observed that the coding opportunities are very few, and that the selected bit rate matters a lot.