fixing some driver problems most software is discovered to have some ‘design-flaws’ after it has...

18
Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Upload: gordon-robbins

Post on 17-Dec-2015

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Fixing some driver problems

Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Page 2: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Problem 1

• Statistics registers are ‘clear-on-read’, but ‘get_info()’ function can be entered twice, so if a statistics register is re-read upon a second entry to this function, a zero-value ‘overwrites’ the original statistic-value

int my_get_info( char *buf, char **start, off_t off, int count ) {

int n_xmit_packets = ioread32( io + E1000_TPT );int len = 0;

len = sprintf( buf+len, “ packets sent = %d \n”, n_xmit_packets );

Page 3: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Fix for problem 1

• We can declare any variables that will be used to store statistics to be ‘static’, then add any new inputs to their prior values

int my_get_info( char *buf, char **start, off_t off, int count ) {

static int n_xmit_packets = 0; int len = 0;

n_xmit_packets += ioread32( io + E1000_TPT );len = sprintf( buf+len, “ packets sent = %d \n”, n_xmit_packets );

Page 4: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Problem 2

• Our ‘nic.c’ device-driver programmed the hardware to use 2KB buffers for received packets, but our software only allocated 1536-bytes for each of our receive buffers, so if an oversized packet gets transmitted to our NIC, it can cause ‘buffer overflow’ (i.e., possible data-corruption – or even a system-crash!)

A 2048-byte packet can “overflow” a 1536-byte buffer

data-corruption

system-crash

Page 5: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Fix for problem 2

• Insure our driver programs the software in a manner consistent with its programming of the Network Interface hardware!

RCTL:

BSIZE (Buffer-Size) BSEX (Buffer-Size Extension bit)FLEXBUF (size of all receive-buffers in KB, if nonzero)

30 27 25 17 16

Possible sizes for hardware receive-buffers: If FLEXBUF=0 and BSEX=0: 2048-bytes, 1024-bytes, 512-bytes, 256-bytes If FLEXBUF=0 and BSEX=1: 32768-bytes, 16384-bytes. 8192-bytes, 4096-bytes Othewise: 15K, 14K, 13K, 12K, 11K, 10K, 9K, 8K, 7K, 6K, 5K, 4K, 3K, 2K, 1K

Page 6: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

device-driver’s packet-buffer

Problem 3

• If an application tries to ‘read’ fewer bytes than are contained in a received packet, the extra bytes get discarded (i.e., ‘lost’), which is NOT how a character-device is supposed to work!

application-program’s bufferUser-space:

Kernel-space:

excess bytes never do get returned to the application

copy_to_user()

Page 7: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Fix for problem 3

• We have introduced a new static variable (called ‘pickup’) that keeps track of where the ‘extra’ data begins -- so next time the application tries to ‘read()’, our driver can ‘pick up’ from where it had left off before ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos ) { static int rxhead = 0; // the current rxring[] array-index

static int pickup = 0; // count of bytes returned so far

copy_to_user( buf, cp+pickup, len );pickup += len;if ( pickup >= rxring[ rxhead ].packet_length )

{ rxhead = (1 + rxhead) % N_RX_DESC; pickup = 0; }

Page 8: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Problem 4

• A program might call our driver’s ‘write()’ procedure more rapidly than the hardware can transmit previously written packets, so ‘old’ packets not yet sent get ‘overwritten’ by ‘new’ packets – thus data gets ‘lost’!

txring:

TDH

TDT

User’s next packet goes here, but the hardware still isn’t ‘done’ with transmitting previous data put there

NOTE: the NIC ‘stalls’ when TDT == TDH

Page 9: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Fix for problem 4

• We created an additional ‘wait-queue’ so our driver’s ‘write()’ routine can put a task to sleep until the hardware is ‘done’ with the descriptor indexed by the TDT value

wait_quete_head_t wq_xmit;

ssize_t my_write( struct file *file, const char *buf, size_t len, loff_t *pos ) {

int txtail = ioread32( io + E1000_TDT );

if ( txring[ txtail ].desc_status == 0 )wait_event_interruptible( wq_xmit, txring[ txtail ].desc_status );

Page 10: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Our ISR’s modification

• Our driver’s Interrupt Service Routine has the duty of ‘awakening’ a sleeping writer when the NIC generates a TXDW interrupt (i.e., for Transmit-Descriptor Writeback)

irqreturn_t my_isr( int irq, void *dev_id ) {

int intr_cause = ioread32( io + E1000_ICR );

if ( intr_cause & (1<<0) // TXDW has occurredwake_up_interruptible( &wq_xmit );

Page 11: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Problem 5

• The hardware might receive packets at a faster rate than the application program desires to read them – causing our ring-buffer to ‘fill up’ with newer data before being adequately drained of its older data

rxring:

RDH

RDT Because we allowed all of our receive-buffers to be ‘owned’ by the network controller, it continues round-and-round receiving everything!

Page 12: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Fix for problem 5

• We only grant ‘ownership’ to some of the receive-buffers at any given time – but we arrange for our interrupt-handler to adjust the RDT value dynamically whenever the the number owned by the NIC falls below a certain threshold (signaled by RXDMT0)

RCTL: 9 8

RDMTS

RDMTS (Receive Descriptors Minimum Threshold Size): 00=one-half, 01=one-fourth, 10=one-eighth. 11=one-sixteenth

Page 13: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Our ISR’s modification

• Our driver’s Interrupt Service Routine has the duty of advancing the RDT register’s value whenever the ‘Minimum Threshold Reached’ event is signaled irqreturn_t my_isr( int irq, void *dev_id ) {

int intr_cause = ioread32( io + E1000_ICR );

if ( intr_cause & (1<<4) // RXDMT0 has occurred{int rxtail = ioread32( io + E1000_RDT );rxtail = (8 + rxtail) % N_RX_DESC;iowrite32( rxtail, io + E1000_RDT );}

Page 14: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Discussion question

• Should our ‘fix’ for problem 5 be modified to employ the controller’s ‘flow control’?

• What will happen if an application program stops reading, but the NIC’s link-partner keep on sending out more data?

• Does this suggest a use drivers can make of the SWXOFF-bit in register TCTL?

TCTL: 22

SWXOFF

SWXOFF (Software XOFF) writing ‘1’ causes NIC to send a PAUSE frame

Page 15: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Problem 6

• When we all are doing development of device-drivers for the 82573L controller using our ‘anchor-cluster’ network, any broadcast-packets sent by one driver cause interference with others’ work

switched hub

Page 16: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Fix for problem 6

• We implemented VLAN capabilities in our ‘nic2.c’ revised character-mode driver, so students can employ VLAN identification-numbers in their outgoing packets that will cause those packets to be ‘filtered out’ by receiving drivers with different VLAN tags

switched hub

VLAN 1VLAN 1 VLAN 2 VLAN 3 VLAN 4

Page 17: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

Modification to IOCTL

• We needed a convenient way to let user-programs change their driver’s VLAN tag

int my_ioctl( struct inode *inode, struct file *file, unsigned int request, unsigned long address )

{switch ( request )

{case 0: // SET destination MAC-addresscase 1: // GET destination MAC-addresscase 2: // SET a revised VLAN identifiercase 3: // GET the current VLAN identifier}

Page 18: Fixing some driver problems Most software is discovered to have some ‘design-flaws’ after it has been put into use for awhile

In-class exercise

• Try using our ‘joinvlan.cpp’ demo-program which lets a user change the current VLAN identification in the running ‘nic2.c’ driver

• How shall we arrange a scheme for every student to have their own unique VLAN id?