kernel development


Post on 19-Mar-2016


DESCRIPTION

Kernel Development. CSC585 Class Project, Dawn Nelson, December 2009. Compare timing and jitter between a realtime module and a non-realtime module. Are the results of using a realtime module worth the effort of installing RTAI?

TRANSCRIPT

  • KERNEL DEVELOPMENT
    CSC585 Class Project
    Dawn Nelson
    December 2009

  • COMPARE TIMING AND JITTER BETWEEN A REALTIME MODULE AND NON-REALTIME MODULE

    Are the results of using a realtime module worth the effort of installing RTAI?
    What is the timing difference between realtime and non-realtime kernel modules for computation?
    What is the jitter difference between realtime and non-realtime kernel modules for computation?
    What is the jitter difference between realtime and non-realtime kernel modules for overall process time, with and without MPI?
    What types of tasks are improved by using RTAI?
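The timing and jitter questions above can be made concrete with a small helper. The sketch below is not from the slides: it assumes jitter is reported as the peak-to-peak spread (maximum minus minimum) of repeated elapsed-time samples, with the mean absolute deviation as a second measure. The names `timing_stats` and `summarize_samples` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical summary of repeated elapsed-time measurements, in ns. */
struct timing_stats {
    double min_ns;
    double max_ns;
    double mean_ns;
    double jitter_ns;        /* peak-to-peak spread: max - min (assumed definition) */
    double mean_abs_dev_ns;  /* average distance of a sample from the mean */
};

/* Fill *out from n samples. Returns 0 on success, -1 on bad arguments. */
int summarize_samples(const long long *samples_ns, size_t n,
                      struct timing_stats *out)
{
    size_t i;
    double min, max, sum = 0.0, dev = 0.0;

    if (samples_ns == NULL || out == NULL || n == 0)
        return -1;

    min = max = (double)samples_ns[0];
    for (i = 0; i < n; i++) {
        double s = (double)samples_ns[i];
        if (s < min) min = s;
        if (s > max) max = s;
        sum += s;
    }

    out->min_ns = min;
    out->max_ns = max;
    out->mean_ns = sum / (double)n;
    out->jitter_ns = max - min;

    /* Second pass: accumulate absolute deviations from the mean. */
    for (i = 0; i < n; i++) {
        double d = (double)samples_ns[i] - out->mean_ns;
        dev += (d < 0.0) ? -d : d;
    }
    out->mean_abs_dev_ns = dev / (double)n;
    return 0;
}
```

Feeding the same helper the samples collected from each module would give directly comparable numbers for the realtime and non-realtime cases.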

  • WHAT IS THE TIMING DIFFERENCE BETWEEN REALTIME AND NON-REALTIME KERNEL MODULES?

  • OVERALL PROCESS TIME COMPARISON FOR 8 NODES

  • WHAT IS THE JITTER DIFFERENCE BETWEEN REALTIME AND NON-REALTIME KERNEL MODULES FOR OVERALL PROCESS TIME?

  • SOURCE CODE WRITTEN
    Kernel module implementing a char device, with a read/write as the signal to perform the kernel task.
    Kernel module implementing RTAI, with a FIFO and a semaphore as the signal to perform the kernel task.
    Programs to use the kernel modules.
    MPI programs to use the kernel modules.
    Scripts to build and load both modules.
    Scripts to run the programs and save the results.
    Scripts to initiate MPI on all nodes (because mpdboot did not work for 8 nodes).

  • CHARACTER DEVICE DRIVER READ FUNCTION

    // read
    ssize_t mmmodule_mmmdo(struct file *filp, char *buf,
                           size_t count, loff_t *f_pos) {
        int a[20][20], b[20][20], c[20][20];
        int i, j, k, extraloop, t2;
        RTIME t0, t1;

        t0 = rt_get_cpu_time_ns();
        // 50000 iterations for a good measurement
        for (extraloop = 0; extraloop < 50000; extraloop++) {
            // Matrix calculation block
            for (k = 0; k < 20; k++)
                for (i = 0; i < 20; i++) {
                    c[i][k] = 0;
                    for (j = 0; j < 20; j++)
                        c[i][k] = c[i][k] + a[i][j] * b[j][k];
                }
        }
        t1 = rt_get_cpu_time_ns();
        t2 = (int) (t1 - t0);
        // Changing reading position as best suits
        // copy_to_user(buf, mmmodule_buffer, 1);
        return t2;
    }
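Because the read function returns the elapsed time `t2` as its return value rather than copying data into the buffer, a user-space program can obtain the measurement from the result of `read()`. The following is a minimal sketch under that reading of the slide, not the author's program. The function name `read_elapsed_ns` is hypothetical, and the device node path is supplied by the caller since the slides do not name it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Ask the driver to run its matrix benchmark once.
 * Returns the elapsed time in nanoseconds as reported by read(),
 * or -1 if the device cannot be opened or the read fails.
 * device_path is an assumption, e.g. a node such as /dev/mmmodule. */
long read_elapsed_ns(const char *device_path)
{
    char dummy;              /* the driver does not fill the buffer */
    ssize_t elapsed;
    int fd = open(device_path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* The return value of read() carries t2 from the driver. */
    elapsed = read(fd, &dummy, 1);
    close(fd);

    return (elapsed < 0) ? -1 : (long)elapsed;
}
```

Since the driver registers major number 60, a matching node would have to exist first, for example one created with `mknod /dev/mmmodule c 60 0`; the node name and minor number are assumptions.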

  • CHARACTER DEVICE DRIVER - SETUP

    // memory character device driver to do matrix
    // multiply upon a call to it
    #include
    #include
    #include  // printk()
    #include  // kmalloc()
    #include  // everything
    #include  // error codes
    #include  // size_t
    #include
    #include  // O_ACCMODE
    #include  // cli(), *_flags
    #include

    MODULE_LICENSE("GPL");

    // Declaration of mmmodule.c functions
    int mmmodule_open(struct inode *inode, struct file *filp);
    int mmmodule_release(struct inode *inode, struct file *filp);
    ssize_t mmmodule_mmmdo(struct file *filp, char *buf,
                           size_t count, loff_t *f_pos);
    void mmmodule_exit(void);
    int mmmodule_init(void);

    /* Structure that declares the usual file */
    /* access functions */
    struct file_operations mmmodule_fops = {
        .read = mmmodule_mmmdo,
        // .write = mmmodule_write,
        .open = mmmodule_open,
        .release = mmmodule_release
    };

    // Declaration of the init and exit functions
    module_init(mmmodule_init);
    module_exit(mmmodule_exit);

    // Global variables of the driver
    int mmmodule_major = 60;   // Major number
    char *mmmodule_buffer;     // Buffer to store data

    int mmmodule_init(void) {
        int result;

        // Registering device
        result = register_chrdev(mmmodule_major, "mmmodule", &mmmodule_fops);
        if (result < 0) {
            printk("mmmodule: cannot get major number %d\n", mmmodule_major);
            return result;
        }

        // Allocating memory for the buffer
        mmmodule_buffer = kmalloc(1, GFP_KERNEL);
        if (!mmmodule_buffer) {
            result = -ENOMEM;
            goto fail;
        }
        memset(mmmodule_buffer, 0, 1);
        printk("Inserting mmmodule module\n");
        return 0;

    fail:
        mmmodule_exit();
        return result;
    }

  • REALTIME MODULE - READ

    static int myfifo_handler(unsigned int fifo)
    {
        rt_sem_signal(&myfifo_sem);
        return 0;
    }

    static void Myfifo_Read(long t)
    {
        int i = 0, j = 0, k = 0, xj = 0;
        int a[20][20], b[20][20], c[20][20];
        char ch = 'd';
        RTIME t0, t1;

        while (1) {
            // rt_printk("new_shm: sem_waiting\n");
            rt_sem_wait(&myfifo_sem);
            rtf_get(Myfifo, &ch, 1);
            // rt_printk("got a char off the fifo... time to do matrix mult\n");
            t0 = rt_get_cpu_time_ns();
            // rt_printk("t0= %ld \n", t0);
            for (xj = 0; xj < 50000; xj++) {
                for (k = 0; k < 20; k++)
                    for (i = 0; i < 20; i++) {
                        c[i][k] = 0;
                        for (j = 0; j < 20; j++)
                            c[i][k] = c[i][k] + a[i][j] * b[j][k];
                    }
            }
            t1 = rt_get_cpu_time_ns();
            shm->t2 = t1 - t0; // = (int *)t2;
        }
    }
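In the realtime module the FIFO handler signals the semaphore whenever data arrives, which wakes the task so it can pull one byte with `rtf_get` and run the multiply. From user space, the trigger is therefore a one-byte write to the FIFO's device node. The sketch below is an illustration of that step only, not the author's program. It assumes the FIFO is exposed as an RTAI device node such as `/dev/rtfN`, where N is the value of `Myfifo`, which the slides do not show. The function name `trigger_rt_task` is hypothetical, and reading the result back from shared memory is left out.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write one byte to an RTAI FIFO device node to wake the realtime task.
 * fifo_path is an assumption (for example /dev/rtf0 if Myfifo were 0).
 * Returns 0 if exactly one byte was written, -1 otherwise. */
int trigger_rt_task(const char *fifo_path)
{
    const char token = 'd';  /* the value is not inspected by the task */
    ssize_t written;
    int fd = open(fifo_path, O_WRONLY);

    if (fd < 0)
        return -1;

    written = write(fd, &token, 1);
    close(fd);

    return (written == 1) ? 0 : -1;
}
```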

  • REALTIME MODULE - SETUP

    static RT_TASK read;
    #define TICK_PERIOD 100000LL /* 0.1 msec (1 tick) */

    int init_module(void)
    {
        // shared memory section
        // TICK_PERIOD is a long long; cast it to match the %ld format
        rt_printk("shm_rt.ko initialized: tick period = %ld\n", (long)TICK_PERIOD);
        shm = (mtime *)rtai_kmalloc(nam2num(SHMNAM), SHMSIZ);
        if (shm == NULL)
            return -ENOMEM;
        memset(shm, 0, SHMSIZ);

        rtf_create(Myfifo, 1000);
        rtf_create_handler(Myfifo, myfifo_handler);
        rt_sem_init(&sync, 0);
        rt_typed_sem_init(&myfifo_sem, 0, SEM_TYPE);
        rt_task_init(&read, Myfifo_Read, 0, 2000, 0, 0, 0);

        start_rt_timer((int)nano2count(TICK_PERIOD));
        rt_task_resume(&read);
        return 0;
    }

  • CONCLUSIONS
    There are cases when RTAI improves timing and jitter: mostly longer-running tasks, widely distributed tasks and deterministic tasks.
    Accessing shared memory created using RTAI sadly slows the module to a crawl. My previous rt-module was giving results of 140 milliseconds per 5000 matrix multiplies. The new version gives results of 100 nanoseconds for 50,000 matrix multiplies. I can try physical memory mapping to see if performance is improved.
    I don't think modules were meant to be used for large amounts of data, because of the slow transfer between user and kernel space via copy-to-user, shared memory and copy-from-user.
    For MPI, the main advantage of using RTAI is that the nodes all finish at nearly the same time.

  • LESSONS LEARNED
    A kernel crash writes core dumps on all open windows.
    A small tick period locks up the whole machine and is unrecoverable.
    FIFOs and semaphores work nicely and do not create race conditions.
    Character device drivers work nicely but take a little more maintenance to set up and program.
    These are the first modules I have ever written, including the rt one for the conference.
    A profiler would be very useful for comparing performance, instead of graphs and text.
    I will soon be writing an RT module to read a synchro device every 12 milliseconds to try out the determinism of RTAI.
    1000 nanoseconds = 1 microsecond
    1000 microseconds = 1 millisecond
    1 millisecond = 1,000,000 nanoseconds
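The unit conversions matter when choosing a timer period. As a worked example, with the 0.1 ms tick (100,000 ns) defined in the realtime module setup, a 12 ms read interval corresponds to 12,000,000 / 100,000 = 120 ticks. The sketch below encodes that arithmetic; the macro names and `period_to_ticks` are hypothetical and are not part of RTAI.

```c
/* Unit relationships: 1000 ns = 1 us, 1000 us = 1 ms. */
#define NS_PER_US 1000LL
#define US_PER_MS 1000LL
#define NS_PER_MS (NS_PER_US * US_PER_MS)

/* Number of whole timer ticks in a period.
 * Returns -1 if the tick is not positive, the period is negative,
 * or the period is not an exact multiple of the tick. */
long long period_to_ticks(long long period_ns, long long tick_ns)
{
    if (tick_ns <= 0 || period_ns < 0 || period_ns % tick_ns != 0)
        return -1;
    return period_ns / tick_ns;
}
```

Rejecting periods that are not a whole number of ticks is a design choice for this sketch: it makes a mismatch between the desired interval and the tick period visible instead of silently rounding.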

  • FUTURE WORK
    There are very few projects or code examples using RTAI (findable by Google, anyway).
    The matrix multiply, even at 50,000 iterations, is not CPU-intensive enough to prove or disprove the advantages of RTAI.
    Need to ask the physicists for some of their algorithms to crunch through the system. At the conference, it was the physicists who showed interest in RTAI.
    Plan to use RTAI for its intended purpose of being deterministic.
    Write up the results for a paper.

  • C107 8 NODE CLUSTER SETUP WITH CENTOS 5.3, RTAI AND MPICH2
