whitepaper: converting data into information

www.process-relations.com

Process Relations GmbH

Converting Data into Information

Abstract

Every new product or product enhancement starts with a new idea. In process and device design for high-tech products such as MEMS, NEMS, PV cells and nanomaterials, personal experience gained in previous developments contributes substantially to new developments. Other sources of information and inspiration include colleagues, scientific papers and old lab books. However, this is where problems arise. Colleagues are not always available, and it is not always clear whether a certain experiment has already been conducted. Lab books are a great resource for historical data, but in most cases they are only useful to the people who wrote them, who know where to look and how to read them. Even where computer files are available, they are often spread across several file servers or hidden away, and they are sorted by only a one-dimensional criterion. Searching from another perspective is almost impossible. Furthermore, every engineer has his or her own way of storing information, so documentation is created with a variety of office and freeware software.

XperiDesk by Process Relations is a Process Development Execution System (PDES). It is used to organize and track the data and information gathered during the research phase of process development. XperiDesk offers tools to load, manage and retrieve data from various sources. It allows engineers to look at historical and current data and to make connections between the results gathered. In doing so, XperiDesk enriches data and converts it into information that can be used for new product developments.

This whitepaper introduces two clients for the XperiDesk system that address the “heap” challenge, that is, the heap of historical data and of constantly generated digital data on file servers. It explains how the clients can be used to convert raw data into usable information within the XperiDesk PDES and what advantages engineers gain by using them.


Table of Contents

Converting Data into Information
Abstract
The Challenge
Addressing the problem
Excel Client
  Analyzing historical data
  Automated loading of data
File Loading Client
  Analyzing historical data
  Automated loading of data
Summary


The Challenge

Insufficient internal information and knowledge management, resulting in recurring engineering “déjà vus”, is one of the main issues in today’s development organizations. Experience gained in previous developments, scientific papers and old lab books contribute most to the realization of new product ideas. Having little or no structure in this data causes a lot of trouble and duplicated work. Experts in semiconductor process development estimate that 10-15% of failed and duplicated experiments could be avoided if previous results were more easily accessible. This ties in with the issues that arise when engineers move between projects. Moving the project expert to a different project may jeopardize the previous project, while engineers joining a running project are flooded with unstructured information.

Additionally, traditional means of data storage provide only a one-dimensional search criterion. Cluttered storage of result data, and important data kept on local disk drives, make manual data collection tedious and error-prone and sometimes even lead to data loss. Furthermore, often only the bare data points or result data sets are stored, with limited or no context information. Limited context makes it hard to reproduce previously seen effects and can lead to wrong conclusions in cause-effect analysis. These circumstances produce “déjà vus” of the form “Once we had a result ...” that can be very annoying and costly.

Documenting and reporting development progress can be tedious at best. Cluttered results storage places a major manual burden on development engineers, who have to collect data by hand from diverse machinery. Assembling the collected result data into reports and evaluating it can also take up a major part of engineering time. Reporting on the development status is often more a manual assembly of reports than an automated process. The input data is often out of date, so the Work In Progress (WIP) status is not necessarily accurate. These effects are aggravated further by quality assurance and compliance demands such as ISO 900X, CMMI, SOX etc. Because these increasingly apply in development as well as in production, there is strong demand to fulfill the documentation requirements they impose.

Addressing the problem

One of the most commonly used tools in process development is Excel. Excel is very versatile and can be used to quickly convert raw numbers into diagrams that visualize relationships. However, Excel was never intended to replace a database. The data is structured in columns, but even simple searches such as “show me all results where the resistance is between 500 kOhm and 1 MOhm and the thickness is between 5 nm and 7 nm” are difficult at best. If you want to access context information, such as which project a specific wafer run belonged to or what other wafers were produced in the same project, the limits of Excel are reached very quickly. Tables can of course be extended to contain that data, but they then become unreadable. In addition, existing rows are typically copied and pasted as a starting point for entering new data, which is a major source of errors.

Despite these disadvantages, Excel has one big advantage: it is a known tool. Most engineers know how to work with Excel, and many software products can import and export Excel files. The key is to combine the best of both worlds: the familiar, easy-to-use Excel and the power of database functionality. Combining the two brings many advantages to an R&D organization:

Existing Excel files can be analyzed and preprocessed to enable comprehensive searches.

Excel templates can be used to ease the adoption of new methodologies. There is no hard cut in tool usage, because the familiar Excel can be used to feed data into the new database.

Established data collection procedures can be kept and need to change only gradually, enabling a smooth transition to new procedures.

Other tools that export Excel files can be integrated without much programming overhead.
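
To illustrate the database side of this combination, the range search mentioned earlier (resistance between 500 kOhm and 1 MOhm, thickness between 5 nm and 7 nm) can be sketched in a few lines of Python against an in-memory SQLite database. The table layout, column names and sample values are assumptions made up for this illustration; they do not reflect the XperiDesk data model.

```python
import sqlite3

# Hypothetical schema for illustration only; it is not the XperiDesk data model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wafer (id TEXT PRIMARY KEY, project TEXT);
    CREATE TABLE result (wafer_id TEXT REFERENCES wafer(id),
                         resistance_ohm REAL, thickness_nm REAL);
""")
conn.executemany("INSERT INTO wafer VALUES (?, ?)",
                 [("W01", "ProjectA"), ("W02", "ProjectA"), ("W03", "ProjectB")])
conn.executemany("INSERT INTO result VALUES (?, ?, ?)",
                 [("W01", 750e3, 6.0),   # inside both ranges
                  ("W02", 2.0e6, 6.5),   # resistance too high
                  ("W03", 600e3, 8.0)])  # layer too thick


def find_results(conn, r_min, r_max, t_min, t_max):
    """Return (wafer, project, resistance, thickness) rows inside both ranges."""
    query = """
        SELECT w.id, w.project, r.resistance_ohm, r.thickness_nm
        FROM result r JOIN wafer w ON w.id = r.wafer_id
        WHERE r.resistance_ohm BETWEEN ? AND ?
          AND r.thickness_nm BETWEEN ? AND ?
        ORDER BY w.id
    """
    return conn.execute(query, (r_min, r_max, t_min, t_max)).fetchall()


# 500 kOhm to 1 MOhm and 5 nm to 7 nm, expressed in base units.
matches = find_results(conn, 500e3, 1e6, 5.0, 7.0)
```

Because each result row is joined to its wafer, the project context comes back with the measurement instead of having to be looked up in a separate sheet.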


Another problem today is the heap of digital data. Digital data in the form of images, analysis result files, diagrams and other formats is the backbone of any research organization. Many hours are spent archiving this data and later searching for it. However, common approaches such as file servers or even document management systems don’t account for the complexity of research data.

In many cases, the result files alone are useless. Without knowledge of the production process, a wafer image provides little data or information value. Result diagrams are likewise of little value without knowing the conditions under which they were created. More or less complicated hierarchies are used to compensate for limitations in file systems or in the metadata structures of Document Management Systems (DMS). However, as soon as a new dimension is added to the problem, all of these approaches fail.

The following two sections introduce two standalone clients for the XperiDesk system that address the issues described above. The first, the Excel Client, imports table-based data into XperiDesk, formalizes the data on the fly and relates it to pre-existing data, thereby building information. The second is the File Loading Client. It can import all types of files into the system, add metadata to them and relate them to pre-existing items, or create new items other than files from the information in the file paths. Managed files are also indexed, which enables fast searching in text files.

Excel Client

The XperiDesk Excel Client was developed to overcome the limitations of Excel described above. It allows users to continue working with Excel, the tool they are used to, while enabling them to ask detailed questions of the XperiDesk database. The Excel Client can thus be used to extract historical data and to analyze ongoing experimentation. In addition, it gives the data real meaning: raw numbers can be converted into meaningful measurement results with units and parameters. To make the most of these advantages, the Excel Client is a standalone client that can be deployed to multiple servers and workstations. It is thus possible to extract data from different sources and centralize it on servers.


Figure 1: The Excel Client

Analyzing historical data

Often a significant amount of historical data is stored in Excel worksheets. If these worksheets were created from the same or similar templates, this large amount of data can easily be imported into XperiDesk. The Excel Client offers a graphical user interface for defining loading scenarios for Excel worksheets. The content of the worksheets can be analyzed and modified while the data is loaded into XperiDesk.

Another benefit comes from the automated generation of relationships. For instance, the name of the wafer an experiment was performed on can be extracted from any cell, or any combination of cells, in the worksheet, or even from the sheet, file or path name. This information can be used to automatically link the experimental result to the specific wafer involved in the experiment. The new information, together with its relationships to existing data, is thus generated on the fly during the import. This is more than a raw data import: it provides a searchable and browsable structure for the data, converting it into information that can be used for effective decision-making.
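
The idea of deriving a relationship from a name can be sketched as follows. This is a minimal Python illustration, not the Excel Client's implementation: the wafer naming convention (`W` followed by digits), the helper names and the in-memory dictionary standing in for existing wafer records are all assumptions.

```python
import re

# Hypothetical naming convention: wafer IDs look like "W" followed by digits.
WAFER_ID = re.compile(r"W\d+")


def find_wafer_id(*candidates):
    """Return the first wafer ID found in the given strings, or None."""
    for text in candidates:
        match = WAFER_ID.search(text)
        if match:
            return match.group(0)
    return None


def link_result(result, wafers, cell_text, sheet_name, file_path):
    """Attach `result` to the wafer named in the cell, sheet or path (in that order)."""
    wafer_id = find_wafer_id(cell_text, sheet_name, file_path)
    if wafer_id is None or wafer_id not in wafers:
        return None  # nothing to link; a real import would flag this for review
    wafers[wafer_id].append(result)
    return wafer_id


# Existing wafer records, each with a list of linked results (invented data).
wafers = {"W0042": [], "W0043": []}
linked = link_result({"resistance_ohm": 2500.0}, wafers,
                     cell_text="sample 7", sheet_name="Sheet1",
                     file_path="/data/2009/lot17/W0042_resistance.xls")
```

Here neither the cell nor the sheet name carries a wafer ID, so the link is derived from the file path and the result ends up attached to wafer `W0042`.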


Figure 2: Formalized data with relation in the XperiDesk Graph View

From this exercise the user gains a structured repository of the historical data. Data points from different sources are now related and can be searched using the relationships. If, for instance, two workgroups took measurements on the same device, the data can now be merged and referenced. A new quality of search becomes possible, such as asking for device data from different research groups. After the import, all numbers have a meaning: units and parameters are attached and are searchable. Dedicated searches, e.g. for everything with a resistance of less than 3 kΩ, become possible.
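
Why attached units matter for searching can be shown with a small Python sketch. It assumes resistance values arrive as text with mixed notations, normalizes them to ohms and then applies the "less than 3 kΩ" search. The accepted notations, the prefix table and the sample values are assumptions for illustration only.

```python
import re

# SI prefixes handled by this sketch; the set is an assumption, extend as needed.
PREFIX = {"": 1.0, "m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9}
VALUE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([mkMG]?)(?:Ω|Ohm)\s*$")


def to_ohm(text):
    """Convert strings such as '2.5 kΩ' or '500kOhm' to a value in ohms."""
    match = VALUE.match(text)
    if match is None:
        raise ValueError(f"not a resistance: {text!r}")
    number, prefix = match.groups()
    return float(number) * PREFIX[prefix]


# Invented measurements as they might appear in different worksheets.
raw = {"W01": "2.5 kΩ", "W02": "4700 Ohm", "W03": "0.9kOhm", "W04": "1 MΩ"}

# Search: every wafer with a resistance below 3 kΩ, compared in base units.
below_3k = sorted(w for w, r in raw.items() if to_ohm(r) < 3e3)
```

Once every value is stored in a common base unit, a comparison across worksheets that used different notations becomes a simple numeric filter.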


Figure 3: Search query and result on formalized data

Automated loading of data

The automated loading of data is enabled by the batch mode of the Excel Client. Once a job has been defined using the graphical user interface, the client can be started from the command line or by a scheduling service. The Excel files can thus be checked for changes at given intervals, and any changes are imported into the XperiDesk system.
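
How files might be checked for changes between two scheduled runs can be sketched generically in Python. This is not the Excel Client's batch mode or command-line interface, which this whitepaper does not describe in detail; it only illustrates the principle of comparing content digests between runs. The function names, the file pattern and the placeholder file contents are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(folder, pattern="*.xls*"):
    """Map each matching file under `folder` to a SHA-256 digest of its content."""
    return {
        str(path.relative_to(folder)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(folder).rglob(pattern))
        if path.is_file()
    }


def changed_files(previous, current):
    """Return files that are new or whose content differs from the previous run."""
    return sorted(name for name, digest in current.items()
                  if previous.get(name) != digest)


# Demonstration on a temporary folder; the file contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "run1.xlsx").write_bytes(b"first")
    (folder / "run2.xlsx").write_bytes(b"second")
    before = snapshot(folder)          # state seen by the previous run

    (folder / "run2.xlsx").write_bytes(b"second, edited")
    (folder / "run3.xlsx").write_bytes(b"third")
    after = snapshot(folder)           # state seen by the current run

    to_import = changed_files(before, after)
```

In this demonstration only `run2.xlsx` (edited) and `run3.xlsx` (new) would be picked up for import; the unchanged `run1.xlsx` is skipped.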

The described method can also be used to import external data from project partners. The results of, e.g., external lab measurements can be linked to the internal research database. They become searchable and part of the context. This is extremely valuable for XperiDesk customers who outsource experiments or parts of experiments to third parties but want to track the results of the overall experiment internally.

The result of this exercise is again a structured and searchable collection of information. New questions can be put to the system in seconds, instead of the hours it previously took to collect the data from the different Excel sheets. Imagine a search for all wafers manufactured with a certain combination of process steps and given processing intervals where the resistance measurements showed a median resistance of 5 kΩ. Additionally, the “raw” Excel data can be attached so that the source of the data is available for extended reference.
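
A search of this kind can be sketched in Python on a handful of invented records. The sketch combines a process-step condition with a median condition; processing intervals are left out for brevity. The record layout, step names, values and the 5% tolerance are assumptions made for this illustration and say nothing about how XperiDesk evaluates such a query.

```python
from statistics import median

# Invented example records; step names and values are placeholders.
wafers = {
    "W01": {"steps": ["clean", "oxidation", "litho", "etch"],
            "resistance_ohm": [4900.0, 5000.0, 5200.0]},
    "W02": {"steps": ["clean", "litho", "etch"],
            "resistance_ohm": [5000.0, 5000.0, 5100.0]},
    "W03": {"steps": ["clean", "oxidation", "etch"],
            "resistance_ohm": [6900.0, 7000.0, 7400.0]},
}


def find_wafers(wafers, required_steps, target_ohm, tolerance=0.05):
    """Wafers that ran all required steps and whose median resistance is near target."""
    hits = []
    for wafer_id, data in sorted(wafers.items()):
        if not set(required_steps) <= set(data["steps"]):
            continue  # a required process step is missing
        med = median(data["resistance_ohm"])
        if abs(med - target_ohm) <= tolerance * target_ohm:
            hits.append((wafer_id, med))
    return hits


# Wafers with both oxidation and etch whose median resistance is about 5 kOhm.
hits = find_wafers(wafers, ["oxidation", "etch"], target_ohm=5000.0)
```

Only `W01` qualifies here: `W02` lacks the oxidation step, and the median of `W03` is well above the target.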


Figure 4: Import Preview to check data before the import

In summary, the raw Excel data from different worksheets can be collected and structured. Links and relationships between the data sets are established, and everything is integrated into the steadily growing information network of the company, turning the raw data into usable information.

File Loading Client

The File Loading Client was developed on the same principles as the Excel Client, which builds on the common use of Excel spreadsheets in research and process environments. It can be used to analyze distributed file servers, even at different locations. It can create file containers, called artefacts, that hold the files. Other entities such as wafers, lots or experiments can also be created from the filename and path information in the file server hierarchy. All these entities can be linked together, giving the files context.


Figure 5: The File Loading Client

Analyzing historical data

One of the tasks the File Loading Client can be applied to is the analysis of historical file hierarchies. Existing file servers and backups can be loaded and information extracted from them. To do so, the File Loading Client uses customizable patterns. These patterns can be applied to any file and path name to extract meaningful data. This data can then be used to create new entries in the XperiDesk database, to update existing ones, and to attach files to newly created or existing wafers, experiments and other managed entities.
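
Pattern-based extraction from a file path can be illustrated with a regular expression with named groups. The pattern syntax actually used by the File Loading Client is not described here, so the following Python sketch is only an analogy; the folder layout, the field names and the example path are assumptions.

```python
import re

# Hypothetical layout: <project>/<lot>/<wafer>/<measurement>_<YYYY-MM-DD>.<ext>
PATH_PATTERN = re.compile(
    r"(?P<project>[^/]+)/(?P<lot>lot\d+)/(?P<wafer>W\d+)/"
    r"(?P<measurement>[a-z]+)_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>\w+)$"
)


def extract_entities(path):
    """Return the fields encoded in `path`, or None if it does not fit the pattern."""
    normalized = path.replace("\\", "/")  # accept Windows-style separators too
    match = PATH_PATTERN.search(normalized)
    return match.groupdict() if match else None


info = extract_entities("/srv/results/mems_gyro/lot17/W0042/sem_2009-03-12.tif")
```

Each named group corresponds to an entity or attribute that could be created or looked up (project, lot, wafer), while the file itself would be attached to the wafer. Paths that do not fit the layout yield `None` and could be reported for manual handling.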

The File Loading Client is a standalone client that can be deployed to multiple servers. It is thus possible to extract file data from different sources and to centralize it. Additionally, new views of the data become available to everyone working with the system. Users can now look at the result files of other departments (if their security rights permit it). Raw data is transformed into information with context that can be used to further research projects throughout the company.

Automated loading of data

The File Loading Client remains useful once the historical structures have been analyzed. Many tools will continue to generate digital data, and it is important to continuously archive these results in the context of the original research project. If a certain hierarchical structure is established in the file system and standardized names are used for the files, the File Loading Client is able to load and link these files. No additional overhead is necessary. Users can continue to use their tools as they are used to.

Like the Excel Client, the File Loading Client can be run in batch mode. A graphical user interface is used to define the loading jobs, and updates can then be run regularly using a scheduler. Any changes in files or in the hierarchy are detected, and with the versioning system even multiple versions of an artefact (e.g., a project document) can be managed automatically.

Figure 6: Extraction of data from filepath using pattern matching

Again, the advantage is that raw data becomes information. Result files are now available in their context. Searches become much easier, and critical results can be found much faster. Engineers can now spend their time analyzing results rather than searching for them.

Summary

The Excel Client and the File Loading Client offer ways to get rid of “the heap” of raw data. By analyzing the raw data, both tools can generate entries in the XperiDesk database. They can also create links between these entries, enabling the user to navigate to the needed data much faster, e.g. by visualizing the graph structure. New relationships can be found, helping the engineer to better understand the ongoing work. The links or relations kept in XperiDesk enable complex search queries that deliver only the results searched for. Time spent on organizing raw data is reduced considerably. In short, these tools enable organizations to convert the heap of raw data into information usable for current and future research projects.
