bsc thesis part ii

Upload: tpitikaris

Post on 07-Apr-2018


  • 8/6/2019 Bsc Thesis Part II


    o Ability to Adapt

    There are several ways that agents can be trained to better understand user preferences by using computational intelligence techniques, neural networks, adaptive fuzzy logic, etc.

    [...] category of agent has the property that it can spawn to another process unit, live there, perform some operations, or die.

    [...] agents
    [...] agent design permits the required resources to be included in their [...]

    It is relatively easy to design software agents that can execute threads on different systems, and thus they become distributed systems that include several agents which are performing tasks [...] call these environments multi-agent environments. It is very possible in such multi-agent systems that an agent will not have all data or all [...] available to achieve an objective, and thus they will have to exchange resources with other agents.


    CHAPTER 2. SYSTEM DEVELOPMENT PROCESS

    Definition of software development process

    According to Wikipedia (Wikipedia, 2005), a software development process is a structure imposed on the development of a software product. Synonyms include software life cycle and software process. There are several models for such processes, each describing approaches to a variety of tasks or activities that take place during the process.
    In other words, a software development process is a set of methods that intend to provide guidelines about selecting, implementing and monitoring a life cycle for a software project.

    Some of the most well-known models are (Wikipedia, 2005):
    o Capability Maturity Model
    o ISO 15504 (Software Process Improvement and Capability Determination (SPICE))
    o Six Sigma

    The aforementioned models are (with the exception of ISO 15504) general project management models that can be applied in the software industry in order to control and guide the software production process.


    System Development Life Cycle (SDLC)

    The systems development life cycle (SDLC) is defined by the United States Department of Justice (Justice, 2003) as "a software development process, although it is also a distinct process independent of software or other Information Technology considerations. It is used by a systems analyst to develop an information system, including requirements, validation, training, and user ownership through investigation, analysis, design, implementation and maintenance. SDLC is also known as information systems development or application development."


    [Figure: Systems Development Life Cycle (SDLC) Phases]

    Figure 1 SDLC phases. Available from: http://www.usdoj.gov/jmd/irm/lifecycle/images/ch1.gif

    Various SDLC methodologies have been developed to guide the processes involved. The most common are:

    o The waterfall model (Lowe, 1999), in which development passes through the phases of:

    1. Requirements analysis (system services, constraints, and goals are established. Definitions are understandable by both developers and customers.)
    2. Design (system and software design)
    3. Implementation (program units are produced)
    4. Testing and debugging: finding bugs and defects and reducing or eliminating them in order to meet the specification.


    5. Integration: program units are integrated into the system. The system as a whole is tested to verify that it meets the specifications.
    6. Maintenance: enhancing and optimizing deployed software, integrating new needs and correcting defects.

    The model was first described in 1970 by W. W. Royce.

    o Rapid application development (RAD) suggests that products can be developed faster and with higher quality by (Inc., 2000):

    1. Using workshops or focus groups to gather requirements.
    2. Prototyping and user testing of designs.
    3. Re-using software components.
    4. Following a schedule that defers design improvements to the next product version.
    5. Keeping review meetings and other team communication informal.

    o Joint application development (JAD): the Joint Application Development (JAD) methodology aims to involve the client in the design and development of an application. This is accomplished through a series of collaborative workshops called JAD sessions.
    o The fountain model
    o The spiral model (Wikipedia, 2006c): the spiral methodology extends the waterfall model by introducing prototyping. It is generally chosen over the waterfall approach for large, expensive, and complicated projects.
    o Agile Software Development: agile software development is a conceptual framework for software engineering projects. There are a number of agile software development methods, such as those used by the Agile Alliance.
    o XP (Extreme Programming)
    o The Prototyping methodology

    Figure 2 The Waterfall Model (phases plotted along a time line)

    Figure 3 The RAD process flow


    Agile Software Development in detail
    While traditional development methodologies give emphasis to the documentation process, agile methodologies define teamwork and communication as key factors in a successful system design and implementation procedure.
    According to the Agile Manifesto (Fowler, 2002), the agile methodology declares the following important aspects:
    o Individuals and interactions over processes and tools.
    o Working software over comprehensive documentation.
    o Customer collaboration over contract negotiation.
    o Responding to change over following a plan.

    Most agile methods attempt to minimize risk by developing software in short timeboxes, called iterations, with a typical length between one and four weeks.
    Every iteration is like a software project of its own, and includes all of the tasks necessary to release the mini-increment of new functionality: planning, requirements analysis, design, coding, testing, and documentation.
    While an iteration may not add enough functionality to release the product, an agile software methodology aims to be capable of releasing new software at the end of every iteration.
    At the end of each iteration, the team re-evaluates project priorities.


    This methodology also places emphasis on teamwork. Managers, customers, and developers are all part of a team with the ultimate goal of delivering quality software. XP implements a simple but efficient way to empower the groupware style of development.

    The XP methodology refers to the following principles:
    o Feedback is most useful if it is done rapidly. In Extreme Programming, contact with customers occurs very often, in small iterations. The customer has clear insight into the system that is being developed, so he can provide feedback and contribute to the development as needed. Unit tests also contribute to the rapid feedback principle. When writing code, the unit test provides direct feedback as to how the system reacts to the changes one has made.
    o Assuming simplicity is about treating every problem as if it can be solved "extremely simply". At the same time XP rejects the idea of interfaces for "future extension" and code reusability as priorities; simplicity is more important.
    o Extreme Programming suggests that performing large-scale changes all at once carries a high possibility of failure. Instead, Extreme Programming has introduced the idea of incremental change, which consists of taking many little steps in the software development procedure in order to help the customer achieve more control over the development process and the system that is being developed.
    o The principle of embracing change is not about working against changes but embracing them, therefore helping the developers prepare for the inclusion of new customer needs and demands during the next iteration phase.
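    The rapid-feedback role of unit tests can be illustrated with a small sketch. It is written in Python with the standard unittest module purely for illustration; the function term_frequencies and its tests are hypothetical and are not taken from the project, which was built in Java. The point is only that a failing assertion reports the effect of a code change immediately.

```python
import unittest


def term_frequencies(text):
    """Count how often each lower-cased word occurs in text (hypothetical helper)."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


class TermFrequenciesTest(unittest.TestCase):
    def test_counts_repeated_words(self):
        self.assertEqual(term_frequencies("Agent meets agent"), {"agent": 2, "meets": 1})

    def test_empty_text_gives_empty_map(self):
        self.assertEqual(term_frequencies(""), {})


def run_tests():
    """Run the test case and return True when every test passes."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TermFrequenciesTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

    Calling run_tests() after each small change gives the developer the kind of direct feedback the principle describes.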


    The Prototyping Methodology
    The prototyping methodology suggests that "users can point to features they don't like about an existing system (or indicate when a feature is missing) more easily than they can describe what they think they would like in an imaginary system" (Jenkins, 1985).
    Rather than force the user to try to understand, and sometimes guess, the many major and minor details of an Information System presented in the form of a document specification, the developer presents the user with a series of rough approximations (prototypes) of the candidate computer system.
    The prototype is a working model of the system, often incomplete. The IS developer initially meets with the user in order to gather enough information to build a "rough" initial system prototype, which he then presents to the user to examine and interact with, in order to provide feedback comments.
    From this tangible approximation of the system, the user has improved chances both to clarify system requirements and to express those requirements to the developer. The developer then takes account of the newly expressed requirements and produces a new prototype, which is again presented to the user for comments.

    This iterative process is repeated until there are no new requests from the customer. We can say that the four major phases of the System Life Cycle methodology (analysis, design, program development, and implementation) are combined into one phase, repeated in each iteration (Parker, 1983).
    Software engineering is a complex process that incorporates a number of activities. In some SDLC methodologies these have a sequential order; in others they may not.

    General Characteristics of SDLC
    Independently of what methodology a development team is going to follow, there are general phases in the Software Development Life Cycle (Alexandrou, 2006). These general steps are:

    Requirements analysis
    o If there is an existing system, its deficiencies are identified. This is possible by interviewing users and holding discussions with the application's support personnel.
    o The system requirements are defined. The important point at this stage is to take into account any deficiencies in the existing system, if there is one, with specific proposals for improvement.

    Specification
    o Software is precisely described in a mathematically rigorous way. Specifications are most important for external interfaces that must remain stable.

    Software architecture


    o A candidate system is designed. Plans are created and include the hardware, operating systems, programming, and security issues.

    Coding
    o The new system is developed.

    Testing
    o Users of the system must be trained in its use. Performance tests must be carried out. If necessary, new adjustments must take place.

    Documentation
    o Documenting the internal design of software for the purpose of future maintenance and enhancement. Documentation is most important for external interfaces.

    Maintenance
    o The system becomes operational either by replacing the old system at once or by gradually replacing the old system with the new one.
    o Once the new system is up and running for a period of time, it should be evaluated in detail. Maintenance must be kept up at all times. The users should be kept up to date concerning the latest modifications/changes and the new procedures that may be introduced.


    Requirement Gathering and Prioritization

    Software requirements analysis
    Software requirements analysis is the activity of extracting, analyzing, and recording requirements for Information Systems. Sometimes it overlaps with general system requirements, but as a part of the Software Development Life Cycle it has its own specific characteristics (Barrett, 1997).

    In a typical software development project there is a trained software practitioner called the Requirements Analyst (RA), whose main area of responsibility is to communicate with the user in order to understand what the requirements are.
    Most of the time clients have a general idea about what they want from the system, but it is the Requirements Analyst's job to define in detail what the real customer needs are.
    Next, after the client's idea about the system has been determined in detail, the requirements analysis team has to determine whether or not the candidate system is:

    o Feasible
    o Schedulable
    o Affordable
    o Legal
    o Ethical


    In the rush of enthusiasm associated with a new project there is always a temptation to downplay the importance of requirements analysis. However, studies of previous projects reveal that costs and technical risks can be reduced through rigorous and thorough up-front requirements engineering.
    The Requirements Analysis phase is divided into the following sub-phases (Barrett, 1997):
    o Requirements gathering
    o Requirements analysis
    o Requirements specification
    o Requirements verification


    Requirement Gathering

    Requirement gathering is an important sub-phase of Requirements analysis. At this stage the development team must [...] and define the client needs. Once the client's requirements have been identified, the system designers are then in a position to design a solution [...]. A formal requirements gathering process includes the following steps (Table 1):

    1  Current Defects Evaluation Plan
    2  "Prior Release" Problem Review Plan
    3  Review of Existing Product (Project) Maintenance Plans
    4  [...] Preliminary Software Architectural Overview
    5  Preliminary Requirements Gathering Phase Exit Criteria

    Table 1 Requirement Gathering Steps


    Problems & Difficulties

    During the Requirements Gathering phase a number of stakeholder issues may arise. Some of them are results of the client organization's behavior and some others are grounded in human nature (for instance, some people tend to be overoptimistic) (Tsagatakis, 2005).
    The aforementioned difficulties can be categorized as stakeholder issues, engineering issues, and general issues.
    In the category that refers to general problems we can classify the following situations:

    o The right people with adequate experience, technical expertise, and language skills may not be available, either because the organisation structure doesn't include such personnel or because management and other factors prevent them from communicating with the software development team. In that case the requirements acquisition people must "reinvent the wheel" in order to cover the gap, and sometimes make assumptions (not always correct) about the existing system and the needs of the new candidate system.

    o The initial speculations about what the needs are do not, most of the time, cover all the aspects satisfactorily. They may be incomplete, or optimistic assumptions about the nature of the project (time, user acceptance, integration, etc.) may have been taken into account.

    o The need for well-trained requirements acquisition personnel and knowledge engineers, in combination with the difficulty of using the complex tools and diverse methods linked to the requirements gathering process, may dishearten the hope for the benefits of a complete and detailed approach.

    In addition to the preceding situations, we have to take into account the ways that users can affect the requirements gathering process (McConnell, 2004):

    o Some users may not be in a position to understand what they really want.
    o Some users may be unwilling to commit to a set of written requirements (in order to feel safe in case of future undesired situations).
    o Some users may not be in a position to express in a suitable and understandable way what their real needs are.
    o Some users may introduce new requirements after the cost and the time schedule have been finalized.
    o Communication with users is slow.
    o Users often do not participate in reviews or they don't have the appropriate background to do that.
    o Users don't understand the development process. This commonly leads to the situation where user requirements keep changing even when system or product development has been started.

    But users are not the only ones responsible for project delays and/or inadequate information systems; sometimes engineers and developers may cause problematic situations during the requirements analysis process (Wikipedia, 2006b) (Tsagatakis, 2005):

    o Technical personnel and end users often have different vocabularies and codes of intercommunication. As a result, both sometimes believe they are in perfect understanding, but when the product is finished and becomes tangible they discover that they didn't cover all the necessary aspects.
    o In the business systems domain, the duty to bridge that gap is often assigned to Business Analysts. Their role is to analyze and document the business processes of the business units that will be affected by the candidate information system. In parallel, Business Systems Analysts analyze and document the proposed business solution from a systems perspective. This parallel situation sometimes causes confusion and incorrect assumptions.
    o Engineers and developers often try to refine the requirements in order to fit an existing system or model, while a clearer solution would be the development of a system specific to the needs of the client.
    o Analysis is often carried out by engineers or programmers, rather than knowledge engineers who have the appropriate communication skills and sufficient domain knowledge to understand the client's needs properly.

    Main techniques of Information Gathering
    The introduction of a new Information System is very likely to change the environment and the relationships between people; thus it is important to identify all the stakeholders, take into account all their needs and ensure they understand the implications of the new system.
    For that to happen we need a structured procedure that will help to keep the requirement discussions between the development team and the client well organized and efficient.

    Knowledge engineers and systems analysts can employ several techniques to get the requirements from the customer (Dr Vrusias B., 2005). These include interviews, questionnaires, recording, group workshops (known as requirements workshops) and wish lists. More modern techniques include prototyping and use cases. Where necessary, the analyst will employ a combination of these methods to establish the exact requirements of the client, so that a system that meets the business needs is produced.


    CHAPTER 3. SOFTWARE REQUIREMENTS SPECIFICATION

    Introduction

    This part of the Final Year Project aims to provide a full idea of the software and system requirements as they have been captured by the system developer. The structure of the document and the basic elements are based on IEEE 830-1998 (IEEE).

    Identification
    This SRS (Software Requirements Specification) refers to a web Information Retrieval system. The current version of this software is #1 (one).
    The purpose of this chapter is to describe in detail the operation of the Web Information Retrieval software project. In a normal SRS paper the first section of the document should provide a document overview and the appropriate definitions and references for the rest.

    But due to the nature of the project (a Final Year Project) and the need for definitions and a bibliography in other parts of this document, it was decided to skip definitions and references at this stage in order to prevent redundant material from appearing.
    So in the first part of this chapter we will provide a document overview for consistency with IEEE 830-1998 (IEEE).


    In the second section we will give details about the major objective of the software in question and a fictional account of its use. It will also specify some constraints and data requirements.
    In the last sections we give a more detailed description of the technical aspects of this project, such as user limitations and technical requirements to use the product.

    System overview

    The proposed system runs in two parts. The first part, which constitutes the main application, receives queries from users either by command line or web interface.
    The query is passed to the Google search engine, from which the system receives as a result a list of URLs (maximum #50) that, according to Google, are correlated to the user's initial query. The initial query is stored in a map.
    Then the system crawls each of these URLs and produces two HashMap type objects: one that contains the total of terms that occurred in all documents that have been crawled, and a second one that contains the current document's term index and the frequency with which each term occurred in the text.
    During the map creation phase closed class words (Van Petten C, 1991) are removed, while the remaining terms pass through a stemmer that implements the Porter Stemmer algorithm (C.J. van Rijsbergen, 1980).
    After we have finished with the crawling of all URLs we end up with 50 HashMap objects, one for each document, and one large HashMap with all the words that we have met during the URL crawling. Using LSA and Euclidean distance we produce a list of relevance to the original query.
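    The map-building step described above can be sketched as follows. This is a minimal illustration in Python rather than the project's Java code, and every name in it (CLOSED_CLASS_WORDS, naive_stem, index_document, index_collection) is hypothetical. The stop-word list is a small made-up sample, and naive_stem is only a placeholder that strips a few suffixes; it is not the Porter algorithm the thesis refers to.

```python
import re

# Illustrative closed-class (function) words; the thesis does not list the ones it removed.
CLOSED_CLASS_WORDS = {"a", "an", "and", "in", "is", "of", "on", "or", "the", "to"}


def naive_stem(word):
    """Placeholder stemmer: strips a few common suffixes. This is NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_document(text):
    """Return a {term: frequency} map for one crawled document."""
    frequencies = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in CLOSED_CLASS_WORDS:
            continue  # closed class words are dropped before stemming
        term = naive_stem(token)
        frequencies[term] = frequencies.get(term, 0) + 1
    return frequencies


def index_collection(documents):
    """Build one map per document plus one map of totals over all documents.

    `documents` maps a URL to the plain text fetched from it.
    """
    per_document = {}
    all_terms = {}
    for url, text in documents.items():
        doc_map = index_document(text)
        per_document[url] = doc_map
        for term, count in doc_map.items():
            all_terms[term] = all_terms.get(term, 0) + count
    return per_document, all_terms
```

    With 50 crawled URLs, index_collection would return 50 per-document maps and one large map, mirroring the structure the overview describes.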

    Definitions, Acronyms, and Abbreviations

    Subchapter omitted in order to prevent redundancy with the general Glossary section of the Final Year Project report. This part can be found in the appendices.

    Reference

    Subchapter omitted in order to prevent redundancy with the general Bibliography section of the Final Year Project report. This part can be found in the appendices.

    General Description

    User Personas and Characteristics
    All users that this system is targeted at are people with average computer literacy who have previously used a search engine like Google or Yahoo.

    Product Perspective
    This software requires a graphical web browser such as Internet Explorer (TM) version 6 or Mozilla Firefox version 1.5.0.1. Also a


    General Constraints, Assumptions, Dependencies, Guidelines

    Our user runs this software on a computer that is connected to the internet with a connection of at least 256 Kbit/s downstream capacity. The OS that hosts this software is one of Windows XP, Linux Fedora Core 4, or SuSE 10.
    Java 1.5 is installed, along with both mysql-connector-java (http://www.mysql.com) version 3.1.12 and htmlunit version 1.8 (http://www.GargoyleSoftware.com/). Also Apache Jakarta is running on this system and is listening on TCP port 8080.
    We assume that the user specifically demands a web-based application and that his computer is equipped with software able to interpret HTML.

    User View of Product Use
    The complete vision of the project is a search engine similar to Google. The key factors are the relevance of the returned results and resources in the form of the time needed in order to get an answer. The goal is to provide the user with an initial adequate answer to his/her query with a low waiting time, at least inside users' tolerance limits (Bhatti).
    The help menu will be accessible via a help icon that appears on the first page of each query.

    In the first screen, the user inserts the query he wants to make. The more common the word, the more results will be returned and, as a consequence, the longer the results fetch process will last. The user can perform his/her query using the command line or the web interface.

    3.0 Specific Requirements

    External Interface Requirements

    The program requires a PC with at least a Pentium 4/Celeron running at 2 GHz or an Athlon/Athlon 4 rated at 2600+ PR. The operating system must be one of the following:
    o Windows XP
    o Windows 2003 Server
    o Fedora Linux 4
    o SuSE Linux 10
    o Solaris [...] or later
    o Mac OS 10.1 or later
    All systems must be equipped with at least 512 MB of 333 MHz RAM, a monitor capable of 800x600 screen resolution with a minimum of 16 million colors, and [...]00 MB of hard disk space.
    The time for a result to return depends on the network connection and system [...].
    A network connection is necessary for this application to function. Also the Google site must be up and running in order to get the appropriate URLs.
    The product requires a web browser compatible with HTML. The base requirement for the browser would be Internet Explorer 6.0 and above or Mozilla Firefox 1.5.0 and above.


    Detailed Description of Functional Requirements

    Introduction Page
    Purpose     The introduction page prompts the user to enter his/her query.
    Inputs      Mouse and keyboard inputs
    Processing  Display instructions for the search engine and allow the user to insert a query.
    Outputs     [...]

    Waiting Screen
    Purpose     Prompt the user to wait until the results are retrieved.

    Result Screen
    Purpose     Present to the user the answers to his/her query.
    Inputs      N/A
    Processing  N/A
    Outputs     The user's query results page

    Help Page
    Purpose     From the introduction page, the user can click on the help button to receive help about how to use the application.
    Inputs      The user simply clicks on the help button.
    Processing  The processing is done through point-and-click actions and inputs from the user through the browser.
    Outputs     The help page with help information

    After 24 Hours Page

    Information Page
    Purpose     Contact information
    Inputs      N/A
    Processing  N/A
    Outputs     Author & supervisor committee

    Performance Requirements
    The application will be loaded locally and accessed via a web browser. It operates in single-user mode only. The system is highly dependent on the availability and response time of the Google web site. The response time experienced by the user depends on the speed of the network connection and the computer's CPU.

    [...] select the appropriate method of search from a dropdown menu and a submit button.


    As for the number of files and file sizes, there will be two files per URL (one content file and one serialized HashMap). The database also runs locally, and thus SQL query performance depends on the available system memory and CPU speed.

    Quality Attributes
    o The generated results should correspond to relevant documents.
    o The average precision must be relatively high.

    Other Requirements
    NONE


    CHAPTER 4. SYSTEM DESIGN

    Methodology Chosen

    The final year project is often the first large-scale project carried out by a student at undergraduate level. Thus the methodology that will be followed during the project development must, on the one hand, provide enough flexibility for the software developer to overcome design mistakes and inefficient roadmaps that may be caused by the developer's low level of experience and, on the other hand, provide the developer with stable ground to continue with the rest of system development and documentation.

    These requirements are easily covered if the methodology gives the appropriate tools that will permit the segmentation of the project under development into small semi-autonomous segments and the step-by-step crystallization of project aspects.

    The above requirements seem to be covered by the agile development methodology, but if we take into account the fact that the specific project has something of the nature of a research project, it seems safer to choose the prototyping methodology.

    As has been discussed in the SDLC methodology chapter, the prototype is not a paper specification of the system, but a working model of the system, albeit often incomplete.


    second one that contains the current document's term index and the frequency with which each term occurred in the text.

    During the map creation phase, closed class words (Van Petten C, 1991) are removed, while the remaining terms pass through a stemmer that implements the Porter Stemmer algorithm (C.J. van Rijsbergen, 1980).

    After we have finished crawling all URLs, we end up with 50 HashMap objects (one for each document) and one large HashMap with all the words that we have met during the URL crawling. Each URL is represented by an n x 1 dimensional vector, where n is the number of terms that live in each document.

    At this stage the system combines the large HashMap and each URL's individual HashMap in order to produce one large 2D array, with the values of the all-terms HashMap as rows and the visited URLs as columns.

    Then we decompose this large 2D array using the Singular Value Decomposition. The next step is to use the Latent Semantic Analysis technique and the Euclidean distance to classify the relevance of each document to the original user query.

    The Euclidean distance of two vectors $P = (p_x, p_y, p_z)$ and $Q = (q_x, q_y, q_z)$ is defined by the formula

    $$\mathrm{Edistance}(P, Q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2 + (p_z - q_z)^2}$$
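    The map creation, the term-by-URL array and the distance computation described above can be sketched as follows. This is a hedged illustration, not the thesis implementation: the class TermMatrixSketch, its method names and the tiny stop list are assumptions, stemming is omitted, and the Singular Value Decomposition step is left out because the Java standard library provides no routine for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TermMatrixSketch {

    // Hypothetical, very small stand-in for the closed class word list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "and", "to", "in", "is"));

    // Per-document map: term -> number of occurrences (no stemming in this sketch).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> frequencies = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            frequencies.merge(token, 1, Integer::sum);
        }
        return frequencies;
    }

    // All distinct terms over all documents, in sorted order (the "large" map's keys).
    static List<String> vocabulary(List<Map<String, Integer>> documents) {
        Set<String> terms = new TreeSet<>();
        for (Map<String, Integer> document : documents) {
            terms.addAll(document.keySet());
        }
        return new ArrayList<>(terms);
    }

    // 2D array with terms as rows and documents (URLs) as columns.
    static double[][] termDocumentMatrix(List<String> terms, List<Map<String, Integer>> documents) {
        double[][] matrix = new double[terms.size()][documents.size()];
        for (int row = 0; row < terms.size(); row++) {
            for (int col = 0; col < documents.size(); col++) {
                matrix[row][col] = documents.get(col).getOrDefault(terms.get(row), 0);
            }
        }
        return matrix;
    }

    // Euclidean distance between two vectors of equal length.
    static double euclideanDistance(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double diff = p[i] - q[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> documents = new ArrayList<>();
        documents.add(termFrequencies("The agent and the crawler agent"));
        documents.add(termFrequencies("Crawler of the index"));
        List<String> terms = vocabulary(documents);
        double[][] matrix = termDocumentMatrix(terms, documents);
        System.out.println(terms + " " + Arrays.deepToString(matrix));
    }
}
```

    In the full pipeline the distance would be taken between the reduced vectors obtained after the decomposition rather than between raw term counts.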

    The user can access the relevance list via a web interface or via the standard output.


    The other part of this application implements some characteristics of an agent. This agent-like part is initiated via a time scheduler, and its scope is to rework the user queries of the previous 24 hours, but now by getting extra results from yahoo.com. This part is launched daily at 5 GMT since, after several tests were run, this was found to be the best time in terms of lower network congestion both in Europe and in the majority of the USA (please refer to Appendix I with the Greek Networks LT Weathermap).
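    A daily launch of this kind can be sketched with the standard Java scheduling classes. The sketch below is illustrative only: the class DailyAgentScheduler and its methods are hypothetical names, the task body is left to the caller, and reading "5 GMT" as 05:00 is an assumption.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DailyAgentScheduler {

    // Time remaining from 'now' until the next occurrence of 'runAt' in GMT/UTC.
    static Duration delayUntilNextRun(ZonedDateTime now, LocalTime runAt) {
        ZonedDateTime utcNow = now.withZoneSameInstant(ZoneOffset.UTC);
        ZonedDateTime next = utcNow.with(runAt);
        if (!next.isAfter(utcNow)) {
            next = next.plusDays(1); // today's slot has passed, use tomorrow's
        }
        return Duration.between(utcNow, next);
    }

    // Runs 'task' once a day at 'runAt' (GMT/UTC) until the returned executor is shut down.
    static ScheduledExecutorService scheduleDaily(Runnable task, LocalTime runAt) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMs = delayUntilNextRun(ZonedDateTime.now(ZoneOffset.UTC), runAt).toMillis();
        executor.scheduleAtFixedRate(task, initialDelayMs, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return executor;
    }

    public static void main(String[] args) {
        // Assumption: "5 GMT" is read as 05:00; only the delay is printed here.
        LocalTime runAt = LocalTime.of(5, 0);
        System.out.println(delayUntilNextRun(ZonedDateTime.now(ZoneOffset.UTC), runAt));
    }
}
```

    Calling scheduleDaily with a Runnable that reworks the stored queries would start the cycle; the executor keeps the process alive until it is shut down.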

    Project development process

    In order to accomplish the tasks of this project, the development process has been segmented into discrete phases, while the software, since Java is a language that promotes reusability and object orientation, has been developed following a modular logic.

    Phase I

    1. The initial task was to determine the idea that the project should serve. In the beginning there was a thought about creating an intelligent search engine using AI.

    2. But after some discussion with the project supervisor, a decision was taken to incorporate some research about Agents. Additionally, there was an agreement to implement some of the Agents' characteristics, if time and resources were adequate.


    4. The next module was the software part that would count the occurrence of each term in every document, and its interconnection with task 6.
    5. Again, some testing took place. Test and Fix.
    6. Incorporate AI techniques in order to test the relevance of each retrieved URL to the original question, and interconnect the new software with the software from tasks 8 and 6.
    7. User Evaluation.
    8. Test and Fix.

    Phase IV

    1. System Front-End. How the user would interact with the core system. For safety reasons (since knowledge of graphical UI was limited), both the console mode and the web interface methods have been employed.
    2. User Evaluation.
    3. Test and Fix.
    4. Introduce Agent characteristics: autonomous.
    5. Test and Fix.

    Phase V

    1. Total system testing
    2. Final System Evaluation from User
    3. Produce the final documentation


    Bibliographical research on how the state-of-the-art web search engines work (Google & msn)
    Bibliographical research on how to design and implement a soft-agent
    Design and implement an agent (crawler) for collecting the web data
    Design the storage database
    Decide what AI methods are applicable to our domain problem
    Design and produce an output interface
    Evaluate usability of the interface using users' feedback