news aggregator

A news aggregator is a system (a software application, webpage, or service) that collects syndicated content from weblogs and mainstream media sites using RSS and other XML feeds. Aggregators reduce the time and effort needed to check websites of interest for updates regularly, creating a unique information space or "personal newspaper." An aggregator can subscribe to a feed, check for new content at user-determined intervals, and retrieve that content. The content is sometimes described as being "pulled" to the subscriber, as opposed to "pushed" by email or other channels. Unlike recipients of some "pushed" information, an aggregator user can easily unsubscribe from a feed. In short, a news aggregator is software that brings syndicated news content (such as RSS feeds) together and displays it.

Post on 08-Jan-2016

DESCRIPTION

News Aggregator. - PowerPoint PPT Presentation

TRANSCRIPT

  • Introduction to Python
    For Engr 101, 5-Week News Aggregator Module, Fall 2010
    Instructor: Tao Wang

  • What is a computer?

  • Computer Organization

  • Software / Programs
    Computer programs instruct the CPU which operations to perform/execute
    Programmers write the code for the tasks a computer program performs
    But computers only understand binary (1s and 0s)
    Programmers need a language to write computer code

  • Types of Programming Languages
    Low-Level Languages
      Machine languages
        CPU instructions are binary strings of 1s and 0s [10010000]
        Each kind of CPU has a different instruction set
        Programs are not portable across CPU architectures
        Difficult for programmers to read/write
      Assembly languages
        Use English-like abbreviations to represent CPU instructions
        A bit easier to understand [MOV AL, 42h]
        Converted to machine language using an assembler
        Still not portable

  • Types of Programming Languages
    High-Level Languages
      C/C++, Java/C#, Python, Ruby, many more...
      These languages abstract away hardware implementation details
      They provide programmers a logical computer model
      They allow programmers to focus on solving problems instead of low-level hardware details
      They use English-like keywords and statements to write code
      They use a compiler or interpreter that translates code to machine language
      This makes code portable across different CPUs and platforms
      Programmers do not have to learn each CPU's instruction set

  • Compiled vs. Interpreted
    Both compilers and interpreters translate source code to machine language

    Compiled: the program must be compiled for each target CPU architecture prior to execution

    Interpreted: code can be run directly on any platform that has an interpreter available

  • About Python
    Designed by Guido van Rossum in the late 1980s
    Interpreted programming language
    Imperative, dynamic, and object-oriented
    Python programs
      are a sequence of instructions written in the Python language
      are executed by the interpreter sequentially, in order
      can take inputs and send data to outputs
      can process and manipulate data
      may read and write data to RAM, hard drive, ...

    First program:
      print "Welcome to learning Python."

  • PYTHON: LET'S BEGIN
    The Python Shell (2.6)
    You type commands
    It does stuff

    It converts Python to machine instructions and runs them right away

  • Python as Calculator
    >>> 2 + 2     # add
    4
    >>> 2 * 3     # multiply
    6
    >>> 3**2      # powers
    9
    >>> 12 % 11   # modulo (remainder)
    1

  • Division
    Integer division
    >>> 1/4       # integer division (Python 2), no decimals
    0
    Float division
    >>> 1.0/4.0   # float division
    0.25
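
The integer result above is Python 2 behavior. As a sketch for readers on Python 3 (an assumption; the slides use the 2.6 shell), `/` always performs true division and `//` performs floor division:

```python
# Python 3 semantics: / is true division, // is floor division
int_result = 1 // 4        # floor division drops the fraction: 0
float_result = 1 / 4       # true division, even for two ints: 0.25
also_float = 1.0 / 4.0     # the slide's float example: 0.25

# Floor division rounds toward negative infinity, not toward zero
negative_floor = -7 // 2   # -4

print(int_result, float_result, also_float, negative_floor)
```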

  • LITERALS, VARIABLES, DATA TYPES, STATEMENTS AND EXPRESSIONS

  • Literals, Data Types
    Numbers
      Integers are whole numbers: ..., -2, -1, 0, 1, 2, ... (32 bits on many platforms)
      Floats contain decimals: 4.5, 67.444443335, 7E2, ...
      Booleans: True, False
      Long ints: integers that exceed that capacity
      Complex numbers: 4.5j, 1j * 1j = (-1+0j)
    Strings
      Strings are used to represent words, text, and characters
      Examples (can use single, double, or triple quotes): "I am learning python."  'hello.'

  • Variables
    Literals are data values that our program uses
    Data is stored in memory, and we need a way to access it
    We use variables to name our data in memory
    We can retrieve data by calling the name that refers to it
      student_name = "Ben"
      print student_name
    = is the assignment operator; it assigns a data value to a variable

  • Variable Syntax Rules
    Variable names must begin with a letter (uppercase or lowercase) or an underscore (_)
      * Good programming convention: variable names should not start with uppercase letters, which are commonly used for class names
    The remainder of the name can contain letters, underscores (_), and numbers
    Names may not contain spaces or special characters
    Names are case sensitive and must not be a reserved Python keyword
      myVariable and myvariable refer to different data
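
The naming rules above can be checked mechanically. A minimal sketch, assuming Python 3, where `str.isidentifier()` exists; `is_valid_variable_name` is a hypothetical helper name. Note that Python 3 identifiers may also contain non-ASCII letters, which the slide's rules do not mention.

```python
import keyword

def is_valid_variable_name(name):
    """Hypothetical helper: True if name is a legal, non-reserved variable name."""
    # isidentifier() applies the "letter or underscore first, then letters/digits/underscores" rule
    # keyword.iskeyword() rejects reserved words such as 'for' or 'if'
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["student_name", "_count", "2fast", "my variable", "for"]:
    print(candidate, is_valid_variable_name(candidate))

# Names are case sensitive: these are two different variables
myVariable = 1
myvariable = 2
print(myVariable, myvariable)   # 1 2
```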

  • Statements and Expressions
    Statements perform a task; they do not return a value
      x = 2
      y = 3
      print y
    Expressions return a value
      >>> x + y
      5

  • Expressions (evaluate to values)
    Math expressions
      >>> 10 * 2 + 3
      >>> 10 * (2.0 + 3)
    Boolean expressions
      >>> 10 < 2     # False
      >>> 10 >= 10   # True
    Combined with logic operators
      >>> (10> (a*c+d) > (d*a-c)

  • Expressions (evaluate to values)
    String expressions
      >>> "hel" + "lo"   # 'hello'
      >>> "Hi" * 3       # 'HiHiHi'
    Operator precedence
      Parentheses
      Exponentiation
      Multiplication and Division
      Addition and Subtraction

  • Operator Precedence (top-to-bottom)
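
The precedence order can be checked by evaluating the slide's expressions next to a few of our own. A short sketch (Python 3 print syntax); note that `**` groups right to left:

```python
# Exponentiation binds tighter than *, which binds tighter than +
a = 2 + 3 * 4 ** 2      # 2 + 3 * 16 = 50
b = (2 + 3) * 4 ** 2    # parentheses first: 5 * 16 = 80
c = 10 * 2 + 3          # the slide's first math expression: 23
d = 10 * (2.0 + 3)      # parentheses first; a float operand gives a float: 50.0
e = 2 ** 3 ** 2         # ** is right-associative: 2 ** 9 = 512

# String expressions use the same + and * operators
greeting = "hel" + "lo"   # 'hello'
repeated = "Hi" * 3       # 'HiHiHi'
print(a, b, c, d, e, greeting, repeated)
```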

  • Data Types
    Finding out a data type

  • Data Types
    What if data types don't match?

    STRONG TYPES: no automatic conversion (for non-number types)

  • Data Types
    Explicit conversion
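
The three data-type slides above (finding a type, mismatched types, explicit conversion) can be tied together in one small sketch, written for Python 3 (where `type()` prints `<class 'int'>` rather than Python 2's `<type 'int'>`):

```python
# type() reports the data type of a value
print(type(42), type(4.5), type("42"), type(True))

# Numbers mix automatically: the int is widened to a float
mixed = 3 + 4.5                  # 7.5

# Strings and numbers do not: Python raises a TypeError instead of guessing
try:
    result = "3" + 4
except TypeError as err:
    result = None
    print("TypeError:", err)

# Explicit conversion resolves the mismatch in whichever direction you choose
as_number = int("3") + 4         # 7
as_text = "3" + str(4)           # '34'
as_float = float("2.5") * 2      # 5.0
print(mixed, as_number, as_text, as_float)
```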

  • PYTHON KEYWORDS, USER INPUT

  • Python Keywords
    RESERVED: do not use as variable names

  • User Input
    Create interactive programs by requesting user input

  • CONTROL STRUCTURES

  • Branching / Conditional Statements
    Decision making

  • if statement
      if a < 0:
          print "a is negative"

  • if - else

  • if - elif - else
    If one test fails, perform the next test

  • Nested if statements
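
The branching slides (if, if - else, if - elif - else, nested if) can be combined in one short function. A sketch in Python 3; `describe` is a hypothetical name chosen for illustration:

```python
def describe(a):
    """Hypothetical example: classify a number with if / elif / else and one nested if."""
    if a < 0:
        return "negative"
    elif a == 0:                 # only tested if the first test failed
        return "zero"
    else:
        # nested if: only evaluated when a > 0
        if a % 2 == 0:
            return "positive and even"
        else:
            return "positive and odd"

for value in [-5, 0, 4, 7]:
    print(value, describe(value))
```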

  • MODULES

  • Modules
    Python's strength: a large collection of open source modules
    Modules are collections (packages) of useful, tested code you can reuse
    Common modules: random, math
    The modules we use for the project: urllib, xml

  • Modules
    Python Standard Library (packages included with most Python distributions)
      http://docs.python.org/library/index.html
    PyPI (Python Package Index)
      http://pypi.python.org/pypi
      repository of optional modules (11,000+ available)

  • Using Modules
    The math module contains many useful functions and values: math.sin, math.cos, math.pow, math.sqrt, ...
    Using modules:
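
Two common ways of using the math module are importing the whole module or importing a single name. A minimal sketch (Python 3 print syntax):

```python
import math                  # import the whole module; refer to names as math.name
from math import sqrt        # or import one name and use it directly

print(math.sqrt(16))         # 4.0
print(sqrt(25))              # 5.0
print(math.pow(2, 10))       # 1024.0 (math.pow always returns a float)
print(math.cos(0))           # 1.0
print(math.pi)               # the constant pi as a float
```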

  • Getting Help
    In the Python interpreter you can get documentation with help(...)

  • CONTROL STRUCTURES: REPETITION, ITERATION

  • Repetition
    Selection statements (if-elif-else) let us make simple decisions

    Repeating statements and decisions let us build more complex programs

  • while

  • Testing Primeness

  • break statement
    break immediately ends a loop
    DO NOT overuse it; it can make the logic difficult to read/understand

  • Testing Primeness
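
The primeness test with a while loop and break can be sketched as trial division. This is a Python 3 sketch under our own assumptions: the name `is_prime` is hypothetical, and stopping at the square root of n is an optimization the slides may not use.

```python
def is_prime(n):
    """Hypothetical example: trial division with a while loop and break."""
    if n < 2:
        return False
    divisor = 2
    prime = True
    while divisor * divisor <= n:   # a factor larger than sqrt(n) would pair with a smaller one
        if n % divisor == 0:
            prime = False
            break                   # a divisor was found: end the loop immediately
        divisor += 1
    return prime

print([n for n in range(20) if is_prime(n)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```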

  • range(...)
    Built-in function; produces a sequence (a list in Python 2)
      range(0, 3)      # [0, 1, 2]
      range(3)         # [0, 1, 2]
      range(1, 3)      # [1, 2]
      range(1, 7, 2)   # [1, 3, 5]
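
In Python 3 (not the 2.6 shell used in the slides), `range()` returns a lazy range object rather than a list, so `list()` is needed to see the elements. A short sketch of the slide's calls plus a counting-down case:

```python
# Python 3: range() returns a range object; list() materializes it
first = list(range(0, 3))          # [0, 1, 2]
short = list(range(3))             # [0, 1, 2]  (start defaults to 0)
offset = list(range(1, 3))         # [1, 2]     (stop is excluded)
stepped = list(range(1, 7, 2))     # [1, 3, 5]  (third argument is the step)
countdown = list(range(5, 0, -1))  # [5, 4, 3, 2, 1]  (a negative step counts down)
print(first, short, offset, stepped, countdown)
```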

  • for
    The for loop is used for iteration

  • continue statement
    continue skips the rest of the current iteration and moves on to the next one

    break and continue work in both while and for loops

  • Find all primes
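
Finding all primes combines for, continue, break, and a loop else clause; the else branch runs only when the loop finishes without hitting break (the same else clause also works on while loops). A Python 3 sketch; `primes_up_to` is a hypothetical name, and testing only odd divisors is our own choice:

```python
def primes_up_to(limit):
    """Hypothetical example: all primes <= limit using for, continue, break, and for-else."""
    primes = []
    for n in range(2, limit + 1):
        if n > 2 and n % 2 == 0:
            continue                  # skip even numbers greater than 2
        for divisor in range(3, n, 2):
            if n % divisor == 0:
                break                 # n has a divisor: not prime
        else:
            primes.append(n)          # inner loop ended without break: n is prime
    return primes

print(primes_up_to(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```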

  • while - else

  • Nesting Control Structures

  • Counter-Controlled Loops

  • Sentinel-Controlled Loops

  • Accumulating
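
Counter-controlled loops, sentinel-controlled loops, and accumulating can be seen side by side. A Python 3 sketch with assumptions: a fixed list stands in for user input, and -1 is a hypothetical sentinel value that must appear in the list.

```python
# Counter-controlled loop: the number of repetitions is known in advance
total = 0                      # the accumulator starts at 0
count = 1
while count <= 5:
    total += count             # accumulate 1 + 2 + 3 + 4 + 5
    count += 1
print(total)                   # 15

# Sentinel-controlled loop: keep going until a special "stop" value appears.
# The list stands in for user input; -1 is the assumed sentinel (it must be present).
readings = [72, 85, 90, -1, 100]
scores_total = 0
index = 0
while readings[index] != -1:
    scores_total += readings[index]
    index += 1
print(scores_total)            # 247 (values after the sentinel are ignored)
```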

  • Swapping
    x = 2
    y = 3
    Swap (WRONG):
      x = y
      y = x
      result: x = 3, y = 3
    Swap (CORRECT):
      z = x
      x = y
      y = z
      result: x = 3, y = 2

  • Multiple Assignments
    aInt, bInt, cInt = 1, 2, 3
    Swapping with multiple assignment:
      aInt, bInt = bInt, aInt
    Why does this work? The entire right side is evaluated before any assignment happens.

  • Everything is an object

  • DEBUGGING

  • Debugging
    Syntax errors: Python gives us an alert; the code crashes
    Runtime errors: how do we fix incorrect results (logic) in our programs?
      We need to trace the code's execution flow
      Tracing: keep track of variable values as each line is executed
      Print statements: strategically add print to view results at each step; don't overdo it, or it will be difficult to keep track
      Can help us detect infinite loops

  • MORE DATA TYPES...(LISTS)

  • Collection Types
    Lists: sequential and mutable
      >>> k = [1, 3, 5]
      >>> m = ["hel", 3]

    Tuples: sequential and immutable
      >>> (1, 2, 3)
    Dictionaries: mapping collection
      >>> d = {"name": "Alice", "grade": 100}
      >>> print d["name"]
      Alice

    Sets: have unique elements
      >>> aSet = set(["a", "b"])

  • Lists (also called arrays)
    Lists are sequences of objects
    Mutable (unlike strings and tuples)
    Lists are defined with square brackets [ ]; elements are separated by commas ,
    List elements can have different types
    List indexing starts at 0
    If the index is negative, counting begins at the end of the list
    If the index is past the last element: ERROR (IndexError)

  • List access
    Indexing and slicing work just like strings (same rules apply)

  • Working with lists
    Can convert other collections to lists

    Lists can contain mixed types, including other collections

  • Indexing lists of lists
    Lists can be nested
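
Indexing, slicing, conversion, and nested indexing from the list slides can be shown in one short sketch (Python 3 print syntax; the list contents are made up for illustration):

```python
k = [1, 3, 5, 7, 9]
print(k[0], k[-1])      # 1 9    (negative indices count from the end)
print(k[1:3])           # [3, 5] (a slice stops before its end index)
print(k[:2], k[3:])     # [1, 3] [7, 9]

# Other collections can be converted to lists
letters = list("abc")   # ['a', 'b', 'c']

# Lists can hold mixed types, including other lists
nested = [["a", "b"], [10, 20, 30], "text"]
print(nested[1][2])     # 30  (element 2 of the inner list at index 1)
print(nested[2][0])     # t   (a string inside the list can be indexed too)

try:
    k[10]
except IndexError:
    print("IndexError: index past the last element")
```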

  • List operators
    + concatenates two lists (list1 + list2)
    * repeats the list a number of times (list1 * integer)
    in tests membership

  • List comparisons>,

  • Collection functions
    len(C) returns the length of the collection C
    min(C) returns the minimum element in C; lists of lists are compared by their first elements, with later elements only breaking ties
    max(C) returns the maximum element in C; lists of lists are compared the same way
    sum(L) returns the sum of the elements in list L; the elements must all be numbers
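
The collection functions, including how min and max compare lists of lists element by element, can be checked with a short sketch (Python 3; the values are made up):

```python
scores = [88, 92, 75]
print(len(scores), min(scores), max(scores), sum(scores))   # 3 75 92 255

# Lists of lists are compared element by element:
# first elements decide, later elements only break ties
pairs = [[2, "b"], [1, "z"], [1, "a"]]
smallest = min(pairs)   # [1, 'a']: tie on 1, then 'a' < 'z'
largest = max(pairs)    # [2, 'b']: 2 is the largest first element
print(smallest, largest)
```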

  • Lists can change
    Lists can be modified: MUTABLE

    Strings cannot be changed: IMMUTABLE

  • List methods
    Non-modifying methods (don't change the list):
      index(x), count(x)
    Modifying methods (change the list in place, without need for assignment):
      append(x), extend(C), insert(i, x), remove(x), pop(), sort(), reverse()

  • Appending and Extending
    append(...) adds a single element to a list
    extend(...) joins two lists; can also use '+', which builds a new list instead of changing the original

  • List methods
    sort(), count(x), pop()
    The del keyword also removes an element
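
The list methods above can be traced step by step; each comment shows the list after that line runs. A Python 3 sketch with made-up values:

```python
items = [3, 1, 2]
items.append(4)           # [3, 1, 2, 4]
items.extend([5, 6])      # [3, 1, 2, 4, 5, 6]
items.insert(0, 0)        # [0, 3, 1, 2, 4, 5, 6]
items.remove(3)           # removes the first 3: [0, 1, 2, 4, 5, 6]
last = items.pop()        # last == 6; items == [0, 1, 2, 4, 5]
items.sort(reverse=True)  # [5, 4, 2, 1, 0]  (sorts in place and returns None)
del items[0]              # [4, 2, 1, 0]
print(items, last, items.index(2), items.count(1))   # [4, 2, 1, 0] 6 1 1

# append adds ONE element, even when that element is a list
a = [1, 2]
a.append([3, 4])          # [1, 2, [3, 4]]
b = [1, 2]
b.extend([3, 4])          # [1, 2, 3, 4]
combined = [1, 2] + [3, 4]   # '+' builds a new list: [1, 2, 3, 4]
```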

  • split() and join()
    Going from strings to lists and back again
    These are string methods
    join(C) takes as an argument any iterable collection of strings (such as a list, string, or tuple)
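
Going from a string to a list and back: `split()` is called on the string to break up, while `join()` is called on the separator. A Python 3 sketch with made-up strings:

```python
sentence = "news aggregators pull RSS feeds"
words = sentence.split()            # split on runs of whitespace
fields = "title,pubDate,description".split(",")   # split on a given separator

dashed = "-".join(words)            # 'news-aggregators-pull-RSS-feeds'
from_tuple = "".join(("a", "b", "c"))   # 'abc' (a tuple of strings works)
from_string = ",".join("xyz")       # 'x,y,z' (a string is an iterable of characters)
print(words, fields, dashed, from_tuple, from_string)
```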

  • List Comprehension
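
A list comprehension builds a new list from a loop (and optional filter) in a single expression. A Python 3 sketch; the titles are made-up sample data, and the explicit loop shows the equivalent long form:

```python
squares = [n * n for n in range(6)]             # [0, 1, 4, 9, 16, 25]
evens = [n for n in range(10) if n % 2 == 0]    # [0, 2, 4, 6, 8]

titles = ["Game review", "Weather update", "game night"]
game_titles = [t for t in titles if "game" in t.lower()]   # ['Game review', 'game night']

# The same result as the first comprehension, written as a plain loop
squares_loop = []
for n in range(6):
    squares_loop.append(n * n)
```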

  • STRINGS

  • Quote Use
    Single quotes: these strings must fit on a single line of source
    Double quotes: these also have to fit on a single line of source
    Triple (single or double) quotes:
      """ These quotes are very useful when you need to span multiple lines.
      They are also often used for long code comments """

  • Quotes inside strings
    To use apostrophes: " Let's use double quotes "
    To use double quotes in our strings: ' They say, "use single quotes" '
    Triple quotes can take care of both cases:
      """ With 3 quotes it's "easy" to use apostrophes & quotes. """
      ''' With 3 quotes it's "easy" to use apostrophes & quotes. '''

  • Slash \
    We can use the \ to span multiple lines
    Works with strings or expressions
    No character (not even a space) can follow the \ on that line

  • Character escaping
    Since some characters have special meanings, we have to escape them to use them in our strings
      "We can \"escape\" characters like this"
      'Or let\'s escape them like this'
      'and this \\ is how we get a backslash in our string'
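
The escaping examples above can be run directly; each escape sequence becomes a single character in the resulting string. A Python 3 sketch:

```python
a = "We can \"escape\" characters like this"
b = 'Or let\'s escape them like this'
c = 'and this \\ is how we get a backslash in our string'
print(a)
print(b)
print(c)

print(len("\\"), len("\n"), len("\t"))   # 1 1 1: each escape sequence is one character
```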

  • Whitespace
    ''       This is an empty string, not a character
    ' '      This is a space
    '\t'     This is a tab (a single character)
    '\n'     This is a new line (in Unix/Mac OS X)
    '\r\n'   This is a new line (in Windows)
    '\r'     This is a new line (in old Mac OS)

  • Strings are sequences

  • Simple string usage
    Strings can be accessed with indexing, like lists
    Strings do not have append(...) and extend(...) methods

  • Adding (+) and Repeating (*)
    We can add (concatenate) strings with +

    We can also repeat them with *

  • Compare strings
    Test equality using ==

    What about <, >, <=, >=?

  • Strings
    Strings are sequences, like lists
    Each element in a string is a character
    Characters we can print: letters ('a', 'b', 'c', ...), numbers ('1', '3', '4', ...), and symbols ('@', '$', '&', ...)
    Non-printing characters
      Whitespace: '\t', '\n', '\r\n'
      Try printing this: '\a'

  • Characters are really numbers
    ASCII table

  • Character numerical values

  • Print the ABCs
    Using numbers...
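
Characters map to numbers through `ord()` (character to code) and `chr()` (code to character), which lets us print the ABCs by looping over numbers. A Python 3 sketch:

```python
print(ord("A"), ord("a"), chr(66))   # 65 97 B

# Print the ABCs by looping over their numeric codes
alphabet = ""
for code in range(ord("A"), ord("Z") + 1):
    alphabet += chr(code)
print(alphabet)   # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```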

  • String Comparisons
    Characters are compared by their numerical value
    If the first characters are equal, compare the next ones
    If one string is a prefix of the other, the shorter string is smaller

  • String Comparisons
    These are characters, not numbers

    Capital letters are smaller (refer to the ASCII table)
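
The comparison rules above can be checked on a few pairs; every comparison below evaluates to True. A Python 3 sketch:

```python
comparisons = {
    '"apple" < "banana"': "apple" < "banana",   # 'a' (97) < 'b' (98)
    '"Zebra" < "apple"': "Zebra" < "apple",     # 'Z' (90) < 'a' (97): capitals are smaller
    '"app" < "apple"': "app" < "apple",         # a prefix is smaller than the longer string
    '"10" < "9"': "10" < "9",                   # characters, not numbers: '1' (49) < '9' (57)
}
for text, outcome in comparisons.items():
    print(text, outcome)
```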

  • Testing membership

  • import string

  • String is an Object
    Objects contain
      Data: x = 'Hello'   # data is a sequence of characters
      Actions (methods): things the object can do (often on itself)

  • Upper/Lower Case methods
    These methods are available to all string objects
    Strings are IMMUTABLE
      this means that characters in the sequence cannot change
      methods return a new string
      the original data is unchanged

  • What kind of character
    These methods are available to all string objects
    Tests that return boolean types:
      isalnum(): does the string contain only letters and digits?
      isalpha(): does the string contain only letters?
      isdigit(): does the string contain only digits?
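
Because strings are immutable, `upper()` and `lower()` return new strings and leave the original alone, while the `is...()` methods return booleans. A Python 3 sketch:

```python
x = "Hello World"
shout = x.upper()      # 'HELLO WORLD' (a new string)
quiet = x.lower()      # 'hello world'
print(x)               # Hello World: the original data is unchanged

print("abc123".isalnum(), "abc".isalpha(), "123".isdigit())     # True True True
print("abc 123".isalnum(), "abc1".isalpha(), "12.5".isdigit())  # False False False
```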

  • Formatting strings with strip(...)

  • Formatting strings

  • String Formatting

  • Formatting Floats

  • Creating Forms/Templates

  • Using replace

  • Output

  • find(...); rfind(...)

  • Go through all matches and capitalize
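
The string-formatting slides (strip, formatting, floats, templates, replace, find/rfind, capitalizing all matches) can be sketched together. This is Python 3 with made-up sample strings; `capitalize_all` is a hypothetical helper, and `str.format` is shown alongside the `%` style used in the slides' Python 2 code:

```python
raw = "   breaking news   \n"
clean = raw.strip()                         # 'breaking news': whitespace removed from both ends

# %-style formatting and str.format
line = "%s: %d items" % ("Sports", 3)       # 'Sports: 3 items'
price = "Total: %.2f" % 3.14159             # 'Total: 3.14' (two decimal places)
template = "Dear {name}, your grade is {grade}."
letter = template.format(name="Alice", grade=100)   # fills in the form/template

# replace() returns a new string with every match substituted
fixed = "teh news, teh weather".replace("teh", "the")

text = "python is fun; I like python"
print(text.find("python"), text.rfind("python"), text.find("java"))   # 0 22 -1

def capitalize_all(text, word):
    """Hypothetical helper: upper-case every occurrence of word, located with find()."""
    if not word:
        return text                         # an empty word would match forever
    upper_word = word.upper()
    result = text
    start = result.find(word)
    while start != -1:                      # find() returns -1 when there are no more matches
        result = result[:start] + upper_word + result[start + len(word):]
        start = result.find(word, start + len(upper_word))   # resume after this match
    return result

print(capitalize_all(text, "python"))       # PYTHON is fun; I like PYTHON
```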

  • DICTIONARIES

  • Dictionaries
    Another collection type, but NOT a sequence (order doesn't matter)
    Also referred to as an associative array, or map
    Dictionaries are a collection of keys that point to values

  • Key --> Value

  • About Dictionaries
    Define dictionaries using curly braces {}
    Each key is separated from its value by a colon :, and key-value pairs are separated by commas
    Dictionaries are MUTABLE (can add or remove entries)
    Keys: can only be IMMUTABLE objects (int, float, string, tuples whose elements are all immutable)
    Values: can be anything
    Idea: it is easier to find data based on a simple key, like looking up a word in the English-language Webster's Dictionary

  • Indexing and Assignment
    Index using square brackets and keys; returns the associated value
    Numbered indices are not defined
    Can modify the dictionary by assigning new key-value pairs, or by changing the value a key points to

  • Dictionaries with Different Key Types
    Cannot index or search based on values, only through keys
    Numbers, strings, tuples can be keys (anything IMMUTABLE)

  • Operators
    [ ]: indexing using a key inside square brackets
    len(): "length" is the number of key-value pairs in the dictionary
    in: boolean test of membership (is the key in the dictionary?)
    for: iterates through the keys in the dictionary

  • Operators in use

  • Dictionary Methods
    items(): returns all the key-value pairs as a list of tuples
    keys(): returns a list of all the keys
    values(): returns a list of all the values
    copy(): returns a shallow copy of the dictionary

  • Methods

  • zip( ) - dict( )
    zip(): creates a list of tuples from two lists
    dict(): creates a dictionary from a mapping object or a sequence of key-value pairs, or an empty dictionary
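
The dictionary operators and methods can be exercised in one sketch. It assumes Python 3, where `keys()`, `values()`, and `items()` return views rather than lists and `zip()` returns an iterator, so `list()` or `sorted()` is used to see them. The tag names echo the RSS keys used later in the project; the text values are made up.

```python
grades = {"name": "Alice", "grade": 100}
print(grades["name"])              # Alice
grades["grade"] = 95               # change the value a key points to
grades["course"] = "Engr 101"      # add a new key-value pair
print(len(grades))                 # 3
print("grade" in grades)           # True: in tests the keys
print("Alice" in grades)           # False: values are not searched

for key in grades:                 # a for loop iterates through the keys
    print(key, grades[key])

backup = grades.copy()             # shallow copy: a new dict with the same entries
del grades["course"]               # changing the original does not affect the copy

# zip() pairs up two lists; dict() turns those pairs into a dictionary
tags = ["title", "description"]
texts = ["Game night recap", "Scores from the game"]   # made-up sample values
pairs = list(zip(tags, texts))
item = dict(pairs)
print(sorted(item.keys()))         # ['description', 'title']
```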

  • FUNCTIONS

  • Why Use Functions?
    Functions provide encapsulation, making code better and more readable
    Divide-and-conquer problem solving
      Break large, complicated problems into smaller sub-problems
      Solutions to sub-problems are written as functions
      Sub-problem solutions can be reused and shared
    Simplification/readability
      Removes duplicate code by using function calls

  • Why Use Functions?
    Abstraction
      Provides a high-level interface to the program
      You know WHAT it does, not HOW it does it
    Security
      A small, well-defined piece of code is easier to prove correct
      Code can be fixed in one place and will be corrected everywhere the function is called

  • Function Definition

  • Function Calls

  • Functions that do things...
    no parameters
    no return statement
    affect the environment

  • Functions that have parameters...
    the definition has parameters
    the call takes arguments
    no return statement
    affect the environment

  • Functions that return results...
    return keyword
    the function performs processing
    returns a value/object

  • Functions with default parameters...
    Parameters can have default values

    We can call this function with:
      print_message("Hello class.")
      print_message("Hello class.", 3)
    Can explicitly name arguments:
      print_message(times=3, msg="Hello class.")
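
The slide shows only the calls, not the definition of `print_message`, so the body below is a hypothetical reconstruction in Python 3: it assumes `times` defaults to 1. The extra helper `format_message` is also hypothetical; it exists only so the result can be checked without capturing printed output.

```python
def format_message(msg, times=1):
    """Hypothetical helper: msg repeated `times` times, one copy per line."""
    return "\n".join([msg] * times)

def print_message(msg, times=1):
    """Hypothetical definition: print msg `times` times; times defaults to 1."""
    print(format_message(msg, times))

print_message("Hello class.")                 # default used: printed once
print_message("Hello class.", 3)              # positional argument: printed 3 times
print_message(times=3, msg="Hello class.")    # named arguments can come in any order
```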

  • Variable Scope (local variables)

  • Variable Scope (global variables)
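
Local and global scope can be contrasted with two small functions: assigning inside a function creates a new local name unless the function declares the name `global`. A Python 3 sketch with hypothetical function names:

```python
counter = 0                 # global variable, defined at the top level

def local_demo():
    counter = 100           # assignment creates a NEW local variable; the global is untouched
    return counter

def global_demo():
    global counter          # declare that this function means the global name
    counter += 1            # so this updates the global variable
    return counter

print(local_demo(), counter)    # 100 0
print(global_demo(), counter)   # 1 1
```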

  • INTRODUCTION TO NEWS AGGREGATOR

  • First Module - urllib
    Usage: import urllib
    This module provides a high-level interface for fetching data across the World Wide Web. In particular, the urlopen() function is similar to the built-in function open(), but accepts Uniform Resource Locators (URLs) instead of filenames. Some restrictions apply: it can only open URLs for reading, and no seek operations are available.

  • urlopen()
    urllib.urlopen(url)
    Example:
      import urllib
      f = urllib.urlopen("http://www.python.org")
      text = f.read()
    (In Python 3, this function is urllib.request.urlopen.)

  • Second Module - xml.dom.minidom
    Usage: from xml.dom.minidom import parse, parseString
    xml.dom.minidom is a lightweight implementation of the Document Object Model (DOM) interface. It is intended to be simpler than the full DOM and also significantly smaller.
    DOM applications typically start by parsing some XML into a DOM.
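
Parsing some XML into a DOM and reading text out of it might look like the sketch below (Python 3). It uses `parseString` on a tiny made-up RSS document so that it runs without a network connection; `parse` would be used for a file instead.

```python
from xml.dom.minidom import parseString

# A tiny, made-up RSS document used only for illustration
rss_text = """<rss version="2.0"><channel>
<item><title>Game night recap</title><description>Scores from the game</description></item>
<item><title>Weather update</title><description>Rain expected</description></item>
</channel></rss>"""

dom = parseString(rss_text)                    # build a DOM tree from the XML string
titles = []
for item in dom.getElementsByTagName("item"):  # every <item> element, in document order
    title_node = item.getElementsByTagName("title")[0]
    titles.append(title_node.firstChild.data)  # the text node inside <title>
print(titles)   # ['Game night recap', 'Weather update']
```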

  • RSS feed program
    def get_latest_feed_items():
        # fetch and parse the feed into item_list
        return item_list

    def search_latest_feed_items(item_list, searchterm):
        # keep only the items that match searchterm
        return filtered_item_list

    Example: search item descriptions
    Function usage:
      latests = get_latest_feed_items()
      search = search_latest_feed_items(latests, "game")
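
One way the two functions could fit together is sketched below in Python 3; it is not the course's actual project code. `parse_feed_items` is a hypothetical stand-in for `get_latest_feed_items()` that parses a made-up feed string instead of downloading one, items are dicts keyed by tag name, and matching on title plus description (case-insensitive) is our own assumption.

```python
from xml.dom.minidom import parseString

def parse_feed_items(xml_text):
    """Hypothetical stand-in for get_latest_feed_items(): one dict per <item>,
    keyed by child tag name (e.g. 'title', 'description')."""
    dom = parseString(xml_text)
    item_list = []
    for item in dom.getElementsByTagName("item"):
        fields = {}
        for child in item.childNodes:
            # keep element children that contain text; skip whitespace text nodes
            if child.nodeType == child.ELEMENT_NODE and child.firstChild is not None:
                fields[child.tagName] = child.firstChild.data
        item_list.append(fields)
    return item_list

def search_latest_feed_items(item_list, searchterm):
    """Keep items whose title or description contains searchterm (case-insensitive).
    An empty searchterm matches every item."""
    term = searchterm.lower()
    filtered_item_list = []
    for item in item_list:
        text = (item.get("title", "") + " " + item.get("description", "")).lower()
        if term in text:
            filtered_item_list.append(item)
    return filtered_item_list

sample_feed = """<rss version="2.0"><channel>
<item><title>Game night recap</title><description>Scores from the game</description></item>
<item><title>Weather update</title><description>Rain expected</description></item>
</channel></rss>"""

latests = parse_feed_items(sample_feed)
search = search_latest_feed_items(latests, "game")
print([item["title"] for item in search])   # ['Game night recap']
```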

  • Example of Modification
    Retrieve the latest feed item list:
      item_list = get_latest_feed_items()
    Define a search term ("" means all):
      searchterm = ""
    Obtain the filtered item list:
      filtered_item_list = search_latest_feed_items(item_list, searchterm)

  • Example of Modification
    Remember, keys = tag names in the XML!
    If you want to modify useful_keys, make sure you attach the "u" (Unicode string prefix).
    For example, if you want to add author, add u'author' to the list.
    Define your useful keys:
      useful_keys = [u'title', u'pubDate', u'description', u'guid']

  • Example of Modification
    Display all items and keys:
      for item in filtered_item_list:
          for key in useful_keys:
              # print "%s: %s" % (key, item[key])
              print key + " " + item[key]
          print " "

  • Some Modification Ideas (I)
    Read in an RSS feed, find MULTIPLE keywords (as many as the user wants), and return the corresponding articles. You may want to think about the readability of the results. Note that articles MAY be repeated if different keywords occur in their titles and/or descriptions (hint: useful keys).

  • Some Modification Ideas (II)
    Filter articles from an RSS feed based on multiple keywords (hint: nested loops, filtering by one keyword in each loop).

  • Some Modification Ideas (III)
    Count how many times certain interesting words appear in an RSS feed, then plot Excel charts (bar, pie, or line graphs).

  • Some Modification Ideas (IV)
    Read an RSS feed and allow the user to specify how many news items he/she wants to see at one time. You may want to display the total number of news items first, THEN ask the user how many they want to see.

  • Some Modification Ideas (V)
    Take MULTIPLE RSS feeds, go through them ALL, and look for articles with a certain keyword. You can either give the user a limit on the maximum number of feeds, or allow as many feeds as the user wants.
    Note: probably the hardest. This one simulates a mini search engine / web crawler.

  • Your Work
    Specify roles
    Come up with some ideas, or use those ideas but explain them in your own words
    How much progress can you make?
    Teamwork: coordinate with each other (project manager)
    Try to answer all listed questions
    Prepare your presentation and all other work
    Grade is based on creativity and complexity, as well as the role you performed

  • Discussion