news aggregator

Post on 08-Jan-2016

News Aggregator. - PowerPoint PPT Presentation

News Aggregator

A news aggregator is a system (a software application, web page, or service) that collects syndicated content from weblogs and mainstream media sites using RSS and other XML feeds. Aggregators reduce the time and effort needed to regularly check websites of interest for updates, creating a unique information space or "personal newspaper." An aggregator can subscribe to a feed, check for new content at user-determined intervals, and retrieve the content. The content is sometimes described as being "pulled" to the subscriber, as opposed to "pushed" as with email or other channels. Unlike recipients of some "pushed" information, the aggregator user can easily unsubscribe from a feed.

In short: software which allows syndicated news content (such as RSS feeds) to be brought together and displayed.

Introduction of Python

For Engr 101 5-Week News Aggregator Module, Fall 2010
Instructor: Tao Wang

What is a computer?

Computer Organization

Software / Programs

Computer programs instruct the CPU which operations it should perform/execute
Programmers write the code for the tasks a computer program performs
But computers only understand binary (1's and 0's)
Programmers need a language to write computer code

Types of Programming Languages

Low-Level Languages

Machine Languages
◦ CPU instructions are binary strings of 1's and 0's [10010000]
◦ Each kind of CPU has a different instruction set
◦ Programs are not portable across CPU architectures
◦ Difficult for programmers to read/write

Assembly Languages
◦ Use English-like abbreviations to represent CPU instructions
◦ A bit easier to understand [MOV AL, 42h]
◦ Converted to machine language using an assembler
◦ Still not portable

Types of Programming Languages

High-Level Languages

C/C++, Java/C#, Python, Ruby, many more...
These languages abstract hardware implementation details
They provide programmers with a logical computer model
◦ Allows the programmer to focus on solving problems instead of low-level hardware details
Use English-like keywords and statements to write code
Use a compiler or interpreter that translates code to machine language
◦ Makes code portable across different CPUs and platforms
◦ Programmer does not have to learn each CPU's instructions

Compiled vs. Interpreted

Both compilers and interpreters translate source code to machine language

Compiled
◦ Must compile the program for each target CPU architecture prior to execution

Interpreted
◦ Code can be run directly on any platform that has an available interpreter

About Python

Designed by Guido van Rossum in the late 1980s
Interpreted programming language
Imperative, dynamic, and object-oriented

Python Programs
◦ are a sequence of instructions written in the Python language
◦ the interpreter executes the instructions sequentially, in order
◦ programs can take inputs and send data to outputs
◦ programs can process and manipulate data
◦ programs may read and write data to RAM, hard drive, ...

First Program
◦ print "Welcome to learning Python."

PYTHON: LET'S BEGIN

The Python Shell (2.6)
You type commands
It does stuff
It converts Python to machine instructions and runs them right now

Python as Calculator

>>> 2 + 2 # add
4
>>> 2 * 3 # multiply
6
>>> 3**2 # powers
9
>>> 12 % 11 # modulo (remainder)
1

Division

Integer division
◦ 1/4 # integer division, no decimals
◦ 0

Float division
◦ 1.0/4.0 # float number division
◦ 0.25
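The results above are what the Python 2.6 shell prints, where / between two integers drops the fractional part. As a minimal sketch (the variable names are only for illustration), the // operator asks for that behavior explicitly and gives the same answer in Python 2 and Python 3:

```python
# Floor ("integer") division: the fractional part is discarded.
int_result = 1 // 4          # 0

# Float division: at least one operand is a float.
float_result = 1.0 / 4.0     # 0.25

# Converting one operand with float() also gives float division.
converted = float(1) / 4     # 0.25

print(int_result)
print(float_result)
print(converted)
```

Note that in Python 3 a plain 1/4 returns 0.25, so // is the safer way to write integer division.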

LITERALS, VARIABLES, DATA TYPES, STATEMENTS AND EXPRESSIONS

Literals, Data Types

Numbers
Integers are whole numbers: ..., -2, -1, 0, 1, 2, ... (32 bits)
Floats contain decimals: 4.5, 67.444443335, 7E2...
Booleans: True, False
Long ints hold values that exceed the 32-bit capacity
Complex numbers: 1j * 1j = -1 + 0j

Strings
Strings are used to represent words, text, and characters
Examples (can use single, double or triple quotes):
◦ "I am learning python."
◦ 'hello.'

Variables

Literals are data values that our program uses
Data is stored in memory, and we need a way to access it
We use variables to name our data in memory
We can retrieve data by calling the name that refers to it
◦ student_name = "Ben"
◦ print student_name
= is the assignment operator; it assigns a data value to a variable

Variables Syntax Rules

Variable names must begin with a letter (uppercase or lowercase) or underscore (_)
◦ * good programming convention: variables should not start with uppercase letters, which are commonly used for something else
The remainder of the name can contain letters, underscores (_), and numbers
Names may not contain spaces or special characters
Names are case sensitive and must not be a reserved Python keyword
◦ myVariable and myvariable refer to different data

Statements and Expressions

Statements perform a task; they do not return a value
◦ x = 2
◦ y = 3
◦ print y

Expressions return a value
◦ >>> x + y
◦ 5

Expressions (evaluate to values)

Math expressions
◦ >>> 10 * 2 + 3
◦ >>> 10 * (2.0 + 3)
Boolean expressions
◦ >>> 10 < 2 # False
◦ >>> 10 >= 10 # True
Combined with logic operators
◦ >>> (10<2) or (10==10)
Can combine
◦ >>> (a*c+d) > (d*a-c)
String expressions
◦ >>> "hel" + "lo" # 'hello'
◦ >>> "Hi" * 3 # 'HiHiHi'

Operator Precedence (top-to-bottom)
◦ Parentheses
◦ Exponentiation
◦ Multiplication and Division
◦ Addition and Subtraction

Data Types

Finding out a data type
What if data types don't match?
STRONG TYPES: no automatic conversion (for non-number types)
Explicit conversion
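The code for these three slides is not in the transcript. The following is a small sketch of what they describe, using the built-ins type(), str() and int(); the variable names and values are made up for illustration:

```python
age = 21
name = "Ben"

# type() reports the data type of a value.
print(type(age) is int)
print(type(name) is str)

# Strong typing: adding a string and a number raises TypeError
# instead of converting automatically.
try:
    message = name + age
except TypeError:
    message = None

# Explicit conversion makes the types match.
message = name + " is " + str(age)
total = int("40") + 2

print(message)
print(total)
```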

PYTHON KEYWORDS, USER INPUT

Python Keywords
RESERVED: do not use as variable names

User Input
Create interactive programs by requesting user input

CONTROL STRUCTURES

Branching / Conditional Statements
Decision making

if - statement
if a < 0:
    print "a is negative"

if - else

if - elif - else
If one test fails, perform the next test

Nested if statements
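Only the first if example survives in the transcript. A possible sketch of the remaining forms (if - elif - else with a nested if) is shown here; the function describe and its messages are hypothetical, and it is wrapped in a function only so the result can be checked:

```python
def describe(a):
    """Return a short description of the number a."""
    if a < 0:
        return "a is negative"
    elif a == 0:
        # Reached only when the first test failed.
        return "a is zero"
    else:
        # Nested if: only reached when a is positive.
        if a % 2 == 0:
            return "a is positive and even"
        else:
            return "a is positive and odd"

print(describe(-5))
print(describe(0))
print(describe(8))
```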

MODULES

Modules
Python strength: large collection of open source modules
Modules are collections (packages) of useful (tested) code you can reuse
Common modules: random, math
The modules we use for the project:
◦ urllib, xml

Python Standard Library (packages included with most Python distributions)
◦ http://docs.python.org/library/index.html
PyPI (Python Package Index)
◦ http://pypi.python.org/pypi
◦ repository of optional modules available (11,000+)

Using Modules
The math module contains many useful functions and values: math.sin, math.cos, math.pow, math.sqrt, ...
Using modules:
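The slide's own example is missing from the transcript. A minimal sketch of importing and calling the math functions named above could look like this (the variable names are illustrative):

```python
import math

root = math.sqrt(16)      # 4.0
power = math.pow(2, 3)    # 8.0
sine = math.sin(0)        # 0.0

# "from ... import ..." brings a single name in directly.
from math import pi
area = pi * 2 ** 2        # area of a circle with radius 2

print(root)
print(power)
print(sine)
print(area)
```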

Getting help
In the Python interpreter you can get documentation

CONTROL STRUCTURES: REPETITION, ITERATION

Repetition
Selection statements (if-elif-else) let us make simple decisions
Repeating statements and decisions let us build more complex programs

while

Testing Primeness

break statement
break immediately ends loops
DO NOT overuse; can be difficult to read/understand logic

Testing Primeness
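The primeness code from the slides is not preserved. One way to write it with while and break, offered as a sketch and not necessarily the version shown in class, is:

```python
def is_prime(n):
    """Return True if n is prime, using a while loop and break."""
    if n < 2:
        return False
    prime = True
    divisor = 2
    # Only divisors up to the square root of n need to be tried.
    while divisor * divisor <= n:
        if n % divisor == 0:
            prime = False
            break              # a divisor was found; stop looping
        divisor += 1
    return prime

print(is_prime(13))
print(is_prime(15))
```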

range(...)
Built-in function; produces a sequence (list)
range(0, 3) --> [0, 1, 2]
range(3) --> [0, 1, 2]
range(1, 3) --> [1, 2]
range(1, 7, 2) --> [1, 3, 5]
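In Python 2.6, range() returns a list directly. In Python 3 it returns a range object, so this sketch wraps each call in list(), which gives the same lists under both versions:

```python
zero_to_two = list(range(0, 3))       # start 0, stop before 3
same_thing = list(range(3))           # start defaults to 0
one_to_two = list(range(1, 3))        # start 1, stop before 3
odd_numbers = list(range(1, 7, 2))    # step of 2

print(zero_to_two)
print(same_thing)
print(one_to_two)
print(odd_numbers)
```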

for
The for loop is used for iteration

continue statement
break and continue work in both while and for loops

Find all primes
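The "Find all primes" code is missing from the transcript. A possible sketch that combines for, continue and break follows; the function name primes_up_to is an assumption for illustration:

```python
def primes_up_to(limit):
    """Return a list of all primes from 2 up to and including limit."""
    primes = []
    for n in range(2, limit + 1):
        if n > 2 and n % 2 == 0:
            continue                 # skip even numbers above 2
        is_prime = True
        for d in range(3, n, 2):     # try odd divisors only
            if n % d == 0:
                is_prime = False
                break                # n has a divisor; stop testing
        if is_prime:
            primes.append(n)
    return primes

print(primes_up_to(20))
```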

while - else

Nesting Control Structures

Counter-Controlled Loops

Sentinel-Controlled Loops

Accumulating
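These slides are titles only in the transcript. As a sketch of the three ideas, with made-up data in which -1 is assumed to be the sentinel value:

```python
# Counter-controlled loop: runs a known number of times.
total = 0
count = 1
while count <= 5:
    total += count          # accumulating: 1 + 2 + 3 + 4 + 5
    count += 1

# Sentinel-controlled loop: runs until a special value appears.
readings = [4, 8, 15, -1, 16]   # hypothetical data; -1 marks the end
accumulated = 0
index = 0
while readings[index] != -1:
    accumulated += readings[index]
    index += 1

print(total)
print(accumulated)
```

The value after the sentinel (16) is never added, because the loop stops as soon as it sees -1.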

Swapping
x = 2
y = 3
Swap (WRONG)
◦ x = y
◦ y = x
Result: x = 3, y = 3
Swap (CORRECT)
◦ z = x
◦ x = y
◦ y = z
Result: x = 3, y = 2

Multiple Assignments
aInt, bInt, cInt = 1, 2, 3
Swapping with multiple assignment
◦ aInt, bInt = bInt, aInt
◦ Why does this work? (the entire right side is evaluated before any assignment)
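A short runnable sketch of the multiple-assignment swap; the three-way rotation at the end is an extra illustration that is not on the slide:

```python
aInt, bInt, cInt = 1, 2, 3

# The right side (bInt, aInt) is built first, then unpacked.
aInt, bInt = bInt, aInt
swapped = (aInt, bInt)            # (2, 1)

# The same idea rotates three values at once.
aInt, bInt, cInt = bInt, cInt, aInt
rotated = (aInt, bInt, cInt)      # (1, 3, 2)

print(swapped)
print(rotated)
```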

Everything is an object

DEBUGGING

Debugging
Syntax errors: Python gives us an alert; the code crashes
Runtime errors: how do we fix incorrect results (logic) in our programs?
◦ We need to trace the code's execution flow.
◦ Tracing: keep track of variable values as each line is executed
◦ Print statements: strategically add print to view results at each step; don't overdo it or it will be difficult to keep track
◦ Can help us detect infinite loops

MORE DATA TYPES... (LISTS)

Collection Types
Lists:
◦ Sequential and mutable
>>> k = [1, 3, 5]
>>> m = ["hel", 3]
Tuples:
◦ Sequential and immutable
>>> (1, 2, 3)
Dictionaries:
◦ Map collection
>>> d = {'name': 'Alice', 'grade': '100'}
>>> print d['name']
Alice
Sets:
◦ Have unique elements
>>> aSet = set(['a', 'b'])

Lists (also called arrays)
Lists are sequences of objects
Mutable (unlike strings and tuples)
Lists are defined with square brackets [ ]; elements are comma (,) separated
List elements can have different types
List indexing starts at 0
If the index is negative, counting begins at the end of the list
If the index is past the last element: ERROR

List access
Indexing and slicing work just like strings (same rules apply)
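The indexing and slicing rules can be tried with a short sketch like this one (the list k and the variable names are only examples):

```python
k = [1, 3, 5, 7, 9]

first = k[0]            # indexing starts at 0
last = k[-1]            # negative index counts from the end
middle = k[1:4]         # slice: index 1 up to, not including, 4
every_other = k[::2]    # slice with a step of 2

# Indexing past the last element is an error...
try:
    k[10]
    out_of_range = False
except IndexError:
    out_of_range = True

# ...but a slice that runs past the end is simply cut short.
tail = k[3:100]

print(first, last)
print(middle)
print(every_other)
print(tail)
```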

Working with lists
Can convert other collections to lists
Lists can contain mixed types, including other collections

Indexing lists of lists
Lists can be nested

List operators
+ concatenates two lists (list1 + list2)
* repeats the list a number of times (list1 * Integer)
in tests membership

List comparisons
>, <, ==, <=, >=, !=
Similar rules to strings; compares elements in order
Elements being compared should have the same type

Collection functions
len(C) returns the length of the collection C
min(C) returns the minimum element in C; for a list of lists, the sublists are compared by their first elements (later elements only break ties)
max(C) returns the maximum element in C; a list of lists is compared the same way
sum(L) returns the sum of the elements in list L; elements must all be numbers

Lists can change
Lists can be modified: MUTABLE
Strings cannot be changed: IMMUTABLE

List methods
There are non-modifying methods (don't change the list)
◦ index(x)
◦ count(x)
and modifying methods (will change the list without need for assignment)
◦ append(x)
◦ extend(C)
◦ insert(i, x)
◦ remove(x)
◦ pop()
◦ sort()
◦ reverse()

Appending and Extending
append(...) adds a single element to a list
extend(...) joins two lists
◦ can also use '+'
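The difference between append and extend is easiest to see side by side. A small sketch with example values:

```python
a = [1, 2]
a.append([3, 4])        # the whole list becomes ONE new element

b = [1, 2]
b.extend([3, 4])        # each element is added separately

c = [1, 2] + [3, 4]     # + builds a new list, leaving both inputs alone

print(a)
print(b)
print(c)
```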

List methods
sort()
count(x)
pop()
The del keyword also removes an element

split() and join()
Going from strings to lists and back again
These are string methods
join(C) takes as an argument any iterable collection type (such as lists, strings, tuples)
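A brief sketch of going from a string to a list and back again (the sample strings are invented):

```python
sentence = "news aggregators collect feeds"

words = sentence.split()                   # split on whitespace
fields = "title,pubDate,guid".split(",")   # split on a chosen separator

rejoined = " ".join(words)                 # the separator string calls join
dashed = "-".join("abc")                   # a string is iterable too

print(words)
print(fields)
print(rejoined)
print(dashed)
```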

List Comprehension
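This slide has no surviving body text. As a hedged illustration, a list comprehension builds a new list from a loop written inside square brackets, optionally with an if filter:

```python
# Square of every number from 0 to 4.
squares = [n * n for n in range(5)]

# Keep only the even numbers below 10.
evens = [n for n in range(10) if n % 2 == 0]

# Apply a string method to every element.
shouted = [word.upper() for word in ["rss", "xml"]]

print(squares)
print(evens)
print(shouted)
```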

STRINGS

Quote Use
Single Quotes
◦ ' These strings must fit on a single line of source '
Double Quotes
◦ " Also has to fit on a single line of source "
Triple (single or double) Quotes
◦ """ These quotes are very useful when you need to span multiple lines. They are also often used for long code comments """

Quotes inside strings
To use apostrophes
◦ " Let's use double quotes "
To use double quotes in our strings
◦ ' They say, "use single quotes" '
Triple quotes can take care of both cases
◦ """ With 3 quotes it's "easy" to use apostrophes & quotes. """
◦ ''' With 3 quotes it's "easy" to use apostrophes & quotes. '''

Backslash \
We can use the \ to span multiple lines
Works with strings or expressions
No character can follow the \

Character escaping
Since some characters have special meanings, we have to escape them to use them in our strings
◦ "We can \"escape\" characters like this"
◦ 'Or let\'s escape them like this'
◦ 'and this \\ is how we get a backslash in our string'

Whitespace
'' This is an empty string, not a character
' ' This is a space
'\t' This is a tab (a single character)
'\n' This is a new line (in Unix/Mac OS X)
'\r\n' This is a new line (in Windows)
'\r' This is a new line (in old Mac <= 9)

Strings are sequences

Simple string usage
Can access with indexing like lists
Strings do not have append(...) and extend(...) functions

Adding (+) and Repeating (*)
We can add (concatenate) strings with +
We can also repeat them with *

Compare strings
Test equality using ==
What about <, >, <=, >=

Strings
Strings are sequences like lists
Each element in a string is a character
Characters we can print: letters ('a', 'b', 'c', ...), numbers ('1', '3', '4', ...) and symbols ('@', '$', '&', ...)
Non-printing characters
◦ Whitespace: '\t', '\n', '\r\n'
◦ try printing this: '\a'

Characters are really numbers
ASCII table

Character numerical values

Print the ABCs
Using numbers...
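The code for this slide is not in the transcript. A sketch using the built-ins ord() (character to number) and chr() (number to character) could be:

```python
# ord() gives a character's numerical (ASCII) value.
code_of_A = ord("A")
code_of_a = ord("a")

# chr() goes the other way, so counting through the numbers
# from 'a' to 'z' builds the alphabet.
alphabet = ""
for code in range(ord("a"), ord("z") + 1):
    alphabet += chr(code)

print(code_of_A)
print(code_of_a)
print(alphabet)
```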

String Comparisons
Characters are compared by their numerical value
If the first characters are equal, compare the next one
If one string is the beginning of the other, the shorter string is smaller

String Comparisons
These are characters, not numbers
Capital letters are smaller (refer to the ASCII table)

Testing membership

import string

String is an Object
Objects contain
Data
◦ x = 'Hello' # data is sequence of characters
Actions (methods)
◦ things the object can do (often on self)

Upper/Lower Case methods
These methods are available to all string objects
Strings are IMMUTABLE
◦ this means that characters in the sequence cannot change
◦ methods return a new string
◦ original data is unchanged

What kind of character
These methods are available to all string objects
Tests that return boolean types:
◦ isalnum() - does the string contain only letters and digits
◦ isalpha() - does the string contain only letters
◦ isdigit() - does the string contain only digits

Formatting strings with strip(...)

Formatting strings

String Formatting

Formatting Floats

Creating Forms/Templates

Using replace

Output

find(...); rfind(...)

Go through all matches and capitalize
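The slide's code is not preserved. One possible sketch uses find() in a loop; find() returns the index of the next match, or -1 when there is none. The function name capitalize_matches and the sample sentence are assumptions:

```python
def capitalize_matches(text, word):
    """Return text with every occurrence of word upper-cased."""
    if not word:
        return text                     # nothing to search for
    result = ""
    start = 0
    position = text.find(word, start)
    while position != -1:
        # Copy the text before the match, then the match in capitals.
        result += text[start:position] + word.upper()
        start = position + len(word)
        position = text.find(word, start)
    return result + text[start:]        # the rest after the last match

sentence = "the cat and the hat"
print(capitalize_matches(sentence, "the"))
print(sentence.find("the"))     # first match, searching from the left
print(sentence.rfind("the"))    # last match, searching from the right
```

Because strings are immutable, the function builds a new string and leaves the original unchanged.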

DICTIONARIES

Dictionaries
Another collection type, but NOT a sequence (order doesn't matter)
Also referred to as an associative array, or map
Dictionaries are a collection of keys that point to values

Key --> Value

About Dictionaries
Define dictionaries using curly braces {}
Each key is separated from its value by a colon :, and pairs are separated by commas
Dictionaries are MUTABLE (can add or remove entries)
Keys:
◦ Can only be IMMUTABLE objects (int, float, string, tuples)
Values:
◦ Can be anything
Idea: easier to find data based on a simple key, like looking up a word in an English-language dictionary such as Webster's

Indexing and Assignment
Index using square brackets and keys; returns the associated value
◦ Numbered indices are not defined
Can modify the dictionary by assigning new key-value pairs, or by changing the value a key points to

Dictionaries with Different Key Types
Cannot index or search based on values, only through keys
Numbers, strings, and tuples can be keys (anything IMMUTABLE)

Operators
[ ]: for indexing using a key inside square brackets
len(): "length" is the number of key-value pairs in the dictionary
in: boolean test of membership (is the key in the dictionary?)
for: iterates through the keys in the dictionary

Operators in use
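The example for this slide is missing. A sketch of the four operators on a small dictionary with made-up entries:

```python
d = {"name": "Alice", "grade": 100}

d["grade"] = 95              # change the value a key points to
d["school"] = "Engr 101"     # assign a new key-value pair

size = len(d)                # number of key-value pairs
has_name = "name" in d       # membership tests the KEYS...
has_alice = "Alice" in d     # ...so a value is not found this way

keys_seen = []
for key in d:                # a for loop iterates through the keys
    keys_seen.append(key)

print(d["name"])
print(size)
print(has_name, has_alice)
print(sorted(keys_seen))
```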

Dictionary Methods
items(): returns all the key-value pairs as a list of tuples
keys(): returns a list of all the keys
values(): returns a list of all the values
copy(): returns a shallow copy of the dictionary

Methods

zip( ) - dict( )
zip(): creates a list of tuples from two lists
dict(): creates a dictionary from a mapping object, or an empty dictionary
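A short sketch of zip() and dict() working together, with invented names and grades. In Python 3, zip() returns an iterator, so list() is used here to show the pairs; dict() also accepts the pairs directly:

```python
names = ["Alice", "Ben", "Cara"]
grades = [100, 90, 85]

pairs = list(zip(names, grades))      # list of (name, grade) tuples
gradebook = dict(zip(names, grades))  # keys from names, values from grades
empty = dict()                        # no arguments: an empty dictionary

print(pairs)
print(gradebook["Ben"])
print(len(empty))
```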

FUNCTIONS

Why Use Functions?
Functions provide encapsulation, making code better and more readable
Divide and Conquer Problem Solving
◦ Break large complicated problems into smaller sub-problems
◦ Solutions to sub-problems are written in functions
◦ Sub-problem solutions can be reused, shared
Simplification/Readability
◦ Removes duplicate code with function calls

Why Use Functions?
Abstraction
◦ Provides a high-level interface to the program
◦ You know WHAT it does, not HOW it does it
Security
◦ A small, well-defined piece of code is easier to prove correct
◦ Code can be fixed in one place, and will be corrected everywhere the function is called

Function Definition

Function Calls

Functions that do things...
no parameters
no return statement
affect the environment

Functions that have parameters...
definition has parameters
call takes arguments
no return statement
affect the environment

Functions that return results...
return keyword
Function performs processing
Returns value/object

Functions with default parameters...
Parameters can have default values
We can call this function with:
◦ print_message("Hello class.")
◦ print_message("Hello class.", 3)
Can explicitly name arguments:
◦ print_message(times=3, msg="Hello class.")
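The definition of print_message is not in the transcript. The calls above suggest parameters named msg and times; the default value of 1 for times is an assumption in this sketch:

```python
def print_message(msg, times=1):
    """Print msg the given number of times (times defaults to 1)."""
    for _ in range(times):
        print(msg)

print_message("Hello class.")                  # uses the default: once
print_message("Hello class.", 3)               # positional: three times
print_message(times=3, msg="Hello class.")     # named, in any order
```

The function has no return statement, so each call returns None.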

Variable Scope (local variables)

Variable Scope (global variables)
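Both scope slides are titles only in the transcript. As an illustrative sketch (the names are hypothetical), assigning inside a function creates a local variable unless the name is declared with the global keyword:

```python
counter = 0                  # global variable

def local_example():
    counter = 10             # a new LOCAL variable; the global is untouched
    return counter

def global_example():
    global counter           # refer to the module-level variable
    counter += 1
    return counter

local_result = local_example()
after_local = counter        # the global still has its old value
global_result = global_example()

print(local_result)
print(after_local)
print(global_result)
```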

INTRODUCTION OF NEWS AGGREGATOR

First Module - urllib
Usage: import urllib
This module provides a high-level interface for fetching data across the World Wide Web. In particular, the urlopen() function is similar to the built-in function open(), but accepts Universal Resource Locators (URLs) instead of filenames. Some restrictions apply: it can only open URLs for reading, and no seek operations are available.

urlopen()
urllib.urlopen(url)
Example:
◦ import urllib
◦ f = urllib.urlopen("http://www.python.org")
◦ text = f.read()

Second Module - xml.dom.minidom
Usage
◦ from xml.dom.minidom import parse, parseString
xml.dom.minidom is a light-weight implementation of the Document Object Model interface. It is intended to be simpler than the full DOM and also significantly smaller.
DOM applications typically start by parsing some XML into a DOM
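As a minimal sketch of parsing XML into a DOM, the following uses parseString on a tiny hand-written feed; SAMPLE_RSS and its contents are invented for illustration and are not from the course material:

```python
from xml.dom.minidom import parseString

# A tiny made-up RSS document with two items.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Sample feed</title>
  <item><title>First story</title><guid>1</guid></item>
  <item><title>Second story</title><guid>2</guid></item>
</channel></rss>"""

dom = parseString(SAMPLE_RSS)

# Tag names are used to look up elements in the DOM.
items = dom.getElementsByTagName("item")

titles = []
for item in items:
    title_node = item.getElementsByTagName("title")[0]
    titles.append(title_node.firstChild.data)   # text inside <title>

print(len(items))
print(titles)
```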

RSS feed program
def get_latest_feed_items():
◦ return item_list
def search_latest_feed_items(item_list, searchterm):
◦ return filtered_item_list
◦ Example: search item description
Function usage
◦ latests = get_latest_feed_items()
◦ search = search_latest_feed_items(latests, "game")
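The bodies of these two functions are not in the transcript. The following is one possible sketch, not the course's actual code: FEED_URL is a placeholder, the helper parse_feed_items and the url parameter are assumptions, and the import is written to work in both Python 2 (as in the slides) and Python 3.

```python
from xml.dom.minidom import parseString

try:                                    # Python 3
    from urllib.request import urlopen
except ImportError:                     # Python 2, as used in the slides
    from urllib import urlopen

FEED_URL = "http://example.com/rss.xml"     # placeholder, not a real feed
USEFUL_KEYS = ["title", "pubDate", "description", "guid"]

def parse_feed_items(xml_text):
    """Turn RSS text into a list of dictionaries, one per <item>."""
    item_list = []
    for node in parseString(xml_text).getElementsByTagName("item"):
        item = {}
        for key in USEFUL_KEYS:
            found = node.getElementsByTagName(key)
            if found and found[0].firstChild is not None:
                item[key] = found[0].firstChild.data
            else:
                item[key] = ""              # tag missing or empty
        item_list.append(item)
    return item_list

def get_latest_feed_items(url=FEED_URL):
    """Download the feed at url and return its items."""
    return parse_feed_items(urlopen(url).read())

def search_latest_feed_items(item_list, searchterm):
    """Keep items whose description contains searchterm ("" keeps all)."""
    term = searchterm.lower()
    return [item for item in item_list
            if term in item["description"].lower()]
```

Keeping the parsing in its own helper means the search can be tried on a string of XML without any network access.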

Example of Modification
Retrieve latest feed item list
◦ item_list = get_latest_feed_items()
Define a search term, "" means all
◦ searchterm = ""
Obtain the filtered item list
◦ filtered_item_list = search_latest_feed_items(item_list, searchterm)

Example of Modification
Remember, keys = tag names in the XML!
If you want to modify useful_keys, make sure you attach the "u".
For example, if you want to add author, add u'author' to the list
Define your useful keys
◦ useful_keys = [u'title', u'pubDate', u'description', u'guid']

Example of Modification
Display all items and keys
◦ for item in filtered_item_list:
◦     for key in useful_keys:
◦         # print "%s: %s" % (key, item[key])
◦         print key + " " + item[key]
◦     print " "

Some Modification Ideas (I)
Read in an RSS feed and find MULTIPLE keywords (as many as the user wants).
Return the corresponding articles.
You may want to think about the readability of the results.
Note that articles MAY be repeated if different keywords occur in their titles and/or descriptions (hint: useful keys).

Some Modification Ideas (II)
Filter articles from an RSS feed based on multiple keywords.
(hint: nested loops, filtering by one keyword in each loop).
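One reading of the hint is sketched here: each pass of the outer loop filters the remaining articles by one more keyword, so only articles containing every keyword survive. The function name filter_by_keywords and the assumption that items are dictionaries with 'title' and 'description' keys are illustrative only:

```python
def filter_by_keywords(item_list, keywords):
    """Keep items whose title or description contains EVERY keyword."""
    filtered = item_list
    for keyword in keywords:             # one filtering pass per keyword
        matches = []
        for item in filtered:            # nested loop over remaining items
            text = (item["title"] + " " + item["description"]).lower()
            if keyword.lower() in text:
                matches.append(item)
        filtered = matches
    return filtered

# Hypothetical items for trying the function out.
articles = [
    {"title": "Game night", "description": "Local team wins big"},
    {"title": "Weather", "description": "Rain delays the game"},
    {"title": "Markets", "description": "Stocks rise"},
]
for article in filter_by_keywords(articles, ["game", "team"]):
    print(article["title"])
```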

Some Modification Ideas (III)
Count how many times certain interesting words appear in an RSS feed
Plot Excel charts (bar, pie, or line graphs).
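A possible starting point for the counting part is sketched below; count_words and the sample items are hypothetical, and words are split on whitespace only, so punctuation attached to a word is not stripped:

```python
def count_words(item_list, interesting_words):
    """Count how often each interesting word appears in the descriptions."""
    counts = {}
    for word in interesting_words:
        counts[word] = 0
    for item in item_list:
        words = item["description"].lower().split()
        for word in interesting_words:
            counts[word] += words.count(word.lower())
    return counts

# Hypothetical items for trying the function out.
feed_items = [
    {"description": "Game on game day"},
    {"description": "No news today"},
]
print(count_words(feed_items, ["game", "news"]))
```

The resulting dictionary of counts could then be written out and charted in Excel.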

Some Modification Ideas (IV)
Read an RSS feed and allow the user to specify how many news items he/she wants to see at one time.
You may want to display the total number of news items first, THEN ask the user how many they want to see.

Some Modification Ideas (V)
The ability to take MULTIPLE RSS feeds, then go through them ALL and look for articles with a certain keyword.
You can either give the user a limit on the maximum number of feeds, or allow as many feeds as the user wants.
Note: Probably the hardest. This one simulates a mini search engine / web crawler.

Your Work
Specify roles
Come up with some ideas, or use the ideas above but explain them in your own words
How much progress can you make?
Teamwork: coordinate with each other (project manager)
Try to answer all listed questions
Prepare your presentation and all other work
Grade is based on creativity and complexity as well as the role you performed

Discussion
