
python question, simple, just i'm used to java is all

Shazkar Shadowstorm Registered User regular
edited December 2009 in Help / Advice Forum
simple python question, basically with how python handles objects and references:

so let's say i have this file, processtext.py

in this there is a dict called vocabdict

i have a function in processtext.py that fills vocabdict called preprocess()

i have another file, call it mainproblem.py

in that file i:
import processtext
processtext.preprocess()

then i want to get that dictionary, vocabdict

how do i do this?

having a function in processtext.py that does return vocabdict does not work, that gives me an error about the global name not being defined

so...

what do i do

or am i just structuring my program wrong and python should be structured differently or something


Posts

  • Klorgnum Registered User regular
    edited December 2009
    I'm not sure I understand what this has to do with references. Why don't you just use processtext.vocabdict to access the dictionary? If you're concerned about privacy leaks (callers mutating the module's own dict), you could use the copy module and call copy.deepcopy(processtext.vocabdict).
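
A minimal sketch of what direct access looks like, assuming vocabdict is assigned at module level in processtext.py. The in-memory stand-in module below is hypothetical and exists only so the example runs on its own; with a real file you would simply `import processtext`.

```python
import copy
import types

# Hypothetical stand-in for processtext.py, built in memory so this runs alone.
processtext = types.ModuleType("processtext")
module_source = """
vocabdict = {}          # assigned at module level, so it is a module attribute

def preprocess():
    vocabdict["hello"] = 1   # mutates the module-level dict in place
"""
exec(module_source, processtext.__dict__)

processtext.preprocess()

shared = processtext.vocabdict                    # the very same dict object
snapshot = copy.deepcopy(processtext.vocabdict)   # an independent copy

shared["world"] = 1   # visible through the module, but not in the snapshot
```

The deep copy only matters if the caller is going to modify the result; for read-only use, the plain attribute access is enough.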

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    if i try to access that, it says

    AttributeError: 'module' object has no attribute 'vocabdict'

    eh?

    edit: okay, i'll read up on attributes i guess

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    like... how does that work

    accessing it directly

    do i need to make some kind of declaration

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    easier question:


    you can ignore most of the code, i know it processes the text right
    #processtext.py
    
    import csv, re
    
    def preprocess():
    	vocabdict = dict()
    	sections=['arts','business','obituaries','sports','world']
    	sectiondict = dict()
    	
    	#vocaboutput=file("vocabulary",'w')
    	
    	stopwords = open('stopwords', 'r').read().split()
    		
    	for section in sections:
    		#set up output
    		#output=file(section+"_words", 'w')
    		
    		reader = csv.reader(open("./nytimes_sections/"+section+".tsv", "rb"), delimiter='	', quoting=csv.QUOTE_NONE)
    		
    		sectiondict[section] = []
    				
    		for article in reader:	
    			#lowercase
    			s=article[1].lower()+' '+article[2].lower()
    			
    			#remove &_____;
    			s = re.sub('&\w{5};'," ",s)
    				
    			#remove punctuation
    			s = re.sub('[^A-Za-z\'\s]'," ",s)
    			
    			#remove multiple whitespaces
    			s = re.sub('\s{2,}'," ",s)
    			
    			#create a list	
    			textwords = s.split()
    	
    			#remove stopwords
    			filteredwords = [t for t in textwords if t not in stopwords]
    			
    			#remove apostrophes
    			for i, word in enumerate(filteredwords):
    				word = re.sub('\'',"",word)
    				filteredwords[i]=word
    			
    			#filter stopwords again
    			filteredwords = [t for t in filteredwords if t not in stopwords]
    			
    			#remove blank items in list
    			finalwords = [t for t in filteredwords if t != '']
    			
    			#find unique words for article & overall new unique words
    			articledict = dict()
    			for item in finalwords:
    				if item not in articledict:
    					articledict[item]=1
    				if item not in vocabdict:
    					vocabdict[item]=1
    			articlewords = articledict.keys()
    			
    			sectiondict[section].append(articledict)
    	
    	return (vocabdict, sectiondict)
    	
    
    

    this is what confuses me
    
    >>> import processtext as pt
    >>> (vocabdict,sectiondict)=pt.preprocess()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'NoneType' object is not iterable
    
    
    though i don't get an error if i just do
    >>> pt.preprocess()
    
    

    explanations? help?

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    okay its returning None:

    >>> tupe=pt.preprocess()
    >>> tupe
    >>> print tupe
    None
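
For anyone puzzled by the traceback: a call that reaches the end of a function without hitting a return statement gives back None, and tuple-unpacking None is what raises the TypeError. A small sketch, with both function names made up for illustration:

```python
def preprocess_no_return():
    vocabdict = {"word": 1}
    sectiondict = {"arts": []}
    # Falls off the end without a return statement, so the call yields None.


def preprocess_with_return():
    vocabdict = {"word": 1}
    sectiondict = {"arts": []}
    return vocabdict, sectiondict


result = preprocess_no_return()   # None

try:
    vocabdict, sectiondict = result   # unpacking None is what raises
    unpack_error = None
except TypeError as exc:
    unpack_error = exc

# With an explicit return, the same unpacking works.
vocabdict, sectiondict = preprocess_with_return()
```

Calling the function without assigning the result raises nothing, which is why a bare `pt.preprocess()` looks fine at the prompt.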

  • Aetheri Registered User regular
    edited December 2009
    Edit: Oops, scrap that. Give me a moment to look over your code.

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    ohhh

    i think that will fix it

    1 sec

  • Aetheri Registered User regular
    edited December 2009
    What I posted initially is a bad way to do it, though- it's much better to return something rather than having it be a module-level variable.
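
A rough sketch of the difference, using made-up names (seen_words, build_vocab) rather than anything from the original file. A name assigned inside one function is local to that call, so a second function that tries to return it fails with a NameError, which is the "global name is not defined" message from the first post. Returning the dict avoids the problem entirely:

```python
def fill_vocab_locally(words):
    seen_words = {}           # local to this call; gone when the function returns
    for word in words:
        seen_words[word] = 1


def get_vocab():
    return seen_words         # looks for a module-level name that was never created


fill_vocab_locally(["coin", "return"])
try:
    get_vocab()
    lookup_error = None
except NameError as exc:
    lookup_error = exc        # the "name ... is not defined" error


def build_vocab(words):
    """Return-based version: the caller receives the dict directly."""
    vocab = {}
    for word in words:
        vocab[word] = 1
    return vocab


vocab = build_vocab(["coin", "return", "coin"])
```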

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    moving vocabdict and sectiondict outside the function didn't fix anything unfortunately

  • Marty81 Registered User regular
    edited December 2009
    simple python question, basically with how python handles objects and references:

    so let's say i have this file, processtext.py

    in this there is a dict called vocabdict

    i have a function in processtext.py that fills vocabdict called preprocess()

    i have another file, call it mainproblem.py

    in that file i:
    import processtext
    processtext.preprocess()

    then i want to get that dictionary, vocabdict

    how do i do this?

    having a function in processtext.py that does return vocabdict does not work, that gives me an error about the global name not being defined

    so...

    what do i do

    or am i just structuring my program wrong and python should be structured differently or something

    You should be able to get it by using a function in processtext that returns vocabdict, or you can just access it directly with processtext.vocabdict.

    If this isn't working, close and reopen your main file (mainproblem) and try it again.

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    oh


    i just needed to quit the interpreter and try it again


    that fixed everything


    so my code was right


    THANK YOU

  • Marty81 Registered User regular
    edited December 2009
    Yeah, no problem.

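    An import statement only runs a module's code once per interpreter session; after that it reuses the already-loaded module, so edits to the file aren't picked up until you restart the interpreter or call reload(processtext).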
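
Here is a sketch of that behaviour, written for Python 3, where reload lives in importlib (in the Python 2 used in this thread it is a builtin). The module name demo_processtext and the temporary directory are assumptions made so the example can run by itself:

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True   # keep cached bytecode out of the picture
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo_processtext.py")


def write_module(body):
    with open(path, "w") as f:
        f.write(body)


# First draft of the module: preprocess() has no return, so it gives None.
write_module("def preprocess():\n    pass\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import demo_processtext as pt
first = pt.preprocess()

# Edit the file on disk, then import again in the same session.
write_module("def preprocess():\n    return ({'word': 1}, {'arts': []})\n")
import demo_processtext as pt    # no-op: the module is already in sys.modules
stale = pt.preprocess()          # still the old function, still None

# reload re-executes the edited source in the existing module object.
pt = importlib.reload(pt)
vocabdict, sectiondict = pt.preprocess()
```

Quitting and restarting the interpreter has the same effect as the reload call, which matches what fixed the problem above.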

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    yeah, thanks for sure

    umm how inefficient is it for me to use a dictionary instead of a sparse matrix? seems easier for me to use a dict, but i was just wondering

  • Barrakketh Registered User regular
    edited December 2009
    Use a dictionary if it's appropriate for the job at hand. They're rather fast, and Python uses them internally for a number of things (like classes).
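
As an illustration, a dict of dicts already behaves like a sparse matrix, since only the non-zero entries are stored. The documents list and the dot helper below are invented for this sketch and are not part of the original program:

```python
from collections import defaultdict

documents = [
    ["coin", "return", "coin"],
    ["return", "policy"],
]

# counts[doc_index][word] -> occurrences; absent keys stand for zeros.
counts = defaultdict(dict)
for doc_index, words in enumerate(documents):
    for word in words:
        counts[doc_index][word] = counts[doc_index].get(word, 0) + 1


def dot(row_a, row_b):
    """Dot product of two sparse rows, touching only the shared non-zero keys."""
    if len(row_a) > len(row_b):
        row_a, row_b = row_b, row_a   # iterate over the shorter row
    return sum(value * row_b[key] for key, value in row_a.items() if key in row_b)


similarity = dot(counts[0], counts[1])
```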

  • Shazkar Shadowstorm Registered User regular
    edited December 2009
    yeah, my program ran pretty fast with something like just 9000 documents using dictionary

    though i might need to convert some of that to a sparse matrix for matrix multiplication sake

    anyone know what the easiest way to create a sparse matrix in python is? scipy?

  • PracticalProblemSolver Registered User regular
    edited December 2009
    scipy, don't worry about the dict inefficiency, especially since it's already done.
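
One possible way to go from per-article dicts to a scipy sparse matrix, assuming scipy is installed and that each article is a word -> count dict like the articledict built in preprocess(). The sample articles and the column-index mapping are made up for the sketch:

```python
from scipy.sparse import csr_matrix

# Hypothetical per-article dicts, shaped like articledict in the post above.
articles = [
    {"coin": 1, "return": 1},
    {"return": 1, "policy": 1},
]

# Give every vocabulary word a fixed column index.
vocab = sorted({word for article in articles for word in article})
column = {word: j for j, word in enumerate(vocab)}

# Collect (row, column, value) triples for the non-zero entries only.
rows, cols, data = [], [], []
for i, article in enumerate(articles):
    for word, value in article.items():
        rows.append(i)
        cols.append(column[word])
        data.append(value)

matrix = csr_matrix((data, (rows, cols)), shape=(len(articles), len(vocab)))

# Sparse matrix multiplication: article-by-article overlap counts.
gram = matrix @ matrix.T
```

Building the triples first and constructing the matrix once tends to be simpler than filling a sparse matrix entry by entry.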

  • LoneIgadzra Registered User regular
    edited December 2009
    like... how does that work

    accessing it directly

    do i need to make some kind of declaration

    Python doesn't have access permissions like Java. The closest equivalent is a naming convention: a single leading underscore marks a name as internal (and keeps a module-level name out of from module import *), while two leading underscores on a class attribute trigger name mangling (__x is stored as _ClassName__x). Neither is a permission system; anything can still be reached if you go looking for it. (That said, the convention works well, and violating encapsulation can be very tempting unless you use it.)
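
A short sketch of the underscore conventions in action; the Vocabulary class and its attribute names are made up for illustration:

```python
class Vocabulary:
    def __init__(self):
        self.words = {}        # public
        self._cache = {}       # single underscore: "internal" by convention only
        self.__index = {}      # double underscore: stored as _Vocabulary__index


vocab = Vocabulary()

cache = vocab._cache                          # nothing stops outside access
has_plain_name = hasattr(vocab, "__index")    # the unmangled name does not exist
mangled = vocab._Vocabulary__index            # the mangled name is still reachable
```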

    Java doesn't really have this, so if it hasn't sunk in yet, just take a second to contemplate that everything in Python is an object. That includes modules. (Specifically, the members of most Python objects are stored in a dictionary and can be retrieved that way, including functions, which are objects too.)
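
To make that concrete, here is a small sketch using the standard math module and a throwaway Article class (the class is hypothetical). vars(obj) returns the object's attribute dictionary:

```python
import math

# A module is an object, and its members live in a plain dict.
module_namespace = vars(math)              # same dict as math.__dict__
pi_from_dict = module_namespace["pi"]
pi_from_getattr = getattr(math, "pi")
sqrt_function = module_namespace["sqrt"]   # functions are stored there too


class Article:
    def __init__(self, title):
        self.title = title


article = Article("coin return")
instance_namespace = vars(article)         # instance attributes, also a dict
```

This is why processtext.vocabdict works once vocabdict exists at module level: the attribute lookup is a lookup in the module's dictionary.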
