python question, simple, just i'm used to java is all
I'm not sure I understand what this has to do with references. Why don't you just use processtext.vocabdict to access the dictionary? If you're concerned about privacy leaks, you could import the copy module and call copy.deepcopy(vocabdict).
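For what it's worth, here is a small sketch of the difference between handing out the dictionary itself and handing out a deep copy. The dictionary contents are made up for illustration; only copy.copy and copy.deepcopy come from the standard library.

```python
import copy

# Made-up data standing in for the real vocabulary dictionary.
vocabdict = {"arts": ["museum"]}

alias = vocabdict                  # same object, just another name for it
shallow = copy.copy(vocabdict)     # new dict, but the inner lists are shared
deep = copy.deepcopy(vocabdict)    # new dict and new inner lists

# Mutate the original after the copies were taken.
vocabdict["arts"].append("opera")
vocabdict["sports"] = ["goal"]

print(alias)    # sees both changes
print(shallow)  # sees the appended word, but not the new key
print(deep)     # sees neither change
```

So a deep copy is only needed if callers might modify the nested values; otherwise returning the dictionary from a function is enough.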
you can ignore most of the code, i know it processes the text right
#processtext.py
import csv, re

def preprocess():
    vocabdict = dict()
    sections=['arts','business','obituaries','sports','world']
    sectiondict = dict()
    #vocaboutput=file("vocabulary",'w')
    stopwords = open('stopwords', 'r').read().split()
    for section in sections:
        #set up output
        #output=file(section+"_words", 'w')
        #the files are .tsv, so split the columns on tabs
        reader = csv.reader(open("./nytimes_sections/"+section+".tsv", "rb"), delimiter='\t', quoting=csv.QUOTE_NONE)
        sectiondict[section] = []
        for article in reader:
            #lowercase
            s=article[1].lower()+' '+article[2].lower()
            #remove &_____;
            s = re.sub('&\w{5};'," ",s)
            #remove punctuation
            s = re.sub('[^A-Za-z\'\s]'," ",s)
            #remove multiple whitespaces
            s = re.sub('\s{2,}'," ",s)
            #create a list
            textwords = s.split()
            #remove stopwords
            filteredwords = [t for t in textwords if t not in stopwords]
            #remove apostrophes
            for i, word in enumerate(filteredwords):
                word = re.sub('\'',"",word)
                filteredwords[i]=word
            #filter stopwords again
            filteredwords = [t for t in filteredwords if t not in stopwords]
            #remove blank items in list
            finalwords = [t for t in filteredwords if t != '']
            #find unique words for article & overall new unique words
            articledict = dict()
            for item in finalwords:
                if item not in articledict:
                    articledict[item]=1
                if item not in vocabdict:
                    vocabdict[item]=1
            articlewords = articledict.keys()
            sectiondict[section].append(articledict)
    #return both dictionaries once every section has been processed
    return (vocabdict, sectiondict)
this is what confuses me
>>> import processtext as pt
>>> (vocabdict,sectiondict)=pt.preprocess()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable
Python doesn't have access permissions like Java. There is a rough equivalent if you prefix names with underscores: one underscore marks a name as internal by convention only, and two leading underscores on a class attribute trigger name mangling (the name is stored as _ClassName__name). This is a naming mechanism rather than a permission system. (That said, it works well, and violating encapsulation can be very tempting unless you use this feature.)
Java doesn't really have this, so if it hasn't sunk in yet, just take a second to contemplate that everything in Python is an object. That includes modules. (Specifically, the attributes of most Python objects are stored in a dictionary and can be retrieved that way, including functions, which are objects too.)
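A quick sketch of both points, in case it helps. The Vocabulary class and the demo_module name are hypothetical and exist only for this illustration; vars, hasattr and types.ModuleType are standard Python.

```python
import types


class Vocabulary(object):
    """Hypothetical class, used only to show the underscore conventions."""

    def __init__(self):
        self.words = {}        # public attribute
        self._cache = {}       # single underscore: "internal" by convention only
        self.__secret = 42     # double underscore: stored as _Vocabulary__secret


vocab = Vocabulary()

# Nothing is enforced: the "internal" attribute is reachable from outside.
cache_is_reachable = vocab._cache == {}

# The mangled name is still reachable; the plain name does not exist.
mangled_value = vocab._Vocabulary__secret
plain_name_exists = hasattr(vocab, "__secret")

# Modules are objects too, and their attributes live in a dictionary.
demo_module = types.ModuleType("demo_module")  # hypothetical module name
demo_module.vocabdict = {"word": 1}
same_object = vars(demo_module)["vocabdict"] is demo_module.vocabdict

# Functions are objects stored in the class dictionary like any other attribute.
init_is_function = isinstance(vars(Vocabulary)["__init__"], types.FunctionType)
```

Mangling only renames the attribute, so it guards against accidental clashes rather than deliberate access.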
AttributeError: 'module' object has no attribute 'vocabdict'
eh?
edit: okay, i'll read up on attributes i guess
accessing it directly
do i need to make some kind of declaration
this is what confuses me though i don't get an error if i just do
explanations? help?
>>> tupe=pt.preprocess()
>>> tupe
>>> print tupe
None
i think that will fix it
1 sec
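You should be able to get it by calling a function in processtext that returns vocabdict. Accessing it directly with processtext.vocabdict only works if vocabdict is assigned at module level; a variable created inside a function isn't an attribute of the module.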
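Here is a minimal sketch of that distinction. It builds a throwaway module from a source string so the example is self-contained; the module name processtext_demo and the shared_vocab variable are assumptions for illustration, not part of the original processtext.py.

```python
import types

# Stand-in source for a module like processtext.py.
SOURCE = '''
shared_vocab = {}            # module-level name: becomes a module attribute

def preprocess():
    vocabdict = {"word": 1}  # local name: exists only while the function runs
    shared_vocab.update(vocabdict)
    return vocabdict
'''

demo = types.ModuleType("processtext_demo")
exec(SOURCE, demo.__dict__)   # run the source inside the module's namespace

returned = demo.preprocess()

has_local = hasattr(demo, "vocabdict")      # the local never reaches the module
has_global = hasattr(demo, "shared_vocab")  # the module-level name does
```

Returning the dictionary, as preprocess() does in the posted code, is usually the cleaner of the two options because the caller doesn't depend on module state.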
If this isn't working, close and reopen your main file (mainproblem) and try it again.
i just needed to quit the interpreter and try it again
that fixed everything
so my code was right
THANK YOU
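I think import only executes a module once per interpreter session. Later imports of the same module reuse the cached copy, so edits to the file aren't picked up until you restart the interpreter or call reload(processtext).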
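A sketch of that caching behaviour, assuming the earlier None result came from an older version of the file that didn't return anything yet (a guess, since the thread doesn't show it). The module name demo_processtext is made up, and the example uses Python 3's importlib.reload; in Python 2, which this thread uses, reload is a builtin.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, body):
    """Write the hypothetical demo_processtext.py into the given directory."""
    with open(os.path.join(directory, "demo_processtext.py"), "w") as handle:
        handle.write(body)


def demo_import_caching():
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep stale .pyc files out of the demo
    with tempfile.TemporaryDirectory() as directory:
        sys.path.insert(0, directory)
        try:
            # First version of the file forgets to return anything.
            write_module(directory, "def preprocess():\n    pass\n")
            importlib.invalidate_caches()
            pt = importlib.import_module("demo_processtext")
            first = pt.preprocess()

            # Fix the file on disk, then import again: the cached module is reused.
            write_module(
                directory,
                "def preprocess():\n    return ({'word': 1}, {'arts': []})\n",
            )
            pt = importlib.import_module("demo_processtext")
            still_old = pt.preprocess()

            # reload re-executes the source, so the fix finally shows up.
            importlib.reload(pt)
            after_reload = pt.preprocess()
        finally:
            sys.path.remove(directory)
            sys.modules.pop("demo_processtext", None)
            sys.dont_write_bytecode = old_flag
    return first, still_old, after_reload


first, still_old, after_reload = demo_import_caching()
```

Unpacking that first None result into two names is what raises "TypeError: 'NoneType' object is not iterable".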
umm how inefficient is it for me to use a dictionary instead of a sparse matrix? seems easier for me to use a dict, but i was just wondering
though i might need to convert some of that to a sparse matrix for matrix multiplication sake
anyone know what the easiest way to create sparse matrix in python is? scipy?
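One possible approach, sketched in plain Python so it runs without extra packages: turn the per-article dictionaries into (row, column, value) triples. That is the same coordinate layout scipy.sparse.coo_matrix accepts as (data, (row, col)), so the triples could be handed to SciPy later. The helper names and sample data below are made up for illustration.

```python
def build_coordinates(articles, vocabulary):
    """Turn a list of {word: count} dicts into (row, column, value) triples."""
    column_of = {word: index for index, word in enumerate(sorted(vocabulary))}
    triples = []
    for row, article in enumerate(articles):
        for word, count in article.items():
            triples.append((row, column_of[word], count))
    return column_of, triples


def multiply(triples, vector, row_count):
    """Sparse matrix times dense vector, touching only the stored entries."""
    result = [0] * row_count
    for row, column, value in triples:
        result[row] += value * vector[column]
    return result


# Made-up sample data in the same shape preprocess() produces.
articles = [{"museum": 1, "opera": 1}, {"goal": 1, "opera": 1}]
vocabulary = {"museum": 1, "opera": 1, "goal": 1}

column_of, triples = build_coordinates(articles, vocabulary)
weights = [1, 10, 100]  # one weight per column: goal, museum, opera
product = multiply(triples, weights, len(articles))
print(product)  # [110, 101]
```

A dict per article is fine for building the vocabulary; converting to a proper sparse matrix mostly pays off once you need real matrix operations.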