I know I have a lot of questions about Java here, and I intend to go ask the TAs about things, but they don't have office hours for another few days and I'd like to get over my current hurdle. (Note: Still learning Java, this is my first programming class, and my first time dealing with I/O).
I'm just quite confused about different things. Most of my issues I've been able to solve via the API (yay, this thing rocks).
Okay, I guess I can tell the assignment in case it is helpful:
Write a java application that reads a file of email messages and classifies each message as spam or not. Here's how this will work:
- There will be three input files and one output file in this project:
- An input file consisting of email messages. Each message will have a unique Message Identification Number (MIN) associated with it.
- An input file with a list of keywords.
- An input file with a list of blacklisted senders.
- An output file with the list of MINs that belong to spam messages.
- Here is a sample file of email messages. Notice that each message has a BEGIN and END tag.
- Here is a sample file of keywords.
- Here is a sample file of blacklisted senders.
- Your application must accept as a command line argument the names of all three input files.
- Your application must mark as spam any messages containing a word on the keyword list in the subject or body of the message or any message sent from a sender on the blacklist. To mark a message as spam you simply need to include its MIN in the output list.
- Your application must append the email address for the sender of any mail marked spam to the blacklisted file when that address does not already appear there.
- For any message marked as spam add all words that are 6 characters or longer in the subject line to the keyword list.
- Finally, you must output a text file containing a list of all of the MINs for mail marked spam.
There are many approaches to this problem but you must use a class Message to represent email messages and a class Filter to provide the filtering functions necessary here. These are minimum requirements. Your filter class should maintain a list of keywords and a list of blacklisted email addresses and a user must be able to add words and addresses to these lists. We will discuss approaches to this problem in more detail in class.
Okay, so, first off... we haven't learned regular expressions, so that's not going to work.
Instead I'll be using Scanner objects, as it seems to be the most logical other solution.
So here are my general problems/questions (I've written some code already, but the stuff I've written I don't need any help on as of yet):
- Okay, so, I'm going to have a set of methods that start off by splitting the mail text file into several different strings, one for ID #, sender, to, subject, body... that stuff, and then use that to create several objects of type Message, which I made. Here is my issue... so I take the message text, and create a scanner object with its contents. That I can do. Then I figure that within a loop I can take it line by line into line sized scanner objects, and within that, a loop that can take it token by token... and then try to check for certain indicators that will tell me where the to, from, id #, subject, and body are.
That is my issue. Okay... I think I can work out a way to figure out the start and end of each message: when doing the loop line by line, use the String's contains method to check where the beginning and end are, and concatenate everything in between?
And about, for example, the message ID number.. how would I get that string of numbers and remove the < >'s?
Ablah, I'm asking far too much, in such an incoherent way... it will be easier to ask the TAs in person and articulate it, but it's the start of the weekend and I can't meet with them till Monday, and I'd like to work on this a bit now in case more problems come up later.
Basically, I need help pulling out certain strings from the main text body... I think I have a good idea/already written code for how to filter the messages after I do that, but this part is stumping me.
message ID number:
String MID = buffer.substring(buffer.indexOf("MIN:")+7, buffer.indexOf(">", buffer.indexOf("MIN:")+7));
Or something like that, you get the general idea.
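Here is a slightly more defensive version of the same idea, as a minimal sketch. It assumes the header line looks something like `MIN: <12345>` (the sample file isn't shown in this thread, so that layout is a guess), and the class and method names are made up for illustration. Instead of a hard-coded offset, it looks for the `<` and `>` that follow the `MIN:` label.

```java
class MinExtractor {
    /**
     * Returns the text between the first '<' after the "MIN:" label and the
     * next '>', or null if that pattern is not present.
     * The "MIN: <...>" layout is an assumption about the input file.
     */
    static String extractMin(String buffer) {
        int label = buffer.indexOf("MIN:");
        if (label < 0) {
            return null; // no MIN label in this text
        }
        int open = buffer.indexOf('<', label);
        if (open < 0) {
            return null; // label found, but no opening bracket after it
        }
        int close = buffer.indexOf('>', open + 1);
        if (close < 0) {
            return null; // opening bracket never closed
        }
        return buffer.substring(open + 1, close).trim();
    }

    public static void main(String[] args) {
        // Hypothetical message text, just to exercise the method.
        System.out.println(extractMin("BEGIN\nMIN: <12345>\nFrom: a@b.com"));
    }
}
```

Checking each `indexOf` result for -1 means a malformed message gives you a null to handle rather than a StringIndexOutOfBoundsException.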
And never forget a quick search for "classname java class" on a search engine will bring up the official javadocs for the given class.
If I'm doing a try-catch thing, with inputting some files, and say it catches an IOException and I want it to quit after printing a message... should I just use System.exit(0)? Or something else? This is not a GUI-based program, of course, or anything.
You mean something like this? (Pretending that args has a String)
Using "return" in main will exit the program, so there's no need to use System.exit(0).
And no, putting several input files in one try catch block is perfectly legitimate.
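A rough sketch of what that could look like, with all three files opened in one try block. The class name, method name, and messages are placeholders, not anything from the assignment, and it uses try-with-resources (Java 7 and later) so the Scanners are closed automatically.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class InputFiles {
    /**
     * Tries to open all three input files. Returns false (after printing a
     * message) if any of them cannot be opened.
     */
    static boolean readAll(String messagesPath, String keywordsPath, String blacklistPath) {
        try (Scanner messages = new Scanner(new File(messagesPath));
             Scanner keywords = new Scanner(new File(keywordsPath));
             Scanner blacklist = new Scanner(new File(blacklistPath))) {
            // A real program would read from the three scanners here.
            return true;
        } catch (FileNotFoundException e) {
            // FileNotFoundException is a subclass of IOException.
            System.err.println("Could not open input file: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 3) {
            System.err.println("Expected three file names: messages, keywords, blacklist");
            return;
        }
        if (!readAll(args[0], args[1], args[2])) {
            return; // leaving main ends the program; no System.exit needed
        }
        System.out.println("All input files opened.");
    }
}
```

One thing to be aware of: returning from main normally ends the program with exit status 0, and by convention 0 means success. If something outside the program needs to see that it failed, a non-zero status such as System.exit(1) is the usual way to signal that.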
And also, as an introductory programming exercise that's pretty damn difficult. The approach you're going with seems pretty good at the moment though.
Also, I would ask the TAs if they want the filter to iteratively refine the search:
What this does is that it reruns the filtering, this time using the new revised blacklists.
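To make the idea concrete, here is one possible sketch of iterative refinement. Everything in it is an assumption for illustration: the nested Message class is a bare stand-in for the assignment's Message class, the method names are invented, and words are compared as whole whitespace-separated tokens with no punctuation handling. The loop keeps re-running the filter until a full pass marks nothing new.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

class IterativeFilter {
    /** Hypothetical stand-in for the assignment's Message class. */
    static class Message {
        final String min;
        final String sender;
        final String subject;
        final String body;
        boolean spam = false;

        Message(String min, String sender, String subject, String body) {
            this.min = min;
            this.sender = sender;
            this.subject = subject;
            this.body = body;
        }
    }

    /** True if any whitespace-separated word of text is in the keyword set. */
    static boolean containsKeyword(String text, Set<String> keywords) {
        Scanner words = new Scanner(text);
        while (words.hasNext()) {
            if (keywords.contains(words.next().toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    /** Re-runs the filter until a whole pass finds no new spam; returns the spam MINs. */
    static List<String> filter(List<Message> messages, Set<String> keywords, Set<String> blacklist) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Message m : messages) {
                if (m.spam) {
                    continue; // already marked on an earlier pass
                }
                if (blacklist.contains(m.sender)
                        || containsKeyword(m.subject + " " + m.body, keywords)) {
                    m.spam = true;
                    changed = true;
                    blacklist.add(m.sender); // a Set ignores duplicates
                    Scanner subjectWords = new Scanner(m.subject);
                    while (subjectWords.hasNext()) {
                        String word = subjectWords.next().toLowerCase();
                        if (word.length() >= 6) {
                            keywords.add(word);
                        }
                    }
                }
            }
        }
        List<String> spamMins = new ArrayList<>();
        for (Message m : messages) {
            if (m.spam) {
                spamMins.add(m.min);
            }
        }
        return spamMins;
    }

    public static void main(String[] args) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message("1", "a@x.com", "hello friend", "lunch today"));
        messages.add(new Message("2", "a@x.com", "cheap watches", "buy now"));
        Set<String> keywords = new HashSet<>();
        keywords.add("watches");
        Set<String> blacklist = new HashSet<>();
        System.out.println(filter(messages, keywords, blacklist)); // prints [1, 2]
    }
}
```

In this made-up data, message 1 looks clean on the first pass and is only caught on the second, after message 2 has put its sender on the blacklist. Whether the assignment wants that behaviour is exactly the question for the TAs.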
Keep in mind, though, it might be easier to find where an error is coming from if you put them in separate blocks.
My code is a bit gross and convoluted, but... for now it's the best I can muster.
So, it was working properly first at identifying which mails were spam and marking them so... but then I had an issue where I realized that the Scanner, of course, does not reset itself to the beginning, and had to change a bunch of code so that new Scanners were made for different methods that wanted text from the same bit, and just made new Scanners out of strings.
This worked perfectly now in that it would not add duplicate words to the files, and they would output properly and nicely and neatly. Unfortunately, somehow, now EVERYTHING is marked as spam... and I'm trying to use the debugger to figure out where this is happening, but I'm not very good at using that... -sigh-
So goddamn close.. yet so far.
If anyone wants to look at my terrible, gross, confusing, convoluted code, here it is... but I do not recommend it. Don't say I didn't tell you so.
At this point, after not having slept for forever, and it being 6AM, I really don't give a crap about how messy and how bad style it is. I just want it to work. Woe is me.
Now I truly know what it is like to procrastinate and suffer. This was not a one night project, by far, for a noob like me.
Second, have you learned about foreach, ArrayList, and generics? Those three things can make coding this project a bit more straightforward. If you want to talk about it, PM me, and I'll explain.
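For a taste of what those three look like together, here is a small sketch of a Filter-style class. The method names are invented for illustration and the assignment may want something different. `List<String>` is the generics part, `ArrayList` is the growable list, and the `for (String address : blacklist)` loops are foreach.

```java
import java.util.ArrayList;
import java.util.List;

class Filter {
    // Generics: these lists can only hold Strings, so no casting is needed.
    private final List<String> keywords = new ArrayList<>();
    private final List<String> blacklist = new ArrayList<>();

    /** Adds a keyword (stored in lower case) unless it is already present. */
    void addKeyword(String word) {
        String lower = word.toLowerCase();
        if (!keywords.contains(lower)) {
            keywords.add(lower);
        }
    }

    /** Adds a sender address unless it is already on the blacklist. */
    void addSender(String address) {
        if (!isBlacklisted(address)) {
            blacklist.add(address);
        }
    }

    /** foreach loop: visits every address without managing an index. */
    boolean isBlacklisted(String sender) {
        for (String address : blacklist) {
            if (address.equalsIgnoreCase(sender)) {
                return true;
            }
        }
        return false;
    }

    /** True if the given single word matches a stored keyword. */
    boolean isKeyword(String word) {
        for (String keyword : keywords) {
            if (keyword.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    int blacklistSize() {
        return blacklist.size();
    }

    public static void main(String[] args) {
        Filter filter = new Filter();
        filter.addSender("spam@example.com");
        filter.addSender("SPAM@example.com"); // ignored: same address, different case
        System.out.println(filter.blacklistSize());
    }
}
```

Compared with plain arrays, the lists grow as needed, which helps here because the keyword list and blacklist get longer while the program runs.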
Okay, simple question that would help: How do I track one private instance variable from an object in an array with the debugger?
EDIT: Scratch the question, figured it out... will try and solve problem.
So, it turns out using indexOf with nextLine to check if something existed was a bad combination, because if the line was blank, then every String contained it. Hence, everything was returned as false.
I just switched it to dual while loops and checked each token one by one and it worked.
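For anyone who hits the same thing, here is one plausible reconstruction of the problem. The real code isn't shown in the thread, so the class, method names, and sample text are all made up. The key fact is that `indexOf("")` returns 0 for any string, so a blank line read with `nextLine()` "matches" everything.

```java
import java.util.Scanner;

class BlankLinePitfall {
    /** Buggy idea: treats every line of the keyword list as a keyword, blank lines included. */
    static boolean containsAnyLine(String text, String keywordList) {
        Scanner lines = new Scanner(keywordList);
        while (lines.hasNextLine()) {
            String keyword = lines.nextLine();
            if (text.indexOf(keyword) >= 0) {
                return true; // a blank line is "" and is found at index 0
            }
        }
        return false;
    }

    /** Token-by-token check with two nested loops; blank lines produce no tokens. */
    static boolean containsAnyToken(String text, String keywordList) {
        Scanner keywords = new Scanner(keywordList);
        while (keywords.hasNext()) {
            String keyword = keywords.next();
            Scanner words = new Scanner(text);
            while (words.hasNext()) {
                if (words.next().equalsIgnoreCase(keyword)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String keywordList = "viagra\n\nlottery\n"; // note the blank line in the middle
        String text = "Lunch on Friday?";
        System.out.println(containsAnyLine(text, keywordList));  // prints true
        System.out.println(containsAnyToken(text, keywordList)); // prints false
    }
}
```

Another way out, if you want to keep the line-based version, would be to skip any line where `keyword.trim().isEmpty()` is true before calling `indexOf`.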
IMHO System.exit should be used sparingly, for things like startup failure of an application, non-recoverable failures of a large application (like database connection issues, assuming the connection is needed), etc. Almost all the time, there are better ways of managing failure, like returning from a method, throwing an exception, logging the error, and/or informing the user.