
Question about I/O in Java.. or, perhaps, Scanners and Strings..

Shazkar Shadowstorm Registered User regular
edited April 2007 in Help / Advice Forum
I know I have a lot of questions about Java here, and I intend to go ask the TAs about things, but they don't have office hours for another few days and I'd like to get over my current hurdle. (Note: Still learning Java, this is my first programming class, and my first time dealing with I/O).

I'm just quite confused about different things. Most of my issues I've been able to solve via the API (yay, this thing rocks).

Okay, I guess I can tell the assignment in case it is helpful:

Write a java application that reads a file of email messages and classifies each message as spam or not. Here's how this will work:
  • There will be three input files and one output file in this project:
    • An input file consisting of email messages. Each message will have a unique Message Identification Number (MIN) associated with it.
    • An input file with a list of keywords.
    • An input file with a list of blacklisted senders.
    • An output file with the list of MINs that belong to spam messages.
    • Here is a sample file of email messages. Notice that each message has a BEGIN and END tag.
    • Here is a sample file of keywords.
    • Here is a sample file of blacklisted senders.
  • Your application must accept as a command line argument the names of all three input files.
  • Your application must mark as spam any messages containing a word on the keyword list in the subject or body of the message or any message sent from a sender on the blacklist. To mark a message as spam you simply need to include its MIN in the output list.
  • Your application must append the email address for the sender of any mail marked spam to the blacklisted file when that address does not already appear there.
  • For any message marked as spam add all words that are 6 characters or longer in the subject line to the keyword list.
  • Finally, you must output a text file containing a list of all of the MINs for mail marked spam.
There are many approaches to this problem but you must use a class Message to represent email messages and a class Filter to provide the filtering functions necessary here. These are minimum requirements. Your filter class should maintain a list of keywords and a list of blacklisted email addresses and a user must be able to add words and addresses to these lists. We will discuss approaches to this problem in more detail in class.

Okay, so, first off... we haven't learned regular expressions, so that's not going to work.
Instead I'll be using Scanner objects, as it seems to be the most logical other solution.

So here are my general problems/questions (I've written some code already, but the stuff I've written I don't need any help on as of yet):
- Okay, so, I'm going to have a set of methods that start off by splitting the mail text file into several different strings, one for ID #, sender, to, subject, body... that stuff, and then use that to create several objects of type Message, which I made. Here is my issue... so I take the message text, and create a scanner object with its contents. That I can do. Then I figure that within a loop I can take it line by line into line sized scanner objects, and within that, a loop that can take it token by token... and then try to check for certain indicators that will tell me where the to, from, id #, subject, and body are.
That is my issue. Okay... I think I can work out a way to figure out the start and end of each message: when doing the loop line by line, use the String's contains method to check where the beginning and end are, and concatenate everything in between?
And about, for example, the message ID number.. how would I get that string of numbers and remove the < >'s?

Ablah, I'm asking far too much, in such an incoherent way... it will be easier to ask the TAs in person and articulate it, but it's the start of the weekend and I can't meet with them till Monday, and I'd like to work on this a bit now in case more problems come up later.

Basically, I need help pulling out certain strings from the main text body... I think I have a good idea/already written code for how to filter the messages after I do that, but this part is stumping me.


Posts

  • Valkun Registered User regular
    edited April 2007
    Are you not allowed to use String.indexOf()/related functions or you simply haven't learned how to use them? Because the solution to this is stupidly easy if you just copy the substring between <begin> and <end> and check that substring for the index of each of the blacklisted words / blocked addresses. Of course, if you're not allowed to use methods from the String class this entire exercise is considerably more menial.

    message ID number:

    String MID = buffer.substring(buffer.indexOf("MIN:")+7, buffer.indexOf(">", buffer.indexOf("MIN:")+7));

    Or something like that, you get the general idea.
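
A minimal runnable sketch of that idea, assuming the ID line looks something like `MIN: <12345>` (the sample file isn't shown in this thread, so that layout is a guess, and the class name `MinExtractor` and helper `extractBetween` are made up for illustration). Searching for the brackets themselves avoids hard-coding an offset such as +7:

```java
class MinExtractor {

    // Returns the text between the first `open` found after `label`
    // and the next `close`, or null if any of the three is missing.
    static String extractBetween(String text, String label, String open, String close) {
        int labelAt = text.indexOf(label);
        if (labelAt < 0) {
            return null;
        }
        int start = text.indexOf(open, labelAt + label.length());
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = text.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        // Assumed message layout, for illustration only.
        String buffer = "Subject: hello\nMIN: <12345>\nbody text";
        System.out.println(extractBetween(buffer, "MIN:", "<", ">")); // prints 12345
    }
}
```

Returning null when a marker is missing is just one choice; it keeps a malformed message from throwing a StringIndexOutOfBoundsException out of substring.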

    And never forget a quick search for "classname java class" on a search engine will bring up the official javadocs for the given class.

  • Shazkar Shadowstorm Registered User regular
    edited April 2007
    No, no, thanks, I wasn't taught that in class, but we are definitely allowed to use it. I was, in fact, spending half of last night looking at the javadoc for String, and trying to figure out how to do it... I will look at what you said and look at the API a bit more and see if I can't make this work.

  • dsplaisted Registered User regular
    edited April 2007
    I know it's just a programming assignment for a class, but *man*, that is a really bad spam filtering algorithm. Before too long everything will probably be marked as spam.

  • Shazkar Shadowstorm Registered User regular
    edited April 2007
    Question, unrelated to above problems...

    If I'm doing a try-catch thing, with inputting some files, and say it catches an IOException and i want it to quit after printing a message... should I just use System.exit(0); ?? Or something else. This is not a GUI based program of course, or anything.

  • Mr.FragBait Registered User regular
    edited April 2007
    Question, unrelated to above problems...

    If I'm doing a try-catch thing, with inputting some files, and say it catches an IOException and i want it to quit after printing a message... should I just use System.exit(0); ?? Or something else. This is not a GUI based program of course, or anything.

    You mean something like this?
    Scanner fileIn = null;
    try {
        fileIn = new Scanner(new File(args[0]));
    } catch (IOException e) {
        System.out.println("Cannot find input file " + args[0]);
        return;
    }
    
    (Pretending that args has a String)

    Using "return" for main will exit it, no need to use System.exit(0)

  • Shazkar Shadowstorm Registered User regular
    edited April 2007
    Thanks.. um, also, is it wrong for me to put several input files in one try method?

  • Seydlitz Registered User regular
    edited April 2007
    I'm going to disagree with Fragbait and say use a System.exit(1) call; it's good practice to get into, and simply using return; doesn't allow you to return an exit status other than zero (a non-zero status means an error has occurred, if the shell actually cares about such things).

    And no, putting several input files in one try catch block is perfectly legitimate.

    And also, as an introductory programming exercise that's pretty damn difficult. The approach you're going with seems pretty good at the moment though.

  • Mr.FragBait Registered User regular
    edited April 2007
    I won't disagree with you, using System.exit(*non-zero number*) is good practice when handling errors. In his case, all he's doing is writing a simple main for a homework (I'm guessing without constraints like exiting with error codes) that will print an error to standard out and exit, so using return is fine. Plus he was asking if there was an alternative to using System.exit(0), which is what return is in this case.
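
For what it's worth, the two positions can be combined. The sketch below is only an illustration (the class name `ExitDemo` and the method `run` are made up, and try-with-resources needs Java 7 or later): the work happens in a method that returns a status code, and only `main` decides whether to hand a non-zero status to the shell.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ExitDemo {

    // Returns 0 on success and 1 if no file name was given
    // or the input file cannot be opened.
    static int run(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java ExitDemo <input file>");
            return 1;
        }
        try (Scanner fileIn = new Scanner(new File(args[0]))) {
            while (fileIn.hasNextLine()) {
                System.out.println(fileIn.nextLine());
            }
            return 0;
        } catch (FileNotFoundException e) {
            System.err.println("Cannot find input file " + args[0]);
            return 1;
        }
    }

    public static void main(String[] args) {
        int status = run(args);
        if (status != 0) {
            System.exit(status); // non-zero tells the shell something failed
        }
        // Returning normally from main ends the program with status 0
        // once no other non-daemon threads are running.
    }
}
```

Keeping System.exit in one place also means `run` can be called from other code without killing the JVM.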

  • AngelHedgie Registered User regular
    edited April 2007
    Hmm...one thing I would do is use generic ArrayLists to store the Message, keyword and address lists - this way, you can just use foreach loops, and avoid worrying about fencepost errors and type safety.

    Also, I would ask the TAs if they want the filter to iteratively refine the search:
    
    // needs: import java.util.ArrayList;
    ArrayList<Message> messageList = new ArrayList<Message>();
    ArrayList<String> keywordList = new ArrayList<String>();
    ArrayList<String> addressList = new ArrayList<String>();
    
    // code to import data
    
    Filter spamFilter = new Filter();
    boolean foundSpam = false;
    
    // Keep filtering until a pass flags no new messages.
    do
    {
       foundSpam = spamFilter.filterSpam(messageList, keywordList, addressList);
    }
    while (foundSpam);
    
    // inside the Filter class
    public boolean filterSpam(ArrayList<Message> messageList, ArrayList<String> keywordList, ArrayList<String> addressList)
    {
       boolean foundSpam = false;
       // scanning code - no, I'm not giving this to you (but I'll give hints)
       // when a message that wasn't already marked gets flagged, set foundSpam to true
       return foundSpam;
    }
    
    

    This reruns the filtering with the newly revised keyword list and blacklist, repeating until a pass finds no new spam.

  • MasterDebater Registered User regular
    edited April 2007
    Thanks.. um, also, is it wrong for me to put several input files in one try method?

    Keep in mind, though, it might be easier to find where an error is coming from if you put them in separate blocks.
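
One way to get that per-file error reporting without three copies of the same try block is a small helper. This is only a sketch: the class name `OpenInputs`, the helper `open`, and the role labels are made up for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class OpenInputs {

    // Opens one file; on failure, names the file that could not be
    // opened and returns null.
    static Scanner open(String role, String path) {
        try {
            return new Scanner(new File(path));
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open " + role + " file: " + path);
            return null;
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("Expected 3 file names, got " + args.length);
            return;
        }
        Scanner mail = open("mail", args[0]);
        Scanner keywords = open("keyword", args[1]);
        Scanner blacklist = open("blacklist", args[2]);
        if (mail == null || keywords == null || blacklist == null) {
            return; // the message above already says which file was the problem
        }
        // ... continue with the three scanners ...
    }
}
```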

  • Shazkar Shadowstorm Registered User regular
    edited April 2007
    Well, it looks like it works perfectly right now, except one little problem. Alas, I am having trouble figuring out where... I really need to learn the debugger.

    My code is a bit gross and convoluted, but... for now it's the best I can muster.


    So, it was working properly first at identifying which mails were spam and marking them so... but then I had an issue where I realized that the Scanner, of course, does not reset itself to the beginning, and had to change a bunch of code so that new Scanners were made for different methods that wanted text from the same bit, and just made new Scanners out of strings.
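
A small sketch of that Scanner behaviour (the class name `RescanDemo` and the sample words are made up for illustration): once a Scanner has been read to the end it has nothing left, while a new Scanner built over the same String starts again from the beginning.

```java
import java.util.Scanner;

class RescanDemo {

    // Counts the whitespace-separated tokens in text using a fresh Scanner.
    static int countTokens(String text) {
        Scanner scan = new Scanner(text);
        int count = 0;
        while (scan.hasNext()) {
            scan.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String keywords = "free\nlottery\nwinner";

        Scanner once = new Scanner(keywords);
        while (once.hasNextLine()) {
            once.nextLine(); // the first pass consumes every line
        }
        System.out.println(once.hasNextLine()); // false: nothing left to read

        // Each call builds a new Scanner, so each one sees all three words.
        System.out.println(countTokens(keywords)); // 3
        System.out.println(countTokens(keywords)); // 3
    }
}
```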

    This worked perfectly now in that it would not add duplicate words to the files, and they would output properly and nicely and neatly. Unfortunately, somehow, now EVERYTHING is marked as spam... and I'm trying to use the debugger to figure out where this is happening, but I'm not very good at using that... -sigh-

    So goddamn close.. yet so far.

    If anyone wants to look at my terrible, gross, confusing, convoluted code, here it is... but I do not recommend it. Don't say I didn't tell you so.
    //*******************************
    // SpamChecker.java
    // by jkfdjkdsfjald
    //
    //
    //*******************************
    
    import java.io.IOException;
    import java.util.Scanner;
    import java.io.File;
    import java.io.FileWriter;
    
    public class SpamChecker{
       
        public static void main(String[] args){
           
           // Check to make sure the right number of command line
           // arguments are input.
           if(args.length != 3)
           {
               System.out.println("Error in number of arguments!");
               System.out.println("Length was: " + args.length);
               return;
           }
            try {
                
                
                // Create Scanners to read the input text files.
                Scanner mailFile = new Scanner(new File(args[0]));
                Scanner keywordFileTemp = new Scanner(new File(args[1]));
                Scanner blacklistFileTemp = new Scanner(new File(args[2]));
                
                // Need to read keyword list and blacklist into Strings to pass
                // into Filter object:
                
                String keywordFile = "";
                String textLine = "";
                while(keywordFileTemp.hasNextLine()){
                    textLine = keywordFileTemp.nextLine();
                    keywordFile += "\n" + textLine;
                }
                
                String blacklistFile = "";
                textLine = "";
                while(blacklistFileTemp.hasNextLine()){
                    textLine = blacklistFileTemp.nextLine();
                    blacklistFile += "\n" + textLine;
                }
                
                
                // Create MailReader to separate e-mail messages:
                MailReader theReader = new MailReader();
                theReader.makeArray(mailFile);
                
                // Create spam filter to filter messages:
                Filter spamFilter = new Filter(keywordFile, blacklistFile);
                spamFilter.checkMailArray(theReader.getMessageArray(),
                        theReader.getCounter());
                
                // Display information about the files being written:
                System.out.println("These are the ID numbers of messages marked"
                        + " as spam:");
                System.out.println(spamFilter.getOutputIDList());
                
                System.out.println("These are the keywords to be added to the"
                        + " keyword list.");
                System.out.println(spamFilter.getKeywordAppend());    
                
                System.out.println("These are the addresses to be added to the"
                        + " blacklist.");
                System.out.println(spamFilter.getBlacklistAppend());
                     
                // Write output.txt
                FileWriter output = new FileWriter("output.txt", false);
                output.write(spamFilter.getOutputIDList());
                output.close();
                
                // Append keywords to keyword file.
                FileWriter keywordOut = new FileWriter(args[1], true);
                keywordOut.write(spamFilter.getKeywordAppend());
                keywordOut.close();
                
                // Append addresses to blacklist file.
                FileWriter blacklistOut = new FileWriter(args[2], true);
                blacklistOut.write(spamFilter.getBlacklistAppend());
                blacklistOut.close();
                
            } catch (IOException ex) {
                System.out.println("Problem with file reading/writing:");
                ex.printStackTrace();
                return;
            } 
        }
    }
    
    
    //*******************************
    // Filter.java
    //  by dflsmlkfj
    //
    // A filter class that manages
    // the filtering aspects of the
    // spam checking program.
    //*******************************
    
    import java.util.Scanner;
    
    public class Filter {
        
        Message mailMessage;
        String keywordFilter;
        String blacklistFilter;
        String outputIDList;
        String blacklistAppend;
        String keywordAppend;
        
        //---------------------------------------
        // Creates filter object and initializes
        // the variables.
        //---------------------------------------
        public Filter(String keywordTemp, String blacklistTemp){
            keywordFilter = keywordTemp;
            blacklistFilter = blacklistTemp;
            outputIDList = "";
            blacklistAppend = "";
            keywordAppend = "";
        }
        
        //----------------------------------------
        // Uses two other private methods
        // to determine if a message is spam,
        // and if so, marks that message as spam.
        //----------------------------------------
        public void checkMailArray(Message[] mailArray, int number){
            for(int i=0; i<=number; i++){
                if((checkBlackList(mailArray[i].getFrom())) || 
                    (checkKeywordList(mailArray[i].getSubject())) ||
                        (checkKeywordList (mailArray[i].getBody()))){
                    mailArray[i].setSpam(true);
                }
            }
            createOutput(mailArray, number);
        }
        
        //-------------------------------------------
        // Checks sender address with blacklist and
        // returns a boolean that depends on if
        // there is a match or not.
        //-------------------------------------------
        private boolean checkBlackList(String address){
            boolean found = false;
            Scanner blackScan = new Scanner(blacklistFilter);
            while((blackScan.hasNextLine()) && (found == false)){
                String badAddress = blackScan.nextLine();
                if (badAddress.equalsIgnoreCase(address)){
                    found = true;
                }
            }
            return found;
        }
        
        //-------------------------------------------
        // Checks body and subject with keywords and
        // returns a boolean that depends on if
        // there is a match or not.
        //-------------------------------------------
        private boolean checkKeywordList(String text){
            boolean found = false;
            Scanner keyScan = new Scanner(keywordFilter);
            while((keyScan.hasNextLine()) && (found == false)){
                String badWord = keyScan.nextLine();
                if (text.indexOf(badWord) != -1){
                   found = true;
                }
            }
            return found;
        }
        
        //---------------------------------------
        // Adds the message ID to output String
        // if the message is marked as spam;
        // Invokes appendBlacklist for senders
        // of messages marked spam.
        // Invokes appendKeyword for tokens of
        // 6 or more characters in the subject
        // line of e-mails marked as spam.
        //---------------------------------------
        private void createOutput(Message[] mailArray, int number){
            for(int i=0; i<=number; i++){
                if (mailArray[i].getSpam()){
                    // Adds ID to output list.
                    outputIDList += mailArray[i].getID() + "\n";
                    // Invokes appendBlacklist for sender.
                    appendBlacklist(mailArray[i].getFrom());
                    Scanner subjScan = new Scanner(mailArray[i].getSubject());
                    // Invokes appendKeyword for tokens of 6 or more characters.
                    while(subjScan.hasNext()){
                       String subjWord = subjScan.next();
                       if (subjWord.length() > 5){
                           appendKeyword(subjWord);
                       }
                    }
                }
            }
        }
        
        //----------------------------------------
        // Adds an address to a list of addresses
        // to be appended to blacklist file,
        // after making sure it is not already
        // in the list.
        //----------------------------------------
        public void appendBlacklist(String address){
            boolean found = false;
            Scanner blackScan = new Scanner(blacklistFilter);
            while ((blackScan.hasNextLine()) && (found==false)){
                String badAddress = blackScan.nextLine();
                if (address.equalsIgnoreCase(badAddress)){
                    found = true;
                }
            }
            if (found != true){
                blacklistAppend += address + "\n";
            }
        }
        
        //----------------------------------------
        // Adds a keyword to a list of keywords
        // to be appended to keyword file,
        // after making sure it is not already
        // in the list.
        //----------------------------------------
        public void appendKeyword(String keyword){
            boolean found = false;
            Scanner keyScan = new Scanner(keywordFilter);
            while ((keyScan.hasNextLine()) && (found==false)){
                String badWord = keyScan.nextLine();
                if (keyword.equalsIgnoreCase(badWord)){
                    found = true;
                }
            }
            if (found != true){
                keywordAppend += keyword + "\n";
            }
        }
        
        //---------------------------------
        // Returns list of ID numbers for
        // messages marked as spam.
        //---------------------------------
        public String getOutputIDList(){
            return outputIDList;
        }
        
        //-----------------------------
        // Returns a String of the 
        // addresses to be appended
        // to the blacklist.
        //-----------------------------
        public String getBlacklistAppend(){
            return blacklistAppend;
        }
        
        //-----------------------------
        // Returns a String of the 
        // keywords to be appended
        // to the keyword list.
        //-----------------------------
        public String getKeywordAppend(){
            return keywordAppend;
        }
        
    }
    
    
    
    //********************************
    // MailReader.java
    //  by fkljfklslk
    //
    // A class that deals with taking
    // a text file of e-mails and 
    // separating them into separate
    // Message objects.
    //********************************
    
    import java.util.Scanner;
    
    public class MailReader {
        
        int counter;
        Message[] mailArray;
        
        //---------------------------------------
        // Creates mail reader object and 
        // initializes the variables.
        //---------------------------------------
        public MailReader() {
            counter = 0;
            mailArray = new Message[100];
        }
        
        //------------------------------------
        // Separates the text file into 
        // individual messages, extracting
        // the sender, subject, ID number, and
        // the body into separate Strings.
        //------------------------------------
        public void makeArray(Scanner mailScan){
            while(mailScan.hasNextLine()){
                // Skip to From: line.
                mailScan.nextLine(); 
                // Extract address
                String fromTemp = mailScan.findInLine("<([A-Z[a-z[0-9[.]]]]+)@([A-Z[a-z[0-9[.]]]]+)>");
                String from = fromTemp.substring(1, fromTemp.length()-1);
                //System.out.println(from); //***************
                
                // Skip to subject line.
                mailScan.nextLine();
                mailScan.nextLine();
                mailScan.nextLine();
    
                // Extract subject
                String tempSubject = mailScan.nextLine();
                String subject = tempSubject.substring(9);
                //System.out.println(subject); //***************
                
                // Skip to MIN # line & extract MIN #
                String msgIDtemp = mailScan.nextLine();
                String msgID = msgIDtemp.substring(6, msgIDtemp.length()-1);
                //System.out.println(msgID); //***************
                
                // Skip to body.
                mailScan.nextLine();
                // Extract the body:
                String msgBodyTemp = "";
                String textLine = "";
                while(textLine.equals("<END>")==false){
                    textLine = mailScan.nextLine();
                    msgBodyTemp += "\n" + textLine;
                }
                String msgBody = msgBodyTemp.substring(0, msgBodyTemp.length() - 5);
                //System.out.println("**** \n"+msgBody); //***************
            
                mailArray[counter] = new Message(from, subject, msgID, msgBody);
                counter++;
            }
        }
        
        //-----------------------------------
        // Returns the counter (so that the 
        // last non-null index in the array
        // can be known).
        //-----------------------------------
        public int getCounter(){
            return (counter - 1);
        }
    
        
        //---------------------------------------
        // Returns the entire message array.
        //---------------------------------------
        public Message[] getMessageArray(){
            return mailArray;
        }
        
        
    }
    
    
    //*******************************
    // Message.java
    //  by jfkdslfjkdlsfj
    //
    // A class for an object that 
    // represents an e-mail message.
    //*******************************
    
    public class Message {
        
        private String from;
        private String subject;
        private String body;
        private String messageID;
        
        private boolean spam;
        
        //---------------------------------------
        // Creates an instance of the message
        // object and initializes the variables.
        //---------------------------------------
        public Message(String whoFrom, String theSubject, String theMessageID,
                String theBody) {
            
            spam = false;
            
            from = whoFrom;
            subject = theSubject;
            messageID = theMessageID;
            body = theBody;        
        }
        
        //----------------------------
        // Mutator method for 'spam'
        //----------------------------
        public void setSpam(boolean isSpam) {
            spam = isSpam;
        }
        
        //---------------------------------
        // Accessor method for 'messageID'
        //---------------------------------
        public String getID() {
            return messageID;
        }    
        
        //----------------------------
        // Accessor method for 'from'
        //----------------------------    
        public String getFrom() {
            return from;
        }
        
        //------------------------------
        // Accessor method for 'subject'
        //------------------------------
        public String getSubject() {
            return subject;
        }
        
        //----------------------------
        // Accessor method for 'body'
        //----------------------------
        public String getBody() {
            return body;
        }
        
        //----------------------------
        // Accessor method for 'spam'
        //----------------------------
        public boolean getSpam() {
            return spam;
        }
        
    }
    
    

    At this point, after not having slept for forever, and it being 6AM, I really don't give a crap about how messy and how bad style it is. I just want it to work. Woe is me.

    Now I truly know what it is like to procrastinate and suffer. This was not a one night project, by far, for a noob like me.

  • AngelHedgie Registered User regular
    edited April 2007
    First, calm down. You're going to code badly if you're upset. Take it from someone who codes for a living.

    Second, have you learned about foreach, ArrayList, and generics? Those three things can make coding this project a bit more straightforward. If you want to talk about it, PM me, and I'll explain.
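
As a rough taste of what those three features look like together, here is a sketch only (the class name `KeywordListDemo`, the helper `containsAny`, and the sample keywords are invented, and this is not the assignment's required Filter class). A generic list holds the keywords and a for-each loop walks it with no index bookkeeping:

```java
import java.util.ArrayList;
import java.util.List;

class KeywordListDemo {

    // Returns true if any keyword occurs in text, ignoring case.
    // Empty keywords are skipped because every String contains "".
    static boolean containsAny(String text, List<String> keywords) {
        String lower = text.toLowerCase();
        for (String keyword : keywords) {
            if (!keyword.isEmpty() && lower.contains(keyword.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> keywords = new ArrayList<String>();
        keywords.add("lottery");
        keywords.add("winner");

        System.out.println(containsAny("You are a WINNER", keywords)); // true
        System.out.println(containsAny("Lunch on Monday?", keywords)); // false
    }
}
```

Because the list is declared as `List<String>`, the compiler rejects attempts to add anything that isn't a String, and the list grows as needed instead of being capped at a fixed array size.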

  • Shazkar Shadowstorm Registered User regular
    edited April 2007
    Alas, we have not learned those. I definitely would like to know more about them, but sadly this project is due too soon for them to be a part of it. However, I'll drop you a PM after I finish this so I can see how I may have done this differently afterwards.

    Okay, simple question that would help: How do I track one private instance variable from an object in an array with the debugger?

    EDIT: Scratch the question, figured it out... will try and solve problem.

  • Shazkar Shadowstorm Registered User regular
    edited April 2007
    YATTA!

    So, it turns out using indexOf with nextLine to check if something existed was a bad combination, because if the line was blank, then every String contained it. Hence, everything was flagged as spam.

    I just switched it to dual while loops and checked each token one by one and it worked.
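
The effect is easy to reproduce in isolation. The sketch below is one reading of the problem and the fix, not the actual revised code (the class `BlankLineDemo` and both method names are made up): `indexOf("")` returns 0 for any String, so a blank line in the keyword text matches every message, whereas `Scanner.next()` skips whitespace and never hands back an empty token.

```java
import java.util.Scanner;

class BlankLineDemo {

    // Line-based check, as in the posted Filter: a blank line matches anything.
    static boolean matchesByLine(String text, String keywordList) {
        Scanner lines = new Scanner(keywordList);
        while (lines.hasNextLine()) {
            if (text.indexOf(lines.nextLine()) != -1) {
                return true;
            }
        }
        return false;
    }

    // Token-based check: next() skips whitespace, so it never returns "".
    static boolean matchesByToken(String text, String keywordList) {
        Scanner words = new Scanner(keywordList);
        while (words.hasNext()) {
            if (text.indexOf(words.next()) != -1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Built the way the posted main does it ("\n" + line),
        // so the keyword text starts with a blank line.
        String keywordList = "\n" + "lottery" + "\n" + "winner";
        String harmless = "Lunch on Monday?";

        System.out.println("abc".indexOf(""));                     // 0
        System.out.println(matchesByLine(harmless, keywordList));  // true
        System.out.println(matchesByToken(harmless, keywordList)); // false
    }
}
```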

  • mindlar Registered User regular
    edited April 2007
    Regarding System.exit(value): its usage should generally be avoided. The only time it should be used is if the program that is being written ends up in a horribly broken state. It is easiest to avoid developing bad habits early.

    IMHO it should be used sparingly for things like startup failure of an application, non-recoverable failures of a large application (like database connection issues assuming they are needed), etc. Almost all the time, there are better ways of managing failure like returning from a method, throwing an exception, logging the error, and/or informing the user.
