So I'm starting a new project, which will require a lot of C++ coding. That's no problem. However, it will also require some specialized resources and libraries. I'm free to use whatever I want, but I'm not familiar with any of the appropriate tools. So I'd like some advice from more seasoned users on what to use or avoid in each case.
1. SQL
I'll need to design, implement and use a database in this program. MS Visual Studio 2005 comes with a version of SQL Server, which I guess I'll be using. But I've never handled databases before, much less with SQL. Does anyone know a good beginner tutorial on how to create databases and how to build interfaces to them in C++?
2. Natural language library
My program will need to handle English text documents, so I need a good C++ library for handling words and sentences. Preferably one that comes with a word stemmer and a configurable stopword-removal function.
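To show roughly what "stopword removal plus stemming" involves, here is a minimal C++ sketch under stated assumptions. The stopword list is read from any stream (for example, a file with one word per line), which is what makes it configurable. The stemmer is deliberately crude suffix stripping, NOT the Porter stemmer a real library would provide; all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Loads whitespace-separated stopwords (e.g. one per line) from any stream,
// so the list can live in an external file instead of being hard-coded.
std::set<std::string> loadStopwords(std::istream& in) {
    std::set<std::string> words;
    std::string w;
    while (in >> w) {
        std::transform(w.begin(), w.end(), w.begin(),
                       [](unsigned char c) { return std::tolower(c); });
        words.insert(w);
    }
    return words;
}

// Crude suffix stripping, NOT the Porter algorithm: removes a few common
// English endings, and only from words long enough to keep a 3-letter stem.
std::string naiveStem(const std::string& word) {
    static const char* suffixes[] = {"ing", "ed", "es", "s"};
    for (const char* suf : suffixes) {
        std::string s(suf);
        if (word.size() > s.size() + 2 &&
            word.compare(word.size() - s.size(), s.size(), s) == 0) {
            return word.substr(0, word.size() - s.size());
        }
    }
    return word;
}

// Lower-cases the sentence, splits on non-letters, drops stopwords,
// and stems whatever remains.
std::vector<std::string> processSentence(const std::string& sentence,
                                         const std::set<std::string>& stop) {
    std::vector<std::string> out;
    std::string cur;
    auto flush = [&]() {
        if (!cur.empty() && stop.count(cur) == 0) out.push_back(naiveStem(cur));
        cur.clear();
    };
    for (unsigned char c : sentence) {
        if (std::isalpha(c)) cur += static_cast<char>(std::tolower(c));
        else flush();
    }
    flush();
    return out;
}
```

With the stopwords `the`, `and`, `is`, the sentence "The cats and dogs are running." becomes `cat`, `dog`, `are`, `runn`. The last result shows why a real stemmer is worth using.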
3. Wiki code parser
In addition to English, my program will need to handle Wiki-style pages. This will require parsing documents and separating real text from markup like [[this]] or {{this}} or '''this'''. Does a tool exist to do this?
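For a rough idea of what separating text from markup looks like, here is a minimal C++ sketch using `std::regex`. It assumes only the three constructs mentioned above (links, templates, bold/italic quotes); real MediaWiki markup (tables, references, nested links inside templates) needs a proper parser. The function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rough wikitext-to-plain-text pass. Handles only [[links]], {{templates}}
// and ''quote'' formatting.
std::string stripWikiMarkup(std::string text) {
    // {{template ...}} is removed entirely. The pattern matches innermost
    // templates, so repeat until nothing changes to handle nesting.
    static const std::regex templ(R"(\{\{[^{}]*\}\})");
    std::string prev;
    do {
        prev = text;
        text = std::regex_replace(text, templ, "");
    } while (text != prev);

    // [[target|label]] -> label, then [[target]] -> target.
    static const std::regex piped(R"(\[\[[^\]|]*\|([^\]]*)\]\])");
    text = std::regex_replace(text, piped, "$1");
    static const std::regex link(R"(\[\[([^\]]*)\]\])");
    text = std::regex_replace(text, link, "$1");

    // '''bold''' and ''italic'': drop runs of two or more quotes, keep words.
    static const std::regex quotes("'{2,}");
    text = std::regex_replace(text, quotes, "");
    return text;
}
```

For example, `'''Paris''' is the [[capital city|capital]] of [[France]].{{citation needed}}` comes out as `Paris is the capital of France.`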
Thank you in advance.
4. C++ to C#
Does anyone have a brief tutorial about teaching C# to seasoned C++ programmers?
5. 1, 2 and 3 redux
I'll need an SQL interfacing tutorial, an English handling library, and a Wiki code parser for C#.
I haven't gotten around to actually using it yet, but I've heard good things about it from friends.
Here's a link to some open source wiki engines in C# (http://csharp-source.net/open-source/wiki-engines). You should be able to pull out the tag processing code from one of them.
SQL access in C# is pretty easy, but I don't have a single source that I learned from; I just picked up pieces here and there as I went. Here's a brief tutorial on setting up and using a database connection: http://www.codeproject.com/KB/database/sql_in_csharp.aspx
As for C#, you know 90% of the language if you know C++ (I was able to pick it up pretty quickly myself). The main thing is getting used to the language's quirks, which are actually a whole lot simpler than what C++ does for things like including libraries. The .NET Framework also comes with a built-in regex library (System.Text.RegularExpressions), which should handle most of your detection issues.
All of the database access code is built into the framework, so you won't have to hunt down any libraries, and the string parsing functions built in should make handling text much easier.
Is the app you are making a web app, or a desktop app?
As far as interfacing with your DB, .NET has all that built right in. Most of what you will be doing involves the SqlCommand and SqlDataReader objects: SqlCommand runs your queries against the DB, and SqlDataReader reads the results back so you can load them into business objects (or you can use a SqlDataAdapter to fill a DataSet instead).
Also look into stored procedures to spare yourself all that nasty inline SQL.
So, C# is really easy, and the MSDN help files are detailed. After one afternoon of work, I've already got a basic app working that can take in a sentence, parse it, remove stop words (from a hard-coded list rather than an external text file like I wanted, but it's good enough) and stem the remaining words. So that takes care of points 2 and 4.
So far, so good.
Today I'm going to look into creating the database. I would like my program to be able to:
A. Read a Wikipedia database dump (6 GB) line by line
B. Extract the necessary information
C. Store it in a database
I haven't looked into C#'s file handling options yet, but I expect points A and B to be simple enough. Point C is going to be the more challenging one, since it will require all those database commands I don't know, especially for linking several objects together in the database.
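Points A and B amount to streaming the file rather than loading it. A minimal C++ sketch, assuming the dump is the Wikipedia XML export, where each `<title>...</title>` element sits on its own line: it reads one line at a time, so memory use stays flat whatever the file size, and hands each extracted title to a callback (a hypothetical stand-in for "store it in a database"). XML entity decoding (`&amp;` and so on) is omitted; a robust tool would use a streaming XML parser.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Streams the dump line by line and passes the text of every
// <title>...</title> line to onTitle. Returns how many titles were found.
// Assumes each title element is on a single line (entities not decoded).
std::size_t forEachTitle(std::istream& dump,
                         const std::function<void(const std::string&)>& onTitle) {
    const std::string open = "<title>", close = "</title>";
    std::size_t count = 0;
    std::string line;
    while (std::getline(dump, line)) {
        auto start = line.find(open);
        if (start == std::string::npos) continue;
        start += open.size();
        auto end = line.find(close, start);
        if (end == std::string::npos) continue;
        onTitle(line.substr(start, end - start));
        ++count;
    }
    return count;
}
```

In real use the stream would be a `std::ifstream` opened on the dump file; the callback is where point C (the database insert) would go.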
I'll start looking into those SQL links you guys sent me yesterday. In the meantime, as always, if you have more ideas for tutorials to read, I'll welcome them. Thanks!
It's a desktop app.
How you go about storing the data in the DB depends entirely on how your DB is designed. I really have no idea what you want to do with the data, but basically you are going to need to do something like this:
Read in your file.
Line by line, determine what the hell each line is, pick out the data, and then call a DB function that takes the data and adds it to a SqlCommand as SqlParameter objects.
Then execute the SqlCommand against a stored procedure in your DB, which is where the actual SQL lives.
So, let's say your data object is an author, and his properties are name, location and join date.
In a typical MVC program in .NET you would probably create a data object for the author. In this case, though, since you are reading in a 6 GB dump, you probably aren't going to store all this data in memory and THEN hit the database with it all at once; that would seriously bog down or crash the computer. So you can get away with just passing the parameters to your DB function one line at a time.
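The "one line at a time" idea can be sketched in C++ (the thread's examples are C#, but the shape is the same). The tab-separated line format, the `Author` struct and both function names are hypothetical; the insert function is passed in as a callback so the sketch doesn't depend on any particular database API. Each parsed record goes straight to the insert function, so at most one author is held in memory. Requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical record matching the example: name, location, join date.
struct Author {
    std::string name;
    std::string location;
    std::string joinDate;  // kept as text here; convert before inserting if needed
};

// Parses one line in an assumed "name<TAB>location<TAB>joinDate" format.
// Returns std::nullopt for lines that don't have all three fields.
std::optional<Author> parseAuthorLine(const std::string& line) {
    std::istringstream fields(line);
    Author a;
    if (std::getline(fields, a.name, '\t') &&
        std::getline(fields, a.location, '\t') &&
        std::getline(fields, a.joinDate)) {
        return a;
    }
    return std::nullopt;
}

// Reads the input line by line and hands each parsed Author straight to
// insertAuthor; malformed lines are skipped. Returns the number imported.
std::size_t importAuthors(std::istream& in,
                          const std::function<void(const Author&)>& insertAuthor) {
    std::size_t imported = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (auto author = parseAuthorLine(line)) {
            insertAuthor(*author);
            ++imported;
        }
    }
    return imported;
}
```

Passing the insert function in as a parameter also makes it easy to test the parsing without a database attached.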
I usually just put my DB functions in a file called Data.cs.
Your data function might look something like:
That function is void, but if you want to build some failsafes into your app, you can make it return a bool. You can then tell whether the DB submit is actually working by evaluating the return value of ExecuteNonQuery, which tells you the number of affected rows. In this case that should be 1, since your stored proc will be a simple INSERT. (If the stored procedure uses SET NOCOUNT ON, ExecuteNonQuery returns -1 instead, so keep that in mind.)
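Here is a minimal sketch of the bool-returning variant, written in C++ rather than C#, so it is not the ADO.NET code itself. `StoredProcCall`, `Executor`, the `InsertAuthor` procedure name and the `@`-prefixed parameter names are all hypothetical. The executor stands in for ExecuteNonQuery and returns the number of rows affected. The point it illustrates is that the SQL stays in the stored procedure, only parameter values travel with the call, and success means exactly one affected row.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a parameterized stored-procedure call:
// the SQL lives in the procedure, and only named values are sent along.
struct StoredProcCall {
    std::string procedure;
    std::map<std::string, std::string> parameters;
};

// Stand-in for ExecuteNonQuery: returns the number of rows affected
// (or -1 when the procedure suppresses row counts).
using Executor = std::function<int(const StoredProcCall&)>;

// Returns true only if exactly one row was inserted.
bool insertAuthor(const Executor& execute, const std::string& name,
                  const std::string& location, const std::string& joinDate) {
    StoredProcCall call{"InsertAuthor",  // hypothetical procedure name
                        {{"@Name", name},
                         {"@Location", location},
                         {"@JoinDate", joinDate}}};
    return execute(call) == 1;
}
```

Treating anything other than 1 as failure means both "no rows inserted" (0) and "row count suppressed" (-1) are reported, which is the conservative choice for an import loop.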
Sorry for the sloppy comments, hopefully you can make sense of it.