HTML ALT and TITLE Checker utility
L Ron Howard | The duck | Minnesota | Registered User | regular
Hi guys. I've been tasked with checking an entire website's alt and title attributes. Of course, I'm not going to do this by hand as the task is simply too large to take on by myself; and being human, I'm sure I'll forget where I am/what I'm doing (with whatever I use to keep track, such as Excel) and end up skipping entire pages or wasting time redoing it. So I'm thinking this will be best done automagically by a computer, since that's kind of what they excel in compared to most of us humans.
I could, of course, spend some time making a Python or Perl script to do it for me, but I'm wondering if anyone knows of a pre-existing utility, tool, or script that will do this for me.
Does anyone know of one or have something pre-made?
Xenu's Link Sleuth is possibly the tool you're looking for. I think it includes page data like title tags (not sure about alt tags) in the Excel export, so at least you'll get a list of pages and their respective title tags. You can possibly also use Microsoft IIS to do this, although I've never personally used it and it needs to be installed on an MS server to run, which may be more hassle than this task is worth.
Also, your username being a riff on L Ron Hubbard leads to delicious irony if Xenu is the solution.
Szechuanosaurus on
L Ron Howard | The duck | Minnesota | Registered User | regular
edited December 2010
Well, I've been tasked with making sure all the img tags have an alt attribute, and pretty much all text has a title. On every page on a website.
I'll give Xenu a shot. I mean, he has his own cult too, right?
TITLE is the attribute that you're supposed to use for tooltip functionality because HTML people are uptight about the proper use of ALT being for when the image doesn't display.
Orogogus on
L Ron Howard | The duck | Minnesota | Registered User | regular
edited December 2010
What Orogogus said.
You can have TITLE attributes on imgs and ALT on text, but browsers don't render them unless they're used exactly as intended. I think it's a W3C convention to do it that way.
Either way, I need to scour an entire site and make sure that all P tags, or whatever, have the TITLE attribute, and all IMG tags have an ALT attribute.
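For a single page, the check described above can be sketched with nothing but Python's standard library. This is only a rough sketch: the `REQUIRED` table, the `AttributeChecker` class, and the `check_html` helper are made-up names for illustration, and which tags really need a TITLE is an assumption to adjust.

```python
from html.parser import HTMLParser

# Hypothetical rule table: tag name -> attribute that must be present.
# HTMLParser lowercases tag and attribute names, so use lowercase here.
REQUIRED = {"img": "alt", "p": "title"}


class AttributeChecker(HTMLParser):
    """Collects (line, tag, missing_attribute) for tags that break the rules."""

    def __init__(self, required=REQUIRED):
        super().__init__()
        self.required = required
        self.problems = []

    def handle_starttag(self, tag, attrs):
        wanted = self.required.get(tag)
        if wanted is None:
            return  # no rule for this tag
        present = {name for name, _ in attrs}
        if wanted not in present:
            line, _ = self.getpos()  # line where the tag starts
            self.problems.append((line, tag, wanted))


def check_html(html, required=REQUIRED):
    """Return a list of (line, tag, missing_attribute) for one page."""
    checker = AttributeChecker(required)
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    page = '<p title="intro">Hello</p>\n<p>No title</p>\n<img src="logo.png">'
    for line, tag, attr in check_html(page):
        print(f"line {line}: <{tag}> is missing {attr}")
```

Note that this only checks whether the attribute exists; an IMG with alt="" passes, so judging the quality of the values is a separate step.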
If the website is live, you can type in the URL of each page at http://validator.w3.org/, or upload the file, or copy and paste the code. With the "more options" set to XHTML 1.0 Strict, it should highlight all of the missing alt attributes for images.
Since the title attribute is optional except for the title tag in the head, how come they want you to check those may I ask? Are you checking specific tags like all p tags or table tags? Or is it that some portions of html need to have the title attribute for it to function properly? Something doesn't sound right about needing to do this if you don't mind me saying. It's my perception that professional websites don't use title attributes on all tags of certain kinds. It's only used optionally with specific sets of menu links or something.
splash on
L Ron Howard | The duck | Minnesota | Registered User | regular
edited December 2010
This is where it gets tricky.
The site I'm working on isn't publicly accessible, so I can't really run the W3 validator on it, unless I manually download each page. I don't have access to the HTML, I'm only really an end user, which is what's making it harder than it probably should be. And considering most of the site is dynamic, all I think I would be able to do is upload the template, and hope that that's W3 valid, because I'm quite certain the dynamic stuff isn't even close.
As for what has or needs the TITLE attribute, I'm at home, so I don't have access to it, so this is off of memory. There are DIVs that are Xpx wide and Ypx tall. There is a bulleted <UL...> list there. So I believe all the UL tags have titles, but only the ones which get cut off and have the ellipses shown have the title attributes appear when the mouse is hovered over it. I know there are other examples, but I can't really think of any right now.
So the issue that I'm really trying to solve is that when it was coded, since so many people have touched it, there's no consistency. Not all of the IMGs have ALT attributes, and not all of the UL have a TITLE attribute. Of course, all of the noticeable ones have the correct ALT and TITLE attributes accordingly, but that's not a guarantee for the rest of the site.
Does all of this make sense?
They should know that you absolutely can't do your job without having access to the html pages. I suppose a working version of the site can't be put on the internet temporarily for you to work with due to security or some kind of concern. And then you'd have to make sure any new content they are working on separately is following the correct coding practice and you'd be able to place the old site files back correctly. The other choice then being at least a computer with internet access and access to all the html files.
Unfortunately, I'm sure there could have been a consistent way to code the title attributes within the ul lists, but it sounds like they allowed something more manual and up to the user. If they don't have a better way to do this with new content already, I hope you'll speak up about this problem.
Do people update the site with a content management system?
Ok, yeah, Xenu isn't going to help here. IIS might be of some use; I think it downloads the HTML to the machine it's running on, so that would then give you access to the HTML directly (although not the source code and whatever the CMS is doing to generate the HTML).
You'll still have to find a way to validate all that HTML though. TBH something like Mechanical Turk might have been the most cost-effective and efficient option (get a swarm of humans verifying the information page by page), but I guess that's out if you can't publicly access the site.
Szechuanosaurus on
L Ron Howard | The duck | Minnesota | Registered User | regular
edited December 2010
Yeah, sorry. It's not publicly available, and there is no way to make it available.
I'm a lowly cog on this whole machine, so there's no way they'd release the HTML to me.
I'm honestly not sure how the information is generated. I'm speculating that it's some sort of archaic system where someone goes in by hand. Or a very rudimentary system. There are too many inconsistencies for it to be done by a machine with required fields. Hence the reason why some ULs have TITLE attributes and some don't, and some IMGs have correct ALT attributes, and some have "" while others have "alternate_image."
And correct me if I'm wrong, but if the code were genuinely W3 valid, it should, theoretically, render the same across all browsers, right? I know the CSS and JavaScript are really horrid, but I don't think the HTML is any better, judging by the few glimpses I've taken at the stuff that's on the website. I don't really think it's even a thought on anyone's mind to make it valid. Just valid enough.
How would I set up IIS on my machine to run through and download everything? I didn't even know it had that capability. I've set it up as my own webserver before, and to play around in SharePoint, but I've never seen anything that says it will download all the HTML on a site.
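The mix of ALT values mentioned in this post (proper text, "", and "alternate_image") suggests sorting each IMG into a bucket rather than just checking that the attribute exists. A minimal sketch, assuming a hypothetical `PLACEHOLDERS` set that you would fill in with whatever filler values the site really uses:

```python
# Hypothetical filler values; only "alternate_image" comes from the site
# described in the thread, the rest are guesses to be replaced.
PLACEHOLDERS = {"alternate_image", "image", "img", "picture"}


def classify_alt(attrs):
    """Classify one IMG tag given a dict of its attributes (lowercase names).

    Returns "missing", "empty", "placeholder", or "ok".
    """
    if "alt" not in attrs:
        return "missing"
    # A valueless attribute (<img alt>) may arrive as None from a parser.
    value = (attrs["alt"] or "").strip()
    if value == "":
        return "empty"
    if value.lower().rstrip(".") in PLACEHOLDERS:
        return "placeholder"
    return "ok"


if __name__ == "__main__":
    for attrs in ({"src": "a.png"}, {"alt": ""}, {"alt": "alternate_image"},
                  {"alt": "Company logo"}):
        print(attrs, "->", classify_alt(attrs))
```

Keeping "empty" separate from "missing" is deliberate: an empty alt is legitimate for purely decorative images, so those rows probably need a human to look at them rather than an automatic fail.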
Yeah, the more W3C-compliant the markup, the better it renders across all common browsers, since most browsers try to be as W3C-compliant as possible, except for Internet Explorer. IE not only needs separate code because it has so many quirks, it also allows less strict coding and missing tags.
Really, all pages on the site should be validated against one doctype that they decide to shoot for, like XHTML 1.0 Transitional or HTML 4.01 Transitional. Hopefully they are using all external CSS. The JavaScript should be checked against strict options.
And again I'd recommend bringing up suggestions to the appropriate people to make the site better, even if the site is not that integral or important to the company. Maybe those in charge don't understand. Asking you to do a couple things like this is a patch but if the entire site is coded in this way there's going to be tons of other fixes to be made. Any effort now to help the situation will save like 10x the effort down the road.
There are Firefox plugins that will use the W3C validation engine through the browser. So if you can browse to the site from your PC, that would be one way to do it.
wmelon on
Seguer | of the Void | Sydney, Australia | Registered User | regular
edited December 2010
Validation isn't going to check that each UL > LI has a "title" attribute if it contains an ellipsis.
For this you'll need a script, access to the template that generates the HTML, or to do it manually. (Note: you've mentioned that you can't get access to the HTML. That's incorrect, because if you can view the page, you have the HTML.)
The best solution here would be to get access to the CMS/template, and figure out how the ellipsis is added, and thus how the title attribute is added. If there's an inconsistency, they may need to fix that at the CMS level.
Asking someone to check the output of dynamically generated HTML is stupid.
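Since viewing a page means you already have its generated HTML, the script route can start with a small same-site crawler. The following is a sketch under assumptions: `crawl`, `LinkCollector`, and `fetch_over_http` are illustrative names, the pages are assumed reachable without a login step, and only plain `<a href>` links are followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_over_http(url):
    # Assumption: no authentication needed and UTF-8 compatible pages.
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_over_http, max_pages=500):
    """Breadth-first crawl of pages on the start URL's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip unreachable pages instead of aborting the crawl
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts.
            target, _ = urldefrag(urljoin(url, href))
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

The `fetch` parameter is there so the crawl logic can be tried against a dictionary of fake pages before pointing it at the real site. If the site sits behind a login, the fetch function would need to send the session cookie, which is not shown here.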
Seguer on
L Ron Howard | The duck | Minnesota | Registered User | regular
edited December 2010
What I assumed they were talking about was the template HTML, not the final markup.
I don't have access to the templates, or anything of the sort, just the final HTML after it's been generated.
I don't have access to the CMS, and I really doubt anyone will give it to me, since I'm such a lowly cog on the giant clock, so to speak. I will certainly try, which was what I was planning since it was mentioned in this thread, but I really can't see it happening.
I'm ultimately just hoping there's something that already exists that will crawl through an entire website that can give me an organized list of data that I can use to extrapolate which IMG and UL tags are incorrect.
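For the "organized list of data" part, one option is to dump whatever a checking script finds into CSV so it can be sorted and filtered in Excel. A small sketch, assuming the findings are available as hypothetical `(url, line, tag, problem)` tuples; the function names are made up for illustration:

```python
import csv
import io


def write_report(findings, out):
    """Write (url, line, tag, problem) tuples as CSV rows, sorted by URL then line."""
    writer = csv.writer(out)
    writer.writerow(["url", "line", "tag", "problem"])
    for row in sorted(findings):
        writer.writerow(row)


def report_as_text(findings):
    """Same report, returned as a string (handy for a quick look or for tests)."""
    buffer = io.StringIO()
    write_report(findings, buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        ("http://intranet.example/b.html", 3, "img", "missing alt"),
        ("http://intranet.example/a.html", 10, "ul", "missing title"),
    ]
    # newline="" is what the csv module expects when writing to a file.
    with open("alt_title_report.csv", "w", newline="") as handle:
        write_report(sample, handle)
```

The file name and the example URLs are placeholders. One row per problem keeps the report easy to hand to whoever ends up fixing the templates.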
I'm not entirely sure how you'd set up IIS for that, tbh. A guy I work with uses it to perform site crawls to check for errors, but I've never used it for this task or anything else. It has a site crawler in the SEO Toolkit. Try starting here.
Szechuanosaurus on
Seguer | of the Void | Sydney, Australia | Registered User | regular
edited January 2011
IMG tags are easy enough, as there will be plenty of things that will do that for you already. It's the ULs that are going to cause you grief.
Seguer on
L Ron Howard | The duck | Minnesota | Registered User | regular
edited January 2011
Thanks guys for all your help.
As a side note, I downloaded the index to my computer, as it appears, and uploaded it to the validator.
118 errors and a metric ass ton of warnings. This is awesome.
I don't know why I keep wanting you to do this so bad but yea, Speak Up! If they are asking you to do a job like this where you need access then obviously you aren't a lowly cog as you or they would think.