WebProWorld Part of WebProNews.com
WebProWorld > Webmaster, IT and Security Discussion > Web Programming Discussion Forum

#1 dak888, 08-08-2007, 01:43 PM
Javascript and those naughty characters...

Ah, yea....

Anyways, I'm trying to strip out bad characters from a block of text with JavaScript. This seems to work sometimes but not others, and I think it has to do with me not stripping out all the bad characters.

Of course, I could be completely wrong.

Here is my code so far, which truncates the resulting text:

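<script type="text/JavaScript">

// The cart replaces !---TEXT--- with the product description before the page is served.
var outputstr = "!---TEXT---";
// Remove every run of characters that is not a letter, digit or space.
var textpreview = outputstr.replace(/[^a-zA-Z 0-9]+/g, '');

// Show only the first 50 characters of long descriptions.
if (textpreview.length > 50)
{
  document.write(textpreview.substr(0, 50) + "&nbsp;...");
}
else
{
  document.write(textpreview);
}
</script>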

The !---TEXT--- is a text file created by our shopping cart and holds product information. I thought I was only returning A-Z and 0-9 with my regex but I'm not sure if I'm doing it correctly.
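
A quick way to check the pattern on its own: the class `[a-zA-Z 0-9]` matches letters, digits and a literal space, and the leading `^` negates it, so each run of anything else is replaced with an empty string. The sketch below runs it on a description that is posted later in this thread; the function name is only for illustration. The regex does what was intended, though note that it also turns `1/4` into `14`.

```javascript
// Illustrative wrapper around the pattern from the script above.
// [a-zA-Z 0-9] = letters, a space, digits; [^...] = anything else.
// The + and the g flag remove every run of "anything else".
function keepLettersDigitsSpaces(text) {
  return text.replace(/[^a-zA-Z 0-9]+/g, '');
}

const sample = '4 1/4"H, 6 1/4"L, 4"P.  Sold as each. ';
console.log(JSON.stringify(keepLettersDigitsSpaces(sample)));
// "4 14H 6 14L 4P  Sold as each "
```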

Now, I said it works sometimes and not others. The major difference I see between the ones that work and the ones that don't is that the failing ones have quotes ("") or some HTML in the text.

A. Would this cause the javascript to break?
B. If so, what can I do?
C. Am I completely wrong about why it's not working?
D. Just give up this javascript stuff because I don't know what the hell I'm doing.

Thanks in advance,

DaK

#2 wige (Moderator), 08-08-2007, 03:08 PM

Being client side, JavaScript should really never be used to do processing on data - different browsers have different JS implementations, and users could have JS turned off or be using a security product that changes the way JS works. What is going into the text file that you want to strip out? Bear in mind that a user can view anything the JS can see.
__________________
The best way to learn anything, is to question everything.
Interestingly Average Security Blog

#3 dak888, 08-08-2007, 04:35 PM

Hey Wige,

I don't know if I have much of a choice other than to use JavaScript. The page is generated by our shopping cart's Perl/CGI script, and since I can't get to the source code or use PHP, this looks to be about my only choice unless someone else has a suggestion.

The text file holds a product description. However (correct me if I'm wrong), I don't think JavaScript can process the text file if there are funky characters in it (HTML, ", etc.). This is the problem: some of our product descriptions are just one or two lines with no HTML or weird characters, and these are processed fine by the script. It's the descriptions with weird characters that are not showing up in the results.

It's part of a search results page, but I don't want tons of text in the results, so I want to truncate the product description to the first 50 characters.

DaK

#4 wige (Moderator), 08-08-2007, 05:11 PM

I take it from the description the javascript is opening the text file containing the data, then processing line by line, filtering each result?

If you know enough Perl or PHP and are allowed to run scripts, I would suggest creating a server-side script that opens the file, does the filtering for you, and passes the processed data to the JavaScript. That way you can use the more comprehensive filtering abilities of Perl, leave the browser with less of a workload, and eliminate many client-side issues. In that case you would simply point the JavaScript to the new server-side script.

#5 lanthus, 08-08-2007, 08:09 PM

More questions than answers there--

1. You want to allow the {space}?-- You said only "A-Z and 0-9"

2. Do you want to not-include nbsp?-- which get generated automatically in some html processing.

3. Do you later .toUpperCase() it?-- in which case NBSP in-caps fails...?

4. Would /[^\w ]+/ be simpler?-- except of course you may not want the "_" of \w

5. Would you want to convert unusable characters to space?

6. Do you need trim to single-spacing?

7. Why not simply, write textpreview.replace(/^(.{50}).+/,'$1...')

8. If you're seeing html, look for .innerHTML, htmlText in lieu of .innerText, text, data,...

9. NB. document.selection.createRange().text has empty-cells-of-zero-length for BR's ... like weapons of mass destruction they can be elusive in textonly.

10. [thinking... script type="text/JavaScript" might be choosing an old-version of javascript...]

11. [thinking... if you use RegExp.$1 you need to make sure it matched something, else RegExp.$1 is old data from the last match... and could be anything]

12. "&nbsp;..." should be "..." without the space because it may land in the middl... (And &hellip; is one-character for that.)
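
Several of the suggestions above (4, 6, 7 and 12) can be chained into one preview function. This is a minimal sketch, not Ray's code: the name `preview`, the `maxLen` parameter and the sample strings are illustrative. Note that `\w` in JavaScript means `[A-Za-z0-9_]`, hence the extra step to drop underscores.

```javascript
// Sketch combining items 4, 6, 7 and 12. Names are illustrative.
function preview(text, maxLen) {
  const cleaned = text
    .replace(/[^\w ]+/g, '') // item 4: keep \w ([A-Za-z0-9_]) and spaces
    .replace(/_/g, '')       // ...then drop the "_" that \w lets through
    .replace(/ {2,}/g, ' ')  // item 6: collapse runs of spaces to one
    .trim();
  // Item 7: if more than maxLen characters remain, keep the first maxLen.
  // Item 12: append the one-character ellipsis U+2026 (&hellip; in HTML).
  const tooLong = new RegExp('^(.{' + maxLen + '}).+');
  return cleaned.replace(tooLong, '$1\u2026');
}

console.log(preview('Arizona style curtain rod', 50)); // unchanged: shorter than 50
console.log(preview('4 1/4"H, 6 1/4"L, 4"P.  Sold as each. ', 10)); // "4 14H 6 14…"
```

Because `.+` needs at least one more character after the first `maxLen`, text of exactly `maxLen` characters gets no ellipsis.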
__

PS. Here's a ms-bug: Find-in-page "a &nbsp;b ™c" (when rendered) fails till you remove either the nbsp or the trademark ... I reported this to MS today....

Ray.
__________________
Mr. Raymond Kenneth Petry
Lanthus Corporation

Last edited by lanthus : 08-08-2007 at 09:06 PM.

#6 dak888, 08-09-2007, 10:32 AM

Ok, you've given me a bit to chew on here...


1. You want to allow the {space}?-- You said only "A-Z and 0-9"

Yes, I want to allow the space. I basically just want to be left with text that has no formatting or special characters. That said, I would like to keep punctuation like quotes ("), semicolons (;), commas and so on, since that makes the text more understandable to the user. But I don't know if those are actually breaking the code.
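
If punctuation should survive, one option is to widen the whitelist instead of switching to a blacklist. A minimal sketch, assuming the set below (letters, digits, space and `. , ; : ! ? ' " / -`) is what counts as readable; the exact list and the function name are illustrative. Keeping `"` is harmless once the description is safely inside a string; `<` and `&` stay excluded because `document.write` would interpret them as HTML.

```javascript
// Hypothetical whitelist: letters, digits, space and common punctuation.
// Inside [...], "/" is escaped and "-" is placed last so it is literal.
function keepReadable(text) {
  return text.replace(/[^A-Za-z0-9 .,;:!?'"\/-]+/g, '');
}

console.log(keepReadable('4 1/4"H, 6 1/4"L; Sold as each\u2122'));
// 4 1/4"H, 6 1/4"L; Sold as each
```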

2. Do you want to not-include nbsp?-- which get generated automatically in some html processing.

I suppose I would want to include it since it may represent a space between two words.
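
To keep a non-breaking space as a word separator, it has to become an ordinary space before the character filter runs: the original pattern would otherwise strip the U+00A0 character (joining the two words) or reduce an `&nbsp;` entity to the letters `nbsp`. A minimal sketch; the function name is illustrative.

```javascript
// Sketch: turn non-breaking spaces into ordinary spaces before filtering.
function nbspToSpace(text) {
  return text
    .replace(/&nbsp;/gi, ' ') // the HTML entity, if it appears literally
    .replace(/\u00A0/g, ' '); // the U+00A0 character itself
}

console.log(nbspToSpace('Sold&nbsp;as\u00A0each')); // "Sold as each"
```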

3. Do you later .toUpperCase() it?-- in which case NBSP in-caps fails...?

I would want it to be "natural" casing. If the letter in the text file is upper case, then keep it that way, etc...

4. Would /[^\w ]+/ be simpler?-- except of course you may not want the "_" of \w

I'll have to read about that, reg expressions are new to me.

5. Would you want to convert unusable characters to space?

No, just remove them.

6. Do you need trim to single-spacing?

Yes.

7. Why not simply, write textpreview.replace(/^(.{50}).+/,'$1...')

I don't know. I'll try it and see what it does. Regular expressions are new to me.

8. If you're seeing html, look for .innerHTML, htmlText in lieu of .innerText, text, data,...

If I'm seeing html where? The text file that is being processed may have html in it but I don't want it in the search results.
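
If the description contains markup, one common approach is to drop the tags before the character filter runs. This is a rough regex sketch, not a full HTML parser: a `>` inside an attribute value, comments, or `<script>` content would trip it up, but simple tags like `<p>` and `<b>` are handled. The function name and sample text are illustrative, and it only helps once the text has made it into a string intact.

```javascript
// Rough tag stripper for simple markup (not a full HTML parser).
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, ' ') // replace each tag with a space so words don't fuse
    .replace(/\s+/g, ' ')     // collapse the extra whitespace, including newlines
    .trim();
}

console.log(stripTags('<p>Etch <b>Art</b>\nwindow film</p>')); // "Etch Art window film"
```

Replacing tags with a space rather than nothing keeps words apart across block tags, at the cost of splitting a word that has a tag in its middle.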

9. NB. document.selection.createRange().text has empty-cells-of-zero-length for BR's ... like weapons of mass destruction they can be elusive in textonly.

I'll have to read about this. You pretty much spoke Chinese to me there. LOL.

10. [thinking... script type="text/JavaScript" might be choosing an old-version of javascript...]

How should I be declaring it?

11. [thinking... if you use RegExp.$1 you need to make sure it matched something, else RegExp.$1 is old data from the last match... and could be anything]

You lost me here...
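
Item 11 refers to the legacy `RegExp.$1` property, which holds the first capture group of the most recent successful match anywhere in the script. A failed match leaves it unchanged, so code that reads it without checking can pick up stale text. A sketch of the trap and of the safer habit of using the array returned by `match()`; the function name and sample strings are illustrative.

```javascript
// The trap: RegExp.$1 is only updated by a successful match.
/(\d+)/.test('Item 42');   // matches, so RegExp.$1 becomes "42"
/(\d+)/.test('no digits'); // fails, so RegExp.$1 is still "42"
console.log(RegExp.$1);    // "42", left over from the first string

// Safer: use the result of match(), which is null when nothing matched.
function firstNumber(text) {
  const m = text.match(/(\d+)/);
  return m ? m[1] : null;
}

console.log(firstNumber('Item 42'));   // "42"
console.log(firstNumber('no digits')); // null
```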

12. "&nbsp;..." should be "..." without the space because it may land in the middl... (And &hellip; is one-character for that.)

That can be easily corrected.

Thanks for all the questions! Hopefully you or someone else can pin down my problem.

DaK

#7 dak888, 08-09-2007, 10:41 AM

Here are some tests you can run to maybe get a better understanding of what is going on.

Go to this URL:
Curtain Rods, Drapery Hardware, Blinds, Luxury Bedding, Static Cling Window Films and Tints

Just a test page and I know the top nav is broken...lol.

In the search box in the header, type "arizona" (without the quotes). You will see results the way I would like them to appear. If you click the "view product" link for the first item, you will see where it is pulling the text from. No special characters or formatting.

Hit your back button. Now type in "finial" (again, no quotes). Now you will see the text is not showing up as expected. Click the view product link. No html but there are some special characters.

Hit your back button. Now type in "etch art" (with the quotes). Again, no text. Click the view product button and you will see what text should be there. No special characters but there is html on the page.

Maybe this will shed some light on my situation.

Thanks everyone!

DaK

#8 wige (Moderator), 08-09-2007, 11:00 AM

I grabbed one of the search results from the finial search.

HTML Code:
<script type="text/JavaScript">
var outputstr = "4 1/4"H, 6 1/4"L, 4"P.  Sold as each. "
/*var textpreview = "4 1/4"H, 6 1/4"L, 4"P.  Sold as each. "*/
/*var textpreview = outputstr.replace(/[\/=;:.<>'&_,%`"@~#]/gi," ")*/
var textpreview = outputstr.replace(/[^a-zA-Z 0-9]+/g,'') 

if (textpreview.length > 50)
{
document.write(textpreview.substr(0,50) + "&nbsp;...")
}
else
{
document.write(textpreview)
}
</script>
How is the first line of the script generated? This is where the script breaks, before even reaching the regular expression.
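
The diagnosis can be reproduced outside the browser. Assuming the cart simply pastes the raw description in place of `!---TEXT---` (which is what the page source above suggests), a `"` in the description ends the string literal early and the whole script fails to parse, so the regex never runs. The sketch builds the line the same way and asks the engine whether it compiles; the helper names and sample descriptions are illustrative. The last case, a raw line break inside the quotes, is one plausible reason the HTML descriptions failed too, but that is an assumption the thread does not confirm.

```javascript
// Hypothetical stand-in for the cart: paste the description between double quotes.
function buildScriptLine(description) {
  return 'var outputstr = "' + description + '"';
}

// True if the source compiles. new Function parses the code without running it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses(buildScriptLine('Arizona style curtain rod')));    // true
console.log(parses(buildScriptLine('4 1/4"H, 6 1/4"L, 4"P.')));       // false
console.log(parses(buildScriptLine('<p>Etch Art\nwindow film</p>'))); // false
```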

#9 dak888, 08-09-2007, 11:31 AM

It's generated by the shopping cart. We use !---TEXT--- which is a tag used by the shopping cart to represent the text description for the product. The shopping cart is written in perl but we can't access the source code.

DaK

#10 wige (Moderator), 08-09-2007, 11:54 AM

I see. The shopping cart looks like it is expecting the output to simply be embedded in the page. I would check the support for that cart software to see if they have an alternate code (they may have something like !--SCRIPT-- for JavaScript for example) that sanitizes the code. In the meantime a possible workaround is to change the first line of the script from:
HTML Code:
var outputstr = "!---TEXT---"
to:
HTML Code:
var outputstr = '!---TEXT---'
outputstr = outputstr.replace('"', '\\"')
Note the quotes. In the first line, you are using single quotes. In the second line you are enclosing a double quote in a single quote. I am not absolutely sure about how many backslashes you need in the second line, but worst case scenario you could use "&quot;".
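
Why the workaround helps, and where it can still break, sketched under the same assumption that the cart pastes the raw text into the quotes. The switch to single quotes is what matters: a `"` no longer ends the literal, but an apostrophe now would (the "Men's" sample is hypothetical). The replace line runs only after the string has been parsed, so it cannot prevent a parse error, and `replace()` with a string pattern changes only the first match; a regex with the `g` flag changes every one. Helper names are illustrative.

```javascript
// True if the source compiles. new Function parses the code without running it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Hypothetical stand-in for the cart, now with single quotes around the tag.
const singleQuoted = (d) => "var outputstr = '" + d + "'";

console.log(parses(singleQuoted('4 1/4"H, 6 1/4"L'))); // true: " no longer ends the literal
console.log(parses(singleQuoted("Men's curtain rod"))); // false: ' now ends it

// replace() with a string pattern changes only the FIRST occurrence.
const text = '4 1/4"H, 6 1/4"L';
console.log(text.replace('"', '\\"'));     // 4 1/4\"H, 6 1/4"L
console.log(text.replace(/"/g, '&quot;')); // 4 1/4&quot;H, 6 1/4&quot;L
```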

#11 dak888, 08-09-2007, 03:47 PM

Wige!!!!!!

That did it! Thanks for taking the time to help, everything seems to be working now.

DaK