EE search should omit content in standard HTML comments

Feature Requests

Paul Bailey
63 posts
8 years ago

This is a limitation I’ve been aware of in EE2 for a while, and I think it’s been carried forward into EE3. It’s arguably a bug, but I’m bringing it up here as a feature request.

EE2/3 include content inside standard HTML comment markers in any simple search results. This seems inappropriate. I (and other users of EE in the system I manage) regularly use HTML comments to temporarily hide blocks of content, for various reasons. This content should not be searched by EE as part of public-facing search functionality. I’m hoping that behaviour can be altered. Thanks.

       
Jeremy S.
353 posts
8 years ago

Have you tried the ee comment tag?

{!-- --}
       
Paul Bailey
63 posts
8 years ago

Jeremy:

Have you tried the ee comment tag?

Thanks, it’s a good suggestion. I’ve just tried it, though, and I get the same behaviour (in EE3.5.3): content doesn’t appear on the rendered page (as expected), but the page is returned as a positive hit in simple search results. (NB: I’m talking about content in channel entries here, rather than in templates.)

But that’s sort of not the point in any event. For example, I work with a number of non-technical people who have access to channel content. Helping them to use standard HTML is more than enough.

There’s no argument that I can see why content inside HTML comment tags (or EE comment tags) should result in a positive hit in EE simple search, which is public-facing functionality: i) user submits a search; ii) yay, finds a hit; iii) goes to page; iv) scours page for relevant text; v) is confused.

       
Derek Jones
7,561 posts
8 years ago

Content is content, and is designed to be separate from presentation so if it’s in a searchable field, it’s going to be searchable. ExpressionEngine doesn’t know how you’ll present the content, or what your intent is for anything in the content. I can even see someone arguing that the current behavior is an effective way to help make sure entries bubble up in search results for related content that might not match the words used, though it is also not one I’d recommend.

Can you explain why you are having content authors add HTML comments to content in the first place? I’d like to have an idea of what led to this solution for you, maybe there’s something else the app can do that will serve your authors better.

       
Paul Bailey
63 posts
8 years ago

Derek:

Content is content,

Except it isn’t really, because there are different audiences. For someone working on the back-end, ‘content’ is everything in the content field, so it’s obviously right that searches in the back-end shouldn’t make a distinction. But for a public user, ‘content’ is what’s on the screen. They don’t care about — and shouldn’t have things confused by — stuff that only exists in the back-end. Simple search in EE is a public-facing search, so what it sees as ‘content’ should relate to what’s public.

(Obviously I accept that what appears on the screen can be more complicated than that, but use of an HTML comment tag is a pretty unambiguous statement that whatever it contains shouldn’t be shown — shouldn’t be considered part of the page content.)

and is designed to be separate from presentation so if it’s in a searchable field, it’s going to be searchable.

Sure, and the content/presentation distinction is a good one, obviously, but the distinction between channel entry and template (for example), isn’t the only relevant content/presentation distinction. HTML tags are also presentation, not content, and shouldn’t result in a positive hit from a public-facing search.

I was actually going to argue that you wouldn’t get a positive hit from searching on other HTML tags, so I tried BLOCKQUOTE as an example, and that did in fact bring up lots of hits. That’s consistent, I suppose, but really doesn’t make sense to my mind. This is a content/presentation issue. HTML tags — including comment tags — are presentation, not content.
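Both observations so far (hits on commented-out text and hits on tag names such as BLOCKQUOTE) are what you would expect if the search is a plain substring match over the raw stored field. The following is a minimal Python sketch under that assumption. It is not ExpressionEngine code, and `entry_body` and `naive_search` are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for a channel entry field; not ExpressionEngine code.
entry_body = (
    "<p>Our department offers degrees in History.</p>\n"
    "<!-- <p>New for next year: a degree in Astrobiology.</p> -->"
)


def naive_search(keyword, content):
    """Case-insensitive substring match over the raw stored content."""
    return keyword.lower() in content.lower()


# Text a visitor can actually see on the rendered page.
hit_visible = naive_search("history", entry_body)

# Text that exists only inside an HTML comment still matches.
hit_hidden = naive_search("astrobiology", entry_body)

# A tag name matches too, because the markup itself is part of the haystack.
hit_tag = naive_search("blockquote", "<blockquote>Quoted text</blockquote>")
```

All three searches report a match, which is the confusing result for a public visitor: two of the three keywords never appear on the rendered page.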

ExpressionEngine doesn’t know how you’ll present the content, or what your intent is for anything in the content. I can even see someone arguing that the current behavior is an effective way to help make sure entries bubble up in search results for related content that might not match the words used, though it is also not one I’d recommend.

Using HTML comments as a sort of meta-data? Honestly, that’s quite a stretch. It’s really hard to imagine a typical user seeing that as intuitive behaviour. The point here should be what’s sensible, intuitive behaviour for the function.

Can you explain why you are having content authors add HTML comments to content in the first place? I’d like to have an idea of what led to this solution for you, maybe there’s something else the app can do that will serve your authors better.

A practical example from this week. I’ve been aware of comments being searchable for a while, but this example reminded me and pushed me to try to resolve it.

The main site I manage is for a university department, and it contains a lot of information about various degree programs. We’re in the process of setting up a new program, and the content was ready to go, and in fact live. But the program has been put on hold for a while, which entailed taking the content which refers to it out of public view until it’s ready. As well as whole pages that have been closed, there’s a lot of information on other pages which have mixed content, so making it private has to be more targeted. HTML comments are an excellent way to do that: the information can remain in the page content, in situ, where it can be seen in context in the back end by everyone relevant, but kept from public view. It’s a quick, effective, natural solution. Removing the content, storing it elsewhere, and then replacing it when it’s ready would be way more complicated. But it can’t be searchable in its current state, so that would be the only option I can see.

Someone I work with, who is broadly familiar with HTML, but not a programmer, routinely uses HTML comments to hide content which is needed from time to time, such as information about seasonal events. To him, it’s a very intuitive way to take content from public view quickly, easily, and temporarily, which will be used again. He rightly expects that the content won’t appear on the site. I think he’d be surprised to learn that this content is being searched, and I think he’d be right to be surprised. This isn’t sensible, intuitive default behaviour.

       
JT Thompson
745 posts
8 years ago

To do what you are asking would require very intense regex code, and there are many, many ‘gotchas’ in doing that. I would suggest you use a plugin for the advanced searching you desire; that is way too much burden and load on the CMS to do as a normal thing. You’re asking for a lot of pre-processing before outputting to the user - too much, imo.

I would be against it. 😊

       
Paul Bailey
63 posts
8 years ago

JT Thompson:

To do what you are asking would require very intense regex code, and there are many, many ‘gotchas’ in doing that. I would suggest you use a plugin for the advanced searching you desire; that is way too much burden and load on the CMS to do as a normal thing. You’re asking for a lot of pre-processing before outputting to the user - too much, imo.

Simple search in EE is simple, obviously. I wouldn’t ask for or expect something that was massively elaborate or bullet-proof in every possible situation. An option (even off by default, in case there are performance concerns) to run content through strip_tags before searching would be a big help.
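PHP's strip_tags removes HTML comments as well as tags, so the option suggested above would cover both cases. The following is a rough sketch of the same idea in Python, chosen only for illustration; it is not EE code and not how EE's search is implemented. The names `visible_text` and `search_visible` and the `{!-- --}` pattern are assumptions of this sketch.

```python
import re
from html.parser import HTMLParser

# Assumed pattern for EE template comments of the form {!-- ... --}.
EE_COMMENT = re.compile(r"\{!--.*?--\}", re.DOTALL)


class _TextOnly(HTMLParser):
    """Collects text nodes only; tags and <!-- comments --> are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(content):
    """Return the content with EE comments, HTML comments and tags removed."""
    parser = _TextOnly()
    parser.feed(EE_COMMENT.sub("", content))
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(parser.parts).split())


def search_visible(keyword, content):
    """Case-insensitive match against the stripped text only."""
    return keyword.lower() in visible_text(content).lower()


sample = (
    "<p>Our department offers degrees in History.</p>\n"
    "<!-- <p>New for next year: a degree in Astrobiology.</p> -->"
)
found_history = search_visible("history", sample)
found_hidden = search_visible("astrobiology", sample)
```

Here `found_history` is true and `found_hidden` is false, because the commented-out paragraph never reaches the text that is searched. Stripping at query time like this costs work on every search, which is the performance concern an off-by-default option would address.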

       
JT Thompson
745 posts
8 years ago

Take a look at the EE3 plugins for this at Devot:ee ( https://devot-ee.com/search/results?keywords=search&collection=addons&addon_version_support=ee3 ), notice there are 2 pages of addons designed for search. We have used several of them over the years and I don’t recall one that does exactly what you want - however you may find one that is close that you can just ‘hack’ (lol - professionally modify) to fit your needs.

An addon would already hook into EE for you and do most of the heavy lifting - all you’d have to do is put in the regexes (and a few PHP functions) to do what you want - and much of that code is already freely available (Google is your friend).

Hope that helps.

Low_Search comes close - but they are not honoring the EE 50 percent discount (they may if you bug them 😊).

Not a bad idea - fyi, just not really something I’d want in the EE CMS - I’m having flashes of a bug crashing the entire output of a site - but then again I am paranoid. 😉

       
Paul Bailey
63 posts
8 years ago

JT Thompson:

Not a bad idea - fyi, just not really something I’d want in the EE CMS - I’m having flashes of a bug crashing the entire output of a site - but then again I am paranoid.

Okay, thanks. My own paranoia is about overloading a site with 3rd-party add-ons and the fragility that can create. I’m a big fan of sticking with 1st-party where possible, especially for basic functionality, and this does feel like basic functionality to me. Imagine some sort of smiley at this point.

       
JT Thompson
745 posts
8 years ago

Paul Bailey:

My own paranoia is about overloading a site with 3rd-party add-ons and the fragility that can create

True enough, however, that only affects you and not the entire community. 😉

       
JT Thompson
745 posts
8 years ago

And just a quick perspective from someone who has been professionally developing for EE since its inception. We’ve never come across a client who had the issue you describe. It may not seem like it to you, but really you are an outlier in this regard.

We’ve always found the search addons to fit a client’s need when they needed more search capability.

We do however take your point seriously about overloading addons into a system - we do our very best to avoid their use when we can - we’ll just write the code in straight PHP in a template if needed (did you know you can just about write the full functionality of an addon in the templates if you know what you’re doing). We will even write our own addon (usually to prevent the purchase of several addons - and special client features needed of course), not because of the cost of an addon - but because of the stability.

       
Derek Jones
7,561 posts
8 years ago

Thanks for the explanation, paulbailey. I guess what I mean is that ideally your content is chunked so that it doesn’t contain HTML (or much of it) in the first place. I do not disagree with you that there is no end-user benefit to being able to search presentational tags. The technical hurdles are not insurmountable, but it’s not a small task. It would require a pre-processor, lexer, and additional data storage for tag-free versions of content.

That, or some similar solution, might be in the cards down the road, but it’s not happening right away, and in the meantime there are simple changes you can make to how you’re dealing with this content that don’t require authors to use HTML comments. Optional fields, Grid with toggles, chunking your content into smaller bits and using statuses to show/hide, or relationships to enable/disable associations, or even an add-on like Bloqs for instance. Cheers.
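The "additional data storage for tag-free versions of content" idea mentioned above can be sketched as a shadow copy that is computed once at save time and is the only thing the search looks at. This Python sketch is an illustration only: `EntryStore` and `strip_markup` are hypothetical names, and the regex-based stripping is a crude stand-in for the pre-processor and lexer a real implementation would need.

```python
import re
from dataclasses import dataclass, field

# Crude patterns, assumed for this sketch; real HTML needs a proper lexer.
_COMMENT = re.compile(r"<!--.*?-->|\{!--.*?--\}", re.DOTALL)
_TAG = re.compile(r"<[^>]+>")


def strip_markup(raw):
    """Remove comments first, then tags, then collapse whitespace."""
    without_comments = _COMMENT.sub(" ", raw)
    without_tags = _TAG.sub(" ", without_comments)
    return " ".join(without_tags.split())


@dataclass
class EntryStore:
    raw: dict = field(default_factory=dict)          # what authors edit
    search_text: dict = field(default_factory=dict)  # tag-free shadow copy

    def save(self, entry_id, body):
        """Store the raw body and refresh its tag-free copy in one step."""
        self.raw[entry_id] = body
        self.search_text[entry_id] = strip_markup(body).lower()

    def search(self, keyword):
        """Return ids of entries whose tag-free text contains the keyword."""
        needle = keyword.lower()
        return sorted(
            entry_id
            for entry_id, text in self.search_text.items()
            if needle in text
        )


store = EntryStore()
store.save(1, "<p>History degrees</p><!-- Astrobiology degree, on hold -->")
store.save(2, "<blockquote>Astrobiology news</blockquote>")
astro_hits = store.search("astrobiology")
```

In this sketch `astro_hits` contains only entry 2, while entry 1 still keeps its commented-out text in `raw` for authors. The trade-off is the one described above: extra storage and a save-time processing step in exchange for cheap searches.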

       
Paul Bailey
63 posts
8 years ago

Derek:

Thanks for the explanation, paulbailey. I guess what I mean is that ideally your content is chunked so that it doesn’t contain HTML (or much of it) in the first place.

Ideally, yes. At least in my real world, there are some good organisational reasons for having provisional or temporary content held alongside live content.

I do not disagree with you that there is no end-user benefit to being able to search presentational tags. The technical hurdles are not insurmountable, but it’s not a small task. It would require a pre-processor, lexer, and additional data storage for tag-free versions of content. That, or some similar solution, might be in the cards down the road, but it’s not happening right away,

I appreciate you considering it — and also appreciate that solutions that seem simple from a user perspective often need a lot of back-end engineering.

and in the meantime there are simple changes you can make to how you’re dealing with this content that don’t require authors to use HTML comments. Optional fields, Grid with toggles, chunking your content into smaller bits and using statuses to show/hide, or relationships to enable/disable associations, or even an add-on like Bloqs for instance. Cheers.

Being a big fan of simple, I might well just add an additional field that is neither displayed nor searched, and which can be used as a repository for non-live material. It loses the advantage of being able to work with the material in its intended place, but it’s a practical solution.
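The holding-field workaround can be modelled as a search that only looks at fields flagged as searchable. This is a small Python sketch of that idea, not EE's field settings or API; the field names, `SEARCHABLE_FIELDS`, and `search_entry` are all hypothetical.

```python
# Hypothetical field names; only these are considered by the public search.
SEARCHABLE_FIELDS = {"title", "body"}

entry = {
    "title": "Degree programs",
    "body": "<p>We offer degrees in History.</p>",
    # Neither displayed by the template nor included in SEARCHABLE_FIELDS.
    "holding_area": "<p>Astrobiology degree, on hold.</p>",
}


def search_entry(keyword, entry, searchable=SEARCHABLE_FIELDS):
    """Match the keyword against searchable fields only."""
    needle = keyword.lower()
    return any(
        needle in entry[name].lower()
        for name in searchable
        if name in entry
    )


live_hit = search_entry("history", entry)
held_hit = search_entry("astrobiology", entry)
```

`live_hit` is true and `held_hit` is false: the held material stays attached to the entry for authors but is invisible to the search, at the cost of no longer sitting in its intended place in the body.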

Thanks.

Paul

       
