Re: Finding end of sentence[ was Re: Understanding ... Sentence Boundari
From: Eric Abrahamsen
Subject: Re: Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]
Date: Thu, 13 Dec 2012 12:27:45 +0800
User-agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.2 (gnu/linux)

ken <gebser@mousecar.com> writes:
> On 12/12/2012 02:02 AM Eric Abrahamsen wrote:
>> ken<gebser@mousecar.com> writes:
>>
>>> On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
>>>> ken<gebser@mousecar.com> writes:
>>>>
>>>>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>>>>> 2010/6/27 ken<gebser@mousecar.com>:
>>>>>>>
>>>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>>>>> Thanks for the responses guys.
>>>>>>>>
>>>>>>>> ....
>>>>>>>>
>>>>>>> Is it possible to specify word boundaries for a particular mode?
>>>>>>>
>>>>>>
>>>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
>>>>>
>>>>> Thanks for the pointer to that function.
>>>>>
>>>>> The behavior I see in need of repair is the role of so-called "comments"
>>>>> in sentence syntax.</tag> For instance, immediately before this
>>>>> sentence are two spaces... which should signify the end of the
>>>>> previous sentence. But functions like "forward-sentence" and
>>>>> "fill-paragraph" and "backward-sentence" don't recognize it.
>>>>>
>>>>> Said another way, the "</tag>" string obscures the relationship
>>>>> between the period before it and the two spaces after it and so fails
>>>>> to see that one sentence ends and another starts. This occurs in
>>>>> text-mode and seems to be inherited by other modes.
>>>>>
>>>>> If I'm reading "modify-syntax-entry" correctly, the default meanings
>>>>> of '<' and '>' are, respectively, beginning and end of comment, so
>>>>> modifying them wouldn't fix this problem. Or can this be remedied by
>>>>> a change in the syntax table? Or is this a bug?
>>>>
>>>> For this particular case, I think you can modify the value of the
>>>> `sentence-end' variable (which is returned by the `sentence-end'
>>>> function? The whole thing is a little confusing). You'd probably be best
>>>> off starting with the docstring for the sentence-end function, and
>>>> working back from there.
>>>>
>>>> I think the `sentence-end' variable is automatically buffer-local, which
>>>> means if you change it in a mode-hook it ought to work the way you want.
>>>> I agree that the whole syntax thing feels like a very well-polished
>>>> hack.
>>>>
>>>> E
>>>
>>> Eric,
>>>
>>> Yes, that would be the variable to adjust. I took a hard look at it
>>> and discussed it (I believe) on this list years ago, but never came up
>>> with a fix. As I see it, there are two problems:
>>>
>>> First, "one" of the items in that RE would need to be "zero or more
>>> consecutive instances of '<' followed by any number of other
>>> characters up until the next '>' is found." E.g., the RE would need
>>> to be able to find the end of this
>>> sentence</b></i>.)</q></p></span></div> Though I've used REs
>>> successfully in quite a few instances and so with a small bit of help
>>> could probably figure that part out, there's a second issue.
>>>
>
> [In my original post the paragraph below was unclear. So changed it.]
>
>>> My considered opinion is that in the above and similar examples, the
>>> end of the sentence is immediately after the period ('.')... or
>>> question mark, exclamation mark, etc. and not after the </div>. That
>>> is where the point should go when forward-sentence is executed. This
>>> means that no RE would work because, once it finds the RE-defined
>>> sentence-end, it then needs to go backwards within the found string
>>> until it encounters [.!?]+ and then move the mark one char forward to the
>>> character after. IOW, unless I'm missing some capability of REs,
>>> "sentence-end" needs to be a function rather than an RE and would be a
>>> different function than one which finds the beginning of a sentence.
>>
>> I'm getting way out of my depth here, both regarding regexps and emacs'
>> sentence-related shenanigans, but you could consider advising the
>> `sentence-end' function so that it checks the current major mode, and
>> delegates to a different sentence-end function depending on the mode (or
>> declines to handle and bails to the built-in sentence-end).
>>
>> The individual mode-specific sentence-end functions look at the text
>> after point, and return a different regexp every time, one specifically
>> tailored to this particular sentence in this particular mode. The call to
>> `forward-sentence' or whatever happily uses a different regexp every
>> time it is called.
>>
>> Feels hacky, but I guess `sentence-end' is already doing this in a
>> sense -- potentially returning a different regexp every time.
>>
>> My brain is exhausted!
>>
>> E
>
> If one were to write a mode-specific replacement for the existing
> "forward-sentence" and "sentence-end", what are some ways in elisp to
> ensure that they're invoked when working in that mode? Would it be
> enough to include (the recoded) "forward-sentence" and "sentence-end"
> in the code for that mode...? or would some kind of specific hook
> language need to be included in ~/.emacs?

I was considering overloading the `sentence-end' function in a
mode-hook, but I think it's highly likely that you'd end up polluting
other modes. So probably the safest thing to do is to advise it at the
top level, i.e. in your ~/.emacs file, and then check the current mode
from there. Something like the following totally untested code:

--8<---------------cut here---------------start------------->8---
(defadvice sentence-end (around my-check-sentence-end activate)
  "Possibly short-circuit the `sentence-end' function."
  ;; This has to be `around' advice: only that class may use `ad-do-it',
  ;; and the replacement value must be stored in `ad-return-value'.
  (cond ((derived-mode-p 'emacs-lisp-mode)
         (setq ad-return-value (emacs-lisp-sentence-end)))
        ((derived-mode-p 'some-other-mode)
         (setq ad-return-value (other-mode-sentence-end)))
        (t ad-do-it)))

(defun emacs-lisp-sentence-end ()
  ;; examine text around point and return an appropriate regexp (a string)
  )

(defun other-mode-sentence-end ()
  ;; return a different regexp (a string)
  )
--8<---------------cut here---------------end--------------->8---

That ought to work, but I'm not guaranteeing that this is the best
approach!
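
For the HTML case from earlier in the thread, a lighter alternative to
advice might be to set the `sentence-end' variable buffer-locally from
the mode's hook. This is only a sketch: the names
`my-html-sentence-end' and `my-html-sentence-end-setup' are made up for
illustration, the regexp is my guess at what "punctuation, then any
closing tags, then whitespace" looks like, and I'm assuming `html-mode'
is the mode in question.

```elisp
;; Hypothetical regexp: sentence punctuation, optional closing
;; quotes/brackets and tags, then end of line or whitespace.
(defvar my-html-sentence-end
  (concat "[.?!]+"                         ; the punctuation itself
          "\\(?:[]\"')}]\\|<[^>\n]*>\\)*"  ; closers and tags, e.g. ")</b>"
          "\\(?:$\\|[ \t]\\)"              ; end of line or one blank
          "[ \t\n]*")                      ; any further whitespace
  "Sentence-end regexp that tolerates tags after the punctuation.")

(defun my-html-sentence-end-setup ()
  "Make `sentence-end' buffer-local and tag-aware in this buffer."
  ;; `make-local-variable' keeps the change out of other buffers.
  (set (make-local-variable 'sentence-end) my-html-sentence-end))

;; Assumption: `html-mode' is the mode you care about.
(add-hook 'html-mode-hook 'my-html-sentence-end-setup)
```

With this, `forward-sentence' should stop after the closing tags rather
than right after the period, so it only solves the first of the two
problems ken described.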
E
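
P.S. On the second problem (point should land right after the period,
not after the tags): a regexp alone can't move point backwards, but a
capture group plus `match-end' can. Below is a rough sketch of a
standalone command, not a drop-in replacement for `forward-sentence';
the names `my-punct-then-tags-regexp' and
`my-forward-sentence-to-punctuation' are invented for the example, and
it ignores paragraph boundaries entirely.

```elisp
;; Hypothetical regexp: group 1 captures only the punctuation; the
;; closers, tags and trailing whitespace are matched but not captured.
(defvar my-punct-then-tags-regexp
  (concat "\\([.?!]+\\)"                   ; group 1: the punctuation
          "\\(?:[]\"')}]\\|<[^>\n]*>\\)*"  ; closers and tags after it
          "\\(?:$\\|[ \t]\\)")             ; end of line or a blank
  "Regexp whose first group is the sentence-ending punctuation.")

(defun my-forward-sentence-to-punctuation ()
  "Move point to just after the next sentence-ending punctuation.
Any tags between the punctuation and the following whitespace are
skipped over when searching, but point is left before them.
If no sentence end is found, move to the end of the buffer."
  (interactive)
  (if (re-search-forward my-punct-then-tags-regexp nil t)
      (goto-char (match-end 1))
    (goto-char (point-max))))
```

On ken's example line this should leave point between the "." and the
")" that follows it. You could bind it to a key in the relevant mode's
keymap instead of touching `sentence-end' at all.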