From: Yuan Fu
Subject: Re: Tree sitter: issue embedding HTML, CSS, JavaScript within a new php-ts-mode
Date: Thu, 9 Feb 2023 21:45:50 -0800
Hey Simon,
Thanks for trying this out! Feedback like this is very welcome.
> On Feb 9, 2023, at 4:45 AM, Simon Pugnet <simon@polaris64.net> wrote:
>
> Dear Emacs maintainers,
>
> I have recently started work on a PHP tree sitter major mode. Things are
> going well so far, however I'm having trouble with embedding multiple
> languages in the PHP buffer.
>
> In case you're not familiar with PHP, here's a quick example (I'm using
> org-mode mark-up in this message which hopefully will help): -
>
> #+begin_src php
> <html lang="en-gb">
>   <head>
>     <style type="text/css">
>       body {
>         background: url("/background.png");
>         color: #ff0000;
>       }
>     </style>
>   </head>
>
>   <body>
>     <?php
>     $a = [1, 2, "3", 4.5];
>     if (is_array($a)) {
>       echo "$a is an array";
>     } else {
>       echo "$a is not an array";
>     }
>     ?>
>
>     <div id="my-div">
>       <h1>This is a test</h1>
>     </div>
>
>     <script type="text/javascript">
>       const div = document.getElementById('my-div');
>       // This is a JS comment
>       /* This too */
>       console.log("my-div is:", div);
>     </script>
>
>     <?php // Some more PHP here ?>
>
>   </body>
> </html>
> #+end_src
>
> As you can see, PHP code is usually encapsulated within a HTML document, with
> PHP code enclosed within ~<?php ... ?>~ blocks.
>
> The first block of HTML from the beginning of the buffer to the first ~<?php~
> is enclosed within a ~(program (text))~ node. The second (after ~?>~ and
> before the second ~<?php~) is enclosed within a ~(text_interpolation (text))~
> node. I have therefore defined the following ~treesit-range-settings~ in my
> major mode: -
>
> #+begin_src emacs-lisp
> (setq-local treesit-range-settings
>             (treesit-range-rules
>              :embed 'html
>              :host 'php
>              '((program (text) @capture)
>                (text_interpolation (text) @capture))))
> #+end_src
>
> This seems to work however when I evaluate ~(treesit-language-at (point))~
> anywhere in this buffer I get ='html= in response. This is of course expected
> within a HTML region, but not within a PHP region. Despite this, the
> font-locking I have defined for PHP appears to work correctly. I have also
> defined a custom face and applied it via font-locking to the above two nodes
> to confirm that those regions are indeed enclosed as expected and they are.
>
> My hope eventually is to use the following ~treesit-range-settings~: -
>
> #+begin_src emacs-lisp
> (setq-local treesit-range-settings
>             (treesit-range-rules
>              :embed 'html
>              :host 'php
>              '((program (text) @capture)
>                (text_interpolation (text) @capture))
>
>              :embed 'css
>              :host 'html
>              '((style_element (raw_text) @capture))
>
>              :embed 'typescript
>              :host 'html
>              '((script_element (raw_text) @capture))))
> #+end_src
>
> As well as defining these rules, I require =css-mode= and
> =typescript-ts-mode= and append their own font-locking rules to my own. My
> hope is that this will allow CSS and JavaScript embedded within HTML regions
> to be font-locked according to those separate major modes too. This appears
> to work for simple files but does not work reliably for more complex files.
> Also when using the above I get ='typescript= whenever I evaluate
> ~(treesit-language-at (point))~. I'm not sure if this is just a bug with the
> language grammars that I'm using or if perhaps because I'm not using the
> treesit library correctly. Because of the issue with ~treesit-language-at~
> above I'm concerned that it's the latter.
>
> So my questions are: -
>
> 1. Based on my rules for embedding ='html= within ='php= above, should I
> expect ~(treesit-language-at (point))~ to return ='php= when the point is
> within a PHP region?
Because we don’t have much experience with tree-sitter and its interfaces yet, I
made treesit-language-at simply delegate its work to
treesit-language-at-point-function, which can be an arbitrary function, giving
developers maximum flexibility. You need to set that variable to a function;
otherwise treesit-language-at simply returns the language of the first parser
in the parser list.
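
A minimal sketch of such a function, assuming Emacs 29 built with tree-sitter
support and parsers for php, html, css and typescript already created in the
buffer. The names my-php-ts--pos-in-ranges-p and my-php-ts--language-at-point
are hypothetical, and this is only one possible approach: it looks at the
ranges assigned to each embedded parser and picks the innermost language whose
range covers the position.

```emacs-lisp
(require 'seq)
(require 'treesit)

(defun my-php-ts--pos-in-ranges-p (pos ranges)
  "Return non-nil if POS falls inside one of RANGES.
RANGES is a list of (BEG . END) cons cells, as returned by
`treesit-parser-included-ranges'."
  (seq-some (lambda (range)
              (and (<= (car range) pos)
                   (< pos (cdr range))))
            ranges))

(defun my-php-ts--language-at-point (pos)
  "Return the language at POS as a symbol (hypothetical helper).
Fall back to `php', the host language, when no embedded parser
covers POS."
  (let ((result 'php))
    ;; Walk the embedded languages from outermost to innermost, so
    ;; that a deeper match (css or typescript inside html) overrides
    ;; a shallower one.
    (dolist (lang '(html css typescript) result)
      (dolist (parser (treesit-parser-list))
        (when (and (eq (treesit-parser-language parser) lang)
                   (my-php-ts--pos-in-ranges-p
                    pos (treesit-parser-included-ranges parser)))
          (setq result lang))))))

;; In the major mode's body, next to the `treesit-range-settings' setup:
(setq-local treesit-language-at-point-function
            #'my-php-ts--language-at-point)
```

This relies on the embedded parsers' ranges being up to date, so results may be
stale right after an edit until the ranges have been recomputed. Evaluating
~(treesit-language-at (point))~ in different regions is a quick way to check
whether the function behaves as intended.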
> 2. Is my goal of embedding HTML within PHP, then embedding CSS and
> JavaScript/TypeScript within HTML feasible and if so am I going about this in
> the right way?
It should be. Although I didn’t think of having multiple layers of embedded
languages (in this case PHP embedding HTML, which embeds CSS/JavaScript), if you
order the entries in treesit-range-rules the way you do now (outermost host
language, then embedded language, then the language embedded in that), it should
work. Try setting treesit-language-at-point-function and it should work correctly.
If not… then we need to look into it.
Yuan