emacs-devel

From: Yuan Fu
Subject: Re: Tree sitter: issue embedding HTML, CSS, JavaScript within a new php-ts-mode
Date: Thu, 9 Feb 2023 21:45:50 -0800

Hey Simon,

Thanks for trying this out! Feedback like this is very welcome.

> On Feb 9, 2023, at 4:45 AM, Simon Pugnet <simon@polaris64.net> wrote:
> 
> Dear Emacs maintainers,
> 
> I have recently started work on a PHP tree sitter major mode. Things are 
> going well so far, however I'm having trouble with embedding multiple 
> languages in the PHP buffer.
> 
> In case you're not familiar with PHP, here's a quick example (I'm using 
> org-mode mark-up in this message which hopefully will help): -
> 
> #+begin_src php
> <html lang="en-gb">
>   <head>
>     <style type="text/css">
>      body {
>        background: url("/background.png");
>        color: #ff0000;
>      }
>     </style>
>   </head>
> 
>   <body>
>     <?php
>     $a = [1, 2, "3", 4.5];
>     if (is_array($a)) {
>       echo "$a is an array";
>     } else {
>       echo "$a is not an array";
>     }
>     ?>
> 
>     <div id="my-div">
>       <h1>This is a test</h1>
>     </div>
> 
>     <script type="text/javascript">
>      const div = document.getElementById('my-div');
>      // This is a JS comment
>      /* This too */
>      console.log("my-div is:", div);
>     </script>
> 
>     <?php // Some more PHP here ?>
> 
>   </body>
> </html>
> #+end_src
> 
> As you can see, PHP code is usually embedded in an HTML document, enclosed 
> in ~<?php ... ?>~ blocks.
> 
> The first block of HTML from the beginning of the buffer to the first ~<?php~ 
> is enclosed within a ~(program (text))~ node. The second (after ~?>~ and 
> before the second ~<?php~) is enclosed within a ~(text_interpolation (text))~ 
> node. I have therefore defined the following ~treesit-range-settings~ in my 
> major mode: -
> 
> #+begin_src emacs-lisp
> (setq-local treesit-range-settings
>             (treesit-range-rules
>              :embed 'html
>              :host 'php
>              '((program (text) @capture)
>                (text_interpolation (text) @capture))))
> #+end_src
> 
> This seems to work; however, when I evaluate ~(treesit-language-at (point))~ 
> anywhere in this buffer I get ='html= in response. This is of course expected 
> within an HTML region, but not within a PHP region. Despite this, the 
> font-locking I have defined for PHP appears to work correctly. I have also 
> defined a custom face and applied it via font-locking to the above two nodes 
> to confirm that those regions are indeed enclosed as expected and they are.
> 
> My hope eventually is to use the following ~treesit-range-settings~: -
> 
> #+begin_src emacs-lisp
> (setq-local treesit-range-settings
>                 (treesit-range-rules
>                  :embed 'html
>                  :host 'php
>                  '((program (text) @capture)
>                    (text_interpolation (text) @capture))
> 
>                  :embed 'css
>                  :host 'html
>                  '((style_element (raw_text) @capture))
> 
>                  :embed 'typescript
>                  :host 'html
>                  '((script_element (raw_text) @capture))))
> #+end_src
> 
> As well as defining these rules, I require =css-mode= and 
> =typescript-ts-mode= and append their own font-locking rules to my own. My 
> hope is that this will allow CSS and JavaScript embedded within HTML regions 
> to be font-locked according to those separate major modes too. This appears 
> to work for simple files but does not work reliably for more complex files. 
> Also when using the above I get ='typescript= whenever I evaluate 
> ~(treesit-language-at (point))~. I'm not sure whether this is a bug in the 
> language grammars I'm using or whether I'm not using the treesit library 
> correctly. Because of the issue with ~treesit-language-at~ above, I'm 
> concerned that it's the latter.
> 
> So my questions are: -
> 
> 1. Based on my rules for embedding ='html= within ='php= above, should I 
> expect ~(treesit-language-at (point))~ to return ='php= when the point is 
> within a PHP region?

Because we don’t have much experience with tree-sitter and its interfaces, I 
made treesit-language-at simply delegate its work to 
treesit-language-at-point-function, which can be an arbitrary function, giving 
developers maximum flexibility. You need to set that variable to a function; 
otherwise treesit-language-at simply returns the language of the first parser 
in the parser list, which would explain why you see 'html (or 'typescript) 
everywhere.
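As an illustration, here is a minimal sketch of such a function. It assumes the parsers were created from the range rules in your message, and the names `my-php-ts--language-at-point` and `my-php-ts--pos-in-ranges-p` are hypothetical. It checks the most deeply embedded languages first, using each parser's included ranges, and falls back to the host language `php`:

```elisp
(require 'seq)

(defun my-php-ts--pos-in-ranges-p (pos ranges)
  "Return non-nil if POS lies in one of RANGES.
RANGES is a list of (BEG . END) conses; END is treated as exclusive."
  (seq-some (lambda (range)
              (and (<= (car range) pos) (< pos (cdr range))))
            ranges))

(defun my-php-ts--language-at-point (pos)
  "Return the language at POS, checking innermost embedded languages first.
Fall back to `php', the host language, when no embedded parser covers POS."
  (or (seq-find
       (lambda (lang)
         ;; Does any parser for LANG have an included range covering POS?
         ;; A parser whose ranges are still nil (not yet updated) is
         ;; treated as not covering POS.
         (seq-some
          (lambda (parser)
            (and (eq (treesit-parser-language parser) lang)
                 (my-php-ts--pos-in-ranges-p
                  pos (treesit-parser-included-ranges parser))))
          (treesit-parser-list)))
       ;; Innermost languages first, outermost embedded language last.
       '(css typescript html))
      'php))

;; In the major mode body (hypothetical `php-ts-mode'):
;; (setq-local treesit-language-at-point-function
;;             #'my-php-ts--language-at-point)
```

The order of '(css typescript html) matters: a position inside a <style> element lies in both the HTML ranges and the CSS ranges, so the innermost language has to be checked first.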

> 2. Is my goal of embedding HTML within PHP, then embedding CSS and 
> JavaScript/TypeScript within HTML feasible and if so am I going about this in 
> the right way?

It should be. Although I didn’t think of having multiple layers of embedded 
languages (in this case PHP embedding HTML embedding CSS/JavaScript), if you 
order the entries in treesit-range-rules like you do now (outermost host 
language first, then the embedded language, then the language embedded in 
that), it should work. Try setting treesit-language-at-point-function and see 
whether everything works correctly. If not… then we need to look into it.
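Another possible approach, sketched here under stated assumptions (hypothetical function names; node types `text`, `raw_text`, `style_element` and `script_element` taken from the range rules in your message), walks the layers in the same order as the range rules: first ask the PHP parser whether POS is in a `text` node, then ask the HTML parser whether it is in the `raw_text` of a style or script element. The decision logic is kept in a separate pure function so it can be checked without any parsers:

```elisp
(defun my-php-ts--classify (php-type html-type html-parent-type)
  "Map node types found at a position to a language symbol.
PHP-TYPE is the type of the PHP leaf node, HTML-TYPE the type of the
HTML leaf node and HTML-PARENT-TYPE the type of its parent."
  (cond ((not (equal php-type "text")) 'php) ; inside <?php ... ?>
        ((not (equal html-type "raw_text")) 'html)
        ((equal html-parent-type "style_element") 'css)
        ((equal html-parent-type "script_element") 'typescript)
        (t 'html)))

(defun my-php-ts--layered-language-at (pos)
  "Return the language at POS by descending PHP -> HTML -> CSS/TypeScript."
  (let* ((php-node (treesit-node-at pos 'php))
         (php-type (and php-node (treesit-node-type php-node))))
    (if (not (equal php-type "text"))
        'php
      ;; POS is in HTML text, so only now consult the HTML parser.
      ;; Note: on whitespace `treesit-node-at' may return a neighbouring
      ;; leaf, so results right at region boundaries are approximate.
      (let* ((html-node (treesit-node-at pos 'html))
             (html-parent (and html-node (treesit-node-parent html-node))))
        (my-php-ts--classify
         php-type
         (and html-node (treesit-node-type html-node))
         (and html-parent (treesit-node-type html-parent)))))))

;; In the major mode body (hypothetical `php-ts-mode'):
;; (setq-local treesit-language-at-point-function
;;             #'my-php-ts--layered-language-at)
```

This version depends on the grammars' node names, whereas a range-based check depends only on the ranges the range rules produce; which is more robust likely depends on the grammars in use.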

Yuan
