How to get correct code highlighting on Habr, and why it is so difficult





A few months ago I published my first post on Habr. Some of you may have noticed that the code in that article is highlighted in an unusual way and, more importantly, highlighted correctly, even though the site's built-in editor does not support the original markup of the code and often highlights its elements incorrectly. And the code is not inserted as an image, as some truly desperate authors do.



In my case, preserving the highlighting was especially important, since the article described work on code. To solve the problem, I created a tool that transfers code highlighting, in the selected color scheme, from IDEA to an article on Habr. Here I will describe how the tool was built and how to use it.



Why all this



At first glance, it might seem that this was done on a whim, simply because the standard highlighting implemented through the <source> tag was not to my taste.



In a way that is true, of course, but not entirely.



First, the highlighting inside <source> cannot work with code fragments, since a fragment does not carry enough information for coloring. All elements declared outside the fragment will be colored at random. This problem has no solution, since, as far as I know, none of the online highlighting services lets you do either of the following:



  1. Attach the full project code to the article without displaying all of it, or attach a link to a commit on GitHub. At specific places in the article, use excerpts from the full code (by line range). The highlighting would then be determined from the full code, of course.
  2. Specify explicit meta-information for undefined elements. This is rather laborious for the user, but I would accept it.


Second, the highlighting inside <source> never matches a regular IDE in the number of distinct element types it distinguishes. And because of the problem described above, advanced coloring makes no sense there: no one inserts the full project code into an article, so this functionality would not work.



At the same time, the reality is that pieces of code need to be inserted into articles, and the smaller they are, the better.



You can read code without highlighting, but why would you?



Features of Habr



IntelliJ IDEA has built-in support for exporting code to HTML. An ordinary copy also puts the colored code on the clipboard, among other formats, and it can be read from there as HTML.



Unfortunately, Habr does not allow HTML markup to be used directly in articles. The reasons are shrouded in mystery, but perhaps the goal is to keep the appearance of articles uniform. If arbitrary HTML were allowed, authors could produce markup that causes problems with viewing.



I generally support the idea of banning HTML in articles, but there is a caveat. A resource for IT professionals where code is discussed all the time, yet cannot be inserted into an article correctly, is rather strange.



So, we have at our disposal the tags <b>, <i>, and <font>. In addition, all of this works inside the <code> tag, which is needed for formatting. We have also been left &nbsp;, which is useful for long lines of code and for indentation.
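A quick way to see how far a given piece of exported HTML is from this subset is to list the tags in it that fall outside the allowed set. The following is a minimal Python sketch, not part of the author's tool; the function name `disallowed_tags` is hypothetical, and the allowed set is simply the one named above.

```python
import re

# Tags that, according to the text above, survive on Habr.
ALLOWED_TAGS = {"b", "i", "font", "code"}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def disallowed_tags(html):
    """Return the set of tag names in `html` that are not in ALLOWED_TAGS."""
    found = {m.group(1).lower() for m in TAG_RE.finditer(html)}
    return found - ALLOWED_TAGS


# Hypothetical inputs, for illustration only.
sample = '<code><font color="#000080"><b>val</b></font>&nbsp;x</code>'
print(sorted(disallowed_tags(sample)))  # []
print(sorted(disallowed_tags('<pre><span style="color:#000">x</span></pre>')))  # ['pre', 'span']
```

An empty result means the fragment uses only the tags listed above; it says nothing about whether Habr will render it as intended.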



Needless to say, none of the standard ways of getting HTML out of IDEA produces HTML like this, so the conversion work is considerable.



The approach



To begin with, thanks are due to the author capslocky for his material on this topic. I did not use the tool proposed in that article directly, and it would hardly have worked out, but thanks to this material I understood the full depth of the problem and at the same time felt a breeze of hope.



The only drawback of that publication is the large amount of code combined with a very meager explanation of what it does and why.



I'll try to remedy that and describe what you have to do to your HTML markup to bring it to a form that is ready to paste into Habr.



  1. Before exporting, set the desired color scheme in IDEA, for example one from the Color Themes website. The code will be exported with the selected scheme. It is better to choose a scheme with a white background (since the background cannot be set on Habr) and without underlining. I did not work out a simple way to carry underlines over, because I did not really want to.
  2. We work only with the contents of the <pre> tag. Even if you export from something other than IDEA, this tag will probably be present in the HTML markup, since without it it is difficult to format code correctly. The tag itself is removed and replaced with <code>.
  3. The text will most likely be represented as <span> elements with different styles. All of them have to go. Many highlighting services move styles into a style sheet, which is logical, and refer to them by style name. IDEA does not do this yet, which makes the task a little easier (the style settings sit directly in the <span>).
  4. Set the font color through the <font> tag. Unfortunately, the background color cannot be set.
  5. Turn the font-style:italic property into a pair of <i></i> tags, and font-weight:bold into <b></b>.
  6. Replace all spaces with &nbsp;.
  7. Replace line feeds in the form of <br> with \n.
  8. The HTML that IDEA produces contains styled blank lines and styled runs of spaces. It is better to throw such styles out: this greatly reduces the length of the markup and makes it easier to read.
  9. Make sure that line feeds do not carry any style. Otherwise, there will be problems with empty lines.


The last item is illustrated by an example:



    <code>
    1<font color="000000">
    </font>2
    </code>







Habr will render this code as 12. The same applies to the <b> and <i> tags, as well as any combination of them. Line breaks must not carry any style, and then everything will be fine.



Implementation



At first, the task of writing a converter for arbitrary HTML seemed rather complicated to me. However, if you solve it for one specific flavor of HTML, things are not so bad. I managed to do everything with pure regular expressions, that is, without even parsing the HTML. The main problem turned out to be working out how Habr's markup actually behaves.
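The conversion steps listed in the previous section can be sketched with a handful of regular expressions. This is a minimal Python sketch under stated assumptions, not the author's Kotlin code: it assumes the input has a single <pre> block, flat (non-nested) <span style="..."> elements, and <br> for line breaks. The names `to_habr` and `convert_span` are hypothetical. A <br> that sits inside a span together with visible text is not handled here.

```python
import re

PRE_RE = re.compile(r"<pre[^>]*>(.*?)</pre>", re.DOTALL)
SPAN_RE = re.compile(r'<span style="([^"]*)">(.*?)</span>', re.DOTALL)
BR_RE = re.compile(r"<br\s*/?>")
TAG_SPLIT_RE = re.compile(r"(<[^>]+>)")


def convert_span(match):
    """Turn one styled <span> into <font>/<b>/<i> (assumes spans are not nested)."""
    style, text = match.group(1), match.group(2)
    # Whitespace and bare line breaks have no visible style, so drop the styling.
    if BR_RE.sub("", text).strip() == "":
        return text
    props = dict(
        (key.strip().lower(), value.strip().lower())
        for key, _, value in (item.partition(":") for item in style.split(";"))
        if key.strip()
    )
    out = text
    if props.get("font-style") == "italic":
        out = "<i>" + out + "</i>"
    if props.get("font-weight") == "bold":
        out = "<b>" + out + "</b>"
    if props.get("color"):
        out = '<font color="' + props["color"] + '">' + out + "</font>"
    return out


def to_habr(html):
    """Convert exported HTML (assumed shape, see above) to Habr-friendly markup."""
    found = PRE_RE.search(html)
    if found is None:
        raise ValueError("no <pre> block found")
    body = SPAN_RE.sub(convert_span, found.group(1))
    body = BR_RE.sub("\n", body)
    # Replace spaces only in the text between tags, so that the space inside
    # <font color="..."> itself is left alone. Odd indices are the tags.
    parts = TAG_SPLIT_RE.split(body)
    parts = [p if i % 2 else p.replace(" ", "&nbsp;") for i, p in enumerate(parts)]
    return "<code>" + "".join(parts) + "</code>"


# Hypothetical input in the assumed shape, for illustration only.
sample = (
    '<pre style="background-color:#ffffff;">'
    '<span style="color:#000080;font-weight:bold;">val</span>'
    '<span style="color:#000000;"> </span>'
    'x = <span style="color:#0000ff;">1</span><br>'
    "</pre>"
)
print(to_habr(sample))
```

The space replacement is done last and only on text between tags; doing it earlier, or on the whole string, would also rewrite the space inside the generated <font> tags.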



To keep line breaks free of styles, I had to make rather tricky replacements, which are probably the hardest part to understand (see the popupBr function). The idea is that after each replacement the <br> tags "float up" from the depth of the formatting tags toward the outside. Thus, after all the replacements, every <br> tag ends up outside any formatting.
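The end state, with every <br> outside all formatting tags, can also be reached without iterative replacements. The sketch below is a Python alternative, not the author's popupBr: it walks the tokens with a stack of open tags, closes them before each line break, and reopens them after it. The function name `lift_line_breaks` is hypothetical, and the input is assumed to be well-nested markup that uses only <b>, <i>, <font>, and <br>.

```python
import re

TOKEN_RE = re.compile(r"(<[^>]+>)")
OPEN_RE = re.compile(r"<(b|i|font)\b[^>]*>", re.IGNORECASE)
CLOSE_RE = re.compile(r"</(b|i|font)>", re.IGNORECASE)
BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
EMPTY_RE = re.compile(r"<(b|i|font)\b[^>]*></\1>", re.IGNORECASE)


def lift_line_breaks(html):
    """Rewrite `html` so that no <br> sits inside <b>, <i> or <font>."""
    out = []
    open_tags = []  # opening tags currently in effect, outermost first
    for token in TOKEN_RE.split(html):
        if BR_RE.fullmatch(token):
            # Close everything, emit the bare break, reopen in the same order.
            for tag in reversed(open_tags):
                out.append("</" + OPEN_RE.fullmatch(tag).group(1) + ">")
            out.append("<br>")
            out.extend(open_tags)
        elif OPEN_RE.fullmatch(token):
            open_tags.append(token)
            out.append(token)
        elif CLOSE_RE.fullmatch(token):
            if open_tags:
                open_tags.pop()
            out.append(token)
        else:
            out.append(token)
    result = "".join(out)
    # Splitting can leave empty pairs such as <b></b>; strip until none remain.
    while True:
        cleaned = EMPTY_RE.sub("", result)
        if cleaned == result:
            return result
        result = cleaned


print(lift_line_breaks('1<font color="000000"><br></font>2'))
# 1<br>2
print(lift_line_breaks('<font color="#000080"><b>a<br>b</b></font>'))
# <font color="#000080"><b>a</b></font><br><font color="#000080"><b>b</b></font>
```

The cost of this approach is repeated opening tags on every line of a multi-line styled run, which makes the markup longer but keeps each line break unstyled.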



In addition, it turned out that IDEA puts on the clipboard not only rich text but also rather tricky objects such as application/x-java-jvm-local-objectref. The trouble is that the presence of such objects on the clipboard leads to persistent errors in my console about constructing a DataFlavor. Unfortunately, nothing can be done about it: that is simply how the JDK works with the clipboard. Finding such code there was a revelation for me. Apparently, the clever people who wrote it believe it will do. In short, do not be alarmed by errors that may appear while you work with the tool.



The project is written in Kotlin and lives on GitHub.



Suggestions for improvement are welcome! For example, it would be nice to package this tool as an IDEA plugin. I have not yet found a simple way to do this: the sources of the Copy as HTML plugin are, unfortunately, closed, and figuring out how to write such a plugin from scratch would take too long.







