Fast JS Pagination for long texts

February 2019

1

I'm trying to create a pagination system with JavaScript.

Basic situation: I have a database that holds fairly long texts (story chapters, 5000+ words). I want to display these chapters on a website, but in pages rather than the entire text at once, because that would pretty much kill readability. Displaying the text is no problem; getting the pages right is.

I've been looking around and came across a piece of jQuery code that does about what I want. However, there's a major caveat: it takes about 10 seconds to finish paginating the text, which is far too long a wait.

What the code basically does: it splits the text into words (separated by spaces). It then adds one word after the other to an element's innerHTML, checking each time whether the text is now bigger than the container it's supposed to fit in.

Each time it breaks the boundary, it reverts to the previous string and creates a new page (by wrapping the text in a span, which can then be hidden or shown at a moment's notice). This works, but it is too slow, because it has to run these checks 5000+ times.

I have tried building an approximation scheme: it takes the number of words, halves it, and checks whether the buffer is still larger than the required size. It repeats this until the buffer is smaller than the required size for the first time, then fills the buffer word by word from that position until it is full.

However, it just doesn't seem to work right (duplicated words, lines that aren't completely full), and it's still too slow.

This is the code I'm currently using. I'd be grateful for any fixes and suggestions on how to make it simpler and, especially, faster. And no, paging it server-side is not an option, since it's supposed to fit variable browser sizes: a fullscreen browser at 1280x768 will need fewer pages than a small browser window at 1024x768.

function CreateChild(contentBox, Len, pageText, words) {
    var Child = document.createElement("span");
    Child.innerHTML = pageText;
    contentBox.appendChild(Child);
    if(Len == 0) ++Len;
    words.splice(0, Len);
    return words.length;
}

$(document).ready(function(){  
    var src = document.getElementById('Source');
    var contentBox = document.getElementById('content');
    var inner = document.getElementById('inner');
    //get the text as an array of word-like things
    var words = src.innerHTML.replace(/ +/g, " ").split(' '), wCount = words.length;

    //start off with no page text
    var pageText = null, cHeight = contentBox.offsetHeight;

    while(words.length > 0) {
        var Found = false;
        pageText = words[0];    //Prevents constant checking for empty
        wCount *= 0.5;      //Searches, until the words fit in.
        for(var i = 1; i < wCount; ++i) pageText += ' ' + words[i];
        inner.innerHTML = pageText;
        Distance = inner.offsetHeight - cHeight;
        if(Distance < 40) {         //Less than two lines
            wCount = Math.floor(wCount);
            if(Distance < 0) {      //Already shorter than required. Fill.
                for(var i = wCount; i < words.length; ++i) {
                    //add the next word to the pageText
                    var betterPageText = pageText + ' ' + words[i];
                    inner.innerHTML = betterPageText;
                    //Checks, whether the new words makes the buffer too big.
                    if(inner.offsetHeight > cHeight) {
                        wCount = CreateChild(contentBox, i, pageText, words);
                        Found = true;
                        break;
                    } else {
                        //this longer text still fits
                        pageText = betterPageText;             
                    }
                }
            } else {
                for(var i = wCount; i >= 0; --i) {
                    //Removes the last word from the text
                    var betterPageText = pageText.slice(0, pageText.length - words[i].length - 1);
                    inner.innerHTML = betterPageText;

                    //Is the text now short enough?
                    if(inner.offsetHeight <= cHeight) {
                        wCount = CreateChild(contentBox, i, pageText, words);
                        Found = true;
                        break;
                    } else {
                        pageText = betterPageText;
                    }
                }
            }
            if(!Found) CreateChild(contentBox, i, pageText, words);
        }
    }

    //Creates the final block with the remaining text.
    Child = document.createElement("span");
    Child.innerHTML = pageText;
    contentBox.appendChild(Child);

    //Removes the source and the temporary buffer, only the result remains.
    contentBox.removeChild(inner);
    src.parentNode.removeChild(src);

    //The rest is the actual pagination code, but not the issue
});

3 answers

0

Having not searched in advance, I worked out an alternative solution with getClientRects (https://developer.mozilla.org/en-US/docs/Web/API/Element/getClientRects). If someone's interested in the details, I'll post more.

3

I managed to solve my problem, also thanks to Rich's suggestion. What I'm doing: First off, I'm getting the text from the 'Source' (alternatively, I could write the entire text straight into the JS, the effect is the same).

Next I'm getting references to my target and my temporary buffer. The temporary buffer is located inside the target buffer, so it will retain the width information.

After that, I split the entire text into words (standard RegEx, after replacing multiple spaces with a single one). After this, I create some variables, which are meant to buffer function results, so the function calls won't have to be repeated unnecessarily.

Now the main difference: I take chunks of 20 words, checking whether the current chunk exceeds the boundary (again, buffering the results in variables, so they don't get called multiple times, function calls equal valuable microseconds).

Once the boundary is crossed (or the total number of words is reached), the loop stops, and (assuming the boundary caused the stop) the text is shortened by one word per run until it fits again.

Finally, the new text gets added to a new span-element, which is added to the content box (but made invisible, I'll explain why in a bit), the words I just 'used' get removed from the word array and the wCount variable gets decremented by the number of words.

Rinse and repeat until all pages are rendered. You can exchange the '20' for any other value; the script will work with any arbitrary number. Keep in mind, though, that too low a number will cause a lot of runs in the 'adding segment', and too high a number will cause a lot of runs in the 'backtracking segment'.

As for the invisible part: if the span is left visible, sooner or later it WILL cause scrollbars to appear, effectively narrowing the width of the browser window. In turn, fewer words will fit, and all following pages will be distorted (because they will be matched to the window with scrollbars, while the 'paged result' will not have scrollbars).
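
Since every page span starts out hidden, displaying a page comes down to toggling `display`. The following is only a minimal sketch of that step, not part of my code below: `showPage` is a hypothetical helper, and it assumes the content box holds nothing but page spans.

```javascript
// Hypothetical helper: shows the page at `index` and hides all others.
// `pages` is any array-like of elements with a `style.display` property,
// for example contentBox.getElementsByTagName('span'), assuming the
// content box contains only the generated page spans.
function showPage(pages, index) {
    for (var i = 0; i < pages.length; i++) {
        // An empty string clears the inline 'none', so the span falls back
        // to its normal display value.
        pages[i].style.display = (i === index) ? '' : 'none';
    }
}
```

Only one page is ever visible at a time, so the scrollbar problem described above does not come back.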

Below is the code I used, I hope it will help someone in the future.

var src = document.getElementById('Source');
var contentBox = document.getElementById('content');
var inner = document.getElementById('inner');
//get the text as an array of word-like things
var words = src.innerHTML.replace(/ +/g, " ").split(' ');

//start off with no page text
var cHeight = contentBox.offsetHeight, wCount = words.length;

while(wCount > 0) {
    var Len = 1, Overflow = false;
    var pageText = words[0];                        //Prevents the continued check on 'is pageText set'.
    while(!Overflow && Len < wCount) {              //Adds to the text, until the boundary is breached.
        //20 words per run, but never more than the total amount of words.
        for(var j = 0; j < 20 && Len < wCount; ++Len, ++j) pageText += ' ' + words[Len];
        inner.innerHTML = pageText;
        Overflow = (inner.offsetHeight > cHeight);  //Determines, whether the boundary has been crossed.
    }
    if(Overflow) {                                  //Will only be executed, if the boundary has been broken.
        for(--Len; Len > 0; --Len) {                //Removes the last word of the text, until it fits again.
            pageText = pageText.slice(0, -(words[Len].length + 1)); //Shortens the text in question.
            inner.innerHTML = pageText;

            //Checks, whether the text still is too long.
            if(inner.offsetHeight <= cHeight) break;//Breaks the loop
        }
        if(Len == 0) Len = 1;                       //A single word that never fits still takes a page (avoids an endless loop).
    }
    var Child = document.createElement("span");
    Child.style.display = "none";                   //Prevents the scrollbars from showing (and distorting the following pages)
    Child.innerHTML = pageText;
    contentBox.appendChild(Child);
    words.splice(0, Len);
    wCount -= Len;
}   
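
The chunk size of 20 trades forward steps against backtracking steps. One way to sidestep that tuning, which is not what the code above does, is to binary-search the number of words per page. The sketch below is an illustration under assumptions: `countFittingWords`, `paginate` and the `fits` callback are hypothetical names, and it relies on the text never getting shorter when a word is added.

```javascript
// Hypothetical helper: returns how many leading words of `words` fit on one page.
// `fits(text)` is an assumed callback that reports whether `text` fits the page box.
// In the browser it could wrap the same measurement as above, for example:
//   function fits(text) { inner.innerHTML = text; return inner.offsetHeight <= cHeight; }
function countFittingWords(words, fits) {
    var lo = 1;                 // a page always takes at least one word
    var hi = words.length;
    while (lo < hi) {
        var mid = Math.ceil((lo + hi) / 2);
        if (fits(words.slice(0, mid).join(' '))) {
            lo = mid;           // mid words fit, try more
        } else {
            hi = mid - 1;       // mid words overflow, try fewer
        }
    }
    return lo;
}

// Splits `words` into page strings, measuring via `fits`.
function paginate(words, fits) {
    var pages = [];
    var start = 0;
    while (start < words.length) {
        var rest = words.slice(start);
        var n = countFittingWords(rest, fits);
        pages.push(rest.slice(0, n).join(' '));
        start += n;
    }
    return pages;
}
```

Each page then costs roughly log2 of the remaining word count in measurements, and a single word that never fits still gets a page of its own instead of stalling the loop.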

1

1. Create an absolutely-positioned container that is the width of a single page. Give it height of 'auto'.
2. Position the container somewhere off screen, like left: -10000px so users can't see it.
3. Split the original text into 20-word chunks. (Look up the regex that accomplishes this.)
4. Append one chunk at a time to the string in the container until the height of the container reaches the max height of a single page.
5. Once it reaches the max height, the string in the container is basically one page of text. Push the string in the container onto an array called 'pages'.
6. Empty the container and start creating page 2 by appending the 20-word chunks again, continuing to iterate through the array from where you left off on the previous page.
7. Continue this process until you reach the end of the 20-word array, pushing each new page onto the array of pages whenever the container's string reaches the max height.

You should now have an array of pages, each item of which contains the text of each page.
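
The steps in this answer could be sketched roughly as follows. This is one possible reading, not a definitive implementation: `chunkWords`, `buildPages` and the `fits` callback are hypothetical names, the chunking uses a plain split instead of a regex that captures 20 words at once, and a page is closed just before the chunk that would overflow it.

```javascript
// Splits text into chunks of `size` words each (the last chunk may be shorter).
function chunkWords(text, size) {
    var words = text.split(/\s+/).filter(function (w) { return w.length > 0; });
    var chunks = [];
    for (var i = 0; i < words.length; i += size) {
        chunks.push(words.slice(i, i + size).join(' '));
    }
    return chunks;
}

// Appends chunks to the current page until the next one would overflow,
// then pushes the page and starts a new one with that chunk.
// `fits(text)` is an assumed callback; in the browser it could write `text`
// into the off-screen container and compare its height to the page height.
function buildPages(chunks, fits) {
    var pages = [];
    var current = '';
    for (var i = 0; i < chunks.length; i++) {
        var candidate = current === '' ? chunks[i] : current + ' ' + chunks[i];
        if (current !== '' && !fits(candidate)) {
            pages.push(current);
            current = chunks[i];
        } else {
            current = candidate;
        }
    }
    if (current !== '') pages.push(current);
    return pages;
}
```

Because pages break only on chunk boundaries, a page can end up to one chunk short of full. The word-by-word backtracking in the earlier answer fills pages more tightly, at the cost of extra measurements.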