Hello! In this article I would like to talk about two things: the difficulties I ran into while working on the Price Archive, and what came of it. I will build the story in a somewhat unusual way. I will give almost no answers. I will only voice the questions and problems that have come up and keep coming up, some already solved and some still being solved. A pure inside view, you could say.
As they say, straight from the source, first hand. I suspect you will get tired just from reading what had to be done.
Honestly, if someone had shown me a list like this in advance and told me how hard it would be, I would probably have dropped the idea and never built the Price Archive from scratch. But I decided to test myself, I suppose.
If you are only interested in information about prices of goods on AliExpress.com, skip straight to the second part. There I describe the most interesting things I noticed.
Who is this article for?
Probably mostly for students, or for people gathering their thoughts before starting their own project. An article like this would have helped me a year ago.
A year ago?
Yes, it was on 11/11/2016 that I decided to do something useful for people. I understand that some found it useful, some did not, and others will have their own particular point of view. But that was the goal. And I still urge you to come up with something genuinely useful, otherwise your work is doomed to failure. But I will say right away: even something useful can easily fail.
Some of this is done, some is not, and some I forgot and did not write down here. But I tried to write about everything that happened.
Sorry for the jumble: among other things, I paste in pieces of the to-do list I have been keeping for some time. They appear here, in places, just as they really were.
So, here is the list of tasks I ended up with.
1. Find a problem that worries many people.
2. Study the subject area. Find similar services and competitors.
3. Convince yourself that people need the project and that it has a chance to take off.
4. Convince yourself that you have enough money, energy and desire to keep going until the project pays for itself.
5. Make a list of the service's tools that should be implemented in the future.
6. Think through the site design and what functionality it will have. The initial choice was a website rather than an app.
7. Write a detailed design specification. Order the design and oversee its development. This is the only thing I outsourced. Design is very hard even with good designers and front-end developers. You have to check and recheck everything a thousand times if you want a decently made design and layout.
8. Refresh from memory or learn a number of Linux commands.
I studied them at university once, but that was long ago and hardly counts. Mounting and unmounting a disk, ntpdate, tune2fs, screen, man, mkfs, df -i, lsof, ps aux, top, du -sh *, date, blkid /dev/sda1, fdisk -l and another two or three dozen.
9. Study the theory of disk read and write speeds. Look at tests of disks from different manufacturers, models, etc.
10. Study file system theory: the ext family, xfs, reiser, btrfs, zfs and others.
I could not find a single truly comprehensive source on all this for my own case. Everything had to be rechecked, and for some things Google returned only 3 pages discussing the really important points, which turned out to be cornerstones. Settle on a file system that copes easily with tens of millions of files and also scales very easily. Examine various aspects of file system tuning, such as disabling the directory index and measuring the effect on read and write speed, the noatime option, and a few more.
11. Write scripts to test disk speed for writing, reading and rewriting, both sequential and random. Initially, dozens of tests were run on conventional hard disks with every possible block size and inode count. The tests had to model every possible situation, including very heavy fragmentation.
12. It turned out that conventional disks, by default, cannot keep up with a large volume of data. I had to look for a way out, and it was good SSD drives. Yes, they are more expensive, but there was no way without them.
13. Run all the disk tests again for writing, reading and rewriting, both sequential and random.
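A disk test of the kind described in items 11 and 13 could be sketched roughly as follows. This is only an illustration under my own assumptions, not the scripts actually used: the function names, file count and file size are invented for the example, and because reads right after writes are usually served from the page cache, a real test would need much larger data sets or a cache flush between phases.

```python
import os
import random
import tempfile
import time


def write_files(directory, count, size):
    """Write `count` files of `size` bytes each; return (paths, seconds)."""
    payload = os.urandom(size)
    paths = []
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:08d}.bin")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to the device, not just the cache
        paths.append(path)
    return paths, time.perf_counter() - start


def read_files(paths, shuffle):
    """Read every file in creation order or in random order; return (bytes, seconds)."""
    order = list(paths)
    if shuffle:
        random.shuffle(order)
    total = 0
    start = time.perf_counter()
    for path in order:
        with open(path, "rb") as fh:
            total += len(fh.read())
    return total, time.perf_counter() - start


def run_benchmark(count=200, size=4096):
    """Time writing, in-order reading and random-order reading of small files."""
    with tempfile.TemporaryDirectory() as tmp:
        paths, t_write = write_files(tmp, count, size)
        n_seq, t_seq = read_files(paths, shuffle=False)
        n_rnd, t_rnd = read_files(paths, shuffle=True)
    return {
        "files": count,
        "bytes_read": n_seq + n_rnd,
        "write_s": t_write,
        "seq_read_s": t_seq,
        "rand_read_s": t_rnd,
    }
```

Calling `run_benchmark()` with different `count` and `size` values on each candidate disk and file system gives numbers that can be compared side by side.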
14. Choose the file system block size and the number of inodes to match the expected data, so that usage of both grows in step. Otherwise the disk will run out of either space or inodes first, and its capacity will be used up sooner than necessary. That is extra money spent because of a flawed architecture at the start. Not great.
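The arithmetic behind item 14 can be sketched like this. It is a simplified model under assumptions of my own: every file occupies a whole number of blocks, directories and file system metadata are ignored, and the 10% inode reserve is an arbitrary illustrative margin. For an ext-family file system, the resulting ratio is the kind of figure that goes into the bytes-per-inode option of mke2fs.

```python
def bytes_on_disk(avg_file_size, block_size=4096):
    """Space one average file occupies: its size rounded up to whole blocks."""
    blocks = max(1, (avg_file_size + block_size - 1) // block_size)
    return blocks * block_size


def inodes_for_disk(disk_bytes, avg_file_size, block_size=4096, reserve_percent=10):
    """Inodes needed so that space and inodes run out at about the same time.

    reserve_percent is a hypothetical safety margin, not a measured value.
    """
    files = disk_bytes // bytes_on_disk(avg_file_size, block_size)
    # Integer ceiling of files * (1 + reserve_percent / 100)
    return (files * (100 + reserve_percent) + 99) // 100
```

For example, with 1500-byte files a 4096-byte block wastes more than half of each block, while a 1024-byte block stores the same file in 2048 bytes but needs twice as many inodes for the same disk.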
15. Study RAID technology, or come up with your own scheme for handling disk failure.
16. Develop your own plan for scaling the service N times and for preserving data in case of force majeure.
17. Choose a reliable hosting provider with more or less reasonable technical support. Calculate the required amount of RAM and the number and size of disks so as not to overpay for a long time. Wait for the drives to arrive and the server to be set up. The problem was that SSD drives were in short supply, but I was lucky and did not wait long. Order a server that allows adding both memory and disks later.
18. On receiving the server, run every possible and necessary test: does everything work, is everything configured correctly. Check the disks for errors; I had errors even on a new disk. Change the ssh port to your own. Configure a large number of simultaneously open connections.
Increase the open file limit for the admin and apache users. Check that bash and the other software are at current versions with patches for already known vulnerabilities, such as Heartbleed. Raise Apache MaxClients from 256 to (I will not say). In DirectAdmin, set simple_disk_usage = 1. Check that both server IP addresses are configured persistently and do not drop off on reboot because they were only stored in memory. Check the connection through PuTTY. Remove the FTP server. Close all unnecessary ports. In iptables, add allow rules for the ports: *. All others are denied. Find useful commands for the server and understand their syntax: iptables -I INPUT -s 1.1.1.1 -j DROP, iptables -nvL INPUT.
Install ntpdate. Set UTC time for everything: for the server, for php. hwclock is also UTC. The time zone for both hwclock and php should be changed to Europe/London. Install screen, man, and other necessary things. Remove phpMyAdmin.
19. A lot of time was lost working with CentOS 6.
Do not install CentOS 6: it sometimes has problems with the network card under heavy traffic, about 15-20 Gb per hour.
20. Find out how problems of scalability, high performance and reliability are solved today. I found Cassandra. Study Cassandra and its applicability to your situation.
21. Decide how the data will be stored, based on the tests carried out and the available information about databases.
22. Set all the headers needed for site security and SEO, and remove the unnecessary ones that reveal what exactly is installed on the server. Vary: User-Agent, X-Accel-Version, X-Frame-Options and others.
23. Start developing the site. Decide what it will be developed with and why. Develop an architecture that lets you safely add things you did not foresee.
At the same time the code should stay light, quick to navigate and extremely clear, and it should be obvious what lies where on the server and why.
24. Decide how many languages the site will be translated into, based on your finances and translation prices. Find a translator for each language, or a translation agency that does its work well enough that it does not need rechecking every time. Here I ran into all sorts of cases. Some people were so peculiar that they could not accept the translation task by email: they needed a personal visit and a hands-on explanation. And it was all presented as if they were doing me a favor. I mention these social moments here so that I do not have to dwell on them later. But they did happen. I learned one thing: if you have no proven people in a given field, finding ones you can work with takes a while.
25. Thoroughly study the rules for working with AliExpress.com as an affiliate.
26. Explore the AliExpress API. Realize that it is not very informative. Look for a way out. Spend hours talking to support and to everyone who could help AliExpress make the API better.
27. Collect the names of all categories and subcategories in the AliExpress catalog in English. Save them with notes on which subcategory belongs to which category. Write a script for this.
28. Write a script to collect the database for English. Based on the data obtained, run tests on placement, number of blocks occupied, and inodes.
Estimate how long the disk will last. What to do when the disk is full?
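The "how long will the disk last" estimate can be reduced to a small calculation once daily growth is known. The sketch below assumes linear growth, which is my simplification, and all names in it are hypothetical. It reports whichever resource, space or inodes, runs out first.

```python
def days_left(free, per_day):
    """Whole days until a resource is exhausted; None if it is not growing."""
    if per_day <= 0:
        return None
    return free // per_day


def disk_runway(bytes_free, bytes_per_day, inodes_free, inodes_per_day):
    """Return (resource, days) for the resource that will run out first."""
    candidates = {
        "space": days_left(bytes_free, bytes_per_day),
        "inodes": days_left(inodes_free, inodes_per_day),
    }
    finite = {name: days for name, days in candidates.items() if days is not None}
    if not finite:
        return None  # nothing is growing, so no deadline can be estimated
    resource = min(finite, key=finite.get)
    return resource, finite[resource]
```

The free space and free inode figures are the ones that `df` and `df -i` report, and the daily growth can be taken from two such readings a day apart.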
29. Develop a sound data storage structure for fast writing, searching, reading and deleting. The time to fully serve a page should be no more than 0.8 seconds with tens of millions of records on the disk.
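One common way to keep lookups fast with tens of millions of files is to spread them over nested directories derived from a hash of the record key, so that no single directory grows huge. The sketch below shows that general technique only. It is not the storage structure actually used in the Price Archive, and the root path, file extension and two-level layout are assumptions made for the example.

```python
import hashlib
import os


def shard_path(root, product_id, levels=2, width=2):
    """Map a product id to root/ab/cd/<id>.dat using a hash prefix.

    With levels=2 and width=2 there are 256 * 256 = 65536 leaf directories,
    so 50 million files average under 800 files per directory.
    """
    digest = hashlib.md5(str(product_id).encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, f"{product_id}.dat")
```

Because the path is computed from the id alone, writing, reading and deleting a record need no directory scan and no separate index.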
30. Monitor items that go missing on AliExpress. If a product disappears from sale, is it temporary or permanent? There were many more such peculiarities to deal with.
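The "temporary or permanent" question can be handled with a grace period: a product is only written off after it has been absent for some number of days. The following sketch is one possible rule, not the one the service uses. The 14-day threshold and the status names are invented for illustration.

```python
from datetime import date


def classify_absence(last_seen, today, grace_days=14):
    """Classify a product by how long ago it was last seen in the daily crawl."""
    missing = (today - last_seen).days
    if missing <= 0:
        return "available"
    if missing <= grace_days:
        return "temporarily_missing"  # keep its history, keep checking
    return "presumed_gone"            # candidate for archiving or deletion
```

A product that reappears simply gets a fresh `last_seen` date and becomes "available" again.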
31. Write scripts for the daily analysis of all products. When developing them, account for all possible and necessary statistics.
32. Collect the names of categories and subcategories for Russian and for all other languages. English is already collected.
33. Write a script that collects product names in all languages for the required products from the database built during the first pass. Take into account the limits on the number of API requests.
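Respecting a request limit usually means pausing the collector whenever it gets ahead of the allowed rate. Below is a minimal sliding-window limiter as a sketch. The actual AliExpress limits are not stated in this article, so `max_calls` and `period` are placeholders, and the class itself is hypothetical.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls within any `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until one more call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the sliding window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self._sleep(self.period - (now - self._calls[0]))
```

The collection script would call `limiter.wait()` immediately before each API request.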
34. Write the site code. Build all the features, including product search, category display, subscriptions, personal account, blog, registration, tracking, email alerts, etc.
35. Make a list of all the phrases and words used on the site in Russian.
36. Contact translators and give them lists of phrases and words to translate.
37. Process the received lists in different languages and configure the site code to display the corresponding language.
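Items 35 to 37 amount to a phrase table per language with a fallback to the base list. A minimal sketch, assuming Russian as the base language as described above; the phrase keys, the sample phrases and the `{price}` placeholder syntax are all made up for the example and are not the site's real macros.

```python
# Hypothetical phrase tables; a real site would load these from translator files.
PHRASES = {
    "ru": {"track": "Отслеживать", "price_drop": "Цена снизилась до {price}"},
    "en": {"track": "Track", "price_drop": "Price dropped to {price}"},
    "es": {"track": "Seguir"},  # translation still incomplete
}
BASE_LANG = "ru"


def translate(key, lang, **macros):
    """Return the phrase for `lang`, falling back to the base language, then to the key."""
    template = PHRASES.get(lang, {}).get(key)
    if template is None:
        template = PHRASES[BASE_LANG].get(key, key)
    return template.format(**macros)
```

The fallback means a phrase a translator has not delivered yet shows up in the base language instead of breaking the page.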
38. It turned out that one script is needed for the initial collection and a separate one for all subsequent collection passes over the database, because the first script would be too slow for them and a dedicated one is faster.
39. All this time I stayed in touch with the designers. Modify the delivered design yourself when that is faster than asking them to fix a particular detail.
For this you need to understand CSS better.
40. Write a script to collect statistics on users' browsers. Use them to check whether the layout works correctly in all popular browsers. Test on every browser you can. For this you can simply go to an Apple or Samsung store, where there are many devices with different screen sizes. Test there to your heart's content, and also test through a service that takes screenshots on dozens of operating systems. Localize the CSS files for each language. This turned out to be the best option, given that the same phrases take up different amounts of screen space in different languages.
41. Turn the HTML pages of the modified design into templates with macros. Think through the macro syntax, because two approaches will be used to build a page. The second approach relates to internal macros in the language localization files.
42. Set up the domain and all subdomains at the registrar and on the server.
43. Deal with the charts: how they are built and how they should change. Select a suitable charting library and customize it.
44. While reading news, articles, etc., constantly think about whether similar articles could be written for the site so that various media sources link to it. This is an important point, but I have no energy left for it at all.
45. Develop a template for the analysis of statistical indicators. Give it to the translators to translate into all languages.
46. Develop a structure for the analysis, for storing its results, for storing articles, etc.
47. Write scripts to collect the names of goods in the desired languages.
48. Put restrictions, for example, on the number of products that the user can track.
49. tipsy does not work quite correctly on phones. Figure out why and fix it.
50. Understand and set up SPF, DKIM and DMARC records. I do not know why, but this was very difficult. Perhaps because I did not find a sensible guide for someone seeing all this for the first time.
51. Think about presenting tables instead of charts in some cases.
52. Study Highcharts charts.
53. Understand certificates and set up an https connection.
54. Understand and configure htaccess.
55. Understand and configure ptr records.
56. Make literally a couple of hundred improvements to the functionality, design and operation of the site. I currently have 80 more items stored in a file just for improving what already exists.
And this list grows every day with users' wishes and my own ideas.
57. Work on SEO. Configure all the necessary tags on all pages: canonical, dns-prefetch, preconnect, og:*, product:*, twitter:*, alternate, and so on. Not everything is set up yet; a couple of important ones remain.
58. Regenerate the sitemaps and the sitemap index (the map of maps) every day.
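Daily sitemap generation can be sketched as below. The sitemap protocol allows at most 50,000 URLs per file, which is why large sites need several sitemaps plus an index that lists them. The function names and the example URLs are my own; how the site really builds its maps is not described here.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunked(items, size=50000):
    """Yield consecutive slices of at most `size` items (protocol limit per file)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_sitemap(urls, lastmod):
    """Return the XML text of one sitemap file."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{SITEMAP_NS}">']
    for url in urls:
        lines.append(f"  <url><loc>{escape(url)}</loc><lastmod>{lastmod}</lastmod></url>")
    lines.append("</urlset>")
    return "\n".join(lines)


def build_sitemap_index(sitemap_urls, lastmod):
    """Return the XML text of the index that points at the individual sitemaps."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<sitemapindex xmlns="{SITEMAP_NS}">']
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc><lastmod>{lastmod}</lastmod></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)
```

A daily job would split the full URL list with `chunked`, write one file per chunk, and then write the index last.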
59. Post a few articles on forums to tell people about the service. Here I will say "thank you" to some forum owners. Of course, it is your business and your rules on your portals. But because my good articles were deleted, the young site, left without links, was demoted by Google. And it does not budge. In the top results there are hundreds of doorway pages and junk sites, all flourishing in the first positions. But the Price Archive is stuck at the very bottom because the links to it were deleted. I want to say a couple of kind words about Yandex here. It does not see many links either, but at least it brings some users to the site. Of course, they hardly affect payback, because there are very few of them. I am talking about the search engine algorithms now. Yandex gives young projects a chance, even if a ghostly one, while Google gives a small newcomer bonus and then goodbye. But the junk and the doorways are all in the top. Search engine folks, here is a project that is useful to people. I do not want to promote it, I want to work on improving it. Instead I have to grab at one thing, then another. I understand SEO fairly well and could, at some risk, try to push the site to the top with not entirely white-hat methods. But should it be that way? OK, it is what it is. And no, I will add one more thing. I did not believe it, but here in the CIS forum owners would rather choke than allow a detailed story on their resource, even about a non-competing project. Everything gets cleaned up quickly. They want money for advertising.
As I said, that is your business, of course. But the difference in approach between CIS forum owners and foreign ones is striking. On foreign forums you can show up with no posts and no karma and tell people about yourself, and almost everyone takes it as normal. What to do? Be kinder and less greedy.
60. Fight procrastination and burnout, given that there is plenty to do offline as well.
61. Send 100 letters to news sites and 10 letters to top bloggers, offering either to cover the service for free or to cover it with deferred payment in the form of a fixed percentage. Do you know what the answer was? None. Well, not quite none.
About 5 of the largest sites replied with an offer of regular advertising. Thanks to them for that. No wonder they are large. The rest were silent.
I wanted to find out for myself what would happen, and I did. Now I can share it with you. If you have no money for advertising, writing to anyone is a waste of time. With many, writing is useless even if you do have money. I do not know why, and it is not my business what is on their minds. That is simply the fact. In general, a different approach is needed here. Writing to a contact email, as practice shows, is useless. Perhaps because the project is unknown.
62. Create project pages on social networks. Design them and keep them active.
63. A mistake that many make, and that I made too: I did not calculate the money available for living, for developing the site and for promoting it. There was no money left at all for promotion. Yet promotion is perhaps a more important part than all the rest of the work done. The financial side must be weighed very carefully.
Some of these items take far more than a week. Ideally, several people should have worked on the project, each with their own area. But it was interesting for me to do everything myself, from beginning to end, and to dive into everyone's job. There is still a lot to do to bring the project to what I consider a normal state. It does not pay for itself yet, which means the story does not end here. Something like that.
P.S. Of course, I am not fishing for a good opinion of what I wrote or of me. That is not why I wrote it. I understand well that opinions differ, and differ widely. For me, the purpose of this article is to describe the work done as it really was, and to tell people that you can buy things knowing exactly that you are buying at the lowest price and not at a price inflated for today. Price Archive provides free price tracking and email notifications of price drops.
And finally, my position: we should share information with each other. I hope someone finds this useful, perhaps as a kind of initial draft.
Now the second part.
What interesting things can I say about the sale, and about goods on AliExpress.com in general?
Every day the Price Archive collects data on about 12 million products. At the moment there is information on more than 37 million products. For about 5% of the most popular products, information cannot be obtained because it is not yet on the site, but there is already progress on this issue.
Look here. Leave only two circles active: “Cheaper” and “Grew up in price”. On November 1, 2.2 million of the nearly 12 million products analyzed went up in price. On November 2, about the same number went down. But on November 4, more than 4 million of the nearly 12 million products analyzed went up in price, and so far they have not come down on the same scale. Hence the answer to the question everyone asks: prices rose en masse on November 4, one week before the sale, and not right before the sale on the 10th. So if you were buying before the sale, in many cases you should have done it before November 4, and not on the first of November.
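Counts like the ones above come from comparing each product's price on two consecutive days. A minimal sketch of such a comparison, with made-up product ids and with prices held as integer cents (my assumption) to avoid floating-point comparison problems:

```python
from collections import Counter


def classify_change(previous, current):
    """Label one product's price movement between two days."""
    if current < previous:
        return "cheaper"
    if current > previous:
        return "more_expensive"
    return "unchanged"


def daily_summary(yesterday, today):
    """Count price movements; both arguments map product id -> price in cents."""
    counts = Counter()
    for product_id, price in today.items():
        if product_id in yesterday:
            counts[classify_change(yesterday[product_id], price)] += 1
        else:
            counts["new"] += 1  # first time this product has been seen
    return counts
```

Running `daily_summary` for every pair of adjacent days and plotting the "more_expensive" count is enough to make a spike like the one on November 4 visible.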
There is also this interesting page. Product information is collected on AliExpress.com every day. The page shows products that were sold without a discount yesterday and are sold today at a discount of 5 to 99%. It also shows discounted products about which we received information today for the first time. The page does its job, but it is at the first stage of development, so to speak. Later, filters and other functionality will be added, making it a good tool for finding products with big discounts.
Why bother looking at these price charts? Nothing is better than examples. Look at the charts and at how much prices change: one, two, three. Prices change very often and for a great many products. The reasons vary: the time of year, the proximity of sales, competition, etc. If you do not want to overpay, just look at the price change chart and decide whether to buy today or wait. The charts make everything clear at once.
I would like to mention one more feature of the site. Every product page has a notification form. Enter your email and the desired price, based on the information in the chart. When the price drops to the level you specified, we will send you an email notification.
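The check behind such a notification is simple in principle. The sketch below selects the subscriptions whose target price has been reached. It is an illustration only: the `Alert` structure, the use of integer cents and the sample data are assumptions, and actually sending the email is left out.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    email: str
    product_id: int
    target_price: int  # in cents


def due_alerts(alerts, current_prices):
    """Return alerts whose product now costs no more than the requested price.

    current_prices maps product id -> today's price in cents; products with
    no price today (for example, temporarily missing ones) are skipped.
    """
    return [
        alert for alert in alerts
        if alert.product_id in current_prices
        and current_prices[alert.product_id] <= alert.target_price
    ]
```

For example, `due_alerts([Alert("a@example.com", 1, 900)], {1: 850})` returns that one alert, because 850 is at or below the 900 target.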
You do not need to check every day what got cheaper or more expensive, or save data to an Excel file, as some people do. Everything is easy and convenient to view in your personal account, which is available after registration.
Thanks for your attention!